Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

2008-02-18 Thread Michael Tokarev
Jeremy Higdon wrote:
[]
> I'll put it even more strongly.  My experience is that disabling write
> cache plus disabling barriers is often much faster than enabling both
> barriers and write cache, when doing metadata intensive
> operations, as long as you have a drive that is good at CTQ/NCQ.

Now, and it's VERY interesting at least for me (and is off-topic in
this thread) -- which drive(s) are good at NCQ?  I tried numerous SATA
(NCQ is about sata, right? :) drives, but NCQ either does nothing in
terms of performance or hurts.  Yesterday we ordered another drive
from Hitachi (their "raid edition" thing), -- will try it tomorrow,
but I've no hope here as it's some 5th or 6th model/brand already.

(Good old SCSI drives, even 10-year-old ones, show a large difference when
TCQ is enabled...)

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [mm] [PATCH 2/4] Add the soft limit interface v2

2008-02-18 Thread Li Zefan
Li Zefan wrote:
> Balbir Singh wrote:
>> A new configuration file called soft_limit_in_bytes is added. The parsing
>> and configuration rules remain the same as for the limit_in_bytes user
>> interface.
>>
>> A global list of all memory cgroups over their soft limit is maintained.
>> This list is then used to reclaim memory on global pressure. A cgroup is
>> removed from the list when the cgroup is deleted.
>>
>> The global list is protected with a read-write spinlock.
>>
> 
> You are not using read-write spinlock..
> 

Ah, the spinlock is changed to r/w spinlock in [PATCH 3/4].

>> Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
>> ---
>>
>>  mm/memcontrol.c |   33 ++++++++++++++++++++++++++++++++-
>>  1 file changed, 32 insertions(+), 1 deletion(-)
>>
>> diff -puN mm/memcontrol.c~memory-controller-add-soft-limit-interface mm/memcontrol.c
>> --- linux-2.6.25-rc2/mm/memcontrol.c~memory-controller-add-soft-limit-interface	2008-02-19 12:31:49.0 +0530
>> +++ linux-2.6.25-rc2-balbir/mm/memcontrol.c	2008-02-19 12:31:49.0 +0530
>> @@ -35,6 +35,10 @@
>>  
>>  struct cgroup_subsys mem_cgroup_subsys;
>>  static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
>> +static spinlock_t mem_cgroup_sl_list_lock;	/* spin lock that protects */
>> +						/* the list of cgroups over*/
>> +						/* their soft limit */
>> +static struct list_head mem_cgroup_sl_exceeded_list;
>>  
>>  /*
>>   * Statistics for memory cgroup.
>> @@ -136,6 +140,10 @@ struct mem_cgroup {
>>  	 * statistics.
>>  	 */
>>  	struct mem_cgroup_stat stat;
>> +	/*
>> +	 * List of all mem_cgroup's that exceed their soft limit
>> +	 */
>> +	struct list_head sl_exceeded_list;
>>  };
>>  
>>  /*
>> @@ -679,6 +687,18 @@ retry:
>>  		goto retry;
>>  	}
>>  
>> +	/*
>> +	 * If we exceed our soft limit, we get added to the list of
>> +	 * cgroups over their soft limit
>> +	 */
>> +	if (!res_counter_check_under_limit(&mem->res, RES_SOFT_LIMIT)) {
>> +		spin_lock_irqsave(&mem_cgroup_sl_list_lock, flags);
>> +		if (list_empty(&mem->sl_exceeded_list))
>> +			list_add_tail(&mem->sl_exceeded_list,
>> +					&mem_cgroup_sl_exceeded_list);
>> +		spin_unlock_irqrestore(&mem_cgroup_sl_list_lock, flags);
>> +	}
>> +
>>  	mz = page_cgroup_zoneinfo(pc);
>>  	spin_lock_irqsave(&mz->lru_lock, flags);
>>  	/* Update statistics vector */
>> @@ -736,13 +756,14 @@ void mem_cgroup_uncharge(struct page_cgr
>>  	if (atomic_dec_and_test(&pc->ref_cnt)) {
>>  		page = pc->page;
>>  		mz = page_cgroup_zoneinfo(pc);
>> +		mem = pc->mem_cgroup;
>>  		/*
>>  		 * get page->cgroup and clear it under lock.
>>  		 * force_empty can drop page->cgroup without checking refcnt.
>>  		 */
>>  		unlock_page_cgroup(page);
>> +
>>  		if (clear_page_cgroup(page, pc) == pc) {
>> -			mem = pc->mem_cgroup;
>>  			css_put(&mem->css);
>>  			res_counter_uncharge(&mem->res, PAGE_SIZE);
>>  			spin_lock_irqsave(&mz->lru_lock, flags);
>> @@ -1046,6 +1067,12 @@ static struct cftype mem_cgroup_files[] 
>>  		.name = "stat",
>>  		.open = mem_control_stat_open,
>>  	},
>> +	{
>> +		.name = "soft_limit_in_bytes",
>> +		.private = RES_SOFT_LIMIT,
>> +		.write = mem_cgroup_write,
>> +		.read = mem_cgroup_read,
>> +	},
>>  };
>>  
>>  static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
>> @@ -1097,6 +1124,9 @@ mem_cgroup_create(struct cgroup_subsys *
>>  	if (unlikely((cont->parent) == NULL)) {
>>  		mem = &init_mem_cgroup;
>>  		init_mm.mem_cgroup = mem;
>> +		INIT_LIST_HEAD(&mem->sl_exceeded_list);
>> +		spin_lock_init(&mem_cgroup_sl_list_lock);
>> +		INIT_LIST_HEAD(&mem_cgroup_sl_exceeded_list);
>>  	} else
>>  		mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
>>  
>> @@ -1104,6 +1134,7 @@ mem_cgroup_create(struct cgroup_subsys *
>>  		return NULL;
>>  
>>  	res_counter_init(&mem->res);
>> +	INIT_LIST_HEAD(&mem->sl_exceeded_list);
>>  
> 
> mem->sl_exceeded_list initialized twice ?
> 
>>  	memset(&mem->info, 0, sizeof(mem->info));
>>  
>> _
>>
> --


Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Eric Dumazet

Zhang, Yanmin wrote:
> On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote:
> > On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said:
> > > I also think __refcnt is the key. I did a new testing by adding 2 unsigned long
> > > padding before lastuse, so the 3 members are moved to next cache line. The
> > > performance is recovered.
> > >
> > > How about below patch? Almost all performance is recovered with the new patch.
> > >
> > > Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> >
> > Could you add a comment someplace that says "refcnt wants to be on a different
> > cache line from input/output/ops or performance tanks badly", to warn some
> > future kernel hacker who starts adding new fields to the structure?
>
> Ok. Below is the new patch.
>
> 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200
> no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by
> moving tclassid to different place. It looks like tclassid could also have impact on
> performance.
> If moving tclassid before metrics, or just don't move tclassid, the performance isn't
> good. So I move it behind metrics.
>
> 2) Add comments before __refcnt.
>
> If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
> the one without the patch.
>
> If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
> the one without the patch.
>
> Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
>
> ---
>
> --- linux-2.6.25-rc1/include/net/dst.h	2008-02-21 14:33:43.0 +0800
> +++ linux-2.6.25-rc1_work/include/net/dst.h	2008-02-22 12:52:19.0 +0800
> @@ -52,15 +52,10 @@ struct dst_entry
>  	unsigned short		header_len;	/* more space at head required */
>  	unsigned short		trailer_len;	/* space to reserve at tail */
>  
> -	u32			metrics[RTAX_MAX];
> -	struct dst_entry	*path;
> -
> -	unsigned long		rate_last;	/* rate limiting for ICMP */
>  	unsigned int		rate_tokens;
> +	unsigned long		rate_last;	/* rate limiting for ICMP */
>  
> -#ifdef CONFIG_NET_CLS_ROUTE
> -	__u32			tclassid;
> -#endif
> +	struct dst_entry	*path;
>  
>  	struct neighbour	*neighbour;
>  	struct hh_cache		*hh;
> @@ -70,10 +65,20 @@ struct dst_entry
>  	int			(*output)(struct sk_buff*);
>  
>  	struct  dst_ops		*ops;
> -
> -	unsigned long		lastuse;
> +
> +	u32			metrics[RTAX_MAX];
> +
> +#ifdef CONFIG_NET_CLS_ROUTE
> +	__u32			tclassid;
> +#endif
> +
> +	/*
> +	 * __refcnt wants to be on a different cache line from
> +	 * input/output/ops or performance tanks badly
> +	 */
>  	atomic_t		__refcnt;	/* client references	*/
>  	int			__use;
> +	unsigned long		lastuse;
>  	union {
>  		struct dst_entry *next;
>  		struct rtable    *rt_next;

I prefer this patch, but unfortunately your perf numbers are for 64-bit kernels.

Could you please test now with a 32-bit one?

Thank you
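
For readers following the layout argument, here is a minimal sketch of how one
might check whether two members of a structure fall into the same cache line.
The structure demo_dst, its member sizes, the helper names and the 64-byte line
size are all assumptions made for illustration; this is not the real
struct dst_entry, and the offsets only map onto real cache lines when the
object itself starts on a line boundary.

```c
#include <stddef.h>

#define DEMO_CACHE_LINE	64	/* assumed cache line size in bytes */
#define DEMO_NMETRICS	13	/* arbitrary array size for the illustration */

/* Hypothetical stand-in for struct dst_entry, not the kernel definition. */
struct demo_dst {
	/* read-mostly members */
	int		(*input)(void *);
	int		(*output)(void *);
	void		*ops;
	unsigned int	metrics[DEMO_NMETRICS];

	/* members written for nearly every packet */
	int		refcnt;
	int		use;
	unsigned long	lastuse;
};

/* Index of the cache line that a given byte offset falls into. */
static size_t demo_line_of(size_t offset)
{
	return offset / DEMO_CACHE_LINE;
}

/* Nonzero when both offsets fall into the same cache line. */
static int demo_same_line(size_t a, size_t b)
{
	return demo_line_of(a) == demo_line_of(b);
}
```

Calling demo_same_line(offsetof(struct demo_dst, ops), offsetof(struct demo_dst, refcnt))
shows whether the read-mostly and write-hot groups are separated. Member
offsets depend on pointer and long sizes, which is why a layout tuned on a
64-bit build should be rechecked on a 32-bit one.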


Re: [PATCH 1/3] Fix Unlikely(x) == y

2008-02-18 Thread Adrian Bunk
On Tue, Feb 19, 2008 at 08:46:03AM +1100, Michael Ellerman wrote:
> On Mon, 2008-02-18 at 16:13 +0200, Adrian Bunk wrote:
> > On Mon, Feb 18, 2008 at 03:01:35PM +0100, Geert Uytterhoeven wrote:
> > > On Mon, 18 Feb 2008, Adrian Bunk wrote:
> > > > 
> > > > This means it generates faster code with a current gcc for your 
> > > > platform.
> > > > 
> > > > But a future gcc might e.g. replace the whole loop with a division
> > > > (gcc SVN head (that will soon become gcc 4.3) already does 
> > > > transformations like replacing loops with divisions [1]).
> > > 
> > > Hence shouldn't we ask the gcc people what's the purpose of 
> > > __builtin_expect(),
> > > if it doesn't live up to its promise?
> > 
> > That's a different issue.
> > 
> > My point here is that we do not know how the latest gcc available in the 
> > year 2010 might transform this code, and how a likely/unlikely placed 
> > there might influence gcc's optimizations then.
> 
> You're right, we don't know. But if giving the compiler _more_
> information causes it to produce vastly inferior code then we should be
> filing gcc bugs. After all the unlikely/likely is just a hint, if gcc
> knows better it can always ignore it.

It's the other way round, gcc assumes that you know better than gcc when 
you give it a __builtin_expect().

The example you gave had only a 1:3 ratio, which is far outside of the 
ratios where __builtin_expect() should be used.

What if you gave this annotation for the 1:3 case and gcc generates code 
that performs better for ratios > 1:1000 but much worse for a 1:3 ratio
since your hint did override a better estimate of gcc?

And I'm sure that > 90% of all kernel developers (including me) are 
worse in such respects than the gcc heuristics.

I'm a firm believer in the following:
- it's the programmer's job to write clean and efficient C code
- it's the compiler's job to convert C code into efficient assembler
  code

The stable interface between the programmer and the compiler is C, and 
when the programmer starts manually messing with internals of the 
compiler that's a layering violation that requires a _good_ 
justification.

With a "good justification" not consisting of some microbenchmark but of 
measurements of the actual annotations in the kernel code.

> cheers

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed
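
As background for this thread, a small sketch of branch hints built on
__builtin_expect(), which GCC and Clang provide. The demo_ names are
hypothetical; the macros have the same shape as the kernel's likely() and
unlikely(). The sketch also shows the mistake named in the subject line:
because the macro normalizes its argument with !!, writing unlikely(x) == y
compares 0 or 1 against y instead of hinting the comparison x == y.

```c
#include <stddef.h>

/* Same shape as the kernel macros; requires GCC or Clang. */
#define demo_likely(x)		__builtin_expect(!!(x), 1)
#define demo_unlikely(x)	__builtin_expect(!!(x), 0)

/* Misplaced parenthesis: compares !!x (0 or 1) with y. */
static int demo_misplaced(int x, int y)
{
	return demo_unlikely(x) == y;
}

/* Intended form: the whole comparison is the hinted condition. */
static int demo_intended(int x, int y)
{
	return demo_unlikely(x == y);
}

/* Count negative entries, hinting that they are expected to be rare. */
static size_t demo_count_negative(const int *v, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (demo_unlikely(v[i] < 0))
			count++;
	}
	return count;
}
```

The hint never changes the result of the condition, only how the compiler may
lay out the code, so a wrong hint costs performance rather than correctness.
Whether a given hint pays off is exactly the measurement question raised above.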



Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1

2008-02-18 Thread Pekka Enberg
Hi Yanmin,

> On Fri, 2008-02-15 at 08:42 -0800, Christoph Lameter wrote:
> > > Kernel panic at line 1637 in file mm/slub.c because 
> > > object=c->freelist=NULL.
> >
> > H. freelist should never be NULL. Could you rerun the test and boot with
> > slub_debug to make sure that there is no memory corruption?

On Feb 19, 2008 9:03 AM, Zhang, Yanmin <[EMAIL PROTECTED]> wrote:
> 1) Without slub_debug option, sometime I could trigger it, sometimes not.
> 2) With slub_debug option, or just enable debug for slab skbuff_fclone_cache
> and skbuff_head_cache, I couldn't trigger it.
>
> I will do more testing and investigation, as the bug also exists in 
> 2.6.25-rc2.

Could you please try Ingo's patch: http://lkml.org/lkml/2008/2/19/13

Looks like there are some problems with SLUB_FASTPATH.

Pekka


Re: [PATCH 0/8][for -mm] mem_notify v6

2008-02-18 Thread KOSAKI Motohiro
Hi Paul,

Thank you for the wonderful and interesting comments.
Your comments are really nice.

I was an HPC guy with a large NUMA box in the past.
I promise I won't ignore HPC users.
But unfortunately I have no experience using CPUSET,
because at that point it was still under development.

I hope to discuss the CPUSET use case and the mem_notify requirements with you.
To be honest, I thought HPC users wouldn't use mem_notify, sorry.


> I have what seems, intuitively, a similar problem at the opposite
> end of the world, on big-honkin NUMA boxes (hundreds or thousands of
> CPUs, terabytes of main memory.)  The problem there is often best
> resolved if we can kill the offending task, rather than shrink its
> memory footprint.  The situation is that several compute intensive
> multi-threaded jobs are running, each in their own dedicated cpuset.

agreed.

> So we like to identify such jobs as soon as they begin to swap,
> and kill them very very quickly (before the direct reclaim code
> in mm/vmscan.c can push more than a few pages to the swap device.)

You mean killing the process just after it starts to swap, right?
But unfortunately, almost all users hope to receive the notification before swap ;-)
because they want to avoid swap.

I think we need to discuss this point more.


> For a much earlier, unsuccessful, attempt to accomplish this, see:
> 
>   [Patch] cpusets policy kill no swap
>   http://lkml.org/lkml/2005/3/19/148
> 
> Now, it may well be that we are too far apart to share any part of
> a solution; one seldom uses the same technology to build a Tour de
> France bicycle as one uses to build a Lockheed C-5A Galaxy heavy
> cargo transport.
> 
> One clear difference is the policy of what action we desire to take
> when under memory pressure: do we invite user space to free memory so
> as to avoid the wrath of the oom killer, or do we go to the opposite
> extreme, seeking a nearly instantant killing, faster than the oom
> killer can even begin its search for a victim.

Hmm, sorry
I don't understand your patch yet, because I don't know CPUSET very well.

I'll learn more about CPUSET this week and reply again next week ;-)


> Another clear difference is the use of cpusets, which are a major and
> vital part of administering the big NUMA boxes, and I presume are not
> even compiled into embedded kernels (correct?).  This difference maybe
> unbridgeable ... these big NUMA systems require per-cpuset mechanisms,
> whereas embedded may require builds without cpusets.

Yes, some embedded distributions (e.g. MontaVista) are distributed as source.
But embedded people strongly dislike code size bloat.
I think they never turn on CPUSET.

I hope mem_notify works fine without CPUSET.


> 1) You have a little bit of code in the kernel to throttle the
>thundering herd problem.  Perhaps this could be moved to user space
>... one user daemon that is always notified of such memory pressure
>alarms, and in turn notifies interested applications.  This might
>avoid the need to add poll_wait_exclusive() to the kernel.  And it
>moves any fussy details of how to tame the thundering herd out of
>the kernel.

I think you are talking about a user space oom manager.
That and many ordinary user processes are obviously different.

I suspect the memory manager daemon model doesn't work on desktops and
typical servers.
Thus, the current implementation is optimized for environments without a manager.

Of course, that doesn't mean I refuse to add code for an oom manager.
It is a very interesting idea.

I hope to discuss it more.


> 2) Another possible mechanism for communicating events from
>the kernel to user space is inotify.  For example, I added
>the line:
> 
>   fsnotify_modify(dentry);   # dentry is current tasks cpuset

Excellent!
That is a really good idea.

Thanks.


> 3) Perhaps, instead of sending simple events, one could update
>a meter of the rate of recent such events, such as the per-cpuset
>'memory_pressure' mechanism does.  This might lead to addressing
>Andrew Morton's comment:
> 
>   If this feature is useful then I'd expect that some
>   applications would want notification at different times, or at
>   different levels of VM distress.  So this semi-randomly-chosen
>   notification point just won't be strong enough in real-world
>   use.

Hmmm, I don't think so.
I think the timing of memmory_pressure_notify(1) is already the best.

A page moving from the active list to the inactive list indicates that swap I/O
will happen a bit later.

But memmory_pressure_notify(0) is a bit messy.
I'll try to simplify it.


> 4) A place that I found well suited for my purposes (watching for
>swapping from direct reclaim) was just before the lines in the
>pageout() routine in mm/vmscan.c:
> 
>   if (clear_page_dirty_for_io(page)) {
>   ...
>   res = mapping->a_ops->writepage(page, &wbc);
> 
>It seemed that testing "PageAnon(page)" here allowed me to easily
>distinguish between dirty pages going back to the file system, and
>pages going to swap (this detail is

linux-next: Tree for Feb 19

2008-02-18 Thread Stephen Rothwell
Hi all,

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
allmodconfig for both powerpc and x86_64.

There were a couple of merge conflicts and a couple of build failures and
the appropriate people contacted.

We are up to 26 trees, more are welcome (even if they are currently empty).

-- 
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/




Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Eric Dumazet

Zhang, Yanmin wrote:
> On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote:
> > On Mon, 18 Feb 2008 16:12:38 +0800
> > "Zhang, Yanmin" <[EMAIL PROTECTED]> wrote:
> > > On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
> > > > From: Eric Dumazet <[EMAIL PROTECTED]>
> > > > Date: Fri, 15 Feb 2008 15:21:48 +0100
> > > >
> > > > > On linux-2.6.25-rc1 x86_64 :
> > > > >
> > > > > offsetof(struct dst_entry, lastuse)=0xb0
> > > > > offsetof(struct dst_entry, __refcnt)=0xb8
> > > > > offsetof(struct dst_entry, __use)=0xbc
> > > > > offsetof(struct dst_entry, next)=0xc0
> > > > >
> > > > > So it should be optimal... I dont know why tbench prefers __refcnt being
> > > > > on 0xc0, since in this case lastuse will be on a different cache line...
> > > > >
> > > > > Each incoming IP packet will need to change lastuse, __refcnt and __use,
> > > > > so keeping them in the same cache line is a win.
> > > > >
> > > > > I suspect then that even this patch could help tbench, since it avoids
> > > > > writing lastuse...
> > > >
> > > > I think your suspicions are right, and even moreso
> > > > it helps to keep __refcnt out of the same cache line
> > > > as input/output/ops which are read-almost-entirely :-)
> > >
> > > I think you are right. The issue is these three variables sharing the same
> > > cache line with input/output/ops.
> > >
> > > > I haven't done an exhaustive analysis, but it seems that
> > > > the write traffic to lastuse and __refcnt are about the
> > > > same.  However if we find that __refcnt gets hit more
> > > > than lastuse in this workload, it explains the regression.
> > >
> > > I also think __refcnt is the key. I did a new testing by adding 2 unsigned long
> > > padding before lastuse, so the 3 members are moved to next cache line. The
> > > performance is recovered.
> > >
> > > How about below patch? Almost all performance is recovered with the new patch.
> > >
> > > Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> > >
> > > ---
> > >
> > > --- linux-2.6.25-rc1/include/net/dst.h	2008-02-21 14:33:43.0 +0800
> > > +++ linux-2.6.25-rc1_work/include/net/dst.h	2008-02-21 14:36:22.0 +0800
> > > @@ -52,11 +52,10 @@ struct dst_entry
> > >  	unsigned short		header_len;	/* more space at head required */
> > >  	unsigned short		trailer_len;	/* space to reserve at tail */
> > >  
> > > -	u32			metrics[RTAX_MAX];
> > > -	struct dst_entry	*path;
> > > -
> > > -	unsigned long		rate_last;	/* rate limiting for ICMP */
> > >  	unsigned int		rate_tokens;
> > > +	unsigned long		rate_last;	/* rate limiting for ICMP */
> > > +
> > > +	struct dst_entry	*path;
> > >  
> > >  #ifdef CONFIG_NET_CLS_ROUTE
> > >  	__u32			tclassid;
> > > @@ -70,10 +69,12 @@ struct dst_entry
> > >  	int			(*output)(struct sk_buff*);
> > >  
> > >  	struct  dst_ops		*ops;
> > > -
> > > -	unsigned long		lastuse;
> > > +
> > > +	u32			metrics[RTAX_MAX];
> > > +
> > >  	atomic_t		__refcnt;	/* client references	*/
> > >  	int			__use;
> > > +	unsigned long		lastuse;
> > >  	union {
> > >  		struct dst_entry *next;
> > >  		struct rtable    *rt_next;
> >
> > Well, after this patch, we grow dst_entry by 8 bytes :
>
> With my .config, it doesn't grow. Perhaps because of CONFIG_NET_CLS_ROUTE, I don't
> enable it. I will move tclassid under ops.
>
> > sizeof(struct dst_entry)=0xd0
> > offsetof(struct dst_entry, input)=0x68
> > offsetof(struct dst_entry, output)=0x70
> > offsetof(struct dst_entry, __refcnt)=0xb4
> > offsetof(struct dst_entry, lastuse)=0xc0
> > offsetof(struct dst_entry, __use)=0xb8
> > sizeof(struct rtable)=0x140
> >
> > So we dirty two cache lines instead of one, unless your cpu have 128 bytes
> > cache lines ?
> >
> > I am quite surprised that my patch to not change lastuse if already set to
> > jiffies changes nothing...
> >
> > If you have some time, could you also test this (unrelated) patch ?
> >
> > We can avoid dirty all the time a cache line of loopback device.
> >
> > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> > index f2a6e71..0a4186a 100644
> > --- a/drivers/net/loopback.c
> > +++ b/drivers/net/loopback.c
> > @@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
> >  		return 0;
> >  	}
> >  #endif
> > -	dev->last_rx = jiffies;
> > +#ifdef CONFIG_SMP
> > +	if (dev->last_rx != jiffies)
> > +#endif
> > +		dev->last_rx = jiffies;
> >  
> >  	/* it's OK to use per_cpu_ptr() because BHs are off */
> >  	pcpu_lstats = netdev_priv(dev);
>
> Although I didn't test it, I don't think it's ok. The key is __refcnt shares the same
> cache line with ops/input/output.



Note it was unrelated to struct dst, but dirtying of one cache line of 
'loopback netdevice'


I tested it, and tbench result was better with this patch : 890 MB/s instead 
of 870 MB/s on a bi dual core machine.



I was curious of the potential gain on your 16 cores (4x4) machine.
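
The cache line arithmetic behind the offsets quoted in this message can be
sketched as follows. The helper names are hypothetical, and the line size is a
parameter because the thread itself raises the question of 64-byte versus
128-byte lines. The sketch assumes the structure starts on a cache line
boundary.

```c
#include <stddef.h>

/* Index of the cache line holding a byte offset, for a given line size. */
static size_t demo_line_index(size_t offset, size_t line_size)
{
	return offset / line_size;
}

/*
 * Count how many distinct cache lines a set of member offsets touches.
 * Quadratic scan, which is fine for the handful of members of interest.
 */
static size_t demo_lines_touched(const size_t *offsets, size_t n,
				 size_t line_size)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		int seen = 0;

		for (size_t j = 0; j < i; j++) {
			if (demo_line_index(offsets[j], line_size) ==
			    demo_line_index(offsets[i], line_size))
				seen = 1;
		}
		if (!seen)
			count++;
	}
	return count;
}
```

Feeding in the offsets reported for lastuse, __refcnt and __use before the
patch (0xb0, 0xb8, 0xbc) and after it (0xc0, 0xb4, 0xb8) reproduces the
observation that the three write-hot members go from one 64-byte line to two,
while staying within a single line if lines are 128 bytes.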


Re: [mm] [PATCH 2/4] Add the soft limit interface v2

2008-02-18 Thread Li Zefan
Balbir Singh wrote:
> A new configuration file called soft_limit_in_bytes is added. The parsing
> and configuration rules remain the same as for the limit_in_bytes user
> interface.
> 
> A global list of all memory cgroups over their soft limit is maintained.
> This list is then used to reclaim memory on global pressure. A cgroup is
> removed from the list when the cgroup is deleted.
> 
> The global list is protected with a read-write spinlock.
> 

You are not using read-write spinlock..

> 
> Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
> ---
> 
>  mm/memcontrol.c |   33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff -puN mm/memcontrol.c~memory-controller-add-soft-limit-interface mm/memcontrol.c
> --- linux-2.6.25-rc2/mm/memcontrol.c~memory-controller-add-soft-limit-interface	2008-02-19 12:31:49.0 +0530
> +++ linux-2.6.25-rc2-balbir/mm/memcontrol.c	2008-02-19 12:31:49.0 +0530
> @@ -35,6 +35,10 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys;
>  static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
> +static spinlock_t mem_cgroup_sl_list_lock;	/* spin lock that protects */
> +						/* the list of cgroups over*/
> +						/* their soft limit */
> +static struct list_head mem_cgroup_sl_exceeded_list;
>  
>  /*
>   * Statistics for memory cgroup.
> @@ -136,6 +140,10 @@ struct mem_cgroup {
>  	 * statistics.
>  	 */
>  	struct mem_cgroup_stat stat;
> +	/*
> +	 * List of all mem_cgroup's that exceed their soft limit
> +	 */
> +	struct list_head sl_exceeded_list;
>  };
>  
>  /*
> @@ -679,6 +687,18 @@ retry:
>  		goto retry;
>  	}
>  
> +	/*
> +	 * If we exceed our soft limit, we get added to the list of
> +	 * cgroups over their soft limit
> +	 */
> +	if (!res_counter_check_under_limit(&mem->res, RES_SOFT_LIMIT)) {
> +		spin_lock_irqsave(&mem_cgroup_sl_list_lock, flags);
> +		if (list_empty(&mem->sl_exceeded_list))
> +			list_add_tail(&mem->sl_exceeded_list,
> +					&mem_cgroup_sl_exceeded_list);
> +		spin_unlock_irqrestore(&mem_cgroup_sl_list_lock, flags);
> +	}
> +
>  	mz = page_cgroup_zoneinfo(pc);
>  	spin_lock_irqsave(&mz->lru_lock, flags);
>  	/* Update statistics vector */
> @@ -736,13 +756,14 @@ void mem_cgroup_uncharge(struct page_cgr
>  	if (atomic_dec_and_test(&pc->ref_cnt)) {
>  		page = pc->page;
>  		mz = page_cgroup_zoneinfo(pc);
> +		mem = pc->mem_cgroup;
>  		/*
>  		 * get page->cgroup and clear it under lock.
>  		 * force_empty can drop page->cgroup without checking refcnt.
>  		 */
>  		unlock_page_cgroup(page);
> +
>  		if (clear_page_cgroup(page, pc) == pc) {
> -			mem = pc->mem_cgroup;
>  			css_put(&mem->css);
>  			res_counter_uncharge(&mem->res, PAGE_SIZE);
>  			spin_lock_irqsave(&mz->lru_lock, flags);
> @@ -1046,6 +1067,12 @@ static struct cftype mem_cgroup_files[] 
>  		.name = "stat",
>  		.open = mem_control_stat_open,
>  	},
> +	{
> +		.name = "soft_limit_in_bytes",
> +		.private = RES_SOFT_LIMIT,
> +		.write = mem_cgroup_write,
> +		.read = mem_cgroup_read,
> +	},
>  };
>  
>  static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
> @@ -1097,6 +1124,9 @@ mem_cgroup_create(struct cgroup_subsys *
>  	if (unlikely((cont->parent) == NULL)) {
>  		mem = &init_mem_cgroup;
>  		init_mm.mem_cgroup = mem;
> +		INIT_LIST_HEAD(&mem->sl_exceeded_list);
> +		spin_lock_init(&mem_cgroup_sl_list_lock);
> +		INIT_LIST_HEAD(&mem_cgroup_sl_exceeded_list);
>  	} else
>  		mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
>  
> @@ -1104,6 +1134,7 @@ mem_cgroup_create(struct cgroup_subsys *
>  		return NULL;
>  
>  	res_counter_init(&mem->res);
> +	INIT_LIST_HEAD(&mem->sl_exceeded_list);
>  

mem->sl_exceeded_list initialized twice ?

>  	memset(&mem->info, 0, sizeof(mem->info));
>  
> _
> 
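
The "add to the exceeded list only once" logic in the quoted patch can be
illustrated with a small user-space sketch. Everything named demo_ here is
hypothetical: the list helpers merely mimic the shape of the kernel's
INIT_LIST_HEAD(), list_empty() and list_add_tail(), the byte counters stand in
for the res_counter, and the locking from the patch is left out entirely.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

static void demo_init_list_head(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* A self-linked node is either an empty list or an unqueued entry. */
static int demo_list_empty(const struct demo_list_head *h)
{
	return h->next == h;
}

static void demo_list_add_tail(struct demo_list_head *entry,
			       struct demo_list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical stand-in for struct mem_cgroup. */
struct demo_cgroup {
	unsigned long usage;
	unsigned long soft_limit;
	struct demo_list_head sl_exceeded_list;
};

/* Global list of groups over their soft limit (unlocked in this sketch). */
static struct demo_list_head demo_exceeded = { &demo_exceeded, &demo_exceeded };

/* Charge bytes to the group; return 1 if this call queued the group. */
static int demo_charge(struct demo_cgroup *cg, unsigned long bytes)
{
	cg->usage += bytes;
	if (cg->usage <= cg->soft_limit)
		return 0;
	if (!demo_list_empty(&cg->sl_exceeded_list))
		return 0;	/* already on the exceeded list */
	demo_list_add_tail(&cg->sl_exceeded_list, &demo_exceeded);
	return 1;
}
```

The emptiness test only works if the per-group node was initialized before the
first charge. In this sketch, initializing an unqueued node a second time
leaves it in the same self-linked state, so a duplicate initialization is
redundant rather than harmful.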


Re: [PATCH] Fix building lguest as module.

2008-02-18 Thread Ingo Molnar

* Tony Breeds <[EMAIL PROTECTED]> wrote:

> +#endif
> +
> +#if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_MODULE)
> + BLANK();

hm. Rusty's original fix is now upstream. I've done a delta to your 
patch; find the fix below.

Ingo

->
Subject: lguest: fix build breakage
From: Tony Breeds <[EMAIL PROTECTED]>
Date: Tue Feb 19 08:16:03 CET 2008

[ [EMAIL PROTECTED]: merged to Rusty's patch ]

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/asm-offsets_32.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-x86.q/arch/x86/kernel/asm-offsets_32.c
===================================================================
--- linux-x86.q.orig/arch/x86/kernel/asm-offsets_32.c
+++ linux-x86.q/arch/x86/kernel/asm-offsets_32.c
@@ -128,7 +128,7 @@ void foo(void)
 	OFFSET(XEN_vcpu_info_pending, vcpu_info, evtchn_upcall_pending);
 #endif
 
-#ifdef CONFIG_LGUEST_GUEST
+#if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_MODULE)
 	BLANK();
 	OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
 	OFFSET(LGUEST_DATA_pgdir, lguest_data, pgdir);
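
For context on the preprocessor test in this fix: when a tristate Kconfig
option is built as a module, the generated configuration defines the symbol
with a _MODULE suffix rather than the plain one, so a bare #ifdef on the plain
symbol is false for modular builds. The sketch below simulates that with a
hypothetical option CONFIG_DEMO; none of these names exist in the kernel.

```c
/* Simulate "CONFIG_DEMO=m": only the _MODULE variant is defined. */
#define CONFIG_DEMO_MODULE 1

/* True for both built-in (=y) and modular (=m) configurations. */
#if defined(CONFIG_DEMO) || defined(CONFIG_DEMO_MODULE)
#define DEMO_ENABLED 1
#else
#define DEMO_ENABLED 0
#endif

/* True only for the built-in (=y) configuration. */
#ifdef CONFIG_DEMO
#define DEMO_BUILTIN 1
#else
#define DEMO_BUILTIN 0
#endif

static int demo_enabled(void)
{
	return DEMO_ENABLED;
}

static int demo_builtin(void)
{
	return DEMO_BUILTIN;
}
```

With only the _MODULE symbol defined, demo_enabled() reports the feature as
present while demo_builtin() does not, which is the distinction the combined
defined() test is meant to cover.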


Re: Linux 2.6.25-rc2

2008-02-18 Thread Pekka Enberg
On Feb 19, 2008 8:54 AM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> > > [ 5282.056415] [ cut here ]
> > > [ 5282.059757] kernel BUG at lib/list_debug.c:33!
> > > [ 5282.062055] invalid opcode:  [1] SMP
> > > [ 5282.062055] CPU 3
> >
> > hm. Your crashes do seem to span multiple subsystems, but it always
> > seems to be around the SLUB code. Could you try the patch below? The
> > SLUB code has a new optimization and i'm not 100% sure about it. [the
> > hack below switches the SLUB optimization off by disabling the CPU
> > feature it relies on.]
> >
> > Ingo
> >
> > ->
> >  arch/x86/Kconfig |    4 ----
> >  1 file changed, 4 deletions(-)
> >
> > Index: linux/arch/x86/Kconfig
> > ===================================================================
> > --- linux.orig/arch/x86/Kconfig
> > +++ linux/arch/x86/Kconfig
> > @@ -59,10 +59,6 @@ config HAVE_LATENCYTOP_SUPPORT
> >  config SEMAPHORE_SLEEPERS
> >  	def_bool y
> >
> > -config FAST_CMPXCHG_LOCAL
> > -	bool
> > -	default y
> > -
> >  config MMU
> >  	def_bool y
> >
>
> $ grep FAST_CMPXCHG_LOCAL */.config
> linux-2.6.24-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> linux-2.6.24-rc3-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> linux-2.6.24-rc3-mm2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> linux-2.6.24-rc6-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> linux-2.6.24-rc8-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> linux-2.6.25-rc1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> linux-2.6.25-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
> linux-2.6.25-rc2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
>
> -rc2-mm1 still worked for me.
>
> Did you mean the new SLUB_FASTPATH?
> $ grep "define SLUB_FASTPATH" */mm/slub.c
> linux-2.6.25-rc1/mm/slub.c:#define SLUB_FASTPATH
> linux-2.6.25-rc2-mm1/mm/slub.c:#define SLUB_FASTPATH
> linux-2.6.25-rc2/mm/slub.c:#define SLUB_FASTPATH
>
> The 2.6.24-rc3+ mm-kernels did crash for me, but don't seem to contain this...
>
> On the other hand:
> From the crash in 2.6.25-rc2-mm1:
> [59987.116182] RIP  [] kmem_cache_alloc_node+0x6d/0xa0
>
> (gdb) list *0x8029f83d
> 0x8029f83d is in kmem_cache_alloc_node (mm/slub.c:1646).
> 1641                    if (unlikely(is_end(object) || !node_match(c, node))) {
> 1642                            object = __slab_alloc(s, gfpflags, node, addr, c);
> 1643                            break;
> 1644                    }
> 1645                    stat(c, ALLOC_FASTPATH);
> 1646            } while (cmpxchg_local(&c->freelist, object, object[c->offset])
> 1647                                                            != object);
> 1648    #else
> 1649            unsigned long flags;
> 1650
>
> That code is part for SLUB_FASTPATH.
>
> I'm willing to test the patch, but don't know how fast I can find the
> time to do it, so my answer if your patch helps might be delayed until
> the weekend.

Mathieu, Christoph is on vacation and I'm not at all that familiar
with this cmpxchg_local() optimization, so if you could take a peek at
this bug report to see if you can spot something obviously wrong with
it, I would much appreciate that.
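
For readers unfamiliar with the pattern under discussion, here is a rough
user-space sketch of a compare-and-exchange loop that pops the first object
from a free list. It is not the SLUB code: it uses C11 atomics as a stand-in
for cmpxchg_local(), uses NULL rather than an end marker for an empty list,
and every demo_ name is hypothetical. Like any such loop it ignores the ABA
problem, so it is only an illustration of the control flow.

```c
#include <stdatomic.h>
#include <stddef.h>

/* A free object stores the pointer to the next free object. */
struct demo_object {
	struct demo_object *next;
};

struct demo_cache {
	_Atomic(struct demo_object *) freelist;
};

/*
 * Pop the head of the free list. If the head changed between the read
 * and the exchange, the exchange fails, object is reloaded with the
 * current head, and the loop retries. Returns NULL when the list is
 * empty, where a real allocator would fall back to its slow path.
 */
static struct demo_object *demo_alloc(struct demo_cache *c)
{
	struct demo_object *object = atomic_load(&c->freelist);

	do {
		if (object == NULL)
			return NULL;
	} while (!atomic_compare_exchange_weak(&c->freelist, &object,
					       object->next));
	return object;
}
```

The emptiness check must happen before object->next is read, on every
iteration; dereferencing a stale or empty head is the kind of failure the
report above points at.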


Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

2008-02-18 Thread Jeremy Higdon
On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote:
> > First, I still don't understand why in God's sake barriers are "working"
> > while regular cache flushes are not.  Almost no consumer-grade hard drive
> > supports write barriers, but they all support regular cache flushes, and
> > the latter should be enough (while not the most speed-optimal) to ensure
> > data safety.  Why to require write cache disable (like in XFS FAQ) instead
> > of going the flush-cache-when-appropriate (as opposed to write-barrier-
> > when-appropriate) way?
> 
> Devil's advocate:
> 
> Why should we need to support multiple different block layer APIs
> to do the same thing? Surely any hardware that doesn't support barrier
> operations can emulate them with cache flushes when they receive a
> barrier I/O from the filesystem
> 
> Also, given that disabling the write cache still allows CTQ/NCQ to
> operate effectively and that in most cases WCD+CTQ is as fast as
> WCE+barriers, the simplest thing to do is turn off volatile write
> caches and not require any extra software kludges for safe
> operation.


I'll put it even more strongly.  My experience is that disabling write
cache plus disabling barriers is often much faster than enabling both
barriers and write cache enabled, when doing metadata intensive
operations, as long as you have a drive that is good at CTQ/NCQ.

The only time write cache + barriers is significantly faster is when
doing single threaded data writes, such as direct I/O, or if CTQ/NCQ
is not enabled, or the drive does a poor job at it.

jeremy


Re: [RFC][PATCH] the proposal of improve page reclaim by throttle

2008-02-18 Thread KOSAKI Motohiro
Hi Nick,

> Yeah this is definitely needed and a nice result.
> 
> I'm worried about a) placing a global limit on parallelism, and b)
> placing a limit on parallelism at all.

Sorry, I don't understand yet.
Are a) and b) related?

> 
> I think it should maybe be a per-zone thing...
> 
> What happens if you make it a per-zone mutex, and allow just a single
> process to reclaim pages from a given zone at a time? I guess that is
> going to slow down throughput a little bit in some cases though...

That makes sense.
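
A rough userspace sketch of the per-zone idea quoted above, assuming one
try-lock per zone so that at most one task reclaims from a given zone at a
time. The names (zone_throttle, zone_try_begin_reclaim, zone_end_reclaim,
NR_ZONES) are made up for illustration and do not come from the kernel or
from the patch under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_ZONES 4  /* hypothetical zone count, for the sketch only */

/* One flag per zone: set while some task is reclaiming from that zone. */
struct zone_throttle {
    atomic_flag reclaiming[NR_ZONES];
};

static void zone_throttle_init(struct zone_throttle *t)
{
    for (int i = 0; i < NR_ZONES; i++)
        atomic_flag_clear(&t->reclaiming[i]);
}

/* Returns true if the caller may reclaim from this zone right now. */
static bool zone_try_begin_reclaim(struct zone_throttle *t, int zone)
{
    assert(zone >= 0 && zone < NR_ZONES);
    return !atomic_flag_test_and_set_explicit(&t->reclaiming[zone],
                                              memory_order_acquire);
}

/* Must be called by the task that successfully began reclaim. */
static void zone_end_reclaim(struct zone_throttle *t, int zone)
{
    assert(zone >= 0 && zone < NR_ZONES);
    atomic_flag_clear_explicit(&t->reclaiming[zone], memory_order_release);
}
```

A task that fails zone_try_begin_reclaim() could either wait or move on to
another zone; which of the two is better for throughput is exactly the open
question raised above.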

OK.
I'll repost after 2-3 days.

Thanks.

- kosaki




[mm] [PATCH 4/4] Add soft limit documentation v2

2008-02-18 Thread Balbir Singh

Add documentation for the soft limit feature.

Changelog v2 (Thanks to the review by Randy Dunlap)
1. Change several misuses of it's to its
2. Fix spelling errors and punctuation

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 Documentation/controllers/memory.txt |   18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff -puN Documentation/controllers/memory.txt~memory-controller-add-soft-limit-documentation Documentation/controllers/memory.txt
--- linux-2.6.25-rc2/Documentation/controllers/memory.txt~memory-controller-add-soft-limit-documentation  2008-02-19 12:31:53.0 +0530
+++ linux-2.6.25-rc2-balbir/Documentation/controllers/memory.txt  2008-02-19 12:31:53.0 +0530
@@ -201,6 +201,22 @@ The memory.force_empty gives an interfac
 
 will drop all charges in cgroup. Currently, this is maintained for test.
 
+The file memory.soft_limit_in_bytes allows users to set soft limits. A soft
+limit is set in a manner similar to limit. The limit feature described
+earlier is a hard limit. A group can never exceed its hard limit. A soft
+limit on the other hand can be exceeded. A group will be shrunk back
+to its soft limit, when there is memory pressure/contention.
+
+Ideally the soft limit should always be set to a value smaller than the
+hard limit. However, the code does not force the user to do so. The soft
+limit can be greater than the hard limit; then the soft limit has
+no meaning in that setup, since the group will always be restrained to its
+hard limit.
+
+Example setting of soft limit
+
+# echo -n 100M > memory.soft_limit_in_bytes
+
 4. Testing
 
 Balbir posted lmbench, AIM9, LTP and vmmstress results [10] and [11].
@@ -221,7 +237,7 @@ some of the pages cached in the cgroup (
 
 4.2 Task migration
 
-When a task migrates from one cgroup to another, it's charge is not
+When a task migrates from one cgroup to another, its charge is not
 carried forward. The pages allocated from the original cgroup still
 remain charged to it, the charge is dropped when the page is freed or
 reclaimed.
_

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


[mm] [PATCH 3/4] Reclaim from groups over their soft limit under memory pressure v2

2008-02-18 Thread Balbir Singh


Changelog v2

1. Take reference on mem->css in pushback (YAMAMOTO Takashi)
2. Move away from trying to reclaim nr_pages over soft limit to swap
   cluster at a time (KAMEZAWA Hiroyuki)

The global list of all cgroups over their soft limit is scanned under
memory pressure. We call mem_cgroup_pushback_groups_over_soft_limit
from __alloc_pages() prior to calling try_to_free_pages(), in an attempt
to rescue memory from groups that are using memory above their soft limit.
If this attempt is unsuccessful, we call try_to_free_pages() and take
the normal global reclaim path.
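
The ordering described above can be sketched in plain C as follows. This is
only an illustration under simplifying assumptions: struct group,
pushback_group(), global_reclaim() and the `batch` parameter (standing in
for a swap-cluster-sized batch such as SWAP_CLUSTER_MAX) are hypothetical
stand-ins, not the functions added by this patch, and no locking or
reference counting is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; all values are in pages. */
struct group {
    unsigned long usage;
    unsigned long soft_limit;
};

/* Push one group back toward its soft limit, at most `batch` pages. */
static unsigned long pushback_group(struct group *g, unsigned long batch)
{
    unsigned long excess, freed;

    if (g->usage <= g->soft_limit)
        return 0;
    excess = g->usage - g->soft_limit;
    freed = excess < batch ? excess : batch;
    g->usage -= freed;
    return freed;
}

/* First pass: reclaim only from groups that are above their soft limit. */
static unsigned long pushback_over_soft_limit(struct group *groups, size_t n,
                                              unsigned long batch)
{
    unsigned long freed = 0;

    for (size_t i = 0; i < n; i++)
        freed += pushback_group(&groups[i], batch);
    return freed;
}

/* Stand-in for the normal global reclaim path; pretends it frees a batch. */
static unsigned long global_reclaim(unsigned long batch)
{
    return batch;
}

enum reclaim_path { PATH_SOFT_LIMIT, PATH_GLOBAL };

/* Try soft-limit pushback first; fall back to global reclaim if it frees nothing. */
static enum reclaim_path reclaim(struct group *groups, size_t n,
                                 unsigned long batch, unsigned long *freed)
{
    assert(freed != NULL);
    *freed = pushback_over_soft_limit(groups, n, batch);
    if (*freed)
        return PATH_SOFT_LIMIT;
    *freed = global_reclaim(batch);
    return PATH_GLOBAL;
}
```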


Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/memcontrol.h  |9 +
 include/linux/res_counter.h |   11 ++
 include/linux/swap.h|4 +-
 mm/memcontrol.c |   76 
 mm/page_alloc.c |   10 +
 mm/vmscan.c |   12 --
 6 files changed, 110 insertions(+), 12 deletions(-)

diff -puN include/linux/memcontrol.h~memory-controller-reclaim-on-contention include/linux/memcontrol.h
--- linux-2.6.25-rc2/include/linux/memcontrol.h~memory-controller-reclaim-on-contention  2008-02-19 12:31:51.0 +0530
+++ linux-2.6.25-rc2-balbir/include/linux/memcontrol.h  2008-02-19 12:31:51.0 +0530
@@ -71,6 +71,8 @@ extern long mem_cgroup_calc_reclaim_acti
struct zone *zone, int priority);
 extern long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
struct zone *zone, int priority);
+extern unsigned long
+mem_cgroup_pushback_groups_over_soft_limit(struct zone **zones, gfp_t gfp_mask);
 
 #else /* CONFIG_CGROUP_MEM_CONT */
 static inline void mm_init_cgroup(struct mm_struct *mm,
@@ -179,6 +181,13 @@ static inline long mem_cgroup_calc_recla
 {
    return 0;
 }
+
+static inline unsigned long
+mem_cgroup_pushback_groups_over_soft_limit(struct zone **zones, gfp_t gfp_mask)
+{
+   return 0;
+}
+
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff -puN include/linux/res_counter.h~memory-controller-reclaim-on-contention include/linux/res_counter.h
--- linux-2.6.25-rc2/include/linux/res_counter.h~memory-controller-reclaim-on-contention  2008-02-19 12:31:51.0 +0530
+++ linux-2.6.25-rc2-balbir/include/linux/res_counter.h  2008-02-19 12:31:51.0 +0530
@@ -140,4 +140,15 @@ static inline bool res_counter_check_und
    return ret;
 }
 
+static inline long long res_counter_sl_excess(struct res_counter *cnt)
+{
+   unsigned long flags;
+   long long ret;
+
+   spin_lock_irqsave(&cnt->lock, flags);
+   ret = cnt->usage - cnt->soft_limit;
+   spin_unlock_irqrestore(&cnt->lock, flags);
+   return ret;
+}
+
 #endif
diff -puN include/linux/swap.h~memory-controller-reclaim-on-contention include/linux/swap.h
--- linux-2.6.25-rc2/include/linux/swap.h~memory-controller-reclaim-on-contention  2008-02-19 12:31:51.0 +0530
+++ linux-2.6.25-rc2-balbir/include/linux/swap.h  2008-02-19 12:31:51.0 +0530
@@ -184,7 +184,9 @@ extern void swap_setup(void);
 extern unsigned long try_to_free_pages(struct zone **zones, int order,
gfp_t gfp_mask);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
-   gfp_t gfp_mask);
+   gfp_t gfp_mask,
+   unsigned long nr_pages,
+   struct zone **zones);
 extern int __isolate_lru_page(struct page *page, int mode);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
diff -puN mm/memcontrol.c~memory-controller-reclaim-on-contention mm/memcontrol.c
--- linux-2.6.25-rc2/mm/memcontrol.c~memory-controller-reclaim-on-contention  2008-02-19 12:31:51.0 +0530
+++ linux-2.6.25-rc2-balbir/mm/memcontrol.c  2008-02-19 12:31:51.0 +0530
@@ -35,7 +35,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys;
 static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
-static spinlock_t mem_cgroup_sl_list_lock; /* spin lock that protects */
+static rwlock_t mem_cgroup_sl_list_lock;   /* spin lock that protects */
/* the list of cgroups over*/
/* their soft limit */
 static struct list_head mem_cgroup_sl_exceeded_list;
@@ -646,7 +646,8 @@ retry:
if (!(gfp_mask & __GFP_WAIT))
goto out;
 
-   if (try_to_free_mem_cgroup_pages(mem, gfp_mask))
+   if (try_to_free_mem_cgroup_pages(mem, gfp_mask,
+   SWAP_CLUSTER_MAX, NULL))
continue;
 
/*
@@ -692,11 +693,11 @@ retry:
 * cgroups over their soft lim

[mm] [PATCH 2/4] Add the soft limit interface v2

2008-02-18 Thread Balbir Singh

A new configuration file called soft_limit_in_bytes is added. The parsing
and configuration rules remain the same as for the limit_in_bytes user
interface.

A global list of all memory cgroups over their soft limit is maintained.
This list is then used to reclaim memory on global pressure. A cgroup is
removed from the list when the cgroup is deleted.

The global list is protected with a spinlock, which patch 3/4 converts to a
read-write lock.


Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 mm/memcontrol.c |   33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff -puN mm/memcontrol.c~memory-controller-add-soft-limit-interface mm/memcontrol.c
--- linux-2.6.25-rc2/mm/memcontrol.c~memory-controller-add-soft-limit-interface  2008-02-19 12:31:49.0 +0530
+++ linux-2.6.25-rc2-balbir/mm/memcontrol.c  2008-02-19 12:31:49.0 +0530
@@ -35,6 +35,10 @@
 
 struct cgroup_subsys mem_cgroup_subsys;
 static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
+static spinlock_t mem_cgroup_sl_list_lock; /* spin lock that protects */
+   /* the list of cgroups over*/
+   /* their soft limit */
+static struct list_head mem_cgroup_sl_exceeded_list;
 
 /*
  * Statistics for memory cgroup.
@@ -136,6 +140,10 @@ struct mem_cgroup {
 * statistics.
 */
struct mem_cgroup_stat stat;
+   /*
+* List of all mem_cgroup's that exceed their soft limit
+*/
+   struct list_head sl_exceeded_list;
 };
 
 /*
@@ -679,6 +687,18 @@ retry:
goto retry;
}
 
+   /*
+* If we exceed our soft limit, we get added to the list of
+* cgroups over their soft limit
+*/
+   if (!res_counter_check_under_limit(&mem->res, RES_SOFT_LIMIT)) {
+   spin_lock_irqsave(&mem_cgroup_sl_list_lock, flags);
+   if (list_empty(&mem->sl_exceeded_list))
+   list_add_tail(&mem->sl_exceeded_list,
+   &mem_cgroup_sl_exceeded_list);
+   spin_unlock_irqrestore(&mem_cgroup_sl_list_lock, flags);
+   }
+
mz = page_cgroup_zoneinfo(pc);
spin_lock_irqsave(&mz->lru_lock, flags);
/* Update statistics vector */
@@ -736,13 +756,14 @@ void mem_cgroup_uncharge(struct page_cgr
if (atomic_dec_and_test(&pc->ref_cnt)) {
page = pc->page;
mz = page_cgroup_zoneinfo(pc);
+   mem = pc->mem_cgroup;
/*
 * get page->cgroup and clear it under lock.
 * force_empty can drop page->cgroup without checking refcnt.
 */
unlock_page_cgroup(page);
+
if (clear_page_cgroup(page, pc) == pc) {
-   mem = pc->mem_cgroup;
css_put(&mem->css);
res_counter_uncharge(&mem->res, PAGE_SIZE);
spin_lock_irqsave(&mz->lru_lock, flags);
@@ -1046,6 +1067,12 @@ static struct cftype mem_cgroup_files[] 
.name = "stat",
.open = mem_control_stat_open,
},
+   {
+   .name = "soft_limit_in_bytes",
+   .private = RES_SOFT_LIMIT,
+   .write = mem_cgroup_write,
+   .read = mem_cgroup_read,
+   },
 };
 
 static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
@@ -1097,6 +1124,9 @@ mem_cgroup_create(struct cgroup_subsys *
if (unlikely((cont->parent) == NULL)) {
mem = &init_mem_cgroup;
init_mm.mem_cgroup = mem;
+   INIT_LIST_HEAD(&mem->sl_exceeded_list);
+   spin_lock_init(&mem_cgroup_sl_list_lock);
+   INIT_LIST_HEAD(&mem_cgroup_sl_exceeded_list);
} else
mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
 
@@ -1104,6 +1134,7 @@ mem_cgroup_create(struct cgroup_subsys *
return NULL;
 
res_counter_init(&mem->res);
+   INIT_LIST_HEAD(&mem->sl_exceeded_list);
 
memset(&mem->info, 0, sizeof(mem->info));
 
_

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1

2008-02-18 Thread Zhang, Yanmin
On Fri, 2008-02-15 at 08:42 -0800, Christoph Lameter wrote:
> On Fri, 15 Feb 2008, Zhang, Yanmin wrote:
> 
> > On my 16-core Tigerton, the kernel panicked when I ran hackbench process
> > testing. See the log below.
> > 
> 
> > Kernel panic at line 1637 in file mm/slub.c because object=c->freelist=NULL.
> 
> Hmm. freelist should never be NULL. Could you rerun the test and boot with
> slub_debug to make sure that there is no memory corruption?
1) Without the slub_debug option, I can trigger it sometimes, but not always.
2) With the slub_debug option, or with debugging enabled only for the slabs
skbuff_fclone_cache and skbuff_head_cache, I couldn't trigger it.

I will do more testing and investigation, as the bug also exists in 2.6.25-rc2.

-yanmin




[mm] [PATCH 1/4] Modify resource counters to add soft limit support v2

2008-02-18 Thread Balbir Singh


Changelog v2
1. Remove memory controller specific comments from resource counters

The resource counter member limit is split into soft and hard limits.
The same locking rules apply to both limits.


Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/res_counter.h |   32 
 kernel/res_counter.c|   11 +++
 mm/memcontrol.c |   10 +-
 3 files changed, 36 insertions(+), 17 deletions(-)

diff -puN include/linux/res_counter.h~memory-controller-res_counters-soft-limit-setup include/linux/res_counter.h
--- linux-2.6.25-rc2/include/linux/res_counter.h~memory-controller-res_counters-soft-limit-setup  2008-02-19 12:31:47.0 +0530
+++ linux-2.6.25-rc2-balbir/include/linux/res_counter.h  2008-02-19 12:31:47.0 +0530
@@ -27,7 +27,11 @@ struct res_counter {
/*
 * the limit that usage cannot exceed
 */
-   unsigned long long limit;
+   unsigned long long hard_limit;
+   /*
+* the limit that usage can exceed
+*/
+   unsigned long long soft_limit;
/*
 * the number of unsuccessful attempts to consume the resource
 */
@@ -64,7 +68,8 @@ ssize_t res_counter_write(struct res_cou
 
 enum {
RES_USAGE,
-   RES_LIMIT,
+   RES_SOFT_LIMIT,
+   RES_HARD_LIMIT,
RES_FAILCNT,
 };
 
@@ -101,11 +106,21 @@ int res_counter_charge(struct res_counte
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val);
 void res_counter_uncharge(struct res_counter *counter, unsigned long val);
 
-static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
+static inline bool res_counter_limit_check_locked(struct res_counter *cnt,
+                       int member)
 {
-   if (cnt->usage < cnt->limit)
-       return true;
-
+   switch (member) {
+   case RES_HARD_LIMIT:
+       if (cnt->usage < cnt->hard_limit)
+           return true;
+       break;
+   case RES_SOFT_LIMIT:
+       if (cnt->usage < cnt->soft_limit)
+           return true;
+       break;
+   default:
+       BUG_ON(1);
+   }
    return false;
 }
 
@@ -113,13 +128,14 @@ static inline bool res_counter_limit_che
  * Helper function to detect if the cgroup is within it's limit or
  * not. It's currently called from cgroup_rss_prepare()
  */
-static inline bool res_counter_check_under_limit(struct res_counter *cnt)
+static inline bool res_counter_check_under_limit(struct res_counter *cnt,
+   int member)
 {
bool ret;
unsigned long flags;
 
spin_lock_irqsave(&cnt->lock, flags);
-   ret = res_counter_limit_check_locked(cnt);
+   ret = res_counter_limit_check_locked(cnt, member);
spin_unlock_irqrestore(&cnt->lock, flags);
return ret;
 }
diff -puN kernel/res_counter.c~memory-controller-res_counters-soft-limit-setup kernel/res_counter.c
--- linux-2.6.25-rc2/kernel/res_counter.c~memory-controller-res_counters-soft-limit-setup  2008-02-19 12:31:47.0 +0530
+++ linux-2.6.25-rc2-balbir/kernel/res_counter.c  2008-02-19 12:31:47.0 +0530
@@ -16,12 +16,13 @@
 void res_counter_init(struct res_counter *counter)
 {
spin_lock_init(&counter->lock);
-   counter->limit = (unsigned long long)LLONG_MAX;
+   counter->soft_limit = (unsigned long long)LLONG_MAX;
+   counter->hard_limit = (unsigned long long)LLONG_MAX;
 }
 
 int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 {
-   if (counter->usage + val > counter->limit) {
+   if (counter->usage + val > counter->hard_limit) {
counter->failcnt++;
return -ENOMEM;
}
@@ -65,8 +66,10 @@ res_counter_member(struct res_counter *c
switch (member) {
case RES_USAGE:
return &counter->usage;
-   case RES_LIMIT:
-   return &counter->limit;
+   case RES_SOFT_LIMIT:
+   return &counter->soft_limit;
+   case RES_HARD_LIMIT:
+   return &counter->hard_limit;
case RES_FAILCNT:
return &counter->failcnt;
};
diff -puN mm/memcontrol.c~memory-controller-res_counters-soft-limit-setup mm/memcontrol.c
--- linux-2.6.25-rc2/mm/memcontrol.c~memory-controller-res_counters-soft-limit-setup  2008-02-19 12:31:47.0 +0530
+++ linux-2.6.25-rc2-balbir/mm/memcontrol.c  2008-02-19 12:31:47.0 +0530
@@ -568,7 +568,7 @@ unsigned long mem_cgroup_isolate_pages(u
  * Charge the memory controller for page usage.
  * Return
  * 0 if the charge was successful
- * < 0 if the cgroup is over its limit
+ * < 0 if the cgroup is over its hard limit
  */
 static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask, enum ch

[mm][PATCH 0/4] Add soft limits to the memory controller v2

2008-02-18 Thread Balbir Singh
This patchset implements the basic changes required to implement soft limits
in the memory controller. A soft limit is a variation of the currently
supported hard limit feature. A memory cgroup can exceed its soft limit
provided there is no contention for memory.
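
To illustrate the difference between the two limits, here is a minimal
sketch. It is a simplified stand-in, not the res_counter code from this
series: struct counter, counter_charge() and counter_over_soft_limit() are
names invented for the example, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a resource counter (sketch only, no locking). */
struct counter {
    unsigned long long usage;
    unsigned long long soft_limit;
    unsigned long long hard_limit;
    unsigned long long failcnt;
};

/* A charge fails only when it would take usage above the hard limit. */
static int counter_charge(struct counter *c, unsigned long long val)
{
    assert(c != NULL);
    if (c->usage + val > c->hard_limit) {
        c->failcnt++;
        return -1;
    }
    c->usage += val;
    return 0;
}

/*
 * The soft limit never blocks a charge; exceeding it only marks the
 * group as a candidate for reclaim when memory gets tight.
 */
static bool counter_over_soft_limit(const struct counter *c)
{
    return c->usage > c->soft_limit;
}
```

With a soft limit of 100 and a hard limit of 200, a charge of 150 succeeds
and leaves the group over its soft limit, while a further charge of 100 is
refused and only increments the failure count.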

These patches were tested on a PowerPC box, by running programs in parallel,
and checking their behaviour for various soft limit values.

These patches were developed on top of 2.6.25-rc2-mm1. Comments, suggestions,
criticism are all welcome!

TODOs:

1. Currently there is no ordering of memory cgroups over their limit.
   We use a simple linked list to maintain a list of groups over their
   limit. In the future, we might want to create a heap of objects ordered
   by the amount by which they exceed soft limit.
2. Distribute the excessive (non-contended) resources between groups
   in the ratio of their soft limits
3. Merge with KAMEZAWA's and YAMAMOTO's water mark and background reclaim
   patches in the long-term
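
As a sketch of what TODO item 1 might look like, here is a small array-based
max-heap keyed by the amount a group exceeds its soft limit. Everything in
it (struct entry, struct excess_heap, HEAP_MAX, the integer group_id) is
hypothetical and only meant to show the ordering idea; it is not part of
this patchset.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_MAX 64  /* arbitrary capacity for the sketch */

struct entry {
    int group_id;               /* hypothetical group identifier */
    unsigned long long excess;  /* usage minus soft limit */
};

struct excess_heap {
    struct entry e[HEAP_MAX];
    size_t n;
};

static void swap_entries(struct entry *a, struct entry *b)
{
    struct entry tmp = *a;

    *a = *b;
    *b = tmp;
}

/* Insert an entry and sift it up; returns -1 if the heap is full. */
static int heap_push(struct excess_heap *h, struct entry x)
{
    size_t i;

    if (h->n == HEAP_MAX)
        return -1;
    i = h->n++;
    h->e[i] = x;
    while (i > 0 && h->e[(i - 1) / 2].excess < h->e[i].excess) {
        swap_entries(&h->e[(i - 1) / 2], &h->e[i]);
        i = (i - 1) / 2;
    }
    return 0;
}

/* Remove and return the entry with the largest excess. */
static struct entry heap_pop(struct excess_heap *h)
{
    struct entry top;
    size_t i = 0;

    assert(h->n > 0);
    top = h->e[0];
    h->e[0] = h->e[--h->n];
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, big = i;

        if (l < h->n && h->e[l].excess > h->e[big].excess)
            big = l;
        if (r < h->n && h->e[r].excess > h->e[big].excess)
            big = r;
        if (big == i)
            break;
        swap_entries(&h->e[i], &h->e[big]);
        i = big;
    }
    return top;
}
```

Compared with the simple linked list, this lets reclaim start with the worst
offender at O(log n) cost per update, at the price of having to re-key a
group whenever its usage changes.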


series
--
memory-controller-res_counters-soft-limit-setup.patch
memory-controller-add-soft-limit-interface.patch
memory-controller-reclaim-on-contention.patch
memory-controller-add-soft-limit-documentation.patch

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


LILO fatal error after going from 2.6.22 to 2.6.24.2

2008-02-18 Thread Rasmus Andersen
Hello,

I have just upgraded from 2.6.22 to 2.6.24.2 and after booting into the
new kernel and seeing that everything went right, I wanted to make the
new kernel the default boot kernel. But running LILO I got

   Fatal: Linux experimental device 0x04x needs to be defined.
   Check 'man lilo.conf' under 'disk=' and 'max-partitions='

(Full output from lilo -v 5 below). Apart from setting 2.6.24.2 as
default this is the same lilo config as 2.6.22 was able to write fine.
Can anyone tell me what I (or the kernel) am/is doing wrong?

Thanks,
  Rasmus



LILO version 22.7.3, Copyright (C) 1992-1998 Werner Almesberger
Development beyond version 21 Copyright (C) 1999-2006 John Coffman
Released 11-Aug-2006 and compiled at 17:06:40 on Jan 17 2007

raid_setup: dev=000C  rdev=0340
raid_setup returns offset =   ndisk = 0
 BIOS   VolumeID   Device
Reading boot sector from /dev/hdb
geo_get: device 0340, all=1
pf_hard_disk_scan: (7,0) /dev/loop0
pf_hard_disk_scan: (3,64) /dev/hdb
pf_hard_disk_scan: (3,65) /dev/hdb1
lookup_dev:  number=0340
lookup_dev:  number=0340
pf:  dev=0340  id=B661B661  name=/dev/hdb
geo_query_dev: device=0340
lookup_dev:  number=0340
exit geo_query_dev
bios_dev:  device 0340
lookup_dev:  number=0340
bios_dev:  masked device 0340, which is /dev/hdb
bios_dev: geometry check found 0 matches
bios_dev: (0x82)  vol-ID=17FF01F9  *PT=08070AD0
bios_dev: (0x81)  vol-ID=7C4DC2CA  *PT=08070A88
bios_dev: (0x80)  vol-ID=B661B661  *PT=08070A40
bios_dev: PT match found 1 match (0x80)
pf_hard_disk_scan: (3,66) /dev/hdb2
pf_hard_disk_scan: (8,0) /dev/sda
pf_hard_disk_scan: (8,1) /dev/sda1
lookup_dev:  number=0800
lookup_dev:  number=0800
pf:  dev=0800  id=7C4DC2CA  name=/dev/sda
geo_query_dev: device=0800
lookup_dev:  number=0800
lookup_dev:  number=0300
exit geo_query_dev
bios_dev:  device 0800
lookup_dev:  number=0800
bios_dev:  masked device 0800, which is /dev/sda
bios_dev: geometry check found 0 matches
bios_dev: (0x82)  vol-ID=17FF01F9  *PT=08070AD0
bios_dev: (0x81)  vol-ID=7C4DC2CA  *PT=08070A88
bios_dev: (0x80)  vol-ID=B661B661  *PT=08070A40
bios_dev: PT match found 1 match (0x81)
pf_hard_disk_scan: (8,16) /dev/sdb
pf_hard_disk_scan: (8,17) /dev/sdb1
lookup_dev:  number=0810
lookup_dev:  number=0810
pf:  dev=0810  id=17FF01F9  name=/dev/sdb
geo_query_dev: device=0810
lookup_dev:  number=0810
lookup_dev:  number=0300
exit geo_query_dev
bios_dev:  device 0810
lookup_dev:  number=0810
bios_dev:  masked device 0810, which is /dev/sdb
bios_dev: geometry check found 0 matches
bios_dev: (0x82)  vol-ID=17FF01F9  *PT=08070AD0
bios_dev: (0x81)  vol-ID=7C4DC2CA  *PT=08070A88
bios_dev: (0x80)  vol-ID=B661B661  *PT=08070A40
bios_dev: PT match found 1 match (0x81)
pf_hard_disk_scan: (9,0) /dev/md0
pf_hard_disk_scan: (253,0) /dev/dm-0
Caching device /dev/dm-0 (0xFD00)
pf_hard_disk_scan: (253,1) /dev/dm-1
Caching device /dev/dm-1 (0xFD01)
pf_hard_disk_scan: (253,2) /dev/dm-2
Caching device /dev/dm-2 (0xFD02)
pf_hard_disk_scan: (253,3) /dev/dm-3
Caching device /dev/dm-3 (0xFD03)
pf_hard_disk_scan: (253,4) /dev/dm-4
Caching device /dev/dm-4 (0xFD04)
pf_hard_disk_scan: (253,5) /dev/dm-5
Caching device /dev/dm-5 (0xFD05)
pf_hard_disk_scan: (253,6) /dev/dm-6
Caching device /dev/dm-6 (0xFD06)
  0340  B661B661  /dev/hdb
  0800  7C4DC2CA  /dev/sda
  0810  17FF01F9  /dev/sdb
pf_hard_disk_scan: ndevs=3
  0340  B661B661  /dev/hdb
  0800  7C4DC2CA  /dev/sda
  0810  17FF01F9  /dev/sdb
Resolve invalid VolumeIDs
Resolve duplicate VolumeIDs
  0340  B661B661  /dev/hdb
  0800  7C4DC2CA  /dev/sda
  0810  17FF01F9  /dev/sdb
device codes (user assigned pf) = 0
device codes (user assigned) = 0
device codes (BIOS assigned) = 3
Filling in '/dev/sdb' = 0x82
device codes (canonical) = 7
geo_query_dev: device=0340
lookup_dev:  number=0340
exit geo_query_dev
bios_dev:  device 0340
lookup_dev:  number=0340
bios_dev:  masked device 0340, which is /dev/hdb
bios_dev: geometry check found 0 matches
bios_dev: (0x82)  vol-ID=17FF01F9  *PT=08070AD0
bios_dev: (0x81)  vol-ID=7C4DC2CA  *PT=08070A88
bios_dev: (0x80)  vol-ID=B661B661  *PT=08070A40
bios_dev: PT match found 1 match (0x80)
Device 0x0340: BIOS drive 0x80, 255 heads, 9729 cylinders,
   63 sectors. Partition offset: 0 sectors.
registering bios=0x80  device=0x0340
Using Volume ID B661B661 on bios 80
geo_get: device FD06, all=1
geo_query_dev: device=FD06
lookup_dev:  number=FD06
Fatal: Linux experimental device 0x04x needs to be defined.
Check 'man lilo.conf' under 'disk=' and 'max-partitions='
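
For reading the log above: the four-digit hex device codes look like
old-style 16-bit device numbers, with the major number in the high byte and
the minor number in the low byte. That matches the scan lines, e.g. (3,64)
/dev/hdb is shown as 0340 and (253,6) /dev/dm-6 as FD06. A small sketch of
the decoding, with helper names (dev_major, dev_minor, dev_format) invented
for this example:

```c
#include <assert.h>
#include <stdio.h>

/* Old-style 16-bit device number: major in the high byte, minor in the low byte. */
static unsigned int dev_major(unsigned int dev)
{
    return (dev >> 8) & 0xff;
}

static unsigned int dev_minor(unsigned int dev)
{
    return dev & 0xff;
}

/* Format a device code the way the scan lines print it, e.g. "(3,64)". */
static int dev_format(unsigned int dev, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "(%u,%u)", dev_major(dev), dev_minor(dev));
}
```

Decoded this way, the device that triggers the fatal error (FD06) has major
number 253, the same major the scan lines report for the /dev/dm-* devices.
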


Re: Linux 2.6.25-rc2

2008-02-18 Thread Torsten Kaiser
On Feb 19, 2008 7:11 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> > On Feb 15, 2008 10:23 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > >
> > > Ok,
> > >  this kernel is a winner.
> >
> > Sadly not for me:
> > [ 5282.056415] ------------[ cut here ]------------
> > [ 5282.059757] kernel BUG at lib/list_debug.c:33!
> > [ 5282.062055] invalid opcode:  [1] SMP
> > [ 5282.062055] CPU 3
>
> hm. Your crashes do seem to span multiple subsystems, but it always
> seems to be around the SLUB code. Could you try the patch below? The
> SLUB code has a new optimization and i'm not 100% sure about it. [the
> hack below switches the SLUB optimization off by disabling the CPU
> feature it relies on.]
>
> Ingo
>
> ->
>  arch/x86/Kconfig |4 
>  1 file changed, 4 deletions(-)
>
> Index: linux/arch/x86/Kconfig
> ===
> --- linux.orig/arch/x86/Kconfig
> +++ linux/arch/x86/Kconfig
> @@ -59,10 +59,6 @@ config HAVE_LATENCYTOP_SUPPORT
>  config SEMAPHORE_SLEEPERS
> def_bool y
>
> -config FAST_CMPXCHG_LOCAL
> -   bool
> -   default y
> -
>  config MMU
> def_bool y
>

$ grep FAST_CMPXCHG_LOCAL */.config
linux-2.6.24-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc3-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc3-mm2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc6-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.24-rc8-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc2-mm1/.config:CONFIG_FAST_CMPXCHG_LOCAL=y
linux-2.6.25-rc2/.config:CONFIG_FAST_CMPXCHG_LOCAL=y

-rc2-mm1 still worked for me.

Did you mean the new SLUB_FASTPATH?
$ grep "define SLUB_FASTPATH" */mm/slub.c
linux-2.6.25-rc1/mm/slub.c:#define SLUB_FASTPATH
linux-2.6.25-rc2-mm1/mm/slub.c:#define SLUB_FASTPATH
linux-2.6.25-rc2/mm/slub.c:#define SLUB_FASTPATH

The 2.6.24-rc3+ mm-kernels did crash for me, but don't seem to contain this...

On the other hand:
From the crash in 2.6.25-rc2-mm1:
[59987.116182] RIP  [] kmem_cache_alloc_node+0x6d/0xa0

(gdb) list *0x8029f83d
0x8029f83d is in kmem_cache_alloc_node (mm/slub.c:1646).
1641            if (unlikely(is_end(object) || !node_match(c, node))) {
1642                    object = __slab_alloc(s, gfpflags, node, addr, c);
1643                    break;
1644            }
1645            stat(c, ALLOC_FASTPATH);
1646    } while (cmpxchg_local(&c->freelist, object, object[c->offset])
1647                                                    != object);
1648    #else
1649            unsigned long flags;
1650

That code is part of SLUB_FASTPATH.
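
For readers unfamiliar with the pattern in the listing, here is a userspace
analogue of a compare-and-exchange freelist pop, written with C11 atomics.
It is only a sketch: struct object, struct freelist and the freelist_*()
helpers are invented for the example, it uses a fully atomic
compare-exchange rather than the kernel's CPU-local cmpxchg_local(), and it
ignores the ABA problem that a real multi-threaded free list has to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A free object stores the pointer to the next free object in its first word. */
struct object {
    struct object *next;
};

struct freelist {
    _Atomic(struct object *) head;
};

static void freelist_init(struct freelist *fl)
{
    atomic_init(&fl->head, NULL);
}

static void freelist_push(struct freelist *fl, struct object *obj)
{
    struct object *old = atomic_load(&fl->head);

    assert(obj != NULL);
    do {
        obj->next = old;  /* on failure `old` is refreshed, so retry */
    } while (!atomic_compare_exchange_weak(&fl->head, &old, obj));
}

/* Pop one object, or return NULL when the list is empty (the slow-path case). */
static struct object *freelist_pop(struct freelist *fl)
{
    struct object *obj = atomic_load(&fl->head);

    do {
        if (obj == NULL)  /* must be checked before reading obj->next */
            return NULL;
    } while (!atomic_compare_exchange_weak(&fl->head, &obj, obj->next));
    return obj;
}
```

The point to notice is the empty-list check before obj->next is read: if the
head pointer could be observed as NULL without that check, the next-pointer
load would fault, which is the kind of failure reported at line 1646.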

I'm willing to test the patch, but don't know how fast I can find the
time to do it, so my answer if your patch helps might be delayed until
the weekend.

Torsten


Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Zhang, Yanmin
On Mon, 2008-02-18 at 12:33 -0500, [EMAIL PROTECTED] wrote: 
> On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said:
> 
> > I also think __refcnt is the key. I did a new test by adding 2 unsigned
> > longs of padding before lastuse, so the 3 members are moved to the next
> > cache line. The performance is recovered.
> > 
> > How about the patch below? Almost all of the performance is recovered
> > with the new patch.
> > 
> > Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> 
> Could you add a comment someplace that says "refcnt wants to be on a different
> cache line from input/output/ops or performance tanks badly", to warn some
> future kernel hacker who starts adding new fields to the structure?
Ok. Below is the new patch.

1) Move tclassid below ops when CONFIG_NET_CLS_ROUTE=y, so that
sizeof(dst_entry)=200 regardless of whether CONFIG_NET_CLS_ROUTE is y or n.
I tested many patches on my 16-core Tigerton, moving tclassid to different
places. It looks like the position of tclassid also has an impact on
performance. If tclassid is moved before metrics, or not moved at all, the
performance isn't good, so I moved it behind metrics.

2) Add comments before __refcnt.

If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
the one without the patch.

If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
the one without the patch.
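
The layout rule behind the comment requested above can also be checked at
build time. The following is a sketch on a made-up miniature structure, not
on the real struct dst_entry: struct route_entry, CACHE_LINE (assumed to be
64 bytes) and cache_line_of() are illustrative names, and the sketch forces
the split with an explicit alignment, whereas the patch gets the same effect
by reordering fields.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Hypothetical miniature of a route cache entry, not the real struct dst_entry. */
struct route_entry {
    /* read-mostly fields, read on every packet */
    int (*input)(void *);
    int (*output)(void *);
    const void *ops;

    /* frequently written fields start on their own cache line */
    _Alignas(CACHE_LINE) int refcnt;
    int use;
    unsigned long lastuse;
};

/* Index of the cache line that a given structure offset falls into. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* Fail the build if a later edit moves refcnt next to the read-mostly fields. */
static_assert(offsetof(struct route_entry, refcnt) / CACHE_LINE !=
              offsetof(struct route_entry, ops) / CACHE_LINE,
              "refcnt shares a cache line with ops");
```

A check like this documents the constraint in a way that a future change
adding new fields cannot silently break.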

Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>

---

--- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 +0800
+++ linux-2.6.25-rc1_work/include/net/dst.h  2008-02-22 12:52:19.0 +0800
@@ -52,15 +52,10 @@ struct dst_entry
    unsigned short  header_len; /* more space at head required */
    unsigned short  trailer_len;/* space to reserve at tail */
 
-   u32 metrics[RTAX_MAX];
-   struct dst_entry*path;
-
-   unsigned long   rate_last;  /* rate limiting for ICMP */
unsigned intrate_tokens;
+   unsigned long   rate_last;  /* rate limiting for ICMP */
 
-#ifdef CONFIG_NET_CLS_ROUTE
-   __u32   tclassid;
-#endif
+   struct dst_entry*path;
 
struct neighbour*neighbour;
struct hh_cache *hh;
@@ -70,10 +65,20 @@ struct dst_entry
int (*output)(struct sk_buff*);
 
struct  dst_ops *ops;
-   
-   unsigned long   lastuse;
+
+   u32 metrics[RTAX_MAX];
+
+#ifdef CONFIG_NET_CLS_ROUTE
+   __u32   tclassid;
+#endif
+
+   /*
+* __refcnt wants to be on a different cache line from
+* input/output/ops or performance tanks badly
+*/
atomic_t__refcnt;   /* client references*/
int __use;
+   unsigned long   lastuse;
union {
struct dst_entry *next;
struct rtable*rt_next;




Re: Linux 2.6.25-rc2

2008-02-18 Thread Torsten Kaiser
On Feb 19, 2008 12:54 AM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Sat, 16 Feb 2008, Torsten Kaiser wrote:
> >
> > [ 5282.056415] ------------[ cut here ]------------
> > [ 5282.059757] kernel BUG at lib/list_debug.c:33!
>
> Is there any chance that you could try to bisect this, if it's repeatable
> enough for you? Even if you can't bisect it *all* the way, it would be
> really good to do a handful of bisection runs which should already
> hopefully narrow it down a bit more.
>
> Linus
>

It's repeatable, but not in a really reliable way.
So to mark a kernel good I need to compile around 100 KDE packages,
and even then I'm not 100% sure, if it's good or if I was just lucky.

But I did a partial bisection against 2.6.24-rc6-mm1:
2.6.24-rc6 + mm-patches up to (including) git.nfsd -> worked
2.6.24-rc6 + mm-patches up to (including) git.xfs -> crashed

I think the only patches added between rc2-mm1 and rc3-mm2 in that range
were the iommu changes, which I later ruled out.
That leaves some git trees as suspects:
git-ocfs2.patch
git-selinux.patch
git-s390.patch
git-sched.patch
git-sh.patch
git-scsi-misc.patch
git-unionfs.patch
git-v9fs.patch
git-watchdog.patch
git-wireless.patch
git-ipwireless_cs.patch
git-x86.patch
git-xfs.patch

(see http://marc.info/?l=linux-kernel&m=120276641105256 )

Torsten


Re: [PATCH, RFC] kthread: (possibly) a missing memory barrier in kthread_stop()

2008-02-18 Thread Nick Piggin
On Tuesday 19 February 2008 10:03, Dmitry Adamushko wrote:
> Hi,
>
>
> [ description ]
>
> Subject: kthread: add a memory barrier to kthread_stop()
>
> 'kthread' threads do a check in the following order:
> - set_current_state(TASK_INTERRUPTIBLE);
> - kthread_should_stop();
>
> and set_current_state() implies an smp_mb().
>
> on another side (kthread_stop), wake_up_process() does not seem to
> guarantee a full mb.
>
> And 'kthread_stop_info.k' must be visible before wake_up_process()
> checks for/modifies a state of the 'kthread' task.
>
> (the patch is at the end of the message)
>
>
> [ more detailed description ]
>
> the current code might well be safe in case a to-be-stopped 'kthread'
> task is _not_ running on another CPU at the moment when kthread_stop()
> is called (in this case, 'rq->lock' will act as a kind of synch.
> point/barrier).
>
> Another case is as follows:
>
> CPU#0:
>
> ...
> while (kthread_should_stop()) {
>
>if (condition)
>  schedule();
>
>/* ... do something useful ... */   <--- EIP
>
>set_current_state(TASK_INTERRUPTIBLE);
> }
>
> so a 'kthread' task is about to call
> set_current_state(TASK_INTERRUPTIBLE) ...
>
>
> (in the mean time)
>
> CPU#1:
>
> kthread_stop()
>
> -> kthread_stop_info.k = k (*)
> -> wake_up_process()
>
> wake_up_process() looks like:
>
> (try_to_wake_up)
>
> IRQ_OFF
> LOCK
>
> old_state = p->state;
> if (!(old_state & state))  (**)
>  goto out;
>
> ...
>
> UNLOCK
> IRQ_ON
>
>
> let's suppose (*) and (**) are reordered
> (according to Documentation/memory-barriers.txt, neither IRQ_OFF nor
> LOCK may prevent it from happening).
>
> - the state is TASK_RUNNING, so we are about to return.
>
> - CPU#1 is about to execute (*) (it's guaranteed to be done before
> spin_unlock(&rq->lock) at the end of try_to_wake_up())
>
>
> (in the mean time)
>
> CPU#0:
>
> - set_current_state(TASK_INTERRUPTIBLE);
> - kthread_should_stop();
>
> here, kthread_stop_info.k is not yet visible
>
> - schedule()
>
> ...
>
> we missed a 'kthread_stop' event.
>
> hum?

Looks like you are correct to me.
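
The race described above is the classic store-buffering pattern: each
side stores one variable and then loads the other. A minimal userspace
sketch, assuming C11 atomics as a stand-in for the kernel primitives
(the struct and function names below are hypothetical and are not
kernel APIs):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum { MODEL_RUNNING = 0, MODEL_INTERRUPTIBLE = 1 };

struct kthread_model {
	atomic_int  state;        /* stands in for p->state */
	atomic_bool should_stop;  /* stands in for kthread_stop_info.k == p */
};

static void model_init(struct kthread_model *m)
{
	atomic_init(&m->state, MODEL_RUNNING);
	atomic_init(&m->should_stop, false);
}

/*
 * kthread side: publish the sleeping state, full barrier, check the flag.
 * Returns true if no stop request was seen, i.e. the thread would go on
 * to call schedule().
 */
static bool model_kthread_would_sleep(struct kthread_model *m)
{
	atomic_store_explicit(&m->state, MODEL_INTERRUPTIBLE,
			      memory_order_relaxed);
	/* models the smp_mb() implied by set_current_state() */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&m->should_stop, memory_order_relaxed);
}

/*
 * kthread_stop() side: publish the stop request, full barrier, check state.
 * Returns true if the sleeper's state was observed, i.e. a wakeup is sent.
 */
static bool model_stopper_would_wake(struct kthread_model *m)
{
	atomic_store_explicit(&m->should_stop, true, memory_order_relaxed);
	/* models the smp_mb() that the patch adds */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&m->state, memory_order_relaxed)
		== MODEL_INTERRUPTIBLE;
}
```

With a full fence on both sides, at least one of the two functions must
observe the other's store: either the kthread sees the stop request, or
the stopper sees the sleeping state and wakes it. Drop the fence on the
stopper side and both may miss, which is the lost 'kthread_stop' event.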


> TIA,
>
> ---
>
> From: Dmitry Adamushko <[EMAIL PROTECTED]>
> Subject: kthread: add a memory barrier to kthread_stop()
>
> 'kthread' threads do a check in the following order:
> - set_current_state(TASK_INTERRUPTIBLE);
> - kthread_should_stop();
>
> and set_current_state() implies an smp_mb().
>
> on another side (kthread_stop), wake_up_process() is not guaranteed to
> act as a full mb.
>
> 'kthread_stop_info.k' must be visible before wake_up_process() checks
> for/modifies a state of the 'kthread' task.
>
>
> Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>
>
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 0ac8878..5167110 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -211,6 +211,10 @@ int kthread_stop(struct task_struct *k)
>
>   /* Now set kthread_should_stop() to true, and wake it up. */
>   kthread_stop_info.k = k;
> +
> + /* The previous store operation must not get ahead of the wakeup. */
> + smp_mb();
> +
>   wake_up_process(k);
>   put_task_struct(k);
>
>
>
> --



[PATCH] serial: remove double initializer

2008-02-18 Thread Harvey Harrison
Commit:
02c9b5cf9acd8a85313b892dc5196ccf133d4884 serial: add ADDI-DATA GmbH
Communication cards in 8250_pci.c and pci_ids.h

added a second initializer; perhaps something else was intended?

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 drivers/serial/8250_pci.c |7 ---
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/drivers/serial/8250_pci.c b/drivers/serial/8250_pci.c
index a8bec49..f97224c 100644
--- a/drivers/serial/8250_pci.c
+++ b/drivers/serial/8250_pci.c
@@ -1214,13 +1214,6 @@ static struct pciserial_board pci_boards[] __devinitdata = {
.base_baud  = 115200,
.uart_offset= 8,
},
-   [pbn_b0_8_115200] = {
-   .flags  = FL_BASE0,
-   .num_ports  = 8,
-   .base_baud  = 115200,
-   .uart_offset= 8,
-   },
-
[pbn_b0_1_921600] = {
.flags  = FL_BASE0,
.num_ports  = 1,
-- 
1.5.4.2.200.g99e75





Re: [PATCH 1/3] Fix Unlikely(x) == y

2008-02-18 Thread Nick Piggin
On Tuesday 19 February 2008 16:58, Willy Tarreau wrote:
> On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
> > > Note in particular the last predictors; assuming branch ending
> > > with goto, including call, causing early function return or
> > > returning negative constant are not taken. Just these alone
> > > are likely 95+% of the unlikelies in the kernel.
> >
> > Yes, gcc should be able to do pretty good heuristics, considering
> > the quite good numbers that cold CPU predictors can attain. However
> > for really performance critical code (or really "never" executed
> > code), then I think it is OK to have the hints and not have to rely
> > on gcc heuristics.
>
> in my experience, the real problem is that gcc does what *it* wants and not
> what *you* want. I've been annoyed a lot by the way it coded some loops
> that could really be blazingly fast, but which resulted in a ton of
> branches due to its predictors. And using unlikely() there was a real mess,
> because instead of just hinting the compiler with probabilities to write
> some linear code for the *most* common case, it ended up with awful
> branches everywhere with code sent far away and even duplicated for some
> branches.
>
> Sometimes, for performance critical paths, I would like gcc to be dumb and
> follow *my* code and not its hard-coded probabilities. For instance, in a
> tree traversal, you really know how you want to build your loop. And these
> days, it seems like the single method of getting it your way is doing asm,
> which obviously is not portable :-(

Probably all true.


> Maybe one thing we would need would be the ability to assign probabilities
> to each branch based on what we expect, so that gcc could build a better
> tree keeping most frequently used code tight.

I don't know if that would *directly* lead to gcc being smarter. I
think perhaps they probably don't benchmark on code bases that have
much explicit annotation (I'm sure they wouldn't seriously benchmark
any parts of Linux as part of daily development). I think the key is
to continue to use annotations _properly_, and eventually gcc should
go in the right direction if enough code uses it.

And if you have really good examples like it sounds like above, then
I guess that should be reported to gcc?
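
For context, the subject line refers to hints written as
unlikely(x) == y. A minimal sketch of why the placement of the
parenthesis matters, assuming hint macros shaped like the kernel's
(the two helper functions are hypothetical, for illustration only):

```c
/* Same shape as the kernel's hint macros; relies on the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Misplaced: the macro normalizes x to 0 or 1 first, so this really
 * computes (!!x) == y and only hints about x, not about the comparison.
 */
static int hint_misplaced(int x, int y)
{
	return unlikely(x) == y;
}

/* Intended: the whole comparison is the hinted condition. */
static int hint_correct(int x, int y)
{
	return unlikely(x == y);
}
```

For x = 2, y = 2 the misplaced form yields 0 (it compares 1 with 2),
while the intended form yields 1.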



[PATCH] bttv: struct member initialized twice

2008-02-18 Thread Harvey Harrison
fixes sparse warning:
drivers/media/video/bt8xx/bttv-driver.c:3391:3: warning: Initializer entry defined twice
drivers/media/video/bt8xx/bttv-driver.c:3392:3:   also defined here

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 drivers/media/video/bt8xx/bttv-driver.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/media/video/bt8xx/bttv-driver.c b/drivers/media/video/bt8xx/bttv-driver.c
index 5404fcc..a080c14 100644
--- a/drivers/media/video/bt8xx/bttv-driver.c
+++ b/drivers/media/video/bt8xx/bttv-driver.c
@@ -3389,7 +3389,6 @@ static struct video_device bttv_video_template =
.vidiocgmbuf= vidiocgmbuf,
 #endif
.vidioc_g_crop  = bttv_g_crop,
-   .vidioc_g_crop  = bttv_g_crop,
.vidioc_s_crop  = bttv_s_crop,
.vidioc_g_fbuf  = bttv_g_fbuf,
.vidioc_s_fbuf  = bttv_s_fbuf,
-- 
1.5.4.2.200.g99e75





Re: Linux 2.6.25-rc2

2008-02-18 Thread Ingo Molnar

* Torsten Kaiser <[EMAIL PROTECTED]> wrote:

> On Feb 15, 2008 10:23 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> >
> > Ok,
> >  this kernel is a winner.
> 
> Sadly not for me:
> [ 5282.056415] [ cut here ]
> [ 5282.059757] kernel BUG at lib/list_debug.c:33!
> [ 5282.062055] invalid opcode:  [1] SMP
> [ 5282.062055] CPU 3

hm. Your crashes do seem to span multiple subsystems, but it always 
seems to be around the SLUB code. Could you try the patch below? The 
SLUB code has a new optimization and i'm not 100% sure about it. [the 
hack below switches the SLUB optimization off by disabling the CPU 
feature it relies on.]

Ingo

->
 arch/x86/Kconfig |4 
 1 file changed, 4 deletions(-)

Index: linux/arch/x86/Kconfig
===
--- linux.orig/arch/x86/Kconfig
+++ linux/arch/x86/Kconfig
@@ -59,10 +59,6 @@ config HAVE_LATENCYTOP_SUPPORT
 config SEMAPHORE_SLEEPERS
def_bool y
 
-config FAST_CMPXCHG_LOCAL
-   bool
-   default y
-
 config MMU
def_bool y
 



Re: [PATCH 0/7] Single RQ group scheduling

2008-02-18 Thread Mike Galbraith

On Mon, 2008-02-18 at 10:55 +0100, Peter Zijlstra wrote:
> This is my current queue for single RQ group scheduling.

I took these out for a brief maxcpus=1 spin yesterday, and noticed
something.  Running 4 copies of chew-max, one as a user, the context
switch rate was high (~1800/s).  I increased sched_min_granularity_ns to
see if that would lower it, and it did, but it also upset fairness.
With sched_min_granularity_ns bumped to half of sched_latency_ns (40ms
default) it was a largish skew.

top - 06:58:46 up 3 min, 15 users,  load average: 4.71, 2.53, 1.04
Tasks: 214 total,   9 running, 204 sleeping,   0 stopped,   1 zombie
Cpu(s): 39.0%us, 61.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 5389 mikeg 20   0  1464  364  304 R 40.1  0.0   0:29.71 0 chew-max
 5373 root  20   0  1464  364  304 R 19.0  0.0   0:16.30 0 chew-max
 5388 root  20   0  1464  364  304 R 19.0  0.0   0:15.79 0 chew-max
 5374 root  20   0  1464  364  304 R 18.6  0.0   0:15.90 0 chew-max

-Mike




Re: [RFC] bitmap onto and fold operators for mempolicy extensions

2008-02-18 Thread Paul Jackson
David wrote:
> So what is the MPOL_F_RELATIVE_NODES behavior?  Is it a combination of 
> nodes_onto() and nodes_fold()?

MPOL_F_RELATIVE_NODES should always be a combination of nodes_onto()
and nodes_fold().

The reason I say that is consistency with the end cases.  That is,
we need fold in the case that no requested nodes have numbers
smaller than the weight of the cpuset's mems_allowed.  At the opposite
extreme, when -all- requested node numbers are smaller than the weight
of mems_allowed, fold has no effect, so it's fine to use there.  And
for the in-between cases, when some requested nodes are smaller and
some not, fold is one possible reasonable answer.  So use it everywhere
for consistency.

> So it's easy enough to do this:
>   case MPOL_INTERLEAVE:
>   if (flags & MPOL_F_RELATIVE_NODES) {
>   nodes_onto(pol->v.nodes, pol->user_nodemask,
>  cpuset_context_nmask);
>   if (nodes_empty(pol->v.nodes))

I hate any more special cases than I have to have -- so not that way.

> But what if we require a combination?  Say the user asked for a policy of 
> MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES over nodes 4-8 in a cpuset 
> constrained to mems 0-7?  Should the resultant be 0,4-7 (combination of 
> nodes_onto() and nodes_fold()) or simply be 4-7 (just nodes_onto())?

All fold, all the time ;).  So this one should be 0,4-7.


> And what if the MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES nodemask is 0,4-8 
> in the same cpuset constrained to mems 0-7?  Should the resultant be
> 
>  - 0,4-7 (nodes_onto() and nodes_fold()),
> 
>  - 0,4-7 (just nodes_onto()), or
> 
>  - 0-1,4-7 (nodes_onto(), nodes_fold(), and shift)?
> 
> The last option, 0-1,4-7, is the only one that preserves the same weight 
> as the relative nodemask.

Again, 0,4-7.

Weight preservation is not a goal here, and would require special cases.

If someone crams a five pound cpuset relative memory policy into a two
pound cpuset, they will likely lose weight.  What matters in that
case is that things plod along without too much distress, and that
when the job is restored to a proper size cpuset, its memory policies
recover their full health.
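
The "all fold, all the time" rule can be sketched in userspace C. This
is a minimal model on 64-bit masks, assuming fold maps bit b to
b % weight(allowed) and onto maps bit n to the n-th set bit of the
allowed mask; the function names are hypothetical, not the nodemask API:

```c
#include <stdint.h>

/* Number of set bits in m. */
static unsigned mask_weight(uint64_t m)
{
	unsigned n = 0;

	while (m) {
		m &= m - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* fold: each set bit b of orig lands on bit (b % sz). */
static uint64_t mask_fold(uint64_t orig, unsigned sz)
{
	uint64_t dst = 0;
	unsigned b;

	if (sz == 0)
		return 0;
	for (b = 0; b < 64; b++)
		if (orig & (UINT64_C(1) << b))
			dst |= UINT64_C(1) << (b % sz);
	return dst;
}

/* onto: bit n of orig selects the n-th set bit of relmap. */
static uint64_t mask_onto(uint64_t orig, uint64_t relmap)
{
	uint64_t dst = 0;
	unsigned b, n = 0;

	for (b = 0; b < 64; b++) {
		if (!(relmap & (UINT64_C(1) << b)))
			continue;
		if (orig & (UINT64_C(1) << n))
			dst |= UINT64_C(1) << b;
		n++;
	}
	return dst;
}

/* Relative policy: fold by the weight of allowed, then map onto allowed. */
static uint64_t relative_nodes(uint64_t requested, uint64_t allowed)
{
	return mask_onto(mask_fold(requested, mask_weight(allowed)), allowed);
}
```

With allowed = 0xff (mems 0-7), both requested = 0x1f0 (nodes 4-8) and
requested = 0x1f1 (nodes 0,4-8) give 0xf1, i.e. 0,4-7, matching the
answers above.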

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214


Re: PCIE ASPM support hangs my laptop pretty often

2008-02-18 Thread Shaohua Li

On Wed, 2008-02-06 at 01:40 +0800, Дамјан Георгиевски wrote:
> I've patched my kernel with the PCIe ASPM and after setting
> echo powersave > /sys/module/pcie_aspm/parameters/policy
> 
> I started to experience random hangs of my laptop.
> Hardware info:
> Thinkpad x60s 1704-5UG
> also tested on a firends X60s 1702-F6U
> 
> Kernel is 2.6.24 + these patches:
>  tuxonice 3.0-rc5
>  thinkpad_acpi v0.19-20080107
>  tp_smapi 0.36
Hi,
Sorry for the long delay, I'm just back from vacation. Some devices or
chipsets don't work well with ASPM. This is one of the reasons why the
default policy of the patch is the BIOS setting. Ideally drivers should
disable ASPM for specific devices; the patch provides an API
(pci_disable_link_state) for this too. As Auke suggested, you can use
the per-device interface to control separate links to see which device
is broken. If you find one, please report it to the driver maintainer
and me, and we can disable ASPM in the driver.

Thanks,
Shaohua



Re: [PATCH 1/3] Fix Unlikely(x) == y

2008-02-18 Thread Willy Tarreau
On Tue, Feb 19, 2008 at 01:33:53PM +1100, Nick Piggin wrote:
> > Note in particular the last predictors; assuming branch ending
> > with goto, including call, causing early function return or
> > returning negative constant are not taken. Just these alone
> > are likely 95+% of the unlikelies in the kernel.
> 
> Yes, gcc should be able to do pretty good heuristics, considering
> the quite good numbers that cold CPU predictors can attain. However
> for really performance critical code (or really "never" executed
> code), then I think it is OK to have the hints and not have to rely
> on gcc heuristics.

in my experience, the real problem is that gcc does what *it* wants and not
what *you* want. I've been annoyed a lot by the way it coded some loops that
could really be blazingly fast, but which resulted in a ton of branches due
to its predictors. And using unlikely() there was a real mess, because instead
of just hinting the compiler with probabilities to write some linear code for
the *most* common case, it ended up with awful branches everywhere with code
sent far away and even duplicated for some branches.

Sometimes, for performance critical paths, I would like gcc to be dumb and
follow *my* code and not its hard-coded probabilities. For instance, in a
tree traversal, you really know how you want to build your loop. And these
days, it seems like the single method of getting it your way is doing asm,
which obviously is not portable :-(

Maybe one thing we would need would be the ability to assign probabilities
to each branch based on what we expect, so that gcc could build a better
tree keeping most frequently used code tight.

Hmm I've just noticed -fno-guess-branch-probability in the man, I never tried
it.

regards,
Willy



Re: Optiarc DVD RW AD-5200A audio playing

2008-02-18 Thread Borislav Petkov
On Tue, Feb 19, 2008 at 12:18:48AM +0100, Bartlomiej Zolnierkiewicz wrote:
> On Monday 18 February 2008, Stefan Bader wrote:
> > Borislav Petkov wrote:
> > > On Sat, Feb 16, 2008 at 04:24:01PM +0100, Bartlomiej Zolnierkiewicz wrote:
> > >> Hi,
> > >>
> > >> On Saturday 16 February 2008, Borislav Petkov wrote:
> > >>> On Fri, Feb 15, 2008 at 02:53:27PM -0500, Stefan Bader wrote:
> >  Hello Borislav,
> > 
> >  I worked on a problem with an DVD driver (model=Optiarc DVD RW 
> >  AD-5200A)
> >  which obviously has the same problem as some Matshita drives. The
> >  following patch was reported to enabled audio playing on this drive.
> >  Would this approach be suitable for upstream or are there other
> >  solutions to this problem?
> > 
> >  Regards,
> >  Stefan
> > 
> >  --- a/drivers/ide/ide-cd.c
> >  +++ b/drivers/ide/ide-cd.c
> >  @@ -2988,7 +2988,8 @@ int ide_cdrom_probe_capabilities (ide_drive_t 
> >  *drive)
> > if (strcmp(drive->id->model, "MATSHITADVD-ROM SR-8187") == 0 ||
> > strcmp(drive->id->model, "MATSHITADVD-ROM SR-8186") == 0 ||
> > strcmp(drive->id->model, "MATSHITADVD-ROM SR-8176") == 0 ||
> >  -  strcmp(drive->id->model, "MATSHITADVD-ROM SR-8174") == 0)
> >  +  strcmp(drive->id->model, "MATSHITADVD-ROM SR-8174") == 0 ||
> >  +  strcmp(drive->id->model, "Optiarc DVD RW AD-5200A") == 0)
> > CDROM_CONFIG_FLAGS(drive)->audio_play = 1;
> > 
> >   #if ! STANDARD_ATAPI
> > >>> Hi Stefan,
> > >>>
> > >>> just to make sure that the audioplay bit is not set in the capabilities 
> > >>> page,
> > >>> can you please try the following patch applied against 2.6.25-rc2 and 
> > >>> send me
> > >>> the output. Thanks!
> > >>>
> > >>> @Bart: by the way, this cdi->mask thingy is kinda unintuitive doing 
> > >>> double
> > >>> negation to check whether a feature is supported or not. Yeah, this 
> > >>> comes from
> > >>> "above," i.e. uniform cdrom layer. But still, shouldn't we use a 
> > >>> cdi->caps_flags
> > >>> or something similar instead, which mirrors the caps page bits setting?
> > >> It seems so (at least having negative flags is very unintuitive) but they
> > >> might be some reason for this ugliness, Jens?
> > >>
> > >> [ Please also remember that since cdrom layer is _uniform_ it may be not
> > >>   possible and/or desirable to have 1-1 mapping between caps page bits
> > >>   and the future cdi->caps_flags. ]
> > >>
> > >>> commit 435f0f4496a1b32af2d542f43b2370a890fe2f83
> > >>> Author: Borislav Petkov <[EMAIL PROTECTED]>
> > >>> Date:   Sat Feb 16 09:56:36 2008 +0100
> > >>>
> > >>> ide-cd: Enable audio play quirk for Optiarc DVD RW AD-5200A drive
> > >>> 
> > >>> Reported-by: Stefan Bader <[EMAIL PROTECTED]>
> > >>> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> > >>>
> > >>> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> > >>> index f77db6b..2c9d06e 100644
> > >>> --- a/drivers/ide/ide-cd.c
> > >>> +++ b/drivers/ide/ide-cd.c
> > >>> @@ -1750,6 +1750,10 @@ int ide_cdrom_probe_capabilities (ide_drive_t 
> > >>> *drive)
> > >>> cdi->mask &= ~(CDC_DVD_RAM | CDC_RAM);
> > >>> if (buf[8 + 3] & 0x10)
> > >>> cdi->mask &= ~CDC_DVD_R;
> > >>> +   if (!(buf[8 + 4] & 0x01)) {
> > >> Hmm, shouldn't there be '&& (cd->cd_flags & IDE_CD_FLAG_PLAY_AUDIO_OK)'
> > >> to prevent false positives?
> > > 
> > > I wanted to see whether the caps page reports the audioplay bit off...
> > > 
> > >>> +   printk(KERN_INFO "ide-cd: audio play not advertised in 
> > >>> caps page,"
> > >> Would be nice to also printk() the device name.
> > > 
> > > ... but printing the device model is actually a good idea and this will 
> > > rule out
> > > false positives, so Stefan, please drop the previous patch and test the 
> > > updated
> > > one below. Thanks.
> > > 
> > > 
> > 
> > Hi Borislav,
> > 
> > the problem is that I don't own this drive myself and the owner is
> > running a 2.6.22 kernel and is normally not doing any kernel compiles.
> > But I could provide him a modified patch.
> > Though, if you just want to know whether the cap bit was really unset, I
> > think we know this already. When I got the problem report we checked
> > /proc/sys/dev/cdrom/info and that showed the "Can play audio" bit as 0.
> > Which is the reason I gave the owner the patch for adding the model to
> > the excemption list. And from his feedback I take that the drive plays
> > audio tracks with the patch in use.
> 
> Borislav, I guess that this is good enough proof that audioplay bit is off.

indeed.

> Could you please send me the final version of the patch?

commit fa4af2fab0804bead4da6ecbf468118f05111229
Author: Borislav Petkov <[EMAIL PROTECTED]>
Date:   Sat Feb 16 09:56:36 2008 +0100

ide-cd: Enable audio play quirk for Optiarc DVD RW AD-5200A drive

Reported-by: Stefan Bader

Re: keyboard dead with 45b5035

2008-02-18 Thread Pierre Ossman
On Mon, 18 Feb 2008 21:50:12 +0100
Pierre Ossman <[EMAIL PROTECTED]> wrote:

> On Mon, 18 Feb 2008 20:50:01 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > On Monday, 18 of February 2008, Pierre Ossman wrote:
> > > The patch "[RTNETLINK]: Send a single notification on device state 
> > > changes." kills (at least)
> > > the keyboard here. Everything seems to work fine in single user mode, but 
> > > when init starts
> > > spawning of logins, the keyboard goes bye-bye. Even the power button is 
> > > ignored. :/ 
> > 
> > Please try with the patch from http://lkml.org/lkml/2008/2/18/331 .
> > 
> 
> That solved it.
> 

Perhaps not quite. When I returned to my laptop this morning, the keyboard was 
gone again. Did a hard reboot, and the machine locked up a few seconds after 
starting X. I'll see if it can be reproduced...

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer    http://www.kernel.org
  PulseAudio, core developer      http://pulseaudio.org
  rdesktop, core developer        http://www.rdesktop.org


Question about synchronous write on SSD

2008-02-18 Thread Kyungmin Park
Hi,

Do you remember the thread "solid state drive access and context
switching" [1]?
I wanted to measure whether it really gives better performance on an SSD.

To write to the SSD synchronously, I hacked
'generic_make_request()' [2] and got the following results.

# echo 3 > /proc/sys/vm/drop_caches
# tiotest -f 100 -R -d /dev/sdd1
Tiotest results for 4 concurrent io threads:
| Write 400 MBs |4.7 s |  84.306 MB/s |   8.4 %  |  77.5 % |
| Read  400 MBs |4.3 s |  92.945 MB/s |   7.2 %  |  53.5 % |
Tiotest latency results:
| Write|0.126 ms |  706.379 ms |  0.0 |   0.0 |
| Read |0.161 ms |  311.738 ms |  0.0 |   0.0 |

# echo 3 > /proc/sys/vm/drop_caches
# tiotest -f 1000 -R -d /dev/sdd1
Tiotest results for 4 concurrent io threads:
| Write4000 MBs |   47.5 s |  84.124 MB/s |   7.0 %  |  83.6 % |
| Read 4000 MBs |   41.9 s |  95.530 MB/s |   7.8 %  |  55.6 % |
Tiotest latency results:
| Write|0.176 ms |  714.677 ms |  0.0 |   0.0 |
| Read |0.161 ms |  311.815 ms |  0.0 |   0.0 |

However, the performance is the same as before, so the patch makes no
difference.

Could you tell me whether this is the proper place for the hack, or
should it go somewhere else?
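
The hack at the end of this message matches the target by device number
8:49. Under the classic sd numbering (major 8, 16 minors per disk) that
is /dev/sdd1. A minimal sketch of the arithmetic, with hypothetical
names that are not kernel identifiers:

```c
/* Assumes the classic sd numbering: major 8, 16 minors per disk. */
enum { SD_MAJOR_MODEL = 8, SD_MINORS_PER_DISK = 16 };

/*
 * disk_index: 0 for sda, 1 for sdb, ... (major 8 covers the first 16 disks)
 * partition:  0 for the whole disk, 1..15 for partitions
 * Returns the minor number, or -1 if an argument is out of range.
 */
static int sd_minor_model(int disk_index, int partition)
{
	if (disk_index < 0 || disk_index > 15)
		return -1;
	if (partition < 0 || partition >= SD_MINORS_PER_DISK)
		return -1;
	return disk_index * SD_MINORS_PER_DISK + partition;
}
```

sdd is disk index 3, so partition 1 gets minor 3 * 16 + 1 = 49.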

Thank you,
Kyungmin Park

p.s. Of course I got the following message
WARNING: at block/blk-core.c:1351 generic_make_request+0x126/0x3d8()

1. http://lkml.org/lkml/2007/12/3/247
2. simple hack
diff --git a/block/blk-core.c b/block/blk-core.c
index e9754dc..7262720 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1345,6 +1345,14 @@ static inline void
__generic_make_request(struct bio *bio)
if (bio_check_eod(bio, nr_sectors))
goto end_io;

+   /* FIXME simple hack by kmpark */
+   if (MINOR(bio->bi_bdev->bd_dev) == 49 &&
+   MAJOR(bio->bi_bdev->bd_dev) == 8 && bio_data_dir(bio) == WRITE) {
+   WARN_ON_ONCE(1);
+   /* Write synchronous */
+   bio->bi_rw |= (1 << BIO_RW_SYNC);
+   }
+
/*
 * Resolve the mapping until finished. (drivers are
 * still free to implement/resolve their own stacking


[RFC][PATCH] the proposal of improve page reclaim by throttle

2008-02-18 Thread KOSAKI Motohiro

background
==========

The current VM implementation has no limit on the number of parallel
reclaimers. Under heavy workload this leads to 2 bad things:
  - heavy lock contention
  - unnecessary swap out

About 2 months ago, KAMEZAWA Hiroyuki proposed a page reclaim throttle
patch and explained that it improves reclaim time.
http://marc.info/?l=linux-mm&m=119667465917215&w=2

Unfortunately it only works for memcgroup reclaim.
Today I implemented it again to support global reclaim and measured it.
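
The throttle in the patch below admits a reclaimer only while the
number of active reclaimers is under a limit, using
atomic_add_unless(). A minimal userspace sketch of that gate, assuming
C11 atomics (the function names are hypothetical; the real patch also
sleeps on a wait queue instead of simply failing):

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add 'a' to *v unless *v == u. Returns true if the add happened.
 * Models the semantics of the kernel's atomic_add_unless().
 */
static bool model_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* on failure, cur is reloaded with the current value */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Try to become a reclaimer; fails once 'limit' reclaimers are active. */
static bool reclaim_try_enter(atomic_int *nr_reclaimers, int limit)
{
	return model_add_unless(nr_reclaimers, 1, limit);
}

/* Leave the reclaim section, making room for a waiting task. */
static void reclaim_exit(atomic_int *nr_reclaimers)
{
	atomic_fetch_sub(nr_reclaimers, 1);
}
```

A caller that fails reclaim_try_enter() would wait and retry; in the
patch that role is played by wait_event() on reclaim_throttle_waitq.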


test machine, method and result
===============================

CPU:  IA64 x8
MEM:  8GB
SWAP: 2GB


got hackbench from
http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c

$ /usr/bin/time hackbench 120 process 1000

These parameters consume all physical memory and
1GB of swap space in my test environment.



before:
    hackbench result:               282.30
    /usr/bin/time result
        user:                       14.16
        sys:                        1248.47
        elapse:                     432.93
        major fault:                29026
    max parallel reclaim tasks:     1298
    max consumption time of
     try_to_free_pages():           70394

after:
    hackbench result:               30.36
    /usr/bin/time result
        user:                       14.26
        sys:                        294.44
        elapse:                     118.01
        major fault:                3064
    max parallel reclaim tasks:     4
    max consumption time of
     try_to_free_pages():           12234


conclusion
==========
This patch improves 4 things.
1. reduces unnecessary swap
   (see major faults above: reduced by about 90%)
2. improves throughput
   (see hackbench result above: reduced by about 90%)
3. improves interactive performance
   (see max consumption time of try_to_free_pages() above:
    reduced by about 80%)
4. reduces lock contention
   (see sys time above: reduced by about 80%)


Overall, hackbench now finishes about 9x faster (282.30 -> 30.36) :)



future work
===========
 - more discussion with memory controller guys.



Signed-off-by: KOSAKI Motohiro <[EMAIL PROTECTED]>
CC: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
CC: Balbir Singh <[EMAIL PROTECTED]>
CC: Rik van Riel <[EMAIL PROTECTED]>
CC: Lee Schermerhorn <[EMAIL PROTECTED]>

---
 include/linux/nodemask.h |1 
 mm/vmscan.c  |   49 +--
 2 files changed, 48 insertions(+), 2 deletions(-)

Index: b/include/linux/nodemask.h
===
--- a/include/linux/nodemask.h  2008-02-19 13:58:05.0 +0900
+++ b/include/linux/nodemask.h  2008-02-19 13:58:23.0 +0900
@@ -431,6 +431,7 @@ static inline int num_node_state(enum no
 
 #define num_online_nodes() num_node_state(N_ONLINE)
 #define num_possible_nodes()   num_node_state(N_POSSIBLE)
+#define num_highmem_nodes()num_node_state(N_HIGH_MEMORY)
 #define node_online(node)  node_state((node), N_ONLINE)
 #define node_possible(node)node_state((node), N_POSSIBLE)
 
Index: b/mm/vmscan.c
===
--- a/mm/vmscan.c   2008-02-19 13:58:05.0 +0900
+++ b/mm/vmscan.c   2008-02-19 14:04:06.0 +0900
@@ -127,6 +127,11 @@ long vm_total_pages;   /* The total number
 static LIST_HEAD(shrinker_list);
 static DECLARE_RWSEM(shrinker_rwsem);
 
+static atomic_t nr_reclaimers = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(reclaim_throttle_waitq);
+#define RECLAIM_LIMIT (2 * num_highmem_nodes())
+
+
 #ifdef CONFIG_CGROUP_MEM_CONT
 #define scan_global_lru(sc)(!(sc)->mem_cgroup)
 #else
@@ -1421,6 +1426,46 @@ out:
return ret;
 }
 
+static unsigned long try_to_free_pages_throttled(struct zone **zones,
+int order,
+gfp_t gfp_mask,
+struct scan_control *sc)
+{
+   unsigned long nr_reclaimed = 0;
+   unsigned long start_time;
+   int i;
+
+   start_time = jiffies;
+
+   wait_event(reclaim_throttle_waitq,
+  atomic_add_unless(&nr_reclaimers, 1, RECLAIM_LIMIT));
+
+   /* more reclaim until needed? */
+   if (unlikely(time_after(jiffies, start_time + HZ))) {
+   for (i = 0; zones[i] != NULL; i++) {
+   struct zone *zone = zones[i];
+   int classzone_idx = zone_idx(zones[0]);
+
+   if (!populated_zone(zone))
+   continue;
+
+   if (zone_watermark_ok(zone, order, 4*zone->pages_high,
+ classzone_idx, 

Re: [bootup crash, -git] Re: patch pci-pcie-aspm-support.patchadded to gregkh-2.6 tree

2008-02-18 Thread Shaohua Li

On Sun, 2008-02-03 at 02:51 +0800, Greg KH wrote:
> On Sat, Feb 02, 2008 at 11:55:06AM +0100, Ingo Molnar wrote:
> >
> > * [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >
> > > This is a note to let you know that I've just added the patch
> titled
> > >
> > >  Subject: PCI: PCIE ASPM support
> > >
> > > to my gregkh-2.6 tree.  Its filename is
> > >
> > >  pci-pcie-aspm-support.patch
> >
> > uhm. One week ago this patch was added to your PCI tree. It never
> > touched -mm AFAICS and today it was merged upstream (commit
> > 6c723d5bd89f03fc3ef627d50f89ade054d2ee3b):
> >
> > Which is not necessarily a problem in itself, as long as you test it
> > through and are reasonably sure that it wont break systems en masse.
> But
> > this patch very evidently was not tested in any sufficient manner on
> its
> > primary platform (x86) because my randconfig testsystems (bog
> standard
> > x86 hw) started crashing during bootup almost immediately:
> 
> Ugh, this is causing just too many problems, I'm just going to revert
> it.
> 
> Shaohua, care to look into this crash, fix up the config issues, and
> resubmit it when it's working a bit better?
Sorry for the long delay, I'm just back from vacation. I fixed all the
issues I found on the list (except one hang issue, which should be fixed
in the specific driver; I'll reply to that thread soon). Can you re-add
the patch to your test tree?

PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, the device must be configured appropriately by software.
Enabling ASPM will save power, but will introduce device latency.

This patch adds ASPM support to Linux. It introduces a global ASPM
policy, controlled through the sysfs file
/sys/module/pcie_aspm/parameters/policy. The interface can be used as a
boot option too. Currently the settings are:
-default, BIOS default setting
-powersave, highest power saving mode, enable all available ASPM
states and clock power management
-performance, highest performance, disable ASPM and clock power
management
The 'default' policy is currently used by default.
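
Selecting a policy from userspace amounts to writing one of the three
names into the sysfs file. A minimal sketch, assuming the file accepts
the bare policy name; the helper name is hypothetical and the path is
passed in so the code can be tried against an ordinary file:

```c
#include <stdio.h>
#include <string.h>

/* Policy names listed in the patch description. */
static const char *const aspm_policies[] = {
	"default", "powersave", "performance"
};

/*
 * Hypothetical helper: write 'policy' to 'path'.
 * Returns 0 on success, -1 on an unknown policy name or an I/O error.
 */
static int aspm_write_policy(const char *path, const char *policy)
{
	size_t i, n = sizeof(aspm_policies) / sizeof(aspm_policies[0]);
	FILE *f;
	int ok;

	/* reject anything that is not one of the documented names */
	for (i = 0; i < n; i++)
		if (strcmp(policy, aspm_policies[i]) == 0)
			break;
	if (i == n)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(policy, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

On a kernel carrying this patch, and run as root, the call would look
like aspm_write_policy("/sys/module/pcie_aspm/parameters/policy",
"powersave").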

In my test, power difference between powersave mode and performance mode
is about 1.3w in a system with 3 PCIE links.

Note: some devices might not work well with ASPM, because of either a
chipset issue or a device issue. The patch provides an API
(pci_disable_link_state) so a driver can disable ASPM for a specific
device.

Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>
---
 drivers/pci/pci-sysfs.c   |5 
 drivers/pci/pci.c |4 
 drivers/pci/pcie/Kconfig  |   20 +
 drivers/pci/pcie/Makefile |3 
 drivers/pci/pcie/aspm.c   |  801 ++
 drivers/pci/probe.c   |5 
 drivers/pci/remove.c  |4 
 include/linux/aspm.h  |   56 +++
 include/linux/pci-acpi.h  |1 
 include/linux/pci.h   |5 
 include/linux/pci_regs.h  |8 
 11 files changed, 912 insertions(+)

Index: linux/drivers/pci/pcie/Makefile
===
--- linux.orig/drivers/pci/pcie/Makefile2008-01-24 10:23:09.0 +0800
+++ linux/drivers/pci/pcie/Makefile 2008-02-19 10:54:12.0 +0800
@@ -2,6 +2,9 @@
 # Makefile for PCI-Express PORT Driver
 #
 
+# Build PCI Express ASPM if needed
+obj-$(CONFIG_PCIEASPM) += aspm.o
+
 pcieportdrv-y  := portdrv_core.o portdrv_pci.o portdrv_bus.o
 
 obj-$(CONFIG_PCIEPORTBUS)  += pcieportdrv.o
Index: linux/drivers/pci/pcie/aspm.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux/drivers/pci/pcie/aspm.c   2008-02-19 13:35:55.0 +0800
@@ -0,0 +1,801 @@
+/*
+ * File:   drivers/pci/pcie/aspm.c
+ * Enabling PCIE link L0s/L1 state and Clock Power Management
+ *
+ * Copyright (C) 2007 Intel
+ * Copyright (C) Zhang Yanmin ([EMAIL PROTECTED])
+ * Copyright (C) Shaohua Li ([EMAIL PROTECTED])
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../pci.h"
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "pcie_aspm."
+
+struct endpoint_state {
+   unsigned int l0s_acceptable_latency;
+   unsigned int l1_acceptable_latency;
+};
+
+struct pcie_link_state {
+   struct list_head sibiling;
+   struct pci_dev *pdev;
+
+   /* ASPM state */
+   unsigned int support_state;
+   unsigned int enabled_state;
+   unsigned int bios_aspm_state;
+   /* upstream component */
+   unsigned int l0s_upper_latency;
+   unsigned int l1_upper_latency;
+   /* downstream component

Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

2008-02-18 Thread David Chinner
On Tue, Feb 19, 2008 at 02:56:43AM +, Alasdair G Kergon wrote:
> On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> > Surely any hardware that doesn't support barrier
> > operations can emulate them with cache flushes when they receive a
> > barrier I/O from the filesystem
>  
> My complaint about having to support them within dm when more than one
> device is involved is because any efficiencies disappear: you can't send
> further I/O to any one device until all the other devices have completed
> their barrier (or else later I/O to that device could overtake the
> barrier on another device).

Right - it's a horrible performance hit.

But - how is what you describe any different to the filesystem doing:

- flush block device
- issue I/O
- wait for completion
- flush block device

around any I/O that it would otherwise simply tag as a barrier?
That serialisation at the filesystem layer is a horrible, horrible
performance hit.

And then there's the fact that we can't implement that in XFS
because all the barrier I/Os we issue are asynchronous.  We'd
basically have to serialise all metadata operations and now we
are talking about far worse performance hits than implementing
barrier emulation in the block device.

Also, it's instructive to look at the implementation of
blkdev_issue_flush() - the API one is supposed to use to trigger a
full block device flush. It doesn't work on DM/MD either, because
it uses a no-I/O barrier bio:

bio->bi_end_io = bio_end_empty_barrier;
bio->bi_private = &wait;
bio->bi_bdev = bdev;
submit_bio(1 << BIO_RW_BARRIER, bio);

wait_for_completion(&wait);

So, if the underlying block device doesn't support barriers,
there's no point in changing the filesystem to issue flushes,
either...

> And then I argue that it would be better
> for the filesystem to have the information that these are not hardware
> barriers so it has the opportunity of tuning its behaviour (e.g.
> flushing less often because it's a more expensive operation).

There is generally no option from the filesystem POV to "flush
less". Either we use barrier I/Os where we need to and are safe with
volatile caches or we corrupt filesystems with volatile caches when
power loss occurs. There is no in-between where "flushing less"
will save us from corruption.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC] [PATCH] Fix b43 driver build for arm

2008-02-18 Thread Sam Ravnborg
> > 
> > It is a consistency check between the host and target
> > layout of data.
> > You need to pad the structure so it becomes 8 bytes in size.
> 
> Ok, I looked at the code and it is highly questionable to me that this
> check works in a cross-compile environment (which the ARM build
> most likely is).
> 
> It seems to check the size of the structure in the .o file against
> the size of the structure on the _host_ where it is compiled.
> I can't see why we would want to check _anything_ of the target stuff
> to the host this stuff is compiled on.
> I can compile an ARM kernel on any machine I want.
> 
> There actually is a comment:
>  * Check that sizeof(device_id type) are consistent with size of section
>  * in .o file. If in-consistent then userspace and kernel does not agree
>  * on actual size which is a bug.
> 
> So it seems that this check _wants_ to compare the size of the structure
> in the kernel to the size of the structure in the userland of the target
> system.
> But it does _not_ do that.
> It compares the size of the structure in the kernel against the size of
> the structure in userland on the machine it is _compiled_ on.
> That is wrong.
In at least 99% of the cases this is OK, and the check has found
several bugs where things would not have worked due to different
alignment between kernel and userland. Just think about the
issues in a mixed 32/64 bit world.

In this particular case you _could_ be right that it would work
on ARM userland. But the only way to be sure is to pad the
structure so it becomes 8 bytes in size.
Or, as was recently done in the m68k case, tell the
compiler to explicitly align it to an 8-byte boundary.
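
To make the two options concrete, here is a small standalone sketch in
plain userspace C (not the kernel header). The struct names, the pad
field and table_size_is_consistent() are hypothetical and exist only for
this example; the attribute is the GCC/Clang spelling, and whether it
matches the exact form of the m68k fix is an assumption. The field layout
and the 64-byte section size are the ones quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout as quoted in the thread; its sizeof depends on the ABI. */
struct ssb_device_id_plain {
	uint16_t vendor;
	uint16_t coreid;
	uint8_t  revision;
};

/* Option 1: explicit padding, so the size is 8 bytes on every ABI. */
struct ssb_device_id_padded {
	uint16_t vendor;
	uint16_t coreid;
	uint8_t  revision;
	uint8_t  pad[3];	/* hypothetical filler field */
};

/* Option 2: ask the compiler for 8-byte alignment (rounds the size up to 8). */
struct ssb_device_id_aligned {
	uint16_t vendor;
	uint16_t coreid;
	uint8_t  revision;
} __attribute__((aligned(8)));

/*
 * Same idea as the build-time check: the section holding the table must
 * be a whole multiple of the size the checker believes one entry has.
 */
static int table_size_is_consistent(size_t section_size, size_t id_size)
{
	assert(id_size != 0);
	return section_size % id_size == 0;
}
```

With the numbers from the error message, table_size_is_consistent(64, 6)
is false while table_size_is_consistent(64, 8) is true, which is why
forcing the entry size to 8 bytes on both host and target makes the check
pass.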

Sam


[PATCH] [RFC] Smack update for file capabilities

2008-02-18 Thread Casey Schaufler

From: Casey Schaufler <[EMAIL PROTECTED]>

This patch assumes "Smack unlabeled outgoing ambient packets - v4"
which is one reason it's RFC.

Update the Smack LSM to allow the registration of the capability
"module" as a secondary LSM. Integrate the new hooks required for
file based capabilities.
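
The stacking pattern that most of the hooks in the diff below follow
(run the capability check first, and only consult the Smack check if it
passes) can be sketched in userspace roughly as follows. All names here
(check_fn, stacked_check, allow, deny_perm, deny_access) are hypothetical
stand-ins for illustration, not kernel or Smack interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical access-check callback: 0 means allowed, negative errno means denied. */
typedef int (*check_fn)(int subject, int object);

/*
 * Run the primary (capability-style) check first. Only if it allows the
 * operation is the secondary (LSM-style) check consulted. The first
 * failure is returned to the caller unchanged.
 */
static int stacked_check(check_fn cap_check, check_fn lsm_check,
			 int subject, int object)
{
	int rc;

	assert(cap_check != NULL && lsm_check != NULL);

	rc = cap_check(subject, object);
	if (rc == 0)
		rc = lsm_check(subject, object);
	return rc;
}

/* Trivial policies used to exercise the ordering. */
static int allow(int s, int o)       { (void)s; (void)o; return 0; }
static int deny_perm(int s, int o)   { (void)s; (void)o; return -EPERM; }
static int deny_access(int s, int o) { (void)s; (void)o; return -EACCES; }
```

The point of the ordering is that a denial from the first check
short-circuits the second, so the caller sees the capability error rather
than the Smack one when both would refuse.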

Signed-off-by: Casey Schaufler <[EMAIL PROTECTED]>

---

security/smack/smack_lsm.c |   63 +--
1 file changed, 60 insertions(+), 3 deletions(-)

diff -uprN -X linux-2.6.25-g0216-precap//Documentation/dontdiff linux-2.6.25-g0216-precap/security/smack/smack_lsm.c linux-2.6.25-g0216/security/smack/smack_lsm.c
--- linux-2.6.25-g0216-precap/security/smack/smack_lsm.c   2008-02-18 10:53:45.0 -0800
+++ linux-2.6.25-g0216/security/smack/smack_lsm.c   2008-02-18 14:15:25.0 -0800
@@ -584,6 +584,12 @@ static int smack_inode_getattr(struct vf
static int smack_inode_setxattr(struct dentry *dentry, char *name,
void *value, size_t size, int flags)
{
+   int rc;
+
+   rc = cap_inode_setxattr(dentry, name, value, size, flags);
+   if (rc != 0)
+   return rc;
+
if (!capable(CAP_MAC_ADMIN)) {
if (strcmp(name, XATTR_NAME_SMACK) == 0 ||
strcmp(name, XATTR_NAME_SMACKIPIN) == 0 ||
@@ -658,6 +664,12 @@ static int smack_inode_getxattr(struct d
 */
static int smack_inode_removexattr(struct dentry *dentry, char *name)
{
+   int rc;
+
+   rc = cap_inode_removexattr(dentry, name);
+   if (rc != 0)
+   return rc;
+
if (strcmp(name, XATTR_NAME_SMACK) == 0 && !capable(CAP_MAC_ADMIN))
return -EPERM;

@@ -1016,7 +1028,12 @@ static void smack_task_getsecid(struct t
 */
static int smack_task_setnice(struct task_struct *p, int nice)
{
-   return smk_curacc(p->security, MAY_WRITE);
+   int rc;
+
+   rc = cap_task_setnice(p, nice);
+   if (rc == 0)
+   rc = smk_curacc(p->security, MAY_WRITE);
+   return rc;
}

/**
@@ -1028,7 +1045,12 @@ static int smack_task_setnice(struct tas
 */
static int smack_task_setioprio(struct task_struct *p, int ioprio)
{
-   return smk_curacc(p->security, MAY_WRITE);
+   int rc;
+
+   rc = cap_task_setioprio(p, ioprio);
+   if (rc == 0)
+   rc = smk_curacc(p->security, MAY_WRITE);
+   return rc;
}

/**
@@ -1053,7 +1075,12 @@ static int smack_task_getioprio(struct t
static int smack_task_setscheduler(struct task_struct *p, int policy,
   struct sched_param *lp)
{
-   return smk_curacc(p->security, MAY_WRITE);
+   int rc;
+
+   rc = cap_task_setscheduler(p, policy, lp);
+   if (rc == 0)
+   rc = smk_curacc(p->security, MAY_WRITE);
+   return rc;
}

/**
@@ -1093,6 +1120,11 @@ static int smack_task_movememory(struct 
static int smack_task_kill(struct task_struct *p, struct siginfo *info,
   int sig, u32 secid)
{
+   int rc;
+
+   rc = cap_task_kill(p, info, sig, secid);
+   if (rc != 0)
+   return rc;
/*
 * Special cases where signals really ought to go through
 * in spite of policy. Stephen Smalley suggests it may
@@ -1778,6 +1810,27 @@ static int smack_ipc_permission(struct k
return smk_curacc(isp, may);
}

+/* module stacking operations */
+
+/**
+ * smack_register_security - stack capability module
+ * @name: module name
+ * @ops: module operations - ignored
+ *
+ * Allow the capability module to register.
+ */
+static int smack_register_security(const char *name,
+  struct security_operations *ops)
+{
+   if (strcmp(name, "capability") != 0)
+   return -EINVAL;
+
+   printk(KERN_INFO "%s:  Registering secondary module %s\n",
+  __func__, name);
+
+   return 0;
+}
+
/**
 * smack_d_instantiate - Make sure the blob is correct on an inode
 * @opt_dentry: unused
@@ -2412,6 +2465,8 @@ static struct security_operations smack_
.inode_post_setxattr =  smack_inode_post_setxattr,
.inode_getxattr =   smack_inode_getxattr,
.inode_removexattr =smack_inode_removexattr,
+   .inode_need_killpriv =  cap_inode_need_killpriv,
+   .inode_killpriv =   cap_inode_killpriv,
.inode_getsecurity =smack_inode_getsecurity,
.inode_setsecurity =smack_inode_setsecurity,
.inode_listsecurity =   smack_inode_listsecurity,
@@ -2471,6 +2526,8 @@ static struct security_operations smack_
.netlink_send = cap_netlink_send,
.netlink_recv = cap_netlink_recv,

+   .register_security =smack_register_security,
+
.d_instantiate =smack_d_instantiate,

.getprocattr =  smack_getprocattr,


Re: [RFC] [PATCH] Fix b43 driver build for arm

2008-02-18 Thread Gordon Farquharson
On Feb 18, 2008 5:01 PM, Michael Buesch <[EMAIL PROTECTED]> wrote:
>
> On Tuesday 19 February 2008 00:42:12 Sam Ravnborg wrote:
> > On Tue, Feb 19, 2008 at 12:17:04AM +0100, Michael Buesch wrote:
> > > On Tuesday 19 February 2008 00:00:58 Russell King wrote:
> > > > > > Why can't we have an array of this structure on ARM?
> > > > > >
> > > > > > struct ssb_device_id {
> > > > > >__u16   vendor;
> > > > >
> > > > > 2 bytes
> > > > >
> > > > > >__u16   coreid;
> > > > >
> > > > > 2 bytes
> > > > >
> > > > > >__u8revision;
> > > > >
> > > > > 1 byte
> > > > >
> > > > > > };
> > > > >
> > > > > > and therefore sizeof this structure will be 5 bytes, but because of the
> > > > > > ABI rules (which are _explicitly_ allowed by the C standard), it'll
> > > > > > become 8 bytes due to padding afterwards.
> > > >
> > > > Another guess might be that, if using AEABI, this structure might
> > > > be 6 bytes in size, but the linker will align structures to 4 bytes.
> > >
> > > If the struct is padded to 6 bytes and the linker aligns it to 4 byte
> > > everything will be naturally aligned, as far as I can see.
> > >
> > > > FATAL: drivers/net/wireless/b43/b43: sizeof(struct ssb_device_id)=6 is
> > > > not a modulo of the size of section __mod_ssb_device_table=64.
> > > > Fix definition of struct ssb_device_id in mod_devicetable.h
> > >
> > > So this message tells me the table size is 64 bytes. There are 8 entries,
> > > so it seems the structure is padded to 8 bytes.
> > > But above that it says that sizeof(struct ssb_device_id)=6
> > >
> > > IMO this sanity check is broken and not the code.
> > >
> > > Where does this sanity check message come from? The linker?
> > $ git grep 'not a modulo'
> > scripts/mod/file2alias.c:   fatal("%s: sizeof(struct %s_device_id)=%lu is not a modulo "
> >
> > It is a consistency check between the host and target
> > layout of data.
> > You need to pad the structure so it becomes 8 bytes in size.
>
> Ok, I looked at the code and it is highly questionable to me that this
> check works in a cross-compile environment (which the ARM build
> most likely is).
>
> It seems to check the size of the structure in the .o file against
> the size of the structure on the _host_ where it is compiled.
> I can't see why we would want to check _anything_ of the target stuff
> to the host this stuff is compiled on.
> I can compile an ARM kernel on any machine I want.
>
> There actually is a comment:
>  * Check that sizeof(device_id type) are consistent with size of section
>  * in .o file. If in-consistent then userspace and kernel does not agree
>  * on actual size which is a bug.
>
> So it seems that this check _wants_ to compare the size of the structure
> in the kernel to the size of the structure in the userland of the target
> system.
> But it does _not_ do that.
> It compares the size of the structure in the kernel against the size of
> the structure in userland on the machine it is _compiled_ on.
> That is wrong.

Does this thread [1] provide any clues as to the Right Thing (TM) to do?

It should be noted that Linus and Andrew signed off on the m68k fix
[2]. I'm CC'ing them and Al Viro on this email to solicit their input.

Gordon

[1] http://www.gossamer-threads.com/lists/linux/kernel/801528
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7492d4a416d68ab4bd254b36ffcc4e0138daa8ff

-- 
Gordon Farquharson
GnuPG Key ID: 32D6D676


Re: My system stops during startup with curretn git tree of 2.6.25-rc2

2008-02-18 Thread David Miller
From: Laszlo Attila Toth <[EMAIL PROTECTED]>
Date: Mon, 18 Feb 2008 18:03:47 +0100

> Hello,
> 
> Rafael J. Wysocki wrote:
> > On Monday, 18 of February 2008, Laszlo Attila Toth wrote:
> >
> > 
> > All in all, I gather you wanted me to test the patch below. :-)
> > 
> > Yes, that helps.
> 
> Thanks for the testing. Dave already reverted the patch, and I don't
> know whether I should resend it or leave it as is.

After this sits and gets testing and feedback for a few days
you'll need to resubmit the entire original patch with these
fixes added on top.

For now, it's staying reverted.


Re: [PATCH 1/3] Fix Unlikely(x) == y

2008-02-18 Thread Nick Piggin
On Tuesday 19 February 2008 13:40, Arjan van de Ven wrote:
> On Tue, 19 Feb 2008 13:33:53 +1100
>
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> > Actually one thing I don't like about gcc is that I think it still
> > emits cmovs for likely/unlikely branches, which is silly (the gcc
> > developers seem to be in love with that instruction). If that goes
> > away, then branch hints may be even better.
>
> only for -Os and only if the result is smaller afaik.

What is your evidence for saying this? Because here, with the latest
kernel and recent gcc-4.3 snapshot, it spits out cmov like crazy even
when compiled with -O2.

[EMAIL PROTECTED]:~/usr/src/linux-2.6$ grep cmov kernel/sched.s | wc -l
45

And yes, it does so even for hinted branches, and even at -O2/3:

[EMAIL PROTECTED]:~/tests$ cat cmov.c
int test(int a, int b)
{
        if (__builtin_expect(a < b, 0))
                return a;
        else
                return b;
}
[EMAIL PROTECTED]:~/tests$ gcc-4.3 -S -O2 cmov.c
[EMAIL PROTECTED]:~/tests$ head -13 cmov.s
        .file   "cmov.c"
        .text
        .p2align 4,,15
.globl test
        .type   test, @function
test:
.LFB2:
        cmpl    %edi, %esi
        cmovle  %esi, %edi
        movl    %edi, %eax
        ret
.LFE2:
        .size   test, .-test

This definitely should be a branch, IMO.

> (cmov tends to be a performance loss most of the time so for -O2 and such
> it isn't used as far as I know.. it does make for nice small code however
> ;-)

It shouldn't be hard to work out the cutover point based on how
expensive cmov is, how expensive branch and branch mispredicts are,
and how often the branch is likely to be mispredicted. For an
unpredictable branch, cmov is normally quite a good win even on
modern CPUs. But gcc overuses it I think.
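
The cutover argument above can be written down as simple expected-cost
arithmetic. The sketch below is only an illustration of that reasoning:
the function names are made up, and any cycle counts fed into it are
assumed figures, not measurements of a real CPU.

```c
#include <assert.h>

/*
 * Expected cost of a conditional branch, in cycles: a fixed base cost
 * plus the misprediction penalty weighted by how often the branch is
 * mispredicted (a rate between 0.0 and 1.0).
 */
static double branch_cost(double base, double mispredict_rate, double penalty)
{
	assert(mispredict_rate >= 0.0 && mispredict_rate <= 1.0);
	return base + mispredict_rate * penalty;
}

/*
 * Misprediction rate at which a branch and a cmov cost the same.
 * Above this rate cmov is expected to win; below it the branch wins.
 */
static double cmov_cutover_rate(double cmov, double base, double penalty)
{
	assert(penalty > 0.0);
	return (cmov - base) / penalty;
}
```

For example, with an assumed base branch cost of 1 cycle, a 20-cycle
mispredict penalty and a 2-cycle cmov, the cutover sits at a 5%
misprediction rate: a branch hinted as unlikely and predicted well stays
cheaper as a branch, while a coin-flip branch (expected cost 11 cycles
under the same assumptions) is much better off as a cmov.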



Re: [PATCH 2/2] bluetooth : do not move child device other than rfcomm

2008-02-18 Thread David Miller
From: Dave Young <[EMAIL PROTECTED]>
Date: Mon, 18 Feb 2008 15:58:05 +0800

> hci conn child devices other than rfcomm tty should not be moved here.
> This was my mistake; thanks to Barnaby for reporting and testing.
> 
> Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

Applied, thanks Dave.


Re: [PATCH 1/2] bluetooth : put hci dev after del conn

2008-02-18 Thread David Miller
From: Dave Young <[EMAIL PROTECTED]>
Date: Mon, 18 Feb 2008 15:55:55 +0800

> Move hci_dev_put to del_conn to avoid hci dev going away before hci conn.

This looks correct so I have applied it.

> Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

Please remove the extraneous space at the end of your
signoff line next time :-)

Also, I reworked the loop in del_conn() so that it no longer
generates a compile warning, so I had to apply your patch
by hand.


[RFC 11/11] split out libfs/aops.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all address space manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>


Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ /dev/null
@@ -1,116 +0,0 @@
-/*
- * fs/libfs.c
- * Library for filesystems writers.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-int simple_readpage(struct file *file, struct page *page)
-{
-   clear_highpage(page);
-   flush_dcache_page(page);
-   SetPageUptodate(page);
-   unlock_page(page);
-   return 0;
-}
-
-int simple_prepare_write(struct file *file, struct page *page,
-   unsigned from, unsigned to)
-{
-   if (!PageUptodate(page)) {
-   if (to - from != PAGE_CACHE_SIZE)
-   zero_user_segments(page,
-   0, from,
-   to, PAGE_CACHE_SIZE);
-   }
-   return 0;
-}
-
-int simple_write_begin(struct file *file, struct address_space *mapping,
-   loff_t pos, unsigned len, unsigned flags,
-   struct page **pagep, void **fsdata)
-{
-   struct page *page;
-   pgoff_t index;
-   unsigned from;
-
-   index = pos >> PAGE_CACHE_SHIFT;
-   from = pos & (PAGE_CACHE_SIZE - 1);
-
-   page = __grab_cache_page(mapping, index);
-   if (!page)
-   return -ENOMEM;
-
-   *pagep = page;
-
-   return simple_prepare_write(file, page, from, from+len);
-}
-
-static int simple_commit_write(struct file *file, struct page *page,
-  unsigned from, unsigned to)
-{
-   struct inode *inode = page->mapping->host;
-   loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
-
-   if (!PageUptodate(page))
-   SetPageUptodate(page);
-   /*
-* No need to use i_size_read() here, the i_size
-* cannot change under us because we hold the i_mutex.
-*/
-   if (pos > inode->i_size)
-   i_size_write(inode, pos);
-   set_page_dirty(page);
-   return 0;
-}
-
-int simple_write_end(struct file *file, struct address_space *mapping,
-   loff_t pos, unsigned len, unsigned copied,
-   struct page *page, void *fsdata)
-{
-   unsigned from = pos & (PAGE_CACHE_SIZE - 1);
-
-   /* zero the stale part of the page if we did a short copy */
-   if (copied < len) {
-   void *kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + from + copied, 0, len - copied);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-   }
-
-   simple_commit_write(file, page, from, from+copied);
-
-   unlock_page(page);
-   page_cache_release(page);
-
-   return copied;
-}
-
-ssize_t simple_read_from_buffer(void __user *to, size_t count, loff_t *ppos,
-   const void *from, size_t available)
-{
-   loff_t pos = *ppos;
-   if (pos < 0)
-   return -EINVAL;
-   if (pos >= available)
-   return 0;
-   if (count > available - pos)
-   count = available - pos;
-   if (copy_to_user(to, from + pos, count))
-   return -EFAULT;
-   *ppos = pos + count;
-   return count;
-}
-
-EXPORT_SYMBOL(simple_write_begin);
-EXPORT_SYMBOL(simple_write_end);
-EXPORT_SYMBOL(simple_prepare_write);
-EXPORT_SYMBOL(simple_readpage);
-EXPORT_SYMBOL(simple_read_from_buffer);
Index: linux-2.6/fs/libfs/Makefile
===
--- linux-2.6.orig/fs/libfs/Makefile
+++ linux-2.6/fs/libfs/Makefile
@@ -1,3 +1,3 @@
 libfs-y += file.o
 
-obj-$(CONFIG_LIBFS) += libfs.o inode.o super.o dentry.o
+obj-$(CONFIG_LIBFS) += libfs.o inode.o super.o dentry.o aops.o
Index: linux-2.6/fs/libfs/aops.c
===
--- /dev/null
+++ linux-2.6/fs/libfs/aops.c
@@ -0,0 +1,113 @@
+/*
+ * fs/libfs/aops.c
+ * Library for filesystems writers -- address space operations
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+int simple_readpage(struct file *file, struct page *page)
+{
+   clear_highpage(page);
+   flush_dcache_page(page);
+   SetPageUptodate(page);
+   unlock_page(page);
+   return 0;
+}
+EXPORT_SYMBOL(simple_readpage);
+
+int simple_prepare_write(struct file *file, struct page *page,
+   unsigned from, unsigned to)
+{
+   if (!PageUptodate(page)) {
+   if (to - from != PAGE_CACHE_SIZE)
+   zero_user_segments(page,
+   0, from,
+   to, PAGE_CACHE_SIZE);
+   }
+   return 0;
+}
+EXPORT_SYMBOL(simple_prepare_write);
+
+int simple_write_begin(struct file *file, struct address_space *mapping,
+  

[RFC 10/11] split out libfs/inode.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all inode manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>

Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -12,78 +12,6 @@
 
 #include 
 
-int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
-{
-   struct inode *inode = old_dentry->d_inode;
-
-   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-   inc_nlink(inode);
-   atomic_inc(&inode->i_count);
-   dget(dentry);
-   d_instantiate(dentry, inode);
-   return 0;
-}
-
-int simple_empty(struct dentry *dentry)
-{
-   struct dentry *child;
-   int ret = 0;
-
-   spin_lock(&dcache_lock);
-   list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child)
-   if (simple_positive(child))
-   goto out;
-   ret = 1;
-out:
-   spin_unlock(&dcache_lock);
-   return ret;
-}
-
-int simple_unlink(struct inode *dir, struct dentry *dentry)
-{
-   struct inode *inode = dentry->d_inode;
-
-   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-   drop_nlink(inode);
-   dput(dentry);
-   return 0;
-}
-
-int simple_rmdir(struct inode *dir, struct dentry *dentry)
-{
-   if (!simple_empty(dentry))
-   return -ENOTEMPTY;
-
-   drop_nlink(dentry->d_inode);
-   simple_unlink(dir, dentry);
-   drop_nlink(dir);
-   return 0;
-}
-
-int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
-   struct inode *new_dir, struct dentry *new_dentry)
-{
-   struct inode *inode = old_dentry->d_inode;
-   int they_are_dirs = S_ISDIR(old_dentry->d_inode->i_mode);
-
-   if (!simple_empty(new_dentry))
-   return -ENOTEMPTY;
-
-   if (new_dentry->d_inode) {
-   simple_unlink(new_dir, new_dentry);
-   if (they_are_dirs)
-   drop_nlink(old_dir);
-   } else if (they_are_dirs) {
-   drop_nlink(old_dir);
-   inc_nlink(new_dir);
-   }
-
-   old_dir->i_ctime = old_dir->i_mtime = new_dir->i_ctime =
-   new_dir->i_mtime = inode->i_ctime = CURRENT_TIME;
-
-   return 0;
-}
-
 int simple_readpage(struct file *file, struct page *page)
 {
clear_highpage(page);
@@ -183,11 +111,6 @@ ssize_t simple_read_from_buffer(void __u
 
 EXPORT_SYMBOL(simple_write_begin);
 EXPORT_SYMBOL(simple_write_end);
-EXPORT_SYMBOL(simple_empty);
-EXPORT_SYMBOL(simple_link);
 EXPORT_SYMBOL(simple_prepare_write);
 EXPORT_SYMBOL(simple_readpage);
-EXPORT_SYMBOL(simple_rename);
-EXPORT_SYMBOL(simple_rmdir);
-EXPORT_SYMBOL(simple_unlink);
 EXPORT_SYMBOL(simple_read_from_buffer);
Index: linux-2.6/fs/libfs/inode.c
===
--- linux-2.6.orig/fs/libfs/inode.c
+++ linux-2.6/fs/libfs/inode.c
@@ -417,4 +417,79 @@ exit:
 }
 EXPORT_SYMBOL_GPL(simple_rename_named);
 
+int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
+{
+   struct inode *inode = old_dentry->d_inode;
 
+   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+   inc_nlink(inode);
+   atomic_inc(&inode->i_count);
+   dget(dentry);
+   d_instantiate(dentry, inode);
+   return 0;
+}
+EXPORT_SYMBOL(simple_link);
+
+int simple_empty(struct dentry *dentry)
+{
+   struct dentry *child;
+   int ret = 0;
+
+   spin_lock(&dcache_lock);
+   list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child)
+   if (simple_positive(child))
+   goto out;
+   ret = 1;
+out:
+   spin_unlock(&dcache_lock);
+   return ret;
+}
+EXPORT_SYMBOL(simple_empty);
+
+int simple_unlink(struct inode *dir, struct dentry *dentry)
+{
+   struct inode *inode = dentry->d_inode;
+
+   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+   drop_nlink(inode);
+   dput(dentry);
+   return 0;
+}
+EXPORT_SYMBOL(simple_unlink);
+
+int simple_rmdir(struct inode *dir, struct dentry *dentry)
+{
+   if (!simple_empty(dentry))
+   return -ENOTEMPTY;
+
+   drop_nlink(dentry->d_inode);
+   simple_unlink(dir, dentry);
+   drop_nlink(dir);
+   return 0;
+}
+EXPORT_SYMBOL(simple_rmdir);
+
+int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
+   struct inode *new_dir, struct dentry *new_dentry)
+{
+   struct inode *inode = old_dentry->d_inode;
+   int they_are_dirs = S_ISDIR(old_dentry->d_inode->i_mode);
+
+   if (!simple_empty(new_dentry))
+   return -ENOTEMPTY;
+
+   if (new_dentry->d_inode) {
+   simple_unlink(new_dir, new_dentry);
+   if (they_are_dirs)
+   drop_nlink(old_dir);
+   } else if (they_are_dirs) {
+   drop_nlink(old_dir);
+   inc_nlink(new_dir);

[RFC 03/11] slim down debugfs

2008-02-18 Thread Arnd Bergmann
With most of debugfs now copied to generic code in libfs,
we can remove the original copy and replace it with thin
wrappers around libfs.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>
Index: linux-2.6/fs/Kconfig
===
--- linux-2.6.orig/fs/Kconfig
+++ linux-2.6/fs/Kconfig
@@ -1001,6 +1001,14 @@ config CONFIGFS_FS
  Both sysfs and configfs can and should exist together on the
  same system. One is not a replacement for the other.
 
+config LIBFS
+   tristate
+   default m
+   help
+ libfs is a helper library used by many of the simpler file
+ systems. Parts of libfs can be modular when all of its users
+ are modules as well, and the users should select this symbol.
+
 endmenu
 
 menu "Miscellaneous filesystems"
Index: linux-2.6/fs/debugfs/file.c
===
--- linux-2.6.orig/fs/debugfs/file.c
+++ linux-2.6/fs/debugfs/file.c
@@ -19,55 +19,6 @@
 #include 
 #include 
 
-static ssize_t default_read_file(struct file *file, char __user *buf,
-size_t count, loff_t *ppos)
-{
-   return 0;
-}
-
-static ssize_t default_write_file(struct file *file, const char __user *buf,
-  size_t count, loff_t *ppos)
-{
-   return count;
-}
-
-static int default_open(struct inode *inode, struct file *file)
-{
-   if (inode->i_private)
-   file->private_data = inode->i_private;
-
-   return 0;
-}
-
-const struct file_operations debugfs_file_operations = {
-   .read = default_read_file,
-   .write =default_write_file,
-   .open = default_open,
-};
-
-static void *debugfs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
-   nd_set_link(nd, dentry->d_inode->i_private);
-   return NULL;
-}
-
-const struct inode_operations debugfs_link_operations = {
-   .readlink   = generic_readlink,
-   .follow_link= debugfs_follow_link,
-};
-
-static int debugfs_u8_set(void *data, u64 val)
-{
-   *(u8 *)data = val;
-   return 0;
-}
-static int debugfs_u8_get(void *data, u64 *val)
-{
-   *val = *(u8 *)data;
-   return 0;
-}
-DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, "%llu\n");
-
 /**
  * debugfs_create_u8 - create a debugfs file that is used to read and write an unsigned 8-bit value
  * @name: a pointer to a string containing the name of the file to create.
@@ -95,22 +46,10 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs
 struct dentry *debugfs_create_u8(const char *name, mode_t mode,
 struct dentry *parent, u8 *value)
 {
-   return debugfs_create_file(name, mode, parent, value, &fops_u8);
+   return debugfs_create_file(name, mode, parent, value, &simple_fops_u8);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u8);
 
-static int debugfs_u16_set(void *data, u64 val)
-{
-   *(u16 *)data = val;
-   return 0;
-}
-static int debugfs_u16_get(void *data, u64 *val)
-{
-   *val = *(u16 *)data;
-   return 0;
-}
-DEFINE_SIMPLE_ATTRIBUTE(fops_u16, debugfs_u16_get, debugfs_u16_set, "%llu\n");
-
 /**
  * debugfs_create_u16 - create a debugfs file that is used to read and write an unsigned 16-bit value
  * @name: a pointer to a string containing the name of the file to create.
@@ -138,22 +77,10 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_u16, debugf
 struct dentry *debugfs_create_u16(const char *name, mode_t mode,
  struct dentry *parent, u16 *value)
 {
-   return debugfs_create_file(name, mode, parent, value, &fops_u16);
+   return debugfs_create_file(name, mode, parent, value, &simple_fops_u16);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u16);
 
-static int debugfs_u32_set(void *data, u64 val)
-{
-   *(u32 *)data = val;
-   return 0;
-}
-static int debugfs_u32_get(void *data, u64 *val)
-{
-   *val = *(u32 *)data;
-   return 0;
-}
-DEFINE_SIMPLE_ATTRIBUTE(fops_u32, debugfs_u32_get, debugfs_u32_set, "%llu\n");
-
 /**
  * debugfs_create_u32 - create a debugfs file that is used to read and write an unsigned 32-bit value
  * @name: a pointer to a string containing the name of the file to create.
@@ -181,23 +108,10 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_u32, debugf
 struct dentry *debugfs_create_u32(const char *name, mode_t mode,
 struct dentry *parent, u32 *value)
 {
-   return debugfs_create_file(name, mode, parent, value, &fops_u32);
+   return debugfs_create_file(name, mode, parent, value, &simple_fops_u32);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u32);
 
-static int debugfs_u64_set(void *data, u64 val)
-{
-   *(u64 *)data = val;
-   return 0;
-}
-
-static int debugfs_u64_get(void *data, u64 *val)
-{
-   *val = *(u64 *)data;
-   return 0;
-}
-DEFINE_SIMPLE_ATTRIBUTE(fops_u64, debugfs_u64_get, debugfs_u64_set, "%llu\n");
-
 /**
  * debugfs_create_u64 - create a debugfs f

[RFC 07/11] split out libfs/file.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all file manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>
Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -421,165 +421,6 @@ ssize_t simple_read_from_buffer(void __u
 }
 
 /*
- * Transaction based IO.
- * The file expects a single write which triggers the transaction, and then
- * possibly a read which collects the result - which is stored in a
- * file-local buffer.
- */
-char *simple_transaction_get(struct file *file, const char __user *buf, size_t size)
-{
-   struct simple_transaction_argresp *ar;
-   static DEFINE_SPINLOCK(simple_transaction_lock);
-
-   if (size > SIMPLE_TRANSACTION_LIMIT - 1)
-   return ERR_PTR(-EFBIG);
-
-   ar = (struct simple_transaction_argresp *)get_zeroed_page(GFP_KERNEL);
-   if (!ar)
-   return ERR_PTR(-ENOMEM);
-
-   spin_lock(&simple_transaction_lock);
-
-   /* only one write allowed per open */
-   if (file->private_data) {
-   spin_unlock(&simple_transaction_lock);
-   free_page((unsigned long)ar);
-   return ERR_PTR(-EBUSY);
-   }
-
-   file->private_data = ar;
-
-   spin_unlock(&simple_transaction_lock);
-
-   if (copy_from_user(ar->data, buf, size))
-   return ERR_PTR(-EFAULT);
-
-   return ar->data;
-}
-
-ssize_t simple_transaction_read(struct file *file, char __user *buf, size_t size, loff_t *pos)
-{
-   struct simple_transaction_argresp *ar = file->private_data;
-
-   if (!ar)
-   return 0;
-   return simple_read_from_buffer(buf, size, pos, ar->data, ar->size);
-}
-
-int simple_transaction_release(struct inode *inode, struct file *file)
-{
-   free_page((unsigned long)file->private_data);
-   return 0;
-}
-
-/* Simple attribute files */
-
-struct simple_attr {
-   int (*get)(void *, u64 *);
-   int (*set)(void *, u64);
-   char get_buf[24];   /* enough to store a u64 and "\n\0" */
-   char set_buf[24];
-   void *data;
-   const char *fmt;/* format for read operation */
-   struct mutex mutex; /* protects access to these buffers */
-};
-
-/* simple_attr_open is called by an actual attribute open file operation
- * to set the attribute specific access operations. */
-int simple_attr_open(struct inode *inode, struct file *file,
-int (*get)(void *, u64 *), int (*set)(void *, u64),
-const char *fmt)
-{
-   struct simple_attr *attr;
-
-   attr = kmalloc(sizeof(*attr), GFP_KERNEL);
-   if (!attr)
-   return -ENOMEM;
-
-   attr->get = get;
-   attr->set = set;
-   attr->data = inode->i_private;
-   attr->fmt = fmt;
-   mutex_init(&attr->mutex);
-
-   file->private_data = attr;
-
-   return nonseekable_open(inode, file);
-}
-
-int simple_attr_release(struct inode *inode, struct file *file)
-{
-   kfree(file->private_data);
-   return 0;
-}
-
-/* read from the buffer that is filled with the get function */
-ssize_t simple_attr_read(struct file *file, char __user *buf,
-size_t len, loff_t *ppos)
-{
-   struct simple_attr *attr;
-   size_t size;
-   ssize_t ret;
-
-   attr = file->private_data;
-
-   if (!attr->get)
-   return -EACCES;
-
-   ret = mutex_lock_interruptible(&attr->mutex);
-   if (ret)
-   return ret;
-
-   if (*ppos) {/* continued read */
-   size = strlen(attr->get_buf);
-   } else {/* first read */
-   u64 val;
-   ret = attr->get(attr->data, &val);
-   if (ret)
-   goto out;
-
-   size = scnprintf(attr->get_buf, sizeof(attr->get_buf),
-attr->fmt, (unsigned long long)val);
-   }
-
-   ret = simple_read_from_buffer(buf, len, ppos, attr->get_buf, size);
-out:
-   mutex_unlock(&attr->mutex);
-   return ret;
-}
-
-/* interpret the buffer as a number to call the set function with */
-ssize_t simple_attr_write(struct file *file, const char __user *buf,
- size_t len, loff_t *ppos)
-{
-   struct simple_attr *attr;
-   u64 val;
-   size_t size;
-   ssize_t ret;
-
-   attr = file->private_data;
-   if (!attr->set)
-   return -EACCES;
-
-   ret = mutex_lock_interruptible(&attr->mutex);
-   if (ret)
-   return ret;
-
-   ret = -EFAULT;
-   size = min(sizeof(attr->set_buf) - 1, len);
-   if (copy_from_user(attr->set_buf, buf, size))
-   goto out;
-
-   ret = len; /* claim we got the whole input */
-   attr->set_buf[size] = '\0';
-   val = simple_strtol(attr->set_buf, NULL, 0);
-   attr->set(attr->data, val);
-out:
-   mutex_unlock(&attr->mutex);
- 

[RFC 00/11] possible debugfs/libfs consolidation

2008-02-18 Thread Arnd Bergmann
I noticed that there is a lot of duplication in pseudo
file systems, so I started looking into how to consolidate
them. I ended up with a largish rework of the structure
of libfs and moving almost all of debugfs in there as well.

As an example, I also have patches that reduce debugfs,
securityfs and usbfs to the point where they are mostly
thin wrappers around libfs, with large comment blocks.
Other file systems could be changed in the same way, but
I would first like to see if people agree that I'm on the right
track.

These patches have seen practically no testing so far,
so don't expect them to work, but please tell me what
you think about the concept.

Arnd <><



[RFC 08/11] split out libfs/dentry.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all dentry manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>

Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -12,188 +12,6 @@
 
 #include 
 
-int simple_getattr(struct vfsmount *mnt, struct dentry *dentry,
-  struct kstat *stat)
-{
-   struct inode *inode = dentry->d_inode;
-   generic_fillattr(inode, stat);
-   stat->blocks = inode->i_mapping->nrpages << (PAGE_CACHE_SHIFT - 9);
-   return 0;
-}
-
-int simple_statfs(struct dentry *dentry, struct kstatfs *buf)
-{
-   buf->f_type = dentry->d_sb->s_magic;
-   buf->f_bsize = PAGE_CACHE_SIZE;
-   buf->f_namelen = NAME_MAX;
-   return 0;
-}
-
-/*
- * Retaining negative dentries for an in-memory filesystem just wastes
- * memory and lookup time: arrange for them to be deleted immediately.
- */
-static int simple_delete_dentry(struct dentry *dentry)
-{
-   return 1;
-}
-
-/*
- * Lookup the data. This is trivial - if the dentry didn't already
- * exist, we know it is negative.  Set d_op to delete negative dentries.
- */
-struct dentry *simple_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd)
-{
-   static struct dentry_operations simple_dentry_operations = {
-   .d_delete = simple_delete_dentry,
-   };
-
-   if (dentry->d_name.len > NAME_MAX)
-   return ERR_PTR(-ENAMETOOLONG);
-   dentry->d_op = &simple_dentry_operations;
-   d_add(dentry, NULL);
-   return NULL;
-}
-
-int simple_sync_file(struct file * file, struct dentry *dentry, int datasync)
-{
-   return 0;
-}
- 
-int dcache_dir_open(struct inode *inode, struct file *file)
-{
-   static struct qstr cursor_name = {.len = 1, .name = "."};
-
-   file->private_data = d_alloc(file->f_path.dentry, &cursor_name);
-
-   return file->private_data ? 0 : -ENOMEM;
-}
-
-int dcache_dir_close(struct inode *inode, struct file *file)
-{
-   dput(file->private_data);
-   return 0;
-}
-
-loff_t dcache_dir_lseek(struct file *file, loff_t offset, int origin)
-{
-   mutex_lock(&file->f_path.dentry->d_inode->i_mutex);
-   switch (origin) {
-   case 1:
-   offset += file->f_pos;
-   case 0:
-   if (offset >= 0)
-   break;
-   default:
-   mutex_unlock(&file->f_path.dentry->d_inode->i_mutex);
-   return -EINVAL;
-   }
-   if (offset != file->f_pos) {
-   file->f_pos = offset;
-   if (file->f_pos >= 2) {
-   struct list_head *p;
-   struct dentry *cursor = file->private_data;
-   loff_t n = file->f_pos - 2;
-
-   spin_lock(&dcache_lock);
-   list_del(&cursor->d_u.d_child);
-   p = file->f_path.dentry->d_subdirs.next;
-   while (n && p != &file->f_path.dentry->d_subdirs) {
-   struct dentry *next;
-   next = list_entry(p, struct dentry, d_u.d_child);
-   if (!d_unhashed(next) && next->d_inode)
-   n--;
-   p = p->next;
-   }
-   list_add_tail(&cursor->d_u.d_child, p);
-   spin_unlock(&dcache_lock);
-   }
-   }
-   mutex_unlock(&file->f_path.dentry->d_inode->i_mutex);
-   return offset;
-}
-
-/* Relationship between i_mode and the DT_xxx types */
-static inline unsigned char dt_type(struct inode *inode)
-{
-   return (inode->i_mode >> 12) & 15;
-}
-
-/*
- * Directory is locked and all positive dentries in it are safe, since
- * for ramfs-type trees they can't go away without unlink() or rmdir(),
- * both impossible due to the lock on directory.
- */
-
-int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
-{
-   struct dentry *dentry = filp->f_path.dentry;
-   struct dentry *cursor = filp->private_data;
-   struct list_head *p, *q = &cursor->d_u.d_child;
-   ino_t ino;
-   int i = filp->f_pos;
-
-   switch (i) {
-   case 0:
-   ino = dentry->d_inode->i_ino;
-   if (filldir(dirent, ".", 1, i, ino, DT_DIR) < 0)
-   break;
-   filp->f_pos++;
-   i++;
-   /* fallthrough */
-   case 1:
-   ino = parent_ino(dentry);
-   if (filldir(dirent, "..", 2, i, ino, DT_DIR) < 0)
-   break;
-   filp->f_pos++;
-   i++;
-   /* fallthrough */
-   default:
-  

[RFC 01/11] add generic versions of debugfs file operations

2008-02-18 Thread Arnd Bergmann
The file operations in debugfs are rather generic and can
be used by other file systems, so it makes sense to
include them in libfs under more generic names and export
them to modules.

This patch adds a new copy of these operations to libfs,
so that the debugfs version can later be cut down.
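
For readers who have not looked at these helpers before, here is a rough
userspace sketch of the pattern, not the code from this patch: each
attribute is a get/set pair that works on a u64, plus a printf format used
on read. The names attr_sketch, sketch_u16_get, sketch_u16_set,
attr_sketch_read and attr_sketch_write are made up for this illustration,
and the parsing uses strtoull() as a stand-in for the kernel helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a simple attribute: a typed value
 * exposed through u64 get/set callbacks plus a read format. */
struct attr_sketch {
	int (*get)(void *data, uint64_t *val);
	int (*set)(void *data, uint64_t val);
	const char *fmt;	/* format used when the value is read */
	void *data;		/* points at the backing variable */
};

static int sketch_u16_set(void *data, uint64_t val)
{
	*(uint16_t *)data = (uint16_t)val;	/* narrows to 16 bits */
	return 0;
}

static int sketch_u16_get(void *data, uint64_t *val)
{
	*val = *(uint16_t *)data;	/* result goes out through val */
	return 0;
}

/* Format the current value into buf, roughly what a read() would do. */
static int attr_sketch_read(struct attr_sketch *a, char *buf, size_t len)
{
	uint64_t val;
	int ret;

	assert(a && a->get);
	ret = a->get(a->data, &val);
	if (ret)
		return ret;
	return snprintf(buf, len, a->fmt, (unsigned long long)val);
}

/* Parse a string and hand the number to the setter, like a write(). */
static int attr_sketch_write(struct attr_sketch *a, const char *str)
{
	assert(a && a->set);
	return a->set(a->data, strtoull(str, NULL, 0));
}
```

The same get/set pair can back both a decimal and a hex file simply by
changing the format string, which is the idea behind the u16/x16 variants.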

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>

Index: linux-2.6/fs/Makefile
===
--- linux-2.6.orig/fs/Makefile
+++ linux-2.6/fs/Makefile
@@ -13,6 +13,8 @@ obj-y :=  open.o read_write.o file_table.
pnode.o drop_caches.o splice.o sync.o utimes.o \
stack.o
 
+obj-$(CONFIG_LIBFS) += libfs/
+
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=   buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
 else
Index: linux-2.6/include/linux/libfs.h
===
--- /dev/null
+++ linux-2.6/include/linux/libfs.h
@@ -0,0 +1,21 @@
+#ifndef __LIBFS_H__
+#define __LIBFS_H__
+
+#include 
+
+extern const struct file_operations simple_fops_u8;
+extern const struct file_operations simple_fops_x8;
+extern const struct file_operations simple_fops_u16;
+extern const struct file_operations simple_fops_x16;
+extern const struct file_operations simple_fops_u32;
+extern const struct file_operations simple_fops_x32;
+extern const struct file_operations simple_fops_u64;
+extern const struct file_operations simple_fops_bool;
+extern const struct file_operations simple_fops_blob;
+
+struct simple_blob_wrapper {
+   void *data;
+   unsigned long size;
+};
+
+#endif /* __LIBFS_H__ */
Index: linux-2.6/fs/libfs/Makefile
===
--- /dev/null
+++ linux-2.6/fs/libfs/Makefile
@@ -0,0 +1,3 @@
+libfs-y += file.o
+
+obj-$(CONFIG_LIBFS) += libfs.o
Index: linux-2.6/fs/libfs/file.c
===
--- /dev/null
+++ linux-2.6/fs/libfs/file.c
@@ -0,0 +1,126 @@
+/*
+ * fs/libfs/file.c
+ * Library for filesystems writers.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+/* commonly used attribute file operations */
+static int simple_u8_set(void *data, u64 val)
+{
+   *(u8 *)data = val;
+   return 0;
+}
+static int simple_u8_get(void *data, u64 *val)
+{
+   *val = *(u8 *)data;
+   return 0;
+}
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_u8, simple_u8_get, simple_u8_set, "%llu\n");
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_x8, simple_u8_get, simple_u8_set, "0x%02llx\n");
+
+static int simple_u16_set(void *data, u64 val)
+{
+   *(u16 *)data = val;
+   return 0;
+}
+static int simple_u16_get(void *data, u64 *val)
+{
+   *val = *(u16 *)data;
+   return 0;
+}
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_u16, simple_u16_get, simple_u16_set, "%llu\n");
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_x16, simple_u16_get, simple_u16_set, "0x%04llx\n");
+
+static int simple_u32_set(void *data, u64 val)
+{
+   *(u32 *)data = val;
+   return 0;
+}
+static int simple_u32_get(void *data, u64 *val)
+{
+   *val = *(u32 *)data;
+   return 0;
+}
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_u32, simple_u32_get, simple_u32_set, "%llu\n");
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_x32, simple_u32_get, simple_u32_set, "0x%08llx\n");
+
+static int simple_u64_set(void *data, u64 val)
+{
+   *(u64 *)data = val;
+   return 0;
+}
+
+static int simple_u64_get(void *data, u64 *val)
+{
+   *val = *(u64 *)data;
+   return 0;
+}
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_u64, simple_u64_get, simple_u64_set, "%llu\n");
+
+static ssize_t read_file_bool(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+   char buf[3];
+   u32 *val = file->private_data;
+
+   if (*val)
+   buf[0] = 'Y';
+   else
+   buf[0] = 'N';
+   buf[1] = '\n';
+   buf[2] = 0x00;
+   return simple_read_from_buffer(user_buf, count, ppos, buf, 2);
+}
+
+static ssize_t write_file_bool(struct file *file, const char __user *user_buf,
+  size_t count, loff_t *ppos)
+{
+   char buf[32];
+   int buf_size;
+   u32 *val = file->private_data;
+
+   buf_size = min(count, (sizeof(buf)-1));
+   if (copy_from_user(buf, user_buf, buf_size))
+   return -EFAULT;
+
+   switch (buf[0]) {
+   case 'y':
+   case 'Y':
+   case '1':
+   *val = 1;
+   break;
+   case 'n':
+   case 'N':
+   case '0':
+   *val = 0;
+   break;
+   }
+
+   return count;
+}
+
+const struct file_operations simple_fops_bool = {
+   .read = read_file_bool,
+   .write =write_file_bool,
+   .open = simple_open,
+};
+EXPORT_SYMBOL_GPL(simple_fops_bool);
+
+static ssize_t read_file_blob(struct file *file, char __user *user_buf,
+

[RFC 05/11] slim down usbfs

2008-02-18 Thread Arnd Bergmann
Half of the usbfs code is the same as debugfs, so we can
replace it now with calls to the generic libfs versions.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>
Index: linux-2.6/drivers/usb/core/inode.c
===
--- linux-2.6.orig/drivers/usb/core/inode.c
+++ linux-2.6/drivers/usb/core/inode.c
@@ -47,11 +47,10 @@
 #define USBFS_DEFAULT_BUSMODE (S_IXUGO | S_IRUGO)
 #define USBFS_DEFAULT_LISTMODE S_IRUGO
 
-static struct super_operations usbfs_ops;
-static const struct file_operations default_file_operations;
-static struct vfsmount *usbfs_mount;
-static int usbfs_mount_count;  /* = 0 */
-static int ignore_mount = 0;
+static DEFINE_SIMPLE_FS(usb_fs_type, "usbfs", NULL, USBDEVICE_SUPER_MAGIC);
+static struct dentry *usbfs_root;
+
+static int ignore_mount = 1;
 
 static struct dentry *devices_usbfs_dentry;
 static int num_buses;  /* = 0 */
@@ -263,186 +262,11 @@ static int remount(struct super_block *s
return -EINVAL;
}
 
-   if (usbfs_mount && usbfs_mount->mnt_sb)
-   update_sb(usbfs_mount->mnt_sb);
-
-   return 0;
-}
-
-static struct inode *usbfs_get_inode (struct super_block *sb, int mode, dev_t dev)
-{
-   struct inode *inode = new_inode(sb);
-
-   if (inode) {
-   inode->i_mode = mode;
-   inode->i_uid = current->fsuid;
-   inode->i_gid = current->fsgid;
-   inode->i_blocks = 0;
-   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-   switch (mode & S_IFMT) {
-   default:
-   init_special_inode(inode, mode, dev);
-   break;
-   case S_IFREG:
-   inode->i_fop = &default_file_operations;
-   break;
-   case S_IFDIR:
-   inode->i_op = &simple_dir_inode_operations;
-   inode->i_fop = &simple_dir_operations;
-
-   /* directory inodes start off with i_nlink == 2 (for "." entry) */
-   inc_nlink(inode);
-   break;
-   }
-   }
-   return inode; 
-}
-
-/* SMP-safe */
-static int usbfs_mknod (struct inode *dir, struct dentry *dentry, int mode,
-   dev_t dev)
-{
-   struct inode *inode = usbfs_get_inode(dir->i_sb, mode, dev);
-   int error = -EPERM;
-
-   if (dentry->d_inode)
-   return -EEXIST;
-
-   if (inode) {
-   d_instantiate(dentry, inode);
-   dget(dentry);
-   error = 0;
-   }
-   return error;
-}
-
-static int usbfs_mkdir (struct inode *dir, struct dentry *dentry, int mode)
-{
-   int res;
-
-   mode = (mode & (S_IRWXUGO | S_ISVTX)) | S_IFDIR;
-   res = usbfs_mknod (dir, dentry, mode, 0);
-   if (!res)
-   inc_nlink(dir);
-   return res;
-}
-
-static int usbfs_create (struct inode *dir, struct dentry *dentry, int mode)
-{
-   mode = (mode & S_IALLUGO) | S_IFREG;
-   return usbfs_mknod (dir, dentry, mode, 0);
-}
-
-static inline int usbfs_positive (struct dentry *dentry)
-{
-   return dentry->d_inode && !d_unhashed(dentry);
-}
-
-static int usbfs_empty (struct dentry *dentry)
-{
-   struct list_head *list;
-
-   spin_lock(&dcache_lock);
-
-   list_for_each(list, &dentry->d_subdirs) {
-   struct dentry *de = list_entry(list, struct dentry, d_u.d_child);
-   if (usbfs_positive(de)) {
-   spin_unlock(&dcache_lock);
-   return 0;
-   }
-   }
-
-   spin_unlock(&dcache_lock);
-   return 1;
-}
-
-static int usbfs_unlink (struct inode *dir, struct dentry *dentry)
-{
-   struct inode *inode = dentry->d_inode;
-   mutex_lock(&inode->i_mutex);
-   drop_nlink(dentry->d_inode);
-   dput(dentry);
-   mutex_unlock(&inode->i_mutex);
-   d_delete(dentry);
-   return 0;
-}
-
-static int usbfs_rmdir(struct inode *dir, struct dentry *dentry)
-{
-   int error = -ENOTEMPTY;
-   struct inode * inode = dentry->d_inode;
-
-   mutex_lock(&inode->i_mutex);
-   dentry_unhash(dentry);
-   if (usbfs_empty(dentry)) {
-   drop_nlink(dentry->d_inode);
-   drop_nlink(dentry->d_inode);
-   dput(dentry);
-   inode->i_flags |= S_DEAD;
-   drop_nlink(dir);
-   error = 0;
-   }
-   mutex_unlock(&inode->i_mutex);
-   if (!error)
-   d_delete(dentry);
-   dput(dentry);
-   return error;
-}
-
-
-/* default file operations */
-static ssize_t default_read_file (struct file *file, char __user *buf,
- size_t count, loff_t *ppos)
-{
-   return 0;
-}
-
-static ssize_t default_write_file (struct file *file, const char __user *buf,
-  size_t count, loff_t *ppos)
-{
-   ret

[RFC 09/11] split out libfs/super.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all super block manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>
Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -12,63 +12,6 @@
 
 #include 
 
-static const struct super_operations simple_super_operations = {
-   .statfs = simple_statfs,
-};
-
-/*
- * Common helper for pseudo-filesystems (sockfs, pipefs, bdev - stuff that
- * will never be mountable)
- */
-int get_sb_pseudo(struct file_system_type *fs_type, char *name,
-   const struct super_operations *ops, unsigned long magic,
-   struct vfsmount *mnt)
-{
-   struct super_block *s = sget(fs_type, NULL, set_anon_super, NULL);
-   struct dentry *dentry;
-   struct inode *root;
-   struct qstr d_name = {.name = name, .len = strlen(name)};
-
-   if (IS_ERR(s))
-   return PTR_ERR(s);
-
-   s->s_flags = MS_NOUSER;
-   s->s_maxbytes = ~0ULL;
-   s->s_blocksize = 1024;
-   s->s_blocksize_bits = 10;
-   s->s_magic = magic;
-   s->s_op = ops ? ops : &simple_super_operations;
-   s->s_time_gran = 1;
-   root = new_inode(s);
-   if (!root)
-   goto Enomem;
-   /*
-* since this is the first inode, make it number 1. New inodes created
-* after this must take care not to collide with it (by passing
-* max_reserved of 1 to iunique).
-*/
-   root->i_ino = 1;
-   root->i_mode = S_IFDIR | S_IRUSR | S_IWUSR;
-   root->i_uid = root->i_gid = 0;
-   root->i_atime = root->i_mtime = root->i_ctime = CURRENT_TIME;
-   dentry = d_alloc(NULL, &d_name);
-   if (!dentry) {
-   iput(root);
-   goto Enomem;
-   }
-   dentry->d_sb = s;
-   dentry->d_parent = dentry;
-   d_instantiate(dentry, root);
-   s->s_root = dentry;
-   s->s_flags |= MS_ACTIVE;
-   return simple_set_mnt(mnt, s);
-
-Enomem:
-   up_write(&s->s_umount);
-   deactivate_super(s);
-   return -ENOMEM;
-}
-
 int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
 {
struct inode *inode = old_dentry->d_inode;
@@ -238,7 +181,6 @@ ssize_t simple_read_from_buffer(void __u
return count;
 }
 
-EXPORT_SYMBOL(get_sb_pseudo);
 EXPORT_SYMBOL(simple_write_begin);
 EXPORT_SYMBOL(simple_write_end);
 EXPORT_SYMBOL(simple_empty);
Index: linux-2.6/fs/libfs/super.c
===
--- linux-2.6.orig/fs/libfs/super.c
+++ linux-2.6/fs/libfs/super.c
@@ -54,6 +54,60 @@ static const struct super_operations sim
 };
 
 /*
+ * Common helper for pseudo-filesystems (sockfs, pipefs, bdev - stuff that
+ * will never be mountable)
+ */
+int get_sb_pseudo(struct file_system_type *fs_type, char *name,
+   const struct super_operations *ops, unsigned long magic,
+   struct vfsmount *mnt)
+{
+   struct super_block *s = sget(fs_type, NULL, set_anon_super, NULL);
+   struct dentry *dentry;
+   struct inode *root;
+   struct qstr d_name = {.name = name, .len = strlen(name)};
+
+   if (IS_ERR(s))
+   return PTR_ERR(s);
+
+   s->s_flags = MS_NOUSER;
+   s->s_maxbytes = ~0ULL;
+   s->s_blocksize = 1024;
+   s->s_blocksize_bits = 10;
+   s->s_magic = magic;
+   s->s_op = ops ? ops : &simple_super_operations;
+   s->s_time_gran = 1;
+   root = new_inode(s);
+   if (!root)
+   goto Enomem;
+   /*
+* since this is the first inode, make it number 1. New inodes created
+* after this must take care not to collide with it (by passing
+* max_reserved of 1 to iunique).
+*/
+   root->i_ino = 1;
+   root->i_mode = S_IFDIR | S_IRUSR | S_IWUSR;
+   root->i_uid = root->i_gid = 0;
+   root->i_atime = root->i_mtime = root->i_ctime = CURRENT_TIME;
+   dentry = d_alloc(NULL, &d_name);
+   if (!dentry) {
+   iput(root);
+   goto Enomem;
+   }
+   dentry->d_sb = s;
+   dentry->d_parent = dentry;
+   d_instantiate(dentry, root);
+   s->s_root = dentry;
+   s->s_flags |= MS_ACTIVE;
+   return simple_set_mnt(mnt, s);
+
+Enomem:
+   up_write(&s->s_umount);
+   deactivate_super(s);
+   return -ENOMEM;
+}
+EXPORT_SYMBOL(get_sb_pseudo);
+
+/*
  * the inodes created here are not hashed. If you use iunique to generate
  * unique inode values later for this filesystem, then you must take care
  * to pass it an appropriate max_reserved value to avoid collisions.



[RFC 04/11] slim down securityfs

2008-02-18 Thread Arnd Bergmann
With the new simple_fs_type in place, securityfs practically
becomes a nop and we just need to leave code around to manage
its mount point.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>
Index: linux-2.6/security/inode.c
===
--- linux-2.6.orig/security/inode.c
+++ linux-2.6/security/inode.c
@@ -13,176 +13,14 @@
  */
 
 /* #define DEBUG */
+
 #include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
 
 #define SECURITYFS_MAGIC   0x73636673
 
-static struct vfsmount *mount;
-static int mount_count;
-
-/*
- * TODO:
- *   I think I can get rid of these default_file_ops, but not quite sure...
- */
-static ssize_t default_read_file(struct file *file, char __user *buf,
-size_t count, loff_t *ppos)
-{
-   return 0;
-}
-
-static ssize_t default_write_file(struct file *file, const char __user *buf,
-  size_t count, loff_t *ppos)
-{
-   return count;
-}
-
-static int default_open(struct inode *inode, struct file *file)
-{
-   if (inode->i_private)
-   file->private_data = inode->i_private;
-
-   return 0;
-}
-
-static const struct file_operations default_file_ops = {
-   .read = default_read_file,
-   .write =default_write_file,
-   .open = default_open,
-};
-
-static struct inode *get_inode(struct super_block *sb, int mode, dev_t dev)
-{
-   struct inode *inode = new_inode(sb);
-
-   if (inode) {
-   inode->i_mode = mode;
-   inode->i_uid = 0;
-   inode->i_gid = 0;
-   inode->i_blocks = 0;
-   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-   switch (mode & S_IFMT) {
-   default:
-   init_special_inode(inode, mode, dev);
-   break;
-   case S_IFREG:
-   inode->i_fop = &default_file_ops;
-   break;
-   case S_IFDIR:
-   inode->i_op = &simple_dir_inode_operations;
-   inode->i_fop = &simple_dir_operations;
-
-   /* directory inodes start off with i_nlink == 2 (for "." entry) */
-   inc_nlink(inode);
-   break;
-   }
-   }
-   return inode;
-}
-
-/* SMP-safe */
-static int mknod(struct inode *dir, struct dentry *dentry,
-int mode, dev_t dev)
-{
-   struct inode *inode;
-   int error = -EPERM;
-
-   if (dentry->d_inode)
-   return -EEXIST;
-
-   inode = get_inode(dir->i_sb, mode, dev);
-   if (inode) {
-   d_instantiate(dentry, inode);
-   dget(dentry);
-   error = 0;
-   }
-   return error;
-}
-
-static int mkdir(struct inode *dir, struct dentry *dentry, int mode)
-{
-   int res;
-
-   mode = (mode & (S_IRWXUGO | S_ISVTX)) | S_IFDIR;
-   res = mknod(dir, dentry, mode, 0);
-   if (!res)
-   inc_nlink(dir);
-   return res;
-}
-
-static int create(struct inode *dir, struct dentry *dentry, int mode)
-{
-   mode = (mode & S_IALLUGO) | S_IFREG;
-   return mknod(dir, dentry, mode, 0);
-}
-
-static inline int positive(struct dentry *dentry)
-{
-   return dentry->d_inode && !d_unhashed(dentry);
-}
-
-static int fill_super(struct super_block *sb, void *data, int silent)
-{
-   static struct tree_descr files[] = {{""}};
-
-   return simple_fill_super(sb, SECURITYFS_MAGIC, files);
-}
-
-static int get_sb(struct file_system_type *fs_type,
- int flags, const char *dev_name,
- void *data, struct vfsmount *mnt)
-{
-   return get_sb_single(fs_type, flags, data, fill_super, mnt);
-}
-
-static struct file_system_type fs_type = {
-   .owner =THIS_MODULE,
-   .name = "securityfs",
-   .get_sb =   get_sb,
-   .kill_sb =  kill_litter_super,
-};
-
-static int create_by_name(const char *name, mode_t mode,
- struct dentry *parent,
- struct dentry **dentry)
-{
-   int error = 0;
-
-   *dentry = NULL;
-
-   /* If the parent is not specified, we create it in the root.
-* We need the root dentry to do this, which is in the super
-* block. A pointer to that is in the struct vfsmount that we
-* have around.
-*/
-   if (!parent ) {
-   if (mount && mount->mnt_sb) {
-   parent = mount->mnt_sb->s_root;
-   }
-   }
-   if (!parent) {
-   pr_debug("securityfs: Ah! can not find a parent!\n");
-   return -EFAULT;
-   }
-
-   mutex_lock(&parent->d_inode->i_mutex);
-   *dentry = lookup_one_len(name, parent, strlen(name));
-   if (!IS_ERR(dentry)) {
-   if ((mode & S_IFMT) == S_IFDIR)
-

[RFC 06/11] split out linux/libfs.h from linux/fs.h

2008-02-18 Thread Arnd Bergmann
With libfs turning into a larger subsystem, it makes
sense to have a separate header that is not included
by the low-level vfs code.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>
Index: linux-2.6/fs/debugfs/inode.c
===
--- linux-2.6.orig/fs/debugfs/inode.c
+++ linux-2.6/fs/debugfs/inode.c
@@ -18,6 +18,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/fs/dcache.c
===
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -947,6 +947,7 @@ struct dentry *d_alloc_name(struct dentr
q.hash = full_name_hash(q.name, q.len);
return d_alloc(parent, &q);
 }
+EXPORT_SYMBOL(d_alloc_name);
 
 /**
  * d_instantiate - fill in inode information for a dentry
Index: linux-2.6/drivers/usb/core/inode.c
===
--- linux-2.6.orig/drivers/usb/core/inode.c
+++ linux-2.6/drivers/usb/core/inode.c
@@ -27,6 +27,7 @@
 
 /*/
 
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/fs/binfmt_misc.c
===
--- linux-2.6.orig/fs/binfmt_misc.c
+++ linux-2.6/fs/binfmt_misc.c
@@ -16,6 +16,7 @@
  *  2001-02-28 AV: rewritten into something that resembles C. Original didn't.
  */
 
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/fs/configfs/mount.c
===
--- linux-2.6.orig/fs/configfs/mount.c
+++ linux-2.6/fs/configfs/mount.c
@@ -25,6 +25,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/fs/debugfs/file.c
===
--- linux-2.6.orig/fs/debugfs/file.c
+++ linux-2.6/fs/debugfs/file.c
@@ -15,6 +15,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/fs/fuse/control.c
===
--- linux-2.6.orig/fs/fuse/control.c
+++ linux-2.6/fs/fuse/control.c
@@ -9,6 +9,7 @@
 #include "fuse_i.h"
 
 #include 
+#include 
 #include 
 
 #define FUSE_CTL_SUPER_MAGIC 0x65735543
Index: linux-2.6/fs/nfsd/nfsctl.c
===
--- linux-2.6.orig/fs/nfsd/nfsctl.c
+++ linux-2.6/fs/nfsd/nfsctl.c
@@ -8,6 +8,7 @@
 
 #include 
 
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/net/sunrpc/rpc_pipe.c
===
--- linux-2.6.orig/net/sunrpc/rpc_pipe.c
+++ linux-2.6/net/sunrpc/rpc_pipe.c
@@ -8,6 +8,7 @@
  * Copyright (c) 2002, Trond Myklebust <[EMAIL PROTECTED]>
  *
  */
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/security/inode.c
===
--- linux-2.6.orig/security/inode.c
+++ linux-2.6/security/inode.c
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #define SECURITYFS_MAGIC   0x73636673
Index: linux-2.6/security/selinux/selinuxfs.c
===
--- linux-2.6.orig/security/selinux/selinuxfs.c
+++ linux-2.6/security/selinux/selinuxfs.c
@@ -14,6 +14,7 @@
  * the Free Software Foundation, version 2.
  */
 
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/virt/kvm/kvm_main.c
===
--- linux-2.6.orig/virt/kvm/kvm_main.c
+++ linux-2.6/virt/kvm/kvm_main.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
Index: linux-2.6/arch/powerpc/platforms/cell/spufs/spufs.h
===
--- linux-2.6.orig/arch/powerpc/platforms/cell/spufs/spufs.h
+++ linux-2.6/arch/powerpc/platforms/cell/spufs/spufs.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
Index: linux-2.6/fs/autofs4/autofs_i.h
===
--- linux-2.6.orig/fs/autofs4/autofs_i.h
+++ linux-2.6/fs/autofs4/autofs_i.h
@@ -22,6 +22,7 @@
 #define AUTOFS_IOC_COUNT 32
 
 #include 
+#include 
 #include 
 #include 
 #include 
Index: linux-2.6/include/linux/fs.h
===
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -1957,12 +1957,7 @@ extern struct dentry *simple_lookup(stru
 extern ssize_t generic_read_dir(struct file *, char __user *, size_t, loff_t *);
 extern const struct file_operations simple_dir_operations;
 extern const struct inode_operations simple_dir_inode_operations;
-struct tree_descr { char *name; const struct file_operations *ops; int mode; };
 struct dentry *d_alloc_name(struct dentry *, const char *);
-extern int simple_fill_super(struct super_block *, int, cons

[RFC 02/11] introduce simple_fs_type

2008-02-18 Thread Arnd Bergmann
There are a number of pseudo file systems in the kernel
that are basically copies of debugfs, all implementing the
same boilerplate code, just with different bugs.

This adds yet another copy to the kernel in the libfs directory,
with generalized helpers that can be used by any of them.

The most interesting function here is the new "struct dentry *
simple_register_filesystem(struct simple_fs_type *type)", which
returns the root directory of a new file system that can then
be passed to simple_create_file() and similar functions as a
parent.
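
To make the intended calling sequence concrete, here is a small userspace
mock of it. This is only a sketch under assumptions: the mock_ types and
functions below are invented for illustration and do not show the real
signatures of simple_register_filesystem() or simple_create_file(), which
are defined in the patch itself. The point is just the flow: register the
type once, get the root back, and pass that root as the parent when
creating files.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace mock, NOT the kernel API: everything carries a mock_ prefix. */
struct mock_dentry {
	const char *name;
	struct mock_dentry *parent;	/* NULL for the root directory */
	void *data;			/* private data behind the file */
};

struct mock_fs_type {
	const char *name;
	struct mock_dentry *root;	/* set once the fs is registered */
};

static struct mock_dentry *mock_new_dentry(const char *name,
					   struct mock_dentry *parent,
					   void *data)
{
	struct mock_dentry *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->name = name;
	d->parent = parent;
	d->data = data;
	return d;
}

/* Register the type and return its root directory (NULL on failure).
 * Registering the same type again hands back the existing root. */
static struct mock_dentry *mock_register_filesystem(struct mock_fs_type *type)
{
	assert(type && type->name);
	if (!type->root)
		type->root = mock_new_dentry("/", NULL, NULL);
	return type->root;
}

/* Create a file below parent; parent comes from the call above. */
static struct mock_dentry *mock_create_file(const char *name,
					    struct mock_dentry *parent,
					    void *data)
{
	if (!parent)
		return NULL;
	return mock_new_dentry(name, parent, data);
}
```

A caller would keep the returned root around in place of the
vfsmount/mount_count pair that each pseudo file system tracks today.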

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>
Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -263,11 +263,6 @@ int simple_link(struct dentry *old_dentr
return 0;
 }
 
-static inline int simple_positive(struct dentry *dentry)
-{
-   return dentry->d_inode && !d_unhashed(dentry);
-}
-
 int simple_empty(struct dentry *dentry)
 {
struct dentry *child;
@@ -409,109 +404,6 @@ int simple_write_end(struct file *file, 
return copied;
 }
 
-/*
- * the inodes created here are not hashed. If you use iunique to generate
- * unique inode values later for this filesystem, then you must take care
- * to pass it an appropriate max_reserved value to avoid collisions.
- */
-int simple_fill_super(struct super_block *s, int magic, struct tree_descr *files)
-{
-   struct inode *inode;
-   struct dentry *root;
-   struct dentry *dentry;
-   int i;
-
-   s->s_blocksize = PAGE_CACHE_SIZE;
-   s->s_blocksize_bits = PAGE_CACHE_SHIFT;
-   s->s_magic = magic;
-   s->s_op = &simple_super_operations;
-   s->s_time_gran = 1;
-
-   inode = new_inode(s);
-   if (!inode)
-   return -ENOMEM;
-   /*
-* because the root inode is 1, the files array must not contain an
-* entry at index 1
-*/
-   inode->i_ino = 1;
-   inode->i_mode = S_IFDIR | 0755;
-   inode->i_uid = inode->i_gid = 0;
-   inode->i_blocks = 0;
-   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-   inode->i_op = &simple_dir_inode_operations;
-   inode->i_fop = &simple_dir_operations;
-   inode->i_nlink = 2;
-   root = d_alloc_root(inode);
-   if (!root) {
-   iput(inode);
-   return -ENOMEM;
-   }
-   for (i = 0; !files->name || files->name[0]; i++, files++) {
-   if (!files->name)
-   continue;
-
-   /* warn if it tries to conflict with the root inode */
-   if (unlikely(i == 1))
-   printk(KERN_WARNING "%s: %s passed in a files array"
-   "with an index of 1!\n", __func__,
-   s->s_type->name);
-
-   dentry = d_alloc_name(root, files->name);
-   if (!dentry)
-   goto out;
-   inode = new_inode(s);
-   if (!inode)
-   goto out;
-   inode->i_mode = S_IFREG | files->mode;
-   inode->i_uid = inode->i_gid = 0;
-   inode->i_blocks = 0;
-   inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-   inode->i_fop = files->ops;
-   inode->i_ino = i;
-   d_add(dentry, inode);
-   }
-   s->s_root = root;
-   return 0;
-out:
-   d_genocide(root);
-   dput(root);
-   return -ENOMEM;
-}
-
-static DEFINE_SPINLOCK(pin_fs_lock);
-
-int simple_pin_fs(struct file_system_type *type, struct vfsmount **mount, int *count)
-{
-   struct vfsmount *mnt = NULL;
-   spin_lock(&pin_fs_lock);
-   if (unlikely(!*mount)) {
-   spin_unlock(&pin_fs_lock);
-   mnt = vfs_kern_mount(type, 0, type->name, NULL);
-   if (IS_ERR(mnt))
-   return PTR_ERR(mnt);
-   spin_lock(&pin_fs_lock);
-   if (!*mount)
-   *mount = mnt;
-   }
-   mntget(*mount);
-   ++*count;
-   spin_unlock(&pin_fs_lock);
-   mntput(mnt);
-   return 0;
-}
-
-void simple_release_fs(struct vfsmount **mount, int *count)
-{
-   struct vfsmount *mnt;
-   spin_lock(&pin_fs_lock);
-   mnt = *mount;
-   if (!--*count)
-   *mount = NULL;
-   spin_unlock(&pin_fs_lock);
-   mntput(mnt);
-}
-
 ssize_t simple_read_from_buffer(void __user *to, size_t count, loff_t *ppos,
const void *from, size_t available)
 {
@@ -786,14 +678,11 @@ EXPORT_SYMBOL(simple_dir_inode_operation
 EXPORT_SYMBOL(simple_dir_operations);
 EXPORT_SYMBOL(simple_empty);
 EXPORT_SYMBOL(d_alloc_name);
-EXPORT_SYMBOL(simple_fill_super);
 EXPORT_SYMBOL(simple_getattr);
 EXPORT_SYMBOL(simple_link);
 EXPORT_SYMBOL(simple_lookup);
-EXPORT_SYMBOL(simple_pin_fs);
 EXPORT_SYMBOL(simple_prepare_write);
 EXPORT_SYMBOL(simple_readpag

Re: 2.6.24-sha1: RIP [] iov_iter_advance+0x38/0x70

2008-02-18 Thread Nick Piggin
On Wednesday 13 February 2008 09:27, Alexey Dobriyan wrote:
> On Tue, Feb 12, 2008 at 02:04:30PM -0800, Andrew Morton wrote:

> > > [ 4057.31] Pid: 7035, comm: ftest03 Not tainted
> > > 2.6.24-25f666300625d894ebe04bac2b4b3aadb907c861 #2 [ 4057.31] RIP:
> > > 0010:[]  []
> > > iov_iter_advance+0x38/0x70 [ 4057.31] RSP: 0018:810110329b20 
> > > EFLAGS: 00010246
> > > [ 4057.31] RAX:  RBX: 0800 RCX:
> > >  [ 4057.31] RDX:  RSI:
> > > 0800 RDI: 810110329ba8 [ 4057.31] RBP:
> > > 0800 R08:  R09: 810101dbc000 [
> > > 4057.31] R10: 0004 R11:  R12:
> > > 00026000 [ 4057.31] R13: 81010d765c98 R14:
> > > 1000 R15:  [ 4057.31] FS: 
> > > 7fee589146d0() GS:80501000() knlGS:
> > > [ 4057.31] CS:  0010 DS:  ES:  CR0: 8005003b [
> > > 4057.31] CR2: 810101dbc008 CR3: 0001103da000 CR4:
> > > 06e0 [ 4057.31] DR0:  DR1:
> > >  DR2:  [ 4057.31] DR3:
> > >  DR6: 0ff0 DR7: 0400 [
> > > 4057.31] Process ftest03 (pid: 7035, threadinfo 810110328000,
> > > task 810160b0) [ 4057.31] Stack:  8025b413
> > > 81010d765ab0 804e6fd8 001201d2 [ 4057.31] 
> > > 810110329db8 00026000 810110329d38 81017b9fb500 [
> > > 4057.31]  81010d765c98 804175e0 81010d765ab0
> > >  [ 4057.31] Call Trace:
> > > [ 4057.31]  [] ?
> > > generic_file_buffered_write+0x1e3/0x6f0 [ 4057.31] 
> > > [] ? current_fs_time+0x1e/0x30 [ 4057.31] 
> > > [] ? __generic_file_aio_write_nolock+0x28f/0x440 [
> > > 4057.31]  [] ? generic_file_aio_write+0x63/0xd0 [
> > > 4057.31]  [] ? ext3_file_write+0x23/0xc0 [
> > > 4057.31]  [] ? ext3_file_write+0x0/0xc0 [
> > > 4057.31]  [] ? do_sync_readv_writev+0xcb/0x110 [
> > > 4057.31]  [] ? autoremove_wake_function+0x0/0x30
> > > [ 4057.31]  [] ?
> > > debug_check_no_locks_freed+0x7d/0x130 [ 4057.31] 
> > > [] ? trace_hardirqs_on+0xcf/0x150 [ 4057.31] 
> > > [] ? __kmalloc+0x15/0xc0
> > > [ 4057.31]  [] ? rw_copy_check_uvector+0x9d/0x130
> > > [ 4057.31]  [] ? do_readv_writev+0xe0/0x170
> > > [ 4057.31]  [] ? mutex_lock_nested+0x1a7/0x280
> > > [ 4057.31]  [] ? trace_hardirqs_on+0xcf/0x150
> > > [ 4057.31]  [] ?
> > > __mutex_unlock_slowpath+0xc9/0x170 [ 4057.31]  []
> > > ? trace_hardirqs_on+0xcf/0x150 [ 4057.31]  [] ?
> > > trace_hardirqs_on_thunk+0x35/0x3a [ 4057.31]  []
> > > ? sys_writev+0x53/0x90
> > > [ 4057.31]  [] ?
> > > system_call_after_swapgs+0x7b/0x80 [ 4057.31]
> > > [ 4057.31]
> > > [ 4057.31] Code: 48 01 77 10 48 29 77 18 c3 0f 0b eb fe 66 66 90 66
> > > 66 90 4c 8b 0f 48 8b 4f 10 49 89 f0 eb 07 66 66 66 90 49 29 c0 4d 85 c0
> > > 75 07 <49> 83 79 08 00 75 23 49 8b 51 08 48 89 d0 48 29 c8 49 39 c0 49
> > > [ 4057.31] RIP  [] iov_iter_advance+0x38/0x70 [
> > > 4057.31]  RSP 
> > > [ 4057.31] CR2: 810101dbc008
> > > [ 4057.31] Kernel panic - not syncing: Fatal exception

Can you try this patch please?
Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1753,9 +1753,10 @@ static void __iov_iter_advance_iov(struc
 
 		/*
 		 * The !iov->iov_len check ensures we skip over unlikely
-		 * zero-length segments.
+		 * zero-length segments. But we mustn't try to "skip" if
+		 * we have come to the end (i->count == bytes).
 		 */
-		while (bytes || !iov->iov_len) {
+		while (bytes || (unlikely(!iov->iov_len) && i->count > bytes)) {
 			int copy = min(bytes, iov->iov_len - base);
 
 			bytes -= copy;
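
For readers following along, the loop being patched can be sketched in userspace C. This is only a minimal sketch under stated assumptions, not the kernel code: `struct sketch_iter` and `sketch_iter_advance()` are hypothetical names, and unlike the hunk above this variant decrements the remaining count inside the loop, so the "skip zero-length segments" test is only evaluated while data is still left in the iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, simplified stand-in for the kernel's struct iov_iter. */
struct sketch_iter {
    const struct iovec *iov;  /* current segment */
    size_t iov_offset;        /* offset into the current segment */
    size_t count;             /* bytes remaining in the whole iterator */
};

/*
 * Advance the iterator by `bytes`. Zero-length segments are skipped,
 * but only while it->count is non-zero; once the end is reached the
 * loop stops without looking at the (non-existent) next iovec.
 */
void sketch_iter_advance(struct sketch_iter *it, size_t bytes)
{
    const struct iovec *iov = it->iov;
    size_t base = it->iov_offset;

    assert(bytes <= it->count);  /* caller must not advance past the end */

    while (bytes || (it->count && iov->iov_len == 0)) {
        size_t room = iov->iov_len - base;
        size_t copy = bytes < room ? bytes : room;

        bytes -= copy;
        it->count -= copy;
        base += copy;
        if (base == iov->iov_len) {  /* segment exhausted, move on */
            iov++;
            base = 0;
        }
    }
    it->iov = iov;
    it->iov_offset = base;
}
```

The short-circuit in the loop condition is the point of the sketch: when both `bytes` and `it->count` are zero, `iov->iov_len` is never read, so a final advance that consumes everything cannot dereference past the last segment.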


Re: [PATCH] correct inconsistent ntp interval/tick_length usage

2008-02-18 Thread Roman Zippel
Hi,

On Mon, 18 Feb 2008, john stultz wrote:

> If we are building a train station, and each train car is 60ft, it
> doesn't make much sense to build 1000ft stations, right? Instead you'll
> do better if you build a 1020ft station.

That would assume trains are always 60ft long, which is the basic error in 
your assumption.

Since we're using analogies: What you're doing is putting your winter
clothes on your weight scale and resetting the scale to zero to offset the
weight of the clothes. If you now stand on the scale in your bathing
clothes, does that mean you lost weight?
That's all you do - you change the scale and slightly screw up the scale for
everyone else trying to use it.

To keep in mind what time adjusting is supposed to do:

freq = 1sec + time_freq

What we do instead is:

freq + tick_adj = 1sec + time_freq

Where exactly is the problem with integrating tick_adj into time_freq? The
result is _exactly_ the same. The only visible difference is a slightly
higher time_freq value, and as long as it is within the 500 ppm limit there
is simply no problem.
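
The equivalence argued above can be written out as plain arithmetic. The following is a sketch only: the function names are hypothetical, and the units (nanoseconds of correction per second) are an assumption chosen for readability, not the scaled fixed-point values the kernel actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define MAX_FREQ_PPM 500LL  /* the 500 ppm limit mentioned above */

static_assert(NSEC_PER_SEC % 1000000 == 0, "ppm scaling assumes whole ns per ppm");

/* Variant A: a separate tick_adj; solve freq + tick_adj = 1sec + time_freq. */
int64_t freq_with_tick_adj(int64_t time_freq, int64_t tick_adj)
{
    return NSEC_PER_SEC + time_freq - tick_adj;
}

/* Fold the constant adjustment into the frequency correction itself. */
int64_t fold_tick_adj(int64_t time_freq, int64_t tick_adj)
{
    return time_freq - tick_adj;
}

/* Variant B: no tick_adj at all; freq = 1sec + (folded) time_freq. */
int64_t freq_folded(int64_t folded_time_freq)
{
    return NSEC_PER_SEC + folded_time_freq;
}

/* True if the correction stays within +/- 500 ppm of one second. */
bool within_freq_limit(int64_t time_freq)
{
    int64_t limit = NSEC_PER_SEC / 1000000 * MAX_FREQ_PPM;

    return time_freq >= -limit && time_freq <= limit;
}
```

Both variants yield the same `freq` for any pair of inputs; only the value stored as time_freq differs, which is the "changed scale" in the analogy.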

> And yes, if we remove CLOCK_TICK_ADJUST, that would also resolve the 
> (A!=B) issue, but it doesn't address the error from #2 below.
> [..]
> 2) We need a solution that handles granularity error well, as this is a
> moderate source of error for coarse clocksources such as jiffies.
> CLOCK_TICK_ADJUST does cover this fairly well in most cases. I suspect
> we could do even better, but that will take some deeper changes.

How exactly does CLOCK_TICK_ADJUST address this problem? The error due to
insufficient resolution is still there; all it does is shift the scale,
so it's not immediately visible.

> My understanding of your approach (removing CLOCK_TICK_ADJUST),
> addresses issues #1 and #3, but hurts issue #2.

What exactly is hurt?

bye, Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] x86_64: check msr to get mmconfig for amd family 10h opteron v3

2008-02-18 Thread Yinghai Lu
On Feb 15, 2008 1:31 AM, Yinghai Lu <[EMAIL PROTECTED]> wrote:
> From: Yinghai Lu <[EMAIL PROTECTED]>
>
> So even when booting the kernel with acpi=off, or when MCFG is not there,
> we can still use MMCONFIG.
>
> Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>
> Cc: Thomas Gleixner <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Cc: Greg KH <[EMAIL PROTECTED]>
> Cc: "H. Peter Anvin" <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> ---
>
>  arch/x86/pci/mmconfig-shared.c |   67 ---
>  1 file changed, 61 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/arch/x86/pci/mmconfig-shared.c
> ===
> --- linux-2.6.orig/arch/x86/pci/mmconfig-shared.c
> +++ linux-2.6/arch/x86/pci/mmconfig-shared.c

Ingo/Thomas,

It seems you missed this one of the 5.

This one should be safe. It only reads the MSR.

Andi had a concern about the other one, which touches the MSR. I will keep
that one in my local tree.

YH


[PATCH] Fix building lguest as module.

2008-02-18 Thread Tony Breeds
On Mon, Feb 04, 2008 at 07:11:10AM +1100, Rusty Russell wrote:
 
> Lguest guest support and host support are separate config options: they used
> to be tied together.  Sort out which parts of asm-offsets are needed for Guest
> and Host.

With Rusty's patch applied, the errors still persist in some
configs.  Please try the patch below.

Fixes the following errors from modpost when the lguest (host) support is
modular.

ERROR: "LGUEST_PAGES_guest_gdt_desc" [drivers/lguest/lg.ko] undefined!
ERROR: "LGUEST_PAGES_host_gdt_desc" [drivers/lguest/lg.ko] undefined!
ERROR: "LGUEST_PAGES_host_cr3" [drivers/lguest/lg.ko] undefined!
ERROR: "LGUEST_PAGES_regs" [drivers/lguest/lg.ko] undefined!
ERROR: "LGUEST_PAGES_host_idt_desc" [drivers/lguest/lg.ko] undefined!
ERROR: "LGUEST_PAGES_guest_gdt" [drivers/lguest/lg.ko] undefined!
ERROR: "LGUEST_PAGES_host_sp" [drivers/lguest/lg.ko] undefined!
ERROR: "LGUEST_PAGES_regs_trapnum" [drivers/lguest/lg.ko] undefined!
ERROR: "LGUEST_PAGES_guest_idt_desc" [drivers/lguest/lg.ko] undefined!

Lguest guest support and host support are separate config options: they used
to be tied together.  Sort out which parts of asm-offsets are needed for Guest
and Host.

Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
---

Original patch from rusty (http://lkml.org/lkml/2008/2/3/168) didn't completely
fix the problem.  I think this matches the original intent.

Not sure of the right way to attribute this patch; clearly it's mostly
Rusty's work.

 arch/x86/kernel/asm-offsets_32.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index afd8446..ae9d289 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -20,10 +20,8 @@
 
 #include 
 
-#ifdef CONFIG_LGUEST_GUEST
 #include 
 #include "../../../drivers/lguest/lg.h"
-#endif
 
 #define DEFINE(sym, val) \
 asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -134,6 +132,10 @@ void foo(void)
BLANK();
OFFSET(LGUEST_DATA_irq_enabled, lguest_data, irq_enabled);
OFFSET(LGUEST_DATA_pgdir, lguest_data, pgdir);
+#endif
+
+#if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_MODULE)
+   BLANK();
OFFSET(LGUEST_PAGES_host_gdt_desc, lguest_pages, state.host_gdt_desc);
OFFSET(LGUEST_PAGES_host_idt_desc, lguest_pages, state.host_idt_desc);
OFFSET(LGUEST_PAGES_host_cr3, lguest_pages, state.host_cr3);
-- 
1.5.4.1
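
As background for the `#if` added in the hunk above: when a tristate option is set to `m`, the kernel build defines `CONFIG_<NAME>_MODULE` rather than `CONFIG_<NAME>`, so a plain `#ifdef CONFIG_LGUEST` would be false for a modular host. A minimal standalone sketch of that preprocessor behaviour follows; the `CONFIG_LGUEST_MODULE` define here simulates what the build system would provide, and the two result macros and the helper function are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Simulate the generated config for CONFIG_LGUEST=m (assumption for this sketch). */
#define CONFIG_LGUEST_MODULE 1

/* The test used by the patch: true for built-in and for modular builds. */
#if defined(CONFIG_LGUEST) || defined(CONFIG_LGUEST_MODULE)
#define HOST_OFFSETS_EMITTED 1
#else
#define HOST_OFFSETS_EMITTED 0
#endif

/* A built-in-only test: misses the modular case. */
#ifdef CONFIG_LGUEST
#define BUILTIN_ONLY_CHECK 1
#else
#define BUILTIN_ONLY_CHECK 0
#endif

static_assert(HOST_OFFSETS_EMITTED == 1, "modular host must still get its offsets");

/* Hypothetical helper so the result can be inspected at run time. */
int host_offsets_emitted(void)
{
    return HOST_OFFSETS_EMITTED;
}
```

With only the built-in test, the LGUEST_PAGES_* offsets would not be generated for a modular host, which matches the modpost errors listed above.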




Yours Tony

  linux.conf.au    http://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!



Re: Status of storage autosuspend

2008-02-18 Thread Alan Stern
On Mon, 18 Feb 2008, Pavel Machek wrote:

> > Should we ignore this issue and submit the patches anyway?
> 
> I think you should. "Easy" (and clean) solution to that issue is to
> just return -EPERM from SG_IOCTL if autosuspend is configured in ;-).

:-)

Okay, I'll update the patches to 2.6.25-rc2 and submit them in a few
days.  (Actually the SCSI patch has to go in first and the usb-storage
patch afterward, which will probably cause it to be delayed one kernel
version.  I don't know any good way to handle these cross-subsystem
updates...)

Alan Stern



Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1

2008-02-18 Thread Zhang, Yanmin
On Mon, 2008-02-18 at 08:52 -0800, Arjan van de Ven wrote:
> On Mon, 18 Feb 2008 04:59:18 -0800
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, 15 Feb 2008 14:47:01 +0800 "Zhang, Yanmin"
> > <[EMAIL PROTECTED]> wrote:
> > 
> > > Call Trace:
> > >  [] ? __alloc_skb+0x31/0x121
> > >  [] ? sock_alloc_send_skb+0x77/0x1d2
> > >  [] ? autoremove_wake_function+0x0/0x2e
> > >  [] ? memcpy_fromiovec+0x36/0x66
> > >  [] ? unix_stream_sendmsg+0x165/0x333
> > >  [] ? sock_aio_write+0xd1/0xe0
> > >  [] ? __wake_up_common+0x41/0x74
> > >  [] ? do_sync_write+0xc9/0x10c
> > >  [] ? __do_fault+0x382/0x3cd
> > >  [] ? autoremove_wake_function+0x0/0x2e
> > >  [] ? handle_mm_fault+0x38a/0x70d
> > >  [] ? error_exit+0x0/0x51
> > >  [] ? __dequeue_entity+0x1c/0x32
> > >  [] ? vfs_write+0xc0/0x136
> > >  [] ? sys_write+0x45/0x6e
> > >  [] ? system_call_after_swapgs+0x7b/0x80
> > 
> > off-topic, but...  Why are all the backtrace decodes here marked as
> > being unreliable?
At least 80279948 is correct. The register values and the ip address look
like they match the disassembled code.

> 
> probably because the stack is a tad confused, so the back tracer doesn't see
> even a single good stack frame.
> 
> Is CONFIG_FRAME_POINTER on?
No.

With 2.6.25-rc2, 3 x86-64 machines hit the same issue.

-yanmin




Re: [lm-sensors] [PATCH v2] adt7473: New driver for Analog Devices ADT7473 sensor chip

2008-02-18 Thread Mark M. Hoffman
Hi Darrick:

* Darrick J. Wong <[EMAIL PROTECTED]> [2008-02-18 13:33:23 -0800]:
> adt7473: New driver for Analog Devices ADT7473 sensor chip
> 
> This driver reports voltage, temperature and fan sensor readings
> on an ADT7473 chip.
> 
> Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
> ---
> 
>  Documentation/hwmon/adt7473 |   79 +++
>  drivers/hwmon/Kconfig   |   10 
>  drivers/hwmon/Makefile  |1 
>  drivers/hwmon/adt7473.c | 1157 
> +++
>  4 files changed, 1247 insertions(+), 0 deletions(-)
> 

Applied to hwmon-2.6.git/testing, thanks.

-- 
Mark M. Hoffman
[EMAIL PROTECTED]



Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

2008-02-18 Thread Alasdair G Kergon
On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> Surely any hardware that doesn't support barrier
> operations can emulate them with cache flushes when they receive a
> barrier I/O from the filesystem
 
My complaint about having to support them within dm when more than one
device is involved is because any efficiencies disappear: you can't send
further I/O to any one device until all the other devices have completed
their barrier (or else later I/O to that device could overtake the
barrier on another device).  And then I argue that it would be better
for the filesystem to have the information that these are not hardware
barriers so it has the opportunity of tuning its behaviour (e.g.
flushing less often because it's a more expensive operation).

Alasdair
-- 
[EMAIL PROTECTED]


[PATCH 4/4] [PATCH] remove goto statement

2008-02-18 Thread Glauber Costa
This patch removes goto statements in favour of plain returns
in places that had nothing left behind that would justify
such a construction.
---
 drivers/acpi/processor_core.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 06a230a..70f62b6 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -651,7 +651,7 @@ static int __cpuinit acpi_processor_star
 
result = acpi_processor_add_fs(device);
if (result)
-   goto end;
+   return result;
 
status = acpi_install_notify_handler(pr->handle, ACPI_DEVICE_NOTIFY,
 acpi_processor_notify, pr);
@@ -675,7 +675,7 @@ #endif
"%s is registered as cooling_device%d\n",
device->dev.bus_id, cdev->id);
else
-   goto end;
+   return result;
 
result = sysfs_create_link(&device->dev.kobj, &cdev->device.kobj,
"thermal_cooling");
-- 
1.4.2



Re: tbench regression in 2.6.25-rc1

2008-02-18 Thread Zhang, Yanmin
On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote:
> On Mon, 18 Feb 2008 16:12:38 +0800
> "Zhang, Yanmin" <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
> > > From: Eric Dumazet <[EMAIL PROTECTED]>
> > > Date: Fri, 15 Feb 2008 15:21:48 +0100
> > > 
> > > > On linux-2.6.25-rc1 x86_64 :
> > > > 
> > > > offsetof(struct dst_entry, lastuse)=0xb0
> > > > offsetof(struct dst_entry, __refcnt)=0xb8
> > > > offsetof(struct dst_entry, __use)=0xbc
> > > > offsetof(struct dst_entry, next)=0xc0
> > > > 
> > > > So it should be optimal... I don't know why tbench prefers __refcnt
> > > > being on 0xc0, since in this case lastuse will be on a different
> > > > cache line...
> > > > 
> > > > Each incoming IP packet will need to change lastuse, __refcnt and
> > > > __use, so keeping them in the same cache line is a win.
> > > > 
> > > > I suspect then that even this patch could help tbench, since it
> > > > avoids writing lastuse...
> > > 
> > > I think your suspicions are right, and even moreso
> > > it helps to keep __refcnt out of the same cache line
> > > as input/output/ops which are read-almost-entirely :-)
> > I think you are right. The issue is these three variables sharing the same
> > cache line with input/output/ops.
> > 
> > > 
> > > I haven't done an exhaustive analysis, but it seems that
> > > the write traffic to lastuse and __refcnt are about the
> > > same.  However if we find that __refcnt gets hit more
> > > than lastuse in this workload, it explains the regression.
> > I also think __refcnt is the key. I ran a new test, adding 2 unsigned
> > long of padding before lastuse, so the 3 members are moved to the next
> > cache line. The performance is recovered.
> > 
> > How about the patch below? Almost all performance is recovered with the
> > new patch.
> > 
> > Signed-off-by: Zhang Yanmin <[EMAIL PROTECTED]>
> > 
> > ---
> > 
> > --- linux-2.6.25-rc1/include/net/dst.h  2008-02-21 14:33:43.0 
> > +0800
> > +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.0 
> > +0800
> > @@ -52,11 +52,10 @@ struct dst_entry
> > unsigned short  header_len; /* more space at head required 
> > */
> > unsigned short  trailer_len;/* space to reserve at tail */
> >  
> > -   u32 metrics[RTAX_MAX];
> > -   struct dst_entry*path;
> > -
> > -   unsigned long   rate_last;  /* rate limiting for ICMP */
> > unsigned intrate_tokens;
> > +   unsigned long   rate_last;  /* rate limiting for ICMP */
> > +
> > +   struct dst_entry*path;
> >  
> >  #ifdef CONFIG_NET_CLS_ROUTE
> > __u32   tclassid;
> > @@ -70,10 +69,12 @@ struct dst_entry
> > int (*output)(struct sk_buff*);
> >  
> > struct  dst_ops *ops;
> > -   
> > -   unsigned long   lastuse;
> > +
> > +   u32 metrics[RTAX_MAX];
> > +
> > atomic_t__refcnt;   /* client references*/
> > int __use;
> > +   unsigned long   lastuse;
> > union {
> > struct dst_entry *next;
> > struct rtable*rt_next;
> > 
> > 
> 
> Well, after this patch, we grow dst_entry by 8 bytes :
With my .config, it doesn't grow. Perhaps that is because of
CONFIG_NET_CLS_ROUTE, which I don't enable. I will move tclassid under ops.

> 
> sizeof(struct dst_entry)=0xd0
> offsetof(struct dst_entry, input)=0x68
> offsetof(struct dst_entry, output)=0x70
> offsetof(struct dst_entry, __refcnt)=0xb4
> offsetof(struct dst_entry, lastuse)=0xc0
> offsetof(struct dst_entry, __use)=0xb8
> sizeof(struct rtable)=0x140
> 
> 
> So we dirty two cache lines instead of one, unless your cpu has 128-byte
> cache lines ?
> 
> I am quite surprised that my patch to not change lastuse if already set to
> jiffies changes nothing...
> 
> If you have some time, could you also test this (unrelated) patch ?
> 
> We can avoid dirtying a cache line of the loopback device all the time.
> 
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index f2a6e71..0a4186a 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct 
> net_device *dev)
> return 0;
> }
>  #endif
> -   dev->last_rx = jiffies;
> +#ifdef CONFIG_SMP
> +   if (dev->last_rx != jiffies)
> +#endif
> +   dev->last_rx = jiffies;
>  
> /* it's OK to use per_cpu_ptr() because BHs are off */
> pcpu_lstats = netdev_priv(dev);
> 
Although I didn't test it, I don't think it's ok. The key is that __refcnt
shares the same cache line with ops/input/output.
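
The cache-line argument can be checked mechanically with `offsetof`. The sketch below is illustrative only: both structs are hypothetical, heavily reduced stand-ins that merely borrow field names from the thread (they are not `struct dst_entry`), the 64-byte line size is an assumption, and the split layout uses C11 `_Alignas` instead of the field reordering done in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size; some CPUs use 128 */
#define LINE_OF(type, member) (offsetof(type, member) / CACHE_LINE)

/* Read-mostly pointers and write-hot counters packed together. */
struct shared_layout {
    int (*input)(void *);
    int (*output)(void *);
    void *ops;
    unsigned long lastuse;
    int refcnt;
    int use;
};

/* Same fields, with the write-hot group pushed onto its own cache line. */
struct split_layout {
    int (*input)(void *);
    int (*output)(void *);
    void *ops;
    _Alignas(CACHE_LINE) unsigned long lastuse;
    int refcnt;
    int use;
};

/* Compile-time check that the hot counter no longer shares a line with ops. */
static_assert(LINE_OF(struct split_layout, refcnt) !=
              LINE_OF(struct split_layout, ops),
              "refcnt must not share a cache line with ops");
```

In the first layout every write to `refcnt` dirties the line that also holds `input`, `output` and `ops`; in the second, the three frequently written fields stay together on one line while the read-mostly pointers stay on another, at the cost of a larger struct.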

-yanmin



Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices

2008-02-18 Thread Alasdair G Kergon
On Mon, Feb 18, 2008 at 08:52:10AM -0500, Ric Wheeler wrote:
> I understand that. Most of the time, dm or md devices are composed of 
> uniform components which will uniformly support (or not) the cache flush 
> commands used by barriers.
 
As a dm developer, it's "almost none of the time" because trivial
configurations aren't the ones that require lots of testing effort.

Let's stop arguing over "most of the time":-)

As Andi points out, there are certainly enough real-world users of
"single linear or crypt target using one physical device" for it to be
worth our supporting it.

Alasdair
-- 
[EMAIL PROTECTED]


LSI Logic MegaRAID SATA 150-4 / LSI Logic New Generation RAID Device Drivers (MEGARAID_NEWGEN) problems (megaraid abort: scsi cmd:14600, do now own)

2008-02-18 Thread David M. Strang

Greetings -

A couple of months back I purchased an LSI Logic MegaRAID ATA 150-4 
controller, as well as 3 Seagate 500GB SATA-II hard drives to use in my 
system. Previously, I was using a pair of WD4000YR's in software raid, 
which seemed to work well. I've just now gotten around to working on 
migrating my data to these new drives + controller, and it's giving me 
some issues. As with most, I'm having some severe performance issues; 
the performance is simply abysmal. Before getting into the details, here 
is a quick overview of my configuration:


System:
Tyan Tiger i7320/R (S5350) System Board
2x Intel Xeon 3.0 GHz
4GB RAM

LSI Logic MegaRAID ATA 150-4 controller -  Firmware Revision: 713S
3x Seagate 7200.10 (Perpendicular Recording) ST3500630AS 500GB SATA-II 
drives configured as a RAID-1 array with a HotSpare.


Also, connected to the onboard controller is a WD4000YR, where all of my 
data currently resides.


I'm running Gentoo Hardened AMD64 MultiLib 
(/usr/portage/profiles/hardened/amd64/multilib)


My current kernel revision is 2.6.23-hardened-r7.

Here are some (possibly) relevant snippets from dmesg during startup:

...
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
megaraid: probe new device 0x1000:0x1960:0x1000:0x4523: bus 3:slot 3:func 0
ACPI: PCI Interrupt :03:03.0[A] -> GSI 24 (level, low) -> IRQ 24
megaraid: fw version:[713S] bios version:[G121]
scsi0 : LSI Logic MegaRAID driver
scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
scsi[0]: scanning scsi channel 1 [virtual] for logical drives
scsi 0:1:0:0: Direct-Access MegaRAID LD 0 RAID1  476G 713S PQ: 0 ANSI: 2
sd 0:1:0:0: [sda] 976762880 512-byte hardware sectors (500103 MB)
sd 0:1:0:0: [sda] Write Protect is off
sd 0:1:0:0: [sda] Mode Sense: 00 00 00 00
sd 0:1:0:0: [sda] Asking for cache data failed
sd 0:1:0:0: [sda] Assuming drive cache: write through
sd 0:1:0:0: [sda] 976762880 512-byte hardware sectors (500103 MB)
sd 0:1:0:0: [sda] Write Protect is off
sd 0:1:0:0: [sda] Mode Sense: 00 00 00 00
sd 0:1:0:0: [sda] Asking for cache data failed
sd 0:1:0:0: [sda] Assuming drive cache: write through
sda: sda1 sda2 sda3 sda4
sd 0:1:0:0: [sda] Attached SCSI disk
ata_piix :00:1f.2: version 2.12
ata_piix :00:1f.2: MAP [ P0 -- P1 -- ]
ACPI: PCI Interrupt :00:1f.2[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device :00:1f.2 to 64
scsi1 : ata_piix
scsi2 : ata_piix
ata1: SATA max UDMA/133 cmd 0x000114a0 ctl 0x0001149a 
bmdma 0x00011470 irq 18
ata2: SATA max UDMA/133 cmd 0x00011490 ctl 0x00011486 
bmdma 0x00011478 irq 18

ata1.00: ATA-7: WDC WD4000YR-01PLB0, 01.06A01, max UDMA/133
ata1.00: 781422768 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
scsi 1:0:0:0: Direct-Access ATA  WDC WD4000YR-01P 01.0 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 1:0:0:0: [sdb] 781422768 512-byte hardware sectors (400088 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sdb: sdb1 sdb2 sdb3 sdb4
sd 1:0:0:0: [sdb] Attached SCSI disk
...

My controller is configured for Write Back Caching, Adaptive Read Ahead, 
and Direct I/O (I've also tried cached I/O but it scared me...)


The first thing I'm noticing is the horrible performance on the raid 
disk, compared to the single standalone hard disk. Here is the output 
from hdparm -tT on the single disk:


-([EMAIL PROTECTED])-(~)- # hdparm -tT /dev/sdb1

/dev/sdb1:
Timing cached reads:   1670 MB in  2.00 seconds = 835.00 MB/sec
Timing buffered disk reads:  140 MB in  3.01 seconds =  46.45 MB/sec

And then, the output from the raid-1 array:

-([EMAIL PROTECTED])-(~)- # hdparm -tT /dev/sda1

/dev/sda1:
Timing cached reads:   1718 MB in  2.00 seconds = 859.65 MB/sec
Timing buffered disk reads:   92 MB in  3.09 seconds =  29.76 MB/sec

I'm not sure what the deal is with the buffered disk reads being so much 
WORSE than a single disk. So poor performance is a concern, but what's 
more alarming are the messages showing up in DMESG. When I first tried 
Cached IO - performance seemed good... except, dmesg was littered with 
these errors (?):


megaraid: aborting-14610 cmd=2a 
megaraid abort: scsi cmd:14610, do now own
megaraid: aborting-14612 cmd=2a 
megaraid abort: scsi cmd:14612, do now own
megaraid: aborting-14614 cmd=2a 
megaraid abort: scsi cmd:14614, do now own
...
megaraid: 38 outstanding commands. Max wait 300 sec
megaraid mbox: Wait for 38 commands to complete:300
megaraid mbox: reset sequence completed sucessfully

I'm not certain what these mean... why am I getting aborts?

S

Re: [PATCH 1/3] Fix Unlikely(x) == y

2008-02-18 Thread Arjan van de Ven
On Tue, 19 Feb 2008 13:33:53 +1100
Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> Actually one thing I don't like about gcc is that I think it still
> emits cmovs for likely/unlikely branches, which is silly (the gcc
> developers seem to be in love with that instruction). If that goes
> away, then branch hints may be even better.

only for -Os and only if the result is smaller afaik.
(cmov tends to be a performance loss most of the time so for -O2 and such it
isn't used as far as I know.. it does make for nice small code however ;-)

 


-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


Re: [PATCH] Implement barrier support for single device DM devices

2008-02-18 Thread Alasdair G Kergon
On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> Alasdair G Kergon wrote:
> > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
> >> Implement barrier support for single device DM devices
> > Thanks.  We've got some (more-invasive) dm patches in the works that
> > attempt to use flushing to emulate barriers where we can't just
> > pass them down like that.
> I wonder if it's worth the effort to try to implement this.

The decision got taken to allocate barrier bios to implement the basic
flush so dm has little choice in this matter now.  (If you're going to
implement barriers for flush, you might as well implement them more
generally.)

Maybe I should spell this out more clearly for those who weren't
tracking this block layer change:  AFAIK you cannot currently flush a
device-mapper block device without doing some jiggery-pokery.

> For example, how safe
> xfs is if barriers are not supported or turned off?  

The last time we tried xfs with dm it didn't seem to notice -EOPNOTSUPP
everywhere it should => recovery may find corruption.

Alasdair
-- 
[EMAIL PROTECTED]


Re: [PATCH 1/3] Fix Unlikely(x) == y

2008-02-18 Thread Nick Piggin
On Tuesday 19 February 2008 01:39, Andi Kleen wrote:
> Arjan van de Ven <[EMAIL PROTECTED]> writes:
> > you have more faith in the author's knowledge of how his code actually
> > behaves than I think is warranted  :)
>
> iirc there was a mm patch some time ago to keep track of the actual
> unlikely values at runtime and it showed indeed some wrong ones. But the
> vast majority of them are probably correct.
>
> > Or faith in that he knows what "unlikely" means.
> > I should write docs about this; but unlikely() means:
> > 1) It happens less than 0.01% of the cases.
> > 2) The compiler couldn't have figured this out by itself
> >    (NULL pointer checks are compiler done already, same for some other
> >    conditions)
> > 3) It's a hot codepath where shaving 0.5 cycles (less even on x86)
> >    matters (and the author is ok with taking a 500 cycles hit if he's
> >    wrong)
>
> One more thing unlikely() does is to move the unlikely code out of line.
> So it should conserve some icache in critical functions, which might
> well be worth some more cycles (don't have numbers though).

I actually once measured context switching performance in the scheduler,
and removing the unlikely hint for testing RT tasks IIRC gave about a 5%
performance drop.

This was on a P4 which is very different from more modern CPUs both in
terms of branch performance characteristics, and icache characteristics.
However, the P4's branch predictor is pretty good, and it should easily
be able to correctly predict the rt_task check if it has enough entries.
So I think much of the savings came from code transformation and movement.
Anyway, it is definitely worthwhile if used correctly.

Actually one thing I don't like about gcc is that I think it still emits
cmovs for likely/unlikely branches, which is silly (the gcc developers
seem to be in love with that instruction). If that goes away, then
branch hints may be even better.

>
> But overall I agree with you that unlikely is in most cases a bad
> idea (and I submitted the original patch introducing it @). That
> is because it is often used in situations where gcc's default branch
> prediction heuristics would make exactly the same decision
>
>if (unlikely(x == NULL))
>
> is simply totally useless because gcc already assumes all x == NULL
> tests are unlikely. I appended some of the builtin heuristics from
> a recent gcc source so people can see them.
>
> Note in particular the last predictors; assuming branch ending
> with goto, including call, causing early function return or
> returning negative constant are not taken. Just these alone
> are likely 95+% of the unlikelies in the kernel.

Yes, gcc should be able to do pretty good heuristics, considering
the quite good numbers that cold CPU predictors can attain. However
for really performance critical code (or really "never" executed
code), then I think it is OK to have the hints and not have to rely
on gcc heuristics.
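
To make the "code movement" point concrete, here is a minimal userspace
sketch of how such a hint is usually written. The two macros have the same
shape as the kernel's; the demo_* names and the struct are hypothetical and
only stand in for something like the rt_task check. It illustrates the
mechanism and says nothing about the numbers above.

```c
#include <assert.h>

/* Same shape as the kernel hints; !! normalises any truthy value to 0 or 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical task descriptor, for illustration only. */
struct demo_task {
	int prio;
	int is_rt;
};

/* Counts how often the rare path was actually taken. */
int slow_path_calls;

/* Rare path: noinline + cold asks gcc to keep it out of the hot code. */
__attribute__((noinline, cold)) int demo_pick_rt(const struct demo_task *t)
{
	slow_path_calls++;
	return t->prio - 1000;
}

/* Hot path: the hint marks the RT branch as the one not to fall through to. */
int demo_pick_next(const struct demo_task *t)
{
	if (unlikely(t->is_rt))
		return demo_pick_rt(t);
	return t->prio;
}
```

The hint does not change the result of the condition, only how the
compiler is asked to lay out the two branches.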

>
> -Andi

[snip]

Interesting, thanks!



Re: [PATCH 5/5] signal(ia64_ia32): add a signal stack overflow check

2008-02-18 Thread Shi Weihua
A similar check was added to x86_32 (i386) in commit
83bd01024b1fdfc41d9b758e5669e80fca72df66.
This patch adds the check to ia64_ia32 and improves it a little: we need
to check for stack overflow only when the signal arrives while we are
already on the alternate signal stack.

Signed-off-by: Shi Weihua <[EMAIL PROTECTED]>

---

The previous version of this patch had a mistake in a comment; this one
corrects it.

---
--- linux-2.6.25-rc2.orig/arch/ia64/ia32/ia32_signal.c  2008-02-16 04:57:20.0 +0800
+++ linux-2.6.25-rc2/arch/ia64/ia32/ia32_signal.c   2008-02-19 09:57:28.0 +0800
@@ -766,8 +766,19 @@ get_sigframe (struct k_sigaction *ka, st
 
/* This is the X/Open sanctioned signal stack switching.  */
if (ka->sa.sa_flags & SA_ONSTACK) {
-   if (!on_sig_stack(esp))
+   int onstack = sas_ss_flags(esp);
+
+   if (onstack == 0)
esp = current->sas_ss_sp + current->sas_ss_size;
+   else if (onstack == SS_ONSTACK) {
+   /*
+* If we are on the alternate signal stack and would
+* overflow it, don't. Return an always-bogus address
+* instead so we will die with SIGSEGV.
+*/
+   if (!likely(on_sig_stack(esp - frame_size)))
+   return (void __user *) -1L;
+   }
}
/* Legacy stack switching not supported */
 




Re: [PATCH 4/5] signal(ia64): add a signal stack overflow check

2008-02-18 Thread Shi Weihua
A similar check was added to x86_32 (i386) in commit
83bd01024b1fdfc41d9b758e5669e80fca72df66.
This patch adds the check to ia64 and improves it a little: we need
to check for stack overflow only when the signal arrives while we are
already on the alternate signal stack.

Signed-off-by: Shi Weihua <[EMAIL PROTECTED]>

---

The previous version of this patch had a mistake in a comment; this one
corrects it.

---
--- linux-2.6.25-rc2.orig/arch/ia64/kernel/signal.c 2008-02-16 04:57:20.0 +0800
+++ linux-2.6.25-rc2/arch/ia64/kernel/signal.c  2008-02-19 09:57:05.0 +0800
@@ -342,15 +342,33 @@ setup_frame (int sig, struct k_sigaction
 
new_sp = scr->pt.r12;
tramp_addr = (unsigned long) __kernel_sigtramp;
-   if ((ka->sa.sa_flags & SA_ONSTACK) && sas_ss_flags(new_sp) == 0) {
-   new_sp = current->sas_ss_sp + current->sas_ss_size;
-   /*
-* We need to check for the register stack being on the signal stack
-* separately, because it's switched separately (memory stack is switched
-* in the kernel, register stack is switched in the signal trampoline).
-*/
-   if (!rbs_on_sig_stack(scr->pt.ar_bspstore))
-   new_rbs = (current->sas_ss_sp + sizeof(long) - 1) & ~(sizeof(long) - 1);
+   if (ka->sa.sa_flags & SA_ONSTACK) {
+   int onstack = sas_ss_flags(new_sp);
+
+   if (onstack == 0) {
+   new_sp = current->sas_ss_sp + current->sas_ss_size;
+   /*
+* We need to check for the register stack being on the
+* signal stack separately, because it's switched
+* separately (memory stack is switched in the kernel,
+* register stack is switched in the signal trampoline).
+*/
+   if (!rbs_on_sig_stack(scr->pt.ar_bspstore))
+   new_rbs = ALIGN(current->sas_ss_sp,
+   sizeof(long));
+   } else if (onstack == SS_ONSTACK) {
+   unsigned long check_sp;
+
+   /*
+* If we are on the alternate signal stack and would
+* overflow it, don't. Return an always-bogus address
+* instead so we will die with SIGSEGV.
+*/
+   check_sp = (new_sp - sizeof(*frame)) & -STACK_ALIGN;
+   if (!likely(on_sig_stack(check_sp)))
+   return force_sigsegv_info(sig, (void __user *)
+ check_sp);
+   }
}
frame = (void __user *) ((new_sp - sizeof(*frame)) & -STACK_ALIGN);
 




[PATCH] capabilities: implement per-process securebits

2008-02-18 Thread Andrew G. Morgan

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew

Here is the patch to add per-process securebits again. This version
includes Serge's argument type fix (thanks), but is otherwise unchanged
from the one posted a couple of weeks back. It is against Linus' tree as
of the 15th.

This change is all code that lives inside the capability LSM and the new
securebits implementation is only active if
CONFIG_SECURITY_FILE_CAPABILITIES is enabled (it doesn't make much sense
to support this feature without filesystem capabilities).

As indicated in the last round, this patch has been Acked by Serge and
Reviewed by James (Cc:d).

Thanks

Andrew
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFHuj4g+bHCR3gb8jsRAjBqAKCuMrlQqIOTY+5Tba6aM5HHcy3cWQCgvA2p
v+MAuce9OULRL9vOKdivq8Q=
=L/XN
-----END PGP SIGNATURE-----
From 006ddf6903983dd596e360ab1ab8e537b29fab46 Mon Sep 17 00:00:00 2001
From: Andrew G. Morgan <[EMAIL PROTECTED]>
Date: Mon, 18 Feb 2008 15:23:28 -0800
Subject: [PATCH] Implement per-process securebits

[This patch represents a no-op unless CONFIG_SECURITY_FILE_CAPABILITIES
 is enabled at configure time.]

Filesystem capability support makes it possible to do away with
(set)uid-0 based privilege and use capabilities instead. That is, with
filesystem support for capabilities but without this present patch,
it is (conceptually) possible to manage a system with capabilities
alone and never need to obtain privilege via (set)uid-0.

Of course, conceptually isn't quite the same as currently possible
since few user applications, certainly not enough to run a viable
system, are currently prepared to leverage capabilities to exercise
privilege. Further, many applications exist that may never get
upgraded in this way, and the kernel will continue to want to support
their setuid-0 based privilege needs.

Where pure-capability applications evolve and replace setuid-0
binaries, it is desirable that there be a mechanism by which they
can contain their privilege. In addition to leveraging the per-process
bounding and inheritable sets, this should include suppressing the
privilege of the uid-0 superuser from the process' tree of children.

The feature added by this patch can be leveraged to suppress the
privilege associated with (set)uid-0. This suppression requires
CAP_SETPCAP to initiate, and only immediately affects the 'current'
process (it is inherited through fork()/exec()). This
reimplementation differs significantly from the historical support for
securebits which was system-wide, unwieldy and which has ultimately
withered to a dead relic in the source of the modern kernel.

With this patch applied a process, that is capable(CAP_SETPCAP), can
now drop all legacy privilege (through uid=0) for itself and all
subsequently fork()'d/exec()'d children with:

  prctl(PR_SET_SECUREBITS, 0x2f);
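
As a rough illustration (not part of the patch itself), the 0x2f argument
can be read as a combination of individual securebits. The sketch below
assumes the bit layout of include/linux/securebits.h as changed by this
patch; the DEMO_* names and both helper functions are hypothetical, and
the call only succeeds for a process that holds CAP_SETPCAP.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>

/* Assumed bit layout; check it against include/linux/securebits.h. */
#define DEMO_SECBIT_NOROOT                 0x01UL
#define DEMO_SECBIT_NOROOT_LOCKED          0x02UL
#define DEMO_SECBIT_NO_SETUID_FIXUP        0x04UL
#define DEMO_SECBIT_NO_SETUID_FIXUP_LOCKED 0x08UL
#define DEMO_SECBIT_KEEP_CAPS              0x10UL
#define DEMO_SECBIT_KEEP_CAPS_LOCKED       0x20UL

/* Set and lock NOROOT and NO_SETUID_FIXUP; lock KEEP_CAPS in its off state. */
unsigned long demo_lockdown_mask(void)
{
	return DEMO_SECBIT_NOROOT | DEMO_SECBIT_NOROOT_LOCKED |
	       DEMO_SECBIT_NO_SETUID_FIXUP | DEMO_SECBIT_NO_SETUID_FIXUP_LOCKED |
	       DEMO_SECBIT_KEEP_CAPS_LOCKED;
}

/* Returns 0 on success, -1 if the kernel refuses (e.g. no CAP_SETPCAP). */
int demo_drop_legacy_root(void)
{
	if (prctl(PR_SET_SECUREBITS, demo_lockdown_mask(), 0, 0, 0) != 0) {
		perror("prctl(PR_SET_SECUREBITS)");
		return -1;
	}
	return 0;
}
```

Because the locked bits cannot be cleared again, a caller would normally
invoke demo_drop_legacy_root() once, early, before fork()/exec().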

[2008/02/18: This version includes an int -> long argument fix from Serge.]

Signed-off-by: Andrew G. Morgan <[EMAIL PROTECTED]>
Acked-by: Serge Hallyn <[EMAIL PROTECTED]>
Reviewed-by: James Morris <[EMAIL PROTECTED]>
---
 include/linux/capability.h |3 +-
 include/linux/init_task.h  |3 +-
 include/linux/prctl.h  |9 +++-
 include/linux/sched.h  |3 +-
 include/linux/securebits.h |   25 ---
 include/linux/security.h   |   14 +++---
 kernel/sys.c   |   25 +--
 security/capability.c  |1 +
 security/commoncap.c   |  103 
 security/dummy.c   |2 +-
 security/security.c|4 +-
 security/selinux/hooks.c   |5 +-
 12 files changed, 139 insertions(+), 58 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 7d50ff6..eaab759 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -155,6 +155,7 @@ typedef struct kernel_cap_struct {
  *   Add any capability from current's capability bounding set
  *   to the current process' inheritable set
  *   Allow taking bits out of capability bounding set
+ *   Allow modification of the securebits for a process
  */
 
 #define CAP_SETPCAP  8
@@ -490,8 +491,6 @@ extern const kernel_cap_t __cap_init_eff_set;
 int capable(int cap);
 int __capable(struct task_struct *t, int cap);
 
-extern long cap_prctl_drop(unsigned long cap);
-
 #endif /* __KERNEL__ */
 
 #endif /* !_LINUX_CAPABILITY_H */
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 1f74e1d..e8a894a 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define INIT_FDTABLE \
@@ -169,7 +170,7 @@ extern struct group_info init_groups;
.cap_inheritable = CAP_INIT_INH_SET,\
.cap_permitted  = CAP_FULL_SET, \
.cap_bset   = CAP_INIT_BSET,\
-   .keep_capabilities = 0, \
+  

Re: [PATCH 3/5] signal(x86_ia32): add a signal stack overflow check

2008-02-18 Thread Shi Weihua
A similar check was added to x86_32 (i386) in commit
83bd01024b1fdfc41d9b758e5669e80fca72df66.
This patch adds the check to x86_ia32 and improves it a little: we need
to check for stack overflow only when the signal arrives while we are
already on the alternate signal stack.

Signed-off-by: Shi Weihua <[EMAIL PROTECTED]>

---

The previous version of this patch had a mistake in a comment; this one
corrects it.

---
--- linux-2.6.25-rc2.orig/arch/x86/ia32/ia32_signal.c   2008-02-16 04:57:20.0 +0800
+++ linux-2.6.25-rc2/arch/x86/ia32/ia32_signal.c    2008-02-19 09:56:43.0 +0800
@@ -406,8 +406,19 @@ static void __user *get_sigframe(struct 
 
/* This is the X/Open sanctioned signal stack switching.  */
if (ka->sa.sa_flags & SA_ONSTACK) {
-   if (sas_ss_flags(sp) == 0)
+   int onstack = sas_ss_flags(sp);
+
+   if (onstack == 0)
sp = current->sas_ss_sp + current->sas_ss_size;
+   else if (onstack == SS_ONSTACK) {
+   /*
+* If we are on the alternate signal stack and would
+* overflow it, don't. Return an always-bogus address
+* instead so we will die with SIGSEGV.
+*/
+if (!likely(on_sig_stack(sp - frame_size)))
+   return (void __user *) -1L;
+   }
}
 
/* This is the legacy signal stack switching. */




Re: [PATCH 4/4] remove goto statement

2008-02-18 Thread Li Zefan
Glauber Costa wrote:
> Li Zefan wrote:
>> Glauber Costa wrote:
>>> This patch removes goto statements in favour of plain returns
>>> in places that had nothing left behind that would justify
>>> such construction
>>> ---
>>>  drivers/acpi/processor_core.c |4 ++--
>>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
>>> index 06a230a..70f62b6 100644
>>> --- a/drivers/acpi/processor_core.c
>>> +++ b/drivers/acpi/processor_core.c
>>> @@ -651,7 +651,7 @@ static int __cpuinit acpi_processor_star
>>>  
>>> result = acpi_processor_add_fs(device);
>>> if (result)
>>> -   goto end;
>>> +   return result;
>>>  
>>> status = acpi_install_notify_handler(pr->handle, ACPI_DEVICE_NOTIFY,
>>>  acpi_processor_notify, pr);
>>> @@ -675,7 +675,7 @@ #endif
>>> "%s is registered as cooling_device%d\n",
>>> device->dev.bus_id, cdev->id);
>>> else
>>> -   goto end;
>>> +   return result;
>>>  
>>> result = sysfs_create_link(&device->dev.kobj, &cdev->device.kobj,
>>> "thermal_cooling");
>> Seems you forgot to remove the 'end' label ?
> Yes, in fact; thanks for pointing that out.
> 
> However, the patches are split up as finely as I could manage, and this
> should not affect the others. I can resend this one, the whole series, or
> whatever, depending on what the maintainer wants.
> 
> So, what's it going to be?
> 

IMO a revised [PATCH 4/4] will do, since it won't affect the other 3 patches :)
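
For reference, a tiny standalone sketch of the pattern under discussion.
The demo_* helpers and their error values are hypothetical stand-ins, not
the real ACPI calls. Once nothing needs undoing, each error can return
directly, and a leftover 'end:' label would then only earn a gcc
-Wunused-label warning (assuming no other goto still targets it).

```c
#include <assert.h>

/* Hypothetical stand-ins for two setup steps; nonzero means failure. */
int demo_add_fs(int fail)
{
	return fail ? -19 : 0;
}

int demo_create_link(int fail)
{
	return fail ? -12 : 0;
}

/*
 * After the cleanup: there is nothing to unwind, so every error path
 * returns directly and no 'end:' label is left in the function.
 */
int demo_start(int fail_fs, int fail_link)
{
	int result;

	result = demo_add_fs(fail_fs);
	if (result)
		return result;

	result = demo_create_link(fail_link);
	if (result)
		return result;

	return 0;
}
```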


Re: [PATCH 2/5] signal(x86_64): add a signal stack overflow check

2008-02-18 Thread Shi Weihua
A similar check was added to x86_32 (i386) in commit
83bd01024b1fdfc41d9b758e5669e80fca72df66.
This patch adds the check to x86_64 and improves it a little: we need
to check for stack overflow only when the signal arrives while we are
already on the alternate signal stack.

Signed-off-by: Shi Weihua <[EMAIL PROTECTED]>

---

The previous version of this patch had a mistake in a comment; this one
corrects it.

---
--- linux-2.6.25-rc2.orig/arch/x86/kernel/signal_64.c   2008-02-16 04:57:20.0 +0800
+++ linux-2.6.25-rc2/arch/x86/kernel/signal_64.c    2008-02-19 09:56:20.0 +0800
@@ -205,8 +205,19 @@ get_stack(struct k_sigaction *ka, struct
 
/* This is the X/Open sanctioned signal stack switching.  */
if (ka->sa.sa_flags & SA_ONSTACK) {
-   if (sas_ss_flags(sp) == 0)
+   int onstack = sas_ss_flags(sp);
+
+   if (onstack == 0)
sp = current->sas_ss_sp + current->sas_ss_size;
+   else if (onstack == SS_ONSTACK) {
+   /*
+* If we are on the alternate signal stack and would
+* overflow it, don't. Return an always-bogus address
+* instead so we will die with SIGSEGV.
+*/
+   if (!likely(on_sig_stack(sp - size)))
+   return (void __user *) -1L;
+   }
}
 
return (void __user *)round_down(sp - size, 16);




Re: [PATCH 1/5] signal(x86_32): Improve the signal stack overflow check

2008-02-18 Thread Shi Weihua
We need to check for stack overflow only when the signal arrives while we
are already on the alternate signal stack.
So we can improve the patch at http://lkml.org/lkml/2007/11/27/101 as follows.

Signed-off-by: Shi Weihua <[EMAIL PROTECTED]> 

---

The previous version of this patch had a mistake in a comment; this one
corrects it.
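
For readers who want to see the logic outside the diff, here is a small
userspace model of the reworked check. It is only a sketch: the demo_*
names, the struct, and the DEMO_* constants are made up for illustration,
the range test follows the usual on_sig_stack() idiom, and frame alignment
is left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SS_ONSTACK 1
#define DEMO_SS_DISABLE 2
#define DEMO_BOGUS_SP   ((uintptr_t)-1)

/* Hypothetical model of the per-task alternate signal stack settings. */
struct demo_altstack {
	uintptr_t ss_sp;	/* lowest address of the alternate stack */
	uintptr_t ss_size;	/* its size in bytes, 0 when disabled */
};

/* Unsigned wrap-around turns "ss_sp <= sp < ss_sp + ss_size" into one test. */
int demo_on_sig_stack(const struct demo_altstack *s, uintptr_t sp)
{
	return sp - s->ss_sp < s->ss_size;
}

int demo_sas_ss_flags(const struct demo_altstack *s, uintptr_t sp)
{
	if (s->ss_size == 0)
		return DEMO_SS_DISABLE;
	return demo_on_sig_stack(s, sp) ? DEMO_SS_ONSTACK : 0;
}

/* Models the SA_ONSTACK branch of the patched get_sigframe(). */
uintptr_t demo_get_sigframe(const struct demo_altstack *s, uintptr_t sp,
			    uintptr_t frame_size, int sa_onstack)
{
	if (sa_onstack) {
		int onstack = demo_sas_ss_flags(s, sp);

		if (onstack == 0) {
			/* Not on the alternate stack yet: switch to its top. */
			sp = s->ss_sp + s->ss_size;
		} else if (onstack == DEMO_SS_ONSTACK) {
			/* Already on it: refuse a frame that would overflow. */
			if (!demo_on_sig_stack(s, sp - frame_size))
				return DEMO_BOGUS_SP;
		}
	}
	return sp - frame_size;
}
```

Note that in this model the overflow test runs only when SA_ONSTACK is
set, which is the behaviour the patch description asks for.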

---
--- linux-2.6.25-rc2.orig/arch/x86/kernel/signal_32.c   2008-02-16 04:57:20.0 +0800
+++ linux-2.6.25-rc2/arch/x86/kernel/signal_32.c    2008-02-19 09:55:59.0 +0800
@@ -299,17 +299,21 @@ get_sigframe(struct k_sigaction *ka, str
/* Default to using normal stack */
sp = regs->sp;
 
-   /*
-* If we are on the alternate signal stack and would overflow it, don't.
-* Return an always-bogus address instead so we will die with SIGSEGV.
-*/
-   if (on_sig_stack(sp) && !likely(on_sig_stack(sp - frame_size)))
-   return (void __user *) -1L;
-
/* This is the X/Open sanctioned signal stack switching.  */
if (ka->sa.sa_flags & SA_ONSTACK) {
-   if (sas_ss_flags(sp) == 0)
+   int onstack = sas_ss_flags(sp);
+
+   if (onstack == 0)
sp = current->sas_ss_sp + current->sas_ss_size;
+   else if (onstack == SS_ONSTACK) {
+   /*
+* If we are on the alternate signal stack and would
+* overflow it, don't. Return an always-bogus address
+* instead so we will die with SIGSEGV.
+*/
+   if (!likely(on_sig_stack(sp - frame_size)))
+   return (void __user *) -1L;
+   }
}
 
/* This is the legacy signal stack switching. */



Re: [PATCH 4/4] remove goto statement

2008-02-18 Thread Glauber Costa
Li Zefan wrote:
> Glauber Costa wrote:
>> This patch removes goto statements in favour of plain returns
>> in places that had nothing left behind that would justify
>> such construction
>> ---
>>  drivers/acpi/processor_core.c |4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
>> index 06a230a..70f62b6 100644
>> --- a/drivers/acpi/processor_core.c
>> +++ b/drivers/acpi/processor_core.c
>> @@ -651,7 +651,7 @@ static int __cpuinit acpi_processor_star
>>  
>>  result = acpi_processor_add_fs(device);
>>  if (result)
>> -goto end;
>> +return result;
>>  
>>  status = acpi_install_notify_handler(pr->handle, ACPI_DEVICE_NOTIFY,
>>   acpi_processor_notify, pr);
>> @@ -675,7 +675,7 @@ #endif
>>  "%s is registered as cooling_device%d\n",
>>  device->dev.bus_id, cdev->id);
>>  else
>> -goto end;
>> +return result;
>>  
>>  result = sysfs_create_link(&device->dev.kobj, &cdev->device.kobj,
>>  "thermal_cooling");
> 
> Seems you forgot to remove the 'end' label ?
Yes, in fact; thanks for pointing that out.

However, the patches are split up as finely as I could manage, and this
should not affect the others. I can resend this one, the whole series, or
whatever, depending on what the maintainer wants.

So, what's it going to be?


Re: [PATCH] queue usb USB_CDC_GET_ENCAPSULATED_RESPONSE message

2008-02-18 Thread David Brownell
On Monday 18 February 2008, Jan Altenberg wrote:
> Hi all,
> 
> commit 0cf4f2de0a0f4100795f38ef894d4910678c74f8 introduced a bug, which
> prevents sending an USB_CDC_GET_ENCAPSULATED_RESPONSE message. This
> breaks the RNDIS initialization (especially / only Windoze machines
> dislike this behavior...).
> 
> Signed-off-by: Benedikt Spranger <[EMAIL PROTECTED]>
> Signed-off-by: Jan Altenberg <[EMAIL PROTECTED]>

Acked-by: David Brownell <[EMAIL PROTECTED]>


> ---
>  drivers/usb/gadget/ether.c |1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-2.6.24/drivers/usb/gadget/ether.c
> ===
> --- linux-2.6.24.orig/drivers/usb/gadget/ether.c
> +++ linux-2.6.24/drivers/usb/gadget/ether.c
> @@ -1568,6 +1568,7 @@ done_set_intf:
>   memcpy(req->buf, buf, n);
>   req->complete = rndis_response_complete;
>   rndis_free_response(dev->rndis_config, buf);
> + value = n;
>   }
>   /* else stalls ... spec says to avoid that */
>   }
> 
> 




Re: [PATCH][BLUETOOTH] add HCI_BROKEN_ISOC for 0e5e:6622 (bugzilla #9027)

2008-02-18 Thread SDiZ
(Resending this, as I have had no reply in the last week.)

Marcel Holtmann wrote:
> Hi,
 This patch fix bugzilla #9027.
 ``Syslog flooded with "hci_scodata_packet: hci0 SCO packet for unknown
 connection handle 92" message"

 see http://bugzilla.kernel.org/show_bug.cgi?id=9027
 
>>> when we get the content of /proc/bus/usb/devices for this one. Do you
>>> have the manufacturer name of it or at least an idea which kind of
>>> device this is. Check "hciconfig hci0 version".
>>>   
>> It has no text on the chips or the package, so I don't know the
>> manufacturer name.
>> 
> and what about "hciconfig hci0 version"?
>   

hci0: Type: USB
BD Address: 11:11:11:11:11:11 ACL MTU: 672:3 SCO MTU: 48:1
HCI Ver: 2.0 (0x3) HCI Rev: 0x1f4 LMP Ver: 2.0 (0x3) LMP Subver: 0x1f4
Manufacturer: CONWISE Technology Corporation Ltd (66)

>> /proc/bus/usb/devices shows:
>> [...]
>
> That is not all. Otherwise it is violating the Bluetooth HCI USB
> specification and we have bigger problems when switching USB alternate
> endpoints.
>   

hmm.. there are some entries for usb mouse and hubs, nothing else.
I have attached the whole file for your reference.

Other than the annoying syslog message, everything works fine.

> Regards
>
> Marcel
>
>
>   

Regards,
Yuk-Pong, Cheng (SDiZ)






T:  Bus=05 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=480 MxCh= 8
B:  Alloc=  0/800 us ( 0%), #Int=  0, #Iso=  0
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor= ProdID= Rev= 2.06
S:  Manufacturer=Linux 2.6.24-1-686 ehci_hcd
S:  Product=EHCI Host Controller
S:  SerialNumber=:00:1d.7
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   4 Ivl=256ms

T:  Bus=04 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc=  0/900 us ( 0%), #Int=  0, #Iso=  0
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor= ProdID= Rev= 2.06
S:  Manufacturer=Linux 2.6.24-1-686 uhci_hcd
S:  Product=UHCI Host Controller
S:  SerialNumber=:00:1d.3
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=255ms

T:  Bus=03 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc=  0/900 us ( 0%), #Int=  0, #Iso=  0
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor= ProdID= Rev= 2.06
S:  Manufacturer=Linux 2.6.24-1-686 uhci_hcd
S:  Product=UHCI Host Controller
S:  SerialNumber=:00:1d.2
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=255ms

T:  Bus=02 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc= 56/900 us ( 6%), #Int=  2, #Iso=  1
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor= ProdID= Rev= 2.06
S:  Manufacturer=Linux 2.6.24-1-686 uhci_hcd
S:  Product=UHCI Host Controller
S:  SerialNumber=:00:1d.1
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=255ms

T:  Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  4 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=e0(unk. ) Sub=01 Prot=01 MxPS=16 #Cfgs=  1
P:  Vendor=0e5e ProdID=6622 Rev= 1.34
C:* #Ifs= 2 Cfg#= 1 Atr=80 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(unk. ) Sub=01 Prot=01 Driver=hci_usb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=hci_usb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=hci_usb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:* If#= 1 Alt= 2 #EPs= 2 Cls=e0(unk. ) Sub=01 Prot=01 Driver=hci_usb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms

T:  Bus=02 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#=  3 Spd=1.5 MxCh= 0
D:  Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=05e3 ProdID=1205 Rev= 1.10
S:  Product=USB Mouse
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=01 Prot=02 Driver=usbhid
E:  Ad=81(I) Atr=03(Int.) MxPS=   4 Ivl=10ms

T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc=  0/900 us ( 0%), #Int=  0, #Iso=  0
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor= ProdID= Rev= 2.06
S:  Manufacturer=Linux 2.6.24-1-686 uhci_hcd
S:  Product=UHCI Host Controller
S:  SerialNumber=:00:1d.0
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=255ms


Re: [PATCH] documentation: move spidev_fdx example to its own source file

2008-02-18 Thread David Brownell
On Monday 18 February 2008, Randy Dunlap wrote:
> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> cc: [EMAIL PROTECTED]
> cc: [EMAIL PROTECTED]
> 
> Move sample source code to its own source file so that it can be used
> easier and build-tested/check/maintained by anyone.
> 
> (Makefile changes are in a separate patch for all of Documentation/.)
> 
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

Acked-by: David Brownell <[EMAIL PROTECTED]>

> ---
>  Documentation/spi/spidev   |  168 ---
>  Documentation/spi/spidev_fdx.c |  158 +
>  2 files changed, 160 insertions(+), 166 deletions(-)
> 
> --- linux-2625-rc2-docsrc.orig/Documentation/spi/spidev
> +++ linux-2625-rc2-docsrc/Documentation/spi/spidev
> @@ -126,8 +126,8 @@ NOTES:
>  FULL DUPLEX CHARACTER DEVICE API
>  
>  
> -See the sample program below for one example showing the use of the full
> -duplex programming interface.  (Although it doesn't perform a full duplex
> +See the spidev_fdx.c sample program for one example showing the use of the
> +full duplex programming interface.  (Although it doesn't perform a full duplex
>  transfer.)  The model is the same as that used in the kernel spi_sync()
>  request; the individual transfers offer the same capabilities as are
>  available to kernel drivers (except that it's not asynchronous).
> @@ -141,167 +141,3 @@ and bitrate for each transfer segment.)
>  
>  To make a full duplex request, provide both rx_buf and tx_buf for the
>  same transfer.  It's even OK if those are the same buffer.
> -
> -
> -SAMPLE PROGRAM
> -==
> -
> - CUT HERE
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#include 
> -#include 
> -#include 
> -
> -#include 
> -#include 
> -
> -
> -static int verbose;
> -
> -static void do_read(int fd, int len)
> -{
> - unsigned char   buf[32], *bp;
> - int status;
> -
> - /* read at least 2 bytes, no more than 32 */
> - if (len < 2)
> - len = 2;
> - else if (len > sizeof(buf))
> - len = sizeof(buf);
> - memset(buf, 0, sizeof buf);
> -
> - status = read(fd, buf, len);
> - if (status < 0) {
> - perror("read");
> - return;
> - }
> - if (status != len) {
> - fprintf(stderr, "short read\n");
> - return;
> - }
> -
> - printf("read(%2d, %2d): %02x %02x,", len, status,
> - buf[0], buf[1]);
> - status -= 2;
> - bp = buf + 2;
> - while (status-- > 0)
> - printf(" %02x", *bp++);
> - printf("\n");
> -}
> -
> -static void do_msg(int fd, int len)
> -{
> - struct spi_ioc_transfer xfer[2];
> - unsigned char   buf[32], *bp;
> - int status;
> -
> - memset(xfer, 0, sizeof xfer);
> - memset(buf, 0, sizeof buf);
> -
> - if (len > sizeof buf)
> - len = sizeof buf;
> -
> - buf[0] = 0xaa;
> - xfer[0].tx_buf = (__u64) buf;
> - xfer[0].len = 1;
> -
> - xfer[1].rx_buf = (__u64) buf;
> - xfer[1].len = len;
> -
> - status = ioctl(fd, SPI_IOC_MESSAGE(2), xfer);
> - if (status < 0) {
> - perror("SPI_IOC_MESSAGE");
> - return;
> - }
> -
> - printf("response(%2d, %2d): ", len, status);
> - for (bp = buf; len; len--)
> - printf(" %02x", *bp++);
> - printf("\n");
> -}
> -
> -static void dumpstat(const char *name, int fd)
> -{
> - __u8mode, lsb, bits;
> - __u32   speed;
> -
> - if (ioctl(fd, SPI_IOC_RD_MODE, &mode) < 0) {
> - perror("SPI rd_mode");
> - return;
> - }
> - if (ioctl(fd, SPI_IOC_RD_LSB_FIRST, &lsb) < 0) {
> - perror("SPI rd_lsb_fist");
> - return;
> - }
> - if (ioctl(fd, SPI_IOC_RD_BITS_PER_WORD, &bits) < 0) {
> - perror("SPI bits_per_word");
> - return;
> - }
> - if (ioctl(fd, SPI_IOC_RD_MAX_SPEED_HZ, &speed) < 0) {
> - perror("SPI max_speed_hz");
> - return;
> - }
> -
> - printf("%s: spi mode %d, %d bits %sper word, %d Hz max\n",
> - name, mode, bits, lsb ? "(lsb first) " : "", speed);
> -}
> -
> -int main(int argc, char **argv)
> -{
> - int c;
> - int readcount = 0;
> - int msglen = 0;
> - int fd;
> - const char  *name;
> -
> - while ((c = getopt(argc, argv, "hm:r:v")) != EOF) {
> - switch (c) {
> - case 'm':
> - msglen = atoi(optarg);
> - if (msglen < 0)
> - goto usage;
> - continue;
> - case 'r':
> - readcount = atoi(optarg);
> - if (readcount < 0)
> - goto usage;
> - continue;
> - case 'v':
> -  

Re: [PATCH 1/5] signal(x86_32): Improve the signal stack overflow check

2008-02-18 Thread Shi Weihua


[EMAIL PROTECTED] wrote:
> On Mon, 18 Feb 2008 18:22:05 +0800, Shi Weihua said:
> 
>> -/*
>> - * If we are on the alternate signal stack and would overflow it, don't.
>                                                              notice this ^
>> - * Return an always-bogus address instead so we will die with SIGSEGV.
> 
>> + * If we are on the alternate signal stack and would
>> + * overflow it, don't return an always-bogus address
>           missing here ^
>> + * instead so we will die with SIGSEGV.
> 
> "don't. Return" is a vastly different semantic than "don't return".
> 
> I think the same comment error was cut-n-pasted into all 5 patches...
> 

Sorry, that was my mistake.
I will correct all 5 patches. Thanks.



Re: [PATCH 4/5] signal(ia64): add a signal stack overflow check

2008-02-18 Thread Shi Weihua


Matthew Wilcox wrote:
> On Mon, Feb 18, 2008 at 06:26:23PM +0800, Shi Weihua wrote:
>> +if (!rbs_on_sig_stack(scr->pt.ar_bspstore))
>> +new_rbs = (current->sas_ss_sp +
>> +  sizeof(long) - 1) & ~(sizeof(long) - 1);
> 
> I know you're only moving this code, but how about fixing it to use
> ALIGN at the same time?
> 
> + if (!rbs_on_sig_stack(scr->pt.ar_bspstore))
> + new_rbs = ALIGN(current->sas_ss_sp,
> + sizeof(long));
> 

Of course, we can improve the code by using ALIGN.
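
As a quick sanity check that the two forms agree, here is a small
userspace sketch. DEMO_ALIGN is a local stand-in written for this example
(it is not the kernel macro itself) and, like the open-coded expression,
it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for ALIGN(): round x up to a multiple of a (power of two). */
#define DEMO_ALIGN(x, a) (((x) + ((a) - 1)) & ~((unsigned long)(a) - 1))

/* The expression as it was written out by hand in the old code. */
unsigned long demo_align_open_coded(unsigned long sp)
{
	return (sp + sizeof(long) - 1) & ~(sizeof(long) - 1);
}

/* The same rounding expressed through the macro. */
unsigned long demo_align_macro(unsigned long sp)
{
	return DEMO_ALIGN(sp, sizeof(long));
}
```

Both functions round an address up to the next multiple of sizeof(long)
and leave an already aligned address unchanged.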

---
A similar check was added to x86_32 (i386) in commit
83bd01024b1fdfc41d9b758e5669e80fca72df66.

This patch adds the check to ia64 and improves it a little: we need
to check for stack overflow only when the signal arrives while we are
already on the alternate signal stack.

Signed-off-by: Shi Weihua <[EMAIL PROTECTED]>

---
--- linux-2.6.25-rc2.orig/arch/ia64/kernel/signal.c 2008-02-16 04:57:20.0 +0800
+++ linux-2.6.25-rc2/arch/ia64/kernel/signal.c  2008-02-19 09:57:05.0 +0800
@@ -342,15 +342,33 @@ setup_frame (int sig, struct k_sigaction
 
new_sp = scr->pt.r12;
tramp_addr = (unsigned long) __kernel_sigtramp;
-   if ((ka->sa.sa_flags & SA_ONSTACK) && sas_ss_flags(new_sp) == 0) {
-   new_sp = current->sas_ss_sp + current->sas_ss_size;
-   /*
-* We need to check for the register stack being on the signal stack
-* separately, because it's switched separately (memory stack is switched
-* in the kernel, register stack is switched in the signal trampoline).
-*/
-   if (!rbs_on_sig_stack(scr->pt.ar_bspstore))
-   new_rbs = (current->sas_ss_sp + sizeof(long) - 1) & ~(sizeof(long) - 1);
+   if (ka->sa.sa_flags & SA_ONSTACK) {
+   int onstack = sas_ss_flags(new_sp);
+
+   if (onstack == 0) {
+   new_sp = current->sas_ss_sp + current->sas_ss_size;
+   /*
+* We need to check for the register stack being on the
+* signal stack separately, because it's switched
+* separately (memory stack is switched in the kernel,
+* register stack is switched in the signal trampoline).
+*/
+   if (!rbs_on_sig_stack(scr->pt.ar_bspstore))
+   new_rbs = ALIGN(current->sas_ss_sp,
+   sizeof(long));
+   } else if (onstack == SS_ONSTACK) {
+   unsigned long check_sp;
+
+   /*
+* If we are on the alternate signal stack and would
+* overflow it, don't. Return an always-bogus address
+* instead so we will die with SIGSEGV.
+*/
+   check_sp = (new_sp - sizeof(*frame)) & -STACK_ALIGN;
+   if (!likely(on_sig_stack(check_sp)))
+   return force_sigsegv_info(sig, (void __user *)
+ check_sp);
+   }
}
frame = (void __user *) ((new_sp - sizeof(*frame)) & -STACK_ALIGN);
 

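The overflow check in the patch above can be sketched in user space as follows. This is only an approximation under stated assumptions: struct alt_stack, DEMO_STACK_ALIGN and the demo_* helpers are hypothetical names invented for this sketch, and the range test is modelled as an unsigned comparison against the base and size of the alternate stack.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-task alternate signal stack fields. */
struct alt_stack {
	unsigned long sp;   /* lowest address of the alternate stack */
	unsigned long size; /* size of the alternate stack in bytes */
};

/* Assumed frame alignment for this sketch (a power of two). */
#define DEMO_STACK_ALIGN 16UL

/* Nonzero if addr lies within [sp, sp + size); relies on unsigned wraparound. */
static int demo_on_sig_stack(const struct alt_stack *ss, unsigned long addr)
{
	return addr - ss->sp < ss->size;
}

/*
 * Nonzero if a frame of frame_size bytes pushed below cur_sp would leave
 * the alternate stack, i.e. the case the patch turns into a SIGSEGV.
 */
static int demo_frame_overflows(const struct alt_stack *ss,
				unsigned long cur_sp,
				unsigned long frame_size)
{
	unsigned long check_sp = (cur_sp - frame_size) & -DEMO_STACK_ALIGN;

	return !demo_on_sig_stack(ss, check_sp);
}
```

The point of computing check_sp first is that the decision is made on the address the frame would actually occupy, not on the current stack pointer.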


Re: Linux 2.6.25-rc2 regression: LVM cannot find volume group

2008-02-18 Thread Alasdair G Kergon
On Sat, Feb 16, 2008 at 11:37:37PM +0100, Jiri Slaby wrote:
> # CONFIG_SYSFS_DEPRECATED is not set

IMHO That should be *set* by default until everyone has had time to
update their userspace software to cope with the changed sysfs layout.

Alasdair
-- 
[EMAIL PROTECTED]


Re: IO queueing and complete affinity w/ threads: Some results

2008-02-18 Thread Nick Piggin
On Mon, Feb 18, 2008 at 02:33:17PM +0100, Andi Kleen wrote:
> Jens Axboe <[EMAIL PROTECTED]> writes:
> 
> > and that scrapping the remote
> > softirq trigger stuff is sanest.
> 
> I actually liked Nick's queued smp_function_call_single() patch. So even
> if it was not used for block I would still like to see it being merged 
> in some form to speed up all the other IPI users.

Yeah, that hasn't been forgotten (nor have your comments about folding
my special function into smp_call_function_single).

The call function path is terribly unscalable at the moment on a lot
of architectures, and also it isn't allowed to be used with interrupts
off due to deadlock (which the queued version can allow, provided
that wait=0).

I will get around to sending that upstream soon.


Re: [PATCH 4/4] remove goto statement

2008-02-18 Thread Li Zefan
Glauber Costa wrote:
> This patch removes goto statements in favour of plain returns
> in places that had nothing left behind that would justify
> such construction
> ---
>  drivers/acpi/processor_core.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
> index 06a230a..70f62b6 100644
> --- a/drivers/acpi/processor_core.c
> +++ b/drivers/acpi/processor_core.c
> @@ -651,7 +651,7 @@ static int __cpuinit acpi_processor_star
>  
>   result = acpi_processor_add_fs(device);
>   if (result)
> - goto end;
> + return result;
>  
>   status = acpi_install_notify_handler(pr->handle, ACPI_DEVICE_NOTIFY,
>acpi_processor_notify, pr);
> @@ -675,7 +675,7 @@ #endif
>   "%s is registered as cooling_device%d\n",
>   device->dev.bus_id, cdev->id);
>   else
> - goto end;
> + return result;
>  
>   result = sysfs_create_link(&device->dev.kobj, &cdev->device.kobj,
>   "thermal_cooling");

It seems you forgot to remove the 'end' label?
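
For illustration, a minimal sketch of the shape the function would take once the label is gone as well. The demo_* names are hypothetical and this is not the ACPI code; it assumes no other goto in the function still targets the label.

```c
#include <assert.h>

/* Hypothetical setup step: fails with -1 when asked to. */
static int demo_step(int should_fail)
{
	return should_fail ? -1 : 0;
}

/*
 * Once every "goto end" has become a plain return, the "end:" label is
 * dropped too; a label with no remaining goto is dead code, and gcc
 * reports it under -Wunused-label.
 */
static int demo_start(int fail_first, int fail_second)
{
	int result;

	result = demo_step(fail_first);
	if (result)
		return result;

	result = demo_step(fail_second);
	if (result)
		return result;

	return 0;
}
```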


Re: 2.6.25-rc2-mm1

2008-02-18 Thread Kevin Winchester
Steven Rostedt wrote:
> 
> On Mon, 18 Feb 2008, Andrew Morton wrote:
> 
>>> I don't think I've seen anyone else report this, but if I'm wrong, I'm
>>> sure someone will point me to the thread.
>> No, I think it's new.
>>
> 
>> Looks like an ftrace-vs-lockdep problem.
>>
> 
> Is there a .config around to look at?
> 

Sorry, here it is.

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.25-rc2-mm1
# Sun Feb 17 13:12:53 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CGROUPS is not set
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_USER_SCHED=y
# CONFIG_CGROUP_SCHED is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
# CONFIG_COMPAT_BRK is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_PROFILING is not set
CONFIG_MARKERS=y
CONFIG_HAVE_OPROFILE=y
CONFIG_HAVE_KPROBES=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
CONFIG_BLK_DEV_BSG=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
# CONFIG_CLASSIC_RCU is not set
CONFIG_PREEMPT_RCU=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_VSMP is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
CONFIG_MK8=y
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set

[PATCH] cifs: remove GLOBAL_EXTERN macro

2008-02-18 Thread Harvey Harrison
Global variables should be defined in C files, not in headers.

1) Comment out unused vars
GlobalDnotifyRsp_Q
GlobalUidList

2) Declare vars in cifsfs.c that need it and change to extern in
cifsglob.h

3) Change to extern in cifsglob.h for vars that were already being
declared in cifsfs.c

4) Remove GLOBAL_EXTERN
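
The resulting pattern can be sketched in a single file as follows. This is a simplified illustration with hypothetical demo_* names rather than the actual cifs globals: the header carries only extern declarations, and exactly one C file carries the definitions.

```c
#include <assert.h>

/* What the shared header would carry: declarations only. */
extern unsigned int demo_current_xid;
extern char demo_system_name[15];

/* What exactly one .c file would carry: the definitions. */
unsigned int demo_current_xid;
char demo_system_name[15];

/* Any file that includes the header can then use the globals. */
static unsigned int demo_next_xid(void)
{
	return ++demo_current_xid;
}
```

With this layout no macro trick is needed to decide which translation unit owns the storage.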

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
Andrew, Steve, this is a revised patch that addresses your comments on
the patch withdrawn from -mm.

 fs/cifs/cifsfs.c   |   31 -
 fs/cifs/cifsglob.h |   76 
 2 files changed, 64 insertions(+), 43 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index fcc4342..aae6752 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -37,7 +37,6 @@
 #include 
 #include "cifsfs.h"
 #include "cifspdu.h"
-#define DECLARE_GLOBALS_HERE
 #include "cifsglob.h"
 #include "cifsproto.h"
 #include "cifs_debug.h"
@@ -85,6 +84,34 @@ module_param(cifs_max_pending, int, 0);
 MODULE_PARM_DESC(cifs_max_pending, "Simultaneous requests to server. "
   "Default: 50 Range: 2 to 256");
 
+struct list_head GlobalSMBSessionList;
+struct list_head GlobalTreeConnectionList;
+rwlock_t GlobalSMBSeslock;
+
+struct list_head GlobalOplock_Q;
+
+struct list_head GlobalDnotifyReqList;
+
+unsigned int GlobalCurrentXid;
+unsigned int GlobalTotalActiveXid;
+unsigned int GlobalMaxActiveXid;
+spinlock_t GlobalMid_Lock;
+char Local_System_Name[15];
+
+atomic_t sesInfoAllocCount;
+atomic_t tconInfoAllocCount;
+atomic_t tcpSesAllocCount;
+atomic_t tcpSesReconnectCount;
+atomic_t tconInfoReconnectCount;
+
+atomic_t bufAllocCount;/* current number allocated  */
+#ifdef CONFIG_CIFS_STATS2
+atomic_t totBufAllocCount; /* total allocated over all time */
+atomic_t totSmBufAllocCount;
+#endif
+atomic_t smBufAllocCount;
+atomic_t midCount;
+
 extern mempool_t *cifs_sm_req_poolp;
 extern mempool_t *cifs_req_poolp;
 extern mempool_t *cifs_mid_poolp;
@@ -1001,7 +1028,7 @@ init_cifs(void)
INIT_LIST_HEAD(&GlobalOplock_Q);
 #ifdef CONFIG_CIFS_EXPERIMENTAL
INIT_LIST_HEAD(&GlobalDnotifyReqList);
-   INIT_LIST_HEAD(&GlobalDnotifyRsp_Q);
+/* INIT_LIST_HEAD(&GlobalDnotifyRsp_Q); */
 #endif
 /*
  *  Initialize Global counters
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 5d32d8d..c45acfd 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -583,79 +583,73 @@ require use of the stronger protocol */
  *
  /
 
-#ifdef DECLARE_GLOBALS_HERE
-#define GLOBAL_EXTERN
-#else
-#define GLOBAL_EXTERN extern
-#endif
-
 /*
  * The list of servers that did not respond with NT LM 0.12.
  * This list helps improve performance and eliminate the messages indicating
  * that we had a communications error talking to the server in this list.
  */
 /* Feature not supported */
-/* GLOBAL_EXTERN struct servers_not_supported *NotSuppList; */
+/* extern struct servers_not_supported *NotSuppList; */
 
 /*
  * The following is a hash table of all the users we know about.
  */
-GLOBAL_EXTERN struct smbUidInfo *GlobalUidList[UID_HASH];
+/* extern struct smbUidInfo *GlobalUidList[UID_HASH]; */
 
-/* GLOBAL_EXTERN struct list_head GlobalServerList; BB not implemented yet */
-GLOBAL_EXTERN struct list_head GlobalSMBSessionList;
-GLOBAL_EXTERN struct list_head GlobalTreeConnectionList;
-GLOBAL_EXTERN rwlock_t GlobalSMBSeslock;  /* protects list inserts on 3 above */
+/* extern struct list_head GlobalServerList; BB not implemented yet */
+extern struct list_head GlobalSMBSessionList;
+extern struct list_head GlobalTreeConnectionList;
+extern rwlock_t GlobalSMBSeslock;  /* protects list inserts on 3 above */
 
-GLOBAL_EXTERN struct list_head GlobalOplock_Q;
+extern struct list_head GlobalOplock_Q;
 
 /* Outstanding dir notify requests */
-GLOBAL_EXTERN struct list_head GlobalDnotifyReqList;
+extern struct list_head GlobalDnotifyReqList;
 /* DirNotify response queue */
-GLOBAL_EXTERN struct list_head GlobalDnotifyRsp_Q;
+/* extern struct list_head GlobalDnotifyRsp_Q; */
 
 /*
  * Global transaction id (XID) information
  */
-GLOBAL_EXTERN unsigned int GlobalCurrentXid;   /* protected by GlobalMid_Sem */
-GLOBAL_EXTERN unsigned int GlobalTotalActiveXid; /* prot by GlobalMid_Sem */
-GLOBAL_EXTERN unsigned int GlobalMaxActiveXid; /* prot by GlobalMid_Sem */
-GLOBAL_EXTERN spinlock_t GlobalMid_Lock;  /* protects above & list operations */
+extern unsigned int GlobalCurrentXid;  /* protected by GlobalMid_Sem */
+extern unsigned int GlobalTotalActiveXid; /* prot by GlobalMid_Sem */
+extern unsigned int GlobalMaxActiveXid;/* prot by GlobalMid_Sem */
+extern spinlock_t GlobalMid_Lock;  /* protects above & list operations */
  /* on midQ entries */
-GLOBAL_EXTERN char Local_System_Name[15];
+extern char Local_System_Name[15];
 
 /*
  *  Global counters, updated atomically
 

Re: TG3 network data corruption regression 2.6.24/2.6.23.4

2008-02-18 Thread Michael Chan
On Mon, 2008-02-18 at 16:35 -0800, David Miller wrote:

> One consequence of Herbert's change is that the chip will see a
> different datastream.  The initial skb->data linear area will be
> smaller, and the transition to the fragmented area of pages will be
> quicker.
> 

I see.  Perhaps when we get to the end of the data-stream, there is a
tiny frag that the chip cannot handle.  That's the only thing I can
think of.

Please try this patch to see if the problem goes away.  This will
disable SG on 5701 so we always get linear SKBs.

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index db606b6..bb37e76 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -12717,6 +12717,9 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
} else
tp->tg3_flags &= ~TG3_FLAG_RX_CHECKSUMS;
 
+   if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)
+   dev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_SG);
+
/* flow control autonegotiation is default behavior */
tp->tg3_flags |= TG3_FLAG_PAUSE_AUTONEG;
tp->link_config.flowctrl = TG3_FLOW_CTRL_TX | TG3_FLOW_CTRL_RX;




Re: [PATCH] correct inconsistent ntp interval/tick_length usage

2008-02-18 Thread john stultz

On Sat, 2008-02-16 at 05:24 +0100, Roman Zippel wrote:
> Hi,
> 
> On Wed, 13 Feb 2008, john stultz wrote:
> 
> > Oh! So your issue is that since time_freq is stored in ppm, or in effect
> > a usecs per sec offset, when we add it to something other then 1 second
> > we mis-apply what NTP tells us to. Is that right?
> 
> Pretty much everything is centered around that 1sec, so the closer the 
> update frequency is to it the better.
> 
> > Right, so if NTP has us apply a 10ppm adjustment, instead of doing:
> > NSEC_PER_SEC + 10,000
> > 
> > We're doing:
> > NSEC_PER_SEC + CLOCK_TICK_ADJUST + 10,000
> > 
> > Which, if I'm doing my math right, results in a 10.002ppm adjustment
> > (using the 999847467ns number above), instead of just a 10ppm
> > adjustment.
> > 
> > Now, true, this is an error, but it is a pretty small one. Even at the
> > maximum 500ppm value, it only results in an error of 76 parts per
> > billion. As you'll see below, that tends to be less error then what we
> > get from the clock granularity. Is there something else I'm missing here
> > or is this really the core issue you're concerned with?
> 
> The error accumulates and there is no good reason to do this for the 
> common case.

I agree, but we can also easily resolve this by scaling the ppm
adjustment to the interval length, rather than just applying it as
usec/sec. No?

So instead of:
adjusted_length = NSEC_PER_SEC + time_adj

We could do:
adjusted_length = ntp_interval_length + 
(ntp_interval_length * time_adj)/MILLION


So this fixes the time_adj scaling error, while letting us be more
flexible w/ our interval length, so we can better map the requested
length to the clocksource granularity, lessening the granularity error.
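
A small worked sketch of the difference, under simplifying assumptions: the adjustment is taken as a plain ppm value (the kernel's internal units differ), and the demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_MILLION 1000000LL

/* Adjustment in ns when ppm is applied as if the interval were 1 second. */
static long long demo_unscaled_adj_ns(long long ppm)
{
	return ppm * 1000LL; /* 1 ppm of 10^9 ns is 1000 ns */
}

/* Adjustment in ns when ppm is scaled to the actual interval length. */
static long long demo_scaled_adj_ns(long long interval_ns, long long ppm)
{
	return interval_ns * ppm / DEMO_MILLION;
}
```

For the 999847467 ns interval mentioned earlier and a 10 ppm adjustment, the scaled form yields 9998 ns (integer division) where the unscaled form adds 10000 ns; for an exact 1 second interval the two agree.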


> > HZ=1000 CLOCK_TICK_ADJUST=-152533
> > jiffies         467 ppb error
> > jiffies NOHZ    467 ppb error
> > pit             0 ppb error
> > pit NOHZ        0 ppb error
> > acpi_pm         -280 ppb error
> > acpi_pm NOHZ    279 ppb error
> > 
> > HZ=1000 CLOCK_TICK_ADJUST=0
> > jiffies         153000 ppb error
> > jiffies NOHZ    153000 ppb error
> > pit             152533 ppb error
> > pit NOHZ        0 ppb error
> > acpi_pm         -127112 ppb error
> > acpi_pm NOHZ    279 ppb error
> > 
> > So you are right, w/ pit & NO_HZ, the granularity error is always very
> > small both with or without CLOCK_TICK_ADJUST. 
> 
> If you change the frequency of acpi_pm to 3579000 you'll get this:
> 
> HZ=1000 CLOCK_TICK_ADJUST=0
> jiffies         153000 ppb error
> jiffies NOHZ    153000 ppb error
> pit             152533 ppb error
> pit NOHZ        0 ppb error
> acpi_pm         0 ppb error
> acpi_pm NOHZ    0 ppb error
> 
> HZ=1000 CLOCK_TICK_ADJUST=-152533
> jiffies         0 ppb error
> jiffies NOHZ    466 ppb error
> pit             -467 ppb error
> pit NOHZ        -1 ppb error
> acpi_pm         126407 ppb error
> acpi_pm NOHZ    22 ppb error

Right, but the acpi_pm's frequency isn't 3579000. It's supposed to be
3579545. That is injecting error, and I believe it to be different than
what I'm claiming is achieved by CLOCK_TICK_ADJUST.

Further, the specific example above was agreeing with you that
CLOCK_TICK_ADJUST=0 looks ok for NOHZ, with the exception of
the jiffies clocksource.

That one still has the 153ppm drift. What do we do about that?


> CLOCK_TICK_ADJUST has only any meaning for PIT (and indirectly for 
> jiffies). For every other clock you just add some random value, where 
> it doesn't do _any_ good.

While not the main point, the interesting part of the aside I included
about the acpi_pm is that, because the ACPI PM's frequency is a multiple of
the PIT's (and thus jiffies), the same granularity issues arise
(although to a lesser degree). That is why CLOCK_TICK_ADJUST helps it in
the !NOHZ case.

> The worst case error there will always be (ntp_hz/freq/2*10^9nsec), all 
> you do with CLOCK_TICK_ADJUST is to do shift it around, but it doesn't 
> actually fix the error - it's still there.
> 
> > However, without CLOCK_TICK_ADJUST, the jiffies error increases for all
> > values of HZ except 100 (which at first seems odd, but seems to be due
> > to loss from rounding in the ACTHZ calculation).
> 
> jiffies depends on the timer resolution, so it will practically produce 
> the same results as PIT (assuming it's used to generate the timer tick).
> 
> > One interesting surprise in the data: With CLOCK_TICK_ADJUST=0, the
> > acpi_pm's error frequency shot up in the !NO_HZ cases. This ends up
> > being due to the acpi_pm being a very close to a multiple (3x) of the
> > pit frequency, so CLOCK_TICK_ADJUST helps it as well.
> 
> What exactly does it help with?
> All you are doing is number cosmetics, it has _no_ practically value and 
> only decreases the quality of timekeeping.

Unless you want to clarify what aspect of "quality" you're talking
about, I'd very much disagree with your claim. This is not cosmetics,
this is about addressin

Re: [PATCH 1/5] signal(x86_32): Improve the signal stack overflow check

2008-02-18 Thread Shi Weihua


Ingo Molnar wrote:
> * Shi Weihua <[EMAIL PROTECTED]> wrote:
> 
>> We need to check for stack overflow only when the signal is on stack. 
>> So we can improve the patch "http://lkml.org/lkml/2007/11/27/101" as 
>> following.
> 
> hm, does this address Roland's observations at:
> 
>http://lkml.org/lkml/2007/11/28/13
> 
> ?

Yes. Please check it.

> 
>   Ingo
> 
> 
> 



Re: [PATCH] documentation: fix firmware_sample_firmware_class to build

2008-02-18 Thread David Rientjes
Change sysfs_remove_bin_file() to have a return value of void in the 
!CONFIG_SYSFS case, matching the return value of the same function with 
the opposite configuration.

Also removes unnecessary ';' in empty void functions.
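
A minimal sketch of the stub pattern being fixed, using hypothetical demo_* names and a made-up DEMO_CONFIG_SYSFS switch rather than the real sysfs declarations. The idea is that the stub has the same return type as the real declaration, so callers compile identically in both configurations.

```c
#include <assert.h>

struct demo_kobject { int refs; };

/* Define DEMO_CONFIG_SYSFS to select the "real" declaration instead. */
#ifdef DEMO_CONFIG_SYSFS
void demo_remove_bin_file(struct demo_kobject *kobj);
#else
/* Stub: same void return type as the real declaration, empty body. */
static inline void demo_remove_bin_file(struct demo_kobject *kobj)
{
	(void)kobj;
}
#endif

/* A caller that does not care which variant was selected. */
static int demo_teardown(struct demo_kobject *kobj)
{
	demo_remove_bin_file(kobj);
	return kobj->refs;
}
```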

Cc: Randy Dunlap <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/linux/sysfs.h |   11 +++
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -93,7 +93,7 @@ int __must_check sysfs_create_file(struct kobject *kobj,
   const struct attribute *attr);
 int __must_check sysfs_chmod_file(struct kobject *kobj, struct attribute *attr,
  mode_t mode);
-void sysfs_remove_file(struct kobject *kobj, const struct attribute *attr);
+void wsysfs_remove_file(struct kobject *kobj, const struct attribute *attr);
 
 int __must_check sysfs_create_bin_file(struct kobject *kobj,
   struct bin_attribute *attr);
@@ -131,7 +131,6 @@ static inline int sysfs_create_dir(struct kobject *kobj)
 
 static inline void sysfs_remove_dir(struct kobject *kobj)
 {
-   ;
 }
 
 static inline int sysfs_rename_dir(struct kobject *kobj, const char *new_name)
@@ -160,7 +159,6 @@ static inline int sysfs_chmod_file(struct kobject *kobj,
 static inline void sysfs_remove_file(struct kobject *kobj,
 const struct attribute *attr)
 {
-   ;
 }
 
 static inline int sysfs_create_bin_file(struct kobject *kobj,
@@ -169,10 +167,9 @@ static inline int sysfs_create_bin_file(struct kobject *kobj,
return 0;
 }
 
-static inline int sysfs_remove_bin_file(struct kobject *kobj,
-   struct bin_attribute *attr)
+static inline void sysfs_remove_bin_file(struct kobject *kobj,
+struct bin_attribute *attr)
 {
-   return 0;
 }
 
 static inline int sysfs_create_link(struct kobject *kobj,
@@ -183,7 +180,6 @@ static inline int sysfs_create_link(struct kobject *kobj,
 
 static inline void sysfs_remove_link(struct kobject *kobj, const char *name)
 {
-   ;
 }
 
 static inline int sysfs_create_group(struct kobject *kobj,
@@ -195,7 +191,6 @@ static inline int sysfs_create_group(struct kobject *kobj,
 static inline void sysfs_remove_group(struct kobject *kobj,
  const struct attribute_group *grp)
 {
-   ;
 }
 
 static inline int sysfs_add_file_to_group(struct kobject *kobj,


[git patches] IDE fixes

2008-02-18 Thread Bartlomiej Zolnierkiewicz

Hi,

This update fixes all known/open 2.6.25 IDE regressions + few other things:

- fix ide-cd cd/dvd burning regression (Kiyoshi Ueda)

- fix falconide/macide regressions (Geert Uytterhoeven)

- another device needs HPA workaround (Mikko Rapeli)

- ht6560b bugfixes (Jan Evert van Grootheest)

- new PCI id (used by Everex Cloudbook) for via82cxxx (Andrew Smith)


Linus, please pull from:

master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6.git/

to receive the following updates:

 MAINTAINERS|2 +-
 drivers/ata/libata-core.c  |1 +
 drivers/ide/ide-cd.c   |6 +-
 drivers/ide/ide-disk.c |1 +
 drivers/ide/ide-generic.c  |6 --
 drivers/ide/legacy/falconide.c |4 +++-
 drivers/ide/legacy/ht6560b.c   |   25 +
 drivers/ide/legacy/macide.c|2 +-
 drivers/ide/pci/via82cxxx.c|1 +
 include/linux/Kbuild   |2 +-
 include/linux/hdsmart.h|4 ++--
 include/linux/pci_ids.h|1 +
 12 files changed, 34 insertions(+), 21 deletions(-)


Andrew Smith (1):
  via82cxxx: add new PCI id for cx700

Bartlomiej Zolnierkiewicz (2):
  falconide: locking bugfix
  linux/hdsmart.h: fix goofups (take 2)

Borislav Petkov (1):
  MAINTAINERS: update ide-cd maintainer's email address

Geert Uytterhoeven (1):
  ide: Add missing base addresses for falconide and macide

Jan Evert van Grootheest (2):
  ht6560b can only do up to PIO mode 4
  ht6560b: force prefetch for some devices

Kiyoshi Ueda (1):
  ide-cd: fix missing residual count setting in DMA mode

Mikko Rapeli (1):
  ide/libata: ST310211A has buggy HPA too


diff --git a/MAINTAINERS b/MAINTAINERS
index 1d2edb4..082d1ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1924,7 +1924,7 @@ S:Maintained
 
 IDE/ATAPI CDROM DRIVER
 P: Borislav Petkov
-M: [EMAIL PROTECTED]
+M: [EMAIL PROTECTED]
 L: [EMAIL PROTECTED]
 S: Maintained
 
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index beaa3a9..f46eb6f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4190,6 +4190,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
/* Devices which report 1 sector over size HPA */
{ "ST340823A",  NULL,   ATA_HORKAGE_HPA_SIZE, },
{ "ST320413A",  NULL,   ATA_HORKAGE_HPA_SIZE, },
+   { "ST310211A",  NULL,   ATA_HORKAGE_HPA_SIZE, },
 
/* Devices which get the IVB wrong */
{ "QUANTUM FIREBALLlct10 05", "A03.0900", ATA_HORKAGE_IVB, },
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 354c91d..310e497 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1207,9 +1207,13 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 end_request:
if (blk_pc_request(rq)) {
unsigned long flags;
+   unsigned int dlen = rq->data_len;
+
+   if (dma)
+   rq->data_len = 0;
 
spin_lock_irqsave(&ide_lock, flags);
-   if (__blk_end_request(rq, 0, rq->data_len))
+   if (__blk_end_request(rq, 0, dlen))
BUG();
HWGROUP(drive)->rq = NULL;
spin_unlock_irqrestore(&ide_lock, flags);
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index aed8b31..8f5bed4 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -397,6 +397,7 @@ static inline int idedisk_supports_lba48(const struct hd_driveid *id)
 static const struct drive_list_entry hpa_list[] = {
{ "ST340823A",  NULL },
{ "ST320413A",  NULL },
+   { "ST310211A",  NULL },
{ NULL, NULL }
 };
 
diff --git a/drivers/ide/ide-generic.c b/drivers/ide/ide-generic.c
index 709b9e4..9ebec08 100644
--- a/drivers/ide/ide-generic.c
+++ b/drivers/ide/ide-generic.c
@@ -17,9 +17,6 @@ static int __init ide_generic_init(void)
u8 idx[MAX_HWIFS];
int i;
 
-   if (ide_hwifs[0].io_ports[IDE_DATA_OFFSET])
-   ide_get_lock(NULL, NULL); /* for atari only */
-
for (i = 0; i < MAX_HWIFS; i++) {
ide_hwif_t *hwif = &ide_hwifs[i];
 
@@ -31,9 +28,6 @@ static int __init ide_generic_init(void)
 
ide_device_add_all(idx, NULL);
 
-   if (ide_hwifs[0].io_ports[IDE_DATA_OFFSET])
-   ide_release_lock(); /* for atari only */
-
return 0;
 }
 
diff --git a/drivers/ide/legacy/falconide.c b/drivers/ide/legacy/falconide.c
index f044048..8949ce7 100644
--- a/drivers/ide/legacy/falconide.c
+++ b/drivers/ide/legacy/falconide.c
@@ -54,7 +54,7 @@ static void __init falconide_setup_ports(hw_regs_t *hw)
for (i = 1; i < 8; i++)
hw->io_ports[i] = ATA_HD_BASE + 1 + i * 4;
 
-   hw->io_ports[IDE_CONTROL_OFFSET] = ATA_HD_CONTROL;
+   hw->io_ports[IDE_CONTROL_OFFSET] = ATA_HD_BASE + ATA_HD_CONTROL;
 
hw->irq =

Re: [PATCH] ide-cd: fix missing residual count setting in DMA mode (Was 2.6.25-rc1/2 CD/DVD burning broken)

2008-02-18 Thread Bartlomiej Zolnierkiewicz
On Tuesday 19 February 2008, Kiyoshi Ueda wrote:
> Hi,
> 
> On Mon, 18 Feb 2008 23:37:48 +0100, Borislav Petkov wrote:
> > On Mon, Feb 18, 2008 at 09:20:41PM +0100, Borislav Petkov wrote:
> > > On Mon, Feb 18, 2008 at 01:58:27PM -0500, Kiyoshi Ueda wrote:
> > > > Hi Andreas,
> > > > 
> > > > On Sat, 16 Feb 2008 21:52:21 +0100, Andreas Schwab wrote:
> > > > > Since commit aaa04c28cb9a1efd42541fdb7ab648231c2a2263 [blk_end_request:
> > > > > changing ide-cd (take 4)] I cannot burn any CD/DVD any more, getting the
> > > > > following error from wodim:
> > > > > 
> > > > > Errno: 0 (Success), write_g1 scsi sendcmd: no error
> > > > > CDB:  2A 00 00 00 00 00 00 00 1F 00
> > > > > status: 0x2 (CHECK CONDITION)
> > > > > Sense Bytes: 70 00 05 00 00 00 00 0E 00 00 00 00 21 02 00 00
> > > > > Sense Key: 0x5 Illegal Request, Segment 0
> > > > > Sense Code: 0x21 Qual 0x02 (invalid address for write) Fru 0x0
> > > > > Sense flags: Blk 0 (not valid) 
> > > > > resid: 63488
> > > > 
> > > > Could you try this patch?
> > > > I've only done a compile test, so this patch may not work.
> > > > 
> > > > During the conversion to blk_end_request, the code was changed
> > > > *not* to set rq->data_len = 0.
> > > > I removed that part because I thought it was just a trigger to
> > > > call post_transform_command().  However, since data_len can be
> > > > used as a residual length of the transfer, it might have to remain
> > > > there.
> > > > Actually, wodim seems to check the residual count to see how far it wrote
> > > > (e.g. wodim/wodim.c:write_track_data()).
> > > 
> > > and there seems to be some discrepancy between the different burning tools,
> > > for I just did a test burn of a cdimage with cdrdao under 2.6.25-rc2 and it
> > > _works_ flawlessly:
> > > 
> > > # cdrdao write --device /dev/hdc test.toc
> > > 
> > > Cdrdao version 1.2.2 - (C) Andreas Mueller <[EMAIL PROTECTED]>
> > >   SCSI interface library - (C) Joerg Schilling
> > >   Paranoia DAE library - (C) Monty
> > > 
> > > Check http://cdrdao.sourceforge.net/drives.html#dt for current driver tables.
> > > 
> > > Using libscg version 'ubuntu-0.8ubuntu1'
> > > 
> > > /dev/hdc: TOSHIBA ODD-DVD SD-R6372Rev: 1730
> > > Using driver: Generic SCSI-3/MMC - Version 2.0 (options 0x)
> > > 
> > > Starting write at speed 4...
> > > Pausing 10 seconds - hit CTRL-C to abort.
> > > Process can be aborted with QUIT signal (usually CTRL-\).
> > > Turning BURN-Proof on
> > > Executing power calibration...
> > > Power calibration successful.
> > > Writing track 01 (mode MODE1_RAW/MODE1_RAW )...
> > > Wrote 1 of 18 MB (Buffers 100%  94%).
> > > Wrote 2 of 18 MB (Buffers 100%  94%).
> > > Wrote 3 of 18 MB (Buffers 100%  94%).
> > > Wrote 4 of 18 MB (Buffers 100%  94%).
> > > Wrote 5 of 18 MB (Buffers 100%  94%).
> > > Wrote 6 of 18 MB (Buffers 100%  94%).
> > > Wrote 7 of 18 MB (Buffers 100%  94%).
> > > Wrote 8 of 18 MB (Buffers 100%  94%).
> > > Wrote 9 of 18 MB (Buffers 100%  94%).
> > > Wrote 10 of 18 MB (Buffers 100%  94%).
> > > Wrote 11 of 18 MB (Buffers 100%  94%).
> > > Wrote 12 of 18 MB (Buffers 100%  94%).
> > > Wrote 13 of 18 MB (Buffers 100%  94%).
> > > Wrote 14 of 18 MB (Buffers 100%  94%).
> > > Wrote 15 of 18 MB (Buffers 100%  94%).
> > > Wrote 16 of 18 MB (Buffers 100%  94%).
> > > Wrote 17 of 18 MB (Buffers 100%  94%).
> > > Wrote 18 of 18 MB (Buffers 100%  94%).
> > > 
> > > Wrote 8056 blocks. Buffer fill min 100%/max 100%.
> > > Flushing cache...
> > > Writing finished successfully.
> > > 
> > > > This patch brings back the rq->data_len = 0.
> > > > 
> > > > --- 2.6.25-rc2/drivers/ide/ide-cd.c 2008-02-15 15:57:20.0 -0500
> > > > +++ ide-fix/drivers/ide/ide-cd.c 2008-02-18 01:23:40.0 -0500
> > > > @@ -1207,9 +1207,13 @@ static ide_startstop_t cdrom_newpc_intr(
> > > >  end_request:
> > > > if (blk_pc_request(rq)) {
> > > > unsigned long flags;
> > > > +   unsigned int dlen = rq->data_len;
> > > > +
> > > > +   if (dma)
> > > > +   rq->data_len = 0;
> > > >  
> > > > spin_lock_irqsave(&ide_lock, flags);
> > > > -   if (__blk_end_request(rq, 0, rq->data_len))
> > > > +   if (__blk_end_request(rq, 0, dlen))
> > > > BUG();
> > > > HWGROUP(drive)->rq = NULL;
> > > > spin_unlock_irqrestore(&ide_lock, flags);
> > > > 
> > > > Thanks,
> > > > Kiyoshi Ueda
> > > 
> > > next will test the one above...
> > 
> > 
> > confirmed here too - burning succeeds both with wodim and cdrdao.
> 
> Thank you for testing, Laura, Andreas, Boris.
> And I'm really sorry about the bug, all.
> 
> 
> Bart,
> Please review and apply the patch below to fix the bug.
> 
> 
> [PATCH] ide-cd: fix missing residual count setting in DMA mode
> 
> This patch fixes the missing residual count setting in DMA mode,
> which was introduced during the conversion

Re: linux-next: Tree for Feb 18

2008-02-18 Thread Bartlomiej Zolnierkiewicz
On Monday 18 February 2008, David Miller wrote:
> From: Stephen Rothwell <[EMAIL PROTECTED]>
> Date: Mon, 18 Feb 2008 19:08:41 +1100
> 
> > I have created today's linux-next tree at
> > git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git.
> 
> The patch below fixes the allmodconfig build on sparc64
> for me.
> 
> I notice on the build status page that sparc64 allmodconfig
> fails in the CODA filesystem bits.  I suspect this is some
> cross-build issue, and even moreso it appears the "u_quad"
> definition is to blame and the way that gets defined in the
> CODA fs headers is beyond questionable :-/
> 
> Anyways, here is the build fix:
> 
> [SPARC64]: Add ide_default_irq() implementation.
> 
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
> 
> diff --git a/include/asm-sparc64/ide.h b/include/asm-sparc64/ide.h
> index c5fdabe..5bfb064 100644
> --- a/include/asm-sparc64/ide.h
> +++ b/include/asm-sparc64/ide.h
> @@ -24,6 +24,11 @@
>  # endif
>  #endif
>  
> +static inline int ide_default_irq(unsigned long base)
> +{
> + return 0;
> +}
> +
>  #define __ide_insl(data_reg, buffer, wcount) \
>   __ide_insw(data_reg, buffer, (wcount)<<1)
>  #define __ide_outsl(data_reg, buffer, wcount) \

Thanks, I fixed it in the "guilty" patch to preserve bisectability.

From: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>
Subject: [PATCH] ide: add CONFIG_IDE_ARCH_OBSOLETE_DEFAULTS (take 2)

* Add CONFIG_IDE_ARCH_OBSOLETE_DEFAULTS to drivers/ide/Kconfig and use
  it instead of defining IDE_ARCH_OBSOLETE_DEFAULTS in .

v2:
* Define ide_default_irq() in ide-probe.c/ns87415.c if not already defined
  and drop defining ide_default_irq() for CONFIG_IDE_ARCH_OBSOLETE_DEFAULTS=n.

  [ Thanks to Stephen Rothwell and David Miller for noticing the problem. ]

Cc: Stephen Rothwell <[EMAIL PROTECTED]>
Cc: David Miller <[EMAIL PROTECTED]>
Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>
---
 drivers/ide/Kconfig |3 +++
 drivers/ide/ide-probe.c |4 
 drivers/ide/ide.c   |4 
 drivers/ide/pci/ns87415.c   |4 
 include/asm-alpha/ide.h |3 ---
 include/asm-ia64/ide.h  |2 --
 include/asm-m32r/ide.h  |2 --
 include/asm-mips/mach-generic/ide.h |2 --
 include/asm-powerpc/ide.h   |2 --
 include/asm-x86/ide.h   |2 --
 include/linux/ide.h |7 ---
 11 files changed, 15 insertions(+), 20 deletions(-)

Index: b/drivers/ide/Kconfig
===
--- a/drivers/ide/Kconfig
+++ b/drivers/ide/Kconfig
@@ -1099,6 +1099,9 @@ config BLK_DEV_IDEDMA
 config IDE_ARCH_OBSOLETE_INIT
def_bool ALPHA || (ARM && !ARCH_L7200) || BLACKFIN || X86 || IA64 || 
M32R || MIPS || PARISC || PPC || (SUPERH64 && BLK_DEV_IDEPCI) || SPARC
 
+config IDE_ARCH_OBSOLETE_DEFAULTS
+   def_bool ALPHA || X86 || IA64 || M32R || MIPS || PPC32
+
 endif
 
 config BLK_DEV_HD_ONLY
Index: b/drivers/ide/ide-probe.c
===
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -1229,6 +1229,10 @@ static void drive_release_dev (struct de
complete(&drive->gendev_rel_comp);
 }
 
+#ifndef ide_default_irq
+#define ide_default_irq(irq) 0
+#endif
+
 static int hwif_init(ide_hwif_t *hwif)
 {
int old_irq;
Index: b/drivers/ide/ide.c
===
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -167,6 +167,10 @@ static void ide_port_init_devices_data(i
}
 }
 
+#ifndef CONFIG_IDE_ARCH_OBSOLETE_DEFAULTS
+# define ide_default_io_base(index)(0)
+# define ide_init_default_irq(base)(0)
+#endif
 
 /*
  * init_ide_data() sets reasonable default values into all fields
Index: b/drivers/ide/pci/ns87415.c
===
--- a/drivers/ide/pci/ns87415.c
+++ b/drivers/ide/pci/ns87415.c
@@ -181,6 +181,10 @@ static int ns87415_ide_dma_setup(ide_dri
return 1;
 }
 
+#ifndef ide_default_irq
+#define ide_default_irq(irq) 0
+#endif
+
 static void __devinit init_hwif_ns87415 (ide_hwif_t *hwif)
 {
struct pci_dev *dev = to_pci_dev(hwif->dev);
Index: b/include/asm-alpha/ide.h
===
--- a/include/asm-alpha/ide.h
+++ b/include/asm-alpha/ide.h
@@ -13,9 +13,6 @@
 
 #ifdef __KERNEL__
 
-
-#define IDE_ARCH_OBSOLETE_DEFAULTS
-
 static inline int ide_default_irq(unsigned long base)
 {
switch (base) {
Index: b/include/asm-ia64/ide.h
===
--- a/include/asm-ia64/ide.h
+++ b/include/asm-ia64/ide.h
@@ -16,8 +16,6 @@
 
 #include 
 
-#define IDE_ARCH_OBSOLETE_DEFAULTS
-
 static inline int ide_default_irq(unsigned long base)
 {
switch (base) {
Index: b/include/asm-m32r/ide.h
===

Re: [PATCH] next-20080218 build failure at pmac_ide_macio_attach ()

2008-02-18 Thread Bartlomiej Zolnierkiewicz
On Monday 18 February 2008, Kamalesh Babulal wrote:
> Hi,
> 
> The next-20080218 kernel build fails on the powerpc(s)
> 
> drivers/ide/ppc/pmac.c: In function ‘pmac_ide_macio_attach’:
> drivers/ide/ppc/pmac.c:1094: error: conversion to non-scalar type requested
> drivers/ide/ppc/pmac.c: In function ‘pmac_ide_pci_attach’:
> drivers/ide/ppc/pmac.c:1232: error: conversion to non-scalar type requested
> make[3]: *** [drivers/ide/ppc/pmac.o] Error 1
> make[2]: *** [drivers/ide/ppc] Error 2
> make[1]: *** [drivers/ide] Error 2
> make: *** [drivers] Error 2
> 
> I have tested this patch for build failure only.
> 
> Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]>
> ---
> --- linux-2.6.25-rc1/drivers/ide/ppc/pmac.c   2008-02-18 18:41:48.0 
> +0530
> +++ linux-2.6.25-rc1/drivers/ide/ppc/~pmac.c  2008-02-18 19:20:37.0 
> +0530
> @@ -1091,7 +1091,7 @@ pmac_ide_macio_attach(struct macio_dev *
>   int irq, rc;
>   hw_regs_t hw;
>  
> - pmif = (struct pmac_ide_hwif)kzalloc(sizeof(*pmif), GFP_KERNEL);
> + pmif = (struct pmac_ide_hwif*)kzalloc(sizeof(*pmif), GFP_KERNEL);
>   if (pmif == NULL)
>   return -ENOMEM;
>  
> @@ -1229,7 +1229,7 @@ pmac_ide_pci_attach(struct pci_dev *pdev
>   return -ENODEV;
>   }
>  
> - pmif = (struct pmac_ide_hwif)kzalloc(sizeof(*pmif), GFP_KERNEL);
> + pmif = (struct pmac_ide_hwif*)kzalloc(sizeof(*pmif), GFP_KERNEL);
>   if (pmif == NULL)
>   return -ENOMEM;
>  

Thanks, I integrated it with the "guilty" patch to preserve bisectability.

From: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>
Subject: [PATCH] ide-pmac: dynamically allocate struct pmac_ide_hwif instances 
(take 2)

* Dynamically allocate struct pmac_ide_hwif instances in pmac_ide_macio_attach()
  and pmac_ide_pci_attach(), then remove no longer needed pmac_ide[].

v2:
* Build fix from Kamalesh Babulal.

Cc: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
Cc: Kamalesh Babulal <[EMAIL PROTECTED]>
Signed-off-by: Bartlomiej Zolnierkiewicz <[EMAIL PROTECTED]>
---
 drivers/ide/ppc/pmac.c |   49 +
 1 file changed, 33 insertions(+), 16 deletions(-)

Index: b/drivers/ide/ppc/pmac.c
===
--- a/drivers/ide/ppc/pmac.c
+++ b/drivers/ide/ppc/pmac.c
@@ -79,8 +79,6 @@ typedef struct pmac_ide_hwif {

 } pmac_ide_hwif_t;
 
-static pmac_ide_hwif_t pmac_ide[MAX_HWIFS];
-
 enum {
controller_ohare,   /* OHare based */
controller_heathrow,/* Heathrow/Paddington */
@@ -1094,29 +1092,34 @@ pmac_ide_macio_attach(struct macio_dev *
int i, rc;
hw_regs_t hw;
 
+   pmif = kzalloc(sizeof(*pmif), GFP_KERNEL);
+   if (pmif == NULL)
+   return -ENOMEM;
+
i = 0;
-   while (i < MAX_HWIFS && (ide_hwifs[i].io_ports[IDE_DATA_OFFSET] != 0
-   || pmac_ide[i].node != NULL))
+   while (i < MAX_HWIFS && (ide_hwifs[i].io_ports[IDE_DATA_OFFSET] != 0))
++i;
if (i >= MAX_HWIFS) {
printk(KERN_ERR "ide-pmac: MacIO interface attach with no 
slot\n");
printk(KERN_ERR "  %s\n", mdev->ofdev.node->full_name);
-   return -ENODEV;
+   rc = -ENODEV;
+   goto out_free_pmif;
}
 
-   pmif = &pmac_ide[i];
hwif = &ide_hwifs[i];
 
if (macio_resource_count(mdev) == 0) {
printk(KERN_WARNING "ide%d: no address for %s\n",
   i, mdev->ofdev.node->full_name);
-   return -ENXIO;
+   rc = -ENXIO;
+   goto out_free_pmif;
}
 
/* Request memory resource for IO ports */
if (macio_request_resource(mdev, 0, "ide-pmac (ports)")) {
printk(KERN_ERR "ide%d: can't request mmio resource !\n", i);
-   return -EBUSY;
+   rc = -EBUSY;
+   goto out_free_pmif;
}

/* XXX This is bogus. Should be fixed in the registry by checking
@@ -1166,11 +1169,15 @@ pmac_ide_macio_attach(struct macio_dev *
iounmap(pmif->dma_regs);
macio_release_resource(mdev, 1);
}
-   memset(pmif, 0, sizeof(*pmif));
macio_release_resource(mdev, 0);
+   kfree(pmif);
}
 
return rc;
+
+out_free_pmif:
+   kfree(pmif);
+   return rc;
 }
 
 static int
@@ -1223,30 +1230,36 @@ pmac_ide_pci_attach(struct pci_dev *pdev
printk(KERN_ERR "ide-pmac: cannot find MacIO node for Kauai ATA 
interface\n");
return -ENODEV;
}
+
+   pmif = kz

Re: Optiarc DVD RW AD-5200A audio playing

2008-02-18 Thread Bartlomiej Zolnierkiewicz
On Monday 18 February 2008, Stefan Bader wrote:
> Borislav Petkov wrote:
> > On Sat, Feb 16, 2008 at 04:24:01PM +0100, Bartlomiej Zolnierkiewicz wrote:
> >> Hi,
> >>
> >> On Saturday 16 February 2008, Borislav Petkov wrote:
> >>> On Fri, Feb 15, 2008 at 02:53:27PM -0500, Stefan Bader wrote:
>  Hello Borislav,
> 
>  I worked on a problem with a DVD drive (model=Optiarc DVD RW AD-5200A)
>  which obviously has the same problem as some Matshita drives. The
>  following patch was reported to enable audio playing on this drive.
>  Would this approach be suitable for upstream or are there other
>  solutions to this problem?
> 
>  Regards,
>  Stefan
> 
>  --- a/drivers/ide/ide-cd.c
>  +++ b/drivers/ide/ide-cd.c
>  @@ -2988,7 +2988,8 @@ int ide_cdrom_probe_capabilities (ide_drive_t 
>  *drive)
>   if (strcmp(drive->id->model, "MATSHITADVD-ROM SR-8187") == 0 ||
>   strcmp(drive->id->model, "MATSHITADVD-ROM SR-8186") == 0 ||
>   strcmp(drive->id->model, "MATSHITADVD-ROM SR-8176") == 0 ||
>  -strcmp(drive->id->model, "MATSHITADVD-ROM SR-8174") == 0)
>  +strcmp(drive->id->model, "MATSHITADVD-ROM SR-8174") == 0 ||
>  +strcmp(drive->id->model, "Optiarc DVD RW AD-5200A") == 0)
>   CDROM_CONFIG_FLAGS(drive)->audio_play = 1;
> 
>   #if ! STANDARD_ATAPI
> >>> Hi Stefan,
> >>>
> >>> just to make sure that the audioplay bit is not set in the capabilities 
> >>> page,
> >>> can you please try the following patch applied against 2.6.25-rc2 and 
> >>> send me
> >>> the output. Thanks!
> >>>
> >>> @Bart: by the way, this cdi->mask thingy is kinda unintuitive doing double
> >>> negation to check whether a feature is supported or not. Yeah, this comes 
> >>> from
> >>> "above," i.e. uniform cdrom layer. But still, shouldn't we use a 
> >>> cdi->caps_flags
> >>> or something similar instead, which mirrors the caps page bits setting?
> >> It seems so (at least having negative flags is very unintuitive) but they
> >> might be some reason for this ugliness, Jens?
> >>
> >> [ Please also remember that since cdrom layer is _uniform_ it may be not
> >>   possible and/or desirable to have 1-1 mapping between caps page bits
> >>   and the future cdi->caps_flags. ]
> >>
> >>> commit 435f0f4496a1b32af2d542f43b2370a890fe2f83
> >>> Author: Borislav Petkov <[EMAIL PROTECTED]>
> >>> Date:   Sat Feb 16 09:56:36 2008 +0100
> >>>
> >>> ide-cd: Enable audio play quirk for Optiarc DVD RW AD-5200A drive
> >>> 
> >>> Reported-by: Stefan Bader <[EMAIL PROTECTED]>
> >>> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> >>>
> >>> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> >>> index f77db6b..2c9d06e 100644
> >>> --- a/drivers/ide/ide-cd.c
> >>> +++ b/drivers/ide/ide-cd.c
> >>> @@ -1750,6 +1750,10 @@ int ide_cdrom_probe_capabilities (ide_drive_t 
> >>> *drive)
> >>>   cdi->mask &= ~(CDC_DVD_RAM | CDC_RAM);
> >>>   if (buf[8 + 3] & 0x10)
> >>>   cdi->mask &= ~CDC_DVD_R;
> >>> + if (!(buf[8 + 4] & 0x01)) {
> >> Hmm, shouldn't there be '&& (cd->cd_flags & IDE_CD_FLAG_PLAY_AUDIO_OK)'
> >> to prevent false positives?
> > 
> > I wanted to see whether the caps page reports the audioplay bit off...
> > 
> >>> + printk(KERN_INFO "ide-cd: audio play not advertised in caps 
> >>> page,"
> >> Would be nice to also printk() the device name.
> > 
> > ... but printing the device model is actually a good idea and this will 
> > rule out
> > false positives, so Stefan, please drop the previous patch and test the 
> > updated
> > one below. Thanks.
> > 
> > 
> 
> Hi Borislav,
> 
> the problem is that I don't own this drive myself and the owner is
> running a 2.6.22 kernel and is normally not doing any kernel compiles.
> But I could provide him a modified patch.
> Though, if you just want to know whether the cap bit was really unset, I
> think we know this already. When I got the problem report we checked
> /proc/sys/dev/cdrom/info and that showed the "Can play audio" bit as 0.
> Which is the reason I gave the owner the patch for adding the model to
> the exemption list. And from his feedback I take that the drive plays
> audio tracks with the patch in use.

Borislav, I guess that this is good enough proof that the audioplay bit is off.

Could you please send me the final version of the patch?

> Stefan
> 
> > commit 6cc44b0ce5c9270b15d456eb9ffa91b855e4e0d0
> > Author: Borislav Petkov <[EMAIL PROTECTED]>
> > Date:   Sat Feb 16 09:56:36 2008 +0100
> > 
> > ide-cd: Enable audio play quirk for Optiarc DVD RW AD-5200A drive
> > 
> > Reported-by: Stefan Bader <[EMAIL PROTECTED]>
> > Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> > 
> > diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> > index f77db6b..4c9984f 100644
> > --- a/drivers/ide/ide-cd.c
> > +++ b/drivers/ide/ide-cd.c
> > @@ -1750,6 +1750,11 @@ int ide_cdrom_

Re: inode leak in 2.6.24?

2008-02-18 Thread Ferenc Wagner
David Chinner <[EMAIL PROTECTED]> writes:

> On Sat, Feb 16, 2008 at 12:18:58AM +0100, Ferenc Wagner wrote:
> 
>> 5 days ago I pulled the git tree (HEAD was
>> 25f666300625d894ebe04bac2b4b3aadb907c861), added two minor patches
>> (the vmsplice fix and the GFS1 exports), compiled and booted the
>> kernel.  Things are working OK, but I noticed that inode usage has
>> been steadily rising since then (see attached graph, unless lost in
>> transit).  The real filesystems used by the machine are XFS.  I wonder
>> if it may be some kind of bug and if yes, whether it has been fixed
>> already.  Feel free to ask for any missing information.
>
> Output of /proc/slabinfo, please. If you could get a sample when the
> machine has just booted, one at the typical post-boot steady state
> as well as one after it has increased well past normal, it would
> help identify what type of inode is increasing differently.

Ugh.  Your message came just a little bit too late, I rebooted the
machine a couple of hours ago for applying an IPv6 patch, without
saving /proc/slabinfo.  The currently running kernel is 2.6.24.2 +
GFS1 exports + IPv6 fix, and I snapshotted /proc/slabinfo approx. 3
hours after reboot.  At least we will see whether this version also
produces the problem, it isn't too different after all (for some
"too").

Btw. there was no steady state with the previous kernel: the increase
started right after reboot, which means that by tomorrow I'll be able
to tell whether it's increasing again or this kernel doesn't exhibit
such an effect.

> Also, can you tell us what metrics you are graphing (i.e. where
> in /proc or /sys you are getting them from)?

I wonder why I assumed everybody knows Munin's graphs by heart...
In short: "inode table size" is the first value from
/proc/sys/fs/inode-nr; "open inodes" is the same minus the second
value.  In other words:

awk '{print "used.value " $1-$2 "\nmax.value " $1}' < /proc/sys/fs/inode-nr
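
For anyone who does not read awk, the same calculation can be sketched in
Python. This is only an illustration: it assumes, as described above, that
the first field of /proc/sys/fs/inode-nr is the inode table size and the
second is the number of unused inodes. The function names are made up for
this example and are not part of Munin.

```python
def parse_inode_nr(text):
    """Return (used, table_size) from the contents of /proc/sys/fs/inode-nr."""
    fields = text.split()
    table_size = int(fields[0])  # first value: inode table size
    unused = int(fields[1])      # second value: unused inodes
    return table_size - unused, table_size


def munin_values(text):
    """Format the two numbers the same way as the awk one-liner."""
    used, table_size = parse_inode_nr(text)
    return "used.value %d\nmax.value %d" % (used, table_size)


# Sample input with made-up numbers, in the two-field format of the proc file.
print(munin_values("1000 250\n"))
```

On a live Linux system the argument would be the text read from
/proc/sys/fs/inode-nr instead of the sample string.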

I'll come back shortly with the new findings.  If nothing turns up,
it's possible to boot up the previous kernel (or -- if needed --
current git) with this IPv6 fix added and check that again.
-- 
Thanks,
Feri.