date:20070705

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Nigel Cunningham

Hi Kyle.

On Friday 06 July 2007 15:01:48 Kyle Moffett wrote:
> On Jul 06, 2007, at 00:03:15, Nigel Cunningham wrote:
> > The kind of thing Linus was talking about would limit you (as  
> > swsusp and uswsusp do now) to only half the amount of memory.
> 
> How so?  Suppose hibernate is implemented like this:

You're not talking about the same thing Linus was suggesting. He just wanted 
a result = sys_snapshot() sort of call. That would limit us to half 
the amount of memory.
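A rough model of the half-of-memory limit: if a single snapshot call copies the whole image into free RAM before anything is written out, every saved page needs a free page to hold its copy. The C sketch below uses hypothetical helper names and assumes the image is simply all in-use pages, ignoring the kernel's own working memory.

```c
#include <assert.h>

/* Simplified model (an assumption, not the real swsusp logic): an atomic
 * in-memory snapshot needs one free page for every page it copies, so the
 * pages being saved can occupy at most half of RAM. */
static unsigned long max_image_pages(unsigned long total_pages)
{
	return total_pages / 2;
}

/* Pages that would have to be released before such a snapshot could be
 * taken; 0 if the in-use pages already fit in half of RAM. */
static unsigned long pages_to_free(unsigned long total_pages,
				   unsigned long used_pages)
{
	unsigned long limit = max_image_pages(total_pages);

	assert(used_pages <= total_pages);
	return used_pages > limit ? used_pages - limit : 0;
}
```

For example, with 1000 pages of RAM of which 700 are in use, 200 pages would have to be freed before such a snapshot could fit.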

I've looked over what you've written below and want to consider it in detail. 
Right now though, I don't have the time. I'll try to get back to you 
promptly.

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


Re: Valgrinding the kernel?

2007-07-05 Thread Jeremy Fitzhardinge


Dan Kegel wrote:

It'd be nice to see if Valgrind could catch uninitialized
references in the kernel, if only to see if Coverity is
missing anything that happens in practice.

Back in December 2002, Valgrind started to run UML:
http://user-mode-linux.sourceforge.net/diary.html
http://marc.info/?l=linux-kernel=104035199923121=2
but it wasn't quite usable, and it seems to have been broken since then.
The last note I could find about this was from Jeff in July 2005:
http://marc.info/?l=linux-kernel=112273702329952=2

Has there been any motion since then? 


Not that I know of.  I think all the pieces are in place now.  The 
original problem was that Valgrind didn't deal with clone and didn't 
have accurate signal support.  I fixed that.  Then the problem was 
dealing with the densely packed small kernel stacks.  Valgrind now has a 
way of registering stack regions, so that it can distinguish between a 
stack switch and a normal function call.
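The point of registering stack regions can be shown with a tiny model. With small kernel stacks packed next to each other, a jump from one stack to its neighbour moves the stack pointer only slightly and looks like an ordinary call; knowing the region boundaries removes the ambiguity. This is a hypothetical userspace illustration, not Valgrind's implementation; in Valgrind itself the registration is done through client requests such as VALGRIND_STACK_REGISTER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a tool that knows where each stack lives. */
struct stack_region {
	uintptr_t start;	/* lowest address of the stack */
	uintptr_t end;		/* one past its highest address */
};

#define MAX_STACKS 16
static struct stack_region stacks[MAX_STACKS];
static size_t nr_stacks;

/* Register a stack region; returns its id, or -1 on error. */
static int stack_register(uintptr_t start, uintptr_t end)
{
	if (nr_stacks == MAX_STACKS || start >= end)
		return -1;
	stacks[nr_stacks].start = start;
	stacks[nr_stacks].end = end;
	return (int)nr_stacks++;
}

/* Id of the registered region containing sp, or -1 if none does. */
static int stack_lookup(uintptr_t sp)
{
	for (size_t i = 0; i < nr_stacks; i++)
		if (sp >= stacks[i].start && sp < stacks[i].end)
			return (int)i;
	return -1;
}

/* A stack-pointer change is a stack switch when the old and new values lie
 * in different registered regions, however close together they are.
 * Movement within one region is ordinary call/return traffic. Unregistered
 * addresses are simply treated as "no switch" in this sketch. */
static int is_stack_switch(uintptr_t old_sp, uintptr_t new_sp)
{
	int from = stack_lookup(old_sp);
	int to = stack_lookup(new_sp);

	return from >= 0 && to >= 0 && from != to;
}
```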


So, I think all it needs now is to scatter some valgrind client requests 
around the kernel and give it a spin.  See, simple ;)


   J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[-mm PATCH 2/8] Memory controller containers setup (v2)

2007-07-05 Thread Balbir Singh


Set up the memory container and add the basic hooks and controls needed to
integrate it with the containers infrastructure.
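The controller finds its per-container data from the container_subsys_state that the containers framework hands it, which is what mem_container_from_cont() does with container_of(). A userspace sketch of the same pattern follows; the macro is a simplified stand-in for the kernel's (no typeof checking), and the field order is changed from the patch so that css sits at a non-zero offset and the pointer adjustment is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(): given a
 * pointer to a member, recover the enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for the structures in the patch. */
struct container_subsys_state {
	int refcnt;
};

struct mem_container {
	unsigned long usage;		/* stand-in for struct res_counter */
	struct container_subsys_state css;
};

/* The framework passes around the embedded css; the controller maps it
 * back to its own per-container structure. */
static struct mem_container *
mem_container_from_css(struct container_subsys_state *css)
{
	return container_of(css, struct mem_container, css);
}
```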

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/container_subsys.h |    6 +
 include/linux/memcontrol.h       |   19 +
 init/Kconfig                     |    8 ++
 mm/Makefile                      |    1 
 mm/memcontrol.c                  |  141 +++
 5 files changed, 175 insertions(+)

diff -puN include/linux/container_subsys.h~mem-control-setup include/linux/container_subsys.h
--- linux-2.6.22-rc6/include/linux/container_subsys.h~mem-control-setup	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/container_subsys.h	2007-07-05 13:45:17.0 -0700
@@ -30,3 +30,9 @@ SUBSYS(ns)
 #endif
 
 /* */
+
+#ifdef CONFIG_CONTAINER_MEM_CONT
+SUBSYS(mem_container)
+#endif
+
+/* */
diff -puN init/Kconfig~mem-control-setup init/Kconfig
--- linux-2.6.22-rc6/init/Kconfig~mem-control-setup	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/init/Kconfig	2007-07-05 13:45:17.0 -0700
@@ -360,6 +360,14 @@ config CONTAINER_NS
   for instance virtual servers and checkpoint/restart
   jobs.
 
+config CONTAINER_MEM_CONT
+   bool "Memory controller for containers"
+   select CONTAINERS
+   select RESOURCE_COUNTERS
+   help
+ Provides a memory controller that manages both page cache and
+ RSS memory.
+
 config PROC_PID_CPUSET
bool "Include legacy /proc//cpuset file"
depends on CPUSETS
diff -puN mm/Makefile~mem-control-setup mm/Makefile
--- linux-2.6.22-rc6/mm/Makefile~mem-control-setup	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/Makefile 2007-07-05 13:45:17.0 -0700
@@ -30,4 +30,5 @@ obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
+obj-$(CONFIG_CONTAINER_MEM_CONT) += memcontrol.o
 
diff -puN /dev/null mm/memcontrol.c
--- /dev/null   2007-06-01 08:12:04.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/memcontrol.c	2007-07-05 13:45:17.0 -0700
@@ -0,0 +1,141 @@
+/* memcontrol.c - Memory Controller
+ *
+ * Copyright IBM Corporation, 2007
+ * Author Balbir Singh <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ */
+
+#include 
+#include 
+#include 
+
+struct container_subsys mem_container_subsys;
+
+/*
+ * The memory controller data structure. The memory controller controls both
+ * page cache and RSS per container. We would eventually like to provide
+ * statistics based on the statistics developed by Rik Van Riel for clock-pro,
+ * to help the administrator determine what knobs to tune.
+ *
+ * TODO: Add a water mark for the memory controller. Reclaim will begin when
+ * we hit the water mark.
+ */
+struct mem_container {
+   struct container_subsys_state css;
+   /*
+* the counter to account for memory usage
+*/
+   struct res_counter res;
+};
+
+/*
+ * A meta page is associated with every page descriptor. The meta page
+ * helps us identify information about the container
+ */
+struct meta_page {
+   struct list_head list;  /* per container LRU list */
+   struct page *page;
+   struct mem_container *mem_container;
+};
+
+
+static inline struct mem_container *mem_container_from_cont(struct container
+   *cnt)
+{
+   return container_of(container_subsys_state(cnt,
+   mem_container_subsys_id), struct mem_container,
+   css);
+}
+
+static ssize_t mem_container_read(struct container *cont, struct cftype *cft,
+   struct file *file, char __user *userbuf, size_t nbytes,
+   loff_t *ppos)
+{
+   return res_counter_read(&mem_container_from_cont(cont)->res,
+   cft->private, userbuf, nbytes, ppos);
+}
+
+static ssize_t mem_container_write(struct container *cont, struct cftype *cft,
+   struct file *file, const char __user *userbuf,
+   size_t nbytes, loff_t *ppos)
+{
+   return res_counter_write(&mem_container_from_cont(cont)->res,
+   cft->private, userbuf, nbytes, ppos);
+}
+
+static struct cftype mem_container_usage = {
+   .name = "mem_usage",
+   .private = RES_USAGE,
+   .read = mem_container_read,
+};
+
+static struct cftype mem_container_limit = {
+   .name = "mem_limit",
+   .private =

[-mm PATCH 8/8] Add switch to control what type of pages to limit (v2)

2007-07-05 Thread Balbir Singh



Choose whether cached pages are accounted for or not. By default, both
mapped and cached pages are accounted for. A new tunable, mem_control_type,
is added.

echo -n 1 > mem_control_type

switches the accounting to account for only mapped pages, and

echo -n 2 > mem_control_type

switches the behaviour back to accounting for both mapped and cached pages.
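A userspace sketch of what the mem_control_type file accepts and how the value is used. It mirrors the enum from the patch and the range check in mem_control_type_write(), with strtoul() standing in for simple_strtoul(); should_charge_cache() is a hypothetical helper that uses == where the patch tests the value with &, which gives the same answer for the two accepted values.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Values mirror the enum in the patch. */
enum {
	MEM_CONTAINER_TYPE_UNSPEC = 0,
	MEM_CONTAINER_TYPE_MAPPED,	/* 1: charge mapped (RSS) pages only */
	MEM_CONTAINER_TYPE_ALL,		/* 2: charge RSS and page cache */
	MEM_CONTAINER_TYPE_MAX,
};

/* Parse a value written to mem_control_type; returns the type or -EINVAL.
 * As in the patch, any trailing character (including '\n') is rejected. */
static long parse_control_type(const char *buf)
{
	char *end;
	unsigned long tmp = strtoul(buf, &end, 10);

	if (end == buf || *end != '\0')
		return -EINVAL;
	if (tmp <= MEM_CONTAINER_TYPE_UNSPEC || tmp >= MEM_CONTAINER_TYPE_MAX)
		return -EINVAL;
	return (long)tmp;
}

/* Hypothetical helper: page cache pages are charged only in "all" mode. */
static int should_charge_cache(unsigned long control_type)
{
	return control_type == MEM_CONTAINER_TYPE_ALL;
}
```

Because anything after the number, including a trailing newline, is rejected, the examples above use echo -n.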


Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/memcontrol.h |    9 +++
 mm/filemap.c               |    2 
 mm/memcontrol.c            |  129 ++---
 mm/swap_state.c            |    2 
 4 files changed, 122 insertions(+), 20 deletions(-)

diff -puN mm/memcontrol.c~mem-control-choose-rss-vs-rss-and-pagecache mm/memcontrol.c
--- linux-2.6.22-rc6/mm/memcontrol.c~mem-control-choose-rss-vs-rss-and-pagecache	2007-07-05 20:00:07.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/memcontrol.c	2007-07-05 20:09:40.0 -0700
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include 
+
 struct container_subsys mem_container_subsys;
 
 /*
@@ -52,6 +54,7 @@ struct mem_container {
 * spin_lock to protect the per container LRU
 */
spinlock_t lru_lock;
+   unsigned long control_type; /* control RSS or RSS+Pagecache */
 };
 
 /*
@@ -65,6 +68,14 @@ struct meta_page {
atomic_t ref_cnt;
 };
 
+enum {
+   MEM_CONTAINER_TYPE_UNSPEC = 0,
+   MEM_CONTAINER_TYPE_MAPPED,
+   MEM_CONTAINER_TYPE_ALL,
+   MEM_CONTAINER_TYPE_MAX,
+} mem_control_type;
+
+static struct mem_container init_mem_container;
 
 static inline struct mem_container *mem_container_from_cont(struct container
*cnt)
@@ -301,6 +312,22 @@ err:
 }
 
 /*
+ * See if the cached pages should be charged at all?
+ */
+int mem_container_cache_charge(struct page *page, struct mm_struct *mm)
+{
+   struct mem_container *mem;
+   if (!mm)
+   mm = &init_mm;
+
+   mem = rcu_dereference(mm->mem_container);
+   if (mem->control_type & MEM_CONTAINER_TYPE_ALL)
+   return mem_container_charge(page, mm);
+   else
+   return 0;
+}
+
+/*
  * Uncharging is always a welcome operation, we never complain, simply
  * uncharge.
  */
@@ -311,7 +338,9 @@ void mem_container_uncharge(struct meta_
unsigned long flags;
 
/*
-* This can happen for PAGE_ZERO
+* This can happen for PAGE_ZERO. This can also handle cases when
+* a page is not charged at all and we are switching between
+* handling the control_type.
 */
if (!mp)
return;
@@ -350,26 +379,59 @@ static ssize_t mem_container_write(struc
cft->private, userbuf, nbytes, ppos);
 }
 
-static struct cftype mem_container_usage = {
-   .name = "mem_usage",
-   .private = RES_USAGE,
-   .read = mem_container_read,
-};
+static ssize_t mem_control_type_write(struct container *cont,
+   struct cftype *cft, struct file *file,
+   const char __user *userbuf,
+   size_t nbytes, loff_t *pos)
+{
+   int ret;
+   char *buf, *end;
+   unsigned long tmp;
+   struct mem_container *mem;
 
-static struct cftype mem_container_limit = {
-   .name = "mem_limit",
-   .private = RES_LIMIT,
-   .write = mem_container_write,
-   .read = mem_container_read,
-};
+   mem = mem_container_from_cont(cont);
+   buf = kmalloc(nbytes + 1, GFP_KERNEL);
+   ret = -ENOMEM;
+   if (buf == NULL)
+   goto out;
 
-static struct cftype mem_container_failcnt = {
-   .name = "mem_failcnt",
-   .private = RES_FAILCNT,
-   .read = mem_container_read,
-};
+   buf[nbytes] = 0;
+   ret = -EFAULT;
+   if (copy_from_user(buf, userbuf, nbytes))
+   goto out_free;
+
+   ret = -EINVAL;
+   tmp = simple_strtoul(buf, &end, 10);
+   if (*end != '\0')
+   goto out_free;
+
+   if (tmp <= MEM_CONTAINER_TYPE_UNSPEC || tmp >= MEM_CONTAINER_TYPE_MAX)
+   goto out_free;
+
+   mem->control_type = tmp;
+   ret = nbytes;
+out_free:
+   kfree(buf);
+out:
+   return ret;
+}
 
-static struct mem_container init_mem_container;
+static ssize_t mem_control_type_read(struct container *cont,
+   struct cftype *cft,
+   struct file *file, char __user *userbuf,
+   size_t nbytes, loff_t *ppos)
+{
+   unsigned long val;
+   char buf[64], *s;
+   struct mem_container *mem;
+
+   mem = mem_container_from_cont(cont);
+   s = buf;
+   val = mem->control_type;
+   s += sprintf(s, "%lu\n", val);
+   return simple_read_from_buffer((void __user *)userbuf, nbytes,
+   ppos, buf, s - buf);
+}
 
 static int mem_container_create(struct container_subsys *ss,
struct container *cont)
@@ -392,9 +454,36 @@ static int mem_container_create(struct c

[-mm PATCH 7/8] Memory controller OOM handling (v2)

2007-07-05 Thread Balbir Singh


Out of memory handling for containers over their limit. A task from the
container over limit is chosen using the existing OOM logic and killed.

TODO:
1. As discussed in the OLS BOF session, consider implementing a user
space policy for OOM handling.
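Container-constrained OOM victim selection, reduced to its core: run the usual badness scoring but only consider tasks whose mm belongs to the container that hit its limit. In this hypothetical sketch the task structure and scores are made up, and out-of-container tasks are skipped outright, whereas the patch makes badness() return 0 for them.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified task model; field names are hypothetical. */
struct task {
	int container_id;	/* container that owns this task's mm */
	unsigned long points;	/* score from the normal badness heuristic */
	int oom_disabled;	/* stand-in for oomkilladj == OOM_DISABLE */
};

/* Pick the highest-scoring eligible task. container_id < 0 means a global
 * OOM; otherwise only tasks in that container are candidates.
 * Returns an index into tasks[], or -1 if nothing is eligible. */
static int select_victim(const struct task *tasks, size_t n, int container_id)
{
	int chosen = -1;
	unsigned long best = 0;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].oom_disabled)
			continue;
		/* container-constrained OOM: ignore tasks in other containers */
		if (container_id >= 0 && tasks[i].container_id != container_id)
			continue;
		if (chosen < 0 || tasks[i].points > best) {
			chosen = (int)i;
			best = tasks[i].points;
		}
	}
	return chosen;
}
```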

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/memcontrol.h |    1 +
 mm/memcontrol.c            |    1 +
 mm/oom_kill.c              |   42 ++
 3 files changed, 40 insertions(+), 4 deletions(-)

diff -puN include/linux/memcontrol.h~mem-control-out-of-memory include/linux/memcontrol.h
--- linux-2.6.22-rc6/include/linux/memcontrol.h~mem-control-out-of-memory	2007-07-05 18:49:35.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/memcontrol.h	2007-07-05 18:49:35.0 -0700
@@ -33,6 +33,7 @@ extern unsigned long mem_container_isola
int mode, struct zone *z,
struct mem_container *mem_cont,
int active);
+extern void mem_container_out_of_memory(struct mem_container *mem);
 
 #else /* CONFIG_CONTAINER_MEM_CONT */
 static inline void mm_init_container(struct mm_struct *mm,
diff -puN mm/memcontrol.c~mem-control-out-of-memory mm/memcontrol.c
--- linux-2.6.22-rc6/mm/memcontrol.c~mem-control-out-of-memory	2007-07-05 18:49:35.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/memcontrol.c	2007-07-05 18:49:35.0 -0700
@@ -266,6 +266,7 @@ int mem_container_charge(struct page *pa
if (res_counter_check_under_limit(&mem->res))
continue;
 
+   mem_container_out_of_memory(mem);
goto free_mp;
}
 
diff -puN mm/oom_kill.c~mem-control-out-of-memory mm/oom_kill.c
--- linux-2.6.22-rc6/mm/oom_kill.c~mem-control-out-of-memory	2007-07-05 18:49:35.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/oom_kill.c	2007-07-05 18:49:35.0 -0700
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int sysctl_panic_on_oom;
 /* #define DEBUG */
@@ -47,7 +48,8 @@ int sysctl_panic_on_oom;
  *of least surprise ... (be careful when you change it)
  */
 
-unsigned long badness(struct task_struct *p, unsigned long uptime)
+unsigned long badness(struct task_struct *p, unsigned long uptime,
+   struct mem_container *mem)
 {
unsigned long points, cpu_time, run_time, s;
struct mm_struct *mm;
@@ -60,6 +62,13 @@ unsigned long badness(struct task_struct
return 0;
}
 
+#ifdef CONFIG_CONTAINER_MEM_CONT
+   if (mem != NULL && mm->mem_container != mem) {
+   task_unlock(p);
+   return 0;
+   }
+#endif
+
/*
 * The memory size of the process is the basis for the badness.
 */
@@ -204,7 +213,8 @@ static inline int constrained_alloc(stru
  *
  * (not docbooked, we don't want this one cluttering up the manual)
  */
-static struct task_struct *select_bad_process(unsigned long *ppoints)
+static struct task_struct *select_bad_process(unsigned long *ppoints,
+   struct mem_container *mem)
 {
struct task_struct *g, *p;
struct task_struct *chosen = NULL;
@@ -258,7 +268,7 @@ static struct task_struct *select_bad_pr
if (p->oomkilladj == OOM_DISABLE)
continue;
 
-   points = badness(p, uptime.tv_sec);
+   points = badness(p, uptime.tv_sec, mem);
if (points > *ppoints || !chosen) {
chosen = p;
*ppoints = points;
@@ -372,6 +382,30 @@ static int oom_kill_process(struct task_
return oom_kill_task(p);
 }
 
+#ifdef CONFIG_CONTAINER_MEM_CONT
+void mem_container_out_of_memory(struct mem_container *mem)
+{
+   unsigned long points = 0;
+   struct task_struct *p;
+
+   container_lock();
+   rcu_read_lock();
+retry:
+   p = select_bad_process(&points, mem);
+   if (PTR_ERR(p) == -1UL)
+   goto out;
+
+   if (!p)
+   p = current;
+
+   if (oom_kill_process(p, points, "Memory container out of memory"))
+   goto retry;
+out:
+   rcu_read_unlock();
+   container_unlock();
+}
+#endif
+
 static BLOCKING_NOTIFIER_HEAD(oom_notify_list);
 
 int register_oom_notifier(struct notifier_block *nb)
@@ -444,7 +478,7 @@ retry:
 * Rambo mode: Shoot down a process and hope it solves whatever
 * issues we may have.
 */
-   p = select_bad_process(&points);
+   p = select_bad_process(&points, NULL);
 
if (PTR_ERR(p) == -1UL)
goto out;
_

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Nigel Cunningham

Hi.

On Friday 06 July 2007 14:41:40 Benjamin Herrenschmidt wrote:
> 
> > I/O from swsusp and suspend2 use bios directly, so the page cache isn't an 
> > issue for them (apart from the fact that Suspend2 saves the page cache 
> > separately so it can get a full image). Not sure about uswsusp.
> > 
> > Only having half the amount of memory doesn't sound like a big limitation for 
> > modern desktops & laptops, but don't forget that there are embedded guys 
> > wanting to hibernate too :)
> 
> Wait wait wait ... uses the BIOS ? what do you mean ?

You misread me, Ben. Sorry for not being clearer: by "bios" I meant struct bio, not the BIOS.

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.



[-mm PATCH 3/8] Memory controller accounting setup (v2)

2007-07-05 Thread Balbir Singh


Basic setup routines: the mm_struct gets a pointer to the container that
it belongs to, and each page gets a meta_page associated with it.


Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/memcontrol.h |   32 ++
 include/linux/mm_types.h   |    4 +++
 include/linux/sched.h      |    4 +++
 kernel/fork.c              |   10 ++---
 mm/memcontrol.c            |   47 ++---
 5 files changed, 91 insertions(+), 6 deletions(-)

diff -puN include/linux/memcontrol.h~mem-control-accounting-setup include/linux/memcontrol.h
--- linux-2.6.22-rc6/include/linux/memcontrol.h~mem-control-accounting-setup	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/memcontrol.h	2007-07-05 13:45:17.0 -0700
@@ -15,5 +15,37 @@
 #ifndef _LINUX_MEMCONTROL_H
 #define _LINUX_MEMCONTROL_H
 
+struct mem_container;
+struct meta_page;
+
+#ifdef CONFIG_CONTAINER_MEM_CONT
+
+extern void mm_init_container(struct mm_struct *mm, struct task_struct *p);
+extern void mm_free_container(struct mm_struct *mm);
+extern void page_assign_meta_page(struct page *page, struct meta_page *mp);
+extern struct meta_page *page_get_meta_page(struct page *page);
+
+#else /* CONFIG_CONTAINER_MEM_CONT */
+static inline void mm_init_container(struct mm_struct *mm,
+   struct task_struct *p)
+{
+}
+
+static inline void mm_free_container(struct mm_struct *mm)
+{
+}
+
+static inline void page_assign_meta_page(struct page *page,
+   struct meta_page *mp)
+{
+}
+
+static inline struct meta_page *page_get_meta_page(struct page *page)
+{
+   return NULL;
+}
+
+#endif /* CONFIG_CONTAINER_MEM_CONT */
+
 #endif /* _LINUX_MEMCONTROL_H */
 
diff -puN include/linux/mm_types.h~mem-control-accounting-setup include/linux/mm_types.h
--- linux-2.6.22-rc6/include/linux/mm_types.h~mem-control-accounting-setup	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/mm_types.h	2007-07-05 13:45:17.0 -0700
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct address_space;
 
@@ -83,6 +84,9 @@ struct page {
unsigned int gfp_mask;
unsigned long trace[8];
 #endif
+#ifdef CONFIG_CONTAINER_MEM_CONT
+   struct meta_page *meta_page;
+#endif
 };
 
 #endif /* _LINUX_MM_TYPES_H */
diff -puN include/linux/sched.h~mem-control-accounting-setup include/linux/sched.h
--- linux-2.6.22-rc6/include/linux/sched.h~mem-control-accounting-setup	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/sched.h	2007-07-05 13:45:17.0 -0700
@@ -87,6 +87,7 @@ struct sched_param {
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -416,6 +417,9 @@ struct mm_struct {
/* aio bits */
rwlock_tioctx_list_lock;
struct kioctx   *ioctx_list;
+#ifdef CONFIG_CONTAINER_MEM_CONT
+   struct mem_container *mem_container;
+#endif
 };
 
 struct sighand_struct {
diff -puN kernel/fork.c~mem-control-accounting-setup kernel/fork.c
--- linux-2.6.22-rc6/kernel/fork.c~mem-control-accounting-setup	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/kernel/fork.c	2007-07-05 13:45:17.0 -0700
@@ -330,7 +330,7 @@ static inline void mm_free_pgd(struct mm
 
 #include 
 
-static struct mm_struct * mm_init(struct mm_struct * mm)
+static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p)
 {
atomic_set(&mm->mm_users, 1);
atomic_set(&mm->mm_count, 1);
@@ -347,11 +347,14 @@ static struct mm_struct * mm_init(struct
mm->ioctx_list = NULL;
mm->free_area_cache = TASK_UNMAPPED_BASE;
mm->cached_hole_size = ~0UL;
+   mm_init_container(mm, p);
 
if (likely(!mm_alloc_pgd(mm))) {
mm->def_flags = 0;
return mm;
}
+
+   mm_free_container(mm);
free_mm(mm);
return NULL;
 }
@@ -366,7 +369,7 @@ struct mm_struct * mm_alloc(void)
mm = allocate_mm();
if (mm) {
memset(mm, 0, sizeof(*mm));
-   mm = mm_init(mm);
+   mm = mm_init(mm, current);
}
return mm;
 }
@@ -380,6 +383,7 @@ void fastcall __mmdrop(struct mm_struct 
 {
BUG_ON(mm == &init_mm);
mm_free_pgd(mm);
+   mm_free_container(mm);
destroy_context(mm);
free_mm(mm);
 }
@@ -500,7 +504,7 @@ static struct mm_struct *dup_mm(struct t
mm->token_priority = 0;
mm->last_interval = 0;
 
-   if (!mm_init(mm))
+   if (!mm_init(mm, tsk))
goto fail_nomem;
 
if (init_new_context(tsk, mm))
diff -puN mm/memcontrol.c~mem-control-accounting-setup mm/memcontrol.c
--- linux-2.6.22-rc6/mm/memcontrol.c~mem-control-accounting-setup	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/memcontrol.c	2007-07-05 13:45:17.0 -0700

[-mm PATCH 4/8] Memory controller memory accounting (v2)

2007-07-05 Thread Balbir Singh


Add the accounting hooks. Accounting is carried out for RSS and page cache
(unmapped) pages, with a common limit and common accounting for both.
RSS pages are accounted at page_add_*_rmap() and page_remove_rmap() time;
page cache pages are accounted at add_to_page_cache() and
__remove_from_page_cache() time. Swap cache is also accounted for.

Each page's meta_page is protected by a bit in the page flags; this makes
handling of race conditions involving simultaneous mappings of a page easier.
A reference count is kept in the meta_page to deal with cases where a page
might be unmapped from the RSS of all tasks, but still lives in the page
cache.
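How a per-page reference count lets a page stay charged while it is unmapped from every task but still sits in the page cache can be sketched as follows. The structures are hypothetical simplifications of meta_page and res_counter, the unit of charge is one page, and the page-flag bit lock is left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal counter (hypothetical stand-in for struct res_counter). */
struct counter {
	unsigned long usage, limit, failcnt;
};

/* Hypothetical per-page accounting record, loosely modeled on meta_page. */
struct meta_page {
	int ref_cnt;		/* RSS mappings + page cache references */
	struct counter *cnt;	/* container that was charged */
};

/* Charge on the first reference only; later users of the same page just
 * take a reference. Returns 0 on success, -1 if over limit or out of memory. */
static int page_charge(struct meta_page **slot, struct counter *cnt)
{
	if (*slot) {
		(*slot)->ref_cnt++;
		return 0;
	}
	if (cnt->usage + 1 > cnt->limit) {
		cnt->failcnt++;
		return -1;
	}
	struct meta_page *mp = malloc(sizeof(*mp));
	if (!mp)
		return -1;
	mp->ref_cnt = 1;
	mp->cnt = cnt;
	cnt->usage++;
	*slot = mp;
	return 0;
}

/* Drop one reference; the page is uncharged only when its last user
 * (e.g. the page cache, after all RSS unmaps) goes away. */
static void page_uncharge(struct meta_page **slot)
{
	struct meta_page *mp = *slot;

	if (!mp)		/* never charged, e.g. the zero page */
		return;
	if (--mp->ref_cnt == 0) {
		mp->cnt->usage--;
		free(mp);
		*slot = NULL;
	}
}
```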

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 fs/exec.c                  |    1 
 include/linux/memcontrol.h |   11 +++
 include/linux/page-flags.h |    3 +
 mm/filemap.c               |    8 ++
 mm/memcontrol.c            |  132 -
 mm/memory.c                |   22 +++
 mm/migrate.c               |    6 ++
 mm/page_alloc.c            |    3 +
 mm/rmap.c                  |    2 
 mm/swap_state.c            |    8 ++
 mm/swapfile.c              |   40 +++--
 11 files changed, 218 insertions(+), 18 deletions(-)

diff -puN fs/exec.c~mem-control-accounting fs/exec.c
--- linux-2.6.22-rc6/fs/exec.c~mem-control-accounting	2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/fs/exec.c   2007-07-05 13:45:18.0 -0700
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff -puN include/linux/memcontrol.h~mem-control-accounting include/linux/memcontrol.h
--- linux-2.6.22-rc6/include/linux/memcontrol.h~mem-control-accounting	2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/memcontrol.h	2007-07-05 18:27:26.0 -0700
@@ -24,6 +24,8 @@ extern void mm_init_container(struct mm_
 extern void mm_free_container(struct mm_struct *mm);
 extern void page_assign_meta_page(struct page *page, struct meta_page *mp);
 extern struct meta_page *page_get_meta_page(struct page *page);
+extern int mem_container_charge(struct page *page, struct mm_struct *mm);
+extern void mem_container_uncharge(struct meta_page *mp);
 
 #else /* CONFIG_CONTAINER_MEM_CONT */
 static inline void mm_init_container(struct mm_struct *mm,
@@ -45,6 +47,15 @@ static inline struct meta_page *page_get
return NULL;
 }
 
+static inline int mem_container_charge(struct page *page, struct mm_struct *mm)
+{
+   return 0;
+}
+
+static inline void mem_container_uncharge(struct meta_page *mp)
+{
+}
+
 #endif /* CONFIG_CONTAINER_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff -puN include/linux/page-flags.h~mem-control-accounting include/linux/page-flags.h
--- linux-2.6.22-rc6/include/linux/page-flags.h~mem-control-accounting	2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/page-flags.h	2007-07-05 13:45:18.0 -0700
@@ -98,6 +98,9 @@
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
 
+#define PG_metapage 21 /* Used for checking if a meta_page */
+   /* is associated with a page*/
+
 #if (BITS_PER_LONG > 32)
 /*
  * 64-bit-only flags build down from bit 31
diff -puN mm/filemap.c~mem-control-accounting mm/filemap.c
--- linux-2.6.22-rc6/mm/filemap.c~mem-control-accounting	2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/filemap.c	2007-07-05 18:26:29.0 -0700
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include  /* for BUG_ON(!in_atomic()) only */
+#include 
 #include "internal.h"
 
 /*
@@ -116,6 +117,7 @@ void __remove_from_page_cache(struct pag
 {
struct address_space *mapping = page->mapping;
 
+   mem_container_uncharge(page_get_meta_page(page));
radix_tree_delete(&mapping->page_tree, page->index);
page->mapping = NULL;
mapping->nrpages--;
@@ -442,6 +444,11 @@ int add_to_page_cache(struct page *page,
int error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
 
if (error == 0) {
+
+   error = mem_container_charge(page, current->mm);
+   if (error)
+   goto out;
+
write_lock_irq(&mapping->tree_lock);
error = radix_tree_insert(&mapping->page_tree, offset, page);
if (!error) {
@@ -455,6 +462,7 @@ int add_to_page_cache(struct page *page,
write_unlock_irq(&mapping->tree_lock);
radix_tree_preload_end();
}
+out:
return error;
 }
 EXPORT_SYMBOL(add_to_page_cache);
diff -puN mm/memcontrol.c~mem-control-accounting mm/memcontrol.c
--- linux-2.6.22-rc6/mm/memcontrol.c~mem-control-accounting	2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/memcontrol.c	2007-07-05 18:27:29.0 -0700
@@ -16,6 +16,9 @@
 #include 
 #include 
 #include 
+#include 
+#include

[-mm PATCH 6/8] Memory controller add per container LRU and reclaim (v2)

2007-07-05 Thread Balbir Singh


Add the meta_page to the per-container LRU. The reclaim algorithm has been
modified to make isolate_lru_pages() a pluggable component. The
scan_control data structure now accepts the container on whose behalf
reclaim is carried out. try_to_free_pages() has been extended to become
container aware.
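Making the isolation step pluggable amounts to putting a function pointer in scan_control, so the same shrink code can pull candidates either from the zone LRU (global reclaim) or from a container's own LRU. The following is a heavily simplified, hypothetical model: LRUs are arrays of page frame numbers, and none of the real isolate_lru_pages() checks are reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* An LRU as an array of page frame numbers; the tail holds the least
 * recently used, i.e. most reclaimable, pages. */
struct lru {
	const int *pfns;
	size_t n;
};

struct scan_control;
typedef size_t (*isolate_fn)(const struct scan_control *sc,
			     int *dst, size_t nr);

struct scan_control {
	const struct lru *zone_lru;	/* global per-zone LRU */
	const struct lru *mem_cont_lru;	/* container LRU; NULL for global */
	isolate_fn isolate_pages;	/* the pluggable isolation step */
};

/* Take up to nr pages from the tail of one LRU. */
static size_t isolate_from(const struct lru *l, int *dst, size_t nr)
{
	size_t taken = 0;

	for (size_t i = l->n; i > 0 && taken < nr; i--)
		dst[taken++] = l->pfns[i - 1];
	return taken;
}

static size_t isolate_global_pages(const struct scan_control *sc,
				   int *dst, size_t nr)
{
	return isolate_from(sc->zone_lru, dst, nr);
}

static size_t isolate_container_pages(const struct scan_control *sc,
				      int *dst, size_t nr)
{
	return isolate_from(sc->mem_cont_lru, dst, nr);
}

/* Shared reclaim code: it does not care which source was plugged in. */
static size_t shrink_some(const struct scan_control *sc, int *dst, size_t nr)
{
	return sc->isolate_pages(sc, dst, nr);
}
```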

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/memcontrol.h  |   11 +++
 include/linux/res_counter.h |   23 
 include/linux/swap.h        |    3 +
 mm/memcontrol.c             |  121 ++
 mm/swap.c                   |    2 
 mm/vmscan.c                 |  125 +++-
 6 files changed, 259 insertions(+), 26 deletions(-)

diff -puN include/linux/memcontrol.h~mem-control-lru-and-reclaim include/linux/memcontrol.h
--- linux-2.6.22-rc6/include/linux/memcontrol.h~mem-control-lru-and-reclaim	2007-07-05 18:28:12.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/memcontrol.h	2007-07-05 18:45:57.0 -0700
@@ -26,6 +26,13 @@ extern void page_assign_meta_page(struct
 extern struct meta_page *page_get_meta_page(struct page *page);
 extern int mem_container_charge(struct page *page, struct mm_struct *mm);
 extern void mem_container_uncharge(struct meta_page *mp);
+extern void mem_container_move_lists(struct meta_page *mp, bool active);
+extern unsigned long mem_container_isolate_pages(unsigned long nr_to_scan,
+   struct list_head *dst,
+   unsigned long *scanned, int order,
+   int mode, struct zone *z,
+   struct mem_container *mem_cont,
+   int active);
 
 #else /* CONFIG_CONTAINER_MEM_CONT */
 static inline void mm_init_container(struct mm_struct *mm,
@@ -56,6 +63,10 @@ static inline void mem_container_uncharg
 {
 }
 
+static inline void mem_container_move_lists(struct meta_page *mp, bool active)
+{
+}
+
 #endif /* CONFIG_CONTAINER_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff -puN include/linux/swap.h~mem-control-lru-and-reclaim include/linux/swap.h
--- linux-2.6.22-rc6/include/linux/swap.h~mem-control-lru-and-reclaim	2007-07-05 18:28:12.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/swap.h	2007-07-05 18:28:12.0 -0700
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -191,6 +192,8 @@ extern void swap_setup(void);
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zone **zones, int order,
gfp_t gfp_mask);
+extern unsigned long try_to_free_mem_container_pages(struct mem_container *mem);
+extern int __isolate_lru_page(struct page *page, int mode);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
diff -puN mm/memcontrol.c~mem-control-lru-and-reclaim mm/memcontrol.c
--- linux-2.6.22-rc6/mm/memcontrol.c~mem-control-lru-and-reclaim	2007-07-05 18:28:12.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/memcontrol.c	2007-07-05 18:49:32.0 -0700
@@ -19,6 +19,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 struct container_subsys mem_container_subsys;
 
@@ -46,6 +48,10 @@ struct mem_container {
 */
struct list_head active_list;
struct list_head inactive_list;
+   /*
+* spin_lock to protect the per container LRU
+*/
+   spinlock_t lru_lock;
 };
 
 /*
@@ -103,6 +109,92 @@ void __always_inline unlock_meta_page(st
bit_spin_unlock(PG_metapage, >flags);
 }
 
+unsigned long mem_container_isolate_pages(unsigned long nr_to_scan,
+   struct list_head *dst,
+   unsigned long *scanned, int order,
+   int mode, struct zone *z,
+   struct mem_container *mem_cont,
+   int active)
+{
+   unsigned long nr_taken = 0;
+   struct page *page;
+   unsigned long scan;
+   LIST_HEAD(mp_list);
+   struct list_head *src;
+   struct meta_page *mp;
+
+   if (active)
+   src = &mem_cont->active_list;
+   else
+   src = &mem_cont->inactive_list;
+
+   for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
+   mp = list_entry(src->prev, struct meta_page, list);
+   page = mp->page;
+
+   if (PageActive(page) && !active) {
+   mem_container_move_lists(mp, true);
+   scan--;
+   continue;
+   }
+   if (!PageActive(page) && active) {
+   mem_container_move_lists(mp, false);
+   scan--;
+   continue;
+

[-mm PATCH 5/8] Memory controller task migration (v2)

2007-07-05 Thread Balbir Singh


Allow tasks to migrate from one container to another. The mm_struct's
mem_container is migrated only when the thread group leader migrates.
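The migration rule reduced to a userspace sketch: only a thread group leader re-points the shared mm at the new container, taking a reference on the new container before dropping the one on the old. The structures and the css_get()/css_put() stand-ins below are hypothetical simplifications; in the patch the pointer update goes through rcu_assign_pointer().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct mem_container { int refcnt; };	/* css reference count */
struct mm { struct mem_container *mem_container; };
struct task { int pid, tgid; struct mm *mm; };

static void css_get(struct mem_container *m) { m->refcnt++; }
static void css_put(struct mem_container *m) { m->refcnt--; }

/* Returns 1 if the mm was moved to the new container, 0 otherwise. */
static int move_task(struct task *p, struct mem_container *old_mem,
		     struct mem_container *mem)
{
	if (p->mm == NULL || mem == old_mem)
		return 0;
	/* the mm_struct is in effect owned by the thread group leader */
	if (p->tgid != p->pid)
		return 0;
	css_get(mem);			/* new container gains a user ... */
	p->mm->mem_container = mem;	/* kernel: rcu_assign_pointer() */
	css_put(old_mem);		/* ... and the old one loses one */
	return 1;
}
```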


Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 mm/memcontrol.c |   35 +++
 1 file changed, 35 insertions(+)

diff -puN mm/memcontrol.c~mem-control-task-migration mm/memcontrol.c
--- linux-2.6.22-rc6/mm/memcontrol.c~mem-control-task-migration	2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/memcontrol.c	2007-07-05 13:45:18.0 -0700
@@ -302,11 +302,46 @@ err:
return rc;
 }
 
+static void mem_container_move_task(struct container_subsys *ss,
+   struct container *cont,
+   struct container *old_cont,
+   struct task_struct *p)
+{
+   struct mm_struct *mm;
+   struct mem_container *mem, *old_mem;
+
+   mm = get_task_mm(p);
+   if (mm == NULL)
+   return;
+
+   mem = mem_container_from_cont(cont);
+   old_mem = mem_container_from_cont(old_cont);
+
+   if (mem == old_mem)
+   goto out;
+
+   /*
+* Only thread group leaders are allowed to migrate, the mm_struct is
+* in effect owned by the leader
+*/
+   if (p->tgid != p->pid)
+   goto out;
+
+   css_get(&mem->css);
+   rcu_assign_pointer(mm->mem_container, mem);
+   css_put(&old_mem->css);
+
+out:
+   mmput(mm);
+   return;
+}
+
 struct container_subsys mem_container_subsys = {
.name = "mem_container",
.subsys_id = mem_container_subsys_id,
.create = mem_container_create,
.destroy = mem_container_destroy,
.populate = mem_container_populate,
+   .attach = mem_container_move_task,
.early_init = 1,
 };
_

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

[-mm PATCH 1/8] Memory controller resource counters (v2)

2007-07-05 Thread Balbir Singh


From: Pavel Emelianov <[EMAIL PROTECTED]>

Introduce generic structures and routines for resource accounting.

Each resource-accounting container is expected to embed a res_counter
alongside its container_subsys_state and its resource-specific members.
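A single-threaded userspace sketch of the charge path that the res_counter header in this patch describes. It keeps the usage/limit/failcnt fields and the res_counter_init()/res_counter_charge_locked() names but omits the lock; the ULONG_MAX starting limit and the res_counter_uncharge() helper are assumptions, since neither is shown in this excerpt.

```c
#include <assert.h>
#include <limits.h>

/* The kernel version also carries an IRQ-safe spinlock, omitted here. */
struct res_counter {
	unsigned long usage;	/* current consumption */
	unsigned long limit;	/* usage must not exceed this */
	unsigned long failcnt;	/* number of failed charge attempts */
};

/* Assumption: a fresh counter starts empty with no effective limit. */
static void res_counter_init(struct res_counter *cnt)
{
	cnt->usage = 0;
	cnt->limit = ULONG_MAX;
	cnt->failcnt = 0;
}

/* The kernel caller would hold cnt->lock. Returns 0 on success or a
 * negative value if the charge would push usage past the limit; the
 * comparison is arranged so that usage + val cannot overflow. */
static int res_counter_charge_locked(struct res_counter *cnt,
				     unsigned long val)
{
	if (cnt->usage > cnt->limit || val > cnt->limit - cnt->usage) {
		cnt->failcnt++;
		return -1;
	}
	cnt->usage += val;
	return 0;
}

/* Unlocked wrapper; the kernel version would take and drop cnt->lock. */
static int res_counter_charge(struct res_counter *cnt, unsigned long val)
{
	return res_counter_charge_locked(cnt, val);
}

/* Hypothetical companion helper: return previously charged resource,
 * clamping at zero instead of wrapping on an unbalanced uncharge. */
static void res_counter_uncharge(struct res_counter *cnt, unsigned long val)
{
	cnt->usage -= (val < cnt->usage) ? val : cnt->usage;
}
```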

Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 include/linux/res_counter.h |  102 +
 init/Kconfig                |    4 +
 kernel/Makefile             |    1 
 kernel/res_counter.c        |  121 
 4 files changed, 228 insertions(+)

diff -puN /dev/null include/linux/res_counter.h
--- /dev/null   2007-06-01 08:12:04.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/res_counter.h	2007-07-05 13:45:17.0 -0700
@@ -0,0 +1,102 @@
+#ifndef __RES_COUNTER_H__
+#define __RES_COUNTER_H__
+
+/*
+ * resource counters
+ * contain common data types and routines for resource accounting
+ *
+ * Copyright 2007 OpenVZ SWsoft Inc
+ *
+ * Author: Pavel Emelianov <[EMAIL PROTECTED]>
+ *
+ */
+
+#include 
+
+/*
+ * the core object. the container that wishes to account for some
+ * resource may include this counter in its structures and use
+ * the helpers described below
+ */
+
+struct res_counter {
+   /*
+* the current resource consumption level
+*/
+   unsigned long usage;
+   /*
+* the limit that usage cannot exceed
+*/
+   unsigned long limit;
+   /*
+* the number of unsuccessful attempts to consume the resource
+*/
+   unsigned long failcnt;
+   /*
+* the lock to protect all of the above.
+* the routines below consider this to be IRQ-safe
+*/
+   spinlock_t lock;
+};
+
+/*
+ * helpers to interact with userspace
+ * res_counter_read/_write - put/get the specified fields from the
+ * res_counter struct to/from the user
+ *
+ * @cnt: the counter in question
+ * @member:  the field to work with (see RES_xxx below)
+ * @buf: the buffer to operate on,...
+ * @nbytes:  its size...
+ * @pos: and the offset.
+ */
+
+ssize_t res_counter_read(struct res_counter *cnt, int member,
+   const char __user *buf, size_t nbytes, loff_t *pos);
+ssize_t res_counter_write(struct res_counter *cnt, int member,
+   const char __user *buf, size_t nbytes, loff_t *pos);
+
+/*
+ * the field descriptors. one for each member of res_counter
+ */
+
+enum {
+   RES_USAGE,
+   RES_LIMIT,
+   RES_FAILCNT,
+};
+
+/*
+ * helpers for accounting
+ */
+
+void res_counter_init(struct res_counter *cnt);
+
+/*
+ * charge - try to consume more resource.
+ *
+ * @cnt: the counter
+ * @val: the amount of the resource. each controller defines its own
+ *   units, e.g. numbers, bytes, Kbytes, etc
+ *
+ * returns 0 on success and <0 if the cnt->usage will exceed the cnt->limit
+ * _locked call expects the cnt->lock to be taken
+ */
+
+int res_counter_charge_locked(struct res_counter *cnt, unsigned long val);
+int res_counter_charge(struct res_counter *cnt, unsigned long val);
+
+/*
+ * uncharge - tell that some portion of the resource is released
+ *
+ * @cnt: the counter
+ * @val: the amount of the resource
+ *
+ * these calls check for usage underflow and show a warning on the console
+ * _locked call expects the cnt->lock to be taken
+ */
+
+void res_counter_uncharge_locked(struct res_counter *cnt, unsigned long val);
+void res_counter_uncharge(struct res_counter *cnt, unsigned long val);
+
+#endif
diff -puN init/Kconfig~res_counters_infra init/Kconfig
--- linux-2.6.22-rc6/init/Kconfig~res_counters_infra	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/init/Kconfig	2007-07-05 13:45:17.0 -0700
@@ -320,6 +320,10 @@ config CPUSETS
 
  Say N if unsure.
 
+config RESOURCE_COUNTERS
+   bool
+   select CONTAINERS
+
 config SYSFS_DEPRECATED
bool "Create deprecated sysfs files"
default y
diff -puN kernel/Makefile~res_counters_infra kernel/Makefile
--- linux-2.6.22-rc6/kernel/Makefile~res_counters_infra	2007-07-05 13:45:17.0 -0700
+++ linux-2.6.22-rc6-balbir/kernel/Makefile	2007-07-05 13:45:17.0 -0700
@@ -58,6 +58,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <[EMAIL PROTECTED]>, the -fno-omit-frame-pointer is
diff -puN /dev/null kernel/res_counter.c
--- /dev/null   2007-06-01 08:12:04.0 -0700
+++ linux-2.6.22-rc6-balbir/kernel/res_counter.c	2007-07-05 13:45:17.0 -0700
@@ -0,0 +1,121 @@
+/*
+ * resource containers
+ *
+ * Copyright 2007 OpenVZ SWsoft Inc
+ *
+ * Author: Pavel Emelianov <[EMAIL PROTECTED]>
+ *
+ */
+
+#include 
+#include 
+#include 
+#include

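The header above only declares the accounting helpers, and kernel/res_counter.c is cut off here. As a rough illustration of the documented contract, here is a userspace model. A charge fails with a negative value and bumps failcnt when usage would exceed limit. An uncharge warns on underflow. The _locked variants assume the lock is already held. This is not the patch's implementation: the pthread mutex standing in for the IRQ-safe spinlock, the -ENOMEM return value and the "unlimited" default limit are all assumptions.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Userspace model of struct res_counter. */
struct res_counter {
	unsigned long usage;	/* current consumption */
	unsigned long limit;	/* usage may not exceed this */
	unsigned long failcnt;	/* number of refused charges */
	pthread_mutex_t lock;	/* stands in for the kernel spinlock */
};

static void res_counter_init(struct res_counter *cnt)
{
	cnt->usage = 0;
	cnt->limit = (unsigned long)-1;	/* assumption: unlimited by default */
	cnt->failcnt = 0;
	pthread_mutex_init(&cnt->lock, NULL);
}

/* Caller holds cnt->lock. Written so usage + val cannot overflow. */
static int res_counter_charge_locked(struct res_counter *cnt, unsigned long val)
{
	if (cnt->usage > cnt->limit || val > cnt->limit - cnt->usage) {
		cnt->failcnt++;
		return -ENOMEM;
	}
	cnt->usage += val;
	return 0;
}

static int res_counter_charge(struct res_counter *cnt, unsigned long val)
{
	int ret;

	pthread_mutex_lock(&cnt->lock);
	ret = res_counter_charge_locked(cnt, val);
	pthread_mutex_unlock(&cnt->lock);
	return ret;
}

/* Caller holds cnt->lock. Warn on underflow and clamp usage at zero. */
static void res_counter_uncharge_locked(struct res_counter *cnt, unsigned long val)
{
	if (val > cnt->usage) {
		fprintf(stderr, "res_counter: uncharging %lu, usage is only %lu\n",
			val, cnt->usage);
		val = cnt->usage;
	}
	cnt->usage -= val;
}

static void res_counter_uncharge(struct res_counter *cnt, unsigned long val)
{
	pthread_mutex_lock(&cnt->lock);
	res_counter_uncharge_locked(cnt, val);
	pthread_mutex_unlock(&cnt->lock);
}
```

A controller would call res_counter_charge() before handing out a unit of the resource and res_counter_uncharge() when it is released. It would use the _locked variants only when it already holds the counter's lock for other reasons.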
[-mm PATCH 0/8] Memory controller introduction (v2)

2007-07-05 Thread Balbir Singh

Changelog since version 1

1. Fixed some compile time errors (in mm/migrate.c from Vaidyanathan S)
2. Fixed a panic seen when LIST_DEBUG is enabled
3. Added a mechanism to control whether we track page cache or both
   page cache and mapped pages (as requested by Pavel)

This patchset implements another version of the memory controller. These
patches have been through a lot of churn; the first set of patches was posted
last year and earlier this year at
http://lkml.org/lkml/2007/2/19/10

Ever since, the RSS controller has been through four revisions, the latest
one being
http://lwn.net/Articles/236817/

This patchset draws from the patches listed above and from some of the
contents of the patches posted by Vaidyanathan for page cache control.
http://lkml.org/lkml/2007/6/20/92

Pavel, Vaidy, could you look at the patches and add your Signed-off-by
where relevant?

At the resource management BOF at OLS, it was discussed that we need to manage
RSS and unmapped page cache together. This patchset is a step towards that.

TODOs

1. Add memory controller water mark support. Reclaim on high water mark
2. Add support for shrinking on limit change
3. Add per zone per container LRU lists
4. Make page_referenced() container aware
5. Figure out a better CLUI for the controller

In case you have been using/testing the RSS controller, you'll find that
this controller works slower than the RSS controller. This is because
both the swap cache and the page cache are accounted for, so pages do go
out to swap upon reclaim (they cannot live in the swap cache).

I've test compiled the framework without the controller enabled, tested
the code on UML and minimally on a power box.

Any test output, feedback, comments, suggestions are welcome!

series

res_counters_infra.patch
mem-control-setup.patch
mem-control-accounting-setup.patch
mem-control-accounting.patch
mem-control-task-migration.patch
mem-control-lru-and-reclaim.patch
mem-control-out-of-memory.patch

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

Re: 2.6.21.5 june 30th to july 1st date hang?

2007-07-05 Thread Thomas Gleixner

On Thu, 2007-07-05 at 19:12 -0400, Ernie Petrides wrote:
> On Thursday, 5-Jul-2007 at 16:49 MDT, Chris Friesen wrote:
> 
> > Ernie Petrides wrote:
> > 
> > > Only kernels built with the CONFIG_HIGH_RES_TIMERS option enabled were
> > > vulnerable.
> > 
> > As I mentioned in my post to Thomas, we have high res timers disabled 
> > and were still affected.  Granted, our kernel has been modified so it is 
> > possible that vanilla would not be affected... I haven't tested it.
> > 
> > Chris
> 
> That's odd, because Thomas's patch removed two calls to clock_was_set(),
> which is a no-op when CONFIG_HIGH_RES_TIMERS is not enabled (at least in
> the 2.6.21 source tree).
> 
> Also, I personally tested with the reproducer you posted here, initially
> on a box running 2.6.22-rc4, and there were no problems (but I'm not sure
> what config options were enabled on that kernel).  I did reproduce the
> problem on a stock 2.6.21 kernel with CONFIG_HIGH_RES_TIMERS enabled.

It needs a running smp_call_function() to be interrupted by the timer
interrupt, which calls clock_was_set(). So it's not that easy to
reproduce.

tglx



Re: 2.6.21.5 june 30th to july 1st date hang?

2007-07-05 Thread Thomas Gleixner

On Thu, 2007-07-05 at 17:45 -0600, Chris Friesen wrote:
> Ernie Petrides wrote:
> 
> > That's odd, because Thomas's patch removed two calls to clock_was_set(),
> > which is a no-op when CONFIG_HIGH_RES_TIMERS is not enabled (at least in
> > the 2.6.21 source tree).
> 
> I'm using a modified 2.6.10 tree...I expect the timer code is different.

Way different and you have extra patches on top.

tglx



Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Kyle Moffett


On Jul 06, 2007, at 00:03:15, Nigel Cunningham wrote:
> On Friday 06 July 2007 13:54:15 Benjamin Herrenschmidt wrote:
>> On Fri, 2007-07-06 at 09:35 +1000, Nigel Cunningham wrote:
>>> Nice try :) Okay then, you remove the freezer, try hibernating,
>>> then get back to me after you've fixed your filesystem because
>>> some process that wasn't frozen started writing things after the
>>> atomic copy (making the on disk filesystem inconsistent with the
>>> snapshot).
>>>
>>> As Pavel rightly said, you can get rid of the freezer, but you're
>>> only going to have to implement another one that does essentially
>>> the same thing, even if it is at some other level.
>>
>> I was mostly talking about STR... Regarding STD, we have a
>> different problem and we all know it. The freezer is one somewhat
>> horrible way to get it working for now, I would prefer something
>> more along the way that blocks the page cache from writing out new
>> dirty pages though, except those specifically flagged by the
>> snapshot.
>>
>> That is, some kind of proper snapshotting facility, as linus was
>> describing some time ago.
>
> The kind of thing Linus was talking about would limit you (as
> swsusp and uswsusp do now) to only half the amount of memory.


How so?  Suppose hibernate is implemented like this:

(1) Userspace program calls sys_freeze_processes()
  (a) Pokes all CPUs with IPIs and tells them to finish the currently
      running timeslot then stop
  (b) Atomically sends SIGSTOP to all userspace processes in a
      non-trappable way, except the calling process and any process
      which is ptracing it.
  (c) Returns to the calling process.

(2) Userspace process sends SIGCONT to only those processes which are
    necessary for sync and a device-mapper snapshot.


(3) Userspace calls sys_snapshot_kernel(snapshot_overhead_pages)
  (a) Kernel starts freeing memory and swapping stuff out to make room
      for a copy of *kernel* memory (not pagecache, not process RAM).
      It does the same for at least snapshot_overhead_pages extra (used
      by userspace later).  It then allocates this memory to keep it
      from going away.  Since most processes are stopped we won't have
      much else competing with us for the RAM.
  (b) Kernel uses the device-mapper up-call-into-filesystem machinery
      to get all mounted filesystems synced and ready for a DM
      snapshot.  This may include sending data via the userspace
      processes resumed in (2).  Any deadlocks here are userspace's
      fault (see (2)).  Will need some modification to handle doing
      multiple blockdevs at a time.  Anything using FUSE is basically
      perma-synced anyways (no dep-handling needed), and anything using
      loop should already be handled by DM.  This includes allocating
      memory for the basic snapshot datastructures.
  (c) At this point all blockdev operations should be halted and disk
      caches flushed; that's all we care about.
  (d) Go through the device tree and quiesce DMA and shut off
      interrupts.  Since all the disks are synced this is easy.
  (e) Use IPIs again to get all the CPUs together, which should be
      easy as most processes are sleeping in IO or SIGSTOPed, and we're
      getting no interrupts.
  (f) One CPU turns off all interrupts on itself and takes an atomic
      snapshot of kernel memory into the previously allocated storage.
      Once again, does not include pagecache.  The kernel also records
      a list of what pages *are* included in the pagecache.  It then
      marks all userspace pages as copy-on-write.
  (g) That CPU finalizes the modified DM snapshot using the
      previously-allocated memory.
  (h) That CPU frees up the snapshot_overhead_pages memory allocated
      during step (a) for userspace to use.
  (i) The CPU does the equivalent of a "swapoff -a" without
      overwriting any data already on any swap device(s).
  (j) The CPU then IPI-signals the other CPUs to wake them up.
  (k) The kernel returns an FD-reference to the snapshot and the
      read-only halves of the CoW pagecache to the process which called
      sys_snapshot_kernel().


(4) The userspace process now has a reference to the copy of the  
kernel pages and the unmodified pagecache pages.  Since 99% of the  
processes aren't running, we aren't going to be having to CoW many of  
the pagecache pages.


(5) The userspace process uses read() or other syscalls to get data  
out of the kernel-snapshot FD in small chunks, within its  
snapshot_overhead_pages limit.  It compresses these and writes them  
out to the snapshot-storage blockdev (must not be mounted during  
snapshot), or to any network server.


(6) The userspace process syncs the disks and halts the system.  Any  
changed filesystem pages after the pseudo-DM-snapshot should have  
been stored in semi-volatile storage somewhere and will be discarded  
on the next reboot.


So basically your hibernate-overhead would consist of:
  (1) The pages necessary for the atomic snapshot of kernel memory  
and the list of pagecache pages at that time
  (2) A little memory necessary for the kernel non-persistent DM

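Step (5) amounts to streaming the snapshot out of a file descriptor in bounded chunks, so the userspace helper never needs more than its snapshot_overhead_pages budget. sys_snapshot_kernel() is only a proposal in this mail, so the sketch assumes nothing more than an ordinary readable descriptor and a writable destination. Compression is left out, and copy_in_chunks() is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything readable from src_fd to dst_fd through a caller-supplied
 * buffer, so memory use stays fixed regardless of the image size.
 * Returns the number of bytes copied, or -1 on error. */
static long copy_in_chunks(int src_fd, int dst_fd, char *buf, size_t bufsz)
{
	long total = 0;

	for (;;) {
		ssize_t n = read(src_fd, buf, bufsz);

		if (n == 0)
			return total;		/* end of the snapshot stream */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* write() may be short, so loop until this chunk is out. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

A compressor would slot in between the read() and the write loop, working on one chunk at a time so the memory bound still holds.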
Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Benjamin Herrenschmidt


> I/O from swsusp and suspend2 use bios directly, so the page cache isn't an 
> issue for them (apart from the fact that Suspend2 saves the page cache 
> separately so it can get a full image). Not sure about uswsusp.
> 
> Only having half the amount of memory doesn't sound like a big limitation for 
> modern desktops & laptops, but don't forget that there are embedded guys 
> wanting to hibernate too :)

Wait wait wait ... uses the BIOS ? what do you mean ?

I know that for example, things like MacOS X use a separate polled path to
the storage driver for suspend (works fine for the built-in IDE, but more
complicated in large scale). If you can use BIOS calls to write your
suspend image, that is, if you don't need any of the normal block
infrastructure, then you don't need a freezer ! not at all !

You just do like STR ... and at the end of the day, once you have stopped
all your drivers, you shut interrupts off and do the BIOS thing

I fail to see how processes could dirty pages while/after the BIOS thingy :-)

But then, the problem with that approach is that of course, you need a BIOS
capable of doing that (or a special sideband path to the "blessed" block
driver that will be used for suspend ... not necessarily a hard thing to do,
would be trivial to add support to drivers/ide or libata for that sort of
things I suppose).

Ben.



Valgrinding the kernel?

2007-07-05 Thread Dan Kegel


It'd be nice to see if Valgrind could catch uninitialized
references in the kernel, if only to see if Coverity is
missing anything that happens in practice.

Back in December 2002, Valgrind started to run UML:
http://user-mode-linux.sourceforge.net/diary.html
http://marc.info/?l=linux-kernel=104035199923121=2
but it wasn't quite usable, and it seems broken since then.
The last note I could find about this was from Jeff in July 2005:
http://marc.info/?l=linux-kernel=112273702329952=2

Has there been any motion since then?

Thanks,
Dan

Re: [git pull][resend] Input updates for 2.6.22-rc7

2007-07-05 Thread Linus Torvalds

On Thu, 5 Jul 2007, Dmitry Torokhov wrote:
> 
> I was not... The copy I got from LKML looks fine in Kmail, Gmail and
> MS Outlook.   ^^

That's the problem.

It *looks* *fine*.

The nbsp doesn't look any different from a regular space. There's no 
way to tell the difference. Except it doesn't *work* the same. You 
cut-and-paste it into a shell window, and that shell window will not 
consider a nbsp to be the same thing as a space.

And I don't really understand why Kmail would do something like that. They 
obviously do it on purpose, since the nbsp wasn't there originally, so 
they literally go do extra work to _corrupt_ the data they cut-and-paste. 
Why do it? Who knows. But it's really sad.

Linus
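The complaint is mechanical. The bytes 0xC2 0xA0 (UTF-8 for U+00A0, no-break space) look like a space, but the shell does not treat them as one. A small filter can normalise them before pasting. This is a hypothetical helper, not something from kmail or git, and it deliberately touches only that one two-byte sequence.

```c
#include <stddef.h>

/* Replace each UTF-8 no-break space (0xC2 0xA0) with an ASCII space,
 * in place. All other bytes are copied unchanged. Returns the new length. */
static size_t strip_nbsp(char *s, size_t len)
{
	size_t in = 0, out = 0;

	while (in < len) {
		if (in + 1 < len &&
		    (unsigned char)s[in] == 0xC2 &&
		    (unsigned char)s[in + 1] == 0xA0) {
			s[out++] = ' ';
			in += 2;
		} else {
			s[out++] = s[in++];
		}
	}
	return out;
}
```

Because the output is never longer than the input, the rewrite can be done in place on the pasted buffer.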

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Nigel Cunningham

Hi.

On Friday 06 July 2007 13:54:15 Benjamin Herrenschmidt wrote:
> On Fri, 2007-07-06 at 09:35 +1000, Nigel Cunningham wrote:
> > 
> > Nice try :) Okay then, you remove the freezer, try hibernating, then get
> > back to me after you've fixed your filesystem because some process that
> > wasn't frozen started writing things after the atomic copy (making the on
> > disk filesystem inconsistent with the snapshot).
> > 
> > As Pavel rightly said, you can get rid of the freezer, but you're only
> > going to have to implement another one that does essentially the same
> > thing, even if it is at some other level.
> 
> I was mostly talking about STR... Regarding STD, we have a different
> problem and we all know it. The freezer is one somewhat horrible way to
> get it working for now, I would prefer something more along the way that
> blocks the page cache from writing out new dirty pages though, except
> those specifically flagged by the snapshot.
> 
> That is, some kind of proper snapshotting facility, as linus was
> describing some time ago.

The kind of thing Linus was talking about would limit you (as swsusp and 
uswsusp do now) to only half the amount of memory. I suppose you could lzf 
compress as you did the snapshot. That would generally get you up to 2/3rds, 
but then again you can't know what compression ratio you'll get until you 
try, so reliability would suffer or it would take longer because of retrying.
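Nigel's figures follow from a simple capacity argument, assuming the atomic copy has to fit in RAM next to the pages being copied. Write $M$ for usable RAM, $U$ for the memory in use that must be imaged, and $r$ for the compression ratio (compressed size over original size, so $r = 1$ means no compression):

```latex
U + rU \le M \quad\Longrightarrow\quad U \le \frac{M}{1 + r}
```

With no compression ($r = 1$) this gives $U \le M/2$, the half-of-memory limit. With 2:1 compression ($r = 1/2$) it gives $U \le 2M/3$, matching the "up to 2/3rds" estimate. Since $r$ is only known after compressing, a snapshot sized on an assumed ratio can fail, which is the reliability trade-off described above.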

I/O from swsusp and suspend2 use bios directly, so the page cache isn't an 
issue for them (apart from the fact that Suspend2 saves the page cache 
separately so it can get a full image). Not sure about uswsusp.

Only having half the amount of memory doesn't sound like a big limitation for 
modern desktops & laptops, but don't forget that there are embedded guys 
wanting to hibernate too :)

Regards,

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Cobden
103 Curdie Street, Cobden 3266, Victoria, Australia
Ph. +61 3 5595 1185 / +61 417 100 574
Communal Worship: 11 am Sunday.



Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Benjamin Herrenschmidt


> How about a freezer whose job it is to "wait for pending hard  
> interrupts to complete when we have already guaranteed that we won't  
> get any more"?  That part should be really *REALLY* easy.  You don't  
> need to care about either userspace processes or kernel threads at  
> all.  Specifically, Step 1 consists of:

Well, waiting for pending DMA and making sure to not trigger more
activity is what driver suspend() is supposed to do. With the ability
for simple drivers that can cope with it to just basically use a
late_suspend(), called after IRQs are off, that basically does what you
describe: wait for pending HW tasks to complete (polling) and turn the
damn thing off.

Note that the latter is really a shortcut for somewhat dumb and directly
accessible devices (PCI comes to mind). Things like USB have to use the
"normal" mechanism of blocking IOs etc... at suspend(), at least, USB
devices have to since the USB HC will not issue any new URBs. (And will
return them with a nice error code which is a perfect way to deal with
it in driver, been there, it works fine, once we fixed the races in the
USB host code itself, which I think we pretty much did by now).

> Scheduling and userspace are all still fully enabled in this  
> scenario.  Once all your devices are turned off, the only remaining  
> running threads will be those which haven't done IO since the  
> beginning of the suspend.  We can then disable preemption, turn off  
> the timer interrupts, and tell the other CPUs to park all their  
> remaining threads in schedule() and sleep.  Then we put the IRQ  
> controller to sleep and go to sleep ourselves.  If our driver model  
> locking is sufficient to handle putting a parent device to sleep  
> while threads are sleeping on a child device then there are exactly 0  
> problems.

What you propose is basically a slightly over-simplistic version of what
I think (and Paulus thinks) should be done. We do need to do it via
driver callbacks down the tree since only drivers can know how to deal
with their DMA etc... and ordering needs to be respected, but that's
basically it.

And guess what ? It's what we do on powerbooks, and it works fine,
without a freezer :-)

Ben.
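Ben's point is that the ordering comes from walking the device tree through driver callbacks, not from freezing tasks. Below is a toy model of such a two-phase walk. It assumes a hypothetical struct dev rather than the real driver-model types. Ordinary devices are suspended while interrupts are still on. Devices simple enough to poll are left for a late phase after interrupts are off. Children always go down before their parent. Forcing the ancestors of a late device into the late phase is a choice made for this sketch, not something stated in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

/* Hypothetical, simplified device node (not the kernel's struct device). */
struct dev {
	const char *name;
	struct dev *child[MAX_CHILDREN];
	size_t nchild;
	bool late_capable;	/* could poll its HW with IRQs off (PCI-like) */
	bool deferred;		/* left for the late phase */
};

static const char *suspend_log[MAX_LOG];	/* order devices went down */
static size_t nlog;

/* Where a real driver would quiesce DMA and power off; here we only log. */
static void power_down(struct dev *d)
{
	if (nlog < MAX_LOG)
		suspend_log[nlog++] = d->name;
}

/* Phase 1, interrupts on: children before parents. A late-capable device
 * is deferred, and so is every ancestor of a deferred device. */
static bool suspend_tree(struct dev *d)
{
	bool defer = d->late_capable;

	for (size_t i = 0; i < d->nchild; i++)
		if (suspend_tree(d->child[i]))
			defer = true;
	d->deferred = defer;
	if (!defer)
		power_down(d);
	return defer;
}

/* Phase 2, interrupts off: deferred devices go down, children first. */
static void late_suspend_tree(struct dev *d)
{
	if (!d->deferred)
		return;		/* whole subtree already handled in phase 1 */
	for (size_t i = 0; i < d->nchild; i++)
		late_suspend_tree(d->child[i]);
	power_down(d);
}

static void system_suspend(struct dev *root)
{
	suspend_tree(root);
	/* ...local interrupts would be disabled at this point... */
	late_suspend_tree(root);
}
```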



Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Jeremy Maitin-Shepard

Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

[snip]

> At the end of the day, I stand my ground, the freezer cannot be made
> reliable without massive infrastructure changes or giving up on very
> useful features such as fuse among others. Besides, it only partially
> "hides" the problem of requests going to drivers, thus it's a bad
> solution.

I agree that the freezer absolutely should not be used for suspend to
ram ("suspend"), since it is unnecessary with properly written drivers,
which are important to have anyway.  It seems that it is indeed the
consensus that it will be phased out sooner or later.

It does seem that the current device suspend interface does not tell the
drivers enough, since as discussed, they need to know whether to merely
block if they receive a request while suspended (as should be done while
initiating a suspend to ram), or if they should wake up the device (as
should be done if a suspend to ram is not in progress).  Clearly these
two cases need to be addressed by every driver supporting suspend/resume
(but possibly indirectly if the subsystem handles it for them).
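The two cases above can be sketched as follows: block while a system suspend is in progress, otherwise wake the device on demand. The flag and function names are hypothetical, since, per this message, the current suspend interface does not convey this information to drivers. A pthread condition variable stands in for whatever wait mechanism a driver would use.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-device driver state. */
struct drv {
	pthread_mutex_t lock;
	pthread_cond_t resumed;
	bool system_suspending;	/* a system-wide suspend is in progress */
	bool powered;		/* device is currently powered up */
	int wakeups;		/* on-demand power-ups performed */
	int served;		/* requests completed */
};

static void drv_init(struct drv *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->resumed, NULL);
	d->system_suspending = false;
	d->powered = false;	/* start idle and powered down */
	d->wakeups = 0;
	d->served = 0;
}

/* During a system suspend: block until resume, never touch the hardware.
 * Otherwise: power the device up on demand and serve the request. */
static void drv_request(struct drv *d)
{
	pthread_mutex_lock(&d->lock);
	while (d->system_suspending)
		pthread_cond_wait(&d->resumed, &d->lock);
	if (!d->powered) {
		d->powered = true;
		d->wakeups++;
	}
	d->served++;
	pthread_mutex_unlock(&d->lock);
}

static void drv_system_suspend(struct drv *d)
{
	pthread_mutex_lock(&d->lock);
	d->system_suspending = true;
	d->powered = false;
	pthread_mutex_unlock(&d->lock);
}

static void drv_system_resume(struct drv *d)
{
	pthread_mutex_lock(&d->lock);
	d->system_suspending = false;
	d->powered = true;
	pthread_cond_broadcast(&d->resumed);	/* release blocked requests */
	pthread_mutex_unlock(&d->lock);
}
```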

The current hibernate approach used by all of the existing
implementations for Linux seems to depend fundamentally on the freezer,
though, in order to actually save the system state.  Thus, it will still
be necessary to fix all of the issues with the freezer, or adopt an
alternate hibernate approach (which is unlikely).  Unfortunately, even
leaving kernel threads and certain drivers running after the snapshot is
taken means that the saved image isn't completely correct, and the
freezer cannot help with these issues.

[snip]

-- 
Jeremy Maitin-Shepard

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Benjamin Herrenschmidt

On Fri, 2007-07-06 at 09:35 +1000, Nigel Cunningham wrote:
> 
> Nice try :) Okay then, you remove the freezer, try hibernating, then get back 
> to me after you've fixed your filesystem because some process that wasn't 
> frozen started writing things after the atomic copy (making the on disk 
> filesystem inconsistent with the snapshot).
> 
> As Pavel rightly said, you can get rid of the freezer, but you're only going 
> to have to implement another one that does essentially the same thing, 
> even if it is at some other level.

I was mostly talking about STR... Regarding STD, we have a different
problem and we all know it. The freezer is one somewhat horrible way to
get it working for now, I would prefer something more along the way that
blocks the page cache from writing out new dirty pages though, except
those specifically flagged by the snapshot.

That is, some kind of proper snapshotting facility, as linus was
describing some time ago.

Ben.

> > > >  - Silently add GFP_NOIO to all allocations, to avoid having things
> > > > blocking in kmalloc() with a mutex held that will deadlock with
> > > > suspend() in a driver for example. Or set some way to have all GFP
> > > > waiters wakeup and fail rather than wait for IOs. It's hard/bizarre but
> > > > necessary, again, with or without a freezer.
> > > 
> > > GFP_ATOMIC? (In driver suspend, they shouldn't be sleeping either, right?)
> > 
> > NOIO should be enough, I think, but ATOMIC would do.
> >  
> > That's one of the reason why I used to have the pre-suspend and
> > post-resume hooks in my original powermac implementation, for those few
> > drivers complicated enough to require some pre-allocations.
> >  
> > > >  - Deal with the firmware problem. The best way is probably to have an
> > > > async request_firmware() interface. Another thing is, drivers may want
> > > > to cache their firmware in main memory, that sort of thing...
> > > >
> > 
> > Note that the above firmware problem could be dealt with also with the
> > pre-suspend/post-resume. Allowing to pre-request firmware etc... and
> > keep it around until after resume, because we know we will need it.
> > Gives a chance to drivers to perform things while the system is still
> > live, filesystems still working, etc... (big memory allocations for
> > example).
> > 
> > > > And that's just a small list off the top of my mind, of known problems
> > > > that will cause deadlocks or misbehaviours today, with or without the
> > > > freezer, and that need to be addressed.
> > > 
> > > Userspace device drivers too?
> > 
> > Maybe but they are less of an issue, most of the time, they don't do DMA
> > or whatever harmful things. If they are USB drivers, for example, they
> > are non-issues at that level.
> 
> (Leaving the rest of the message intact so we don't have to fragment the 
> discussion into a million subthreads).
> 
> Regards,
> 
> Nigel
> 
> 
> 
> 

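The pre-suspend/post-resume idea for firmware can be illustrated in userspace terms. Load the image while files are still reachable, serve the resume path from the in-memory copy, and release it afterwards. The hook names below are hypothetical; they echo the description above, not an existing kernel API. stdio stands in for request_firmware().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical driver state: a firmware image cached in RAM. */
struct fw_cache {
	unsigned char *data;
	size_t size;
};

/* Pre-suspend hook: the system is still live, so opening files and making
 * large allocations is fine here. Returns 0 on success, -1 on failure. */
static int drv_pre_suspend(struct fw_cache *fw, const char *path)
{
	FILE *f = fopen(path, "rb");
	long sz;

	if (!f)
		return -1;
	if (fseek(f, 0, SEEK_END) != 0 || (sz = ftell(f)) < 0 ||
	    fseek(f, 0, SEEK_SET) != 0) {
		fclose(f);
		return -1;
	}
	fw->data = malloc(sz > 0 ? (size_t)sz : 1);
	if (!fw->data) {
		fclose(f);
		return -1;
	}
	fw->size = fread(fw->data, 1, (size_t)sz, f);
	fclose(f);
	if (fw->size != (size_t)sz) {
		free(fw->data);
		fw->data = NULL;
		return -1;
	}
	return 0;
}

/* Resume path: must not touch the filesystem, only the cached copy. */
static const unsigned char *drv_resume_firmware(const struct fw_cache *fw,
						size_t *len)
{
	*len = fw->size;
	return fw->data;
}

/* Post-resume hook: the system is fully live again; drop the cache. */
static void drv_post_resume(struct fw_cache *fw)
{
	free(fw->data);
	fw->data = NULL;
	fw->size = 0;
}
```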

Re: [git pull][resend] Input updates for 2.6.22-rc7

2007-07-05 Thread Dmitry Torokhov

On Thursday 05 July 2007 19:09, Linus Torvalds wrote:
> 
> On Thu, 5 Jul 2007, Dmitry Torokhov wrote:
> > 
> > Please consider pulling from:
> > 
> >         git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus
> 
> There's something wrong with your emails, and it's very irritating.
> 

Oops, sorry.

> I cannot just cut-and-paste the whole line, because your tabs and spaces 
> aren't tabs and spaces, they are some horrible abomination.
> 

Yes, for some reason Kmail sometimes does this when cutting and pasting
from another email. I don't think it does it when cutting and pasting
from anywhere else...

> What _looks_ like a tab above, when I save it and look at it with "od", it 
> shows it true nasty life: it's not a tab, and it's not even eight spaces, 
> it's four copies of the byte sequence '\302\240 ' ('\xC2\xA0\x20'), ie 
> some horrid nasty three-byte sequence where one character is a space, and 
> the previous two characters are some utf-8 abomination.
> 
> I have no idea what kind of crap you use to generate it, and quite 
> frankly, I don't want to know. I just want it to stop, so that when I 
> cut-and-paste, I don't get random UTF-8 characters that just *look* like 
> spaces but don't act like it, and cause my shell to very reasonably whine 
> about the result.
> 
> I think the "c2 a0" character is the utf-8 representation of an nbsp
> (non-breaking space), but:
>  - you are damn well sending text
>  - it's followed by a regular space, so it's stupid
>  - please don't do it.
> 
> It says your user-agent is "Kmail", and maybe there is some way to fix it. 
> And if kmail is correct, please make a bug-report to the kmail people. 
> Sending hidden invisible utf-8 crap that looks like space, but doesn't act 
> like it, is just damn impolite by kmail. I assume you weren't even aware 
> of the random crud you are sending out?
>

I was not... The copy I got from LKML looks fine in Kmail, Gmail and
MS Outlook.

The thing is I like Kmail because it never screwed up patches that I sent
out. I guess I will just stop cutting and pasting e-mails and just adjust
my script that generates changelog and diffstat. The issue may even be
already fixed in newer versions of Kmail, I am a little behind times with
my setup...

-- 
Dmitry

Re: [RFC] bloody mess with attribute() syntax

2007-07-05 Thread Al Viro

On Thu, Jul 05, 2007 at 01:56:35PM -0700, Linus Torvalds wrote:

> Is it slightly complex? Yes. It's a bit strange that the SYM_PTR doesn't 
> contain the information about the *pointer*, and the real information 
> about an object is actually "one removed" from the type infromation, but 
> it's a rather direct result of how sparse parses and maintains the type 
> information.

Not only that, but it's fairly natural if you look at it as a
lazy expression in type space...  Fortunately, we do have referential
transparency there, unlike e.g. in expression graphs handling.

BTW, one really ugly thing about __attribute__((mode(...))) is that
int *A;
int B;
typeof (A) __attribute__((mode(__pointer__))) p;
typeof (B) __attribute__((mode(__pointer__))) q;

gives int *p and intptr_t q resp.  IOW, we can't eliminate the damn thing
in parser unless we are willing to deal with typeof() in there, and I'd
rather not.

It really looks like we have to delay at least some of those suckers
until examine_... time.  IOW, new kind of SYM_... nodes.

FWIW, I'm going to kill off direct messing with symbol->type et al.
in evaluate.c and trim that stuff down to few primitives provided
by symbol.c; classify_type() is one such thing, but it really ought
to be lazy - i.e. do not assume that type is already examined,
do just enough type expression evaluation to get the derivation
type and be done with that; we probably want to get more degrees
of ->examined.  Plus "find all qualifiers", "find all qualifiers of
pointed-to", type-related part of degenerate(), type_difference()
(after lifting !Wtypesign stuff into compatible_assignment_types())
and "calculate compatible type".  All lazy...  We probably want
to go for more grades of ->examined, while we are at it.
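Here is a rough illustration of the "lazy" classify_type() described above. It resolves only enough of the type expression to learn the derivation type and caches the result on each node it visits. The node kinds and fields are hypothetical simplifications, not sparse's actual struct symbol or its examine_* machinery.

```c
#include <stddef.h>

/* Hypothetical, much-reduced type node kinds. */
enum kind { T_BASE, T_PTR, T_NODE, T_TYPEOF };

struct sym {
	enum kind kind;
	struct sym *base;	/* T_PTR: pointee; T_NODE/T_TYPEOF: referent */
	int examined;		/* set once this node has been classified */
	enum kind derived;	/* cached derivation kind, valid if examined */
};

/* Return the derivation kind (T_BASE or T_PTR) of a type, following
 * indirections only until a real derivation shows up. The pointee of a
 * pointer is never examined here, which is the point of being lazy. */
static enum kind classify(struct sym *s)
{
	if (s->examined)
		return s->derived;
	switch (s->kind) {
	case T_NODE:
	case T_TYPEOF:
		s->derived = classify(s->base);
		break;
	default:
		s->derived = s->kind;
		break;
	}
	s->examined = 1;
	return s->derived;
}
```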

After that we'll have much more straightforward logics in evaluate.c
and free hands for fixing the handling of attributes, etc.

Eventually I'd like to kill off MOD_CHAR/MOD_SHORT/MOD_LONG/MOD_LONGLONG
as ->modifiers bits and separate the use of struct ctype for declaration
parser state from that in struct symbol; we *are* tight on bits
there, we have a bunch of MOD_... that make sense only in parser
state (MOD_BITWISE is the same kind of thing, BTW) and parser context
might need to grow, which is obviously not nice for struct symbol.
Very few places really care about MOD_SPECIFIER outside of the parser and
they could be dealt with in a saner way...

BTW, what the hell is struct symbol ->value, and what is SYM_MEMBER
supposed to be used for?  AFAICS, nothing ever sets them these days
and SYM_MEMBER appears to have never been used in the entire history
of sparse...

I'm documenting the existing type system (i.e. uses of struct symbol,
etc.); I think I've got most of the picture by now, will post when
it's done.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [-mm Patch] INFINIBAND: check the return value of kmalloc

2007-07-05 Thread WANG Cong

On Thu, Jul 05, 2007 at 02:42:37PM -0700, Roland Dreier wrote:
>thanks, I added Jesper's suggestion to the original patch and queued
>this for 2.6.23:
>
>(Steve, let me know if this looks OK or not to you)
>
>commit 8d339921a2cb279457dce79f8a308978e0b41b27
>Author: WANG Cong <[EMAIL PROTECTED]>
>Date:   Thu Jul 5 14:40:32 2007 -0700
>
>RDMA/cxgb3: Check return of kmalloc() in iwch_register_device()
>
>Signed-off-by: WANG Cong <[EMAIL PROTECTED]>
>[ Also remove cast from void * return of kmalloc() as suggested by
>  Jesper Juhl <[EMAIL PROTECTED]>. ]
>Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
>

Very neat. Thanks!


[BUGFIX]{PATCH] flush icache on ia64 take2

2007-07-05 Thread KAMEZAWA Hiroyuki

This patch fixes an icache flush race on ia64 (Montecito) by implementing
flush_icache_page() et al.

Changelog:
 - updated against 2.6.22-rc7 (the previous one was against 2.6.21)
 - removed hugetlb's lazy_mmu_prot_update().
 - rewrote the patch description.
 - removed the part of the patch that made mprotect() flush the cache.

Brief Description:
In the current kernel, ia64 flushes an executable page's icache in
lazy_mmu_prot_update(), after set_pte(). But the threads of a multi-threaded
program can execute from the text page as soon as the pte says "the page is
present", so there is a race condition.
This patch guarantees that the cache is flushed before set_pte() is called.

Basic Information:
Montecito, a new ia64 processor, has separate L2 I-cache and L2 D-cache,
and the I-cache and D-cache are not guaranteed to be kept consistent
automatically.

The L1 cache is also split, but the L1 D-cache is write-through.

Montecito has split L2 caches and a mixed L3 cache. And... the L2 D-cache is
*write-back*. (See http://download.intel.com/design/Itanium2/manuals/30806501.pdf
section 2.3.3)

What we found and my understanding:

In our environment, I found that SIGILL occurs when...
 * A multi-threaded program is executed. (The program is an HPC program and
   uses automatic parallelization.)
 * The above program is on NFS.
 * SIGILL happens only when a file is newly loaded into the page cache.
 * The failure point is not fixed; SIGILL comes at a random instruction.
 * This didn't happen before Montecito.

From the Itanium2 documentation, we can see that
 * Any invalidation of a cache line will invalidate the I-cache.
 * The I-cache and D-cache are not coherent.
 * L1 D-cache to L2 cache is *write-through*.
 * L2 D-cache to L3 mixed cache is *write-back*.

And we did not see this problem before Montecito. The big difference between
the older CPUs and Montecito is:
 * older CPUs have a mixed L2 cache.
 * Montecito has separate L2 I-cache and L2 D-cache.

The following is my understanding. I'd like to hear ia64 specialists' opinions.

Assume two CPUs, CPU(A) and CPU(B).

 1. CPU(A) takes a page fault in text and calls do_no_page().
 2. CPU(B) executes the NFS ops, fills the page with the received RPC result
    (from NFS), and does SetPageUptodate().
 3. CPU(A) continues the page fault operation and calls set_pte().
 4. Another thread on CPU(B) executes text in the page which CPU(A) mapped.
 5-a. CPU(A) calls lazy_mmu_prot_update() and flushes the icache (this is slow).
 5-b. CPU(B) continues execution and gets SIGILL on an instruction which
      CPU(A) hasn't flushed yet.

 In stage 2, CPU(B) loads all the text data into its L2 D-cache and the L3
 mixed cache, and writes it there.
 But the data which CPU(B)'s L2 I-cache can see comes from the L3 mixed cache.
 So if the write-back from the L2 D-cache to the L3 mixed cache is delayed,
 CPU(B)'s L2 I-cache will fetch wrong data.
 Note: CPU(A) will fetch the correct instructions in the above case.

 Usual file systems use DMA, which purges the cache. But NFS copies the data
 with the CPU.

 Anyway, there is a SIGILL problem on NFS/ia64, and an icache flush fixes
 it (in our HPC team's tests).

 Calling lazy_mmu_prot_update() before set_pte() would fix this, but that
 seems a strange way to do it. So flush_icache_page() is implemented instead.
 
 Note1: the icache flush is called only when the VM_EXEC flag is on and
 PG_arch_1 is not set.
 Note2: the description in David Mosberger's "IA-64 Linux Kernel", p. 204, says
   "Linux traditionally maps the memory stack and memory allocated by
   the brk() with executable permission turned on".
 But this has changed; anon/stack memory is usually not mapped as executable
 now. It depends on the READ_IMPLIES_EXEC personality. So checking VM_EXEC is
 enough.
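The ordering fix can be modeled in a few lines of userspace C: the flush happens before the PTE is published, and only for VM_EXEC mappings whose page does not yet have PG_arch_1 set (Note1). This is a hypothetical sketch, not the patch's code: the MODEL_* flags, the structs and the *_model() helpers are stand-ins, and a counter records the order of events instead of touching any cache.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the ordering the patch enforces.
 * MODEL_VM_EXEC, MODEL_PG_ARCH_1, the structs and the *_model()
 * helpers stand in for VM_EXEC, PG_arch_1, struct page and set_pte().
 */
#define MODEL_VM_EXEC   0x1UL	/* mapping is executable */
#define MODEL_PG_ARCH_1 0x1UL	/* page's icache already known clean */

struct page_model { unsigned long flags; };
struct vma_model  { unsigned long vm_flags; };

static int step, flushed_at, pte_set_at;	/* event order, 0 = never */

static void flush_icache_page_model(struct vma_model *vma, struct page_model *page)
{
	/* Flush only executable mappings of pages not yet flushed. */
	if ((vma->vm_flags & MODEL_VM_EXEC) && !(page->flags & MODEL_PG_ARCH_1)) {
		flushed_at = ++step;
		page->flags |= MODEL_PG_ARCH_1;
	}
}

static void set_pte_model(void)
{
	/* From here on, other threads may execute from the page. */
	pte_set_at = ++step;
}

/* Fault path: make the icache coherent *before* the PTE becomes visible. */
static void install_text_page(struct vma_model *vma, struct page_model *page)
{
	flush_icache_page_model(vma, page);
	set_pte_model();
	assert(flushed_at < pte_set_at);
}
```

The second fault on the same page in the check below shows the role of PG_arch_1: once a page is marked clean, later faults skip the flush.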

What this patch does:
 1. removes all lazy_mmu_prot_update() calls, which are used only by ia64.
 2. implements flush_cache_page()/flush_icache_page() for ia64.


Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

---
 arch/ia64/mm/init.c           |    7 +--
 include/asm-generic/pgtable.h |    4 
 include/asm-ia64/cacheflush.h |   24 ++--
 include/asm-ia64/pgtable.h    |    9 -
 mm/fremap.c                   |    1 -
 mm/hugetlb.c                  |    5 +++--
 mm/memory.c                   |   13 ++---
 mm/migrate.c                  |    6 +-
 mm/mprotect.c                 |    1 -
 mm/rmap.c                     |    1 -
 10 files changed, 37 insertions(+), 34 deletions(-)

Index: devel-2.6.22-rc7/include/asm-ia64/cacheflush.h
===
--- devel-2.6.22-rc7.orig/include/asm-ia64/cacheflush.h
+++ devel-2.6.22-rc7/include/asm-ia64/cacheflush.h
@@ -10,18 +10,38 @@
 
 #include 
 #include 
+#include 
 
 /*
  * Cache flushing routines.  This is the kind of stuff that can be very expensive, so try
  * to avoid them whenever possible.
  */
+extern void __flush_icache_page_ia64(struct page *page);
 
 #define flush_cache_all()  do { } while (0)
 #define flush_cache_mm(mm) do { } while (0)
 #define flush_cache_dup_mm(mm) do { } while (0)

[PATCH] PCI: do not delay when changing power states on Geode

2007-07-05 Thread Andres Salomon

Geode hardware requires no delay when doing PCI power state transitions;
the board doesn't even have a real PCI bus.  Thanks to Tom Sylla for
pointing this out.  We can save precious milliseconds when changing power
states to D3hot by getting rid of this delay.

We do this as a PCI quirk.
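Roughly, a DECLARE_PCI_FIXUP_FINAL() entry is a (vendor, device, hook) record that the PCI core matches against each device during initialisation. The userspace model below shows that matching and why a single hit is enough here: the hook writes one global. The vendor/device IDs are placeholders rather than the real pci_ids.h values, and the names and the 10 ms default standing in for pci_pm_d3_delay are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder IDs for this model only; see pci_ids.h for real values. */
#define MODEL_VENDOR_CYRIX    0x1111
#define MODEL_DEV_5530_LEGACY 0x0001
#define MODEL_VENDOR_AMD      0x2222
#define MODEL_DEV_CS5536_ISA  0x0002

struct pci_dev_model { unsigned short vendor, device; };

struct fixup_model {
	unsigned short vendor, device;
	void (*hook)(struct pci_dev_model *dev);
};

/* Stand-in for pci_pm_d3_delay, with an assumed 10 ms default. */
static int d3_delay_ms = 10;

static void quirk_geode_pci_pm_model(struct pci_dev_model *dev)
{
	(void)dev;
	d3_delay_ms = 0;	/* one global: affects every device afterwards */
}

static const struct fixup_model final_fixups[] = {
	{ MODEL_VENDOR_CYRIX, MODEL_DEV_5530_LEGACY, quirk_geode_pci_pm_model },
	{ MODEL_VENDOR_AMD,   MODEL_DEV_CS5536_ISA,  quirk_geode_pci_pm_model },
};

/* Run every hook whose vendor/device pair matches the device. */
static void apply_final_fixups(struct pci_dev_model *dev)
{
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(final_fixups) / sizeof(final_fixups[0]); i++)
		if (final_fixups[i].vendor == dev->vendor &&
		    final_fixups[i].device == dev->device)
			final_fixups[i].hook(dev);
}
```

A non-Geode device leaves the delay untouched; the first matching companion chip zeroes it for everything that follows.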

Signed-off-by: Andres Salomon <[EMAIL PROTECTED]>
---

 drivers/pci/quirks.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 01d8f8a..b85fd5f 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1368,6 +1368,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260a, quirk_intel_pcie_pm);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,   0x260b, quirk_intel_pcie_pm);
 
 /*
+ * Geode hardware does not require any sort of delay when
+ * transitioning between PCI power states.  This allows us to shave
+ * off time when doing suspend/resume.
+ */
+static void __init quirk_geode_pci_pm(struct pci_dev *dev)
+{
+   pci_pm_d3_delay = 0;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5530_LEGACY,
+   quirk_geode_pci_pm);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CS5536_ISA,
+   quirk_geode_pci_pm);
+
+/*
  * Toshiba TC86C001 IDE controller reports the standard 8-byte BAR0 size
  * but the PIO transfers won't work if BAR0 falls at the odd 8 bytes.
  * Re-allocate the region if needed...

Re: [RFC][Patch] Allow not mounting a root fs

2007-07-05 Thread H. Peter Anvin

Bodo Eggert wrote:
> This patch adds the option to not mount another root filesystem 
> by specifying root=initramfs.

Uhm, the kernel doesn't mount anything if you're using an initramfs.

> BTW: Is it possible to mount a tmpfs on / before extracting the cpio?

Not in the stock kernel.  There have been some patches floating around
for that, I think.

-hpa

Re: Does the kernel HPET support has problems or the hwclock from util-linux?

2007-07-05 Thread rae l

On 7/3/07, Luca Tettamanti <[EMAIL PROTECTED]> wrote:
> rae l <[EMAIL PROTECTED]> wrote:
> > from this address, I know util-linux-2.12r is the latest:
> > http://www.kernel.org/pub/linux/utils/util-linux/util-linux-2.12r.lsm
> >
> > My Dell OptiPlex 320 has 4 HPET timers and no RTC, so the execution of
> > hwclock has errors:
> >
> > [EMAIL PROTECTED] ~ $ /sbin/hwclock --show
> > select() to /dev/rtc to wait for clock tick timed out
> > [EMAIL PROTECTED] ~ $ /sbin/hwclock --version
> > hwclock from util-linux-2.12r
>
> I think that the problem is that HPET and the CMOS RTC (list in CC)
> share the same interrupt line.
> I suppose that you should enable CONFIG_HPET_RTC_IRQ (my hardware has
> the same "feature"); in this way /dev/rtc correctly reports that it
> cannot deliver the interrupt (when HPET is enabled) and hwclock uses
> direct ISA access.

I looked up CONFIG_HPET_RTC_IRQ; it conflicts with HPET_EMULATE_RTC,
which I have enabled.
So HPET_RTC_IRQ is the good way, while EMULATE_RTC is a bad way.

--
Denis Cheng
Linux Application Developer

[RFC][Patch] Allow not mounting a root fs

2007-07-05 Thread Bodo Eggert

This patch adds the option to not mount another root filesystem 
by specifying root=initramfs.

TODO: Documentation
---

BTW: Is it possible to mount a tmpfs on / before extracting the cpio?


While I'm at it:

In init/do_mounts.c, mount_root(void):
ROOT_NFS: Is it desirable to use the floppy as a root device if nfs-root
failed? If I tell my system to do A, I usually don't like it doing B instead.

Is the boot-from-floppy code here still usable, even though booting from 
floppy is no longer supported?
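How the root= value is compared decides which names get treated as "initramfs": a length-limited strncmp() accepts anything sharing the compared prefix, while strcmp() accepts only the exact string. A small self-contained check; the helper names are hypothetical and only the two comparison styles are being contrasted:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helpers contrasting two ways to recognise root=initramfs. */
static int root_is_initramfs_prefix3(const char *root_name)
{
	assert(root_name != NULL);
	/* Looks only at the first three characters, i.e. "ini". */
	return strncmp(root_name, "initramfs", 3) == 0;
}

static int root_is_initramfs_exact(const char *root_name)
{
	assert(root_name != NULL);
	/* Accepts only the exact string "initramfs". */
	return strcmp(root_name, "initramfs") == 0;
}
```

With the three-character form, any root= value starting with "ini" (say "initrd") would be treated the same way.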



diff -X dontdiff -pruN 2.6.21.ori/init/do_mounts.c 2.6.21/init/do_mounts.c
--- 2.6.21.ori/init/do_mounts.c 2007-07-06 03:31:57.0 +0200
+++ 2.6.21/init/do_mounts.c 2007-07-06 03:27:33.0 +0200
@@ -427,6 +427,9 @@ void __init prepare_namespace(void)
mount_block_root(root_device_name, root_mountflags);
goto out;
}
+   if (!strcmp(root_device_name, "initramfs")) {
+   goto out_nomount;
+   }
ROOT_DEV = name_to_dev_t(root_device_name);
if (strncmp(root_device_name, "/dev/", 5) == 0)
root_device_name += 5;
@@ -444,6 +447,7 @@ void __init prepare_namespace(void)
 out:
sys_mount(".", "/", NULL, MS_MOVE, NULL);
sys_chroot(".");
+out_nomount:
security_sb_post_mountroot();
 }
 
-- 
Top 100 things you don't want the sysadmin to say:
65. What do you mean /home was on that disk?  I umounted it!

Re: [RFC PATCH 0/2] raid5: 65% sequential-write performance improvement, stripe-queue take2

2007-07-05 Thread Dan Williams

On 04 Jul 2007 13:41:26 +0200, Andi Kleen <[EMAIL PROTECTED]> wrote:

> Dan Williams <[EMAIL PROTECTED]> writes:
>
> > The write performance numbers are better than I expected and would seem
> > to address the concerns raised in the thread "Odd (slow) RAID
> > performance"[2].  The read performance drop was not expected.  However,
> > the numbers suggest some additional changes to be made to the queuing
> > model.
>
> Have you considered supporting copy-xor in MD for non accelerated
> RAID? I've been looking at fixing the old dubious slow crufty x86 SSE
> XOR functions.

Copy-xor is something that Neil suggested at the beginning of the
acceleration work.  It was put on the back-burner, but now that the
implementation has settled it can be revisited.

> One thing I discovered is that it seems fairly
> pointless to make them slower with cache avoidance when most of the data is
> copied before anyways. I think much more advantage could be gotten by
> supporting copy-xor because XORing during a copy should be nearly
> free.
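To make the "XORing during a copy should be nearly free" point concrete: a fused loop reads each source byte once and uses it both for the copy and for the parity update, where a memcpy() followed by a separate xor pass reads the data twice. A byte-at-a-time sketch with a hypothetical copy_xor() helper; it is not MD's or the async_tx API, and real code would work on whole words or SIMD registers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fused copy-xor: one pass over src copies it into dst (e.g. the
 * stripe cache page) and folds it into the parity buffer.
 */
static void copy_xor(unsigned char *dst, unsigned char *parity,
		     const unsigned char *src, size_t len)
{
	size_t i;

	assert(dst != src);
	for (i = 0; i < len; i++) {
		unsigned char b = src[i];	/* each source byte is loaded once */

		dst[i] = b;
		parity[i] ^= b;
	}
}
```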

Yes, it does not make sense to have cache-avoidance mismatched copy
and xor operations in MD.  However, I think the memcpy should be
changed to a cache-avoiding memcpy rather than caching the xor data.
Then a copy-xor implementation will have a greater effect, or do you
see it differently?

> On the other hand ext3 write() also uses a cache avoiding copy now
> and for the XOR it would need to load the data from memory again.
> Perhaps this could be also optimized somehow (e.g. setting a flag
> somewhere and using a normal copy for the RAID-5 case)

The incoming async_memcpy call has a flags parameter where this could go...

One possible way to implement support for copy-xor (and xor-copy-xor
for that matter) would be to write a soft-dmaengine driver.  When a
memcpy is submitted it can hold off processing it to see if an xor
operation is attached to the chain.  Once the xor descriptor is
attached the implementation will know the location of all the incoming
data, all the existing stripe data and the destination for the xor.

> -Andi

Dan

Re: slow down printk during boot.

2007-07-05 Thread Dave Jones

On Fri, Jul 06, 2007 at 03:50:11AM +0200, Bodo Eggert wrote:
 > Dave Jones <[EMAIL PROTECTED]> wrote:
 > 
 > > This patch from Randy has proven quite useful from time to time,
 > > and has been in Fedora kernels for a while for that reason.
 > > I fixed up some checkpatch warnings, and rediffed it a bunch
 > > of times, Randy did the heavy lifting.
 > > 
 > > ---
 > > 
 > > This one delays each printk() during boot by a variable time
 > > (from kernel command line), while system_state == SYSTEM_BOOTING.
 > > Caveat:  it's not terribly SMP safe or SMP nice.
 > > Any ideas for improvements (esp. in the SMP area) are appreciated.
 > 
 > I just created a different patch, which replaces the panic with one that
 > does allow scrolling and reading the messages. Maybe that would be the
 > better approach ...

I've used the 'slow down the printk' patch in situations where we
haven't panic'd, but silently rebooted instantly or hung with
a black screen.
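The rule in the quoted patch description is simple: every message printed while system_state is still SYSTEM_BOOTING is followed by a pause whose length comes from the kernel command line. A userspace model of just that rule, assuming a "boot_delay=<ms>" style option; the names and the nanosleep()-based delay are illustrative, not the patch's code, and this single-threaded model ignores the SMP concerns mentioned in the description:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Stand-ins for the kernel's system_state values. */
static enum { SYSTEM_BOOTING, SYSTEM_RUNNING } system_state = SYSTEM_BOOTING;
static unsigned long boot_delay_ms;	/* 0: no delay */
static unsigned long total_delayed_ms;	/* bookkeeping for the example */

/* Parse the value of a hypothetical "boot_delay=<ms>" option. */
static void parse_boot_delay(const char *val)
{
	boot_delay_ms = strtoul(val, NULL, 10);
}

static void sleep_ms(unsigned long ms)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ms / 1000);
	ts.tv_nsec = (long)(ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
}

/* printk() stand-in: print the message, then pause only while booting. */
static int model_printk(const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vprintf(fmt, ap);
	va_end(ap);

	if (system_state == SYSTEM_BOOTING && boot_delay_ms) {
		fflush(stdout);		/* make the message visible before pausing */
		sleep_ms(boot_delay_ms);
		total_delayed_ms += boot_delay_ms;
	}
	return ret;
}
```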

Dave

-- 
http://www.codemonkey.org.uk

Re: SATA exceptions

2007-07-05 Thread Tejun Heo

Hello,

S.Çağlar Onur wrote:
> [ 4260.278427] ata1.00: cmd ca/00:08:d0:88:bc/00:00:00:00:00/ee tag 0 cdb 0x0 
> data 4096 out
> [ 4260.278430]  res 51/40:01:d7:88:bc/00:00:0e:00:00/ee Emask 0x9 
> (media error)

That's media error on sector 247236823 on WRITE.  Media errors on write
are bad signs - it usually means the drive even failed to remap the
sector because extra space ran out.  I'm not sure this is the case here
tho - the smart log is clear.  Please run smart short/long tests and see
what they say.

-- 
tejun

[PATCH] Print utsname on Oops on all architectures

2007-07-05 Thread Joshua Wise

Hopefully this patch has not been munged by pine; I have, minimally,
unchecked the mung-patches-sent-to-lkml option in pine's config. In the case
that it has been munged, I have also attached it.

--

From: Joshua Wise <[EMAIL PROTECTED]>

Background:
 This patch is a follow-on to "Info dump on Oops or panic()" [1].
 
 Some architectures printed information about the running kernel on Oops,
 but not all of them. The information printed was generally the version
 and build number, but it was not printed in a consistent place, and some
 architectures did not print it at all.

Description:
 This patch uses the already-existing die_chain to print utsname information
 on Oops. This patch also removes the architecture-specific utsname
 printers. To avoid crashing the system further (and hence not printing the
 Oops) in the case where the system is so hopelessly smashed that utsname
 might be destroyed, we vsprintf the utsname data into a static buffer
 first, and then just print that on crash.
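The "format early, print late" approach can be sketched in userspace: the release/version string is rendered once into a static buffer while the system is healthy, so the crash path only has to print a finished string. Assumptions: uname(2) stands in for init_utsname(), snprintf() for the kernel's vsprintf-based formatting, a plain function for the die_chain notifier, and all names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

static char oops_utsname_buf[256];

/* Called early, while utsname data can still be trusted. */
static int oops_utsname_init(void)
{
	struct utsname u;

	if (uname(&u) < 0)
		return -1;
	/* Same shape as the removed arch code: release, then the version
	 * string up to its first space. */
	snprintf(oops_utsname_buf, sizeof(oops_utsname_buf), "%s %.*s",
		 u.release, (int)strcspn(u.version, " "), u.version);
	return 0;
}

/* Crash path: no structure walking, just print the prebuilt buffer. */
static void oops_print_utsname(FILE *out)
{
	assert(out != NULL);
	fprintf(out, "%s\n", oops_utsname_buf);
}
```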

Testing:
 I wrote a module that does a *(int*)0 = 0; and observed that I got my
 utsname data printed.

Potential impact:
 This adds another line to the Oops output, causing the first few lines to
 potentially scroll off the screen. This also adds a few more pointer
 dereferences in the Oops path, because it adds to the die_chain notifier
 chain, reducing the likelihood that the Oops will be printed if there is
 very bad memory corruption.

Patch:
 This patch is against git 0471448f4d017470995d8a2272dc8c06dbed3b77.

Signed-Off-By: Joshua Wise <[EMAIL PROTECTED]>

[1] http://lkml.org/lkml/2007/6/28/316

--

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 92b6162..13f342e 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -20,7 +20,6 @@ #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 8423617..e4448f0 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -28,7 +28,6 @@ #include 
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -203,10 +202,8 @@ void __show_regs(struct pt_regs *regs)
unsigned long flags;
char buf[64];
 
-   printk("CPU: %d%s  (%s %.*s)\n",
-   smp_processor_id(), print_tainted(), init_utsname()->release,
-   (int)strcspn(init_utsname()->version, " "),
-   init_utsname()->version);
+   printk("CPU: %d%s\n",
+   smp_processor_id(), print_tainted());
print_symbol("PC is at %s\n", instruction_pointer(regs));
print_symbol("LR is at %s\n", regs->ARM_lr);
printk("pc : [<%08lx>]lr : [<%08lx>]psr: %08lx\n"
diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
index 06dfa65..1703243 100644
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -27,7 +27,6 @@ #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -308,10 +307,8 @@ void show_regs(struct pt_regs * regs)
 
if (user_mode_vm(regs))
printk(" ESP: %04x:%08lx",0x & regs->xss,regs->esp);
-   printk(" EFLAGS: %08lx%s  (%s %.*s)\n",
-  regs->eflags, print_tainted(), init_utsname()->release,
-  (int)strcspn(init_utsname()->version, " "),
-  init_utsname()->version);
+   printk(" EFLAGS: %08lx%s\n",
+  regs->eflags, print_tainted());
printk("EAX: %08lx EBX: %08lx ECX: %08lx EDX: %08lx\n",
regs->eax,regs->ebx,regs->ecx,regs->edx);
printk("ESI: %08lx EDI: %08lx EBP: %08lx",
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6e2f035..04c4abd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -32,7 +32,6 @@ #include 
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -414,8 +413,8 @@ void show_regs(struct pt_regs * regs)
 
printk("NIP: "REG" LR: "REG" CTR: "REG"\n",
   regs->nip, regs->link, regs->ctr);
-   printk("REGS: %p TRAP: %04lx   %s  (%s)\n",
-  regs, regs->trap, print_tainted(), init_utsname()->release);
+   printk("REGS: %p TRAP: %04lx   %s\n",
+  regs, regs->trap, print_tainted());
printk("MSR: "REG" ", regs->msr);
printbits(regs->msr, msr_bits);
printk("  CR: %08lx  XER: %08lx\n", regs->ccr, regs->xer);
diff --git a/arch/x86_64/kernel/process.c b/arch/x86_64/kernel/process.c
index 5909039..94fb7e3 100644
--- a/arch/x86_64/kernel/process.c
+++ b/arch/x86_64/kernel/process.c
@@ -32,7 +32,6 @@ #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -310,11 +309,8 @@ void __show_regs(struct pt_regs * regs)
 
printk("\n");
print_modules();
-   printk("Pid: %d, comm: %.20s %s %s %.*s\n",
-   current->pid, current->comm,

Re: slow down printk during boot.

2007-07-05 Thread Bodo Eggert

Dave Jones <[EMAIL PROTECTED]> wrote:

> This patch from Randy has proven quite useful from time to time,
> and has been in Fedora kernels for a while for that reason.
> I fixed up some checkpatch warnings, and rediffed it a bunch
> of times, Randy did the heavy lifting.
> 
> ---
> 
> This one delays each printk() during boot by a variable time
> (from kernel command line), while system_state == SYSTEM_BOOTING.
> Caveat:  it's not terribly SMP safe or SMP nice.
> Any ideas for improvements (esp. in the SMP area) are appreciated.

I just created a different patch, which replaces the panic with one that
does allow scrolling and reading the messages. Maybe that would be the
better approach ...

The patch was sent in:
Subject: [RFC][PATCH] introduce panic_gently
Message-ID: [EMAIL PROTECTED]
news:[EMAIL PROTECTED]
-- 
Justify my text? I'm sorry but it has no excuse. 


Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Nigel Cunningham

Hi.

On Friday 06 July 2007 11:19:32 Kyle Moffett wrote:
> On Jul 05, 2007, at 19:35:11, Nigel Cunningham wrote:
> > On Friday 06 July 2007 09:20:43 Benjamin Herrenschmidt wrote:
> >> No, the freezer creates all those places where it is harmful for a
> >> task to block because they will break the freezer :-)
> >
> > Nice try :) Okay then, you remove the freezer, try hibernating,  
> > then get back to me after you've fixed your filesystem because some  
> > process that wasn't frozen started writing things after the atomic  
> > copy (making the on disk filesystem inconsistent with the snapshot).
> 
> Umm, this thread is NOT ABOUT HIBERNATING!!!  Please go back and read  
> the subject, specifically the "suspend to RAM" parts :-D.  When your  
> hardware can put itself to sleep and atomically preserve memory as it  
> does so, you don't need an atomic copy.  For Real Suspend(TM) (IE:  
> Suspend-to-RAM), the list of things to do is short and simple:

We agreed a while back that you don't need the freezer for suspend to ram. As
far as I was aware, we went off-topic, so the subject line is out of date.

> 1)  Stop DMA and put most hardware into low-power states (stops all  
> interrupt sources)
> 2)  Ensure that the other CPUs have finished any trailing interrupt  
> handlers and put them to sleep
> 3)  Put the interrupt-controllers into low-power state
> 4)  Go to sleep
> 
> > As Pavel rightly said, you can get rid of the freezer, but you're
> > only going to have to implement another one that does
> > essentially the same thing, even if it is at some other level.
> 
> How about a freezer whose job it is to "wait for pending hard  
> interrupts to complete when we have already guaranteed that we won't  
> get any more"?  That part should be really *REALLY* easy.  You don't  
> need to care about either userspace processes or kernel threads at  
> all.  Specifically, Step 1 consists of:
> 
> suspend_device(dev)
> {
>   set_no_bind_flag(dev);
>   for (each subdev in dev->subdevices)
>           suspend_device(subdev);
>   set_no_io_flag(dev);
>   wait_for_in_progress_dma(dev);
>   turn_off_interrupts(dev);
>   go_to_low_power_state(dev);
> }
> 
> After you've set the "no_bind" flag, you won't get any *new*  
> subdevices trying to bind, therefore it's safe to iterate over the  
> list of present sub-devices and suspend them.  Once those are  
> suspended and in low-power states you can set a "no_io" flag to  
> prevent the driver from submitting more IO.  At that point you can  
> lazily wait for existing DMA/IO/interrupts to finish on the device,  
> since *NOBODY* will be submitting them anymore, and we certainly  
> aren't probing for new devices.  Then you can just turn off the power  
> to the device.  When all the leaf devices are off, the parent device  
> can be turned off because everything waiting on the leaf devices is  
> blocked on them and won't unblock until the parent device *AND* the  
> leaf device are turned on again, in that order.

For suspending, yes. For hibernating, that's not enough, because other 
processes can still be happily allocating and freeing memory, and will only 
get stopped when they try to do i/o or such like. If you're trying to make 
hibernation reliable, you need to be able to reliably check whether you're 
going to have enough storage for the image you're preparing, and enough 
memory for the atomic copy and so on. That's why the freezer is needed for 
hibernation. If you don't have it, any hibernation implementation you make is 
going to be only as reliable as the extent to which the system is otherwise 
idle.
 
> Scheduling and userspace are all still fully enabled in this  
> scenario.  Once all your devices are turned off, the only remaining  
> running threads will be those which haven't done IO since the  
> beginning of the suspend.  We can then disable preemption, turn off  
> the timer interrupts, and tell the other CPUs to park all their  
> remaining threads in schedule() and sleep.  Then we put the IRQ  
> controller to sleep and go to sleep ourselves.  If our driver model  
> locking is sufficient to handle putting a parent device to sleep  
> while threads are sleeping on a child device then there are exactly 0  
> problems.
> 
> Resuming is basically running the whole process in reverse.  Runtime- 
> suspend is achieved by not setting the 'no_io' or 'no_bind' flags and  
> putting selective device-subtrees to sleep without doing anything to  
> the rest of the system.

Fully agree when it comes to suspend to ram. There, this process should work, 
so far as I can see. But as I said, we went - to one degree or another - off 
topic, and did discuss hibernation too.

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.



Re: [2.6 patch] kernel/cpuset.c: cleanups

2007-07-05 Thread Paul Jackson

> - make the following needlessly global functions static:

Ok by me, but these are really Paul Menage's <[EMAIL PROTECTED]> patches
that you are dealing with, in particular:

  containersv10-make-cpusets-a-client-of-containers.patch

so I'm adding him to the cc list.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Kyle Moffett


On Jul 05, 2007, at 19:35:11, Nigel Cunningham wrote:
> On Friday 06 July 2007 09:20:43 Benjamin Herrenschmidt wrote:
>> No, the freezer creates all those places where it is harmful for a
>> task to block because they will break the freezer :-)
>
> Nice try :) Okay then, you remove the freezer, try hibernating,
> then get back to me after you've fixed your filesystem because some
> process that wasn't frozen started writing things after the atomic
> copy (making the on disk filesystem inconsistent with the snapshot).

Umm, this thread is NOT ABOUT HIBERNATING!!!  Please go back and read
the subject, specifically the "suspend to RAM" parts :-D.  When your
hardware can put itself to sleep and atomically preserve memory as it
does so, you don't need an atomic copy.  For Real Suspend(TM) (IE:
Suspend-to-RAM), the list of things to do is short and simple:

1)  Stop DMA and put most hardware into low-power states (stops all
interrupt sources)
2)  Ensure that the other CPUs have finished any trailing interrupt
handlers and put them to sleep
3)  Put the interrupt-controllers into low-power state
4)  Go to sleep

> As Pavel rightly said, you can get rid of the freezer, but you're
> only going to have to implement another one that does
> essentially the same thing, even if it is at some other level.

How about a freezer whose job it is to "wait for pending hard
interrupts to complete when we have already guaranteed that we won't
get any more"?  That part should be really *REALLY* easy.  You don't
need to care about either userspace processes or kernel threads at
all.  Specifically, Step 1 consists of:

suspend_device(dev)
{
	set_no_bind_flag(dev);
	for (each subdev in dev->subdevices)
		suspend_device(subdev);
	set_no_io_flag(dev);
	wait_for_in_progress_dma(dev);
	turn_off_interrupts(dev);
	go_to_low_power_state(dev);
}

After you've set the "no_bind" flag, you won't get any *new*  
subdevices trying to bind, therefore it's safe to iterate over the  
list of present sub-devices and suspend them.  Once those are  
suspended and in low-power states you can set a "no_io" flag to  
prevent the driver from submitting more IO.  At that point you can  
lazily wait for existing DMA/IO/interrupts to finish on the device,  
since *NOBODY* will be submitting them anymore, and we certainly  
aren't probing for new devices.  Then you can just turn off the power  
to the device.  When all the leaf devices are off, the parent device  
can be turned off because everything waiting on the leaf devices is  
blocked on them and won't unblock until the parent device *AND* the  
leaf device are turned on again, in that order.
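A runnable userspace model of the suspend_device() ordering described here, assuming a device tree held as NULL-terminated child arrays. struct dev_model and its fields are hypothetical, not a driver-model API, and "waiting" for DMA is simulated by letting in-flight transfers complete once no_io is set:

```c
#include <assert.h>
#include <stddef.h>

struct dev_model {
	const char *name;
	struct dev_model **children;	/* NULL-terminated array, or NULL */
	int no_bind, no_io, irqs_on, low_power;
	int dma_in_flight;
	int suspend_order;		/* 1 = first device to reach low power */
};

static int order_counter;

/* Model: once no_io is set nothing new is queued, so in-flight DMA just completes. */
static void wait_for_in_progress_dma(struct dev_model *dev)
{
	assert(dev->no_io);
	dev->dma_in_flight = 0;
}

static void suspend_device(struct dev_model *dev)
{
	struct dev_model **child;

	dev->no_bind = 1;				/* no new subdevices can bind */
	for (child = dev->children; child && *child; child++)
		suspend_device(*child);			/* leaves go down first */
	dev->no_io = 1;					/* driver stops submitting I/O */
	wait_for_in_progress_dma(dev);
	dev->irqs_on = 0;
	dev->low_power = 1;
	dev->suspend_order = ++order_counter;
}
```

Both leaves reach low power before their parent does, which is the property the parent/child argument relies on.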


Scheduling and userspace are all still fully enabled in this  
scenario.  Once all your devices are turned off, the only remaining  
running threads will be those which haven't done IO since the  
beginning of the suspend.  We can then disable preemption, turn off  
the timer interrupts, and tell the other CPUs to park all their  
remaining threads in schedule() and sleep.  Then we put the IRQ  
controller to sleep and go to sleep ourselves.  If our driver model  
locking is sufficient to handle putting a parent device to sleep  
while threads are sleeping on a child device then there are exactly 0  
problems.


Resuming is basically running the whole process in reverse.  Runtime- 
suspend is achieved by not setting the 'no_io' or 'no_bind' flags and  
putting selective device-subtrees to sleep without doing anything to  
the rest of the system.


Cheers,
Kyle Moffett


Re: [ANNOUNCE] util-linux-ng 2.13-rc1

2007-07-05 Thread Mike Frysinger

On Thursday 05 July 2007, Bryan Henderson wrote:
> >i dont see how blaming autotools for other people's misuse is relevant
>
> Here's how other people's misuse of the tool can be relevant to the choice
> of the tool: some tools are easier to use right than others.  Probably the
> easiest thing to use right is the system you designed and built yourself.
> I've considered distributing code with an Autotools-based build system
> before and determined quickly that I am not up to that challenge.  (The
> bigger part of the challenge isn't writing the original input files; it's
> debugging when a user says his build doesn't work).  But as far as I know,
> my hand-rolled build system is used correctly by me.

which brings us back to the package maintainer maintains the autotool source 
files, not joe blow user.  if there's trouble with the build system, then the 
maintainers (who are knowledgeable in autotools) are in a pretty easy 
position to fix/address it.  as you've stated, hand rolled build systems work 
great for the guy rolling it, but beyond that all bets are off.  util-linux 
had a hand rolled build system that fell apart in many places.  the 
maintainers of util-linux have well versed autotool people at their disposal, 
so i really dont see this as being worrisome.

> > > checks the width of integers on i386 for projects not caring about that
> > > and fails to find installed libraries without telling how it was
> > > supposed to find them or how to make it find that library.
> >
> > no idea what this rant is about.
>
> The second part sounds like my number 1 complaint as a user of
> Autotools-based packages: 'configure' often can't find my libraries.  I
> know exactly where they are, and even what compiler and linker options are
> needed to use them, but it often takes a half hour of tracing 'configure'
> or generated make files to figure out how to force the build to understand
> the same thing.  And that's with lots of experience.  The first five times
> it was much more frustrating.

the large majority of time, i find this to be trivial: read config.log.  but 
this comes with familiarity with the tool and autotools is sitting by far the 
best right now.  if you're having trouble with the package in question, just 
ask on the mailing list and post your config.log; i'm sure you'd get someone 
to readily point out the answer.
-mike



Re: [git pull][resend] Input updates for 2.6.22-rc7

2007-07-05 Thread Linus Torvalds

On Fri, 6 Jul 2007, Jesper Juhl wrote:
> 
> > I'm constantly surprised by just how _many_ ways MUA's find to screw up.
> 
> 'pine' actually seems to work pretty damn well once you disable the
> flowed-text "feature".

Yes. And 'alpine', its modern version, does even better, but you also 
need to make sure to disable "downgrade-multipart-to-text".

I've been using alpine for a while now, and it's nice to see it be utf-8 
capable and able to handle other charsets well.

So as a former pine user, I can recommend upgrading.

Linus

RSA support into kernel?

2007-07-05 Thread Gautam Singaraju


Is there any attempt being made to provide software-based RSA
cryptographic support in the kernel? I see that Linux supports
hardware-based cryptographic devices (VIA Padlock ACE). How is the
performance of such hardware? How well are these devices supported?
-GS

Re: slow down printk during boot.

2007-07-05 Thread Jeff Garzik


Dave Jones wrote:

This patch from Randy has proven quite useful from time to time,
and has been in Fedora kernels for a while for that reason.
I fixed up some checkpatch warnings and rediffed it a bunch
of times; Randy did the heavy lifting.

---

This one delays each printk() during boot by a variable time
(from kernel command line), while system_state == SYSTEM_BOOTING.
Caveat:  it's not terribly SMP safe or SMP nice.
Any ideas for improvements (esp. in the SMP area) are appreciated.

---

From: Randy Dunlap <[EMAIL PROTECTED]>

Optionally add a boot delay after each kernel printk() call,
crudely measured in milliseconds, with a maximum delay of
10 seconds per printk.

Enable CONFIG_BOOT_DELAY=y and then add (e.g.):
"lpj=loops_per_jiffy boot_delay=100"
to the kernel command line.
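The arithmetic behind `boot_delay` can be roughly modeled if we assume the delay is a calibrated busy loop. `loops_per_jiffy * HZ` gives loop iterations per second, so dividing by 1000 gives iterations per millisecond. That would also explain why the example command line passes `lpj=` explicitly: `boot_delay=` is parsed early, presumably before delay calibration has run. The function names are hypothetical and `HZ` is passed in as a parameter. Clamping to the 10-second maximum is this sketch's own assumption; the patch itself may reject out-of-range values instead.

```c
#include <assert.h>
#include <stdint.h>

/* "maximum delay of 10 seconds per printk" from the patch description */
#define MODEL_MAX_BOOT_DELAY_MS 10000u

/* loops_per_jiffy * HZ = busy-loop iterations per second; /1000 for ms. */
static uint64_t model_loops_per_msec(uint64_t lpj, unsigned int hz)
{
	assert(hz > 0);
	return lpj * hz / 1000;
}

/* Assumption of this sketch: out-of-range values are clamped to the cap. */
static unsigned int model_clamp_boot_delay(unsigned int ms)
{
	return ms > MODEL_MAX_BOOT_DELAY_MS ? MODEL_MAX_BOOT_DELAY_MS : ms;
}

/* Busy-loop iterations to spin after each printk() while booting. */
static uint64_t model_delay_loops(uint64_t lpj, unsigned int hz,
				  unsigned int boot_delay_ms)
{
	return model_loops_per_msec(lpj, hz) * model_clamp_boot_delay(boot_delay_ms);
}
```

For example, with `lpj=4000000` on an `HZ=250` kernel, `boot_delay=100` corresponds to 100,000,000 loop iterations after each printk().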

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

---

 init/calibrate.c  |    2 +-
 init/main.c       |   25 +
 kernel/printk.c   |   33 +
 lib/Kconfig.debug |   18 ++
 4 files changed, 77 insertions(+), 1 deletion(-)


hey, that's pretty neat.

I've occasionally hand-hacked something similar, to achieve those effects.

Jeff




Re: [ANNOUNCE] util-linux-ng 2.13-rc1

2007-07-05 Thread Matthew Wilcox

On Thu, Jul 05, 2007 at 11:30:20PM +0200, Karel Zak wrote:
> > > >  The package build system is now based on autotools. The build system
> > > >  supports  separate CFLAGS and LDFLAGS for suid programs (SUID_CFLAGS,
> > > >  SUID_LDFLAGS). For more details see the README file
> > >
> > > And this is really dumb.  autotools is a complete pain in the ass and
> 
>  Well, Adrian Bunk added autotools stuff to util-linux during his work
>  on v2.13. This stuff has been fixed and stabilized in util-linux-ng
>  v2.13.
> 
>  I'm not a fanatical autotools proponent, but it seems useful in
>  util-linux. We will see...
> 
>  I'm ready to change or fix anything in util-linux-ng, but I
>  always need a real reason -- a bug report, a new feature, or so. This
>  discussion is about impressions and feelings only.

No, it's based on long, hard and painful experiences attempting to debug
the fuckups that autoconf creates when it goes wrong.

-- 
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

Re: [PATCH 2/3] Virtio draft IV: the block driver

2007-07-05 Thread Rusty Russell

On Thu, 2007-07-05 at 09:32 +0200, Christian Borntraeger wrote:
> On Wednesday, 4 July 2007, Rusty Russell wrote:
> > +	vbr = mempool_alloc(vblk->pool, GFP_ATOMIC);
> > +	if (!vbr)
> > +		goto stop;
> [...]
> > +	BUG_ON(req->nr_phys_segments > ARRAY_SIZE(vblk->sg));
> > +	vbr->req = req;
> > +	if (!do_req(q, vblk, vbr))
> > +		goto stop;
> [...]
> > +stop:
> > +	/* Queue full?  Wait. */
> > +	blk_stop_queue(q);
> > +	mempool_free(vbr, vblk->pool);
> 
> 
> Hmm, can mempool_free really handle NULL as its first argument? (first goto). 

Good point.  Any objections to fixing that?

Cheers,
Rusty.
===
Christian Borntraeger points out that mempool_free() doesn't noop when
handed NULL.  This is inconsistent with the other free-like functions
in the kernel.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r a306f0a8de5e mm/mempool.c
--- a/mm/mempool.c	Fri Jul 06 10:28:39 2007 +1000
+++ b/mm/mempool.c	Fri Jul 06 10:29:40 2007 +1000
@@ -263,6 +263,9 @@ void mempool_free(void *element, mempool
 {
 	unsigned long flags;
 
+	if (unlikely(element == NULL))
+		return;
+
 	smp_mb();
 	if (pool->curr_nr < pool->min_nr) {
 		spin_lock_irqsave(&pool->lock, flags);
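The effect of this patch on the virtio error path can be seen in a userspace toy. `struct toy_pool` and `toy_pool_free()` are hypothetical. The real `mempool_free()` hands surplus elements to the pool's own free callback rather than to `free()`. Even so, the early return mirrors the added `if (unlikely(element == NULL)) return;`, and that check is what makes the first `goto stop` safe when `mempool_alloc()` returned NULL. It matches the `kfree(NULL)` convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MAX 4

/* Userspace stand-in for a mempool: keeps up to min_nr spare elements
 * in reserve and returns the rest to the normal allocator. */
struct toy_pool {
	void *spare[TOY_POOL_MAX];
	size_t curr_nr;
	size_t min_nr;
};

static void toy_pool_free(void *element, struct toy_pool *pool)
{
	assert(pool->min_nr <= TOY_POOL_MAX);

	/* The check the patch adds: freeing NULL is a no-op, like kfree(NULL). */
	if (element == NULL)
		return;

	if (pool->curr_nr < pool->min_nr) {
		pool->spare[pool->curr_nr++] = element; /* refill the reserve */
		return;
	}
	free(element); /* reserve is full: give it back */
}
```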



Re: [ANNOUNCE] util-linux-ng 2.13-rc1

2007-07-05 Thread Bryan Henderson

>i dont see how blaming autotools for other people's misuse is relevant

Here's how other people's misuse of the tool can be relevant to the choice 
of the tool: some tools are easier to use right than others.  Probably the 
easiest thing to use right is the system you designed and built yourself. 
I've considered distributing code with an Autotools-based build system 
before and determined quickly that I am not up to that challenge.  (The 
bigger part of the challenge isn't writing the original input files; it's 
debugging when a user says his build doesn't work).  But as far as I know, 
my hand-rolled build system is used correctly by me.

>> checks the width of integers on i386 for projects not caring about that and
>> fails to find installed libraries without telling how it was supposed to
>> find them or how to make it find that library.
>
>no idea what this rant is about.

The second part sounds like my number 1 complaint as a user of 
Autotools-based packages: 'configure' often can't find my libraries.  I 
know exactly where they are, and even what compiler and linker options are 
needed to use them, but it often takes a half hour of tracing 'configure' 
or generated make files to figure out how to force the build to understand 
the same thing.  And that's with lots of experience.  The first five times 
it was much more frustrating.

>> Configuring the build of an autotools program is harder than necessary;
>> if it used a config file, you could easily save it somewhere while adding
>> comments on how and why you made *that* choice, and you could possibly
>> use a set of default configs which you'd just include.
>
>history shows this is a pita to maintain.  every package has its own build
>system and configuration file ...

It's my understanding that autotools _does_ provide that ability (as 
stated, though I think "config file" may have been meant here as 
"config.make").  The config file is a shell script that contains a 
'configure' command with a pile of options on it, and as many comments as 
you want, to tailor the build to your requirements.


Re: Linux SMP Porting Guide

2007-07-05 Thread Jesper Juhl


On 05/07/07, Mohamed Bamakhrama <[EMAIL PROTECTED]> wrote:

Hi all,
Is there any mailing list or tutorial which provides guidelines for
porting Linux SMP into a new architecture?


Hmm, not that /I/ know of, but various generic documentation is
available in the Documentation/ directory.

Perhaps if you had a more specific question about some specific
issue/problem it would be easier to help you...

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html

Re: [BUGFIX][PATCH] DO flush icache before set_pte() on ia64.

2007-07-05 Thread KAMEZAWA Hiroyuki

On Fri, 6 Jul 2007 07:18:53 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Thu, 5 Jul 2007 12:13:09 -0600
> Mike Stroyan <[EMAIL PROTECTED]> wrote:
> >   The L3 cache is involved in the HP-UX defect description because the
> > earlier HP-UX patch PHKL_33781 added flushing of the instruction cache
> > when an executable mapping was removed.  Linux never added that
> > unsuccessfull attempt at montecito cache coherency.  In the current
> > linux situation it can execute old cache lines straight from L2 icache.
> > 
> Hmm... I couldn't understand "why icache includes old lines in a new page."
> This happens at
>  - a file is newly loaded into page-cache.
>  - only on NFS.
>  - happens very *often* if the program is unlucky.
> 
> So I wrote it up as I understand it.
> 
I'll remove the reference to HP-UX in the next post and rewrite the whole description.

> > 
> >   The only defect that I see in the current implementation of
> > lazy_mmu_prot_update() is that it is called too late in some
> > functions that are already calling it.  Are your large changes
> > attempting to correct other defects?  Or are you simplifying
> > away potentially valuable code because you don't understand it?
> > 
> I know your *simple* patch in April wasn't included, so I wrote this.
> In the April thread, the commenters' advice was, I think, "implement
> flush_icache_page()".
> If you have a better patch, please post it.
> 
I'll check the callers of lazy_mmu_prot_update() again and remove unnecessary calls.
But basically, an i-cache flush will be necessary whenever VM_EXEC is on.
PG_arch_1 will help us optimize that.

-Kame





Re: [patch 3/4] Enable link power management for ata drivers

2007-07-05 Thread Andrew Morton

On Thu, 05 Jul 2007 20:02:08 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> May I assume that I may delete the patches from Kristen, and assume that 
> you will resend an updated version of her AN and ALPM patches to me?
> 

Sure.  But I have a sneaking feeling that Kristen sneaks sneaky fixes into
her patches without telling me, so I won't be 100% confident about their
uptodateness (hint).



Re: [-mm patch] arch/i386/xen/events.c should #include

2007-07-05 Thread Jeremy Fitzhardinge


Adrian Bunk wrote:

Every file should include the headers containing the prototypes for
its global functions


OK.

   J

Re: [-mm patch] arch/i386/xen/mmu.c must #include

2007-07-05 Thread Jeremy Fitzhardinge


Adrian Bunk wrote:

This patch fixes the following compile error:
  


Hm, OK.  What .config?

   J

Re: [-mm patch] make arch/i386/xen/mmu.c:xen_pgd_pin() static

2007-07-05 Thread Jeremy Fitzhardinge


Adrian Bunk wrote:

xen_pgd_pin() can become static.
  


Hold off on that for now.  I have some local patches which add other 
xen/ files which use it.


   J

Re: [2.6 patch] net/core/netevent.c should #include

2007-07-05 Thread David Miller

From: Adrian Bunk <[EMAIL PROTECTED]>
Date: Fri, 6 Jul 2007 01:22:17 +0200

> Every file should include the headers containing the prototypes for
> its global functions.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Applied, thanks Adrian.

Re: [patch 3/4] Enable link power management for ata drivers

2007-07-05 Thread Jeff Garzik

May I assume that I may delete the patches from Kristen, and assume that 
you will resend an updated version of her AN and ALPM patches to me?


Jeff




Re: [patch 3/4] Enable link power management for ata drivers

2007-07-05 Thread Jeff Garzik


Andrew Morton wrote:

I guess we can bump ATA_DFLAG_CFG_MASK up to 12, like this?


Yep

Jeff




Re: [patch 3/4] Enable link power management for ata drivers

2007-07-05 Thread Jeff Garzik


Andrew Morton wrote:

On Thu, 5 Jul 2007 13:05:30 -0700
Kristen Carlson Accardi <[EMAIL PROTECTED]> wrote:


+   ATA_DFLAG_IPM   = (1 << 6), /* device supports interface PM */
ATA_DFLAG_CFG_MASK  = (1 << 8) - 1,


I had to bump this to (1<<7), so we've run out.


You can shuffle the numbers a bit, as long as the masks (*_MASK) stay 
correct for their purpose
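Here is a small C model of the constraint Jeff describes. Flags in the configuration group must lie inside `*_CFG_MASK`, and every other flag must lie above it, so that one mask operation can clear or test the whole group. The names and bit positions here are hypothetical, not libata's. Widening the mask to `(1 << 12) - 1`, as Andrew suggests, frees bits 8 to 11. It also requires moving any existing non-config flags above bit 11, and the compile-time checks would catch a flag left behind.

```c
#include <assert.h>

/* Illustrative layout in the spirit of libata's device flags; the names
 * and bit positions are hypothetical, not copied from libata.h. */
enum {
	MODEL_DFLAG_LBA       = (1 << 0),
	MODEL_DFLAG_LBA48     = (1 << 1),
	MODEL_DFLAG_IPM       = (1 << 6),      /* new "config" flag */
	MODEL_DFLAG_AN        = (1 << 7),      /* last bit under the old mask */
	MODEL_DFLAG_CFG_MASK  = (1 << 12) - 1, /* widened from (1 << 8) - 1 */

	/* Non-config flags must start above the mask. */
	MODEL_DFLAG_PIO       = (1 << 12),
	MODEL_DFLAG_SUSPENDED = (1 << 13),
};

/* Compile-time checks that each flag sits on the correct side of the mask. */
static_assert((MODEL_DFLAG_AN & ~MODEL_DFLAG_CFG_MASK) == 0, "AN outside CFG mask");
static_assert((MODEL_DFLAG_IPM & ~MODEL_DFLAG_CFG_MASK) == 0, "IPM outside CFG mask");
static_assert((MODEL_DFLAG_PIO & MODEL_DFLAG_CFG_MASK) == 0, "PIO inside CFG mask");
static_assert((MODEL_DFLAG_SUSPENDED & MODEL_DFLAG_CFG_MASK) == 0,
	      "SUSPENDED inside CFG mask");

/* Drop every configuration-derived flag in one step, keeping the rest. */
static unsigned int model_clear_config_flags(unsigned int flags)
{
	return flags & ~(unsigned int)MODEL_DFLAG_CFG_MASK;
}
```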


Jeff




Re: Understanding I/O behaviour

2007-07-05 Thread Jesper Juhl


On 06/07/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
[snip]


Try playing with reducing /proc/sys/vm/dirty_ratio and see how that
helps. This workload will fill up memory with dirty data very quickly,
and it seems like system responsiveness often goes down the toilet when
this happens and the system is going crazy trying to write it all out.



Perhaps trying out a different elevator would also be worthwhile.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html

Re: [git pull][resend] Input updates for 2.6.22-rc7

2007-07-05 Thread Jesper Juhl

On 06/07/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Thu, 5 Jul 2007, Linus Torvalds wrote:
> >
> > It says your user-agent is "Kmail", and maybe there is some way to fix it.
> > And if kmail is correct, please make a bug-report to the kmail people.
>
> Ok, googling for kmail, I think it really is kmail doing it, because I
> find others complaining about the same idiocy.
>
> Btw, the others who noticed this weren't _nearly_ as polite as I am about
> kmail.
>
> Apparently kmail - at least when cutting-and-pasting - will actually turn
> every other space into an NBSP for some internal idiotic reason. So even
> if you _originally_ had 8 spaces, Kmail will apparently corrupt your data
> when cutting-and-pasting according to that other report I saw.
>
> Please stop using kmail, or ask for it to get fixed.

Or just configure it differently.  I use kmail sometimes (either that
or pine) and with a little config tweaking (and a few rules of thumb
about use) it can actually be made to behave reasonably well.

Here are a few tips:
 - Don't cut'n'paste stuff into kmail.
 - Go to Settings --> Configure KMail, select Composer, and remove the
   checkmark from "Word wrap at column ...".
 - Go to Settings --> Configure KMail, select Composer, go to the
   Charset tab, and make the list read us-ascii, iso-8859-1; listing just
   those two (in that order) seems to generate working mails.
 - Go to Settings --> Configure KMail, select Accounts, go to the
   Sending tab, and set "Message property" to "Allow 8-bit".

When writing a new message, check the Options menu, make sure Wordwrap
is not enabled and that Encoding is us-ascii or iso-8859-1 (or
possibly something else) - the auto-detect option seems to sometimes
get things wrong.

When inserting a patch or similar into a mail, use Message-->"Insert File"

> I'm constantly surprised by just how _many_ ways MUA's find to screw up.

'pine' actually seems to work pretty damn well once you disable the
flowed-text "feature".

> It's not like they seem to all have some stupid bug. It's more like they
> seem to all have willfully added code explicitly to mess up the content of
> email, often with the goal of making it "look" right, even if it's crap.
>
> In this case, it means that you cannot cut-and-paste simple ASCII text,
> because Kmail messed up.

Yeah, email is old, and there are too many ways to do things, too many
conflicting RFC's, competing commercial implementations etc etc etc -
the whole thing could do with a from-scratch re-implementation (as if
that's going to happen)...

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html

Re: Understanding I/O behaviour

2007-07-05 Thread Robert Hancock


Martin Knoblauch wrote:

Hi,

 for a customer we are operating a rackful of HP/DL380/G4 boxes that
have given us some problems with system responsiveness under [I/O
triggered] system load.

 The systems in question have the following HW:

2x Intel/EM64T CPUs
8GB memory
CCISS Raid controller with 4x72GB SCSI disks as RAID5
2x BCM5704 NIC (using tg3)

 The distribution is RHEL4. We have tested several kernels including
the original 2.6.9, 2.6.19.2, 2.6.22-rc7 and 2.6.22-rc7+cfs-v18.

 One part of the workload is when several processes try to write 5 GB
each to the local filesystem (ext2->LVM->CCISS). When this happens, the
load goes up to 12 and responsiveness goes down. This means from one
moment to the next things like opening a ssh connection to the host in
question, or doing "df" take forever (minutes). Especially bad with the
vendor kernel, better (but not perfect) with 2.6.19 and 2.6.22-rc7.

 The load basically comes from the writing processes and up to 12
"pdflush" threads all being in "D" state.

 So, what I would like to understand is how we can maximize the
responsiveness of the system, while keeping disk throughput at maximum.

 During my investigation I basically performed the following test,
because it represents the kind of situation that causes trouble:


$ cat dd3.sh
echo "Start 3 dd processes: "`date`
dd if=/dev/zero of=/scratch/X1 bs=1M count=5000&
dd if=/dev/zero of=/scratch/X2 bs=1M count=5000&
dd if=/dev/zero of=/scratch/X3 bs=1M count=5000&
wait
echo "Finish 3 dd processes: "`date`
sync
echo "Finish sync: "`date`
rm -f /scratch/X?
echo "Files removed: "`date`


 This results in the following timings. All with the anticipatory
scheduler, because it gives the best results:

2.6.19.2, HT: 10m
2.6.19.2, non-HT: 8m45s
2.6.22-rc7, HT: 10m
2.6.22-rc7, non-HT: 6m
2.6.22-rc7+cfs_v18, HT: 10m40s
2.6.22-rc7+cfs_v18, non-HT: 10m45s

 The "felt" responsiveness was best with the last two kernels, although
the load profile over time looks identical in all cases.

 So, a few questions:

a) any idea why disabling HT improves throughput, except for the cfs
kernels? For plain 2.6.22 the difference is quite substantial.
b) any ideas how to optimize the settings of the /proc/sys/vm/
parameters? The documentation is a bit thin here.


Try playing with reducing /proc/sys/vm/dirty_ratio and see how that 
helps. This workload will fill up memory with dirty data very quickly, 
and it seems like system responsiveness often goes down the toilet when 
this happens and the system is going crazy trying to write it all out.
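Robert's suggestion amounts to writing a smaller integer into /proc/sys/vm/dirty_ratio. As root, that is the same as `echo N > /proc/sys/vm/dirty_ratio`. Below is a minimal C sketch of helpers for reading and setting such a sysctl file. The helper names are hypothetical. The thread does not establish which value suits this workload, so the value is left to experiment.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a sysctl-style file such as /proc/sys/vm/dirty_ratio.
 * Returns 0 on success, -1 on failure. */
static int read_int_file(const char *path, int *out)
{
	assert(path != NULL && out != NULL);
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	int ok = fscanf(f, "%d", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write one integer; the real /proc/sys/vm/dirty_ratio requires root. */
static int write_int_file(const char *path, int value)
{
	assert(path != NULL);
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0) /* procfs may report a rejected value on close */
		ok = 0;
	return ok ? 0 : -1;
}
```

For instance, calling `write_int_file("/proc/sys/vm/dirty_ratio", 10)` and then rerunning dd3.sh would test one candidate value.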


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: ov511 module does not build

2007-07-05 Thread Sid Boyce


Adrian Bunk wrote:

On Wed, Jul 04, 2007 at 12:23:57AM +0100, Sid Boyce wrote:
  
With the same setup in .config for linux-2.6.22-rc2-git7, it builds, after 
that and right up to linux-2.6.22-rc7-git1 it doesn't.

/usr/src/linux-2.6.22-rc2-git7/drivers/media/video/ov511.ko
# CONFIG_VIDEO_V4L1 is not set




That's the problem.

  

CONFIG_VIDEO_V4L1_COMPAT=y
In any of the 2.6.22-rc kernels, there is no option to select OV511.
tindog:/usr/src/linux-2.6.22-rc2-git7 # grep -i ov511 .config
tindog:/usr/src/linux-2.6.22-rc2-git7 #

tindog:/usr/src/linux-2.6.22-rc7-git1 # diff ../linux-2.6.22-rc2-git7/drivers/media/video/Kconfig drivers/media/video/Kconfig
14c14
< if VIDEO_CAPTURE_DRIVERS
---
> if VIDEO_CAPTURE_DRIVERS && VIDEO_DEV
694c694
< if V4L_USB_DRIVERS
---
> if V4L_USB_DRIVERS && USB

tindog:/usr/src/linux-2.6.22-rc7-git1 # grep V4L .config
# CONFIG_VIDEO_V4L1 is not set
CONFIG_VIDEO_V4L1_COMPAT=y
CONFIG_VIDEO_V4L2=y
CONFIG_V4L_USB_DRIVERS=y

tindog:/usr/src/linux-2.6.22-rc2-git7 # grep V4L .config
# CONFIG_VIDEO_V4L1 is not set



If it built with this version and this .config, something went wrong.

Are you sure this is the correct .config that built the ov511 module?

  

CONFIG_VIDEO_V4L1_COMPAT=y
CONFIG_VIDEO_V4L2=y
# CONFIG_V4L_USB_DRIVERS is not set

Still does not build for 2.6.22-rc7-git2 with .config set with

# CONFIG_V4L_USB_DRIVERS is not set

Regards
Sid.
--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support 
Specialist, Cricket Coach

Microsoft Windows Free Zone - Linux used for all Computing Tasks





cu
Adrian

  
Thanks, I see. I didn't select that because it was marked DEPRECATED;
now I've selected OV511.

Regards
Sid.

--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist, 
Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks



Re: 2.6.21.5 june 30th to july 1st date hang?

2007-07-05 Thread Chris Friesen


Ernie Petrides wrote:


That's odd, because Thomas's patch removed two calls to clock_was_set(),
which is a no-op when CONFIG_HIGH_RES_TIMERS is not enabled (at least in
the 2.6.21 source tree).


I'm using a modified 2.6.10 tree...I expect the timer code is different.

Chris

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Nigel Cunningham

Hi.

On Friday 06 July 2007 09:20:43 Benjamin Herrenschmidt wrote:
> 
> > Will you be able to guarantee that every place where a task can/will block 
> > will be harmless place? If so, how will you guarantee that? How will you 
> > debug issues where a task occasionally doesn't block in the right place, 
> > particularly instances where it is some less than obvious interaction with 
> > other tasks?
> 
> Which places aren't harmless if you don't have a freezer ?

If I knew that, I wouldn't be asking the question.

> > This is the whole point to having the freezer. It makes things more 
> > predictable and testable. It shows us, clearly, when process X is the one 
> > that is causing problems.
> 
> No, the freezer creates all those places where it is harmful for a task
> to block, because blocking there will break the freezer :-)

Nice try :) Okay then, you remove the freezer, try hibernating, then get back 
to me after you've fixed your filesystem because some process that wasn't 
frozen started writing things after the atomic copy (making the on disk 
filesystem inconsistent with the snapshot).

As Pavel rightly said, you can get rid of the freezer, but you're only going 
to have to implement another one that does essentially the same thing, 
even if it is at some other level.
 
> > >  - Silently add GFP_NOIO to all allocations, to avoid having things
> > > blocking in kmalloc() with a mutex held that will deadlock with
> > > suspend() in a driver for example. Or set some way to have all GFP
> > > waiters wakeup and fail rather than wait for IOs. It's hard/bizarre but
> > > necessary, again, with or without a freezer.
> > 
> > GFP_ATOMIC? (In driver suspend, they shouldn't be sleeping either, right?)
> 
> NOIO should be enough, I think, but ATOMIC would do.
>  
> That's one of the reason why I used to have the pre-suspend and
> post-resume hooks in my original powermac implementation, for those few
> drivers complicated enough to require some pre-allocations.
>  
> > >  - Deal with the firmware problem. The best way is probably to have an
> > > async request_firmware() interface. Another thing is, drivers may want
> > > to cache their firmware in main memory, that sort of thing...
> > >
> 
> Note that the above firmware problem could be dealt with also with the
> pre-suspend/post-resume. Allowing to pre-request firmware etc... and
> keep it around until after resume, because we know we will need it.
> Gives a chance to drivers to perform things while the system is still
> live, filesystems still working, etc... (big memory allocations for
> example).
> 
> > > And that's just a small list off the top of my mind, of known problems
> > > that will cause deadlocks or misbehaviours today, with or without the
> > > freezer, and that need to be addressed.
> > 
> > Userspace device drivers too?
> 
> Maybe, but they are less of an issue: most of the time they don't do DMA
> or other harmful things. If they are USB drivers, for example, they
> are a non-issue at that level.

(Leaving the rest of the message intact so we don't have to fragment the 
discussion into a million subthreads).

Regards,

Nigel



Re: [git pull][resend] Input updates for 2.6.22-rc7

2007-07-05 Thread Linus Torvalds

On Thu, 5 Jul 2007, Linus Torvalds wrote:
> 
> It says your user-agent is "Kmail", and maybe there is some way to fix it. 
> And if kmail is correct, please make a bug-report to the kmail people. 

Ok, googling for kmail, I think it really is kmail doing it, because I 
find others complaining about the same idiocy.

Btw, the others who noticed this weren't _nearly_ as polite as I am about 
kmail.

Apparently kmail - at least when cutting-and-pasting - will actually turn 
every other space into an NBSP for some internal idiotic reason. So even 
if you _originally_ had 8 spaces, Kmail will apparently corrupt your data 
when cutting-and-pasting according to that other report I saw.
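A patch that has already been mangled this way can be repaired mechanically. The sketch below assumes the NBSPs arrive UTF-8 encoded, as the bytes 0xC2 0xA0. That encoding is an assumption. A Latin-1 mail would contain a lone 0xA0 byte instead, and this function deliberately leaves such bytes alone. `replace_nbsp_utf8()` is a hypothetical helper that rewrites the buffer in place and returns the new length.

```c
#include <assert.h>
#include <stddef.h>

/* Replace each UTF-8 NBSP (0xC2 0xA0) in buf[0..len) with a plain space,
 * compacting the buffer in place. Returns the new length. */
static size_t replace_nbsp_utf8(char *buf, size_t len)
{
	size_t r = 0, w = 0;

	assert(buf != NULL || len == 0);
	while (r < len) {
		if (r + 1 < len && (unsigned char)buf[r] == 0xC2 &&
		    (unsigned char)buf[r + 1] == 0xA0) {
			buf[w++] = ' '; /* two-byte NBSP becomes one space */
			r += 2;
		} else {
			buf[w++] = buf[r++];
		}
	}
	return w;
}
```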

Please stop using kmail, or ask for it to get fixed.

I'm constantly surprised by just how _many_ ways MUA's find to screw up. 
It's not like they seem to all have some stupid bug. It's more like they 
seem to all have willfully added code explicitly to mess up the content of 
email, often with the goal of making it "look" right, even if it's crap.

In this case, it means that you cannot cut-and-paste simple ASCII text, 
because Kmail messed up.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Libata PATA status

2007-07-05 Thread Jeff Garzik


Andi Kleen wrote:
> My personal wish list feature would be a little forwarder driver
> to forward /dev/hd* to /dev/sd* for this; then old IDE could be
> disabled without risking breaking old root file systems.

That's on the long-term TODO list.

libata is moving towards making libata-scsi an optional module (will 
always be around for ATAPI, and for compat with current ATA), and 
driving ATA disks as a native block driver, rather than having SCSI do 
the work for us.


libata's qc_issue/qc_complete high level API and internal modularity 
were designed to make this possible.  It is easy to see in the current 
code how libata-scsi is merely a user of the qc_issue/qc_complete API, 
with all the low-level details isolated away from that module.
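To make that layering concrete, here is a deliberately simplified userspace C sketch: a low-level driver exposes only an issue hook, and any upper layer (a SCSI translation layer today, a native block driver later) builds a queued command and gets its own completion callback. The names struct qc, qc_issue(), qc_complete() and fake_lld_issue() are hypothetical stand-ins; they are not libata's actual structures or signatures.

```c
#include <assert.h>
#include <stddef.h>

struct qc;

/* Completion callback supplied by whichever upper layer issued the command. */
typedef void (*qc_done_fn)(struct qc *qc);

/* Hypothetical queued command: just enough state to show the layering. */
struct qc {
	unsigned char command;	/* ATA opcode, e.g. 0x25 = READ DMA EXT */
	int err;		/* 0 on success, set by the low-level driver */
	qc_done_fn done;	/* upper-layer completion hook */
	void *owner_data;	/* private data for the upper layer */
};

/* Low-level driver interface: hardware details stay behind this hook. */
struct lld_ops {
	void (*issue)(struct qc *qc);
};

/* Core layer: hand a finished command back to its owner. */
static void qc_complete(struct qc *qc)
{
	if (qc->done)
		qc->done(qc);
}

/* Core layer: upper layers never talk to the hardware directly. */
static void qc_issue(const struct lld_ops *ops, struct qc *qc)
{
	ops->issue(qc);
}

/* A fake low-level driver that "executes" every command synchronously. */
static void fake_lld_issue(struct qc *qc)
{
	qc->err = 0;
	qc_complete(qc);
}

/* Two independent upper layers; each just counts its completions here. */
static void scsi_layer_done(struct qc *qc)
{
	++*(int *)qc->owner_data;
}

static void block_layer_done(struct qc *qc)
{
	++*(int *)qc->owner_data;
}
```

Neither upper layer knows anything about the low-level driver beyond the issue/complete contract. That separation is what would let libata-scsi become optional.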


Addendum:  Remember too, /dev/hdX is not just a major/minor pair, but a 
userspace interface, complete with expected ioctl support.


Jeff



[2.6 patch] mm/mempolicy.c: cleanups

2007-07-05 Thread Adrian Bunk

This patch contains the following cleanups:
- every file should include the headers containing the prototypes for
  its global functions
- make the following needlessly global functions static:
  - do_set_mempolicy()
  - do_get_mempolicy()
  - migrate_to_node()
  - do_mbind()
  - sp_alloc()
  - mpol_rebind_policy()
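Both kinds of cleanup can be illustrated with a minimal single-file C sketch. The prototype at the top stands in for a header that the defining .c file includes, so the compiler checks the definition against the declaration (and gcc's -Wmissing-prototypes stays quiet); the helper that no other file needs is made static, giving it internal linkage. The names policy_weight() and count_bits() are hypothetical.

```c
#include <assert.h>

/*
 * Stand-in for the shared header: in a real tree this prototype lives
 * in a .h file that both the defining file and its callers #include,
 * so a mismatch between declaration and definition is a compile error.
 */
int policy_weight(unsigned long nodes);

/* Used only in this file: static keeps it out of the global namespace. */
static int count_bits(unsigned long v)
{
	int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Genuinely global: other files call it through the header. */
int policy_weight(unsigned long nodes)
{
	return count_bits(nodes);
}
```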

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 include/linux/mempolicy.h |6 --
 mm/mempolicy.c|   25 -
 2 files changed, 16 insertions(+), 15 deletions(-)

--- linux-2.6.22-rc6-mm1/include/linux/mempolicy.h.old  2007-07-05 17:16:55.0 +0200
+++ linux-2.6.22-rc6-mm1/include/linux/mempolicy.h  2007-07-05 17:17:05.0 +0200
@@ -143,7 +143,6 @@
 
 extern void numa_default_policy(void);
 extern void numa_policy_init(void);
-extern void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *new);
 extern void mpol_rebind_task(struct task_struct *tsk,
const nodemask_t *new);
 extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new);
@@ -225,11 +224,6 @@
 {
 }
 
-static inline void mpol_rebind_policy(struct mempolicy *pol,
-   const nodemask_t *new)
-{
-}
-
 static inline void mpol_rebind_task(struct task_struct *tsk,
const nodemask_t *new)
 {
--- linux-2.6.22-rc6-mm1/mm/mempolicy.c.old 2007-07-05 17:14:16.0 +0200
+++ linux-2.6.22-rc6-mm1/mm/mempolicy.c 2007-07-05 17:22:17.0 +0200
@@ -89,6 +89,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -110,6 +111,9 @@
.policy = MPOL_DEFAULT,
 };
 
+static void mpol_rebind_policy(struct mempolicy *pol,
+   const nodemask_t *newmask);
+
 /* Do sanity checking on a policy */
 static int mpol_check_policy(int mode, nodemask_t *nodes)
 {
@@ -459,7 +463,7 @@
 }
 
 /* Set the process memory policy */
-long do_set_mempolicy(int mode, nodemask_t *nodes)
+static long do_set_mempolicy(int mode, nodemask_t *nodes)
 {
struct mempolicy *new;
 
@@ -519,8 +523,8 @@
 }
 
 /* Retrieve NUMA policy */
-long do_get_mempolicy(int *policy, nodemask_t *nmask,
-   unsigned long addr, unsigned long flags)
+static long do_get_mempolicy(int *policy, nodemask_t *nmask,
+unsigned long addr, unsigned long flags)
 {
int err;
struct mm_struct *mm = current->mm;
@@ -601,7 +605,8 @@
  * Migrate pages from one node to a target node.
  * Returns error or the number of pages not migrated.
  */
-int migrate_to_node(struct mm_struct *mm, int source, int dest, int flags)
+static int migrate_to_node(struct mm_struct *mm, int source, int dest,
+  int flags)
 {
nodemask_t nmask;
LIST_HEAD(pagelist);
@@ -732,8 +737,9 @@
 }
 #endif
 
-long do_mbind(unsigned long start, unsigned long len,
-   unsigned long mode, nodemask_t *nmask, unsigned long flags)
+static long do_mbind(unsigned long start, unsigned long len,
+unsigned long mode, nodemask_t *nmask,
+unsigned long flags)
 {
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
@@ -1466,8 +1472,8 @@
kmem_cache_free(sn_cache, n);
 }
 
-struct sp_node *
-sp_alloc(unsigned long start, unsigned long end, struct mempolicy *pol)
+static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
+   struct mempolicy *pol)
 {
struct sp_node *n = kmem_cache_alloc(sn_cache, GFP_KERNEL);
 
@@ -1645,7 +1651,8 @@
 }
 
 /* Migrate a policy to a different set of nodes */
-void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask)
+static void mpol_rebind_policy(struct mempolicy *pol,
+  const nodemask_t *newmask)
 {
nodemask_t *mpolmask;
nodemask_t tmp;


[2.6 patch] mm/shmem.c: make 3 functions static

2007-07-05 Thread Adrian Bunk

This patch makes three needlessly global functions static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 include/linux/mm.h |   15 ---
 mm/shmem.c |   10 +-
 2 files changed, 5 insertions(+), 20 deletions(-)

--- linux-2.6.22-rc6-mm1/include/linux/mm.h.old 2007-07-05 17:03:26.0 +0200
+++ linux-2.6.22-rc6-mm1/include/linux/mm.h 2007-07-05 17:03:45.0 +0200
@@ -707,9 +707,6 @@
 extern void show_free_areas(void);
 
 #ifdef CONFIG_SHMEM
-int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *new);
-struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
-   unsigned long addr);
 int shmem_lock(struct file *file, int lock, struct user_struct *user);
 #else
 static inline int shmem_lock(struct file *file, int lock,
@@ -717,18 +714,6 @@
 {
return 0;
 }
-
-static inline int shmem_set_policy(struct vm_area_struct *vma,
-  struct mempolicy *new)
-{
-   return 0;
-}
-
-static inline struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
-unsigned long addr)
-{
-   return NULL;
-}
 #endif
 struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags);
 
--- linux-2.6.22-rc6-mm1/mm/shmem.c.old 2007-07-05 17:04:00.0 +0200
+++ linux-2.6.22-rc6-mm1/mm/shmem.c 2007-07-05 17:06:27.0 +0200
@@ -1025,8 +1025,8 @@
return page;
 }
 
-struct page *shmem_swapin(struct shmem_inode_info *info, swp_entry_t entry,
- unsigned long idx)
+static struct page *shmem_swapin(struct shmem_inode_info *info,
+swp_entry_t entry, unsigned long idx)
 {
struct shared_policy *p = &info->policy;
int i, num;
@@ -1335,14 +1335,14 @@
 }
 
 #ifdef CONFIG_NUMA
-int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
+static int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
 {
struct inode *i = vma->vm_file->f_path.dentry->d_inode;
return mpol_set_shared_policy(&SHMEM_I(i)->policy, vma, new);
 }
 
-struct mempolicy *
-shmem_get_policy(struct vm_area_struct *vma, unsigned long addr)
+static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
+ unsigned long addr)
 {
struct inode *i = vma->vm_file->f_path.dentry->d_inode;
unsigned long idx;


[2.6 patch] mm/migrate.c: cleanups

2007-07-05 Thread Adrian Bunk

This patch contains the following cleanups:
- every file should include the headers containing the prototypes for
  its global functions
- make the needlessly global putback_lru_pages() static

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 include/linux/migrate.h |2 --
 mm/migrate.c|3 ++-
 2 files changed, 2 insertions(+), 3 deletions(-)

--- linux-2.6.22-rc6-mm1/include/linux/migrate.h.old 2007-07-05 17:10:01.0 +0200
+++ linux-2.6.22-rc6-mm1/include/linux/migrate.h 2007-07-05 17:10:10.0 +0200
@@ -26,7 +26,6 @@
 }
 
 extern int isolate_lru_page(struct page *p, struct list_head *pagelist);
-extern int putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
struct page *, struct page *);
 extern int migrate_pages(struct list_head *l, new_page_t x, unsigned long);
@@ -44,7 +43,6 @@
 
 static inline int isolate_lru_page(struct page *p, struct list_head *list)
{ return -ENOSYS; }
-static inline int putback_lru_pages(struct list_head *l) { return 0; }
 static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private) { return -ENOSYS; }
 
--- linux-2.6.22-rc6-mm1/mm/migrate.c.old   2007-07-05 17:10:16.0 +0200
+++ linux-2.6.22-rc6-mm1/mm/migrate.c   2007-07-05 17:11:43.0 +0200
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 
@@ -101,7 +102,7 @@
  *
  * returns the number of pages put back.
  */
-int putback_lru_pages(struct list_head *l)
+static int putback_lru_pages(struct list_head *l)
 {
struct page *page;
struct page *page2;


[2.6 patch] mm/vmstat.c: possible cleanups

2007-07-05 Thread Adrian Bunk

This patch contains the following possible cleanups:
- make the needlessly global setup_vmstat() static
- #if 0 the unused refresh_vm_stats()

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 mm/vmstat.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- linux-2.6.22-rc6-mm1/mm/vmstat.c.old 2007-07-05 16:54:39.0 +0200
+++ linux-2.6.22-rc6-mm1/mm/vmstat.c 2007-07-05 16:55:42.0 +0200
@@ -353,6 +353,8 @@
}
 }
 
+#if 0
+
 static void __refresh_cpu_vm_stats(void *dummy)
 {
refresh_cpu_vm_stats(smp_processor_id());
@@ -370,6 +372,8 @@
 }
 EXPORT_SYMBOL(refresh_vm_stats);
 
+#endif  /*  0  */
+
 #endif
 
 #ifdef CONFIG_NUMA
@@ -957,7 +961,7 @@
 static struct notifier_block __cpuinitdata vmstat_notifier =
{ &vmstat_cpuup_callback, NULL, 0 };
 
-int __init setup_vmstat(void)
+static int __init setup_vmstat(void)
 {
int cpu;
 


[-mm patch] kernel/sched.c: make 2 functions static

2007-07-05 Thread Adrian Bunk

This patch makes the following needlessly global functions static:
- load_balance_start()
- load_balance_next()

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 kernel/sched.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.22-rc6-mm1/kernel/sched.c.old 2007-07-05 16:39:42.0 +0200
+++ linux-2.6.22-rc6-mm1/kernel/sched.c 2007-07-05 16:43:38.0 +0200
@@ -2011,7 +2011,7 @@
  * classes, starting with the highest-prio one:
  */
 
-struct task_struct * load_balance_start(struct rq *rq)
+static struct task_struct * load_balance_start(struct rq *rq)
 {
struct sched_class *class = sched_class_highest;
struct task_struct *p;
@@ -2028,7 +2028,7 @@
return NULL;
 }
 
-struct task_struct * load_balance_next(struct rq *rq)
+static struct task_struct * load_balance_next(struct rq *rq)
 {
struct sched_class *class = rq->load_balance_class;
struct task_struct *p;


[2.6 patch] kernel/sched.c: make code static

2007-07-05 Thread Adrian Bunk

This patch makes the following needlessly global code static:
- arch_reinit_sched_domains()
- struct attr_sched_mc_power_savings
- struct attr_sched_smt_power_savings

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 include/linux/cpu.h |2 -
 kernel/sched.c  |   46 ++--
 2 files changed, 23 insertions(+), 25 deletions(-)

--- linux-2.6.22-rc6-mm1/include/linux/cpu.h.old 2007-07-05 16:13:11.0 +0200
+++ linux-2.6.22-rc6-mm1/include/linux/cpu.h 2007-07-05 16:18:03.0 +0200
@@ -41,8 +41,6 @@
 extern int cpu_add_sysdev_attr_group(struct attribute_group *attrs);
 extern void cpu_remove_sysdev_attr_group(struct attribute_group *attrs);
 
-extern struct sysdev_attribute attr_sched_mc_power_savings;
-extern struct sysdev_attribute attr_sched_smt_power_savings;
 extern int sched_create_sysfs_power_savings_entries(struct sysdev_class *cls);
 
 #ifdef CONFIG_HOTPLUG_CPU
--- linux-2.6.22-rc6-mm1/kernel/sched.c.old 2007-07-05 16:11:34.0 +0200
+++ linux-2.6.22-rc6-mm1/kernel/sched.c 2007-07-05 16:30:40.0 +0200
@@ -6127,7 +6127,7 @@
 }
 
 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
-int arch_reinit_sched_domains(void)
+static int arch_reinit_sched_domains(void)
 {
int err;
 
@@ -6156,24 +6156,6 @@
return ret ? ret : count;
 }
 
-int sched_create_sysfs_power_savings_entries(struct sysdev_class *cls)
-{
-   int err = 0;
-
-#ifdef CONFIG_SCHED_SMT
-   if (smt_capable())
-   err = sysfs_create_file(&cls->kset.kobj,
-   &attr_sched_smt_power_savings.attr);
-#endif
-#ifdef CONFIG_SCHED_MC
-   if (!err && mc_capable())
-   err = sysfs_create_file(&cls->kset.kobj,
-   &attr_sched_mc_power_savings.attr);
-#endif
-   return err;
-}
-#endif
-
 #ifdef CONFIG_SCHED_MC
 static ssize_t sched_mc_power_savings_show(struct sys_device *dev, char *page)
 {
@@ -6184,8 +6166,8 @@
 {
return sched_power_savings_store(buf, count, 0);
 }
-SYSDEV_ATTR(sched_mc_power_savings, 0644, sched_mc_power_savings_show,
-   sched_mc_power_savings_store);
+static SYSDEV_ATTR(sched_mc_power_savings, 0644, sched_mc_power_savings_show,
+  sched_mc_power_savings_store);
 #endif
 
 #ifdef CONFIG_SCHED_SMT
@@ -6198,8 +6180,26 @@
 {
return sched_power_savings_store(buf, count, 1);
 }
-SYSDEV_ATTR(sched_smt_power_savings, 0644, sched_smt_power_savings_show,
-   sched_smt_power_savings_store);
+static SYSDEV_ATTR(sched_smt_power_savings, 0644, sched_smt_power_savings_show,
+  sched_smt_power_savings_store);
+#endif
+
+int sched_create_sysfs_power_savings_entries(struct sysdev_class *cls)
+{
+   int err = 0;
+
+#ifdef CONFIG_SCHED_SMT
+   if (smt_capable())
+   err = sysfs_create_file(&cls->kset.kobj,
+   &attr_sched_smt_power_savings.attr);
+#endif
+#ifdef CONFIG_SCHED_MC
+   if (!err && mc_capable())
+   err = sysfs_create_file(&cls->kset.kobj,
+   &attr_sched_mc_power_savings.attr);
+#endif
+   return err;
+}
 #endif
 
 /*


[2.6 patch] ipc/shm.c: make 2 functions static

2007-07-05 Thread Adrian Bunk

This patch makes two needlessly global functions static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 ipc/shm.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- linux-2.6.22-rc6-mm1/ipc/shm.c.old  2007-07-05 16:08:24.0 +0200
+++ linux-2.6.22-rc6-mm1/ipc/shm.c  2007-07-05 16:08:44.0 +0200
@@ -234,7 +234,7 @@
 }
 
 #ifdef CONFIG_NUMA
-int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
+static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
 {
struct file *file = vma->vm_file;
struct shm_file_data *sfd = shm_file_data(file);
@@ -244,7 +244,8 @@
return err;
 }
 
-struct mempolicy *shm_get_policy(struct vm_area_struct *vma, unsigned long addr)
+static struct mempolicy *shm_get_policy(struct vm_area_struct *vma,
+   unsigned long addr)
 {
struct file *file = vma->vm_file;
struct shm_file_data *sfd = shm_file_data(file);


[2.6 patch] arch/i386/mm/discontig.c: make some variables static

2007-07-05 Thread Adrian Bunk

This patch makes some needlessly global variables static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/i386/mm/discontig.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6.22-rc6-mm1/arch/i386/mm/discontig.c.old   2007-07-05 15:59:32.0 +0200
+++ linux-2.6.22-rc6-mm1/arch/i386/mm/discontig.c   2007-07-05 16:02:47.0 +0200
@@ -103,14 +103,14 @@
 
 #define LARGE_PAGE_BYTES (PTRS_PER_PTE * PAGE_SIZE)
 
-unsigned long node_remap_start_pfn[MAX_NUMNODES];
+static unsigned long node_remap_start_pfn[MAX_NUMNODES];
 unsigned long node_remap_size[MAX_NUMNODES];
-unsigned long node_remap_offset[MAX_NUMNODES];
-void *node_remap_start_vaddr[MAX_NUMNODES];
+static unsigned long node_remap_offset[MAX_NUMNODES];
+static void *node_remap_start_vaddr[MAX_NUMNODES];
 void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
 
-void *node_remap_end_vaddr[MAX_NUMNODES];
-void *node_remap_alloc_vaddr[MAX_NUMNODES];
+static void *node_remap_end_vaddr[MAX_NUMNODES];
+static void *node_remap_alloc_vaddr[MAX_NUMNODES];
 static unsigned long kva_start_pfn;
 static unsigned long kva_pages;
 /*


[2.6 patch] arch/i386/mach-generic/probe.c: make struct apic_probe static

2007-07-05 Thread Adrian Bunk

This patch makes the needlessly global struct apic_probe static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

--- linux-2.6.22-rc6-mm1/arch/i386/mach-generic/probe.c.old 2007-07-05 15:55:40.0 +0200
+++ linux-2.6.22-rc6-mm1/arch/i386/mach-generic/probe.c 2007-07-05 15:55:51.0 +0200
@@ -22,7 +22,7 @@
 
struct genapic *genapic = &apic_default;
 
-struct genapic *apic_probe[] __initdata = { 
+static struct genapic *apic_probe[] __initdata = { 
&apic_summit,
&apic_bigsmp, 
&apic_es7000,


[2.6 patch] arch/i386/mach-es7000/es7000plat.c: cleanups

2007-07-05 Thread Adrian Bunk


This patch contains the following cleanups:
- make some needlessly global functions static
- #if 0 the unused es7000_stop_cpu()

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/i386/mach-es7000/es7000plat.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

--- linux-2.6.22-rc6-mm1/arch/i386/mach-es7000/es7000plat.c.old 2007-07-05 15:49:14.0 +0200
+++ linux-2.6.22-rc6-mm1/arch/i386/mach-es7000/es7000plat.c 2007-07-05 15:50:33.0 +0200
@@ -45,11 +45,11 @@
  * ES7000 Globals
  */
 
-volatile unsigned long *psai = NULL;
-struct mip_reg *mip_reg;
-struct mip_reg *host_reg;
-intmip_port;
-unsigned long  mip_addr, host_addr;
+static volatile unsigned long  *psai = NULL;
+static struct mip_reg  *mip_reg;
+static struct mip_reg  *host_reg;
+static int mip_port;
+static unsigned long   mip_addr, host_addr;
 
 /*
  * GSI override for ES7000 platforms.
@@ -240,6 +240,7 @@
 
 }
 
+#if 0
 int
 es7000_stop_cpu(int cpu)
 {
@@ -259,6 +260,7 @@
return 0;
 
 }
+#endif  /*  0  */
 
 void __init
 es7000_sw_apic()


[2.6 patch] kernel/cpuset.c: cleanups

2007-07-05 Thread Adrian Bunk

This patch contains the following cleanups:
- make the following needlessly global functions static:
  - cpuset_can_attach()
  - cpuset_attach()
  - cpuset_populate()
  - cpuset_post_clone()
  - cpuset_create()
  - cpuset_destroy()
- remove the unused EXPORT_SYMBOL_GPL(cpuset_mem_spread_node)

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 kernel/cpuset.c |   21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

--- linux-2.6.22-rc6-mm1/kernel/cpuset.c.old 2007-07-03 15:40:33.0 +0200
+++ linux-2.6.22-rc6-mm1/kernel/cpuset.c 2007-07-04 23:32:06.0 +0200
@@ -873,8 +873,8 @@
return val;
 }
 
-int cpuset_can_attach(struct container_subsys *ss,
- struct container *cont, struct task_struct *tsk)
+static int cpuset_can_attach(struct container_subsys *ss,
+struct container *cont, struct task_struct *tsk)
 {
struct cpuset *cs = container_cs(cont);
 
@@ -884,9 +884,9 @@
return security_task_setscheduler(tsk, 0, NULL);
 }
 
-void cpuset_attach(struct container_subsys *ss,
-  struct container *cont, struct container *oldcont,
-  struct task_struct *tsk)
+static void cpuset_attach(struct container_subsys *ss,
+ struct container *cont, struct container *oldcont,
+ struct task_struct *tsk)
 {
cpumask_t cpus;
nodemask_t from, to;
@@ -1163,7 +1163,7 @@
.private = FILE_SPREAD_SLAB,
 };
 
-int cpuset_populate(struct container_subsys *ss, struct container *cont)
+static int cpuset_populate(struct container_subsys *ss, struct container *cont)
 {
int err;
 
@@ -1205,8 +1205,8 @@
  * changed to grant parent->cpus_allowed-sibling_cpus_exclusive
  * (and likewise for mems) to the new container.
  */
-void cpuset_post_clone(struct container_subsys *ss,
-   struct container *container)
+static void cpuset_post_clone(struct container_subsys *ss,
+ struct container *container)
 {
struct container *parent, *child;
struct cpuset *cs, *parent_cs;
@@ -1234,7 +1234,7 @@
  * Must be called with the mutex on the parent inode held
  */
 
-int cpuset_create(struct container_subsys *ss, struct container *cont)
+static int cpuset_create(struct container_subsys *ss, struct container *cont)
 {
struct cpuset *cs;
struct cpuset *parent;
@@ -1269,7 +1269,7 @@
return 0;
 }
 
-void cpuset_destroy(struct container_subsys *ss, struct container *cont)
+static void cpuset_destroy(struct container_subsys *ss, struct container *cont)
 {
struct cpuset *cs = container_cs(cont);
 
@@ -1726,7 +1726,6 @@
current->cpuset_mem_spread_rotor = node;
return node;
 }
-EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
 
 /**
  * cpuset_excl_nodes_overlap - Do we overlap @p's mem_exclusive ancestors?


[2.6 patch] make coretemp_device_remove() static

2007-07-05 Thread Adrian Bunk

coretemp_device_remove() can become static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
--- linux-2.6.22-rc6-mm1/drivers/hwmon/coretemp.c.old   2007-07-04 20:46:05.0 +0200
+++ linux-2.6.22-rc6-mm1/drivers/hwmon/coretemp.c   2007-07-04 20:46:20.0 +0200
@@ -318,7 +318,7 @@
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-void coretemp_device_remove(unsigned int cpu)
+static void coretemp_device_remove(unsigned int cpu)
 {
struct pdev_entry *p, *n;
mutex_lock(&pdev_list_mutex);


[-mm patch] make arch/i386/xen/mmu.c:xen_pgd_pin() static

2007-07-05 Thread Adrian Bunk

xen_pgd_pin() can become static.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/i386/xen/mmu.c |2 +-
 arch/i386/xen/mmu.h |3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

--- linux-2.6.22-rc6-mm1/arch/i386/xen/mmu.h.old 2007-07-04 20:42:44.0 +0200
+++ linux-2.6.22-rc6-mm1/arch/i386/xen/mmu.h 2007-07-04 20:42:54.0 +0200
@@ -27,9 +27,6 @@
 void xen_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm);
 void xen_exit_mmap(struct mm_struct *mm);
 
-void xen_pgd_pin(pgd_t *pgd);
-//void xen_pgd_unpin(pgd_t *pgd);
-
 #ifdef CONFIG_X86_PAE
 unsigned long long xen_pte_val(pte_t);
 unsigned long long xen_pmd_val(pmd_t);
--- linux-2.6.22-rc6-mm1/arch/i386/xen/mmu.c.old 2007-07-04 20:43:00.0 +0200
+++ linux-2.6.22-rc6-mm1/arch/i386/xen/mmu.c 2007-07-04 20:43:06.0 +0200
@@ -408,7 +408,7 @@
 /* This is called just after a mm has been created, but it has not
been used yet.  We need to make sure that its pagetable is all
read-only, and can be pinned. */
-void xen_pgd_pin(pgd_t *pgd)
+static void xen_pgd_pin(pgd_t *pgd)
 {
struct multicall_space mcs;
struct mmuext_op *op;


[2.6 patch] i386: no need to make enable_cpu_hotplug a variable

2007-07-05 Thread Adrian Bunk

As long as nothing writes to this variable, there's no reason to make gcc 
emit code that checks it at runtime; as the constant 1 it can be folded 
away at compile time.
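A small sketch of why the macro is preferable: when enable_cpu_hotplug is a preprocessor constant, a test such as `if (!enable_cpu_hotplug)` has a constant condition that gcc can resolve at compile time, dropping the dead branch, whereas an `extern int` must be loaded and tested at runtime. The #ifdef mirrors the patched header above; the caller cpu_can_offline() and its "refuse CPU 0" policy are hypothetical.

```c
#include <assert.h>

/* Mirrors the patched <asm-i386/cpu.h>: a constant in both configurations. */
#ifdef CONFIG_HOTPLUG_CPU
#define enable_cpu_hotplug 1
#else
#define enable_cpu_hotplug 0
#endif

/*
 * Hypothetical caller. Because the condition is a compile-time constant,
 * the compiler can discard whichever branch can never run; with an extern
 * int it could not, since the value would only be known at runtime.
 */
static int cpu_can_offline(int cpu)
{
	if (!enable_cpu_hotplug)
		return 0;
	return cpu != 0;	/* hypothetical policy: keep CPU 0 online */
}
```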

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/i386/kernel/topology.c |2 --
 include/asm-i386/cpu.h  |2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

--- linux-2.6.22-rc6-mm1/include/asm-i386/cpu.h.old 2007-07-04 20:29:25.0 +0200
+++ linux-2.6.22-rc6-mm1/include/asm-i386/cpu.h 2007-07-04 20:36:33.0 +0200
@@ -13,7 +13,7 @@
 extern int arch_register_cpu(int num);
 #ifdef CONFIG_HOTPLUG_CPU
 extern void arch_unregister_cpu(int);
-extern int enable_cpu_hotplug;
+#define enable_cpu_hotplug 1
 #else
 #define enable_cpu_hotplug 0
 #endif
--- linux-2.6.22-rc6-mm1/arch/i386/kernel/topology.c.old 2007-07-04 20:30:12.0 +0200
+++ linux-2.6.22-rc6-mm1/arch/i386/kernel/topology.c 2007-07-04 20:35:56.0 +0200
@@ -51,8 +51,6 @@
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-int enable_cpu_hotplug = 1;
-
 void arch_unregister_cpu(int num) {
return unregister_cpu(&cpu_devices[num].cpu);
 }


[-mm patch] arch/i386/xen/events.c should #include

2007-07-05 Thread Adrian Bunk

Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
--- linux-2.6.22-rc6-mm1/arch/i386/xen/events.c.old 2007-07-03 04:26:28.0 +0200
+++ linux-2.6.22-rc6-mm1/arch/i386/xen/events.c 2007-07-03 04:26:59.0 +0200
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 


[2.6 patch] lib/ioremap.c should #include

2007-07-05 Thread Adrian Bunk

Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
--- linux-2.6.22-rc6-mm1/lib/ioremap.c.old  2007-07-03 05:02:10.0 +0200
+++ linux-2.6.22-rc6-mm1/lib/ioremap.c  2007-07-03 05:02:22.0 +0200
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 


[-mm patch] arch/i386/xen/mmu.c must #include

2007-07-05 Thread Adrian Bunk

This patch fixes the following compile error:

<--  snip  -->

...
  CC  arch/i386/xen/mmu.o
In file included from /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/arch/i386/xen/mmu.c:46:
include2/asm/mmu_context.h: In function ‘switch_mm’:
include2/asm/mmu_context.h:45: error: dereferencing pointer to incomplete type
include2/asm/mmu_context.h:50: error: dereferencing pointer to incomplete type
include2/asm/mmu_context.h:53: error: dereferencing pointer to incomplete type
include2/asm/mmu_context.h:58: error: dereferencing pointer to incomplete type
include2/asm/mmu_context.h:58: error: dereferencing pointer to incomplete type
include2/asm/mmu_context.h:59: error: dereferencing pointer to incomplete type
include2/asm/mmu_context.h:66: error: dereferencing pointer to incomplete type
include2/asm/mmu_context.h:70: error: dereferencing pointer to incomplete type
include2/asm/mmu_context.h:71: error: dereferencing pointer to incomplete type
...
make[2]: *** [arch/i386/xen/mmu.o] Error 1

<--  snip  -->
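Errors like these come from code that has only a forward declaration of a struct in scope (most likely struct mm_struct here, whose members switch_mm() accesses). Pointers to such a type can be passed around, but its members cannot be accessed until the full definition has been seen, which is what the added #include supplies. A minimal C illustration, using a hypothetical struct demo_mm:

```c
#include <assert.h>

/* A forward declaration alone leaves the type incomplete. */
struct demo_mm;

/* Passing pointers to an incomplete type is fine... */
static struct demo_mm *pick(struct demo_mm *a, struct demo_mm *b, int first)
{
	return first ? a : b;
}

/*
 * ...but accessing a member requires the full definition. In the kernel
 * the extra #include supplies it; here it simply follows. Moving
 * demo_users() above this definition would reproduce an
 * incomplete-type error like the one above.
 */
struct demo_mm {
	int users;
};

static int demo_users(const struct demo_mm *mm)
{
	return mm->users;
}
```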

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
--- linux-2.6.22-rc6-mm1/arch/i386/xen/mmu.c.old 2007-07-04 00:11:28.0 +0200
+++ linux-2.6.22-rc6-mm1/arch/i386/xen/mmu.c 2007-07-04 00:11:39.0 +0200
@@ -39,6 +39,7 @@
  * Jeremy Fitzhardinge <[EMAIL PROTECTED]>, XenSource Inc, 2007
  */
 #include 
+#include 
 
 #include 
 #include 


Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Benjamin Herrenschmidt


> Will you be able to guarantee that every place where a task can/will block 
> will be harmless place? If so, how will you guarantee that? How will you 
> debug issues where a task occasionally doesn't block in the right place, 
> particularly instances where it is some less than obvious interaction with 
> other tasks?

Which places aren't harmless if you don't have a freezer?

> This is the whole point to having the freezer. It makes things more 
> predictable and testable. It shows us, clearly, when process X is the one 
> that is causing problems.

No, the freezer is what creates all those places where it is harmful for
a task to block, because blocking there will break the freezer :-)

> >  - Silently add GFP_NOIO to all allocations, to avoid having things
> > blocking in kmalloc() with a mutex held that will deadlock with
> > suspend() in a driver for example. Or set some way to have all GFP
> > waiters wakeup and fail rather than wait for IOs. It's hard/bizarre but
> > necessary, again, with or without a freezer.
> 
> GFP_ATOMIC? (In driver suspend, they shouldn't be sleeping either, right?)

NOIO should be enough, I think, but ATOMIC would do.
 
That's one of the reasons why I used to have the pre-suspend and
post-resume hooks in my original powermac implementation, for those few
drivers complicated enough to require some pre-allocations.
 
> >  - Deal with the firmware problem. The best way is probably to have an
> > async request_firmware interface(). Another thing is, drivers may want
> > to cache their firmware in main memory, that sort of thing...
> >

Note that the above firmware problem could also be dealt with using the
pre-suspend/post-resume hooks, which would allow drivers to pre-request
firmware etc. and keep it around until after resume, because we know we
will need it. That gives drivers a chance to do things while the system
is still live, with filesystems still working, etc. (big memory
allocations, for example).
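The pre-suspend/post-resume idea might look roughly like the following userspace C model: the driver grabs everything it will need (here, a firmware copy) while allocations and filesystems still work, suspend() itself allocates nothing, and post_resume() releases the cache. The struct pm_hooks type, the hook names and the demo driver are hypothetical; this is not an existing kernel interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-driver hooks around the real suspend/resume calls. */
struct pm_hooks {
	int  (*pre_suspend)(void *drv);	/* system still fully live */
	int  (*suspend)(void *drv);	/* must not allocate or wait on I/O */
	int  (*resume)(void *drv);
	void (*post_resume)(void *drv);	/* system live again */
};

struct demo_drv {
	unsigned char *fw_cache;	/* firmware kept across suspend */
	size_t fw_len;
};

static int demo_pre_suspend(void *p)
{
	struct demo_drv *d = p;

	/* Stand-in for requesting firmware: allocate and fill while we can. */
	d->fw_len = 16;
	d->fw_cache = malloc(d->fw_len);
	if (!d->fw_cache)
		return -1;
	memset(d->fw_cache, 0xAB, d->fw_len);
	return 0;
}

static int demo_suspend(void *p)
{
	struct demo_drv *d = p;

	/* Only touches memory obtained in pre_suspend(). */
	return d->fw_cache ? 0 : -1;
}

static int demo_resume(void *p)
{
	struct demo_drv *d = p;

	/* Reload the device from the cached copy: no filesystem access. */
	return (d->fw_cache && d->fw_cache[0] == 0xAB) ? 0 : -1;
}

static void demo_post_resume(void *p)
{
	struct demo_drv *d = p;

	free(d->fw_cache);
	d->fw_cache = NULL;
	d->fw_len = 0;
}

static const struct pm_hooks demo_hooks = {
	demo_pre_suspend, demo_suspend, demo_resume, demo_post_resume,
};

/*
 * Core sequence for a single driver; a real core would run every
 * driver's pre_suspend() before suspending any device.
 */
static int run_cycle(const struct pm_hooks *h, void *drv)
{
	int err = h->pre_suspend(drv);

	if (err)
		return err;
	err = h->suspend(drv);
	if (!err)
		err = h->resume(drv);
	h->post_resume(drv);
	return err;
}
```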

> > And that's just a small list off the top of my mind, of known problems
> > that will cause deadlocks or misbehaviours today, with or without the
> > freezer, and that need to be addressed.
> 
> Userspace device drivers too?

Maybe, but they are less of an issue: most of the time, they don't do DMA
or other harmful things. If they are USB drivers, for example, they
are non-issues at that level.


[2.6 patch] net/core/netevent.c should #include

2007-07-05 Thread Adrian Bunk

Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
--- linux-2.6.22-rc6-mm1/net/core/netevent.c.old	2007-07-03 04:59:08.0 +0200
+++ linux-2.6.22-rc6-mm1/net/core/netevent.c	2007-07-03 04:59:23.0 +0200
@@ -15,6 +15,7 @@
 
 #include 
 #include 
+#include 
 
 static ATOMIC_NOTIFIER_HEAD(netevent_notif_chain);
 

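Adrian's rationale in miniature: when a .c file includes the header that declares its global functions, the compiler sees both the prototype and the definition and rejects any mismatch between them (gcc's -Wmissing-prototypes also flags globals defined without a prior prototype). The sketch inlines a hypothetical header; sketch_register_notifier() is an invented name, not part of netevent.c.

```c
#include <assert.h>

/* --- contents of a hypothetical header, e.g. "sketch.h" --- */
int sketch_register_notifier(int priority);

/* --- the .c file that defines the function ---
 * Because the prototype above is visible here, a definition with a
 * different signature (returning void, or taking a long) fails to
 * compile with "conflicting types" instead of silently disagreeing
 * with callers that were compiled against the header. */
int sketch_register_notifier(int priority)
{
	return priority >= 0 ? 0 : -1;
}
```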

Re: 2.6.21.5 june 30th to july 1st date hang?

2007-07-05 Thread Ernie Petrides

On Thursday, 5-Jul-2007 at 16:49 MDT, Chris Friesen wrote:

> Ernie Petrides wrote:
> 
> > Only kernels built with the CONFIG_HIGH_RES_TIMERS option enabled were
> > vulnerable.
> 
> As I mentioned in my post to Thomas, we have high res timers disabled 
> and were still affected.  Granted, our kernel has been modified so it is 
> possible that vanilla would not be affected; I haven't tested it.
> 
> Chris

That's odd, because Thomas's patch removed two calls to clock_was_set(),
which is a no-op when CONFIG_HIGH_RES_TIMERS is not enabled (at least in
the 2.6.21 source tree).

Also, I personally tested with the reproducer you posted here, initially
on a box running 2.6.22-rc4, and there were no problems (but I'm not sure
what config options were enabled on that kernel).  I did reproduce the
problem on a stock 2.6.21 kernel with CONFIG_HIGH_RES_TIMERS enabled.

Cheers.  -ernie
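Ernie's argument rests on a common kernel pattern: a function that only matters when a config option is enabled is compiled as an empty static inline stub otherwise, so removing calls to it cannot change the behaviour of such a kernel. A hedged sketch of the pattern follows; sketch_clock_was_set() and its counter are invented for illustration and are not the 2.6.21 source.

```c
#include <assert.h>

static int reprogram_count;	/* hypothetical: counts real work done */

#ifdef CONFIG_HIGH_RES_TIMERS
/* Option enabled: the call does real work (here it just counts). */
static void sketch_clock_was_set(void)
{
	reprogram_count++;
}
#else
/* Option disabled: an empty inline stub, so calls compile to nothing
 * and adding or removing them cannot change behaviour. */
static inline void sketch_clock_was_set(void)
{
}
#endif
```

Built without -DCONFIG_HIGH_RES_TIMERS, any number of calls leaves the counter at zero.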

Re: [patch 0/3] clean gendisk out of scsi ULD structs

2007-07-05 Thread James Bottomley

On Fri, 2007-07-06 at 00:02 +0100, Al Viro wrote:
> On Thu, Jul 05, 2007 at 02:06:36PM -0700, Kristen Carlson Accardi wrote:
> > Since gendisk will now become part of struct scsi_device, we don't need
> > to store this value in any private data structs where they already store
> > scsi_device.  This series cleans up a few drivers which did this.
> 
> What the hell?  gendisks are *NOT* supposed to be embedded into other
> data structures, you'll screw up the lifetime rules for them.

Don't panic .. they're not ... we have a pointer to the gendisk in our
SCSI structures (properly refcounted).  The reason is historical and
actually goes back to 2002 when we first got rid of the static arrays of
structures we used to keep around.

Doug ... don't think 'disk' when you see 'gendisk' just think of it as a
useful infrastructure library.

James
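James's point is that the SCSI structures hold a counted reference to an independently allocated gendisk rather than embedding one, so the disk's lifetime follows its reference count instead of the lifetime of the containing structure. A minimal userspace sketch of that ownership pattern, loosely in the spirit of the block layer's get_disk()/put_disk(); disk_alloc(), disk_get(), disk_put() and the sketch_ structs are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a gendisk: allocated on its own and freed
 * only when the last reference is dropped. */
struct sketch_disk {
	int refcount;
};

static int disks_freed;	/* lets the example observe the final free */

static struct sketch_disk *disk_alloc(void)
{
	struct sketch_disk *d = calloc(1, sizeof(*d));

	if (d)
		d->refcount = 1;	/* reference owned by the creator */
	return d;
}

static struct sketch_disk *disk_get(struct sketch_disk *d)
{
	d->refcount++;
	return d;
}

static void disk_put(struct sketch_disk *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		free(d);
		disks_freed++;
	}
}

/* The device structure points at the disk instead of embedding it, so
 * freeing the device never frees storage someone else still uses. */
struct sketch_scsi_device {
	struct sketch_disk *disk;
};
```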


Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Nigel Cunningham

Hi.

On Friday 06 July 2007 08:46:54 Benjamin Herrenschmidt wrote:
> On Thu, 2007-07-05 at 11:30 +0200, Pavel Machek wrote:
> > 
> > ...but the moment you start blocking tasks that do driver requests,
> > you _do_ have mini-freezer of your own, with pretty much the same
> > problems.
> 
> No, not at all the same problems. Those tasks will block, but that will
> be harmless because we won't have some "freezer" things waiting for all
> tasks to reach a "stable" point (calling try_to_freeze()). We just let
> them block wherever we want, as long as it doesn't prevent a -driver-
> from suspending, which should be all right; we have no problem.

Will you be able to guarantee that every place where a task can/will block 
will be a harmless place? If so, how will you guarantee that? How will you
debug issues where a task occasionally doesn't block in the right place, 
particularly instances where it is some less than obvious interaction with 
other tasks?

This is the whole point of having the freezer. It makes things more
predictable and testable. It shows us, clearly, when process X is the one 
that is causing problems.

> > In another message I shown that removing freezer will not help with
> > FUSE in general case.
> 
> I disagree.

Why?
 
> > It probably does not help with firmware, too; as soon as udev attempts
> > to do something with your wireless card, it is blocked, and if the
> > wireless card needs the firmware from udev, you are deadlocked.
> 
> Firmware load has been a problem since day 1, I've talked about it
> multiple times, it's broken with or without the freezer, and so far, the
> reaction of pretty much everybody has been to dig their head deeper in
> the mud and ignore the problem.
> 
> There are other issues (again, with or without freezer) that should be
> dealt with. For example, drivers that haven't yet got their suspend()
> callback or already have got their resume() may rely on services of the
> kernel that are still blocked, that's where things may go hairy.
> request_firmware() within resume() is a typical example of that.
> 
> There are a few things we should do in that area. For example, once we
> start to call drivers' suspend() callbacks, we should probably set a system wide
> flag that will do things such as:
> 
>  - block usermode helpers (either make call_usermodehelper return
> something like -EBUSY or have it queue up the calls and issue them later
> when things are resuming, we need to look closely at what semantic we
> want here).
> 
>  - Silently add GFP_NOIO to all allocations, to avoid having things
> blocking in kmalloc() with a mutex held that will deadlock with
> suspend() in a driver for example. Or set some way to have all GFP
> waiters wakeup and fail rather than wait for IOs. It's hard/bizarre but
> necessary, again, with or without a freezer.

GFP_ATOMIC? (In driver suspend, they shouldn't be sleeping either, right?)
 
>  - Deal with the firmware problem. The best way is probably to have an
> async request_firmware() interface. Another thing is, drivers may want
> to cache their firmware in main memory, that sort of thing...
> 
> And that's just a small list off the top of my mind, of known problems
> that will cause deadlocks or misbehaviours today, with or without the
> freezer, and that need to be addressed.

Userspace device drivers too?

Regards,

Nigel



Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Benjamin Herrenschmidt


> Yes, fuse could handle being frozen there.  However that would only
> solve part of the problem: an operation waiting for a reply could be
> holding a VFS mutex and some other task may be blocked on that mutex.
> 
> How would you solve freezing those tasks?

That task is implicitly frozen... but the kernel doesn't know it and
thus the freezer times out or fails or deadlocks or whatever.

The freezer could be made to ignore tasks that are sleeping in the
kernel assuming that if they go out of it, they'll ultimately reach
do_signal and freeze, but that means they can potentially still issue
IOs, which is what the freezer tries to avoid...

Or the kernel could start tracking dependencies, but then, good luck
implementing that crap.

At the end of the day, I stand my ground, the freezer cannot be made
reliable without massive infrastructure changes or giving up on very
useful features such as fuse among others. Besides, it only partially
"hides" the problem of requests going to drivers, thus it's a bad
solution.

We would be much better off spending time fixing the drivers to properly
block requests once suspended, and that also gives us, for free, the ability
to do dynamic runtime suspend on them.

And for "trivial" drivers where we don't care, using late_suspend to
power the chip off later when IRQs are off is an easy enough way to
solve it with very little code (though won't help with dynamic PM but
that's not necessarily an issue). No need for a freezer either way.

Ben.
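Ben's alternative to the freezer is for each driver to hold back its own requests once it has been suspended. A single-threaded sketch of that idea follows; to stay self-contained it queues requests and replays them at resume rather than putting callers to sleep, which a real driver would more likely do. struct req_gate, GATE_MAX_PENDING and the gate_*() functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define GATE_MAX_PENDING 16	/* hypothetical fixed-size backlog */

/* Hypothetical per-device request gate. While the device is suspended,
 * requests are held back instead of reaching the hardware; on resume
 * they are replayed in order. No system-wide freezer is involved. */
struct req_gate {
	bool suspended;
	int pending[GATE_MAX_PENDING];
	int npending;
	int hw_log[GATE_MAX_PENDING];	/* requests that reached "hardware" */
	int nhw;
};

static void hw_execute(struct req_gate *g, int req)
{
	assert(!g->suspended);		/* never touch a sleeping device */
	assert(g->nhw < GATE_MAX_PENDING);
	g->hw_log[g->nhw++] = req;
}

/* Returns 0 if executed now, 1 if held until resume, -1 if backlog full. */
static int gate_submit(struct req_gate *g, int req)
{
	if (!g->suspended) {
		hw_execute(g, req);
		return 0;
	}
	if (g->npending == GATE_MAX_PENDING)
		return -1;
	g->pending[g->npending++] = req;
	return 1;
}

static void gate_suspend(struct req_gate *g)
{
	g->suspended = true;
}

static void gate_resume(struct req_gate *g)
{
	g->suspended = false;
	for (int i = 0; i < g->npending; i++)
		hw_execute(g, g->pending[i]);
	g->npending = 0;
}
```

Because the gate belongs to the driver, the same mechanism can serve dynamic runtime suspend of a single device, which is the extra benefit Ben mentions.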



[PATCH] Libertas: Fix regression in cmd.c introduced in commit 18c96c3497aa871608d57ca5e08de3558159a6c9

2007-07-05 Thread Guillaume LECERF


Hi all,

While reading the latest libertas commits, I discovered that this commit
introduced a regression in the libertas wireless driver, inside cmd.c:

commit 18c96c3497aa871608d57ca5e08de3558159a6c9
[PATCH] libertas: fix WPA associations by handling ENABLE_RSN correctly

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=18c96c3497aa871608d57ca5e08de3558159a6c9

The logic behind it was broken: both branches of the if/else set
cmd_enable_rsn, so RSN could never be disabled. I think this patch is
worth reviewing.

Regards,

--- a/drivers/net/wireless/libertas/cmd.c
+++ b/drivers/net/wireless/libertas/cmd.c
@@ -241,7 +241,7 @@
 		if (*enable)
 			penableRSN->enable = cpu_to_le16(cmd_enable_rsn);
 		else
-			penableRSN->enable = cpu_to_le16(cmd_enable_rsn);
+			penableRSN->enable = cpu_to_le16(cmd_disable_rsn);
 	}
 
 	lbs_deb_leave(LBS_DEB_CMD);

--
Guillaume LECERF
GeeXboX developer
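The bug in a nutshell: both branches of the if/else chose cmd_enable_rsn, so the firmware could never be told to disable RSN. The sketch below isolates that selection. The numeric values of CMD_ENABLE_RSN and CMD_DISABLE_RSN are assumed for illustration, the function names are hypothetical, and the real driver additionally wraps the value in cpu_to_le16().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the driver's own constants are
 * cmd_enable_rsn and cmd_disable_rsn. */
enum { CMD_DISABLE_RSN = 0x0000, CMD_ENABLE_RSN = 0x0001 };

/* Form before the patch: both branches select the same value. */
static uint16_t rsn_field_buggy(int enable)
{
	if (enable)
		return CMD_ENABLE_RSN;
	else
		return CMD_ENABLE_RSN;
}

/* Form after the patch: the else branch selects the disable command. */
static uint16_t rsn_field_fixed(int enable)
{
	return enable ? CMD_ENABLE_RSN : CMD_DISABLE_RSN;
}
```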

Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-07-05 Thread Mike Stroyan

On Thu, Jul 05, 2007 at 10:57:00AM +0200, Zoltan Menyhart wrote:
> KAMEZAWA Hiroyuki wrote:
> >In our test, we confirmed that this can be fixed by flushing L2I just 
> >before SetPageUptodate() in NFS.
> 
> I can agree.
> We can be more permissive: it can be done anywhere after the new
> data is put in place and before nfs_readpage() or nfs_readpages()
> returns.
> 
> I saw your patch http://marc.info/?l=linux-mm=118352909826277=2
> that modifies e.g. mm/memory.c and not the NFS layer.
> 
> Have you proposed a patch against the NFS layer?

  This really doesn't look like a job for the file system layer.
That would require all sorts of file system readpage routines to
be modified to handle memory management details that are already
handled by the memory.c functions.  The do_no_page code is already
dealing with the necessary icache flushing operations.  It just
happens to be doing it with a bad race condition for ia64.

-- 
Mike Stroyan <[EMAIL PROTECTED]>

Re: [git pull][resend] Input updates for 2.6.22-rc7

2007-07-05 Thread Linus Torvalds

[ Ok, pulled. However, it's time for another installment of "Flame that 
  stupid mail client", because this isn't the first time this bit me ]

On Thu, 5 Jul 2007, Dmitry Torokhov wrote:
> 
> Please consider pulling from:
> 
>         git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

There's something wrong with your emails, and it's very irritating.

I cannot just cut-and-paste the whole line, because your tabs and spaces 
aren't tabs and spaces, they are some horrible abomination.

What _looks_ like a tab above, when I save it and look at it with "od", it 
shows its true nasty life: it's not a tab, and it's not even eight spaces,
it's four copies of the byte sequence '\302\240 ' ('\xC2\xA0\x20'), ie 
some horrid nasty three-byte sequence where one character is a space, and 
the previous two characters are some utf-8 abomination.

I have no idea what kind of crap you use to generate it, and quite 
frankly, I don't want to know. I just want it to stop, so that when I 
cut-and-paste, I don't get random UTF-8 characters that just *look* like 
spaces but don't act like it, and cause my shell to very reasonably whine 
about the result.

I think the "c2 a0" character is the utf-8 representation of a &nbsp;
(non-breaking space), but:
 - you are damn well sending text
 - it's followed by a regular space, so it's stupid
 - please don't do it.

It says your user-agent is "Kmail", and maybe there is some way to fix it. 
And if kmail is correct, please make a bug-report to the kmail people. 
Sending hidden invisible utf-8 crap that looks like space, but doesn't act 
like it, is just damn impolite by kmail. I assume you weren't even aware 
of the random crud you are sending out?

Linus
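Linus's od output decodes as follows: 0xC2 0xA0 is the UTF-8 encoding of U+00A0 NO-BREAK SPACE, and each one was followed by an ordinary 0x20 space. A small hypothetical helper, replace_nbsp() (not part of any existing tool), that normalizes such bytes in a pasted line to plain ASCII spaces might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rewrites every UTF-8 non-breaking space (bytes 0xC2 0xA0) in the
 * NUL-terminated string s into a single ASCII space, in place, and
 * returns how many were replaced. The string can only get shorter,
 * so writing behind the read pointer is safe. */
static size_t replace_nbsp(char *s)
{
	size_t n = 0;
	char *dst = s;

	for (const char *src = s; *src; ) {
		if ((unsigned char)src[0] == 0xC2 &&
		    (unsigned char)src[1] == 0xA0) {
			*dst++ = ' ';
			src += 2;
			n++;
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
	return n;
}
```

This only repairs the symptom on the receiving side; the fix Linus asks for is in the sending mail client.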

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Benjamin Herrenschmidt

On Thu, 2007-07-05 at 10:23 -0400, Alan Stern wrote:
> 
> How will that help?  Block the kernel thread in the freezer or block it 
> in the driver -- either way it is blocked.  So how do your deadlocks 
> get resolved?

Because nobody is waiting on that kernel thread anyway without a freezer
so there is no deadlock anymore.

> I disagree with your analysis -- not that it's completely wrong, but it 
> points out an existing basic problem in the kernel.  The kernel should 
> never depend on userspace!  More correctly, a task executing in the 
> kernel should never block with any sort of mutex or other lock held (in 
> a way that would preclude it from being frozen, let's say) while 
> waiting for a response from userspace.
> 
> Then the dependency graph would be easy to construct: User tasks can
> depend on whatever they want, and kernel threads never depend on a user
> task.

In an ideal world, there would be no hunger...

> If this contradicts the existing implementations and APIs for userspace 
> filesystems, then so be it.  My conclusion would be that the 
> implementations and APIs should be changed.

Why you guys are working so hard and spending so much energy trying to
avoid doing the right thing is beyond my understanding...

> It _does_ apply to kernel threads.  That's exactly why I wrote above 
> that kernel threads which try to do I/O during a suspend will need 
> extra attention.

OK, and none at all if you don't have a freezer.

Ben.
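Alan Stern's proposed invariant can be written as a check over "waits-for" edges: user tasks may wait on anything, but kernel threads must never wait on a user task. In this simplified model, freezing user tasks first and kernel threads second is only safe if the check passes; an edge like the FUSE case discussed in this thread, where something in the kernel waits on a userspace daemon, is what breaks it. The sketch models only that simple form of the rule, and the types and respects_rule() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum task_kind { USER_TASK, KERNEL_THREAD };

/* One dependency: "waiter cannot progress until holder does". */
struct wait_edge {
	enum task_kind waiter;
	enum task_kind holder;
};

/* True if no kernel thread waits on a user task. Under that invariant,
 * freezing user tasks first cannot strand a kernel thread behind a
 * task that is already frozen. */
static bool respects_rule(const struct wait_edge *edges, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (edges[i].waiter == KERNEL_THREAD &&
		    edges[i].holder == USER_TASK)
			return false;
	return true;
}
```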



[RFC][PATCH] introduce panic_gently

2007-07-05 Thread Bodo Eggert

If the boot process fails to find init or the root fs, the cause has
usually scrolled off the screen, and because of the panic, it can't be 
reached anymore.

This patch introduces panic_gently, which allows using the scrollback
buffer and rebooting, but it must not be called from an unsafe context.

Signed-Off-By: Bodo Eggert <[EMAIL PROTECTED]>

---

This patch seems to work correctly on bochs/i386, except for the qemu
BIOS hanging after a Ctrl-Alt-Del, but I did run qemu using -kernel and
-initrd, which might have caused this behaviour.

Is this function useful outside init code?
Should it be disabled on non-console systems/archs?


diff -X dontdiff -pruN 2.6.21.ori/include/linux/kernel.h 2.6.21/include/linux/kernel.h
--- 2.6.21.ori/include/linux/kernel.h   2007-07-06 00:13:03.0 +0200
+++ 2.6.21/include/linux/kernel.h   2007-07-05 23:35:46.0 +0200
@@ -96,6 +96,8 @@ extern struct atomic_notifier_head panic
 extern long (*panic_blink)(long time);
 NORET_TYPE void panic(const char * fmt, ...)
__attribute__ ((NORET_AND format (printf, 1, 2)));
+NORET_TYPE void panic_gently(const char * fmt, ...)
+   __attribute__ ((NORET_AND format (printf, 1, 2)));
 extern void oops_enter(void);
 extern void oops_exit(void);
 extern int oops_may_print(void);
diff -X dontdiff -pruN 2.6.21.ori/init/do_mounts.c 2.6.21/init/do_mounts.c
--- 2.6.21.ori/init/do_mounts.c 2006-11-29 22:57:37.0 +0100
+++ 2.6.21/init/do_mounts.c 2007-07-05 23:55:35.0 +0200
@@ -315,7 +315,7 @@ retry:
root_device_name, b);
printk("Please append a correct \"root=\" boot option\n");
 
-   panic("VFS: Unable to mount root fs on %s", b);
+   panic_gently("VFS: Unable to mount root fs on %s", b);
}
 
printk("No filesystem could mount root, tried: ");
@@ -325,7 +325,7 @@ retry:
 #ifdef CONFIG_BLOCK
__bdevname(ROOT_DEV, b);
 #endif
-   panic("VFS: Unable to mount root fs on %s", b);
+   panic_gently("VFS: Unable to mount root fs on %s", b);
 out:
putname(fs_names);
 }
diff -X dontdiff -pruN 2.6.21.ori/init/main.c 2.6.21/init/main.c
--- 2.6.21.ori/init/main.c  2007-07-06 00:13:03.0 +0200
+++ 2.6.21/init/main.c  2007-07-05 23:43:15.0 +0200
@@ -579,7 +579,7 @@ asmlinkage void __init start_kernel(void
 */
console_init();
if (panic_later)
-   panic(panic_later, panic_param);
+   panic_gently(panic_later, panic_param);
 
lockdep_info();
 
@@ -769,7 +769,7 @@ static int noinline init_post(void)
run_init_process("/bin/init");
run_init_process("/bin/sh");
 
-   panic("No init found.  Try passing init= option to kernel.");
+   panic_gently("No init found.  Try passing init= option to kernel.");
 }
 
 static int __init init(void * unused)
diff -X dontdiff -pruN 2.6.21.ori/kernel/panic.c 2.6.21/kernel/panic.c
--- 2.6.21.ori/kernel/panic.c   2007-07-06 00:13:03.0 +0200
+++ 2.6.21/kernel/panic.c   2007-07-05 23:48:28.0 +0200
@@ -139,7 +139,64 @@ NORET_TYPE void panic(const char * fmt, 
 	}
 }
 
+NORET_TYPE void panic_gently(const char * fmt, ...)
+{
+	long i;
+	static char buf[1024];
+	va_list args;
+#if defined(CONFIG_S390)
+	unsigned long caller = (unsigned long) __builtin_return_address(0);
+#endif
+
+	va_start(args, fmt);
+	vsnprintf(buf, sizeof(buf), fmt, args);
+	va_end(args);
+	printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
+
+	atomic_notifier_call_chain(_notifier_list, 0, buf);
+
+	if (!panic_blink)
+		panic_blink = no_blink;
+
+	if (panic_timeout > 0) {
+		/*
+		 * Delay timeout seconds before rebooting the machine.
+		 * We can't use the "normal" timers since we just panicked..
+		 */
+		printk(KERN_EMERG "Rebooting in %d seconds..",panic_timeout);
+		for (i = 0; i < panic_timeout*1000; ) {
+			touch_nmi_watchdog();
+			i += panic_blink(i);
+			mdelay(1);
+			i++;
+		}
+		/*	This will not be a clean reboot, with everything
+		 *	shutting down.  But if there is a chance of
+		 *	rebooting the system it will be rebooted.
+		 */
+		kernel_restart(NULL);
+	}
+#ifdef __sparc__
+	{
+		extern int stop_a_enabled;
+		/* Make sure the user can actually press Stop-A (L1-A) */
+		stop_a_enabled = 1;
+		printk(KERN_EMERG "Press Stop-A (L1-A) to return to the boot prom\n");
+	}
+#endif
+#if defined(CONFIG_S390)
+	disabled_wait(caller);
+#endif
+	for (i = 0;;) {
+		touch_softlockup_watchdog();
+		i += panic_blink(i);
+

Re: [patch 0/3] clean gendisk out of scsi ULD structs

2007-07-05 Thread Al Viro

On Thu, Jul 05, 2007 at 06:09:27PM -0400, Douglas Gilbert wrote:
> Since a scsi_device object is usually a SCSI logical unit,
> one wonders why it would contain a gendisk object. Logical
> units aren't necessarily disks, they might be enclosures or
> just place holders that respond to an INQUIRY (e.g. lun=0
> when the enclosing target has other active lus whose lun!=0).

gendisk is just a tag for requests coming into a given queue.

Re: [PATCH] mtd : add FUJITSU MBM29F800BA and ST M29F800AB descriptions

2007-07-05 Thread Philippe De Muyter

On Thu, Jul 05, 2007 at 01:23:16PM -0400, David Woodhouse wrote:
> On Thu, 2007-07-05 at 17:05 +0200, Philippe De Muyter wrote:
> > Add descriptions for Fujitsu MBM29F800BA and ST M29F800AB flash chips.
> > Those chips are compatible (except for the ids) with the AMD
> > AM29F800BB.
> 
> Aren't these CFI-compliant?

No

Philippe

Re: [ANNOUNCE] util-linux-ng 2.13-rc1

2007-07-05 Thread Jeff Garzik


Christoph Hellwig wrote:
> On Wed, Jul 04, 2007 at 12:11:56AM +0200, Karel Zak wrote:
>
>>  The package build system is now based on autotools. The build system
>>  supports separate CFLAGS and LDFLAGS for suid programs (SUID_CFLAGS,
>>  SUID_LDFLAGS). For more details see the README file
>
> And this is really dumb.  autotools is a complete pain in the ass and
> not useful at all for linux-only tools.



A myth.  It is quite useful for packagers, because of the high Just 
Works(tm) factor.  After porting an entire across several revisions of a 
distro, the autotools-based packages are the ones that work out of the 
box 90% of the time.


The other 90% of _my_ time comes from annoying people who roll their own 
Makefile/build solution, which the packager has to then learn.


It's just not scalable for people to keep building their own build 
solutions.


Jeff



Re: [patch 0/3] clean gendisk out of scsi ULD structs

2007-07-05 Thread Al Viro

On Thu, Jul 05, 2007 at 02:06:36PM -0700, Kristen Carlson Accardi wrote:
> Since gendisk will now become part of struct scsi_device, we don't need
> to store this value in any private data structs where they already store
> scsi_device.  This series cleans up a few drivers which did this.

What the hell?  gendisks are *NOT* supposed to be embedded into other
data structures, you'll screw up the lifetime rules for them.

Re: 2.6.21.5 june 30th to july 1st date hang?

2007-07-05 Thread Chris Friesen


Ernie Petrides wrote:


Only kernels built with the CONFIG_HIGH_RES_TIMERS option enabled were
vulnerable.


As I mentioned in my post to Thomas, we have high res timers disabled 
and were still affected.  Granted, our kernel has been modified so it is 
possible that vanilla would not be affected; I haven't tested it.


Chris

Re: [RFC] Thread Migration Preemption

2007-07-05 Thread Steven Rostedt

On Thu, 2007-07-05 at 17:51 -0400, Mathieu Desnoyers wrote:
> Thread Migration Preemption
> 
> This patch adds the ability to protect critical sections from migration to
> another CPU without disabling preemption.
> 
> This will be useful to minimize the amount of preemption disabling for the -rt
> patch. It will help leverage improvements brought by the local_t types in
> asm/local.h (see Documentation/local_ops.txt). Note that the updates done to
> variables protected by migration_disable must be either atomic or protected
> from concurrent updates done by other threads.
> 
> Typical use:
> 
> migration_disable();
> local_inc(&__get_cpu_var(_local_t_var));
> migration_enable();
> 
> Which will increment the variable atomically wrt the local CPU.
> 
> Comments (such as how to integrate this in the already almost full
> preempt_count) are welcome.

Ingo and Thomas, this would also help with the issue of the IRQ thread
running softirqs. We wouldn't need to bind the thread to a CPU; we could
simply disable the ability to migrate while the IRQ thread was handling
the softirqs.

-- Steve
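One way to read the proposed migration_disable()/migration_enable() semantics is as a per-task nesting counter that the load balancer consults before moving the task, while preemption itself stays enabled. The sketch assumes nesting is allowed (the RFC does not spell this out) and does not model the preempt_count packing the RFC asks about; struct sketch_task and the sketch_*() functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state. */
struct sketch_task {
	int cpu;
	unsigned int migration_disabled;	/* nesting depth */
};

static void sketch_migration_disable(struct sketch_task *t)
{
	t->migration_disabled++;	/* preemption is NOT disabled */
}

static void sketch_migration_enable(struct sketch_task *t)
{
	assert(t->migration_disabled > 0);
	t->migration_disabled--;
}

/* What a load balancer would check before moving the task. */
static bool sketch_try_migrate(struct sketch_task *t, int new_cpu)
{
	if (t->migration_disabled)
		return false;
	t->cpu = new_cpu;
	return true;
}
```

The task may still be preempted inside the section, but it resumes on the same CPU, which is what makes the local_t update in the RFC's example safe with respect to that CPU.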



Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Benjamin Herrenschmidt

On Thu, 2007-07-05 at 11:30 +0200, Pavel Machek wrote:
> 
> ...but the moment you start blocking tasks that do driver requests,
> you _do_ have mini-freezer of your own, with pretty much the same
> problems.

No, not at all the same problems. Those tasks will block, but that will
be harmless because we won't have some "freezer" things waiting for all
tasks to reach a "stable" point (calling try_to_freeze()). We just let
them block wherever we want, as long as it doesn't prevent a -driver-
from suspending, which should be all right; we have no problem.

> In another message I shown that removing freezer will not help with
> FUSE in general case.

I disagree.

> It probably does not help with firmware, too; as soon as udev attempts
> to do something with your wireless card, it is blocked, and if the
> wireless card needs the firmware from udev, you are deadlocked.

Firmware load has been a problem since day 1, I've talked about it
multiple times, it's broken with or without the freezer, and so far, the
reaction of pretty much everybody has been to dig their head deeper in
the mud and ignore the problem.

There are other issues (again, with or without freezer) that should be
dealt with. For example, drivers that haven't yet got their suspend()
callback or already have got their resume() may rely on services of the
kernel that are still blocked, that's where things may go hairy.
request_firmware() within resume() is a typical example of that.

There are a few things we should do in that area. For example, once we
start to call drivers' suspend() callbacks, we should probably set a system wide
flag that will do things such as:

 - block usermode helpers (either make call_usermodehelper return
something like -EBUSY or have it queue up the calls and issue them later
when things are resuming, we need to look closely at what semantic we
want here).

 - Silently add GFP_NOIO to all allocations, to avoid having things
blocking in kmalloc() with a mutex held that will deadlock with
suspend() in a driver for example. Or set some way to have all GFP
waiters wakeup and fail rather than wait for IOs. It's hard/bizarre but
necessary, again, with or without a freezer.

 - Deal with the firmware problem. The best way is probably to have an
async request_firmware() interface. Another thing is, drivers may want
to cache their firmware in main memory, that sort of thing...

And that's just a small list off the top of my mind, of known problems
that will cause deadlocks or misbehaviours today, with or without the
freezer, and that need to be addressed.

Ben.
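The first two items in Ben's list could hang off a single system-wide flag. In the hedged sketch below, while the flag is set, allocation masks lose their I/O and filesystem bits (leaving the equivalent of GFP_NOIO), and the usermode-helper entry point refuses with -EBUSY, the first of the two semantics Ben mentions. The SK_GFP_* bit values, the flag, and both functions are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed bit values, for illustration only. */
#define SK_GFP_WAIT	0x10u	/* may sleep */
#define SK_GFP_IO	0x40u	/* may start block I/O */
#define SK_GFP_FS	0x80u	/* may call into filesystems */
#define SK_GFP_KERNEL	(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/* Hypothetical flag: set before the first driver suspend() and
 * cleared after the last driver resume(). */
static bool sk_suspend_in_progress;

/* Allocation-side hook: while suspending, silently downgrade every
 * request so it may still sleep but never waits on I/O. */
static unsigned int sk_adjust_gfp(unsigned int gfp)
{
	if (sk_suspend_in_progress)
		gfp &= ~(SK_GFP_IO | SK_GFP_FS);
	return gfp;
}

/* Usermode-helper entry point: refuse rather than start a process that
 * may need an already-suspended device. (Queueing the call for later,
 * Ben's other option, is not shown.) */
static int sk_call_usermode_helper(const char *path)
{
	(void)path;
	if (sk_suspend_in_progress)
		return -EBUSY;
	return 0;	/* would launch the helper here */
}
```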


Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Benjamin Herrenschmidt


> There is that.
> 
> OK, bite the bullet. Tasks involved in fuse are special. Give them a flag
> and teach the freezer to put them on ice only after all other tasks are
> frozen. In a way they are kernel, there's no use denying that.

Yet another ugly hack to work around the fact that the freezer cannot
work reliably ... yuck

Why not spend that energy fixing drivers to properly block requests
instead?

Ben.


Re: [patch 3/4] Enable link power management for ata drivers

2007-07-05 Thread Andrew Morton

On Thu, 5 Jul 2007 15:33:34 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Thu, 5 Jul 2007 13:05:30 -0700
> Kristen Carlson Accardi <[EMAIL PROTECTED]> wrote:
> 
> > +   ATA_DFLAG_IPM   = (1 << 6), /* device supports interface PM */
> > ATA_DFLAG_CFG_MASK  = (1 << 8) - 1,
> 
> I had to bump this to (1<<7), so we've run out.

err, no, we've more than run out because your AN patches took the last one.


I guess we can bump ATA_DFLAG_CFG_MASK up to 12, like this?

--- a/include/linux/libata.h~ata-ahci-alpm-enable-link-power-management-for-ata-drivers
+++ a/include/linux/libata.h
@@ -140,11 +140,12 @@ enum {
 	ATA_DFLAG_ACPI_PENDING	= (1 << 5), /* ACPI resume action pending */
 	ATA_DFLAG_ACPI_FAILED	= (1 << 6), /* ACPI on devcfg has failed */
 	ATA_DFLAG_AN		= (1 << 7), /* device supports Async notification */
-	ATA_DFLAG_CFG_MASK	= (1 << 8) - 1,
+	ATA_DFLAG_IPM		= (1 << 8), /* device supports interface PM */
+	ATA_DFLAG_CFG_MASK	= (1 << 12) - 1,
 
-	ATA_DFLAG_PIO		= (1 << 8), /* device limited to PIO mode */
-	ATA_DFLAG_NCQ_OFF	= (1 << 9), /* device limited to non-NCQ mode */
-	ATA_DFLAG_SPUNDOWN	= (1 << 10), /* XXX: for spindown_compat */
+	ATA_DFLAG_PIO		= (1 << 12), /* device limited to PIO mode */
+	ATA_DFLAG_NCQ_OFF	= (1 << 13), /* device limited to non-NCQ mode */
+	ATA_DFLAG_SPUNDOWN	= (1 << 14), /* XXX: for spindown_compat */
 	ATA_DFLAG_INIT_MASK	= (1 << 16) - 1,
 
 	ATA_DFLAG_DETACH	= (1 << 16),
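With the renumbering above, every per-device configuration flag must lie inside ATA_DFLAG_CFG_MASK, and the init-time flags must sit above it but still inside ATA_DFLAG_INIT_MASK. Those invariants can be checked at compile time. The enum repeats a subset of the values from the proposed hunk; the assertions, SK_INIT_ONLY_FLAGS and sk_is_cfg_flag() are a hypothetical addition, not part of libata.

```c
#include <assert.h>

/* Values copied from the proposed libata.h hunk (subset). */
enum {
	ATA_DFLAG_AN		= (1 << 7),
	ATA_DFLAG_IPM		= (1 << 8),
	ATA_DFLAG_CFG_MASK	= (1 << 12) - 1,

	ATA_DFLAG_PIO		= (1 << 12),
	ATA_DFLAG_NCQ_OFF	= (1 << 13),
	ATA_DFLAG_SPUNDOWN	= (1 << 14),
	ATA_DFLAG_INIT_MASK	= (1 << 16) - 1,

	ATA_DFLAG_DETACH	= (1 << 16),
};

#define SK_INIT_ONLY_FLAGS \
	(ATA_DFLAG_PIO | ATA_DFLAG_NCQ_OFF | ATA_DFLAG_SPUNDOWN)

/* Config flags lie entirely inside CFG_MASK. */
static_assert(((ATA_DFLAG_AN | ATA_DFLAG_IPM) & ~ATA_DFLAG_CFG_MASK) == 0,
	      "config flag outside ATA_DFLAG_CFG_MASK");
/* Init-time flags sit above CFG_MASK... */
static_assert((SK_INIT_ONLY_FLAGS & ATA_DFLAG_CFG_MASK) == 0,
	      "init flag overlaps ATA_DFLAG_CFG_MASK");
/* ...but still inside INIT_MASK. */
static_assert((SK_INIT_ONLY_FLAGS & ~ATA_DFLAG_INIT_MASK) == 0,
	      "init flag outside ATA_DFLAG_INIT_MASK");
/* Runtime flags start above INIT_MASK. */
static_assert((ATA_DFLAG_DETACH & ATA_DFLAG_INIT_MASK) == 0,
	      "runtime flag overlaps ATA_DFLAG_INIT_MASK");

/* Runtime form of the first check, for a single flag. */
static int sk_is_cfg_flag(unsigned int flag)
{
	return flag != 0 && (flag & ~(unsigned int)ATA_DFLAG_CFG_MASK) == 0;
}
```

Had ATA_DFLAG_IPM stayed at (1 << 8) with the old mask of (1 << 8) - 1, the first assertion would fail to compile, which is the overflow Andrew ran into.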


