Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-04 Thread Pedro Alves
On 09/04/2012 06:17 PM, Steven Rostedt wrote:
> On Tue, 2012-09-04 at 17:40 +0100, Pedro Alves wrote:
> 
>> BTW, you can also go a step further and remove the need to close with
>> double }}, with something like:
>>
>> #define do_for_each_ftrace_rec(pg, rec)                              \
>>     for (pg = ftrace_pages_start, rec = &pg->records[pg->index];     \
>>          pg && rec == &pg->records[pg->index];                       \
>>          pg = pg->next)                                              \
>>       for (rec = pg->records; rec < &pg->records[pg->index]; rec++)
>>
> 
> Yeah, but why bother? It's hidden in a macro, and the extra '{ }' shows
> that this is something "special".

The point of both changes is that there's nothing special in the end
at all.  It all just works...
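
For readers who want to try the idea outside the kernel, here is a minimal
user-space sketch of a nested iterator that expands to a single statement,
so the caller closes it with one brace (or none) and 'break' leaves both
loops. It is not the macro quoted above: the toy_* names and the fixed-size
page are made up for illustration, and the "did the inner loop finish?"
check sits in the step expression so that it runs before pg advances.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the ftrace page/record structures. */
struct toy_rec { int val; };
struct toy_pg {
	struct toy_pg *next;
	int index;                  /* number of records in use */
	struct toy_rec records[4];
};

/*
 * Nested iteration as one statement.  The step expression advances to the
 * next page only when the inner loop ran to completion (rec reached the
 * end of the page); after a 'break' in the body it sets pg to NULL, which
 * ends the outer loop as well.
 */
#define for_each_toy_rec(head, pg, rec)                                   \
	for ((pg) = (head); (pg);                                         \
	     (pg) = ((rec) == &(pg)->records[(pg)->index]) ?              \
			(pg)->next : NULL)                                \
		for ((rec) = (pg)->records;                               \
		     (rec) < &(pg)->records[(pg)->index]; (rec)++)

/* Sum every record on every page. */
int toy_sum_all(struct toy_pg *head)
{
	struct toy_pg *pg;
	struct toy_rec *rec = NULL;
	int sum = 0;

	for_each_toy_rec(head, pg, rec)
		sum += rec->val;
	return sum;
}

/* Count the records visited before the first one equal to 'stop'. */
int toy_count_before(struct toy_pg *head, int stop)
{
	struct toy_pg *pg;
	struct toy_rec *rec = NULL;
	int n = 0;

	for_each_toy_rec(head, pg, rec) {
		if (rec->val == stop)
			break;          /* leaves both loops */
		n++;
	}
	return n;
}
```

The two helpers only exist to exercise the macro; a caller would normally
write the loop body inline.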

-- 
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[perf] how to measure offcore events

2012-09-04 Thread Yuanfang Chen
Hello,

I'm trying to measure offcore events
OFFCORE_RESPONSE.ALL_READS.LLC_MISS.DRAM_N (0x3004003F7) using perf
tool. However, I didn't find the way to encode offcore events in
current perf documentation. Can someone help me? Thank you so much.

yuanfang


[PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab

2012-09-04 Thread Mel Gorman
Right now, we call ClearSlabPfmemalloc() for the first page of a slab when
we clear the SlabPfmemalloc flag. This is fine for most swap-over-network
use cases as it is expected that order-0 pages are in use. Unfortunately it
is possible that __ac_put_obj() checks SlabPfmemalloc on a tail page
and while this is harmless, it is sloppy. This patch ensures that the head
page is always used.

This problem was originally identified by Joonsoo Kim.
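
As an illustration only, the difference between testing a flag on whatever
page an object happens to live in and testing it on the head page can be
sketched in user space. The toy_page structure and helpers below are
hypothetical stand-ins and do not reflect the layout of struct page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a higher-order slab: only the head page carries the flag. */
struct toy_page {
	bool pfmemalloc;
	struct toy_page *head;      /* NULL when this page is the head */
};

/* Resolve any page of the slab to its head page. */
struct toy_page *toy_head_page(struct toy_page *page)
{
	return page->head ? page->head : page;
}

/* Tests the flag on the page itself: misses it on tail pages. */
bool toy_test_direct(struct toy_page *page)
{
	return page->pfmemalloc;
}

/* Tests the flag on the head page: same answer for every page. */
bool toy_test_via_head(struct toy_page *page)
{
	return toy_head_page(page)->pfmemalloc;
}
```

With an order-0 slab both tests agree; they only diverge once an object
sits in a tail page.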

[js1...@gmail.com: Original implementation and problem identification]
Signed-off-by: Mel Gorman 
---
 mm/slab.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 811af03..d34a903 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1000,7 +1000,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
l3 = cachep->nodelists[numa_mem_id()];
if (!list_empty(&l3->slabs_free) && force_refill) {
struct slab *slabp = virt_to_slab(objp);
-   ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
+   ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
clear_obj_pfmemalloc(&objp);
recheck_pfmemalloc_active(cachep, ac);
return objp;
@@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
if (unlikely(pfmemalloc_active)) {
/* Some pfmemalloc slabs exist, check if this is one */
-   struct page *page = virt_to_page(objp);
+   struct page *page = virt_to_head_page(objp);
if (PageSlabPfmemalloc(page))
set_obj_pfmemalloc(&objp);
}
-- 
1.7.9.2



[PATCH 0/4] Small fixes for swap-over-network

2012-09-04 Thread Mel Gorman
This series is 4 small patches posted by Joonsoo Kim and Chuck Lever with
some minor changes applied. They are not critical but they should be fixed
before 3.6 comes out. I've picked them up and reposted to make sure they
did not get lost.

Ordinarily I would say that 1-3 should go through Pekka's slab tree and
the last patch through David Miller's linux-net tree but as they are fairly
minor maybe it would be easier if all 4 went through Andrew's tree at the
same time.

The patches have been tested against 3.6-rc4 and they passed the swap over
NFS and NBD tests.

 include/net/sock.h |2 +-
 mm/slab.c  |6 +++---
 mm/slub.c  |   15 ++-
 3 files changed, 14 insertions(+), 9 deletions(-)

-- 
1.7.9.2



[PATCH 2/4] slab: fix starting index for finding another object

2012-09-04 Thread Mel Gorman
From: Joonsoo Kim 

In the array cache, there is an object at index 0; check it too.
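
A small user-space sketch of the off-by-one, under the assumption that the
marker is kept in the low bit of the object pointer. The toy_* names are
hypothetical; the point is only that a scan starting at 1 never considers
entry 0.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PFMEMALLOC ((uintptr_t)1)   /* marker kept in the low bit */

/* Mark an object pointer as coming from the reserves. */
void *toy_tag(void *objp)
{
	return (void *)((uintptr_t)objp | TOY_PFMEMALLOC);
}

/* Nonzero if the pointer carries the marker. */
int toy_is_pfmemalloc(void *objp)
{
	return ((uintptr_t)objp & TOY_PFMEMALLOC) != 0;
}

/*
 * Return the index of the first unmarked entry in entry[start..avail),
 * or -1 if there is none.
 */
int toy_find_clean(void **entry, int avail, int start)
{
	int i;

	for (i = start; i < avail; i++)
		if (!toy_is_pfmemalloc(entry[i]))
			return i;
	return -1;
}
```

If the only unmarked object sits at index 0, a scan with start == 1 reports
that nothing usable is available even though one entry would do.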

Signed-off-by: Joonsoo Kim 
Signed-off-by: Mel Gorman 
---
 mm/slab.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index d34a903..c685475 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -983,7 +983,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
}
 
/* The caller cannot use PFMEMALLOC objects, find another one */
-   for (i = 1; i < ac->avail; i++) {
+   for (i = 0; i < ac->avail; i++) {
/* If a !PFMEMALLOC object is found, swap them */
if (!is_obj_pfmemalloc(ac->entry[i])) {
objp = ac->entry[i];
-- 
1.7.9.2



[PATCH 4/4] Squelch compiler warning in sk_rmem_schedule()

2012-09-04 Thread Mel Gorman
From: Chuck Lever 

In file included from linux/include/linux/tcp.h:227:0,
 from linux/include/linux/ipv6.h:221,
 from linux/include/net/ipv6.h:16,
 from linux/include/linux/sunrpc/clnt.h:26,
 from linux/net/sunrpc/stats.c:22:
linux/include/net/sock.h: In function ‘sk_rmem_schedule’:
linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between
  signed and unsigned integer expressions [-Wsign-compare]

Seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
-Wextra option.

[c76562b6: netvm: prevent a stream-specific deadlock] accidentally replaced
the "size" parameter of sk_rmem_schedule() with an unsigned int. This
changes the semantics of the comparison in the return statement.

In sk_wmem_schedule we have syntactically the same comparison, but
"size" is a signed integer.  In addition, __sk_mem_schedule() takes
a signed integer for its "size" parameter, so there is an implicit
type conversion in sk_rmem_schedule() anyway.

Revert the "size" parameter back to a signed integer so that the
semantics of the expressions in both sk_[rw]mem_schedule() are
exactly the same.
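
To make the change in semantics concrete, here is a stand-alone sketch.
The function and parameter names are made up, and 'budget' merely stands
for the signed value that "size" is compared against. With an unsigned
"size", the usual arithmetic conversions turn the signed operand into an
unsigned one; the cast below spells out what the compiler otherwise does
implicitly.

```c
/* "size" is signed: a negative budget never satisfies the request. */
int toy_fits_signed(int size, int budget)
{
	return size <= budget;
}

/*
 * "size" is unsigned: budget is converted to unsigned int before the
 * comparison, so a negative budget turns into a very large value.
 */
int toy_fits_unsigned(unsigned int size, int budget)
{
	return size <= (unsigned int)budget;
}
```

For a budget of -1 the signed version rejects a request of 100 while the
unsigned version accepts it, which is the kind of difference the warning
points at.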

Signed-off-by: Chuck Lever 
Signed-off-by: Mel Gorman 
---
 include/net/sock.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 72132ae..adb7da2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1332,7 +1332,7 @@ static inline bool sk_wmem_schedule(struct sock *sk, int size)
 }
 
 static inline bool
-sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, unsigned int size)
+sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size)
 {
if (!sk_has_account(sk))
return true;
-- 
1.7.9.2



[PATCH 3/4] slub: consider pfmemalloc_match() in get_partial_node()

2012-09-04 Thread Mel Gorman
From: Joonsoo Kim 

The function get_partial() is currently not checking pfmemalloc_match()
meaning that it is possible for pfmemalloc pages to leak to non-pfmemalloc
users. This is a problem in the following situation.  Assume that there is
a request from normal allocation and there are no objects in the per-cpu
cache and no node-partial slab.

In this case, slab_alloc enters the slow path and new_slab_objects()
is called which may return a PFMEMALLOC page. As the current user is not
allowed to access PFMEMALLOC page, deactivate_slab() is called ([5091b74a:
mm: slub: optimise the SLUB fast path to avoid pfmemalloc checks]) and
returns an object from PFMEMALLOC page.

Next time, when we get another request from normal allocation, slab_alloc()
enters the slow-path and calls new_slab_objects().  In new_slab_objects(),
we call get_partial() and get a partial slab which was just deactivated
but is a pfmemalloc page. We extract one object from it and re-deactivate.

"deactivate -> re-get in get_partial -> re-deactivate" occurs repeatedly.

As a result, access to PFMEMALLOC page is not properly restricted and it
can cause a performance degradation due to frequent deactivation.

This patch changes get_partial_node() to take pfmemalloc_match() into
account and prevents the "deactivate -> re-get in get_partial()" scenario.
Instead, new_slab() is called.
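
The intended behaviour can be sketched with a toy partial list. Everything
below is hypothetical user-space code: toy_slab, the boolean
'may_use_reserves' standing in for the gfp flags check, and the simple
singly linked list are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_slab {
	bool pfmemalloc;            /* slab was allocated from the reserves */
	struct toy_slab *next;
};

/* A reserve slab only matches callers that may use the reserves. */
bool toy_pfmemalloc_match(const struct toy_slab *slab, bool may_use_reserves)
{
	return !slab->pfmemalloc || may_use_reserves;
}

/*
 * Walk the partial list and skip slabs the caller must not use.
 * Returning NULL means "nothing suitable here"; the caller is then
 * expected to allocate a fresh slab instead.
 */
struct toy_slab *toy_get_partial(struct toy_slab *partial,
				 bool may_use_reserves)
{
	for (; partial; partial = partial->next)
		if (toy_pfmemalloc_match(partial, may_use_reserves))
			return partial;
	return NULL;
}
```

A normal caller therefore never picks up the slab that was just
deactivated for being a reserve slab, which breaks the loop described
above.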

Signed-off-by: Joonsoo Kim 
Acked-by: David Rientjes 
Signed-off-by: Mel Gorman 
---
 mm/slub.c |   15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8f78e25..2fdd96f9e9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1524,12 +1524,13 @@ static inline void *acquire_slab(struct kmem_cache *s,
 }
 
 static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
 
 /*
  * Try to allocate a partial slab from a specific node.
  */
-static void *get_partial_node(struct kmem_cache *s,
-   struct kmem_cache_node *n, struct kmem_cache_cpu *c)
+static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
+   struct kmem_cache_cpu *c, gfp_t flags)
 {
struct page *page, *page2;
void *object = NULL;
@@ -1545,9 +1546,13 @@ static void *get_partial_node(struct kmem_cache *s,
 
spin_lock(&n->list_lock);
list_for_each_entry_safe(page, page2, &n->partial, lru) {
-   void *t = acquire_slab(s, n, page, object == NULL);
+   void *t;
int available;
 
+   if (!pfmemalloc_match(page, flags))
+   continue;
+
+   t = acquire_slab(s, n, page, object == NULL);
if (!t)
break;
 
@@ -1614,7 +1619,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 
if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
n->nr_partial > s->min_partial) {
-   object = get_partial_node(s, n, c);
+   object = get_partial_node(s, n, c, flags);
if (object) {
/*
 * Return the object even if
@@ -1643,7 +1648,7 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
void *object;
int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
 
-   object = get_partial_node(s, get_node(s, searchnode), c);
+   object = get_partial_node(s, get_node(s, searchnode), c, flags);
if (object || node != NUMA_NO_NODE)
return object;
 
-- 
1.7.9.2



Re: [PATCH 3/3] i2c: nomadik: Add Device Tree support to the Nomadik I2C driver

2012-09-04 Thread Linus Walleij
On Tue, Sep 4, 2012 at 4:28 PM, Arnd Bergmann  wrote:

> In this particular case, we don't have a single board file providing a
> struct nmk_i2c_controller definition for platform data, so the best way
> to handle this IMHO is to remove the header file with the platform
> data definition, and just encode the defaults in the driver.

Alessandro Rubini is actively working on bridging this (and
other amba_device primecells) to PCI, that is the reason why it
was recently converted to an amba_device. How is he then supposed to
get the proper parameters into the driver? Note that the PCI ID
is no help at all since the parameters depend on what is connected
to the I2C bus, not on what it itself is connected to. Isn't platform data
used in such cases?

Yours,
Linus Walleij


Re: [patch 06/16] sched: account for blocked load waking back up

2012-09-04 Thread Benjamin Segall
Preeti Murthy  writes:

> Hi Paul,
>
> @@ -1170,20 +1178,42 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>                                                   struct sched_entity *se,
>                                                   int wakeup)
>  {
> -       /* we track migrations using entity decay_count == 0 */
> -       if (unlikely(!se->avg.decay_count)) {
> +       /*
> +        * We track migrations using entity decay_count <= 0, on a wake-up
> +        * migration we use a negative decay count to track the remote decays
> +        * accumulated while sleeping.
> +        */
> +       if (unlikely(se->avg.decay_count <= 0)) {
>                 se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
> +               if (se->avg.decay_count) {
> +                       /*
> +                        * In a wake-up migration we have to approximate the
> +                        * time sleeping.  This is because we can't synchronize
> +                        * clock_task between the two cpus, and it is not
> +                        * guaranteed to be read-safe.  Instead, we can
> +                        * approximate this using our carried decays, which are
> +                        * explicitly atomically readable.
> +                        */
> +                       se->avg.last_runnable_update -= (-se->avg.decay_count)
> +                                                       << 20;
> +                       update_entity_load_avg(se, 0);
> +                       /* Indicate that we're now synchronized and on-rq */
> +                       se->avg.decay_count = 0;
> +               }
>                 wakeup = 0;
>         } else {
>                 __synchronize_entity_decay(se);
>
>  
> Should not the last_runnable_update of se get updated in
> __synchronize_entity_decay()? Because it contains the value of the runnable
> update before going to sleep. If not updated, when update_entity_load_avg()
> is called below during a local wakeup, it will decay the runtime load for
> the duration including the time the sched entity has slept.

If you are asking if it should be updated in the else block (local
wakeup, no migration) here, no:

* __synchronize_entity_decay will decay load_avg_contrib to match the
  decay that the cfs_rq has done, keeping those in sync, and ensuring we
  don't subtract too much when we update our current load average.
* clock_task - last_runnable_update will be the amount of time that the
  task has been blocked. update_entity_load_avg (below) and
  __update_entity_runnable_avg will account this time as non-runnable
  time into runnable_avg_sum/period, and from there onto the cfs_rq via
  __update_entity_load_avg_contrib.

Both of these are necessary, and will happen. In the case of !wakeup,
the task is being moved between groups or is migrating between cpus, and
we pretend (to the best of our ability in the case of migrating between
cpus which may have different clock_tasks) that the task has been
runnable this entire time.

In the more general case, no, it is called from migrate_task_rq_fair,
which doesn't have the necessary locks to read clock_task.
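
As a rough illustration of the arithmetic discussed here: the quoted hunk
backdates last_runnable_update by the number of carried decays shifted
left by 20, so that "clock_task - last_runnable_update" approximates the
time asleep. The stand-alone sketch below uses made-up toy_* names and
assumes one decay period is 2^20 clock units (about 1 ms if the clock
counts nanoseconds).

```c
#include <stdint.h>

#define TOY_PERIOD_SHIFT 20     /* assumed: one decay period = 2^20 units */

/*
 * decay_count is <= 0 for a wake-up migration; its magnitude is the number
 * of decay periods that passed remotely while the task slept.  Backdate
 * the timestamp by that many periods.
 */
uint64_t toy_backdate(uint64_t clock_task, int64_t decay_count)
{
	uint64_t missed = (uint64_t)(-decay_count);

	return clock_task - (missed << TOY_PERIOD_SHIFT);
}

/* Time that will be accounted as non-runnable on the next update. */
uint64_t toy_blocked_time(uint64_t clock_task, uint64_t last_runnable_update)
{
	return clock_task - last_runnable_update;
}
```

With three carried decays the backdated timestamp lies three periods in
the past, so the next update accounts three periods as blocked time.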

>
> This also means that during dequeue_entity_load_avg(),
> update_entity_load_avg() needs to be called to keep the runnable_avg_sum
> of the sched entity updated till before sleep.

Yes, this happens first thing in dequeue_entity_load_avg.
>
>         }
>
> -       if (wakeup)
> +       /* migrated tasks did not contribute to our blocked load */
> +       if (wakeup) {
>                 subtract_blocked_load_contrib(cfs_rq, se->avg.load_avg_contrib);
> +               update_entity_load_avg(se, 0);
> +       }
>
> -       update_entity_load_avg(se, 0);
>         cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;
> -       update_cfs_rq_blocked_load(cfs_rq);
> +       /* we force update consideration on load-balancer moves */
> +       update_cfs_rq_blocked_load(cfs_rq, !wakeup);
>  }
>
>
> Regards
> Preeti


Re: linux-next: Tree for Sept 4 (cma)

2012-09-04 Thread Randy Dunlap
On 09/04/2012 12:13 AM, Stephen Rothwell wrote:

> Hi all,
> 
> Changes since 20120824:
> 



drivers/base/dma-contiguous.c:351:3: error: expected ';' before '}' token


} else if (ret != -EBUSY) {
>>> break
}


-- 

~Randy


kexec/kdump kernel fails to start

2012-09-04 Thread Flavio Leitner
Hi folks,

I have a system that no longer boots the kdump kernel. Basically,

# echo c > /proc/sysrq-trigger

to dump a vmcore doesn't work. It just hangs after showing the usual
panic messages. I've bisected the problem and the commit introducing
the issue is the one below.

Any idea?

commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8
Author: WANG Cong   2012-03-05 20:05:13
Committer: Ingo Molnar   2012-03-06 05:38:26
Parent: 550cf00dbc8ee402bef71628cb71246493dd4500 (Merge tag 'mmc-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc)
Child:  a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f (x86, tlb: Switch cr3 in leave_mm() only when needed)
Branches: master, remotes/origin/master
Follows: v3.3-rc6
Precedes: v3.5-rc1

x86/mm: Fix the size calculation of mapping tables

For machines that enable PSE, the first 2/4M memory region still uses
4K pages, so needs more PTEs in this case, but
find_early_table_space() doesn't count this.

This patch fixes it.

The bug was found via code review, no misbehavior of the kernel
was observed.
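
For scale, the extra page table entries in question can be worked out with
a few lines of stand-alone C. This is only back-of-the-envelope arithmetic
with a hypothetical helper name, not the calculation done by
find_early_table_space().

```c
#define TOY_PAGE_SHIFT 12       /* 4K pages */

/* Number of 4K page table entries needed to map 'bytes' of memory. */
unsigned long toy_ptes_needed(unsigned long bytes)
{
	unsigned long page_size = 1UL << TOY_PAGE_SHIFT;

	return (bytes + page_size - 1) >> TOY_PAGE_SHIFT;
}
```

Mapping a 2M region with 4K pages takes 512 entries and a 4M region takes
1024, compared with a single large-page entry each.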


Machine details:
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 26
model name  : Intel(R) Core(TM) i7 CPU 920  @ 2.67GHz
stepping: 5
microcode   : 0x11
cpu MHz : 1596.000
cache size  : 8192 KB
physical id : 0
siblings: 8
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 
sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips: 5333.87
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

# free
             total       used       free     shared    buffers     cached
Mem:      16161684   11749100    4412584          0      10212   11421096
-/+ buffers/cache:     317792   15843892
Swap:     17406420          0   17406420


dmesg is attached.

thanks,
fbl




dmesg.log.gz
Description: GNU Zip compressed data


Re: [PATCH] gpio-ich: Share ownership of GPIO groups

2012-09-04 Thread Linus Walleij
On Tue, Sep 4, 2012 at 1:36 PM, Jean Delvare  wrote:

> Any news on this? I'd like to get this patch (or an alternative
> implementation of the same) into kernel 3.7, and its merge window is
> approaching.

I have acked the GPIO part, the rest is up to Sam. He's often in
submarine mode but usually he appears just in due time for the merge
window...

Yours,
Linus Walleij


Re: [PATCH 3/3] i2c: nomadik: Add Device Tree support to the Nomadik I2C driver

2012-09-04 Thread Alessandro Rubini
> Alessandro Rubini is actively working on bridging this (and
> other amba_device primecells) to PCI, that is the reason why it
> was recently converted to an amba_device.

Yes, I've been inactive for a while but I'm on it right now.

> How is he then supposed to get the proper parameters into the
> driver? Note that the PCI ID is no help at all since the parameters
> depend on what is connected to the I2C bus, not on what it itself is
> connected to. Isn't platform data used in such cases?

I'm using platform data currently, but Davide Ciminaghi is actively
working to convert the configuration to device-tree: the way we pass
platform data to the pci device (and thus amba) is not considered
acceptable by Peter Anvin.

I'm thus asking Davide if he's happy to remove the platform data
configuration path right now (I personally wouldn't be very happy, but
I acknowledge it should happen, sooner or later).

/alessandro


[PATCH] net: don't allow INET to be not configured

2012-09-04 Thread Stephen Hemminger
There is no reason to expose turning off TCP/IP networking.
If networking is enabled, force TCP/IP to be enabled. This also
eliminates the time spent chasing down errors with bogus configurations
generated by 'make randconfig'.

For testing, it is still possible to edit Kconfig.

Signed-off-by: Stephen Hemminger 


--- a/net/Kconfig   2012-08-15 08:59:22.910704705 -0700
+++ b/net/Kconfig   2012-09-04 10:39:53.654585718 -0700
@@ -51,26 +51,7 @@ source "net/xfrm/Kconfig"
 source "net/iucv/Kconfig"
 
 config INET
-   bool "TCP/IP networking"
-   ---help---
- These are the protocols used on the Internet and on most local
- Ethernets. It is highly recommended to say Y here (this will enlarge
- your kernel by about 400 KB), since some programs (e.g. the X window
- system) use TCP/IP even if your machine is not connected to any
- other computer. You will get the so-called loopback device which
- allows you to ping yourself (great fun, that!).
-
- For an excellent introduction to Linux networking, please read the
- Linux Networking HOWTO, available from
- .
-
- If you say Y here and also to "/proc file system support" and
- "Sysctl support" below, you can change various aspects of the
- behavior of the TCP/IP code by writing to the (virtual) files in
- /proc/sys/net/ipv4/*; the options are explained in the file
- .
-
- Short answer: say Y.
+   def_bool y
 
 if INET
 source "net/ipv4/Kconfig"


Re: [PATCH 0/3] Fix ACPI BGRT support for images located in EFI boot services memory

2012-09-04 Thread Josh Triplett
On Tue, Sep 04, 2012 at 03:27:20PM +0100, Matt Fleming wrote:
> On Thu, 2012-08-30 at 14:28 -0700, Josh Triplett wrote:
> > The ACPI BGRT lets the OS access the BIOS logo image and its position on the
> > screen at boot time, allowing it to maintain that image on the screen until
> > ready to display something else, making boot more seamless.  This series
> > fixes support for accessing the boot logo image via the BGRT when the BIOS
> > stores it in EFI boot services memory, as recommended by the ACPI 5.0 spec.
> > Linux needs to copy the image out of boot services memory before reclaiming
> > boot services memory.
> > 
> > The first patch refactors EFI initialization to defer freeing boot services
> > memory until later in the boot process, after we have ACPI available.  The
> > second patch adds a helper function to look up existing EFI boot services
> > mappings, to avoid re-mapping them.  The third patch moves BGRT
> > initialization to before the reclamation of boot services memory, copies
> > the logo at that point, and reworks the existing BGRT driver to use that
> > existing copy.
> 
> Since we always end up doing a copy anyway, is there no way we could
> just copy the boot logo *without* deferring freeing the boot services
> code, e.g. move the copy before we do SetVirtualAddressMap()?

Unfortunately not.  We need enough of ACPI available to go read the
BGRT to know what to copy, so we need to defer freeing boot services
code until after we initialize ACPI (and thus everything ACPI needs,
which includes EFI since ACPI looks for root tables there).

> I wouldn't be surprised if some implementations got really cranky if
> we accessed boot services data after we installed a new virtual memory
> map.

Note that I've carefully accessed the boot services data *through* the
new virtual memory map, which should work fine.

> Besides, if we can avoid moving the efi_free_boot_services() call we can
> avoid littering init/main.c with more #ifdef CONFIG_X86 blocks.

Those seem easy enough to convert into appropriate always-available
stubs, if you'd like.  And I could move efi_free_boot_services() inside
efi_late_init(), too, keeping it an internal implementation detail of
EFI initialization.  Would that help?
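
For anyone unfamiliar with the pattern, an "always-available stub" looks
roughly like the sketch below. The CONFIG_TOY_EFI symbol and the toy_*
functions are invented for the example and are not the names used in the
series.

```c
/* Build with -DCONFIG_TOY_EFI to get the real implementation. */
static int toy_boot_services_freed;

#ifdef CONFIG_TOY_EFI
static inline void toy_efi_late_init(void)
{
	toy_boot_services_freed = 1;    /* stands in for the real work */
}
#else
/* Empty stub: compiles away when the feature is not configured. */
static inline void toy_efi_late_init(void) { }
#endif

/* The caller needs no #ifdef of its own. */
int toy_start_kernel(void)
{
	toy_efi_late_init();
	return toy_boot_services_freed;
}
```

The conditional lives in one place next to the declaration, and every call
site stays free of preprocessor blocks.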

- Josh Triplett


Re: [PATCH] net: don't allow INET to be not configured

2012-09-04 Thread David Miller
From: Stephen Hemminger 
Date: Tue, 4 Sep 2012 10:44:51 -0700

> There is no reason to expose turning off TCP/IP networking.
> If networking is enabled force TCP/IP to enabled. This also
> eliminates the time chasing down errors with bogus configurations
> generated by 'make randconfig'
> 
> For testing, it is still possible to edit Kconfig
> 
> Signed-off-by: Stephen Hemminger 

People have legitimate reasons to enable NET without INET, so let's
just fix the dependency bugs as they are reported.


Re: [PATCH 0/3] Fix ACPI BGRT support for images located in EFI boot services memory

2012-09-04 Thread H. Peter Anvin

On 09/04/2012 10:59 AM, Josh Triplett wrote:
> Unfortunately not.  We need enough of ACPI available to go read the
> BGRT to know what to copy, so we need to defer freeing boot services
> code until after we initialize ACPI (and thus everything ACPI needs,
> which includes EFI since ACPI looks for root tables there).
>
>> I wouldn't be surprised if some implementations got really cranky if
>> we accessed boot services data after we installed a new virtual memory
>> map.
>
> Note that I've carefully accessed the boot services data *through* the
> new virtual memory map, which should work fine.



There are some platforms which have bugs in this area, so there are 
other reasons to defer freeing up boot memory until as late in the boot 
process as we can possibly get away with.


free_initmem() is presumably the place that makes most sense.  This is
EFI-specific but not x86-specific, let's not commingle those concepts,
please...


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: linux-next: build failure after merge of the final tree (net-next tree related)

2012-09-04 Thread David Miller
From: Stephen Rothwell 
Date: Tue, 4 Sep 2012 16:58:53 +1000

> net/built-in.o: In function `tcp_fastopen_ctx_free':
> tcp_fastopen.c:(.text+0x5cc5c): undefined reference to `crypto_destroy_tfm'
> net/built-in.o: In function `tcp_fastopen_reset_cipher':
> (.text+0x5): undefined reference to `crypto_alloc_base'
> net/built-in.o: In function `tcp_fastopen_reset_cipher':
> (.text+0x5cd6c): undefined reference to `crypto_destroy_tfm'
> 
> Presumably caused by commit 104671636897 ("tcp: TCP Fast Open Server -
> header & support functions") from the net-next tree.  I assume that some
> dependency on the CRYPTO infrastructure is missing.

Thanks for the report, I've pushed the following change to net-next
which should address this:


[PATCH] net: Add INET dependency on aes crypto for the sake of TCP fastopen.

Stephen Rothwell says:


After merging the final tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:

net/built-in.o: In function `tcp_fastopen_ctx_free':
tcp_fastopen.c:(.text+0x5cc5c): undefined reference to `crypto_destroy_tfm'
net/built-in.o: In function `tcp_fastopen_reset_cipher':
(.text+0x5): undefined reference to `crypto_alloc_base'
net/built-in.o: In function `tcp_fastopen_reset_cipher':
(.text+0x5cd6c): undefined reference to `crypto_destroy_tfm'

Presumably caused by commit 104671636897 ("tcp: TCP Fast Open Server -
header & support functions") from the net-next tree.  I assume that some
dependency on the CRYPTO infrastructure is missing.

I have reverted commit 1bed966cc3bd ("Merge branch
'tcp_fastopen_server'") for today.


Reported-by: Stephen Rothwell 
Signed-off-by: David S. Miller 
---
 net/Kconfig |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index 245831b..30b48f5 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -52,6 +52,8 @@ source "net/iucv/Kconfig"
 
 config INET
bool "TCP/IP networking"
+   select CRYPTO
+   select CRYPTO_AES
---help---
  These are the protocols used on the Internet and on most local
  Ethernets. It is highly recommended to say Y here (this will enlarge
-- 
1.7.7.6



Re: [PATCH v2] memcg: first step towards hierarchical controller

2012-09-04 Thread Tejun Heo
Hello,

On Tue, Sep 04, 2012 at 04:35:52PM +0200, Michal Hocko wrote:
...
> The problem is that we don't know whether somebody has an use case which
> cannot be transformed like that. Therefore this patch starts the slow
> transition to hierarchical only memory controller by warning users who
> are using flat hierarchies. The warning triggers only if a subgroup of
> non-root group is created with use_hierarchy==0.
> 
> Signed-off-by: Michal Hocko 

I think this could work as the first step.  Regardless of the involved
steps, the goal is 1. finding out whether there are use cases or users
of flat hierarchy (ugh... even the name is stupid :) and 2. if so,
push them to stop doing that and give them time to do so.  While
userland growing "echo 1" to use_hierarchy isn't optimal, it isn't the
end of the world and something which can be taken care of by the
distros.

That said, I don't see how different this is from the staged way I
suggested other than requiring "echo 1" instead of a mount option.  At
any rate, the two aren't mutually exclusive and this looks good to me.

Thanks.

-- 
tejun


Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-04 Thread Tejun Heo
Hello,

On Tue, Sep 04, 2012 at 09:54:23AM -0400, Vivek Goyal wrote:
> > Given that we are working around stack depth issues in the
> > filesystems already in several places, and now it seems like there's
> > a reason to work around it in the block layers as well, shouldn't we
> > simply increase the default stack size rather than introduce
> > complexity and performance regressions to try and work around not
> > having enough stack?
> 
> Dave,
> 
> In this particular instance, we really don't have any bug reports of
> stack overflowing. Just discussing what will happen if we make 
> generic_make_request() recursive again.

I think there was one and that's why we added the bio_list thing.

> > I mean, we can deal with it like the ia32 4k stack issue was dealt
> > with (i.e. ignore those stupid XFS people, that's an XFS bug), or
> > we can face the reality that storage stacks have become so complex
> > that 8k is no longer a big enough stack for a modern system
> 
> So first question will be, what's the right stack size? If we make
> generic_make_request() recursive, then at some storage stack depth we will
> overflow stack anyway (if we have created too deep a stack). Hence
> keeping current logic kind of makes sense as in theory we can support
> arbitrary depth of storage stack.

But, yeah, this can't be solved by enlarging the stack size.  The
upper limit is unbounded.
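
To illustrate the point about unbounded depth, here is a minimal user-space sketch of the idea behind the bio_list approach: a layer that would have recursed into the layer below queues the request instead, and a single loop drains the queue. The names struct req, struct pending and submit_iterative() are hypothetical and are not the block layer API; the sketch only shows why call depth stays constant regardless of how many layers are stacked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio travelling down a storage stack. */
struct req {
	int depth;          /* layers still below this request */
	struct req *next;   /* link in the pending list */
};

struct pending {
	struct req *head, *tail;
};

static void pending_add(struct pending *p, struct req *r)
{
	r->next = NULL;
	if (p->tail)
		p->tail->next = r;
	else
		p->head = r;
	p->tail = r;
}

static struct req *pending_pop(struct pending *p)
{
	struct req *r = p->head;

	if (r) {
		p->head = r->next;
		if (!p->head)
			p->tail = NULL;
	}
	return r;
}

/*
 * Handle a request and everything it spawns without recursing.
 * Each layer defers the remapped request to the list instead of
 * calling down, so stack usage does not grow with stack depth.
 * Returns the number of layers visited.
 */
static int submit_iterative(struct req *r)
{
	struct pending list = { NULL, NULL };
	int visited = 0;

	assert(r != NULL);
	pending_add(&list, r);
	while ((r = pending_pop(&list)) != NULL) {
		visited++;
		if (r->depth > 0) {
			r->depth--;             /* remap to the lower layer */
			pending_add(&list, r);  /* defer instead of recursing */
		}
	}
	return visited;
}
```

A recursive version of the same walk would need one stack frame per layer, which is why no fixed stack size is enough for an arbitrarily deep stack.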

Thanks.

-- 
tejun


Re: [PATCH v2] ARM: S3C24XX: Add WIZnet W5300E01-ARM board support

2012-09-04 Thread Taehun Kim
2012/9/4 Sylwester Nawrocki :
> On 09/03/2012 07:36 PM, Taehun Kim wrote:
>> +static void __init w5300e01_init(void)
>> +{
>> + s3c_nand_set_platdata(&w5300e01_nand_info);
>> + platform_add_devices(w5300e01_devices, ARRAY_SIZE(w5300e01_devices));
>> +
>> + /* W5300 interrupt pin. */
>> + s3c_gpio_cfgpin(S3C2410_GPF(0), S3C2410_GPIO_IRQ);
>> +
>> + s3c_gpio_cfgpin(S3C2410_GPF(4), S3C2410_GPIO_OUTPUT);
>> + s3c_gpio_cfgpin(S3C2410_GPF(5), S3C2410_GPIO_OUTPUT);
>> + s3c_gpio_cfgpin(S3C2410_GPF(6), S3C2410_GPIO_OUTPUT);
>> + s3c_gpio_cfgpin(S3C2410_GPF(7), S3C2410_GPIO_OUTPUT);
>
> Please don't use these obsolete S3C2410_GPIO_* defines; they will be
> gone soon, if they aren't already.
>
>> + gpio_set_value(S3C2410_GPF(0), 1);
>> + gpio_set_value(S3C2410_GPF(4), 1);
>> + gpio_set_value(S3C2410_GPF(5), 1);
>> + gpio_set_value(S3C2410_GPF(6), 1);
>> + gpio_set_value(S3C2410_GPF(7), 1);
>
> Instead I would do something like:
>
> 8<-
>
> static const struct gpio gpios[] = {
> { S3C2410_GPF(4), GPIOF_OUT_INIT_HIGH, NULL },
> { S3C2410_GPF(5), GPIOF_OUT_INIT_HIGH, NULL },
> { S3C2410_GPF(6), GPIOF_OUT_INIT_HIGH, NULL },
> { S3C2410_GPF(7), GPIOF_OUT_INIT_HIGH, NULL },
> };
>
> if (!WARN_ON(gpio_request_array(gpios, ARRAY_SIZE(gpios))))
> gpio_free_array(gpios, ARRAY_SIZE(gpios));
>
> /* W5300 interrupt pin. */
> if (!WARN_ON(gpio_request_one(S3C2410_GPF(0), GPIOF_IN, NULL))) {
> s3c_gpio_cfgpin(S3C2410_GPF(0), S3C_GPIO_SFN(2)); /* EINT0 */
> gpio_free(S3C2410_GPF(0));
> }
>
> 8<-
>
> --
>
> Regards,
> Sylwester

Thank you for your feedback. I will change the gpio routine as follows:

-

static const struct gpio w5300e01_gpios[] = {
	{ S3C2410_GPF(4), GPIOF_OUT_INIT_HIGH, NULL },
	{ S3C2410_GPF(5), GPIOF_OUT_INIT_HIGH, NULL },
	{ S3C2410_GPF(6), GPIOF_OUT_INIT_HIGH, NULL },
	{ S3C2410_GPF(7), GPIOF_OUT_INIT_HIGH, NULL },
};

static void __init w5300_init(void)
{
	/* W5300 interrupt pin. */
	if (WARN_ON(gpio_request(S3C2410_GPF(0), "W5300 irq"))) {
		pr_err("%s: GPIO request failed.\n", __func__);
		return;
	}
	s3c_gpio_cfgpin(S3C2410_GPF(0), S3C_GPIO_SFN(2)); /* EINT0 */
}

static void __init w5300e01_init(void)
{
	s3c_nand_set_platdata(&w5300e01_nand_info);
	platform_add_devices(w5300e01_devices, ARRAY_SIZE(w5300e01_devices));

	if (WARN_ON(gpio_request_array(w5300e01_gpios,
				       ARRAY_SIZE(w5300e01_gpios))))
		pr_err("%s: GPIO request failed\n", __func__);

	w5300_init();
	s3c_pm_init();
}
-

Does anybody have any other comments?


Re: [PATCH] staging: comedi: remove pointer math for subdevice access

2012-09-04 Thread Greg KH
On Mon, Aug 20, 2012 at 04:59:16PM -0700, H Hartley Sweeten wrote:
> Convert all the comedi_subdevice pointer access from pointer
> math to array access.
> 
> Signed-off-by: H Hartley Sweeten 
> Cc: Dan Carpenter 
> Cc: Ian Abbott 

This patch doesn't seem to apply, care to break it up into smaller
pieces, and also fold in your "fix the build" patch as well?  That way
we have a chance to get it applied :)

thanks,

greg k-h


Re: [PATCH v2 4/5] fat: eliminate orphaned inode number allocation

2012-09-04 Thread J. Bruce Fields
On Wed, Sep 05, 2012 at 02:07:40AM +0900, OGAWA Hirofumi wrote:
> OGAWA Hirofumi  writes:
> 
> > Namjae Jeon  writes:
> >
> >> From: Namjae Jeon 
> >>
> >> Maintain a list of inode(i_pos) numbers of orphaned inodes (i.e. the
> >> inodes that have been unlinked but still have open file
> >> descriptors). At file/directory creation time, skip using such i_pos
> >> values. Removal of the i_pos from the list is done during inode eviction.
> >
> > What happens if the directory (has busy entries) was completely removed?
> >
> >
> > And Al's point is important for NFS too. If you want stable ino for NFS,
> > you never can't change it.
> 
> s/never can't/never can/

If vfat exports aren't fixable, maybe we should just remove that
feature?

I'm afraid that having unfixable half-working vfat exports is just an
attractive nuisance that causes users and developers to waste their
time.

--b.


RE: [PATCH] staging: comedi: remove pointer math for subdevice access

2012-09-04 Thread H Hartley Sweeten
On Tuesday, September 04, 2012 11:34 AM, Greg KH wrote:
> On Mon, Aug 20, 2012 at 04:59:16PM -0700, H Hartley Sweeten wrote:
>> Convert all the comedi_subdevice pointer access from pointer
>> math to array access.
>> 
>> Signed-off-by: H Hartley Sweeten 
>> Cc: Dan Carpenter 
>> Cc: Ian Abbott 
>
> This patch doesn't seem to apply, care to break it up into smaller
> pieces, and also fold in your "fix the build" patch as well?  That way
> we have a chance to get it applied :)

Not a problem. Thanks!

Hartley



Re: [PATCH v3 2/2] virtio-ring: Allocate indirect buffers from cache when possible

2012-09-04 Thread Michael S. Tsirkin
On Tue, Sep 04, 2012 at 07:34:19PM +0300, Avi Kivity wrote:
> On 08/31/2012 12:56 PM, Michael S. Tsirkin wrote:
> > On Fri, Aug 31, 2012 at 11:36:07AM +0200, Sasha Levin wrote:
> >> On 08/30/2012 03:38 PM, Michael S. Tsirkin wrote:
> >> >> +static unsigned int indirect_alloc_thresh = 16;
> >> > Why 16?  Please make it MAX_SG + 1; this makes some sense.
> >> 
> >> Wouldn't MAX_SG mean we always allocate from the cache? Isn't the memory 
> >> waste
> >> too big in this case?
> > 
> > Sorry. I really meant MAX_SKB_FRAGS + 1. MAX_SKB_FRAGS is 17 so gets us
> > threshold of 18. It is less than the size of an skb+shinfo itself so -
> > does it look too big to you? Also why do you think 16 is not too big but
> > 18 is?  If there's a reason then I am fine with 16 too but then please
> > put it in code comment near where the value is set.
> > 
> > Yes this means virtio net always allocates from cache
> > but this is a good thing, isn't it? Gets us more consistent
> > performance.
> 
> kmalloc() also goes to a cache.  Is there a measurable difference?

Yes see 0/2 and followup discussion.

> Ugh, there's an ugly loop in __find_general_cachep(), which really wants
> to be replaced with fls().
> 
> -- 
> error compiling committee.c: too many arguments to function


[tip:sched/core] sched: Fix load avg vs cpu-hotplug

2012-09-04 Thread tip-bot for Peter Zijlstra
Commit-ID:  f319da0c6894fcf55e21320e40506418a2aad629
Gitweb: http://git.kernel.org/tip/f319da0c6894fcf55e21320e40506418a2aad629
Author: Peter Zijlstra 
AuthorDate: Mon, 20 Aug 2012 11:26:57 +0200
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 14:30:18 +0200

sched: Fix load avg vs cpu-hotplug

Rakib and Paul reported two different issues related to the same few
lines of code.

Rakib's issue is that the nr_uninterruptible migration code is wrong in
that he sees artifacts due to this (Rakib please do expand in more
detail).

Paul's issue is that this code as it stands relies on us using
stop_machine() for unplug, we all would like to remove this assumption
so that eventually we can remove this stop_machine() usage altogether.

The only reason we'd have to migrate nr_uninterruptible is so that we
could use for_each_online_cpu() loops in favour of
for_each_possible_cpu() loops, however since nr_uninterruptible() is the
only such loop and it's using possible, let's not bother at all.

The problem Rakib sees is (probably) caused by the fact that by
migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
involved.

So don't bother with fancy migration schemes (meaning we now have to
keep using for_each_possible_cpu()) and instead fold any nr_active delta
after we migrate all tasks away to make sure we don't have any skewed
nr_active accounting.
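
As a rough illustration of the fold described above, the following user-space sketch models a runqueue with three counters and a global task count. struct rq_model, fold_active() and load_migrate() are hypothetical names that mirror, but are not, the kernel's calc_load_fold_active() and calc_load_migrate(); locking and per-CPU details are left out on purpose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of the per-runqueue load accounting fields. */
struct rq_model {
	long nr_running;
	long nr_uninterruptible;
	long calc_load_active;   /* what this rq last contributed globally */
};

/* Global sum of active tasks, standing in for calc_load_tasks. */
static atomic_long calc_load_tasks;

/*
 * Compute how far the rq's current active count has drifted from the
 * value it last contributed, record the new value, return the delta.
 */
static long fold_active(struct rq_model *rq)
{
	long nr_active, delta = 0;

	assert(rq != NULL);
	nr_active = rq->nr_running + rq->nr_uninterruptible;
	if (nr_active != rq->calc_load_active) {
		delta = nr_active - rq->calc_load_active;
		rq->calc_load_active = nr_active;
	}
	return delta;
}

/* Apply the delta to the global sum once the rq's tasks have moved away. */
static void load_migrate(struct rq_model *rq)
{
	long delta = fold_active(rq);

	if (delta)
		atomic_fetch_add(&calc_load_tasks, delta);
}
```

Folding the delta this way keeps the global sum consistent without moving nr_uninterruptible between runqueues, which is the design choice the commit message argues for.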

Reported-by: Rakib Mullick 
Reported-by: Paul E. McKenney 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1345454817.23018.27.camel@twins
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c |   31 ++-
 1 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fbf1fd0..207a81c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5304,27 +5304,17 @@ void idle_task_exit(void)
 }
 
 /*
- * While a dead CPU has no uninterruptible tasks queued at this point,
- * it might still have a nonzero ->nr_uninterruptible counter, because
- * for performance reasons the counter is not stricly tracking tasks to
- * their home CPUs. So we just add the counter to another CPU's counter,
- * to keep the global sum constant after CPU-down:
- */
-static void migrate_nr_uninterruptible(struct rq *rq_src)
-{
-   struct rq *rq_dest = cpu_rq(cpumask_any(cpu_active_mask));
-
-   rq_dest->nr_uninterruptible += rq_src->nr_uninterruptible;
-   rq_src->nr_uninterruptible = 0;
-}
-
-/*
- * remove the tasks which were accounted by rq from calc_load_tasks.
+ * Since this CPU is going 'away' for a while, fold any nr_active delta
+ * we might have. Assumes we're called after migrate_tasks() so that the
+ * nr_active count is stable.
+ *
+ * Also see the comment "Global load-average calculations".
  */
-static void calc_global_load_remove(struct rq *rq)
+static void calc_load_migrate(struct rq *rq)
 {
-   atomic_long_sub(rq->calc_load_active, &calc_load_tasks);
-   rq->calc_load_active = 0;
+   long delta = calc_load_fold_active(rq);
+   if (delta)
+   atomic_long_add(delta, &calc_load_tasks);
 }
 
 /*
@@ -5618,8 +5608,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
BUG_ON(rq->nr_running != 1); /* the migration thread */
raw_spin_unlock_irqrestore(&rq->lock, flags);
 
-   migrate_nr_uninterruptible(rq);
-   calc_global_load_remove(rq);
+   calc_load_migrate(rq);
break;
 #endif
}


[tip:sched/core] sched: Add missing call to calc_load_exit_idle()

2012-09-04 Thread tip-bot for Charles Wang
Commit-ID:  749c8814f08f12baa4a9c2812a7c6ede7d69507d
Gitweb: http://git.kernel.org/tip/749c8814f08f12baa4a9c2812a7c6ede7d69507d
Author: Charles Wang 
AuthorDate: Mon, 20 Aug 2012 16:02:33 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 14:30:29 +0200

sched: Add missing call to calc_load_exit_idle()

Azat Khuzhin reported high loadavg in Linux v3.6

After checking the upstream scheduler code, I found Peter's commit:

  5167e8d5417b sched/nohz: Rewrite and fix load-avg computation -- again

not fully applied, missing the call to calc_load_exit_idle().

Without that call, an idle exit inside the sampling window is always
accounted as non-idle, so the load is higher than normal.

This patch adds the missing call to calc_load_exit_idle().

Signed-off-by: Charles Wang 
Cc: sta...@kernel.org
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1345449754-27130-1-git-send-email-muming...@gmail.com
Signed-off-by: Ingo Molnar 
---
 kernel/time/tick-sched.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 024540f..3a9e5d5 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -573,6 +573,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
tick_do_update_jiffies64(now);
update_cpu_load_nohz();
 
+   calc_load_exit_idle();
touch_softlockup_watchdog();
/*
 * Cancel the scheduled timer and restore the tick


[tip:sched/core] sched: Unthrottle rt runqueues in __disable_runtime()

2012-09-04 Thread tip-bot for Peter Boonstoppel
Commit-ID:  a4c96ae319b8047f62dedbe1eac79e321c185749
Gitweb: http://git.kernel.org/tip/a4c96ae319b8047f62dedbe1eac79e321c185749
Author: Peter Boonstoppel 
AuthorDate: Thu, 9 Aug 2012 15:34:47 -0700
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 14:30:30 +0200

sched: Unthrottle rt runqueues in __disable_runtime()

migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.

Instead unthrottle rt runqueues before migrating tasks.
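
The failure mode can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: struct rt_queue, pick_next() and drain() are hypothetical, and where the kernel loop would spin forever the model stops and reports how many tasks were left behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a real-time runqueue with a throttle flag. */
struct rt_queue {
	int nr_tasks;
	bool throttled;
};

/* A throttled queue hands out nothing, even if it still holds tasks. */
static bool pick_next(struct rt_queue *q)
{
	assert(q != NULL);
	if (q->throttled || q->nr_tasks == 0)
		return false;
	q->nr_tasks--;
	return true;
}

/*
 * Try to move every task off the queue. Returns the number of tasks
 * that could not be picked; a non-zero result corresponds to the case
 * where the real migration loop never terminates.
 */
static int drain(struct rt_queue *q, bool unthrottle_first)
{
	if (unthrottle_first)
		q->throttled = false;
	while (pick_next(q))
		;
	return q->nr_tasks;
}
```

With unthrottle_first set, the queue always drains to zero, which is the behaviour the patch is after.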

Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()

Signed-off-by: Peter Boonstoppel 
Signed-off-by: Peter Zijlstra 
Cc: Paul Turner 
Link: http://lkml.kernel.org/r/5fbf8e85ca34454794f0f7ecba79798f379d364...@hqmail04.nvidia.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c  |3 ---
 kernel/sched/fair.c  |7 +--
 kernel/sched/rt.c|1 +
 kernel/sched/sched.h |1 -
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 207a81c..a4ea245 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5342,9 +5342,6 @@ static void migrate_tasks(unsigned int dead_cpu)
 */
rq->stop = NULL;
 
-   /* Ensure any throttled groups are reachable by pick_next_task */
-   unthrottle_offline_cfs_rqs(rq);
-
for ( ; ; ) {
/*
 * There's this thread running, bail when that's the only
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c219bf8..86ad83c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2052,7 +2052,7 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
hrtimer_cancel(&cfs_b->slack_timer);
 }
 
-void unthrottle_offline_cfs_rqs(struct rq *rq)
+static void unthrottle_offline_cfs_rqs(struct rq *rq)
 {
struct cfs_rq *cfs_rq;
 
@@ -2106,7 +2106,7 @@ static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
return NULL;
 }
 static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
-void unthrottle_offline_cfs_rqs(struct rq *rq) {}
+static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
 
 #endif /* CONFIG_CFS_BANDWIDTH */
 
@@ -4956,6 +4956,9 @@ static void rq_online_fair(struct rq *rq)
 static void rq_offline_fair(struct rq *rq)
 {
update_sysctl();
+
+   /* Ensure any throttled groups are reachable by pick_next_task */
+   unthrottle_offline_cfs_rqs(rq);
 }
 
 #endif /* CONFIG_SMP */
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 944cb68..e0b7ba9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -691,6 +691,7 @@ balanced:
 * runtime - in which case borrowing doesn't make sense.
 */
rt_rq->rt_runtime = RUNTIME_INF;
+   rt_rq->rt_throttled = 0;
raw_spin_unlock(&rt_rq->rt_runtime_lock);
raw_spin_unlock(&rt_b->rt_runtime_lock);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f6714d0..0848fa3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1144,7 +1144,6 @@ extern void print_rt_stats(struct seq_file *m, int cpu);
 
 extern void init_cfs_rq(struct cfs_rq *cfs_rq);
 extern void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq);
-extern void unthrottle_offline_cfs_rqs(struct rq *rq);
 
 extern void account_cfs_bandwidth_used(int enabled, int was_enabled);
 


Re: [PATCH v7 1/1] ieee802154: MRF24J40 driver

2012-09-04 Thread David Miller
From: Alan Ott 
Date: Sun,  2 Sep 2012 21:44:13 -0400

> Driver for the Microchip MRF24J40 802.15.4 WPAN module.
> 
> Signed-off-by: Alan Ott 

Applied to net-next, thanks.


[tip:sched/core] sched: Fix kernel-doc warnings in kernel/sched/fair.c

2012-09-04 Thread tip-bot for Randy Dunlap
Commit-ID:  9450d57eab5cad36774c297da123062744472588
Gitweb: http://git.kernel.org/tip/9450d57eab5cad36774c297da123062744472588
Author: Randy Dunlap 
AuthorDate: Sat, 18 Aug 2012 17:45:08 -0700
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 14:30:49 +0200

sched: Fix kernel-doc warnings in kernel/sched/fair.c

Fix two kernel-doc warnings in kernel/sched/fair.c:

  Warning(kernel/sched/fair.c:3660): Excess function parameter 'cpus' description in 'update_sg_lb_stats'
  Warning(kernel/sched/fair.c:3806): Excess function parameter 'cpus' description in 'update_sd_lb_stats'

Signed-off-by: Randy Dunlap 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/50303714.3090...@xenotime.net
Signed-off-by: Ingo Molnar 
---
 kernel/sched/fair.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 86ad83c..42d9df6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3658,7 +3658,6 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
  * @group: sched_group whose statistics are to be updated.
  * @load_idx: Load index of sched_domain of this_cpu for load calc.
  * @local_group: Does group contain this_cpu.
- * @cpus: Set of cpus considered for load balancing.
  * @balance: Should we balance.
  * @sgs: variable to hold the statistics for this group.
  */
@@ -3805,7 +3804,6 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 /**
  * update_sd_lb_stats - Update sched_domain's statistics for load balancing.
  * @env: The load balancing environment.
- * @cpus: Set of cpus considered for load balancing.
  * @balance: Should we balance.
  * @sds: variable to hold the statistics for this sched_domain.
  */


[tip:sched/core] sched: Remove AFFINE_WAKEUPS feature flag

2012-09-04 Thread tip-bot for Namhyung Kim
Commit-ID:  c751134ef8b070070d5f06348286b29d86424677
Gitweb: http://git.kernel.org/tip/c751134ef8b070070d5f06348286b29d86424677
Author: Namhyung Kim 
AuthorDate: Thu, 16 Aug 2012 13:21:05 +0900
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 14:31:31 +0200

sched: Remove AFFINE_WAKEUPS feature flag

Commit beac4c7e4a1c ("sched: Remove AFFINE_WAKEUPS feature") removed
use of the flag but left the definition. Get rid of it.

Signed-off-by: Namhyung Kim 
Signed-off-by: Peter Zijlstra 
Cc: Mike Galbraith 
Link: http://lkml.kernel.org/r/1345090865-20851-1-git-send-email-namhy...@kernel.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/features.h |8 
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index de00a48..c38f52e 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -12,14 +12,6 @@ SCHED_FEAT(GENTLE_FAIR_SLEEPERS, true)
 SCHED_FEAT(START_DEBIT, true)
 
 /*
- * Based on load and program behaviour, see if it makes sense to place
- * a newly woken task on the same cpu as the task that woke it --
- * improve cache locality. Typically used with SYNC wakeups as
- * generated by pipes and the like, see also SYNC_WAKEUPS.
- */
-SCHED_FEAT(AFFINE_WAKEUPS, true)
-
-/*
  * Prefer to schedule the task we woke last (assuming it failed
  * wakeup-preemption), since its likely going to consume data we
  * touched, increases cache locality.


[tip:sched/core] sched/debug: Limit sd->*_idx range on sysctl

2012-09-04 Thread tip-bot for Namhyung Kim
Commit-ID:  201c373e8e4823700d3160d5c28e1ab18fd1193e
Gitweb: http://git.kernel.org/tip/201c373e8e4823700d3160d5c28e1ab18fd1193e
Author: Namhyung Kim 
AuthorDate: Thu, 16 Aug 2012 17:03:24 +0900
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 14:31:32 +0200

sched/debug: Limit sd->*_idx range on sysctl

Various sd->*_idx's are used for referring to the rq's load average table
when selecting a cpu to run.  However they can be set to any number
with sysctl knobs so that it can crash the kernel if something bad is
given. Fix it by limiting them into the actual range.
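
The effect of attaching a minimum and maximum to a knob can be sketched in user space as follows. struct knob and knob_write() are hypothetical, and the bounds used in any caller are illustrative rather than the kernel's values; the sketch assumes, as with the minmax sysctl handlers, that an out-of-range write is rejected with -EINVAL and the stored value is left untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tunable with optional bounds, like extra1/extra2. */
struct knob {
	int *data;        /* where the value lives */
	const int *min;   /* lowest accepted value, or NULL for no bound */
	const int *max;   /* highest accepted value, or NULL for no bound */
};

/*
 * Store val only if it lies inside the configured range.
 * Returns 0 on success and -EINVAL when the value is rejected.
 */
static int knob_write(struct knob *k, int val)
{
	assert(k != NULL && k->data != NULL);
	if ((k->min && val < *k->min) || (k->max && val > *k->max))
		return -EINVAL;
	*k->data = val;
	return 0;
}
```

A knob with both pointers left NULL accepts any integer, which corresponds to the entries the patch registers with load_idx set to false.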

Signed-off-by: Namhyung Kim 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1345104204-8317-1-git-send-email-namhy...@kernel.org
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c |   35 ++-
 1 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ae66229..ec0f2b8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4896,16 +4896,25 @@ static void sd_free_ctl_entry(struct ctl_table **tablep)
*tablep = NULL;
 }
 
+static int min_load_idx = 0;
+static int max_load_idx = CPU_LOAD_IDX_MAX;
+
 static void
 set_table_entry(struct ctl_table *entry,
const char *procname, void *data, int maxlen,
-   umode_t mode, proc_handler *proc_handler)
+   umode_t mode, proc_handler *proc_handler,
+   bool load_idx)
 {
entry->procname = procname;
entry->data = data;
entry->maxlen = maxlen;
entry->mode = mode;
entry->proc_handler = proc_handler;
+
+   if (load_idx) {
+   entry->extra1 = &min_load_idx;
+   entry->extra2 = &max_load_idx;
+   }
 }
 
 static struct ctl_table *
@@ -4917,30 +4926,30 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
return NULL;
 
set_table_entry(&table[0], "min_interval", &sd->min_interval,
-   sizeof(long), 0644, proc_doulongvec_minmax);
+   sizeof(long), 0644, proc_doulongvec_minmax, false);
set_table_entry(&table[1], "max_interval", &sd->max_interval,
-   sizeof(long), 0644, proc_doulongvec_minmax);
+   sizeof(long), 0644, proc_doulongvec_minmax, false);
set_table_entry(&table[2], "busy_idx", &sd->busy_idx,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, true);
set_table_entry(&table[3], "idle_idx", &sd->idle_idx,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, true);
set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, true);
set_table_entry(&table[5], "wake_idx", &sd->wake_idx,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, true);
set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, true);
set_table_entry(&table[7], "busy_factor", &sd->busy_factor,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, false);
set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, false);
set_table_entry(&table[9], "cache_nice_tries",
&sd->cache_nice_tries,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, false);
set_table_entry(&table[10], "flags", &sd->flags,
-   sizeof(int), 0644, proc_dointvec_minmax);
+   sizeof(int), 0644, proc_dointvec_minmax, false);
set_table_entry(&table[11], "name", sd->name,
-   CORENAME_MAX_SIZE, 0444, proc_dostring);
+   CORENAME_MAX_SIZE, 0444, proc_dostring, false);
/* &table[12] is terminator */
 
return table;


[tip:sched/core] sched: Add time unit suffix to sched sysctl knobs

2012-09-04 Thread tip-bot for Namhyung Kim
Commit-ID:  d00535db42805e9ae5eadf1b4a86e01e85674b0c
Gitweb: http://git.kernel.org/tip/d00535db42805e9ae5eadf1b4a86e01e85674b0c
Author: Namhyung Kim 
AuthorDate: Thu, 16 Aug 2012 11:15:30 +0900
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 14:31:34 +0200

sched: Add time unit suffix to sched sysctl knobs

Unlike others, sched_migration_cost, sched_time_avg and
sched_shares_window don't have a time unit as a suffix. Add them.

Signed-off-by: Namhyung Kim 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1345083330-19486-1-git-send-email-namhy...@kernel.org
Signed-off-by: Ingo Molnar 
---
 kernel/sysctl.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87174ef..81c7b1a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -307,7 +307,7 @@ static struct ctl_table kern_table[] = {
.extra2 = &max_sched_tunable_scaling,
},
{
-   .procname   = "sched_migration_cost",
+   .procname   = "sched_migration_cost_ns",
.data   = &sysctl_sched_migration_cost,
.maxlen = sizeof(unsigned int),
.mode   = 0644,
@@ -321,14 +321,14 @@ static struct ctl_table kern_table[] = {
.proc_handler   = proc_dointvec,
},
{
-   .procname   = "sched_time_avg",
+   .procname   = "sched_time_avg_ms",
.data   = &sysctl_sched_time_avg,
.maxlen = sizeof(unsigned int),
.mode   = 0644,
.proc_handler   = proc_dointvec,
},
{
-   .procname   = "sched_shares_window",
+   .procname   = "sched_shares_window_ns",
.data   = &sysctl_sched_shares_window,
.maxlen = sizeof(unsigned int),
.mode   = 0644,


[tip:sched/core] sched: Remove useless code in yield_to()

2012-09-04 Thread tip-bot for Michael Wang
Commit-ID:  38b8dd6f87398524d02c21ff614c507ba8c9d295
Gitweb: http://git.kernel.org/tip/38b8dd6f87398524d02c21ff614c507ba8c9d295
Author: Michael Wang 
AuthorDate: Tue, 3 Jul 2012 14:34:02 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 14:31:42 +0200

sched: Remove useless code in yield_to()

It's impossible to enter the else branch if we have set
skip_clock_update in task_yield_fair(), as yield_to_task_fair()
will directly return true after invoking task_yield_fair().

Signed-off-by: Michael Wang 
Acked-by: Mike Galbraith 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/4ff2925a.9060...@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c |7 ---
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec0f2b8..c46a011 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4348,13 +4348,6 @@ again:
 */
if (preempt && rq != p_rq)
resched_task(p_rq->curr);
-   } else {
-   /*
-* We might have set it in task_yield_fair(), but are
-* not going to schedule(), so don't want to skip
-* the next update.
-*/
-   rq->skip_clock_update = 0;
}
 
 out:


[tip:perf/urgent] perf_event: Switch to internal refcount, fix race with close()

2012-09-04 Thread tip-bot for Al Viro
Commit-ID:  a6fa941d94b411bbd2b6421ffbde6db3c93e65ab
Gitweb: http://git.kernel.org/tip/a6fa941d94b411bbd2b6421ffbde6db3c93e65ab
Author: Al Viro 
AuthorDate: Mon, 20 Aug 2012 14:59:25 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 17:29:22 +0200

perf_event: Switch to internal refcount, fix race with close()

Don't mess with file refcounts (or keep a reference to file, for
that matter) in perf_event.  Use explicit refcount of its own
instead.  Deal with the race between the final reference to event
going away and new children getting created for it by use of
atomic_long_inc_not_zero() in inherit_event(); just have the
latter free what it had allocated and return NULL, that works
out just fine (children of siblings of something doomed are
created as singletons, same as if the child of leader had been
created and immediately killed).
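
The increment-unless-zero pattern the message relies on can be sketched with C11 atomics. struct object, get_not_zero() and put_ref() are hypothetical user-space stand-ins, not the kernel's atomic_long_inc_not_zero() or put_event(); the sketch only shows why a new reference can never revive an object whose last reference is already gone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object. */
struct object {
	atomic_long refcount;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the object is already on its way out, in which
 * case the caller must drop whatever it allocated and back off.
 */
static bool get_not_zero(struct object *o)
{
	long old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool put_ref(struct object *o)
{
	long old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A plain unconditional increment would race with the final put: the count could go from 0 back to 1 after teardown has started, which is the race the commit closes.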

Signed-off-by: Al Viro 
Cc: sta...@kernel.org
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20120820135925.gg23...@zeniv.linux.org.uk
Signed-off-by: Ingo Molnar 
---
 include/linux/perf_event.h |2 +-
 kernel/events/core.c   |   62 +++
 2 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 7602ccb..ad04dfc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -926,7 +926,7 @@ struct perf_event {
struct hw_perf_event hw;
 
struct perf_event_context   *ctx;
-   struct file *filp;
+   atomic_long_t   refcount;
 
/*
 * These accumulate total time (in nanoseconds) that children
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b7935fc..efef428 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2935,12 +2935,12 @@ EXPORT_SYMBOL_GPL(perf_event_release_kernel);
 /*
  * Called when the last reference to the file is gone.
  */
-static int perf_release(struct inode *inode, struct file *file)
+static void put_event(struct perf_event *event)
 {
-   struct perf_event *event = file->private_data;
struct task_struct *owner;
 
-   file->private_data = NULL;
+   if (!atomic_long_dec_and_test(&event->refcount))
+   return;
 
rcu_read_lock();
owner = ACCESS_ONCE(event->owner);
@@ -2975,7 +2975,13 @@ static int perf_release(struct inode *inode, struct file *file)
put_task_struct(owner);
}
 
-   return perf_event_release_kernel(event);
+   perf_event_release_kernel(event);
+}
+
+static int perf_release(struct inode *inode, struct file *file)
+{
+   put_event(file->private_data);
+   return 0;
 }
 
 u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
@@ -3227,7 +3233,7 @@ unlock:
 
 static const struct file_operations perf_fops;
 
-static struct perf_event *perf_fget_light(int fd, int *fput_needed)
+static struct file *perf_fget_light(int fd, int *fput_needed)
 {
struct file *file;
 
@@ -3241,7 +3247,7 @@ static struct perf_event *perf_fget_light(int fd, int *fput_needed)
return ERR_PTR(-EBADF);
}
 
-   return file->private_data;
+   return file;
 }
 
 static int perf_event_set_output(struct perf_event *event,
@@ -3273,19 +3279,21 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
case PERF_EVENT_IOC_SET_OUTPUT:
{
+   struct file *output_file = NULL;
struct perf_event *output_event = NULL;
int fput_needed = 0;
int ret;
 
if (arg != -1) {
-   output_event = perf_fget_light(arg, &fput_needed);
-   if (IS_ERR(output_event))
-   return PTR_ERR(output_event);
+   output_file = perf_fget_light(arg, &fput_needed);
+   if (IS_ERR(output_file))
+   return PTR_ERR(output_file);
+   output_event = output_file->private_data;
}
 
ret = perf_event_set_output(event, output_event);
if (output_event)
-   fput_light(output_event->filp, fput_needed);
+   fput_light(output_file, fput_needed);
 
return ret;
}
@@ -5950,6 +5958,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 
mutex_init(&event->mmap_mutex);
 
+   atomic_long_set(&event->refcount, 1);
event->cpu  = cpu;
event->attr = *attr;
event->group_leader = group_leader;
@@ -6260,12 +6269,12 @@ SYSCALL_DEFINE5(perf_event_open,
return event_fd;
 
if (group_fd != -1) {
-   group_leader = perf_fget_light(group_fd, &fput_needed);
-   if (IS_ERR(group_leader)) {
-   err = PTR_ERR(group_leader);
+
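
The "take a reference unless the count already dropped to zero" idea from the commit message above can be sketched in userspace C11. This is only an illustration under stated assumptions: struct event, event_get_unless_zero() and event_put() are hypothetical stand-ins, not the kernel's perf_event code or its atomic_long_inc_not_zero() implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct perf_event; only the counter matters here. */
struct event {
	atomic_long refcount;
	bool released;	/* set once the last reference is dropped */
};

static void event_init(struct event *e)
{
	atomic_init(&e->refcount, 1);	/* the creator holds the first reference */
	e->released = false;
}

/* Take a reference unless the count already hit zero (the object is doomed). */
static bool event_get_unless_zero(struct event *e)
{
	long old = atomic_load(&e->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&e->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever reaches zero performs the release. */
static void event_put(struct event *e)
{
	long old = atomic_fetch_sub(&e->refcount, 1);

	assert(old > 0);
	if (old == 1)
		e->released = true;
}
```

A caller that gets false back simply frees whatever it had allocated for the child and returns, which mirrors the behaviour the commit message describes for inherit_event().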

[tip:perf/urgent] perf/x86: Enable Intel Cedarview Atom suppport

2012-09-04 Thread tip-bot for Stephane Eranian
Commit-ID:  3ec18cd8b8f8395d0df604c62ab3bc2cf3a966b4
Gitweb: http://git.kernel.org/tip/3ec18cd8b8f8395d0df604c62ab3bc2cf3a966b4
Author: Stephane Eranian 
AuthorDate: Mon, 20 Aug 2012 11:24:21 +0200
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 17:29:23 +0200

perf/x86: Enable Intel Cedarview Atom suppport

This patch enables perf_events support for Intel Cedarview
Atom (model 54) processors. Support includes PEBS and LBR.
Tested on my Atom N2600 netbook.

Signed-off-by: Stephane Eranian 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20120820092421.GA11284@quad
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/perf_event_intel.c |1 +
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |3 ++-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 7f2739e..0d3d63a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2008,6 +2008,7 @@ __init int intel_pmu_init(void)
break;
 
case 28: /* Atom */
+   case 54: /* Cedarview */
memcpy(hw_cache_event_ids, atom_hw_cache_event_ids,
   sizeof(hw_cache_event_ids));
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 520b426..da02e9c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -686,7 +686,8 @@ void intel_pmu_lbr_init_atom(void)
 * to have an operational LBR which can freeze
 * on PMU interrupt
 */
-   if (boot_cpu_data.x86_mask < 10) {
+   if (boot_cpu_data.x86_model == 28
+   && boot_cpu_data.x86_mask < 10) {
pr_cont("LBR disabled due to erratum");
return;
}
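
The effect of the narrowed erratum check in the hunk above can be sketched as a small standalone predicate. The model numbers (28 and 54) and the stepping bound (10) come from the patch; the function and macro names are made up for illustration and are not part of the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Model numbers taken from the patch; the names are hypothetical. */
#define ATOM_MODEL_ORIGINAL	28
#define ATOM_MODEL_CEDARVIEW	54

/* True when LBR must stay disabled: only early steppings of model 28. */
static bool atom_lbr_disabled_by_erratum(unsigned int model, unsigned int stepping)
{
	return model == ATOM_MODEL_ORIGINAL && stepping < 10;
}
```

With the old check (stepping alone), a model 54 part with a low stepping would have lost LBR as well; with the model test in place it does not.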


Re: [Squashfs-devel] PROBLEM: mount empty SquashFS

2012-09-04 Thread Geert Uytterhoeven
Hi Phillip,

On Wed, Aug 1, 2012 at 6:25 AM, Phillip Lougher
 wrote:
> Cyril Strejc wrote:
>> I have problem when mounting empty SquashFS. Mount syscall ends with EINVAL.
>>
>> Kernel versions: mainline
>> SquashFS tools version: 4.2
>>
>> Steps to reproduce:
>> 1. create empty directory (mkdir empty)
>> 2. create SquashFS image (my mksquashfs output below)
>> 3. mount image using block or loop device (strace output below)
>>
>> I've added some printk calls to
>> super.c: squashfs_fill_super()
>>
>> /* code starts here */
>> ...
>> handle_fragments:
>>  fragments = le32_to_cpu(sblk->fragments);
>>  printk("fragments = %u\n", fragments);
>> ...
>> check_directory_table:
>>  /* Sanity check directory_table */
>>  if (msblk->directory_table >= next_table) {
>>  printk("directory_table = %llu, next_table = %llu\n",
>> msblk->directory_table, next_table);
>>  err = -EINVAL;
>>  printk("mount error: 16\n");
>>  goto failed_mount;
>>  }
>> ...
>>
>> dmesg after mount:
>> fragments = 0
>> directory_table = 125, next_table = 125
>> mount error: 16
>>
>>
>> I hardly understand these details. Please, do You have any idea?
>>
>
> Hi Cyril,
>
> This is a Squashfs kernel bug introduced by some extra superblock
> sanity checks added in kernel 3.0.  These extra sanity checks were
> necessary to harden Squashfs against corrupted Squashfs filesystems
> generated by the latest version of fsfuzzer (a tool used to randomly
> corrupt filesystems with the aim of making the filesystem code
> behave badly).
>
> I discovered the sanity checks mistakenly flagged empty filesystems
> as invalid in January, and added a fix to the mainline kernel, FYI
> the commit is here:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cc37f75a9ffbbfcb1c3297534f293c8284e3c5a6
>
> This bug has been fixed in kernel versions 3.3 and newer, but, kernel
> versions 3.0, 3.1 and 3.2 unfortunately have this bug.
>
> There are really only three solutions to this problem:
>
> - upgrade to a later kernel, 3.3 or newer,
> - apply the above commit to your kernel, or
> - avoid generating empty filesystems and trying to mount them
>
> The one obvious question that arises here is why are you generating
> completely empty filesystems and then trying to mount them?  An
> empty Squashfs filesystem doesn't seem to serve any useful purpose?

I can easily imagine a system that has an optional filesystem mounted,
which may be empty. For such a system, it's a regression.

So I think this warrants application to the stable 3.0, 3.1, and 3.2 branches.
Commit cc37f75a9ffbbfcb1c3297534f293c8284e3c5a6 ("Squashfs: fix
mount time sanity check for corrupted superblock") seems to cherry-pick just
fine on v3.0.42, v3.1.10, and v3.2.28.

>> mksquashfs command output:
>> --
>> localhost:~ # mksquashfs emptydir test.bin
>> Parallel mksquashfs: Using 1 processor
>> Creating 4.0 filesystem on test.bin, block size 131072.
>>
>> Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
>>  compressed data, compressed metadata, compressed fragments, no
>> xattrs
>>  duplicates are removed
>> Filesystem size 0.15 Kbytes (0.00 Mbytes)
>>  99.37% of uncompressed filesystem size (0.15 Kbytes)
>> Inode table size 29 bytes (0.03 Kbytes)
>>  85.29% of uncompressed inode table size (34 bytes)
>> Directory table size 0 bytes (0.00 Kbytes)
>>  nan% of uncompressed directory table size (0 bytes)
>> Number of duplicate files found 0
>> Number of inodes 1
>> Number of files 0
>> Number of fragments 0
>> Number of symbolic links  0
>> Number of device nodes 0
>> Number of fifo nodes 0
>> Number of socket nodes 0
>> Number of directories 1
>> Number of ids (unique uids + gids) 1
>> Number of uids 1
>>  root (0)
>> Number of gids 1
>>  root (0)
>> 
>>
>> mount command output (strace):
>> -
>> mount("/dev/loop0", "/mnt", "squashfs", MS_MGC_VAL, NULL) = -1 EINVAL
>> (Invalid argument)
>>
>> another mount version (empty image on MTD device):
>> ---
>> mount("/dev/mtdblock4", "/mnt", "squashfs", MS_RDONLY|MS_SILENT, NULL) =
>> -1 EINVAL (Invalid argument)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
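
The boundary case from the quoted dmesg output (directory_table = 125, next_table = 125) can be reproduced with two small predicates. The strict one mirrors the check quoted in the report; the relaxed one is an assumption about the shape a fix has to take (allowing an empty table that ends where it starts), not a copy of the referenced commit. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check as quoted in the report: rejects directory_table == next_table. */
static bool directory_table_ok_strict(uint64_t directory_table, uint64_t next_table)
{
	return directory_table < next_table;
}

/* Relaxed variant (assumed): an empty directory table may start at next_table. */
static bool directory_table_ok_relaxed(uint64_t directory_table, uint64_t next_table)
{
	return directory_table <= next_table;
}
```

The relaxed form still rejects a directory table that starts beyond the next table, so the hardening against corrupted superblocks is kept for that case.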


[tip:perf/urgent] perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled

2012-09-04 Thread tip-bot for K.Prasad
Commit-ID:  500ad2d8b01390c98bc6dce068bccfa9534b8212
Gitweb: http://git.kernel.org/tip/500ad2d8b01390c98bc6dce068bccfa9534b8212
Author: K.Prasad 
AuthorDate: Thu, 2 Aug 2012 13:46:35 +0530
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Sep 2012 17:29:53 +0200

perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled

While debugging a warning message on PowerPC while using hardware
breakpoints, it was discovered that when perf_event_disable is invoked
through hw_breakpoint_handler function with interrupts disabled, a
subsequent IPI in the code path would trigger a WARN_ON_ONCE message in
smp_call_function_single function.

This patch calls __perf_event_disable() when interrupts are already
disabled, instead of perf_event_disable().

Reported-by: Edjunior Barbosa Machado 
Signed-off-by: K.Prasad 
[naveen.n@linux.vnet.ibm.com: v3: Check to make sure we target current task]
Signed-off-by: Naveen N. Rao 
Acked-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20120802081635.5811.17737.stgit@localhost.localdomain
[ Fixed build error on MIPS. ]
Signed-off-by: Ingo Molnar 
---
 include/linux/perf_event.h|2 ++
 kernel/events/core.c  |2 +-
 kernel/events/hw_breakpoint.c |   11 ++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ad04dfc..33ed9d6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1296,6 +1296,7 @@ extern int perf_swevent_get_recursion_context(void);
 extern void perf_swevent_put_recursion_context(int rctx);
 extern void perf_event_enable(struct perf_event *event);
 extern void perf_event_disable(struct perf_event *event);
+extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
 #else
 static inline void
@@ -1334,6 +1335,7 @@ static inline int  perf_swevent_get_recursion_context(void)   { return -1; }
 static inline void perf_swevent_put_recursion_context(int rctx)    { }
 static inline void perf_event_enable(struct perf_event *event) { }
 static inline void perf_event_disable(struct perf_event *event)    { }
+static inline int __perf_event_disable(void *info) { return -1; }
 static inline void perf_event_task_tick(void)  { }
 #endif
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index efef428..7fee567 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1253,7 +1253,7 @@ retry:
 /*
  * Cross CPU call to disable a performance event
  */
-static int __perf_event_disable(void *info)
+int __perf_event_disable(void *info)
 {
struct perf_event *event = info;
struct perf_event_context *ctx = event->ctx;
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index bb38c4d..9a7b487 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -453,7 +453,16 @@ int modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *att
int old_type = bp->attr.bp_type;
int err = 0;
 
-   perf_event_disable(bp);
+   /*
+* modify_user_hw_breakpoint can be invoked with IRQs disabled and hence it
+* will not be possible to raise IPIs that invoke __perf_event_disable.
+* So call the function directly after making sure we are targeting the
+* current task.
+*/
+   if (irqs_disabled() && bp->ctx && bp->ctx->task == current)
+   __perf_event_disable(bp);
+   else
+   perf_event_disable(bp);
 
bp->attr.bp_addr = attr->bp_addr;
bp->attr.bp_type = attr->bp_type;


Re: Re: [PATCH] extcon-max8997: remove usage of ret in max8997_muic_handle_charger_type_detach

2012-09-04 Thread Devendra Naga
On Tue, Sep 4, 2012 at 11:17 AM, MyungJoo Ham  wrote:
>> any comments ?
>>
>
> Sorry for replying late.
>
> It has been applied to the extcon-for-next tree at
> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/extcon-for-next
> , which will appear in hours.

Thanks!


Re: [RFC 1/2] virtio_console: Add support for DMA memory allocation

2012-09-04 Thread Michael S. Tsirkin
On Tue, Sep 04, 2012 at 06:58:47PM +0200, Sjur Brændeland wrote:
> Hi Michael,
> 
> > Exactly. Though if we just fail load it will be much less code.
> >
> > Generally, using a feature bit for this is a bit of a problem though:
> > normally driver is expected to be able to simply ignore
> > a feature bit. In this case driver is required to
> > do something so a feature bit is not a good fit.
> > I am not sure what the right thing to do is.
> 
> I see - so in order to avoid the binding between driver and device
> there are two options I guess. Either make virtio_dev_match() or
> virtcons_probe() fail. Neither of them seems like the obvious choice.
> 
> Maybe adding a check for VIRTIO_CONSOLE_F_DMA_MEM match
> between device and driver in virtcons_probe() is the lesser evil?
> 
> Regards,
> Sjur

The simplest thing to do is to change the device ID. Rusty?

-- 
MST


Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-04 Thread Tejun Heo
Hello, Mikulas, Kent.

On Mon, Sep 03, 2012 at 08:41:00PM -0700, Kent Overstreet wrote:
> On Mon, Sep 03, 2012 at 04:41:37PM -0400, Mikulas Patocka wrote:
> > ... or another possibility - start a timer when something is put to 
> > current->bio_list and use that timer to pop entries off current->bio_list 
> > and submit them to a workqueue. The timer can be cpu-local so only 
> > interrupt masking is required to synchronize against the timer.
> > 
> > This would normally run just like the current kernel and in case of 
> > deadlock, the timer would kick in and resolve the deadlock.
> 
> Ugh. That's a _terrible_ idea.

That's exactly how workqueue rescuers work - rescuers kick in if new
worker creation doesn't succeed in given amount of time.  The
suggested mechanism already makes use of workqueue, so it's already
doing it.  If you can think of a better way to detect the generic
stall condition, please be my guest.

> Remember the old plugging code? You ever have to debug performance
> issues caused by it?

That is not equivalent.  Plugging was kicking in all the time and it
wasn't entirely well-defined what the plugging / unplugging conditions
were.  This type of rescuing for forward-progress guarantee only kicks
in under severe memory pressure and people expect finite latency and
throughput hits under such conditions.  The usual bio / request /
scsi_cmd allocations could be failing under these circumstances and
things could be progressing only thanks to the finite preallocated
pools.  I don't think involving a rescue timer would be noticeably
detrimental.

Actually, if the timer approach can reduce the frequency of rescuer
involvement, I think it could actually be better.

Thanks.

-- 
tejun


Re: [RFC PATCH 0/5] net: socket bind to file descriptor introduced

2012-09-04 Thread J. Bruce Fields
On Mon, Aug 20, 2012 at 02:18:13PM +0400, Stanislav Kinsbursky wrote:
> 16.08.2012 07:03, Eric W. Biederman wrote:
> >Stanislav Kinsbursky  writes:
> >
> >>This patch set introduces new socket operation and new system call:
> >>sys_fbind(), which allows to bind socket to opened file.
> >>File to bind to can be created by sys_mknod(S_IFSOCK) and opened by
> >>open(O_PATH).
> >>
> >>This system call is especially required for UNIX sockets, which have a name
> >>length limitation.
> >>
> >>The following series implements...
> >
> >Hmm.  I just realized this patchset is even sillier than I thought.
> >
> >Stanislav is the problem you are ultimately trying to solve nfs clients
> >in a container connecting to the wrong user space rpciod?
> >
> 
> Hi, Eric.
> The problem you mentioned was the reason why I started to think about this.
> But currently I believe, that limitations in unix sockets connect or
> bind should be removed, because it will be useful at least for CRIU
> project.
> 
> >Aka net/sunrpc/xprtsock.c:xs_setup_local only taking an absolute path
> >and then creating a delayed work item to actually open the unix domain
> >socket?
> >
> >The straight correct and straight forward thing to do appears to be:
> >- Capture the root from current->fs in xs_setup_local.
> >- In xs_local_finish_connect change current->fs.root to the captured
> >   version of root before kernel_connect, and restore current->fs.root
> >   after kernel_connect.
> >
> >It might not be a bad idea to implement open on unix domain sockets in
> >a filesystem as create(AF_LOCAL)+connect() which would allow you to
> >replace __sock_create + kernel_connect with a simple file_open_root.
> >
> 
> I like the idea of introducing new family (AF_LOCAL_AT for example)
> and new sockaddr for connecting or binding from specified root. The
> only thing I'm worrying is passing file descriptor to unix bind or
> connect routine. Because this approach doesn't provide easy way to
> use such family and sockaddr in kernel (like in NFS example).
> 
> >But I think the simple scheme of:
> >struct path old_root;
> >old_root = current->fs.root;
> >kernel_connect(...);
> >current->fs.root = old_root;
> >
> >Is more than sufficient and will remove the need for anything
> >except a purely local change to get nfs clients to connect from
> >containers.
> >
> 
> That was my first idea.

So is this what you're planning on doing now?

> And probably it would be worth to change all
> fs_struct to support sockets with relative path.
> What do you think about it?

I didn't understand the question.  Are you suggesting that changes to
fs_struct would be required to make this work?  I don't see why.

--b.


Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-04 Thread Tejun Heo
On Tue, Sep 04, 2012 at 11:55:40AM -0700, Tejun Heo wrote:
> Actually, if the timer approach can reduce the frequency of rescuer
> involvement, I think it could actually be better.

Ooh, it wouldn't.  It's kicking in only after alloc failure.  I don't
know.  I think conditioning it on alloc failure is cleaner and
converting all per-bio allocations to front-pad makes sense.  Using a
timer wouldn't make the mechanism any simpler, right?

Thanks.

-- 
tejun


Re: kexec/kdump kernel fails to start

2012-09-04 Thread Yinghai Lu
On Tue, Sep 4, 2012 at 10:32 AM, Flavio Leitner  wrote:
> Hi folks,
>
> I have system that no longer boots kdump kernel. Basically,
>
> # echo c > /proc/sysrq-trigger
>
> to dump a vmcore doesn't work. It just hangs after showing the usual
> panic messages. I've bisected the problem and the commit introducing
> the issue is the one below.
>
> Any idea?
>
> commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8
> Author: WANG Cong   2012-03-05 20:05:13
> Committer: Ingo Molnar   2012-03-06 05:38:26
> Parent: 550cf00dbc8ee402bef71628cb71246493dd4500 (Merge tag 'mmc-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc)
> Child:  a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f (x86, tlb: Switch cr3 in leave_mm() only when needed)
> Branches: master, remotes/origin/master
> Follows: v3.3-rc6
> Precedes: v3.5-rc1
>
> x86/mm: Fix the size calculation of mapping tables
>
> For machines that enable PSE, the first 2/4M memory region still uses
> 4K pages, so needs more PTEs in this case, but
> find_early_table_space() doesn't count this.
>
> This patch fixes it.
>
> The bug was found via code review, no misbehavior of the kernel
> was observed.

maybe just revert the offending commit?

Thanks

Yinghai


Re: [PATCH v2 4/5] fat: eliminate orphaned inode number allocation

2012-09-04 Thread OGAWA Hirofumi
"J. Bruce Fields"  writes:

> On Wed, Sep 05, 2012 at 02:07:40AM +0900, OGAWA Hirofumi wrote:
>> OGAWA Hirofumi  writes:
>> 
>> > Namjae Jeon  writes:
>> >
>> >> From: Namjae Jeon 
>> >>
>> >> Maintain a list of inode (i_pos) numbers of orphaned inodes (i.e. the
>> >> inodes that have been unlinked but still have open file
>> >> descriptors). At file/directory creation time, skip using such i_pos
>> >> values. Removal of the i_pos from the list is done during inode eviction.
>> >
>> > What happens if the directory (has busy entries) was completely removed?
>> >
>> >
>> > And Al's point is important for NFS too. If you want stable ino for NFS,
>> > you never can't change it.
>> 
>> s/never can't/never can/
>
> If vfat exports aren't fixable, maybe we should just remove that
> feature?
>
> I'm afraid that having unfixable half-working vfat exports is just an
> attractive nuisance that causes users and developers to waste their
> time

Historically, it was introduced by Neil Brown when the nfs export
interface was rewritten (I'm not sure what was intended).

Personally, I'm OK with removing it, though that is really just my personal
opinion. I don't have a strong opinion about the removal either way.

Thanks.
-- 
OGAWA Hirofumi 


Re: [PATCH v4] media: v4l2-ctrls: add control for dpcm predictor

2012-09-04 Thread Sakari Ailus
Hi Prabhakar,

Thanks for the patch. I've got a few comments below.

On Tue, Sep 04, 2012 at 11:07:52AM +0530, Prabhakar Lad wrote:
> From: Lad, Prabhakar 
> 
> add V4L2_CID_DPCM_PREDICTOR control of type menu, which
> determines the dpcm predictor. The predictor can be either
> simple or advanced.
> 
> Signed-off-by: Lad, Prabhakar 
> Signed-off-by: Manjunath Hadli 
> Acked-by: Hans Verkuil 
> Reviewed-by: Sylwester Nawrocki 
> Cc: Sakari Ailus 
> Cc: Laurent Pinchart 
> Cc: Mauro Carvalho Chehab 
> Cc: Hans de Goede 
> Cc: Kyungmin Park 
> Cc: Rob Landley 
> ---
> This patch has one checkpatch warning for a line over
> 80 characters; although it can be avoided, I have kept it
> for consistency.
> 
> Changes for v4:
> 1: Aligned the description to fit appropriately in the
> para tag, pointed by Sylwester.
> 
> Changes for v3:
> 1: Added better explanation for DPCM, pointed by Hans.
> 
> Changes for v2:
> 1: Added documentation in controls.xml pointed by Sylwester.
> 2: Changed V4L2_DPCM_PREDICTOR_ADVANCE to V4L2_DPCM_PREDICTOR_ADVANCED
>pointed by Sakari.
> 
>  Documentation/DocBook/media/v4l/controls.xml |   46 +-
>  drivers/media/v4l2-core/v4l2-ctrls.c |9 +
>  include/linux/videodev2.h|5 +++
>  3 files changed, 59 insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/DocBook/media/v4l/controls.xml b/Documentation/DocBook/media/v4l/controls.xml
> index 93b9c68..ad873ea 100644
> --- a/Documentation/DocBook/media/v4l/controls.xml
> +++ b/Documentation/DocBook/media/v4l/controls.xml
> @@ -4267,7 +4267,51 @@ interface and may change in the future.
>   pixels / second.
>   
> 
> -   
> +   
> +  spanname="id">V4L2_CID_DPCM_PREDICTOR
> + menu
> +   
> +   
> +  Differential pulse-code modulation (DPCM) 
> is a signal
> + encoder that uses the baseline of pulse-code modulation (PCM) but 
> adds some
> + functionalities based on the prediction of the samples of the 
> signal. The input
> + can be an analog signal or a digital signal.
> +
> + If the input is a continuous-time analog signal, it needs to 
> be sampled
> + first so that a discrete-time signal is the input to the DPCM 
> encoder.
> +
> + Simple: take the values of two consecutive samples; if they 
> are analog
> + samples, quantize them; calculate the difference between the first 
> one and the
> + next; the output is the difference, and it can be further entropy 
> coded.
> +
> + Advanced: instead of taking a difference relative to a 
> previous input sample,
> + take the difference relative to the output of a local model of the 
> decoder process;
> + in this option, the difference can be quantized, which allows a 
> good way to
> + incorporate a controlled loss in the encoding.

This is directly from Wikipedia, isn't it?

As for the content, DPCM in the context of V4L2 media bus codes, as a
digital interface, is always digital. So there's no need to document it.
Entropy coding is also out of the question: the samples of the currently
defined formats are equal in size.

Another thing I'm not sure about is the definition of the simple and advanced
encoders. I've seen sensors that allow you to choose which one to use, but
the documentation hasn't stated what the actual implementation is. Does TI
documentation do so?

In V4L2 documentation we should state what is common in the hardware
documentation, and that is mostly limited to "simple" and "advanced". I
really don't know enough to say what the exact implementation of
those two is in every case.

I suggest we leave just a few words about the DPCM compression itself (roughly
the factual content of the first paragraph with the exception of the
reference to analogue signal) and a link to Wikipedia.

> + Applying one of these two processes, short-term redundancy 
> (positive correlation of
> + nearby values) of the signal is eliminated; compression ratios on 
> the order of 2 to 4
> + can be achieved if differences are subsequently entropy coded, 
> because the entropy of
> + the difference signal is much smaller than that of the original 
> discrete signal treated
> + as independent samples.For more information about DPCM see  + 
> url="http://en.wikipedia.org/wiki/Differential_pulse-code_modulation";>Wikipedia.
> + 
> +   
> +   
> + 
> +   
> + 
> +  V4L2_DPCM_PREDICTOR_SIMPLE
> +   Predictor type is simple
> + 
> + 
> +   
> V4L2_DPCM_PREDICTOR_ADVANCED
> +   Predictor type is advanced
> + 
> +   
> + 
> +   
> + 
>   
>
>
> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
> index b6a2ee7

Re: [PATCH V2] block/throttle: Add IO throttled information in blkio.throttle.

2012-09-04 Thread Tejun Heo
Hello, Tao Ma.

On Sat, Sep 01, 2012 at 09:58:43PM +0800, Tao Ma wrote:
> Vivek and I have talked about its usage in my first try. See the thread
> here. https://lkml.org/lkml/2012/5/22/81
> And I am OK to say it again here. In our case, we use flashcache as a
> block device and the bad thing is that flashcache is a bio-based dm
> target and we can't use block io controller here to control the weight
> of different cgroups. So io throttle is chosen. But as io throttle can
> only set a hard upper limit for different instances, it makes the
> control not flexible enough. Say with io controller, if there is no
> requests from the cgroup with weight 1000, a cgroup with 500 can use the
> whole bandwidth of the underlying device. But if we set 1000 iops for
> cgroup A and 500 iops for cgroup B in io throttle, cgroup B can't exceed
> its limit even if cgroup A has no request pending. So if we can export
> the io_queued information out to the system admin, they can write some
> daemon and in the above case, increase the upper limit of cgroup B to
> some number say 1000. It helps us to utilize the device more
> efficiently. Does it make sense to you?

Somewhat, in a pretty twisted way. :P

> > Adding throttle.io_queued could be a bit more consistent?
>
> Sorry, I don't understand what you mean here. Do you mean some code like
>   blkg_rwstat_add(&stats_cpu->throttle.io_queued, rw, 1)?

So, there already is io_dispatched, so if you have io_queued, you can
read the two and calculate the difference from userland (reading
io_queued first would probably be better to avoid triggering the
throttled condition spuriously).  That way, you don't have to worry
about synchronizing stats across cpus and it's a simple addition of a
stat counter.

Thanks.

-- 
tejun
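
The userland calculation suggested above can be sketched as follows. It assumes both counters are cumulative and that io_queued is sampled before io_dispatched, as recommended in the message; struct throttle_sample and throttled_ios() are hypothetical names, and io_queued is the proposed statistic rather than one known to exist.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the two per-cgroup counters discussed above. */
struct throttle_sample {
	uint64_t io_queued;	/* sampled first */
	uint64_t io_dispatched;	/* sampled second */
};

/* Number of IOs still held back by the throttle at sampling time. */
static uint64_t throttled_ios(const struct throttle_sample *s)
{
	/* Dispatched was read later, so it may have overtaken the queued snapshot. */
	if (s->io_dispatched >= s->io_queued)
		return 0;
	return s->io_queued - s->io_dispatched;
}
```

Sampling io_queued first means any race can only under-report the backlog, so a monitoring daemon does not raise a limit on the strength of a spurious non-zero difference.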


Re: [PATCH] i825xx: fix paging fault on znet_probe()

2012-09-04 Thread David Miller
From: Fengguang Wu 
Date: Sun, 2 Sep 2012 15:25:46 +0800

> In znet_probe(), strncmp() may access beyond 0x10 and
> trigger the below oops in kvm.  Fix it by limiting the loop
> under 0x10-8. I suspect the limit could be further decreased
> to 0x10-sizeof(struct netidblk), however no datasheet at hand..
 ...
> Signed-off-by: Fengguang Wu 

This also makes the code actually match the description in the comment
above the loop :-)

Applied, thanks.
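
The kind of bound being fixed here can be sketched in plain C: when scanning a region for an 8-byte signature, the last valid start offset is the region length minus 8, otherwise the comparison reads past the end. The signature string, SIG_LEN and find_signature() below are illustrative assumptions, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIG	"NETIDBLK"	/* assumed 8-byte signature, for illustration */
#define SIG_LEN	8

/* Return the offset of the signature in buf, or -1 if it is absent. */
static long find_signature(const char *buf, size_t len)
{
	size_t i;

	if (len < SIG_LEN)
		return -1;
	/* Stop at len - SIG_LEN so the compare never reads past buf + len. */
	for (i = 0; i <= len - SIG_LEN; i++) {
		if (buf[i] == 'N' && memcmp(buf + i, SIG, SIG_LEN) == 0)
			return (long)i;
	}
	return -1;
}
```

Checking the length before subtracting also avoids unsigned wrap-around when the region is shorter than the signature.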


Re: [PATCH] ACPI: Enable SCI_EMULATE to manually simulate physical hotplug testing.

2012-09-04 Thread Yinghai Lu
On Tue, Sep 4, 2012 at 9:27 AM, Toshi Kani  wrote:
> On Mon, 2012-09-03 at 14:27 -0700, Yinghai Lu wrote:
>> From: Ashok Raj 
>>
>> Emulate an ACPI SCI interrupt to emulate a hot-plug event. Useful
>> for testing ACPI based hot-plug on systems that don't have the
>> necessary firmware support.
>>
>> Enable CONFIG_ACPI_SCI_EMULATE on kernel compile.
>>
>> Now you will notice /sys/kernel/debug/acpi/sci_notify when new kernel is
>> booted.
>>
>> echo "\_SB.PCIB 1" > /sys/kernel/debug/acpi/sci_notify to trigger a hot-add
>> of root bus that is corresponding to PCIB.
>>
> >> echo "\_SB.PCIB 3" > /sys/kernel/debug/acpi/sci_notify to trigger a hot-remove
> >> of root bus that is corresponding to PCIB.
>
> Hi Yinghai,
>
> This feature has been very useful.  Thanks for working on this change.
> I have a few comments below.
>
>
>> -v2: Update to current upstream, and remove not related stuff.
>> -v3: According to Len's request, update it to use debugfs.  - Yinghai Lu
>>
>> Signed-off-by: Yinghai Lu 
>> Cc: Len Brown 
>> Cc: linux-a...@vger.kernel.org
>>
>> ===
>> ---
>>  drivers/acpi/Kconfig   |   10 +++
>>  drivers/acpi/Makefile  |1
>>  drivers/acpi/sci_emu.c |  145 
>> +
>>  3 files changed, 156 insertions(+)
>>
>> Index: linux-2.6/drivers/acpi/Kconfig
>> ===
>> --- linux-2.6.orig/drivers/acpi/Kconfig
>> +++ linux-2.6/drivers/acpi/Kconfig
>> @@ -272,6 +272,16 @@ config ACPI_BLACKLIST_YEAR
>> Enter 0 to disable this mechanism and allow ACPI to
>> run by default no matter what the year.  (default)
>>
>> +config ACPI_SCI_EMULATE
>> +bool "ACPI SCI Event Emulation Support"
>> +depends on DEBUG_FS
>> + default n
>> + help
>> +   This will enable your system to emulate sci hotplug event
>> +   notification through proc file system. For example user needs to
>> +   echo "XXX 0" > /sys/kernel/debug/acpi/sci_notify (where, XXX is
>> +   a target ACPI device object name present under \_SB scope).
>> +
>>  config ACPI_DEBUG
>>   bool "Debug Statements"
>>   default n
>> Index: linux-2.6/drivers/acpi/sci_emu.c
>> ===
>> --- /dev/null
>> +++ linux-2.6/drivers/acpi/sci_emu.c
>> @@ -0,0 +1,145 @@
>> +/*
>> + *  Code to emulate SCI interrupt for Hotplug node insertion/removal
>> + */
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "internal.h"
>> +
>> +#include "acpica/accommon.h"
>> +#include "acpica/acnamesp.h"
>> +#include "acpica/acevents.h"
>> +
>> +#define _COMPONENT   ACPI_SYSTEM_COMPONENT
>> +ACPI_MODULE_NAME("sci_emu");
>> +MODULE_LICENSE("GPL");
>> +
>> +static struct dentry *sci_notify_dentry;
>> +
>> +static void sci_notify_client(char *acpi_name, u32 event)
>> +{
>> + struct acpi_namespace_node *node;
>> + acpi_status status, status1;
>> + acpi_handle hlsb, hsb;
>> + union acpi_operand_object *obj_desc;
>> +
>> + status = acpi_get_handle(NULL, "\\_SB", &hsb);
>> + status1 = acpi_get_handle(hsb, acpi_name, &hlsb);
>
> Why do you obtain hsb for \_SB when acpi_name is supposed to be a full
> path name?  Can you simply specify a NULL like this?
>   status = acpi_get_handle(NULL, acpi_name, &hlsb);

I assume those two main functions are from Ashok.

But I assume that would let the user omit \_SB_?
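
If both behaviours are wanted, one possible approach (only a sketch, with a
hypothetical helper name, and plain C strings instead of the ACPICA types)
is to build the full path up front and then do a single lookup from the
root: names that already start with '\' are taken as absolute, anything
else is placed under \_SB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path to look up from the root. Names starting with '\' are
 * kept as they are; anything else is prefixed with "\_SB.".
 * Returns 0 on success, -1 if out is too small.
 */
static int sci_emu_full_path(const char *name, char *out, size_t outlen)
{
	int n;

	assert(name != NULL && out != NULL);
	if (name[0] == '\\')
		n = snprintf(out, outlen, "%s", name);
	else
		n = snprintf(out, outlen, "\\_SB.%s", name);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With that, "PCIB" and "\_SB.PCIB" would both resolve to the same object, and
the result could be handed to one acpi_get_handle(NULL, ...) call.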

>
>
>> + if (ACPI_FAILURE(status) || ACPI_FAILURE(status1)) {
>> + pr_err(PREFIX
>> + "acpi getting handle to <\\_SB.%s> failed inside notify_client\n",
>> + acpi_name);
>> + return;
>> + }
>> +
>> + status = acpi_ut_acquire_mutex(ACPI_MTX_NAMESPACE);
>> + if (ACPI_FAILURE(status)) {
>> + pr_err(PREFIX "Acquiring acpi namespace mutext failed\n");
>> + return;
>> + }
>> +
>> + node = acpi_ns_validate_handle(hlsb);
>> + if (!node) {
>> + acpi_ut_release_mutex(ACPI_MTX_NAMESPACE);
>> + pr_err(PREFIX "Mapping handle to node failed\n");
>> + return;
>> + }
>> +
>> + /*
>> +  * Check for internal object and make sure there is a handler
>> +  * registered for this object
>> +  */
>> + obj_desc = acpi_ns_get_attached_object(node);
>> + if (obj_desc) {
>> + if (obj_desc->common_notify.notify_list[0]) {
>
> Is the above check necessary?  acpi_ev_queue_notify_request() sets up to
> call the global handler, acpi_gbl_global_notify[0], even if the object
> does not have a local handler registered.

Not sure.

Maybe Len or the other ACPI guys could answer your question.

Thanks

Yinghai Lu

Re: kexec/kdump kernel fails to start

2012-09-04 Thread Flavio Leitner
On Tue, 4 Sep 2012 12:02:00 -0700
Yinghai Lu  wrote:

> On Tue, Sep 4, 2012 at 10:32 AM, Flavio Leitner  wrote:
> > Hi folks,
> >
> > I have system that no longer boots kdump kernel. Basically,
> >
> > # echo c > /proc/sysrq-trigger
> >
> > to dump a vmcore doesn't work. It just hangs after showing the usual
> > panic messages. I've bisected the problem and the commit introducing
> > the issue is the one below.
> >
> > Any idea?
> >
> > commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8
> > Author: WANG Cong   2012-03-05 20:05:13
> > Committer: Ingo Molnar   2012-03-06 05:38:26
> > Parent: 550cf00dbc8ee402bef71628cb71246493dd4500 (Merge tag 
> > 'mmc-fixes-for-3.3' of 
> > git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc)
> > Child:  a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f (x86, tlb: Switch cr3 in 
> > leave_mm() only when needed)
> > Branches: master, remotes/origin/master
> > Follows: v3.3-rc6
> > Precedes: v3.5-rc1
> >
> > x86/mm: Fix the size calculation of mapping tables
> >
> > For machines that enable PSE, the first 2/4M memory region still uses
> > 4K pages, so needs more PTEs in this case, but
> > find_early_table_space() doesn't count this.
> >
> > This patch fixes it.
> >
> > The bug was found via code review, no misbehavior of the kernel
> > was observed.
> 
> maybe just revert the offending commit?

I don't know where the 4K pages were noticed. Here is the
dmesg output passing 'debug':

[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] last_pfn = 0xbf800 max_arch_pfn = 0x4
[0.00] initial memory mapped : 0 - 2000
[0.00] Base memory trampoline at [88098000] 98000 size 20480
[0.00] init_memory_mapping: -bf80
[0.00]  00 - 00bf80 page 2M
[0.00] kernel direct mapping tables up to bf80 @ 1fa0-2000
[0.00] init_memory_mapping: 0001-00044000
[0.00]  01 - 044000 page 2M
[0.00] kernel direct mapping tables up to 44000 @ bdaab000-bf4bd000
[0.00] RAMDISK: 352c8000 - 3695c000

so, it appears that on my system, the pages are 2M.
I will try moving the extra accounting to be inside of CONFIG_X86_32.
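
For reference, a rough sketch of the arithmetic behind that extra
accounting. The helper names are made up, and it assumes 8-byte page table
entries (64-bit or PAE); it is not the code from find_early_table_space().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K   4096ULL
#define PAGE_SIZE_2M   (2ULL * 1024 * 1024)
#define PTE_BYTES      8ULL   /* assumed size of one page table entry */

/* Number of pages of size page_size needed to cover 'range' bytes. */
static uint64_t pages_for(uint64_t range, uint64_t page_size)
{
	assert(page_size != 0);
	return (range + page_size - 1) / page_size;
}

/* Bytes of page-table memory needed for 'entries' entries, page aligned. */
static uint64_t table_bytes(uint64_t entries)
{
	uint64_t bytes = entries * PTE_BYTES;

	return (bytes + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K * PAGE_SIZE_4K;
}
```

Under those assumptions, mapping one 2M region with 4K pages takes 512 PTEs,
which is 4096 bytes, i.e. one additional page of table space.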

fbl


Re: kexec/kdump kernel fails to start

2012-09-04 Thread Yinghai Lu
On Tue, Sep 4, 2012 at 12:17 PM, Flavio Leitner  wrote:
> On Tue, 4 Sep 2012 12:02:00 -0700
> [0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
> [0.00] last_pfn = 0xbf800 max_arch_pfn = 0x4
> [0.00] initial memory mapped : 0 - 2000
> [0.00] Base memory trampoline at [88098000] 98000 size 20480
> [0.00] init_memory_mapping: -bf80
> [0.00]  00 - 00bf80 page 2M
> [0.00] kernel direct mapping tables up to bf80 @ 1fa0-2000
> [0.00] init_memory_mapping: 0001-00044000
> [0.00]  01 - 044000 page 2M
> [0.00] kernel direct mapping tables up to 44000 @ bdaab000-bf4bd000
> [0.00] RAMDISK: 352c8000 - 3695c000
>
BTW, can you please try our new init_memory_mapping clean up at

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-mm

I hope it gets your kdump working.

Thanks

Yinghai


Re: [PATCH v2 4/5] fat: eliminate orphaned inode number allocation

2012-09-04 Thread J. Bruce Fields
On Wed, Sep 05, 2012 at 04:02:13AM +0900, OGAWA Hirofumi wrote:
> "J. Bruce Fields"  writes:
> 
> > On Wed, Sep 05, 2012 at 02:07:40AM +0900, OGAWA Hirofumi wrote:
> >> OGAWA Hirofumi  writes:
> >> 
> >> > Namjae Jeon  writes:
> >> >
> >> >> From: Namjae Jeon 
> >> >>
> >> >> Maintain a list of inode(i_pos) numbers of orphaned inodes (i.e the
> >> >> inodes that have been unlinked but still having open file
> >> >> descriptors).At file/directory creation time, skip using such i_pos
> >> >> values.Removal of the i_pos from the list is done during inode eviction.
> >> >
> >> > What happens if the directory (has busy entries) was completely removed?
> >> >
> >> >
> >> > And Al's point is important for NFS too. If you want stable ino for NFS,
> >> > you never can't change it.
> >> 
> >> s/never can't/never can/
> >
> > If vfat exports aren't fixable, maybe we should just remove that
> > feature?
> >
> > I'm afraid that having unfixable half-working vfat exports is just an
> > attractive nuisance that causes users and developers to waste their
> > time
> 
> In historically, it was introduced by Neil Brown, when nfs export
> interface was rewritten (I'm not sure what was intended).
> 
> Personally, I'm ok to remove it though, it is really personal
> opinion. The state would be rather I don't have strong opinion to
> remove.

Neil, any opinion?

If we can document circumstances under which nfs exports of fat
filesystems are reliable, fine.

Otherwise I'd rather just be clear that we don't support it.

--b.


Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-04 Thread Mikulas Patocka


On Mon, 3 Sep 2012, Kent Overstreet wrote:

> On Mon, Sep 03, 2012 at 04:41:37PM -0400, Mikulas Patocka wrote:
> > ... or another possibility - start a timer when something is put to 
> > current->bio_list and use that timer to pop entries off current->bio_list 
> > and submit them to a workqueue. The timer can be cpu-local so only 
> > interrupt masking is required to synchronize against the timer.
> > 
> > This would normally run just like the current kernel and in case of 
> > deadlock, the timer would kick in and resolve the deadlock.
> 
> Ugh. That's a _terrible_ idea.
> 
> Remember the old plugging code? You ever have to debug performance
> issues caused by it?

Yes, I do remember it (and I fixed one bug that resulted in missed unplug 
and degraded performance).

But currently, deadlocks due to exhausted mempools and bios being stalled
in current->bio_list don't happen (or happen so rarely that they
aren't reported).

If we add a timer, it will turn a deadlock into an i/o delay, but it can't 
make things any worse.

BTW. can these new-style timerless plugs introduce deadlocks too? What 
happens when some bios are indefinitely delayed because their requests are 
held in a plug and a mempool runs out?

> > > I could be convinced, but right now I prefer my solution.
> > 
> > It fixes bio allocation problem, but not other similar mempool problems in 
> > dm and md.
> 
> I looked a bit more, and actually I think the rest of the problem is
> pretty limited in scope - most of those mempool allocations are per
> request, not per split.
> 
> I'm willing to put some time into converting dm/md over to bioset's
> front_pad. I'm having to learn the code for the immutable biovec work,
> anyways.

Currently, dm targets allocate request-specific data from target-specific
mempools. There are mempools in dm-crypt, dm-delay, dm-mirror, dm-snapshot,
dm-thin and dm-verity. You can change them to allocate request-specific data
with the bio.
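
To make the idea concrete, here is a small userland sketch of the front-pad
pattern. The types and helper names are stand-ins invented for this example
(there is no struct bio or bioset here): the per-request data sits directly
in front of the bio in a single allocation, and the owner steps back from
the bio pointer to reach it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel type; illustrative only. */
struct fake_bio {
	unsigned long sector;
};

struct per_io_data {
	int target_id;          /* request-specific data */
	struct fake_bio clone;  /* must be the last member */
};

#define FRONT_PAD offsetof(struct per_io_data, clone)

/* One allocation covers the front pad plus the bio; the bio is handed out. */
static struct fake_bio *alloc_bio_with_pad(void)
{
	char *mem = calloc(1, FRONT_PAD + sizeof(struct fake_bio));

	return mem ? (struct fake_bio *)(mem + FRONT_PAD) : NULL;
}

/* Step back from the bio to the structure that embeds it. */
static struct per_io_data *bio_to_data(struct fake_bio *bio)
{
	assert(bio != NULL);
	return (struct per_io_data *)((char *)bio - FRONT_PAD);
}

static void free_bio_with_pad(struct fake_bio *bio)
{
	if (bio)
		free(bio_to_data(bio));
}
```

Since the request-specific data comes out of the same allocation as the bio,
there is no second pool that can run dry independently.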

Mikulas


Re: [PATCH 1/2] staging/rts_pstor: Use pr_ printks in rtsx.c

2012-09-04 Thread Greg Kroah-Hartman
On Wed, Aug 29, 2012 at 10:29:26AM +0900, Toshiaki Yamane wrote:
> fixed some checkpatch warnings.
> -WARNING: Prefer pr_info(... to printk(KERN_INFO, ...
> -WARNING: Prefer pr_err(... to printk(KERN_ERR, ...

No, please use dev_info() and dev_err() instead wherever possible.

thanks,

greg k-h


Re: [PATCH 2/2] staging/rts_pstor: Use pr_ printks in debug.h

2012-09-04 Thread Greg Kroah-Hartman
On Wed, Aug 29, 2012 at 10:30:15AM +0900, Toshiaki Yamane wrote:
> fixed below checkpatch warnings.
> -WARNING: Prefer pr_debug(... to printk(KERN_DEBUG, ...

No, please use dev_dbg() instead.

thanks,

greg k-h


Re: [PATCH] staging/rts_pstor: remove braces {} in sd.c

2012-09-04 Thread Greg Kroah-Hartman
On Sat, Sep 01, 2012 at 10:43:00PM +0900, Toshiaki Yamane wrote:
> fixed below checkpatch warnings.
> -WARNING: braces {} are not necessary for single statement blocks
> -WARNING: braces {} are not necessary for any arm of this statement
> 
> Signed-off-by: Toshiaki Yamane 
> ---
>  drivers/staging/rts_pstor/sd.c | 1112 
> +---
>  1 file changed, 469 insertions(+), 643 deletions(-)

Why is the object file size changing with this patch applied?  That
implies that something went wrong with your patch. Care to redo it in a
format that I can properly review?

thanks,

greg k-h


Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-04 Thread Vivek Goyal
On Tue, Sep 04, 2012 at 03:26:19PM -0400, Mikulas Patocka wrote:

[..]
> BTW. can these new-style timerless plugs introduce deadlocks too? What 
> happens when some bios are indefinitely delayed because their requests are 
> held in a plug and a mempool runs out?

I think they will not deadlock because these on stack bios/requests are
flushed/dispatched when process schedules out. So if a submitter blocks
on a mempool, it will be scheduled out and requests on plug will be 
dispatched.

Thanks
Vivek


Re: [PATCH v3] hwmon: add Maxim MAX197 support

2012-09-04 Thread Vivien Didelot
Hi,

On Tue, 2012-09-04 at 10:20 -0700, Guenter Roeck wrote:
> > > > > Applied to -next.
> > > > > 
> > > > 
> > > > Hm, now it's probably too late, but for this kind of general
> > > > purpose ADC the IIO framework is in my opinion the better place
> > > > to add support for it.
> > > > 
> > > Possibly. If someone writes an iio driver for it, and everyone
> > > agrees, we can drop it from hwmon at that time.
> > I can write one but unfortunately need some testing as I don't have
> > this hardware.
> 
> Vivien, any thoughts/comments/feedback ? 

I don't see any particular advantage of including it in the IIO
subsystem. However, if everyone agrees that it should be in the IIO
subsystem, and someone writes a driver for it, I might be able to test
it. Given my current schedule, it's likely to take some time though.

Thanks,
Vivien



Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-04 Thread Kent Overstreet
On Tue, Sep 04, 2012 at 11:55:40AM -0700, Tejun Heo wrote:
> Hello, Mikulas, Kent.
> 
> On Mon, Sep 03, 2012 at 08:41:00PM -0700, Kent Overstreet wrote:
> > On Mon, Sep 03, 2012 at 04:41:37PM -0400, Mikulas Patocka wrote:
> > > ... or another possibility - start a timer when something is put to 
> > > current->bio_list and use that timer to pop entries off current->bio_list 
> > > and submit them to a workqueue. The timer can be cpu-local so only 
> > > interrupt masking is required to synchronize against the timer.
> > > 
> > > This would normally run just like the current kernel and in case of 
> > > deadlock, the timer would kick in and resolve the deadlock.
> > 
> > Ugh. That's a _terrible_ idea.
> 
> That's exactly how workqueue rescuers work - rescuers kick in if new
> worker creation doesn't succeed in given amount of time.  The
> suggested mechanism already makes use of workqueue, so it's already
> doing it.  If you can think of a better way to detect the generic
> stall condition, please be my guest.
> 
> > Remember the old plugging code? You ever have to debug performance
> > issues caused by it?
> 
> That is not equivalent.  Plugging was kicking in all the time and it
> wasn't entirely well-defined what the plugging / unplugging conditions
> were.  This type of rescuing for forward-progress guarantee only kicks
> in under severe memory pressure and people expect finite latency and
> throughput hits under such conditions.  The usual bio / request /
> scsi_cmd allocations could be failing under these circumstances and
> things could be progressing only thanks to the finite preallocated
> pools.  I don't think involving rescue timer would be noticeably
> detrimental.
> 
> Actually, if the timer approach can reduce the frequency of rescuer
> involvement, I think it could actually be better.

Ok, that was an overly harsh, emotional response. But I still hate the
idea.

You want to point me at the relevant workqueue code? I'd really like to
see what you did there, it's entirely possible you're aware of some
issue I'm not but if not I'd like to take a stab at it.


Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-04 Thread Kent Overstreet
On Tue, Sep 04, 2012 at 12:01:19PM -0700, Tejun Heo wrote:
> On Tue, Sep 04, 2012 at 11:55:40AM -0700, Tejun Heo wrote:
> > Actually, if the timer approach can reduce the frequency of rescuer
> > involvement, I think it could actually be better.
> 
> Ooh, it wouldn't.  It's kicking in only after alloc failure.  I don't
> know.  I think conditioning it on alloc failure is cleaner and
> converting all per-bio allocations to front-pad makes sense.  Using a
> timer wouldn't make the mechanism any simpler, right?

Exactly


Re: [PATCH 0/3] Fix ACPI BGRT support for images located in EFI boot services memory

2012-09-04 Thread Josh Triplett
On Tue, Sep 04, 2012 at 11:10:54AM -0700, H. Peter Anvin wrote:
> On 09/04/2012 10:59 AM, Josh Triplett wrote:
> >
> >Unfortunately not.  We need enough of ACPI available to go read the
> >BGRT to know what to copy, so we need to defer freeing boot services
> >code until after we initialize ACPI (and thus everything ACPI needs,
> >which includes EFI since ACPI looks for root tables there).
> >
> >>I wouldn't be surprised if some implementations got really cranky if
> >>we accessed boot services data after we installed a new virtual memory
> >>map.
> >
> >Note that I've carefully accessed the boot services data *through* the
> >new virtual memory map, which should work fine.
> >
> 
> There are some platforms which have bugs in this area, so there are
> other reasons to defer freeing up boot memory until as late in the
> boot process as we can possibly get away with.
> 
> free_initmem() is presumably the place that makes most sense.

You're suggesting a call from free_initmem() to
efi_free_boot_services()?  Or, from init_post() right before the call to
free_initmem()?

> This
> is EFI-specific but not x86-specific, let's not commingle those
> concepts, please...

init/main.c already calls the x86-specific efi_enter_virtual_mode
(defined in arch/x86/platform/efi/efi.c), and I split the call to the
x86-specific efi_free_boot_services out of that.  Neither of those
functions exists on non-x86 platforms, and thus I mirrored the #ifdef
currently wrapped around efi_enter_virtual_mode for the new call to
efi_free_boot_services.  While it might make sense for that code to
exist on non-x86 EFI platforms, it currently doesn't.  At best, I could
add static inline stubs to linux/efi.h for those functions to avoid the
ifdefs, but as far as I can tell the same issue applies to quite a few
more functions in efi.h.

Would you like me to add the static inline stubs for the couple of
functions called from init/main.c, or leave the #ifdefs?
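
For clarity, the stub variant would look roughly like this. It is a
self-contained sketch with made-up names (CONFIG_EXAMPLE_EFI and the
example_* functions are placeholders, not the real symbols): the header
provides an empty static inline when the option is off, so the caller needs
no #ifdef.

```c
#include <assert.h>

/* Hypothetical config symbol; leave it undefined to get the stub. */
/* #define CONFIG_EXAMPLE_EFI 1 */

#ifdef CONFIG_EXAMPLE_EFI
void example_free_boot_services(void);   /* real version lives elsewhere */
#else
static inline void example_free_boot_services(void) { }
#endif

static int example_steps_done;

/* The caller stays free of #ifdefs; the stub compiles away when disabled. */
static void example_init_post(void)
{
	example_steps_done++;
	example_free_boot_services();
	example_steps_done++;
	assert(example_steps_done >= 2);
}
```

The trade-off is one stub per function in the header versus one #ifdef per
call site.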

- Josh Triplett


Re: [PATCH] staging/ozwpan: Fix zero address check in oz_set_active_pd

2012-09-04 Thread Greg KH
On Mon, Sep 03, 2012 at 09:54:39PM +0200, Andi Kleen wrote:
> > Its already fixed by this patch :-
> > 
> > http://driverdev.linuxdriverproject.org/pipermail/devel/2012-August/029734.html
> 
> Should be in 3.6 then as it's a bug fix.

I agree, will do.

greg k-h


Re: [PATCH v3 1/4] pinctrl: add samsung pinctrl and gpiolib driver

2012-09-04 Thread Thomas Abraham
On 3 September 2012 16:44, Linus Walleij  wrote:
> On Thu, Aug 23, 2012 at 1:15 PM, Thomas Abraham
>  wrote:
>
>> Add a new device tree enabled pinctrl and gpiolib driver for Samsung
>> SoC's. This driver provides a common and extensible framework for all
>> Samsung SoC's to interface with the pinctrl and gpiolib subsystems. This
>> driver supports only device tree based instantiation and hence can be
>> used only on those Samsung platforms that have device tree enabled.
>>
>> This driver is split into two parts: the pinctrl interface and the gpiolib
>> interface. The pinctrl interface registers pinctrl devices with the pinctrl
>> subsystem and gpiolib interface registers gpio chips with the gpiolib
>> subsystem. The information about the pins, pin groups, pin functions and
>> gpio chips, which are SoC specific, are parsed from device tree node.
>>
>> Cc: Linus Walleij 
>> Cc: Kukjin Kim 
>> Signed-off-by: Thomas Abraham 
>
> Looks good to me, I saw Stephen had some minor comments and
> I expect that you probably fix them before applying to the Samsung
> tree so:
> Reviewed-by: Linus Walleij 
>
> Feel free to push this through ARM SoC, I guess that's the plan?

Hi Linus,

Thanks for reviewing the Samsung pinctrl driver patches. I will make the
changes that Stephen has listed and resubmit. I will ask the Samsung
maintainer to consider the pinctrl driver for 3.7.

Thanks,
Thomas.


Re: linux-next: build failure after merge of the final tree (net-next tree related)

2012-09-04 Thread Jerry Chu
On Tue, Sep 4, 2012 at 11:20 AM, David Miller  wrote:
> From: Stephen Rothwell 
> Date: Tue, 4 Sep 2012 16:58:53 +1000
>
>> net/built-in.o: In function `tcp_fastopen_ctx_free':
>> tcp_fastopen.c:(.text+0x5cc5c): undefined reference to `crypto_destroy_tfm'
>> net/built-in.o: In function `tcp_fastopen_reset_cipher':
>> (.text+0x5): undefined reference to `crypto_alloc_base'
>> net/built-in.o: In function `tcp_fastopen_reset_cipher':
>> (.text+0x5cd6c): undefined reference to `crypto_destroy_tfm'
>>
>> Presumably caused by commit 104671636897 ("tcp: TCP Fast Open Server -
>> header & support functions") from the net-next tree.  I assume that some
>> dependency on the CRYPTO infrastructure is missing.
>
> Thanks for the report, I've pushed the following change to net-next
> which should address this:
>
> 
> [PATCH] net: Add INET dependency on aes crypto for the sake of TCP fastopen.
>
> Stephen Rothwell says:
>
> 
> After merging the final tree, today's linux-next build (powerpc
> ppc44x_defconfig) failed like this:
>
> net/built-in.o: In function `tcp_fastopen_ctx_free':
> tcp_fastopen.c:(.text+0x5cc5c): undefined reference to `crypto_destroy_tfm'
> net/built-in.o: In function `tcp_fastopen_reset_cipher':
> (.text+0x5): undefined reference to `crypto_alloc_base'
> net/built-in.o: In function `tcp_fastopen_reset_cipher':
> (.text+0x5cd6c): undefined reference to `crypto_destroy_tfm'
>
> Presumably caused by commit 104671636897 ("tcp: TCP Fast Open Server -
> header & support functions") from the net-next tree.  I assume that some
> dependency on the CRYPTO infrastructure is missing.
>
> I have reverted commit 1bed966cc3bd ("Merge branch
> 'tcp_fastopen_server'") for today.
> 
>
> Reported-by: Stephen Rothwell 
> Signed-off-by: David S. Miller 
> ---
>  net/Kconfig |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/net/Kconfig b/net/Kconfig
> index 245831b..30b48f5 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -52,6 +52,8 @@ source "net/iucv/Kconfig"
>
>  config INET
> bool "TCP/IP networking"
> +   select CRYPTO
> +   select CRYPTO_AES
> ---help---
>   These are the protocols used on the Internet and on most local
>   Ethernets. It is highly recommended to say Y here (this will enlarge
> --
> 1.7.7.6
>

Thanks for fixing this, David. (Sorry for missing the dependency.)

Jerry


[PATCH] dm: Use bioset's front_pad for dm_target_io

2012-09-04 Thread Kent Overstreet
On Tue, Sep 04, 2012 at 03:26:19PM -0400, Mikulas Patocka wrote:
> 
> 
> On Mon, 3 Sep 2012, Kent Overstreet wrote:
> 
> > On Mon, Sep 03, 2012 at 04:41:37PM -0400, Mikulas Patocka wrote:
> > > ... or another possibility - start a timer when something is put to 
> > > current->bio_list and use that timer to pop entries off current->bio_list 
> > > and submit them to a workqueue. The timer can be cpu-local so only 
> > > interrupt masking is required to synchronize against the timer.
> > > 
> > > This would normally run just like the current kernel and in case of 
> > > deadlock, the timer would kick in and resolve the deadlock.
> > 
> > Ugh. That's a _terrible_ idea.
> > 
> > Remember the old plugging code? You ever have to debug performance
> > issues caused by it?
> 
> Yes, I do remember it (and I fixed one bug that resulted in missed unplug 
> and degraded performance).
> 
> But currently, deadlocks due to exhausted mempools and bios being stalled
> in current->bio_list don't happen (or happen so rarely that they
> aren't reported).
> 
> If we add a timer, it will turn a deadlock into an i/o delay, but it can't 
> make things any worse.

This is all true. I'm not arguing your solution wouldn't _work_... I'd
try and give some real reasoning for my objections but it'll take me
a while to figure out how to coherently explain it and I'm very sleep
deprived.

> Currently, dm targets allocate request-specific data from target-specific 
> mempool. mempools are in dm-crypt, dm-delay, dm-mirror, dm-snapshot, 
> dm-thin, dm-verity. You can change it to allocate request-specific data 
> with the bio.

I wrote a patch for dm_target_io last night. I think I know an easy way
to go about converting the rest but it'll probably have to wait until
I'm further along with my immutable bvec stuff.

Completely untested patch below:


commit 8754349145edfc791450d3ad54c19f0f3715c86c
Author: Kent Overstreet 
Date:   Tue Sep 4 06:17:56 2012 -0700

dm: Use bioset's front_pad for dm_target_io

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f2eb730..3cf39b0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -71,6 +71,7 @@ struct dm_target_io {
struct dm_io *io;
struct dm_target *ti;
union map_info info;
+   struct bio clone;
 };
 
 /*
@@ -174,7 +175,7 @@ struct mapped_device {
 * io objects are allocated from here.
 */
mempool_t *io_pool;
-   mempool_t *tio_pool;
+   mempool_t *rq_tio_pool;
 
struct bio_set *bs;
 
@@ -214,15 +215,8 @@ struct dm_md_mempools {
 
 #define MIN_IOS 256
 static struct kmem_cache *_io_cache;
-static struct kmem_cache *_tio_cache;
 static struct kmem_cache *_rq_tio_cache;
 
-/*
- * Unused now, and needs to be deleted. But since io_pool is overloaded and it's
- * still used for _io_cache, I'm leaving this for a later cleanup
- */
-static struct kmem_cache *_rq_bio_info_cache;
-
 static int __init local_init(void)
 {
int r = -ENOMEM;
@@ -232,22 +226,13 @@ static int __init local_init(void)
if (!_io_cache)
return r;
 
-   /* allocate a slab for the target ios */
-   _tio_cache = KMEM_CACHE(dm_target_io, 0);
-   if (!_tio_cache)
-   goto out_free_io_cache;
-
_rq_tio_cache = KMEM_CACHE(dm_rq_target_io, 0);
if (!_rq_tio_cache)
-   goto out_free_tio_cache;
-
-   _rq_bio_info_cache = KMEM_CACHE(dm_rq_clone_bio_info, 0);
-   if (!_rq_bio_info_cache)
-   goto out_free_rq_tio_cache;
+   goto out_free_io_cache;
 
r = dm_uevent_init();
if (r)
-   goto out_free_rq_bio_info_cache;
+   goto out_free_rq_tio_cache;
 
_major = major;
r = register_blkdev(_major, _name);
@@ -261,12 +246,8 @@ static int __init local_init(void)
 
 out_uevent_exit:
dm_uevent_exit();
-out_free_rq_bio_info_cache:
-   kmem_cache_destroy(_rq_bio_info_cache);
 out_free_rq_tio_cache:
kmem_cache_destroy(_rq_tio_cache);
-out_free_tio_cache:
-   kmem_cache_destroy(_tio_cache);
 out_free_io_cache:
kmem_cache_destroy(_io_cache);
 
@@ -275,9 +256,7 @@ out_free_io_cache:
 
 static void local_exit(void)
 {
-   kmem_cache_destroy(_rq_bio_info_cache);
kmem_cache_destroy(_rq_tio_cache);
-   kmem_cache_destroy(_tio_cache);
kmem_cache_destroy(_io_cache);
unregister_blkdev(_major, _name);
dm_uevent_exit();
@@ -461,20 +440,15 @@ static void free_io(struct mapped_device *md, struct dm_io *io)
mempool_free(io, md->io_pool);
 }
 
-static void free_tio(struct mapped_device *md, struct dm_target_io *tio)
-{
-   mempool_free(tio, md->tio_pool);
-}
-
 static struct dm_rq_target_io *alloc_rq_tio(struct mapped_device *md,
gfp_t gfp_mask)
 {
-   return mempool_alloc(md->tio_pool, gfp_mask);
+   return mempool_alloc(md->rq_tio_pool, gfp_mask);
 }
 
 static void free_rq_tio(struct dm_r

Re: [PATCH] net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries

2012-09-04 Thread David Miller
From: Masatake YAMATO 
Date: Thu, 30 Aug 2012 05:44:29 +0900

> lsof reports some of socket descriptors as "can't identify protocol" like:
> 
> [yamato@localhost]/tmp% sudo lsof | grep dbus | grep iden
> dbus-daem   652  dbus    6u sock ... 17812 can't identify protocol
> dbus-daem   652  dbus   34u sock ... 24689 can't identify protocol
> dbus-daem   652  dbus   42u sock ... 24739 can't identify protocol
> dbus-daem   652  dbus   48u sock ... 22329 can't identify protocol
> ...
> 
> lsof cannot resolve the protocol used in a socket because procfs
> doesn't provide the map between inode number on sockfs and protocol
> type of the socket.
> 
> For improving the situation this patch adds an extended attribute named
> 'system.sockprotoname' in which the protocol name for
> /proc/PID/fd/SOCKET is stored. So lsof can know the protocol for a
> given /proc/PID/fd/SOCKET with getxattr system call.
> 
> A few weeks ago I submitted a patch for the same purpose. That patch
> introduced /proc/net/sockfs, which enumerates inodes and protocols
> of all sockets alive on a system. However, it was rejected because (1)
> a global lock was needed, and (2) the layout of struct socket was
> changed with the patch.
> 
> This patch doesn't use any global lock; and doesn't change the layout
> of any structs.
> 
> In this patch, a protocol name is stored in dentry->d_name of sockfs
> when a new socket is associated with a file descriptor. Before this
> patch dentry->d_name was not used; it was just filled with an empty
> string. lsof may use an extended attribute named
> 'system.sockprotoname' to retrieve the value of dentry->d_name.
> 
> It would be nice if we could see the protocol name with ls -l
> /proc/PID/fd. However, "socket:[#INODE]", the name format returned
> from sockfs_dname(), was already defined. To keep compatibility
> between kernel and user land, the extended attribute is used to
> expose the value of dentry->d_name.
> 
> Signed-off-by: Masatake YAMATO 

This looks a lot more reasonable than your previous attempt.

Applied to net-next, thanks a lot.
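
For anyone who wants to try the attribute described in the quoted patch, here is a minimal user-space sketch of how a tool such as lsof might query it. It assumes a Linux system with /proc mounted and a kernel that carries this patch. The helper name sock_proto_name() is made up for illustration; the only real interface used is getxattr(2).

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Hypothetical helper (not part of the patch): read the
 * "system.sockprotoname" attribute of /proc/self/fd/<fd> into buf and
 * NUL-terminate it.  Returns the number of bytes reported by the
 * kernel, or -1 on any failure (bad descriptor, not a socket, kernel
 * without the attribute, zero-length buffer).
 */
ssize_t sock_proto_name(int fd, char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    if (len == 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);

    /* Leave one byte free so the result can always be terminated. */
    n = getxattr(path, "system.sockprotoname", buf, len - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

On a descriptor that is not a socket, or on a kernel without the patch, getxattr() is expected to fail and the helper simply returns -1, so a caller can fall back to its existing "can't identify protocol" path.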


Re: kexec/kdump kernel fails to start

2012-09-04 Thread Flavio Leitner
On Tue, 4 Sep 2012 12:20:14 -0700
Yinghai Lu  wrote:

> On Tue, Sep 4, 2012 at 12:17 PM, Flavio Leitner  wrote:
> > On Tue, 4 Sep 2012 12:02:00 -0700
> > [0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 
> > 0x7010600070106
> > [0.00] last_pfn = 0xbf800 max_arch_pfn = 0x4
> > [0.00] initial memory mapped : 0 - 2000
> > [0.00] Base memory trampoline at [88098000] 98000 size 20480
> > [0.00] init_memory_mapping: -bf80
> > [0.00]  00 - 00bf80 page 2M
> > [0.00] kernel direct mapping tables up to bf80 @ 
> > 1fa0-2000
> > [0.00] init_memory_mapping: 0001-00044000
> > [0.00]  01 - 044000 page 2M
> > [0.00] kernel direct mapping tables up to 44000 @ 
> > bdaab000-bf4bd000
> > [0.00] RAMDISK: 352c8000 - 3695c000
> >

Alright, moving the extra accounting to be inside of CONFIG_X86_32 works out.

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..63e6a5c 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -60,10 +60,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
 #ifdef CONFIG_X86_32
extra += PMD_SIZE;
-#endif
/* The first 2/4M doesn't use large pages. */
if (mr->start < PMD_SIZE)
extra += mr->end - mr->start;
+#endif
 
ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
} else
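
To make the effect of moving the #endif easier to see, here is a small user-space model of the arithmetic in the branch shown in the diff. It is only a sketch: the TOY_* constants mirror the usual x86 values (4K pages, 2M PMD) as an assumption, tail_ptes() is an invented name, and the is_x86_32 flag stands in for the CONFIG_X86_32 preprocessor block.

```c
#include <stdint.h>

/* Assumed values, mirroring PAGE_SHIFT/PMD_SHIFT on x86 with 2M pages. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (UINT64_C(1) << TOY_PAGE_SHIFT)
#define TOY_PMD_SHIFT  21
#define TOY_PMD_SIZE   (UINT64_C(1) << TOY_PMD_SHIFT)

/*
 * Model of the PTE estimate for the range [start, end) when large
 * pages are in use.  Only the tail beyond the last 2M boundary needs
 * 4K PTEs; the extra accounting is applied only when is_x86_32 is set,
 * which is what the patch above does by moving the #endif.
 */
uint64_t tail_ptes(uint64_t start, uint64_t end, int is_x86_32)
{
    uint64_t extra = end - ((end >> TOY_PMD_SHIFT) << TOY_PMD_SHIFT);

    if (is_x86_32) {
        extra += TOY_PMD_SIZE;
        /* The first 2/4M doesn't use large pages. */
        if (start < TOY_PMD_SIZE)
            extra += end - start;
    }

    return (extra + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
}
```

With the flag clear, a range ending on a 2M boundary needs no 4K PTEs at all in this model, whereas the 32-bit branch adds a full PMD plus the whole range when it starts below 2M.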

> BTW, can you please try our new init_memory_mapping clean up at
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-mm
> 
> hope it could make your kdump work.

I will give it a try.
fbl



[PATCH v2 0/3] promote zcache from staging

2012-09-04 Thread Seth Jennings
zcache is the remaining piece of code required to support in-kernel
memory compression.  The other two features, cleancache and frontswap,
have been promoted to mainline in 3.0 and 3.5 respectively.  This
patchset promotes zcache from the staging tree to mainline.

Based on the level of activity and contributions we're seeing from a
diverse set of people and interests, I think zcache has matured to the
point where it makes sense to promote this out of staging.

Overview

zcache is a backend to frontswap and cleancache that accepts pages from
those mechanisms and compresses them, leading to reduced I/O caused by
swap and file re-reads.  This is very valuable in shared storage situations
to reduce load on things like SANs.  Also, in the case of slow backing/swap
devices, zcache can also yield a performance gain.

In-Kernel Memory Compression Overview:

     swap subsystem            page cache
            +                      +
        frontswap              cleancache
            +                      +
  zcache frontswap glue  zcache cleancache glue
            +                      +
            +---------+------------+
                      +
              zcache/tmem core
                      +
            +---------+------------+
            +                      +
         zsmalloc                 zbud

Everything below the frontswap/cleancache layer is currently inside the
zcache driver, except for zsmalloc, which is shared between zcache and
another memory compression driver, zram.

Since zcache is dependent on zsmalloc, it is also being promoted by this
patchset.

For information on zsmalloc and the rationale behind its design and use
cases versus already existing allocators in the kernel:

https://lkml.org/lkml/2012/1/9/386

zsmalloc is the allocator used by zcache to store persistent pages that
come from frontswap, as opposed to zbud which is the (internal) allocator
used for ephemeral pages from cleancache.
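
The Kconfig help text in patch 1/3 notes that zsmalloc returns a handle rather than a pointer, and that the handle must be mapped before the memory can be touched. The toy user-space sketch below illustrates only that calling pattern. Every name in it (toy_pool, toy_alloc, toy_map, toy_free) is invented for this example and none of it is the zsmalloc API; it uses fixed-size slots and ignores size classes, zspages and real unmapping.

```c
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS     8
#define TOY_SLOT_SIZE 64

/* A trivially small pool of fixed-size slots. */
struct toy_pool {
    unsigned char data[TOY_SLOTS][TOY_SLOT_SIZE];
    unsigned char used[TOY_SLOTS];
};

void toy_pool_init(struct toy_pool *p)
{
    memset(p, 0, sizeof(*p));
}

/* Returns an opaque handle (slot index + 1), or 0 on failure. */
unsigned long toy_alloc(struct toy_pool *p, size_t size)
{
    size_t i;

    if (size > TOY_SLOT_SIZE)
        return 0;
    for (i = 0; i < TOY_SLOTS; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            return i + 1;
        }
    }
    return 0;
}

/* Translate a handle into a usable pointer; NULL if the handle is stale. */
void *toy_map(struct toy_pool *p, unsigned long handle)
{
    if (handle == 0 || handle > TOY_SLOTS || !p->used[handle - 1])
        return NULL;
    return p->data[handle - 1];
}

/* Release the slot behind a handle; unknown handles are ignored. */
void toy_free(struct toy_pool *p, unsigned long handle)
{
    if (handle != 0 && handle <= TOY_SLOTS)
        p->used[handle - 1] = 0;
}
```

The point of the indirection is that the caller never holds a long-lived pointer, which leaves the allocator free to decide how the backing memory is laid out.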

zsmalloc uses many fields of the page struct to create its conceptual
high-order page called a zspage.  Exactly which fields are used and for
what purpose is documented at the top of the zsmalloc .c file.  Because
zsmalloc uses struct page extensively, Andrew advised that the
promotion location be mm/:

https://lkml.org/lkml/2012/1/20/308

Some benchmarking numbers demonstrating the I/O saving that can be had
with zcache:

https://lkml.org/lkml/2012/3/22/383

Dan's presentation at LSF/MM this year on zcache:

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/LSFMM12-zcache-final.pdf

There was a recent thread about cleancache memory corruption that is
resolved by this patch that should be making it into linux-next via
Greg very soon:

https://lkml.org/lkml/2012/8/29/253

Changelog:
v2:
* rebased to next-20120904
* removed already accepted patch from patchset

Seth Jennings (3):
  zsmalloc: promote to mm/
  drivers: add memory management driver class
  zcache: promote to drivers/mm/

 drivers/Kconfig|2 ++
 drivers/Makefile   |1 +
 drivers/mm/Kconfig |   13 +
 drivers/mm/Makefile|1 +
 drivers/{staging => mm}/zcache/Makefile|0
 drivers/{staging => mm}/zcache/tmem.c  |0
 drivers/{staging => mm}/zcache/tmem.h  |0
 drivers/{staging => mm}/zcache/zcache-main.c   |4 ++--
 drivers/staging/Kconfig|4 
 drivers/staging/Makefile   |2 --
 drivers/staging/zcache/Kconfig |   11 ---
 drivers/staging/zram/zram_drv.h|3 +--
 drivers/staging/zsmalloc/Kconfig   |   10 --
 drivers/staging/zsmalloc/Makefile  |3 ---
 .../staging/zsmalloc => include/linux}/zsmalloc.h  |0
 mm/Kconfig |   18 ++
 mm/Makefile|1 +
 .../zsmalloc/zsmalloc-main.c => mm/zsmalloc.c  |3 +--
 18 files changed, 40 insertions(+), 36 deletions(-)
 create mode 100644 drivers/mm/Kconfig
 create mode 100644 drivers/mm/Makefile
 rename drivers/{staging => mm}/zcache/Makefile (100%)
 rename drivers/{staging => mm}/zcache/tmem.c (100%)
 rename drivers/{staging => mm}/zcache/tmem.h (100%)
 rename drivers/{staging => mm}/zcache/zcache-main.c (99%)
 delete mode 100644 drivers/staging/zcache/Kconfig
 delete mode 100644 drivers/staging/zsmalloc/Kconfig
 delete mode 100644 drivers/staging/zsmalloc/Makefile
 rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
 rename drivers/staging/zsmalloc/zsmalloc-main.c => mm/zsmalloc.c (99%)

-- 
1.7.9.5


[PATCH v2 1/3] zsmalloc: promote to mm/

2012-09-04 Thread Seth Jennings
This patch promotes the slab-based zsmalloc memory allocator
from the staging tree to mm/

zcache depends on this allocator for storing compressed RAM pages
in an efficient way under system wide memory pressure where
high-order (greater than 0) page allocations are very likely to
fail.

For more information on zsmalloc and its internals, read the
documentation at the top of the zsmalloc c file.

Signed-off-by: Seth Jennings 
---
 drivers/staging/Kconfig|2 --
 drivers/staging/Makefile   |1 -
 drivers/staging/zcache/zcache-main.c   |4 ++--
 drivers/staging/zram/zram_drv.h|3 +--
 drivers/staging/zsmalloc/Kconfig   |   10 --
 drivers/staging/zsmalloc/Makefile  |3 ---
 .../staging/zsmalloc => include/linux}/zsmalloc.h  |0
 mm/Kconfig |   18 ++
 mm/Makefile|1 +
 .../zsmalloc/zsmalloc-main.c => mm/zsmalloc.c  |3 +--
 10 files changed, 23 insertions(+), 22 deletions(-)
 delete mode 100644 drivers/staging/zsmalloc/Kconfig
 delete mode 100644 drivers/staging/zsmalloc/Makefile
 rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
 rename drivers/staging/zsmalloc/zsmalloc-main.c => mm/zsmalloc.c (99%)

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index e3402d5..b7f7bc7 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -78,8 +78,6 @@ source "drivers/staging/zram/Kconfig"
 
 source "drivers/staging/zcache/Kconfig"
 
-source "drivers/staging/zsmalloc/Kconfig"
-
 source "drivers/staging/wlags49_h2/Kconfig"
 
 source "drivers/staging/wlags49_h25/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 3be59d0..ad74bee 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -34,7 +34,6 @@ obj-$(CONFIG_DX_SEP)+= sep/
 obj-$(CONFIG_IIO)  += iio/
 obj-$(CONFIG_ZRAM) += zram/
 obj-$(CONFIG_ZCACHE)   += zcache/
-obj-$(CONFIG_ZSMALLOC) += zsmalloc/
 obj-$(CONFIG_WLAGS49_H2)   += wlags49_h2/
 obj-$(CONFIG_WLAGS49_H25)  += wlags49_h25/
 obj-$(CONFIG_FB_SM7XX) += sm7xxfb/
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index 52b43b7..34b2c5c 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -32,9 +32,9 @@
 #include 
 #include 
 #include 
-#include "tmem.h"
+#include <linux/zsmalloc.h>
 
-#include "../zsmalloc/zsmalloc.h"
+#include "tmem.h"
 
 #ifdef CONFIG_CLEANCACHE
 #include 
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 572c0b1..f6d0925 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -17,8 +17,7 @@
 
 #include 
 #include 
-
-#include "../zsmalloc/zsmalloc.h"
+#include <linux/zsmalloc.h>
 
 /*
  * Some arbitrary value. This is just to catch
diff --git a/drivers/staging/zsmalloc/Kconfig b/drivers/staging/zsmalloc/Kconfig
deleted file mode 100644
index 9084565..000
--- a/drivers/staging/zsmalloc/Kconfig
+++ /dev/null
@@ -1,10 +0,0 @@
-config ZSMALLOC
-   tristate "Memory allocator for compressed pages"
-   default n
-   help
- zsmalloc is a slab-based memory allocator designed to store
- compressed RAM pages.  zsmalloc uses virtual memory mapping
- in order to reduce fragmentation.  However, this results in a
- non-standard allocator interface where a handle, not a pointer, is
- returned by an alloc().  This handle must be mapped in order to
- access the allocated space.
diff --git a/drivers/staging/zsmalloc/Makefile b/drivers/staging/zsmalloc/Makefile
deleted file mode 100644
index b134848..000
--- a/drivers/staging/zsmalloc/Makefile
+++ /dev/null
@@ -1,3 +0,0 @@
-zsmalloc-y := zsmalloc-main.o
-
-obj-$(CONFIG_ZSMALLOC) += zsmalloc.o
diff --git a/drivers/staging/zsmalloc/zsmalloc.h b/include/linux/zsmalloc.h
similarity index 100%
rename from drivers/staging/zsmalloc/zsmalloc.h
rename to include/linux/zsmalloc.h
diff --git a/mm/Kconfig b/mm/Kconfig
index d5c8019..2586b66 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -411,3 +411,21 @@ config FRONTSWAP
  and swap data is stored as normal on the matching swap device.
 
  If unsure, say Y to enable frontswap.
+
+config ZSMALLOC
+   tristate "Memory allocator for compressed pages"
+   default n
+   help
+ zsmalloc is a slab-based memory allocator designed to store
+ compressed RAM pages.  zsmalloc uses a memory pool that combines
+ single pages into higher order pages by linking them together
+ using the fields of the struct page. Allocations are then
+ mapped through copy buffers or VM mapping, in order to reduce
+ memory pool fragmentation and increase allocation success rate under
+ memory p

[PATCH v2 2/3] drivers: add memory management driver class

2012-09-04 Thread Seth Jennings
This patch creates a new driver class under drivers/ for
memory management related drivers, like zcache.

This driver class would be for drivers that don't actually enable
a hardware device, but rather augment the memory manager in some
way.

In-tree candidates for this driver class are zcache, zram, and
lowmemorykiller, all in staging.

Signed-off-by: Seth Jennings 
---
 drivers/Kconfig|2 ++
 drivers/Makefile   |1 +
 drivers/mm/Kconfig |3 +++
 3 files changed, 6 insertions(+)
 create mode 100644 drivers/mm/Kconfig

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 324e958..d126132 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -154,4 +154,6 @@ source "drivers/vme/Kconfig"
 
 source "drivers/pwm/Kconfig"
 
+source "drivers/mm/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index d64a0f7..aa69e1c 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -140,3 +140,4 @@ obj-$(CONFIG_EXTCON)+= extcon/
 obj-$(CONFIG_MEMORY)   += memory/
 obj-$(CONFIG_IIO)  += iio/
 obj-$(CONFIG_VME_BUS)  += vme/
+obj-$(CONFIG_MM_DRIVERS)   += mm/
diff --git a/drivers/mm/Kconfig b/drivers/mm/Kconfig
new file mode 100644
index 000..e5b3743
--- /dev/null
+++ b/drivers/mm/Kconfig
@@ -0,0 +1,3 @@
+menu "Memory management drivers"
+
+endmenu
-- 
1.7.9.5



[PATCH v2 3/3] zcache: promote to drivers/mm/

2012-09-04 Thread Seth Jennings
This patch promotes the zcache driver from staging to drivers/mm/.

zcache captures swap pages via frontswap and pages that fall
out of the page cache via cleancache and compresses them in RAM,
providing a compressed RAM swap and a compressed second-chance
page cache.

Signed-off-by: Seth Jennings 
---
 drivers/mm/Kconfig   |   10 ++
 drivers/mm/Makefile  |1 +
 drivers/{staging => mm}/zcache/Makefile  |0
 drivers/{staging => mm}/zcache/tmem.c|0
 drivers/{staging => mm}/zcache/tmem.h|0
 drivers/{staging => mm}/zcache/zcache-main.c |0
 drivers/staging/Kconfig  |2 --
 drivers/staging/Makefile |1 -
 drivers/staging/zcache/Kconfig   |   11 ---
 9 files changed, 11 insertions(+), 14 deletions(-)
 create mode 100644 drivers/mm/Makefile
 rename drivers/{staging => mm}/zcache/Makefile (100%)
 rename drivers/{staging => mm}/zcache/tmem.c (100%)
 rename drivers/{staging => mm}/zcache/tmem.h (100%)
 rename drivers/{staging => mm}/zcache/zcache-main.c (100%)
 delete mode 100644 drivers/staging/zcache/Kconfig

diff --git a/drivers/mm/Kconfig b/drivers/mm/Kconfig
index e5b3743..22289c6 100644
--- a/drivers/mm/Kconfig
+++ b/drivers/mm/Kconfig
@@ -1,3 +1,13 @@
 menu "Memory management drivers"
 
+config ZCACHE
+   bool "Dynamic compression of swap pages and clean pagecache pages"
+   depends on (CLEANCACHE || FRONTSWAP) && CRYPTO=y && ZSMALLOC=y
+   select CRYPTO_LZO
+   default n
+   help
+ Zcache uses compression and an in-kernel implementation of
+ transcendent memory to store clean page cache pages and swap
+ in RAM, providing a noticeable reduction in disk I/O.
+
 endmenu
diff --git a/drivers/mm/Makefile b/drivers/mm/Makefile
new file mode 100644
index 000..f36f509
--- /dev/null
+++ b/drivers/mm/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_ZCACHE)   += zcache/
diff --git a/drivers/staging/zcache/Makefile b/drivers/mm/zcache/Makefile
similarity index 100%
rename from drivers/staging/zcache/Makefile
rename to drivers/mm/zcache/Makefile
diff --git a/drivers/staging/zcache/tmem.c b/drivers/mm/zcache/tmem.c
similarity index 100%
rename from drivers/staging/zcache/tmem.c
rename to drivers/mm/zcache/tmem.c
diff --git a/drivers/staging/zcache/tmem.h b/drivers/mm/zcache/tmem.h
similarity index 100%
rename from drivers/staging/zcache/tmem.h
rename to drivers/mm/zcache/tmem.h
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/mm/zcache/zcache-main.c
similarity index 100%
rename from drivers/staging/zcache/zcache-main.c
rename to drivers/mm/zcache/zcache-main.c
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index b7f7bc7..0940d2e 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -76,8 +76,6 @@ source "drivers/staging/iio/Kconfig"
 
 source "drivers/staging/zram/Kconfig"
 
-source "drivers/staging/zcache/Kconfig"
-
 source "drivers/staging/wlags49_h2/Kconfig"
 
 source "drivers/staging/wlags49_h25/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index ad74bee..6e1c491 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -33,7 +33,6 @@ obj-$(CONFIG_IPACK_BUS)   += ipack/
 obj-$(CONFIG_DX_SEP)+= sep/
 obj-$(CONFIG_IIO)  += iio/
 obj-$(CONFIG_ZRAM) += zram/
-obj-$(CONFIG_ZCACHE)   += zcache/
 obj-$(CONFIG_WLAGS49_H2)   += wlags49_h2/
 obj-$(CONFIG_WLAGS49_H25)  += wlags49_h25/
 obj-$(CONFIG_FB_SM7XX) += sm7xxfb/
diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
deleted file mode 100644
index 4881839..000
--- a/drivers/staging/zcache/Kconfig
+++ /dev/null
@@ -1,11 +0,0 @@
-config ZCACHE
-   bool "Dynamic compression of swap pages and clean pagecache pages"
-   depends on (CLEANCACHE || FRONTSWAP) && CRYPTO=y && ZSMALLOC=y
-   select CRYPTO_LZO
-   default n
-   help
- Zcache doubles RAM efficiency while providing a significant
- performance boosts on many workloads.  Zcache uses
- compression and an in-kernel implementation of transcendent
- memory to store clean page cache pages and swap in RAM,
- providing a noticeable reduction in disk I/O.
-- 
1.7.9.5



Re: [PATCH v2 0/3] promote zcache from staging

2012-09-04 Thread Konrad Rzeszutek Wilk
On Tue, Sep 04, 2012 at 03:02:46PM -0500, Seth Jennings wrote:
> zcache is the remaining piece of code required to support in-kernel
> memory compression.  The other two features, cleancache and frontswap,
> have been promoted to mainline in 3.0 and 3.5 respectively.  This
> patchset promotes zcache from the staging tree to mainline.

Could you please post it as a single patch, as if it were out-of-tree?
That way it will be much easier to review it by looking at the full code.


Re: [PATCH v2 0/3] promote zcache from staging

2012-09-04 Thread Andrew Morton
On Tue, 4 Sep 2012 15:57:11 -0400
Konrad Rzeszutek Wilk  wrote:

> On Tue, Sep 04, 2012 at 03:02:46PM -0500, Seth Jennings wrote:
> > zcache is the remaining piece of code required to support in-kernel
> > memory compression.  The other two features, cleancache and frontswap,
> > have been promoted to mainline in 3.0 and 3.5 respectively.  This
> > patchset promotes zcache from the staging tree to mainline.
> 
> Could you please post it as a single patch, as if it were out-of-tree?
> That way it will be much easier to review it by looking at the full code.

Yes please.  Very few of the MM developers are familiar with this code.


Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support

2012-09-04 Thread Nicholas A. Bellinger
On Tue, 2012-09-04 at 08:46 +0200, Paolo Bonzini wrote:
> Il 04/09/2012 04:21, Nicholas A. Bellinger ha scritto:
> >> @@ -112,6 +118,9 @@ static void virtscsi_complete_cmd(struct virtio_scsi 
> >> *vscsi, void *buf)
> >>struct virtio_scsi_cmd *cmd = buf;
> >>struct scsi_cmnd *sc = cmd->sc;
> >>struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
> >> +  struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >> +
> >> +  atomic_dec(&tgt->reqs);
> >>  
> > 
> > As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the
> > atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o
> > smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h
> > accessors properly, no..?
> 
> No, only a single "thing" is being accessed, and there is no need to
> order the decrement with respect to preceding or subsequent accesses to
> other locations.
> 
> In other words, tgt->reqs is already synchronized with itself, and that
> is enough.
> 
> (Besides, on x86 smp_mb__after_atomic_dec is a nop).
> 

So the implementation detail that requests to the same target are
processed in FIFO order, plus the queue only being changeable when no
requests are pending, helps in understanding this code.  Thanks for the
explanation on that bit..

However, it's still my understanding that the use of atomic_dec() in the
completion path means that smp_mb__after_atomic_dec() is a requirement to
be proper portable atomic.h code, no..?  Otherwise tgt->reqs should be
using something other than an atomic_t, right..?
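
As a way of picturing the rule discussed above (a target may only be rebound to a different queue when it has no requests in flight), here is a single-threaded user-space sketch using C11 atomics. It is an assumption-laden illustration, not the driver code: the toy_* names are invented, the spinlock taken in the quoted virtscsi_queuecommand_multi() is left out, and a modulo replaces the subtract loop (the two give the same result for unsigned values). It takes no position on the memory-barrier question.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-target state in the quoted patch. */
struct toy_target {
    atomic_int reqs;      /* requests currently in flight */
    unsigned int queue;   /* queue the target is bound to */
};

/*
 * Submit path: rebind the target to a queue derived from the current
 * CPU only when the in-flight count goes from 0 to 1.  Otherwise keep
 * the existing binding so requests to one target stay on one queue.
 */
unsigned int toy_submit(struct toy_target *t, unsigned int cpu,
                        unsigned int num_queues)
{
    assert(num_queues > 0);

    /* atomic_fetch_add returns the old value: 0 means "was idle". */
    if (atomic_fetch_add(&t->reqs, 1) == 0)
        t->queue = cpu % num_queues;

    return t->queue;
}

/* Completion path: only the counter is touched. */
void toy_complete(struct toy_target *t)
{
    atomic_fetch_sub(&t->reqs, 1);
}
```

In this model a second submission from another CPU keeps the first queue until both requests have completed, after which the next submission is free to pick a new one.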

> >> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
> >> + struct scsi_cmnd *sc)
> >> +{
> >> +  struct virtio_scsi *vscsi = shost_priv(sh);
> >> +  struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >> +  unsigned long flags;
> >> +  u32 queue_num;
> >> +
> >> +  /* Using an atomic_t for tgt->reqs lets the virtqueue handler
> >> +   * decrement it without taking the spinlock.
> >> +   */
> >> +  spin_lock_irqsave(&tgt->tgt_lock, flags);
> >> +  if (atomic_inc_return(&tgt->reqs) == 1) {
> >> +  queue_num = smp_processor_id();
> >> +  while (unlikely(queue_num >= vscsi->num_queues))
> >> +  queue_num -= vscsi->num_queues;
> >> +  tgt->req_vq = &vscsi->req_vqs[queue_num];
> >> +  }
> >> +  spin_unlock_irqrestore(&tgt->tgt_lock, flags);
> >> +  return virtscsi_queuecommand(vscsi, tgt, sc);
> >> +}
> >> +
> > 
> > The extra memory barriers to get this right for the current approach are
> > just going to slow things down even more for virtio-scsi-mq..
> 
> virtio-scsi multiqueue has a performance benefit up to 20% (for a single
> LUN) or 40% (on overall bandwidth across multiple LUNs).  I doubt that a
> single memory barrier can have that much impact. :)
> 

I've no doubt that this series increases the large block high bandwidth
for virtio-scsi, but historically that has always been the easier
workload to scale.  ;)

> The way to go to improve performance even more is to add new virtio APIs
> for finer control of the usage of the ring.  These should let us avoid
> copying the sg list and almost get rid of the tgt_lock; even though the
> locking is quite efficient in virtio-scsi (see how tgt_lock and vq_lock
> are "pipelined" so as to overlap the preparation of two requests), it
> should give a nice improvement and especially avoid a kmalloc with small
> requests.  I may have some time for it next month.
> 
> > Jens's approach is what we will ultimately need to re-architect in SCSI
> > core if we're ever going to move beyond the issues of legacy host_lock,
> > so I'm wondering if maybe this is the direction that virtio-scsi-mq
> > needs to go in as well..?
> 
> We can see after the block layer multiqueue work goes in...  I also need
> to look more closely at Jens's changes.
> 

Yes, I think Jens's new approach is providing some pretty significant
gains for raw block drivers with extremely high packet (small block
random I/O) workloads, esp with hw block drivers that support genuine mq
with hw num_queues > 1.

He also has virtio-blk converted to run in num_queues=1 mode.

> Have you measured the host_lock to be a bottleneck in high-iops
> benchmarks, even for a modern driver that does not hold it in
> queuecommand?  (Certainly it will become more important as the
> virtio-scsi queuecommand becomes thinner and thinner).

This is exactly why it would make such a good vehicle to re-architect
SCSI core.  I'm thinking it can be the first sw LLD we attempt to get
running on a (currently) future scsi-mq prototype.

>   If so, we can
> start looking at limiting host_lock usage in the fast path.
> 

That would be a good incremental step for SCSI core, but I'm not sure
that we'll be able to scale compared to blk-mq without a
new approach for sw/hw LLDs along the lines of what Jens is doing.

> BTW, supporting this in tcm-vhost should be quite trivial, as all the
> request queues a

Re: [PATCH 0/3] Fix ACPI BGRT support for images located in EFI boot services memory

2012-09-04 Thread Matt Fleming
On Tue, 2012-09-04 at 10:59 -0700, Josh Triplett wrote:
> On Tue, Sep 04, 2012 at 03:27:20PM +0100, Matt Fleming wrote:
> > On Thu, 2012-08-30 at 14:28 -0700, Josh Triplett wrote:
> > > The ACPI BGRT lets the OS access the BIOS logo image and its position
> > > on the screen at boot time, allowing it to maintain that image on the
> > > screen until ready to display something else, making boot more
> > > seamless.  This series fixes support for accessing the boot logo image
> > > via the BGRT when the BIOS stores it in EFI boot services memory, as
> > > recommended by the ACPI 5.0 spec.  Linux needs to copy the image out of
> > > boot services memory before reclaiming boot services memory.
> > > 
> > > The first patch refactors EFI initialization to defer freeing boot
> > > services memory until later in the boot process, after we have ACPI
> > > available.  The second patch adds a helper function to look up existing
> > > EFI boot services mappings, to avoid re-mapping them.  The third patch
> > > moves BGRT initialization to before the reclamation of boot services
> > > memory, copies the logo at that point, and reworks the existing BGRT
> > > driver to use that existing copy.
> > 
> > Since we always end up doing a copy anyway, is there no way we could
> > just copy the boot logo *without* deferring freeing the boot services
> > code, e.g. move the copy before we do SetVirtualAddressMap()?
> 
> Unfortunately not.  We need enough of ACPI available to go read the
> BGRT to know what to copy, so we need to defer freeing boot services
> code until after we initialize ACPI (and thus everything ACPI needs,
> which includes EFI since ACPI looks for root tables there).

Ah, right. It was also pointed out to me offline that some drivers have
been known to access boot services data even after
SetVirtualAddressMap(), so this deferring shouldn't be a problem.

> > I wouldn't be surprised if some implementations got really cranky if
> > we accessed boot services data after we installed a new virtual memory
> > map.
> 
> Note that I've carefully accessed the boot services data *through* the
> new virtual memory map, which should work fine.
> 
> > Besides, if we can avoid moving the efi_free_boot_services() call we can
> > avoid littering init/main.c with more #ifdef CONFIG_X86 blocks.
> 
> Those seem easy enough to convert into appropriate always-available
> stubs, if you'd like.  And I could move efi_free_boot_services() inside
> efi_late_init(), too, keeping it an internal implementation detail of
> EFI initialization.  Would that help?

Yeah, that would seem like a good solution.



Re: [PATCH v2 0/3] promote zcache from staging

2012-09-04 Thread Seth Jennings
On 09/04/2012 02:57 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 04, 2012 at 03:02:46PM -0500, Seth Jennings wrote:
>> zcache is the remaining piece of code required to support in-kernel
>> memory compression.  The other two features, cleancache and frontswap,
>> have been promoted to mainline in 3.0 and 3.5 respectively.  This
>> patchset promotes zcache from the staging tree to mainline.
> 
> Could you please post it as a single patch, as if it were out-of-tree?
> That way it will be much easier to review it by looking at the full code.

Ah yes, my bad. Scratch v2. Nothing to see here.

Seth



Re: [PATCH 07/11] kexec: Disable in a secure boot environment

2012-09-04 Thread Eric W. Biederman

Matthew Garrett  writes:

> kexec could be used as a vector for a malicious user to use a signed kernel
> to circumvent the secure boot trust model. In the long run we'll want to
> support signed kexec payloads, but for the moment we should just disable
> loading entirely in that situation.

Nacked-by: "Eric W. Biederman" 

This makes no sense.  The naming CAP_SECURE_FIRMWARE is atrocious,
you aren't implementing or enforcing secure firmware.

You don't give any justification for this other than to support some
silly EFI feature.  Why would anyone want this if we were not booting
under EFI?

Eric


Re: [PATCH can-next v6] can: add tx/rx LED trigger support

2012-09-04 Thread Fabio Baltieri
On Tue, Sep 04, 2012 at 09:11:28AM +0200, Kurt Van Dijck wrote:
> On Mon, Sep 03, 2012 at 10:54:49PM +0200, Oliver Hartkopp wrote:
> > On 03.09.2012 20:29, Fabio Baltieri wrote:
> > 
> > > On Mon, Sep 03, 2012 at 08:13:35PM +0200, Kurt Van Dijck wrote:
> > >> On Mon, Sep 03, 2012 at 02:40:39PM +0200, Marc Kleine-Budde wrote:
> > >>> The net->ifindex is unique. But it's only an integer. Usually can0 has an
> > >>> ifindex != 0, so a simple can%d is counterproductive here.
> > >>>
> > >>> Some pointers to related code:
> > >>> http://lxr.free-electrons.com/source/drivers/base/core.c#L1847
> > >>> http://lxr.free-electrons.com/source/drivers/base/core.c#L73
> > >>> http://lxr.free-electrons.com/source/include/linux/device.h#L695
> > >>>
> > >>> comments?
> > > 
> > > > That would probably make it really hard to choose the right
> > > > default_trigger for led devices to get to the appropriate CAN LED in
> > > > embedded systems, as the trigger name would depend on other network
> > > > devices and probing order (correct me if I'm wrong).
> > > > 
> > > > Something with the device name would probably be more appropriate here.
> > > 
> > >>
> > >> a very recent idea: something with netdevice notifiers and 
> > >> NETDEV_CHANGENAME ...
> > >> http://lxr.free-electrons.com/source/net/core/dev.c#L1030
> > >>
> > >> you could: rename the trigger, or if we think it's useful,
> > >> block the netdev rename when its triggers are in use.
> > > 
> > > Blocking the rename looks overkill to me,
> 
> renaming a netdev _after_ first attaching led triggers looks stupid to me
> anyway.
> 
> > > what about using device name
> > > with an optional "port id" appended to it?  Sounds simpler...
> > 
> > 
> > The name of the device can only be changed when the interface is down.
> > Is it possible to put some scripting around it to detach and attach the leds
> > to the interfaces on ifup/ifdown triggers?
> 
> Are the led triggers available for using while the netdev is down then?

Sure!  On embedded systems triggers are usually attached to actual LEDs
at probe time using the default_trigger field of struct led_classdev, and
that can be specified either in machine files or in the device tree.

See

http://lxr.free-electrons.com/source/arch/powerpc/boot/dts/mpc8315erdb.dts#L477
http://lxr.free-electrons.com/source/arch/arm/mach-pxa/tosa.c#L576

that's why the trigger name should be predictable, at least at
boot/probe time.

The actual CAN LED trigger also reflects the actual state of the
interface (off if down, on if up, blink on activity).

Fabio
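
A minimal userspace sketch of the naming argument above, for illustration only. It is not taken from the patch: struct demo_netdev, both helper functions, and the "-tx" suffix are assumptions. It shows why a trigger name built from the interface name stays the same across probe orders while one built from ifindex does not.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two netdev properties discussed in the thread. */
struct demo_netdev {
	const char *name;	/* e.g. "can0" */
	int ifindex;		/* depends on how many devices were probed earlier */
};

/* Trigger name derived from ifindex: changes with probing order. */
static void trigger_from_ifindex(const struct demo_netdev *dev,
				 char *buf, size_t len)
{
	snprintf(buf, len, "can%d-tx", dev->ifindex);
}

/* Trigger name derived from the device name: predictable at boot/probe time. */
static void trigger_from_name(const struct demo_netdev *dev,
			      char *buf, size_t len)
{
	snprintf(buf, len, "%s-tx", dev->name);
}
```

With a name-based trigger, a default_trigger string in a machine file or device tree can be written once, regardless of which other network devices happen to be present.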


[PATCH] ARM: msm: Fix sparse warnings due to incorrect type

2012-09-04 Thread Stephen Boyd
arch/arm/mach-msm/timer.c:153:3: warning: incorrect type in initializer (different address spaces)
arch/arm/mach-msm/timer.c:153:3:    expected void const [noderef] *__vpp_verify
arch/arm/mach-msm/timer.c:153:3:    got struct clock_event_device [noderef] **
arch/arm/mach-msm/timer.c:153:38: warning: incorrect type in assignment (different address spaces)
arch/arm/mach-msm/timer.c:153:38:    expected struct clock_event_device [noderef] *
arch/arm/mach-msm/timer.c:153:38:    got struct clock_event_device *evt
arch/arm/mach-msm/timer.c:191:22: warning: incorrect type in assignment (different address spaces)
arch/arm/mach-msm/timer.c:191:22:    expected struct clock_event_device [noderef] **static [toplevel] percpu_evt
arch/arm/mach-msm/timer.c:191:22:    got struct clock_event_device *[noderef] *
arch/arm/mach-msm/timer.c:196:4: warning: incorrect type in initializer (different address spaces)
arch/arm/mach-msm/timer.c:196:4:    expected void const [noderef] *__vpp_verify
arch/arm/mach-msm/timer.c:196:4:    got struct clock_event_device [noderef] **
arch/arm/mach-msm/timer.c:196:39: warning: incorrect type in assignment (different address spaces)
arch/arm/mach-msm/timer.c:196:39:    expected struct clock_event_device [noderef] *
arch/arm/mach-msm/timer.c:196:39:    got struct clock_event_device *ce
arch/arm/mach-msm/timer.c:198:24: warning: incorrect type in argument 4 (different address spaces)
arch/arm/mach-msm/timer.c:198:24:    expected void [noderef] *percpu_dev_id
arch/arm/mach-msm/timer.c:198:24:    got struct clock_event_device [noderef] **static [toplevel] percpu_evt

Signed-off-by: Stephen Boyd 
---
 arch/arm/mach-msm/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-msm/timer.c b/arch/arm/mach-msm/timer.c
index 81280825..004f935 100644
--- a/arch/arm/mach-msm/timer.c
+++ b/arch/arm/mach-msm/timer.c
@@ -101,7 +101,7 @@ static struct clock_event_device msm_clockevent = {
 
 static union {
 	struct clock_event_device *evt;
-	struct clock_event_device __percpu **percpu_evt;
+	struct clock_event_device * __percpu *percpu_evt;
 } msm_evt;
 
 static void __iomem *source_base;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, hosted by The Linux Foundation



Re: [PATCH 07/11] kexec: Disable in a secure boot environment

2012-09-04 Thread Matthew Garrett
On Tue, Sep 04, 2012 at 01:13:32PM -0700, Eric W. Biederman wrote:
> 
> Matthew Garrett  writes:
> 
> > kexec could be used as a vector for a malicious user to use a signed kernel
> > to circumvent the secure boot trust model. In the long run we'll want to
> > support signed kexec payloads, but for the moment we should just disable
> > loading entirely in that situation.
> 
> Nacked-by: "Eric W. Biederman" 
> 
> This makes no sense.  The naming CAP_SECURE_FIRMWARE is atrocious,
> you aren't implementing or enforcing secure firmware.

I'm certainly not attached to the name, and have no problem replacing 
it.

> You don't give any justification for this other than to support some
> silly EFI feature.  Why would anyone want this if we were not booting
> under EFI?

Well, given that approximately everyone will be booting under EFI within 
18 months, treating it as a niche case seems a little short sighted. And 
secondly, there are already several non-EFI platforms that want to enact 
a policy preventing root from being able to arbitrarily replace the 
kernel. Given that people are doing this in the wild, it makes sense to 
move towards offering that policy in the mainline kernel.

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: [PATCH 2/3] staging/vme: Use pr_ printks in vme_user.c

2012-09-04 Thread Greg Kroah-Hartman
On Tue, Aug 21, 2012 at 08:12:53PM +0900, Toshiaki Yamane wrote:
> The checkpatch warnings below were fixed,
> 
> -WARNING: Prefer pr_info(... to printk(KERN_INFO, ...
> -WARNING: Prefer pr_debug(... to printk(KERN_DEBUG, ...
> -WARNING: Prefer pr_warn(... to printk(KERN_WARNING, ...
> -WARNING: Prefer pr_err(... to printk(KERN_ERR, ...
> 
> and added pr_fmt.

A lot of these can be converted to use dev_info(), dev_dbg(),
dev_warn(), and dev_err() instead.  Please do that whenever you have
access to a struct device.

So, sorry, I can't take this patch.

greg k-h


Re: [PATCH 0/3] Fix ACPI BGRT support for images located in EFI boot services memory

2012-09-04 Thread H. Peter Anvin

On 09/04/2012 12:45 PM, Josh Triplett wrote:
>>
>> There are some platforms which have bugs in this area, so there are
>> other reasons to defer freeing up boot memory until as late in the
>> boot process as we can possibly get away with.
>>
>> free_initmem() is presumably the place that makes most sense.
>
> You're suggesting a call from free_initmem() to
> efi_free_boot_services()?  Or, from init_post() right before the call to
> free_initmem()?
>

free_initmem() is arch-specific, so probably the latter.

>> This
>> is EFI-specific but not x86-specific, let's not commingle those
>> concepts, please...
>
> init/main.c already calls the x86-specific efi_enter_virtual_mode
> (defined in arch/x86/platform/efi/efi.c), and I split the call to the
> x86-specific efi_free_boot_services out of that.  Neither of those
> functions exists on non-x86 platforms, and thus I mirrored the #ifdef
> currently wrapped around efi_enter_virtual_mode for the new call to
> efi_free_boot_services.  While it might make sense for that code to
> exist on non-x86 EFI platforms, it currently doesn't.  At best, I could
> add static inline stubs to linux/efi.h for those functions to avoid the
> ifdefs, but as far as I can tell the same issue applies to quite a few
> more functions in efi.h.
>
> Would you like me to add the static inline stubs for the couple of
> functions called from init/main.c, or leave the #ifdefs?
>

I think that would really help clean things up.

	-hpa



--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: kexec/kdump kernel fails to start

2012-09-04 Thread Flavio Leitner
On Tue, 4 Sep 2012 12:20:14 -0700
Yinghai Lu  wrote:

> On Tue, Sep 4, 2012 at 12:17 PM, Flavio Leitner  wrote:
> > On Tue, 4 Sep 2012 12:02:00 -0700
> > [0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 
> > 0x7010600070106
> > [0.00] last_pfn = 0xbf800 max_arch_pfn = 0x4
> > [0.00] initial memory mapped : 0 - 2000
> > [0.00] Base memory trampoline at [88098000] 98000 size 20480
> > [0.00] init_memory_mapping: -bf80
> > [0.00]  00 - 00bf80 page 2M
> > [0.00] kernel direct mapping tables up to bf80 @ 
> > 1fa0-2000
> > [0.00] init_memory_mapping: 0001-00044000
> > [0.00]  01 - 044000 page 2M
> > [0.00] kernel direct mapping tables up to 44000 @ 
> > bdaab000-bf4bd000
> > [0.00] RAMDISK: 352c8000 - 3695c000
> >
> BTW, can you please try our new init_memory_mapping clean up at
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-mm
> 
> hope it could make your kdump working.

Sorry, but it didn't work.
The same problem happened. 
fbl


[PATCH] xen/p2m: Fix one by off error in checking the P2M tree directory.

2012-09-04 Thread Konrad Rzeszutek Wilk
We would traverse the full P2M top directory from 0->MAX_DOMAIN_PAGES (inclusive).

Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
we would try to use the 512th entry. Fortunately for us the p2m_top_index
has a check for this:

 BUG_ON(pfn >= MAX_P2M_PFN);

which we hit and saw this:

(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) [ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:C ]
(XEN) CPU:0
(XEN) RIP:e033:[]
(XEN) RFLAGS: 0212   EM: 1   CONTEXT: pv guest
(XEN) rax: 81db5000   rbx: 81db4000   rcx: 
(XEN) rdx: 00480211   rsi:    rdi: 81db4000
(XEN) rbp: 81793db8   rsp: 81793d38   r8:  0800
(XEN) r9:  4000   r10:    r11: 81db7000
(XEN) r12: 0ff8   r13: 81df1ff8   r14: 81db6000
(XEN) r15: 0ff8   cr0: 8005003b   cr4: 26f0
(XEN) cr3: 000661795000   cr2: 

Fixes-Oracle-Bug: 14570662
Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/p2m.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 0bfaf5b..af11f00 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -695,7 +695,7 @@ bool __init early_can_reuse_p2m_middle(unsigned long set_pfn, unsigned long set_
 	if (p2m_index(set_pfn))
 		return false;
 
-	for (pfn = 0; pfn <= MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE) {
+	for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE) {
 		topidx = p2m_top_index(pfn);
 
 		if (!p2m_top[topidx])
-- 
1.7.7.6
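
The off-by-one described in the changelog can be reproduced with a simplified userspace sketch. This is an illustration only: the sizes below are made up, the directory is flattened to two levels, and none of the names are the kernel's. It shows that stepping with "<=" visits one slot past the end of the top-level directory, while "<" stops at the last valid slot.

```c
#include <stddef.h>

/* Hypothetical sizes for illustration; not the kernel's real values. */
#define ENTRIES_PER_PAGE 512UL	/* stands in for P2M_PER_PAGE */
#define TOP_ENTRIES      4UL	/* valid top-level slots are 0..3 */
#define MAX_PAGES        (TOP_ENTRIES * ENTRIES_PER_PAGE)	/* stands in for MAX_DOMAIN_PAGES */

/* Map a page frame number to its top-level slot (simplified, two levels). */
static size_t top_index(unsigned long pfn)
{
	return pfn / ENTRIES_PER_PAGE;
}

/*
 * Walk the pfn range in ENTRIES_PER_PAGE steps and return the highest
 * top-level slot visited.  With inclusive != 0 the loop condition is
 * "<=", as before the patch; otherwise it is "<", as after the patch.
 */
static size_t highest_slot_visited(int inclusive)
{
	size_t highest = 0;
	unsigned long pfn;

	for (pfn = 0;
	     inclusive ? pfn <= MAX_PAGES : pfn < MAX_PAGES;
	     pfn += ENTRIES_PER_PAGE)
		highest = top_index(pfn);

	return highest;
}
```

In this sketch the inclusive walk reaches slot TOP_ENTRIES, which is outside the directory; that corresponds to the case the BUG_ON in p2m_top_index catches.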



Re: [PATCH 0/3] Fix ACPI BGRT support for images located in EFI boot services memory

2012-09-04 Thread Josh Triplett
On Tue, Sep 04, 2012 at 01:24:03PM -0700, H. Peter Anvin wrote:
> On 09/04/2012 12:45 PM, Josh Triplett wrote:
> >>
> >>There are some platforms which have bugs in this area, so there are
> >>other reasons to defer freeing up boot memory until as late in the
> >>boot process as we can possibly get away with.
> >>
> >>free_initmem() is presumably the place that makes most sense.
> >
> >You're suggesting a call from free_initmem() to
> >efi_free_boot_services()?  Or, from init_post() right before the call to
> >free_initmem()?
> 
> free_initmem() is arch-specific, so probably the latter.

OK, will do.

> >>This
> >>is EFI-specific but not x86-specific, let's not commingle those
> >>concepts, please...
> >
> >init/main.c already calls the x86-specific efi_enter_virtual_mode
> >(defined in arch/x86/platform/efi/efi.c), and I split the call to the
> >x86-specific efi_free_boot_services out of that.  Neither of those
> >functions exists on non-x86 platforms, and thus I mirrored the #ifdef
> >currently wrapped around efi_enter_virtual_mode for the new call to
> >efi_free_boot_services.  While it might make sense for that code to
> >exist on non-x86 EFI platforms, it currently doesn't.  At best, I could
> >add static inline stubs to linux/efi.h for those functions to avoid the
> >ifdefs, but as far as I can tell the same issue applies to quite a few
> >more functions in efi.h.
> >
> >Would you like me to add the static inline stubs for the couple of
> >functions called from init/main.c, or leave the #ifdefs?
> >
> 
> I think that would really help clean things up.

Fair enough.  I'll send v2 shortly.

- Josh Triplett
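
For readers unfamiliar with the "static inline stub" idiom agreed on above, here is a minimal userspace sketch. It is an assumption-laden illustration, not the v2 patch: DEMO_HAVE_EFI stands in for the real configuration symbol, and the demo_ functions are hypothetical.

```c
/*
 * DEMO_HAVE_EFI is a hypothetical switch standing in for the kernel
 * configuration that provides the real function.  It is left undefined
 * here, so the empty stub below is what gets compiled.
 */
/* #define DEMO_HAVE_EFI 1 */

static int boot_services_freed;

#ifdef DEMO_HAVE_EFI
/* "Real" implementation, available only when the feature is built in. */
static void demo_efi_free_boot_services(void)
{
	boot_services_freed = 1;
}
#else
/* Empty stub: the call compiles away when the feature is absent. */
static inline void demo_efi_free_boot_services(void)
{
}
#endif

/* The caller needs no #ifdef; the header-level stub hides the difference. */
static int demo_init_post(void)
{
	demo_efi_free_boot_services();
	return boot_services_freed;
}
```

The design point is that the conditional lives in one header next to the declaration, so generic callers such as init/main.c stay free of #ifdef blocks.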


[PATCH] security: allow Yama to be unconditionally stacked

2012-09-04 Thread Kees Cook
Unconditionally call Yama when CONFIG_SECURITY_YAMA_STACKED is selected,
no matter what LSM module is primary.

Ubuntu and Chrome OS already carry patches to do this, and Fedora
has voiced interest in doing this as well. Instead of having multiple
distributions (or LSM authors) carrying these patches, just allow Yama
to be called unconditionally when selected by the new CONFIG.

Signed-off-by: Kees Cook 
---
 include/linux/security.h |   31 +++
 security/security.c  |   21 +
 security/yama/Kconfig|8 
 security/yama/yama_lsm.c |   14 ++
 4 files changed, 70 insertions(+), 4 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 3dea6a9..01ef030 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -3021,5 +3021,36 @@ static inline void free_secdata(void *secdata)
 { }
 #endif /* CONFIG_SECURITY */
 
+#ifdef CONFIG_SECURITY_YAMA
+extern int yama_ptrace_access_check(struct task_struct *child,
+   unsigned int mode);
+extern int yama_ptrace_traceme(struct task_struct *parent);
+extern void yama_task_free(struct task_struct *task);
+extern int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3,
+  unsigned long arg4, unsigned long arg5);
+#else
+static inline int yama_ptrace_access_check(struct task_struct *child,
+  unsigned int mode)
+{
+   return 0;
+}
+
+static inline int yama_ptrace_traceme(struct task_struct *parent)
+{
+   return 0;
+}
+
+static inline void yama_task_free(struct task_struct *task)
+{
+}
+
+static inline int yama_task_prctl(int option, unsigned long arg2,
+ unsigned long arg3, unsigned long arg4,
+ unsigned long arg5)
+{
+   return -ENOSYS;
+}
+#endif /* CONFIG_SECURITY_YAMA */
+
 #endif /* ! __LINUX_SECURITY_H */
 
diff --git a/security/security.c b/security/security.c
index 860aeb3..68c1b9b 100644
--- a/security/security.c
+++ b/security/security.c
@@ -136,11 +136,23 @@ int __init register_security(struct security_operations *ops)
 
 int security_ptrace_access_check(struct task_struct *child, unsigned int mode)
 {
+#ifdef CONFIG_SECURITY_YAMA_STACKED
+	int rc;
+	rc = yama_ptrace_access_check(child, mode);
+	if (rc)
+		return rc;
+#endif
 	return security_ops->ptrace_access_check(child, mode);
 }
 
 int security_ptrace_traceme(struct task_struct *parent)
 {
+#ifdef CONFIG_SECURITY_YAMA_STACKED
+	int rc;
+	rc = yama_ptrace_traceme(parent);
+	if (rc)
+		return rc;
+#endif
 	return security_ops->ptrace_traceme(parent);
 }
 
@@ -761,6 +773,9 @@ int security_task_create(unsigned long clone_flags)
 
 void security_task_free(struct task_struct *task)
 {
+#ifdef CONFIG_SECURITY_YAMA_STACKED
+	yama_task_free(task);
+#endif
 	security_ops->task_free(task);
 }
 
@@ -876,6 +891,12 @@ int security_task_wait(struct task_struct *p)
 int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			 unsigned long arg4, unsigned long arg5)
 {
+#ifdef CONFIG_SECURITY_YAMA_STACKED
+	int rc;
+	rc = yama_task_prctl(option, arg2, arg3, arg4, arg5);
+	if (rc != -ENOSYS)
+		return rc;
+#endif
 	return security_ops->task_prctl(option, arg2, arg3, arg4, arg5);
 }
 
diff --git a/security/yama/Kconfig b/security/yama/Kconfig
index 51d6709..20ef514 100644
--- a/security/yama/Kconfig
+++ b/security/yama/Kconfig
@@ -11,3 +11,11 @@ config SECURITY_YAMA
 	  Further information can be found in Documentation/security/Yama.txt.
 
 	  If you are unsure how to answer this question, answer N.
+
+config SECURITY_YAMA_STACKED
+	bool "Yama stacked with other LSMs"
+	depends on SECURITY_YAMA
+	default n
+	help
+	  When Yama is built into the kernel, force it to stack with the
+	  selected primary LSM.
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index dcd6178..b4c2984 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -100,7 +100,7 @@ static void yama_ptracer_del(struct task_struct *tracer,
  * yama_task_free - check for task_pid to remove from exception list
  * @task: task being removed
  */
-static void yama_task_free(struct task_struct *task)
+void yama_task_free(struct task_struct *task)
 {
 	yama_ptracer_del(task, task);
 }
@@ -116,7 +116,7 @@ static void yama_task_free(struct task_struct *task)
  * Return 0 on success, -ve on error.  -ENOSYS is returned when Yama
  * does not handle the given option.
  */
-static int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3,
+int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			   unsigned long arg4, unsigned long arg5)
 {
 	int rc;
@@ -243,7 +243,7 @@ stati
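
The stacking convention used in the security.c hunks can be sketched in userspace as follows. This is a simplified illustration, assuming made-up option numbers and hypothetical stacked_/primary_ functions rather than the real Yama or LSM hooks. Two rules are shown: for access checks, any non-zero result from the stacked module is returned immediately; for prctl, -ENOSYS from the stacked module means "not handled here" and the call falls through to the primary module.

```c
#include <errno.h>

/* Hypothetical option numbers, for illustration only. */
#define OPT_HANDLED_BY_STACKED 1
#define OPT_HANDLED_BY_PRIMARY 2

/* Stand-in for the stacked module's prctl hook: -ENOSYS means "not mine". */
static int stacked_prctl(int option)
{
	return option == OPT_HANDLED_BY_STACKED ? 0 : -ENOSYS;
}

/* Stand-in for the primary module's prctl hook. */
static int primary_prctl(int option)
{
	return option == OPT_HANDLED_BY_PRIMARY ? 0 : -ENOSYS;
}

/* Same shape as the prctl hunk: only -ENOSYS falls through. */
static int dispatch_prctl(int option)
{
	int rc = stacked_prctl(option);

	if (rc != -ENOSYS)
		return rc;
	return primary_prctl(option);
}

/* Stand-in access check: the stacked module may deny with -EPERM. */
static int stacked_check(int stacked_allows)
{
	return stacked_allows ? 0 : -EPERM;
}

/* Stand-in primary check that always allows. */
static int primary_check(void)
{
	return 0;
}

/* Same shape as the ptrace hunks: a non-zero stacked result wins. */
static int dispatch_check(int stacked_allows)
{
	int rc = stacked_check(stacked_allows);

	if (rc)
		return rc;
	return primary_check();
}
```

Note the asymmetry: a denial from the stacked check short-circuits the primary module, whereas for prctl only an explicit "not handled" result lets the primary module see the request.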

Re: [PATCH 10/11] acpi: Ignore acpi_rsdp kernel parameter in a secure boot environment

2012-09-04 Thread Alan Cox
> Gotta say this capability name is confusing. Naming is
> CAP_PRE_SECURE_BOOT or something along the lines might be a better
> choice. When I just look at this name, I sure thought this
> CAP_SECURE_FIRMWARE true means it is a secure boot capable firmware.

Given there is nothing secure about it would it also be better to call it
AUTHENTICATED_BOOT ?

Alan


Re: [PATCH 10/11] acpi: Ignore acpi_rsdp kernel parameter in a secure boot environment

2012-09-04 Thread Matthew Garrett
On Tue, Sep 04, 2012 at 09:37:42PM +0100, Alan Cox wrote:
> > Gotta say this capability name is confusing. Naming is
> > CAP_PRE_SECURE_BOOT or something along the lines might be a better
> > choice. When I just look at this name, I sure thought this
> > CAP_SECURE_FIRMWARE true means it is a secure boot capable firmware.
> 
> Given there is nothing secure about it would it also be better to call it
> AUTHENTICATED_BOOT ?

Well, there is the question of whether the sense is correct - you'll 
only have this capability if you don't boot with any form of 
authentication. CAP_KERNEL_ACCESS?

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: western digital caviar black. EXT4-fs error

2012-09-04 Thread Azat Khuzhin
Ted, many thanks!
I'll try to compile new kernel (maybe 3.4.x)

On Tue, Sep 4, 2012 at 7:15 AM, Theodore Ts'o  wrote:
> On Sat, Sep 01, 2012 at 11:48:17PM +0400, Azat Khuzhin wrote:
>> Recently I update my HDD on desktop machine, and bought WD Caviar Black.
>> But after I format & copy information to it (using dd), and fix
>> partitions size: I have next errors in kern.log:
>>
>> Aug 28 01:49:03 home-spb kernel: [183245.030897] EXT4-fs error (device
>> sdc2): ext4_mb_generate_buddy:739: group 3675, 32254 clusters in
>> bitmap, 32258 in gd
>
> Sorry for the delay; you sent this to linux-kernel (and not the
> linux-ext4 list).  It also took me a while to dig up the relevant fix
> from my archives; normally, once a bug has been fixed (as this one
> was, on June 7, 2012) I don't worry about it any more.
>
> The upstream fix is commit b0dd6b70f0fda17ae9762fbb72d98e40a4f66556.
>
> Note that you are using the 3.3.0 kernel.  This is not a long-term
> supported kernel, so fixes from upstream are no longer being
> backported to it.  The official Debian kernel (which tracks the 3.2.x
> stable kernel series) has the backported bug fix.  So does the 3.4
> long-term stable kernel series, as does the 3.5 kernel or any later
> kernel.
>
> If you don't know how to backport a kernel patch, or even if you do, I
> would strongly suggest that you either go back to the Debian standard
> kernel for Wheezy, or track the 3.2.x or 3.4.x long-term stable
> kernel.  (The fix is in v3.2.20 or later, and v3.4.3 or later ---
> where those trees are up to v3.2.28, and v3.4.10, respectively.)
> Otherwise, you may very well run into some other kernel bug which has
> already been fixed upstream, and you'll just waste your time as well
> as various other kernel developers.
>
> Regards,
>
> - Ted



-- 
Azat Khuzhin
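
Ted's version guidance quoted above can be written down as a small check. This is only a sketch of the statements in his mail (fix in v3.2.20 or later, v3.4.3 or later, and 3.5 or any later kernel); the function name is hypothetical, and treating every 3.3.x release and everything older than 3.2 as "not known to carry the fix" is an assumption drawn from his remark that 3.3 no longer receives backports.

```c
/*
 * Return 1 if a kernel version is known, per the mail above, to carry
 * the ext4 fix (upstream commit b0dd6b70f0fd...), and 0 otherwise.
 * A result of 0 means "not known to carry it", not "known to be buggy".
 */
static int known_to_carry_ext4_fix(int major, int minor, int patch)
{
	if (major > 3)
		return 1;		/* any later kernel */
	if (major < 3)
		return 0;		/* not covered by the mail */
	if (minor >= 5)
		return 1;		/* 3.5 or later */
	if (minor == 4)
		return patch >= 3;	/* v3.4.3 or later */
	if (minor == 3)
		return 0;		/* assumption: no 3.3.x backport */
	if (minor == 2)
		return patch >= 20;	/* v3.2.20 or later */
	return 0;
}
```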


Re: high load average in linux-3.6

2012-09-04 Thread Azat Khuzhin
Can anybody say whether this is fixed or not?
Or maybe you need more information?

Thanks.

On Fri, Aug 17, 2012 at 1:10 AM, Azat Khuzhin  wrote:
> Hi all.
>
> After updating to linux-v3.6-rc1-315-g3c31a6e I noticed that the load avg
> is too high for the "current" CPU usage & IO activity.
>
> Just after starting, I see the following:
> 1) I've killed all processes that I "can"
> iostat_min_tasks - http://pastebin.com/T59xKEy4
> ps_min_tasks - http://pastebin.com/b7zVy6up
> w_min_tasks - http://pastebin.com/UFFsncSn
>
> 2) After I started some processes/services (kde,nginx,mysql ...)
> iostat_with_graph - http://pastebin.com/bejGneGv
> ps_with_graph - http://pastebin.com/Nt9g0ynW
> w_with_graph - http://pastebin.com/pYuUbyVY
> ( graph - with DE )
>
> With 3.3.0 I did not notice such a situation.
>
> I've tried setting the kernel cmd line to "i915.i915_enable_rc6=1
> i915.i915_enable_fbc=1 i915.lvds_downclock=1 drm.vblankoffdelay=1"
> like here https://bugs.archlinux.org/task/29850
> But this doesn't help.
>
> --
> Azat Khuzhin



-- 
Azat Khuzhin


[PATCH V4] regulator: tps6586x: add support for SYS rail

2012-09-04 Thread Stephen Warren
From: Laxman Dewangan 

The device has a SYS rail which is always on. It is the system power bus.
LDO5 and LDO_RTC are powered through this rail internally. Add support
for this rail and make it the supply for LDO5/LDO_RTC. Update the
documentation accordingly.

[swarren: Instantiate the sys regulator from board-harmony-power.c to
 avoid regression.]

Signed-off-by: Laxman Dewangan 
Signed-off-by: Stephen Warren 
---
Mark, this patch is based on v3.6-rc3. As we discussed, let's put this into
a TPS6586x driver topic branch in the regulator tree, so that I can merge
it into the Tegra tree for the regulator DT conversion. Note that Laxman
posted patch "regulator: tps6586x: register regulator even if no init data"
which depends on this patch, so I guess they should both go through the
topic branch.

 .../devicetree/bindings/regulator/tps6586x.txt |   65 
 arch/arm/mach-tegra/board-harmony-power.c  |   12 +++-
 drivers/mfd/tps6586x.c |   13 
 drivers/regulator/tps6586x-regulator.c |   20 ++-
 include/linux/mfd/tps6586x.h   |1 +
 5 files changed, 81 insertions(+), 30 deletions(-)

diff --git a/Documentation/devicetree/bindings/regulator/tps6586x.txt b/Documentation/devicetree/bindings/regulator/tps6586x.txt
index da80c2a..a2436e1 100644
--- a/Documentation/devicetree/bindings/regulator/tps6586x.txt
+++ b/Documentation/devicetree/bindings/regulator/tps6586x.txt
@@ -8,7 +8,8 @@ Required properties:
 - gpio-controller: mark the device as a GPIO controller
 - regulators: list of regulators provided by this controller, must have
   property "regulator-compatible" to match their hardware counterparts:
-  sm[0-2], ldo[0-9] and ldo_rtc
+  sys, sm[0-2], ldo[0-9] and ldo_rtc
+- sys-supply: The input supply for SYS.
 - vin-sm0-supply: The input supply for the SM0.
 - vin-sm1-supply: The input supply for the SM1.
 - vin-sm2-supply: The input supply for the SM2.
@@ -20,6 +21,9 @@ Required properties:
 
 Each regulator is defined using the standard binding for regulators.
 
+Note: LDO5 and LDO_RTC is supplied by SYS regulator internally and driver
+  take care of making proper parent child relationship.
+
 Example:
 
pmu: tps6586x@34 {
@@ -30,6 +34,7 @@ Example:
#gpio-cells = <2>;
gpio-controller;
 
+   sys-supply = <&some_reg>;
vin-sm0-supply = <&some_reg>;
vin-sm1-supply = <&some_reg>;
vin-sm2-supply = <&some_reg>;
@@ -43,8 +48,16 @@ Example:
#address-cells = <1>;
#size-cells = <0>;
 
-   sm0_reg: regulator@0 {
+   sys_reg: regulator@0 {
reg = <0>;
+   regulator-compatible = "sys";
+   regulator-name = "vdd_sys";
+   regulator-boot-on;
+   regulator-always-on;
+   };
+
+   sm0_reg: regulator@1 {
+   reg = <1>;
regulator-compatible = "sm0";
regulator-min-microvolt = < 725000>;
regulator-max-microvolt = <150>;
@@ -52,8 +65,8 @@ Example:
regulator-always-on;
};
 
-   sm1_reg: regulator@1 {
-   reg = <1>;
+   sm1_reg: regulator@2 {
+   reg = <2>;
regulator-compatible = "sm1";
regulator-min-microvolt = < 725000>;
regulator-max-microvolt = <150>;
@@ -61,8 +74,8 @@ Example:
regulator-always-on;
};
 
-   sm2_reg: regulator@2 {
-   reg = <2>;
+   sm2_reg: regulator@3 {
+   reg = <3>;
regulator-compatible = "sm2";
regulator-min-microvolt = <300>;
regulator-max-microvolt = <455>;
@@ -70,72 +83,72 @@ Example:
regulator-always-on;
};
 
-   ldo0_reg: regulator@3 {
-   reg = <3>;
+   ldo0_reg: regulator@4 {
+   reg = <4>;
regulator-compatible = "ldo0";
regulator-name = "PCIE CLK";
regulator-min-microvolt = <330>;
regulator-max-microvolt = <330>;
};
 
-   ldo1_reg: regulator@4 {
-   reg = <4>;
+

Re: kexec/kdump kernel fails to start

2012-09-04 Thread Yinghai Lu
On Tue, Sep 4, 2012 at 1:26 PM, Flavio Leitner  wrote:
>
> Sorry, but it didn't work.
> The same problem happened.

can you send out boot log ?

Yinghai


Re: [PATCH v2 4/5] fat: eliminate orphaned inode number allocation

2012-09-04 Thread NeilBrown
On Tue, 4 Sep 2012 15:25:09 -0400 "J. Bruce Fields" 
wrote:

> On Wed, Sep 05, 2012 at 04:02:13AM +0900, OGAWA Hirofumi wrote:
> > "J. Bruce Fields"  writes:
> > 
> > > On Wed, Sep 05, 2012 at 02:07:40AM +0900, OGAWA Hirofumi wrote:
> > >> OGAWA Hirofumi  writes:
> > >> 
> > >> > Namjae Jeon  writes:
> > >> >
> > >> >> From: Namjae Jeon 
> > >> >>
> > >> >> Maintain a list of inode(i_pos) numbers of orphaned inodes (i.e the
> > >> >> inodes that have been unlinked but still having open file
> > >> >> descriptors).At file/directory creation time, skip using such i_pos
> > >> >> values.Removal of the i_pos from the list is done during inode 
> > >> >> eviction.
> > >> >
> > >> > What happens if the directory (has busy entries) was completely 
> > >> > removed?
> > >> >
> > >> >
> > >> > And Al's point is important for NFS too. If you want stable ino for 
> > >> > NFS,
> > >> > you never can't change it.
> > >> 
> > >> s/never can't/never can/
> > >
> > > If vfat exports aren't fixable, maybe we should just remove that
> > > feature?
> > >
> > > I'm afraid that having unfixable half-working vfat exports is just an
> > > attractive nuisance that causes users and developers to waste their
> > > time
> > 
> > In historically, it was introduced by Neil Brown, when nfs export
> > interface was rewritten (I'm not sure what was intended).
> > 
> > Personally, I'm ok to remove it though, it is really personal
> > opinion. The state would be rather I don't have strong opinion to
> > remove.
> 
> Neil, any opinion?
> 
> If we can document circumstances under which nfs exports of fat
> filesystems are reliable, fine.
> 
> Otherwise I'd rather just be clear that we don't support it.
> 
> --b.


I think that it is important to maintain support for NFS export of VFAT on a
best-effort basis.  We can't provide 100% guarantees of all NFS semantics but
that doesn't prevent it from being of real practical benefit to people.

If the usage pattern is "open/read/close" or "open/write/close" while no
other client is accessing the filesystem and while the server is not under
aggressive memory pressure, then it should work quite reliably.

If you rename files while they are open, have lots of files concurrently open,
allow the server to experience high memory pressure (e.g. reading/writing
multiple files that are bigger than memory) etc etc then things can start to
fail.

VFAT is widely used as a file-transfer protocol.  If you use NFS/VFAT in that
way it works fine.  If you try to use it as a general file access protocol,
that is when you hit problems.

The patch series tries to make inode number stable across reboot.  I think
this is not worth the effort as you won't make VFAT access more reliable,
you'll just make it fail differently.

The only real answer to more reliable NFS access to VFAT is the NFSv4 concept
of volatile file handles.  Unfortunately NFSv4 hasn't yet specified these
with sufficient precision to actually use them.

So if anyone wants to improve VFAT/NFS, I suggest that the first step is to
work with the NFSV4-WG to get an implementable specification.  Good luck with
that.

NeilBrown




Re: [PATCH V2] regulator: tps6586x: register regulator even if no init data

2012-09-04 Thread Stephen Warren
On 08/29/2012 09:01 AM, Laxman Dewangan wrote:
> Register all TPS6586x regulators even if there is no regulator
> init data for platform i.e. without any user-supplied constraints.
> 
> Signed-off-by: Laxman Dewangan 

Tested-by: Stephen Warren 

Note that this patch depends on the patch I just posted titled
"regulator: tps6586x: add support for SYS rail". I also believe Laxman
will be posting another patch based on these 2 soon (it will move the
regulator DT parsing out of the MFD driver into the regulator driver),
so I guess it makes sense to take them all through the same TPS6586x
topic branch in the regulator tree.

That all said, I only care about merging the first patch, "regulator:
tps6586x: add support for SYS rail", into the Tegra tree, so please
don't wait for the 3rd patch before making any tag for me to pull.

Thanks.


Re: [PATCH] genalloc: add best fit algorithm

2012-09-04 Thread Andrew Morton
On Tue,  4 Sep 2012 14:20:29 +0200
benjamin.gaign...@linaro.org wrote:

> Allow genalloc to use another algorithm than first-fit one.
> Add a best-fit algorithm.

This changelog gives nobody any reason to merge the patch.

Why was this change made?  What are its benefits?  If it's a
performance optimisation then describe the testcase and provide numbers.
If it is to address some fragmentation issue then fully describe that
issue, describe the test case and provide testing results.



Re: [PATCH 10/11] acpi: Ignore acpi_rsdp kernel parameter in a secure boot environment

2012-09-04 Thread Josh Boyer
On Tue, Sep 04, 2012 at 09:37:32PM +0100, Matthew Garrett wrote:
> On Tue, Sep 04, 2012 at 09:37:42PM +0100, Alan Cox wrote:
> > > Gotta say this capability name is confusing. Naming is
> > > CAP_PRE_SECURE_BOOT or something along the lines might be a better
> > > choice. When I just look at this name, I sure thought this
> > > CAP_SECURE_FIRMWARE true means it is a secure boot capable firmware.
> > 
> > Given there is nothing secure about it would it also be better to call it
> > AUTHENTICATED_BOOT ?
> 
> Well, there is the question of whether the sense is correct - you'll 
> only have this capability if you don't boot with any form of 
> authentication. CAP_KERNEL_ACCESS?

I'm fine with whatever name we come up with, but I'd like to avoid
bikeshedding it in every patch.  Maybe we could work on the naming
through comments to the patch that actually adds the capability?

josh


Re: [PATCH] xen/p2m: Fix one by off error in checking the P2M tree directory.

2012-09-04 Thread Konrad Rzeszutek Wilk
On Tue, Sep 04, 2012 at 04:17:14PM -0400, Konrad Rzeszutek Wilk wrote:
> We would the full P2M top directory from 0->MAX_DOMAIN_PAGES (inclusive).

.. We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
inclusive) when trying to figure out whether we can re-use some of the
P2M middle leafs.

> 
> Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
> we would try to use the 512th entry. Fortunately for us the p2m_top_index
> has a check for this:
> 
>  BUG_ON(pfn >= MAX_P2M_PFN);
> 
> which we hit and saw this:
> 
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) [ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:C ]
> (XEN) CPU:0
> (XEN) RIP:e033:[]
> (XEN) RFLAGS: 0212   EM: 1   CONTEXT: pv guest
> (XEN) rax: 81db5000   rbx: 81db4000   rcx: 
> (XEN) rdx: 00480211   rsi:    rdi: 81db4000
> (XEN) rbp: 81793db8   rsp: 81793d38   r8:  0800
> (XEN) r9:  4000   r10:    r11: 81db7000
> (XEN) r12: 0ff8   r13: 81df1ff8   r14: 81db6000
> (XEN) r15: 0ff8   cr0: 8005003b   cr4: 26f0
> (XEN) cr3: 000661795000   cr2: 
> 
> Fixes-Oracle-Bug: 14570662
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/xen/p2m.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 0bfaf5b..af11f00 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -695,7 +695,7 @@ bool __init early_can_reuse_p2m_middle(unsigned long set_pfn, unsigned long set_
>   if (p2m_index(set_pfn))
>   return false;
>  
> - for (pfn = 0; pfn <= MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE) {
> + for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_PER_PAGE) {
>   topidx = p2m_top_index(pfn);
>  
>   if (!p2m_top[topidx])
> -- 
> 1.7.7.6
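
The off-by-one in the hunk above can be modelled outside the kernel. The sketch
below is a simplified stand-in, not the p2m code: the function name and the
numbers used to call it are hypothetical, and the inner check only imitates the
BUG_ON(pfn >= MAX_P2M_PFN) quoted in the changelog.

```c
#include <assert.h>

/* Walk pfn from 0 towards limit in steps of step, using either the old
 * inclusive bound (pfn <= limit) or the fixed exclusive bound (pfn < limit).
 * Returns 1 if the loop body ever sees pfn >= limit, which is the condition
 * the range check in the changelog would trip on, and 0 otherwise. */
int loop_reaches_limit(unsigned long limit, unsigned long step, int inclusive)
{
    unsigned long pfn;

    assert(step > 0);
    for (pfn = 0; inclusive ? pfn <= limit : pfn < limit; pfn += step) {
        if (pfn >= limit)   /* models the out-of-range check */
            return 1;
    }
    return 0;
}
```

For a limit that is a multiple of the step, for example
loop_reaches_limit(4096, 512, 1), the inclusive bound runs one extra iteration
with pfn equal to the limit; with the exclusive bound the same call returns 0.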


Re: [PATCH v3 01/17] hashtable: introduce a small and naive hashtable

2012-09-04 Thread Steven Rostedt
On Tue, 2012-09-04 at 18:21 +0100, Pedro Alves wrote:
> On 09/04/2012 06:17 PM, Steven Rostedt wrote:
> > On Tue, 2012-09-04 at 17:40 +0100, Pedro Alves wrote:
> > 
> >> BTW, you can also go a step further and remove the need to close with
> >> double }}, with something like:
> >>
> >> #define do_for_each_ftrace_rec(pg, rec)                          \
> >> for (pg = ftrace_pages_start, rec = &pg->records[pg->index];     \
> >>      pg && rec == &pg->records[pg->index];                       \
> >>      pg = pg->next)                                              \
> >>   for (rec = pg->records; rec < &pg->records[pg->index]; rec++)
> >>
> > 
> > Yeah, but why bother? It's hidden in a macro, and the extra '{ }' shows
> > that this is something "special".
> 
> The point of both changes is that there's nothing special in the end
> at all.  It all just works...
> 

It would still fail on a 'break'. The 'while' macro tells us that it is
special, because in the end, it won't work.

-- Steve
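
The general concern about 'break' inside a two-level iteration macro can be
seen with a small sketch. Note the assumptions: this uses a plain nested-for
macro without the extra "rec ==" guard from the quoted proposal, so it says
nothing about whether that proposal handles 'break'; the struct, macro and
function names are hypothetical and are not the ftrace ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page of records; not the ftrace structures. */
struct demo_page {
    int records[4];
    int index;                 /* number of valid records */
    struct demo_page *next;
};

/* Naive two-level iterator: two plain nested for loops, no extra guard. */
#define for_each_demo_rec(pg, rec, start)                              \
    for (pg = (start); pg; pg = pg->next)                              \
        for (rec = pg->records; rec < &pg->records[pg->index]; rec++)

/* Count the records visited when the body breaks on the first match. */
static int visited_until(struct demo_page *start, int stop)
{
    struct demo_page *pg;
    int *rec;
    int visited = 0;

    assert(start != NULL);
    for_each_demo_rec(pg, rec, start) {
        visited++;
        if (*rec == stop)
            break;             /* exits the inner loop only */
    }
    return visited;
}

/* Fixture: two pages holding the records 1,2,3 and 4,5. */
int demo_visited(int stop)
{
    struct demo_page second = { {4, 5, 0, 0}, 2, NULL };
    struct demo_page first  = { {1, 2, 3, 0}, 3, &second };

    return visited_until(&first, stop);
}
```

Breaking on record 2 still visits records 4 and 5 on the next page, because
the 'break' only leaves the inner loop and the outer loop carries on.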




Re: [PATCH] mm: fix mmap overflow checking

2012-09-04 Thread Andrew Morton
On Tue, 4 Sep 2012 17:23:00 +0800
Wanlong Gao  wrote:

> POSIX said that if the file is a regular file and the value of "off"
> plus "len" exceeds the offset maximum established in the open file
> description associated with fildes, mmap should return EOVERFLOW.

That's what POSIX says, but what does Linux do?  It is important that
we precisely describe and understand the behaviour change, as there is
potential here to break existing applications.

I'm assuming that Linux presently permits the mmap() and then generates
SIGBUS if an access is attempted beyond the max file size?

>   /* offset overflow? */
> - if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
> -   return -EOVERFLOW;
> + if (off + len < off)
> + return -EOVERFLOW;

Well, this treats sizeof(off_t) as the "offset maximum established in
the open file".  But from my reading of the above excerpt, we should in
fact be checking against the underlying fs's s_maxbytes?
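
To make the distinction concrete, here is a userspace sketch of the two checks
being contrasted. It is not kernel code: the function names are hypothetical,
plain unsigned 64-bit integers stand in for the kernel's types, and the
maxbytes argument merely models a per-filesystem limit such as s_maxbytes.

```c
#include <errno.h>

/* Check from the quoted patch, modelled on unsigned arithmetic: only catches
 * the case where off + len wraps around the integer type. */
int check_wraparound(unsigned long long off, unsigned long long len)
{
    return (off + len < off) ? -EOVERFLOW : 0;
}

/* Alternative reading of the POSIX text: reject when off + len exceeds an
 * explicit offset maximum. Written as len > maxbytes - off so the comparison
 * itself cannot wrap. */
int check_against_limit(unsigned long long off, unsigned long long len,
                        unsigned long long maxbytes)
{
    if (off > maxbytes || len > maxbytes - off)
        return -EOVERFLOW;
    return 0;
}
```

A request at offset 1 TiB with a 4 KiB length passes the wraparound check, but
is rejected by the limit check if the modelled maximum is 4 GiB. That gap is
the behaviour difference the question above is asking about.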



Re: [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-09-04 Thread Tejun Heo
Hello,

On Tue, Sep 04, 2012 at 12:42:37PM -0700, Kent Overstreet wrote:
> You want to point me at the relevant workqueue code? I'd really like to
> see what you did there, it's entirely possible you're aware of some
> issue I'm not but if not I'd like to take a stab at it.

I was mistaken.  The issue was that adding @gfp_flags to
kthread_create() wasn't trivial involving updates to arch callbacks,
so the timer was added to side-step the issue.  So, yeah, if it can be
made to work w/o timer, I think that would be better.

Thanks.

-- 
tejun

