[PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching

2014-05-11 Thread Lai Jiangshan
Patch1-4: async worker destruction

Patch2 reduces the review burden. It will be easier to review the whole
patchset if we know destroy_worker() is forced to destroy idle workers only.

Patch5-10: worker attaching/detaching and simplify the workers management

The code which attaches a worker to the pool and detaches a worker from the pool
is open-coded inside create_worker()/destroy_worker().
The patchset moves this attaching/detaching code out and wraps it in
dedicated functions.

patch3-4 moves the detaching code out from destroy_worker(), so that
manager_mutex protects only the detaching code rather than
the whole worker-destruction path.

patch5-7 makes manager_mutex protect only the attaching code rather than the
whole worker-creation path.

patch8: renames manager_mutex to attach_mutex
patch9-10: move the attaching code out from create_worker() and use it for
rescuers.


Lai Jiangshan (10):
  workqueue: use manager lock only to protect worker_idr
  workqueue: destroy_worker() should destroy idle workers only
  workqueue: async worker destruction
  workqueue: destroy worker directly in the idle timeout handler
  workqueue: separate iteration role from worker_idr
  workqueue: convert worker_idr to worker_ida
  workqueue: narrow the protection range of manager_mutex
  workqueue: rename manager_mutex to attach_mutex
  workqueue: separate pool-attaching code out from create_worker()
  workqueue: use generic attach/detach routine for rescuers

 kernel/workqueue.c  |  401 ++-
 kernel/workqueue_internal.h |1 +
 2 files changed, 126 insertions(+), 276 deletions(-)

-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 03/10 V2] workqueue: async worker destruction

2014-05-11 Thread Lai Jiangshan
Worker destruction includes these steps:
	adjust the pool's stats
	remove the worker from the idle list
	detach the worker from the pool
	kthread_stop() to wait for the worker's task to exit
	free the worker struct

There is no essential work to do after kthread_stop(), which means
destroy_worker() doesn't need to wait for the worker's task to exit.
So we can remove kthread_stop() and free the worker struct in the
worker's exit path instead.

But put_unbound_pool() still needs to wait for all the workers'
destruction before destroying the pool; otherwise the exiting
workers may access the already-freed pool.

So we also move the "detach the worker" step to the exit path
and let put_unbound_pool() synchronize with it via
detach_completion.

The code of "detach the worker" is wrapped in a new function
"worker_detach_from_pool()".

The worker ID is now freed at detach time, which happens before the
worker is fully dead, so the dying worker's ID may be re-used for a
new worker. To avoid two or more workers having the same name, the
dying worker's task name is changed to "kworker_dying".

Since "detach the worker" is moved out of destroy_worker(),
destroy_worker() no longer requires manager_mutex.
The "lockdep_assert_held(&pool->manager_mutex)" in destroy_worker()
is therefore removed, and destroy_worker() is no longer called under
manager_mutex in put_unbound_pool().

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   65 +--
 1 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 752e109..465e751 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -163,6 +163,7 @@ struct worker_pool {
struct mutexmanager_arb;/* manager arbitration */
struct mutexmanager_mutex;  /* manager exclusion */
	struct idr		worker_idr;	/* M: worker IDs and iteration */
+   struct completion   *detach_completion; /* all workers detached */
 
struct workqueue_attrs  *attrs; /* I: worker attributes */
struct hlist_node   hash_node;  /* PL: unbound_pool_hash node */
@@ -1688,6 +1689,30 @@ static struct worker *alloc_worker(void)
 }
 
 /**
+ * worker_detach_from_pool() - detach the worker from the pool
+ * @worker: worker which is attached to its pool
+ * @pool: attached pool
+ *
+ * Undo the attaching which had been done in create_worker().  The
+ * caller worker must not access the pool after detaching unless it
+ * holds another reference to the pool.
+ */
+static void worker_detach_from_pool(struct worker *worker,
+   struct worker_pool *pool)
+{
+   struct completion *detach_completion = NULL;
+
+   mutex_lock(&pool->manager_mutex);
+   idr_remove(&pool->worker_idr, worker->id);
+   if (idr_is_empty(&pool->worker_idr))
+   detach_completion = pool->detach_completion;
+   mutex_unlock(&pool->manager_mutex);
+
+   if (detach_completion)
+   complete(detach_completion);
+}
+
+/**
  * create_worker - create a new workqueue worker
  * @pool: pool the new worker will belong to
  *
@@ -1815,13 +1840,12 @@ static int create_and_start_worker(struct worker_pool 
*pool)
  * The worker should be idle.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock) which is released and regrabbed.
+ * spin_lock_irq(pool->lock).
  */
 static void destroy_worker(struct worker *worker)
 {
struct worker_pool *pool = worker->pool;
 
-   lockdep_assert_held(&pool->manager_mutex);
lockdep_assert_held(&pool->lock);
 
/* sanity check frenzy */
@@ -1833,24 +1857,9 @@ static void destroy_worker(struct worker *worker)
pool->nr_workers--;
pool->nr_idle--;
 
-   /*
-* Once WORKER_DIE is set, the kworker may destroy itself at any
-* point.  Pin to ensure the task stays until we're done with it.
-*/
-   get_task_struct(worker->task);
-
list_del_init(&worker->entry);
worker->flags |= WORKER_DIE;
-
-   idr_remove(&pool->worker_idr, worker->id);
-
-   spin_unlock_irq(&pool->lock);
-
-   kthread_stop(worker->task);
-   put_task_struct(worker->task);
-   kfree(worker);
-
-   spin_lock_irq(&pool->lock);
+   wake_up_process(worker->task);
 }
 
 static void idle_worker_timeout(unsigned long __pool)
@@ -2289,6 +2298,10 @@ woke_up:
spin_unlock_irq(&pool->lock);
WARN_ON_ONCE(!list_empty(&worker->entry));
worker->task->flags &= ~PF_WQ_WORKER;
+
+   set_task_comm(worker->task, "kworker_dying");
+   worker_detach_from_pool(worker, pool);
+   kfree(worker);
return 0;
}
 
@@ -3561,6 +3574,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
 static void put_unbound_pool(struct worker_pool *pool)
 {
struct wor

Re: [BUG] sched_setattr() SCHED_DEADLINE hangs system

2014-05-11 Thread Michael Kerrisk (man-pages)
On 05/11/2014 04:54 PM, Michael Kerrisk (man-pages) wrote:
> [Dave: I wonder if there's anything trinity can add in the way of 
> a test here?]
> 
> Hi Peter,
> 
> This looks like another bug in sched_setattr(). Using the program
> below (which you might find generally helpful for testing), I'm 
> able to reliably freeze up my x64 (Intel Core i7-3520M Processor) 
> system for up to about a minute when I run with the following 
> command line:
> 
> $ time sudo ./t_sched_setattr d 18446744072 18446744072 18446744073
> 
> 'd' here means use SCHED_DEADLINE, then the remaining arguments
> are the Runtime, Deadline, and Period, expressed in *seconds*.
> (Those numbers, by the way, are just a little below 2^64.)
> 
> Aside from interpreting its command-line arguments, all that the 
> program does is call sched_setattr() and displays elapsed times.
> (By the way, on my system I see some weird effects for time(2), 
> presumably VDSO effects.)
> 
> Here's sample run:
> 
> time sudo ./t_sched_setattr d 18446744072 18446744072 18446744073
> Runtime  =  184467440720
> Deadline =  184467440720
> Period   =  184467440730
> About to call sched_setattr()
> Successful return from sched_setattr() [6 seconds]
> 
> real  0m40.421s
> user  0m3.097s
> sys   0m30.804s
> 
> After unfreezing, the machine is fine; while the program is running,
> the machine is pretty unresponsive.
> 
> I'm on kernel 3.15-rc4.

Hi Peter,

I realize my speculation was completely off the mark. time(2) really 
is reporting the truth, and the sched_setattr() call returns immediately.
But it looks like with these settings the deadline scheduler gets itself
into a confused state. The process chews up a vast amount of CPU time
for the few actions (including process teardown) that occur after
the sched_setattr() call, and since the SCHED_DEADLINE process has
priority over everything else, the system locks up.

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


[PATCH 09/10 V2] workqueue: separate pool-attaching code out from create_worker()

2014-05-11 Thread Lai Jiangshan
The attaching code is open-coded inside create_worker().
Separating it out into its own function makes the code clearer.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   54 ---
 1 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e6d9725..0ea0152 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,7 +66,7 @@ enum {
 *
 * Note that DISASSOCIATED should be flipped only while holding
 * attach_mutex to avoid changing binding state while
-* create_worker() is in progress.
+* worker_attach_to_pool() is in progress.
 */
POOL_DISASSOCIATED  = 1 << 2,   /* cpu can't serve workers */
POOL_FREEZING   = 1 << 3,   /* freeze in progress */
@@ -1668,6 +1668,38 @@ static struct worker *alloc_worker(void)
 }
 
 /**
+ * worker_attach_to_pool() - attach the worker to the pool
+ * @worker: worker to be attached
+ * @pool: the target pool
+ *
+ * Attach the worker to the pool, so that the concurrency management
+ * and cpu-binding of the worker stay coordinated with the pool
+ * across cpu-[un]hotplug.
+ */
+static void worker_attach_to_pool(struct worker *worker,
+  struct worker_pool *pool)
+{
+   mutex_lock(&pool->attach_mutex);
+
+   /*
+* set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
+* online CPUs.  It'll be re-applied when any of the CPUs come up.
+*/
+   set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+
+   /*
+* The pool->attach_mutex ensures %POOL_DISASSOCIATED remains
+* stable across this function.  See the comments above the
+* flag definition for details.
+*/
+   if (pool->flags & POOL_DISASSOCIATED)
+   worker->flags |= WORKER_UNBOUND;
+
+   list_add_tail(&worker->node, &pool->workers);
+
+   mutex_unlock(&pool->attach_mutex);
+}
+
+/**
  * worker_detach_from_pool() - detach the worker from the pool
  * @worker: worker which is attached to its pool
  * @pool: attached pool
@@ -1738,26 +1770,8 @@ static struct worker *create_worker(struct worker_pool 
*pool)
/* prevent userland from meddling with cpumask of workqueue workers */
worker->task->flags |= PF_NO_SETAFFINITY;
 
-   mutex_lock(&pool->attach_mutex);
-
-   /*
-* set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-* online CPUs.  It'll be re-applied when any of the CPUs come up.
-*/
-   set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
-
-   /*
-* The pool->attach_mutex ensures %POOL_DISASSOCIATED
-* remains stable across this function.  See the comments above the
-* flag definition for details.
-*/
-   if (pool->flags & POOL_DISASSOCIATED)
-   worker->flags |= WORKER_UNBOUND;
-
/* successful, attach the worker to the pool */
-   list_add_tail(&worker->node, &pool->workers);
-
-   mutex_unlock(&pool->attach_mutex);
+   worker_attach_to_pool(worker, pool);
 
return worker;
 
-- 
1.7.4.4



[PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida

2014-05-11 Thread Lai Jiangshan
We no longer iterate workers via worker_idr; it is used only for
allocating/freeing IDs, so we convert it to worker_ida.

ida_simple_get()/ida_simple_remove() do their own locking, so worker_ida
doesn't need manager_mutex protection. This removes the coupling,
and lets us move the ID removal out of worker_detach_from_pool().

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   20 
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b6cf4d9..9f7f4ef 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,10 +161,11 @@ struct worker_pool {
/* see manage_workers() for details on the two manager mutexes */
struct mutexmanager_arb;/* manager arbitration */
struct mutexmanager_mutex;  /* manager exclusion */
-   struct idr  worker_idr; /* M: worker IDs */
struct list_headworkers;/* M: attached workers */
struct completion   *detach_completion; /* all workers detached */
 
+   struct ida  worker_ida; /* worker IDs for task name */
+
struct workqueue_attrs  *attrs; /* I: worker attributes */
struct hlist_node   hash_node;  /* PL: unbound_pool_hash node */
	int			refcnt;		/* PL: refcnt for unbound pools */
@@ -1681,7 +1682,6 @@ static void worker_detach_from_pool(struct worker *worker,
struct completion *detach_completion = NULL;
 
mutex_lock(&pool->manager_mutex);
-   idr_remove(&pool->worker_idr, worker->id);
list_del(&worker->node);
if (list_empty(&pool->workers))
detach_completion = pool->detach_completion;
@@ -1712,11 +1712,8 @@ static struct worker *create_worker(struct worker_pool 
*pool)
 
lockdep_assert_held(&pool->manager_mutex);
 
-   /*
-* ID is needed to determine kthread name.  Allocate ID first
-* without installing the pointer.
-*/
-   id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
+   /* ID is needed to determine kthread name. */
+   id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
if (id < 0)
goto fail;
 
@@ -1757,8 +1754,6 @@ static struct worker *create_worker(struct worker_pool 
*pool)
if (pool->flags & POOL_DISASSOCIATED)
worker->flags |= WORKER_UNBOUND;
 
-   /* successful, commit the pointer to idr */
-   idr_replace(&pool->worker_idr, worker, worker->id);
/* successful, attach the worker to the pool */
list_add_tail(&worker->node, &pool->workers);
 
@@ -1766,7 +1761,7 @@ static struct worker *create_worker(struct worker_pool 
*pool)
 
 fail:
if (id >= 0)
-   idr_remove(&pool->worker_idr, id);
+   ida_simple_remove(&pool->worker_ida, id);
kfree(worker);
return NULL;
 }
@@ -2233,6 +2228,7 @@ woke_up:
worker->task->flags &= ~PF_WQ_WORKER;
 
set_task_comm(worker->task, "kworker_dying");
+   ida_simple_remove(&pool->worker_ida, worker->id);
worker_detach_from_pool(worker, pool);
kfree(worker);
return 0;
@@ -3469,9 +3465,9 @@ static int init_worker_pool(struct worker_pool *pool)
 
mutex_init(&pool->manager_arb);
mutex_init(&pool->manager_mutex);
-   idr_init(&pool->worker_idr);
INIT_LIST_HEAD(&pool->workers);
 
+   ida_init(&pool->worker_ida);
INIT_HLIST_NODE(&pool->hash_node);
pool->refcnt = 1;
 
@@ -3486,7 +3482,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
 {
struct worker_pool *pool = container_of(rcu, struct worker_pool, rcu);
 
-   idr_destroy(&pool->worker_idr);
+   ida_destroy(&pool->worker_ida);
free_workqueue_attrs(pool->attrs);
kfree(pool);
 }
-- 
1.7.4.4



[PATCH 07/10 V2] workqueue: narrow the protection range of manager_mutex

2014-05-11 Thread Lai Jiangshan
In create_worker(), pool->worker_ida is protected by the ida subsystem
itself via ida_simple_get()/ida_simple_remove(); it doesn't need
manager_mutex.

The struct worker allocation and the kthread creation are not visible
to anyone before the worker is attached, so they don't need
manager_mutex either.

Between attaching and starting, the worker is already attached to the
pool, so cpu hotplug will handle the cpu-binding for the worker
correctly. Hence manager_mutex isn't needed after attaching either.

The conclusion is that only the attaching operation needs manager_mutex,
so we narrow manager_mutex's protection section in create_worker().

Some comments about manager_mutex are removed, since we will later
rename it to attach_mutex and add worker_attach_to_pool(), which is
self-documenting.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   35 +--
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9f7f4ef..663de70 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1710,8 +1710,6 @@ static struct worker *create_worker(struct worker_pool 
*pool)
int id = -1;
char id_buf[16];
 
-   lockdep_assert_held(&pool->manager_mutex);
-
/* ID is needed to determine kthread name. */
id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
if (id < 0)
@@ -1740,6 +1738,8 @@ static struct worker *create_worker(struct worker_pool 
*pool)
/* prevent userland from meddling with cpumask of workqueue workers */
worker->task->flags |= PF_NO_SETAFFINITY;
 
+   mutex_lock(&pool->manager_mutex);
+
/*
 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
 * online CPUs.  It'll be re-applied when any of the CPUs come up.
@@ -1747,7 +1747,7 @@ static struct worker *create_worker(struct worker_pool 
*pool)
set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
/*
-* The caller is responsible for ensuring %POOL_DISASSOCIATED
+* The pool->manager_mutex ensures %POOL_DISASSOCIATED
 * remains stable across this function.  See the comments above the
 * flag definition for details.
 */
@@ -1757,6 +1757,8 @@ static struct worker *create_worker(struct worker_pool 
*pool)
/* successful, attach the worker to the pool */
list_add_tail(&worker->node, &pool->workers);
 
+   mutex_unlock(&pool->manager_mutex);
+
return worker;
 
 fail:
@@ -1794,8 +1796,6 @@ static int create_and_start_worker(struct worker_pool 
*pool)
 {
struct worker *worker;
 
-   mutex_lock(&pool->manager_mutex);
-
worker = create_worker(pool);
if (worker) {
spin_lock_irq(&pool->lock);
@@ -1803,8 +1803,6 @@ static int create_and_start_worker(struct worker_pool 
*pool)
spin_unlock_irq(&pool->lock);
}
 
-   mutex_unlock(&pool->manager_mutex);
-
return worker ? 0 : -ENOMEM;
 }
 
@@ -2002,8 +2000,6 @@ static bool manage_workers(struct worker *worker)
bool ret = false;
 
/*
-* Managership is governed by two mutexes - manager_arb and
-* manager_mutex.  manager_arb handles arbitration of manager role.
 * Anyone who successfully grabs manager_arb wins the arbitration
 * and becomes the manager.  mutex_trylock() on pool->manager_arb
 * failure while holding pool->lock reliably indicates that someone
@@ -2012,33 +2008,12 @@ static bool manage_workers(struct worker *worker)
 * grabbing manager_arb is responsible for actually performing
 * manager duties.  If manager_arb is grabbed and released without
 * actual management, the pool may stall indefinitely.
-*
-* manager_mutex is used for exclusion of actual management
-* operations.  The holder of manager_mutex can be sure that none
-* of management operations, including creation and destruction of
-* workers, won't take place until the mutex is released.  Because
-* manager_mutex doesn't interfere with manager role arbitration,
-* it is guaranteed that the pool's management, while may be
-* delayed, won't be disturbed by someone else grabbing
-* manager_mutex.
 */
if (!mutex_trylock(&pool->manager_arb))
return ret;
 
-   /*
-* With manager arbitration won, manager_mutex would be free in
-* most cases.  trylock first without dropping @pool->lock.
-*/
-   if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
-   spin_unlock_irq(&pool->lock);
-   mutex_lock(&pool->manager_mutex);
-   spin_lock_irq(&pool->lock);
-   ret = true;
-   }
-
ret |= maybe_create_worker(pool);
 
-   mutex_unlock(&po

[PATCH 10/10 V2] workqueue: use generic attach/detach routine for rescuers

2014-05-11 Thread Lai Jiangshan
There are several problems with the code by which rescuers bind
themselves to the pool's cpumask:
  1) It binds to the cpumask in a way different from normal workers, so
     we can't maintain normal and rescuer workers under the same
     framework.
  2) The cpu-binding code for rescuers is complicated.
  3) If one or more cpu hotplugs happen while the rescuer processes the
     scheduled works, the rescuer may not end up correctly bound to the
     pool's cpumask. This behavior is allowed, but not good. It would
     be better if the rescuer's cpumask were always kept coordinated
     with the pool across any cpu hotplug.

Using the generic attach/detach routines solves the above problems and
results in much simpler code.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   74 +--
 1 files changed, 8 insertions(+), 66 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0ea0152..099f02c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1588,70 +1588,6 @@ static void worker_leave_idle(struct worker *worker)
list_del_init(&worker->entry);
 }
 
-/**
- * worker_maybe_bind_and_lock - try to bind %current to worker_pool and lock it
- * @pool: target worker_pool
- *
- * Bind %current to the cpu of @pool if it is associated and lock @pool.
- *
- * Works which are scheduled while the cpu is online must at least be
- * scheduled to a worker which is bound to the cpu so that if they are
- * flushed from cpu callbacks while cpu is going down, they are
- * guaranteed to execute on the cpu.
- *
- * This function is to be used by unbound workers and rescuers to bind
- * themselves to the target cpu and may race with cpu going down or
- * coming online.  kthread_bind() can't be used because it may put the
- * worker to already dead cpu and set_cpus_allowed_ptr() can't be used
- * verbatim as it's best effort and blocking and pool may be
- * [dis]associated in the meantime.
- *
- * This function tries set_cpus_allowed() and locks pool and verifies the
- * binding against %POOL_DISASSOCIATED which is set during
- * %CPU_DOWN_PREPARE and cleared during %CPU_ONLINE, so if the worker
- * enters idle state or fetches works without dropping lock, it can
- * guarantee the scheduling requirement described in the first paragraph.
- *
- * CONTEXT:
- * Might sleep.  Called without any lock but returns with pool->lock
- * held.
- *
- * Return:
- * %true if the associated pool is online (@worker is successfully
- * bound), %false if offline.
- */
-static bool worker_maybe_bind_and_lock(struct worker_pool *pool)
-__acquires(&pool->lock)
-{
-   while (true) {
-   /*
-* The following call may fail, succeed or succeed
-* without actually migrating the task to the cpu if
-* it races with cpu hotunplug operation.  Verify
-* against POOL_DISASSOCIATED.
-*/
-   if (!(pool->flags & POOL_DISASSOCIATED))
-   set_cpus_allowed_ptr(current, pool->attrs->cpumask);
-
-   spin_lock_irq(&pool->lock);
-   if (pool->flags & POOL_DISASSOCIATED)
-   return false;
-   if (task_cpu(current) == pool->cpu &&
-   cpumask_equal(¤t->cpus_allowed, pool->attrs->cpumask))
-   return true;
-   spin_unlock_irq(&pool->lock);
-
-   /*
-* We've raced with CPU hot[un]plug.  Give it a breather
-* and retry migration.  cond_resched() is required here;
-* otherwise, we might deadlock against cpu_stop trying to
-* bring down the CPU on non-preemptive kernel.
-*/
-   cpu_relax();
-   cond_resched();
-   }
-}
-
 static struct worker *alloc_worker(void)
 {
struct worker *worker;
@@ -2343,8 +2279,9 @@ repeat:
 
spin_unlock_irq(&wq_mayday_lock);
 
-   /* migrate to the target cpu if possible */
-   worker_maybe_bind_and_lock(pool);
+   worker_attach_to_pool(rescuer, pool);
+
+   spin_lock_irq(&pool->lock);
rescuer->pool = pool;
 
/*
@@ -2357,6 +2294,11 @@ repeat:
move_linked_works(work, scheduled, &n);
 
process_scheduled_works(rescuer);
+   spin_unlock_irq(&pool->lock);
+
+   worker_detach_from_pool(rescuer, pool);
+
+   spin_lock_irq(&pool->lock);
 
/*
 * Put the reference grabbed by send_mayday().  @pool won't
-- 
1.7.4.4



[PATCH 05/10 V2] workqueue: separate iteration role from worker_idr

2014-05-11 Thread Lai Jiangshan
worker_idr has two duties: iteration (over attached workers) and worker
ID allocation. These two duties are not necessarily tied together. We
can separate them and use a list for tracking attached workers and for
iteration.

After the separation, we can add rescuer workers to the list for
iteration in the future. worker_idr can't hold rescuer workers, because
rescuers don't allocate IDs from worker_idr.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c  |   39 +--
 kernel/workqueue_internal.h |1 +
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 95695c3..b6cf4d9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,7 +161,8 @@ struct worker_pool {
/* see manage_workers() for details on the two manager mutexes */
struct mutexmanager_arb;/* manager arbitration */
struct mutexmanager_mutex;  /* manager exclusion */
-   struct idr  worker_idr; /* M: worker IDs and iteration */
+   struct idr  worker_idr; /* M: worker IDs */
+   struct list_headworkers;/* M: attached workers */
struct completion   *detach_completion; /* all workers detached */
 
struct workqueue_attrs  *attrs; /* I: worker attributes */
@@ -361,22 +362,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs 
*to,
else
 
 /**
- * for_each_pool_worker - iterate through all workers of a worker_pool
- * @worker: iteration cursor
- * @wi: integer used for iteration
- * @pool: worker_pool to iterate workers of
- *
- * This must be called with @pool->manager_mutex.
- *
- * The if/else clause exists only for the lockdep assertion and can be
- * ignored.
- */
-#define for_each_pool_worker(worker, wi, pool) \
-   idr_for_each_entry(&(pool)->worker_idr, (worker), (wi)) \
-   if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { 
} \
-   else
-
-/**
  * for_each_pwq - iterate through all pool_workqueues of the specified 
workqueue
  * @pwq: iteration cursor
  * @wq: the target workqueue
@@ -1674,6 +1659,7 @@ static struct worker *alloc_worker(void)
if (worker) {
INIT_LIST_HEAD(&worker->entry);
INIT_LIST_HEAD(&worker->scheduled);
+   INIT_LIST_HEAD(&worker->node);
/* on creation a worker is in !idle && prep state */
worker->flags = WORKER_PREP;
}
@@ -1696,7 +1682,8 @@ static void worker_detach_from_pool(struct worker *worker,
 
mutex_lock(&pool->manager_mutex);
idr_remove(&pool->worker_idr, worker->id);
-   if (idr_is_empty(&pool->worker_idr))
+   list_del(&worker->node);
+   if (list_empty(&pool->workers))
detach_completion = pool->detach_completion;
mutex_unlock(&pool->manager_mutex);
 
@@ -1772,6 +1759,8 @@ static struct worker *create_worker(struct worker_pool 
*pool)
 
/* successful, commit the pointer to idr */
idr_replace(&pool->worker_idr, worker, worker->id);
+   /* successful, attach the worker to the pool */
+   list_add_tail(&worker->node, &pool->workers);
 
return worker;
 
@@ -3481,6 +3470,7 @@ static int init_worker_pool(struct worker_pool *pool)
mutex_init(&pool->manager_arb);
mutex_init(&pool->manager_mutex);
idr_init(&pool->worker_idr);
+   INIT_LIST_HEAD(&pool->workers);
 
INIT_HLIST_NODE(&pool->hash_node);
pool->refcnt = 1;
@@ -3545,7 +3535,7 @@ static void put_unbound_pool(struct worker_pool *pool)
spin_unlock_irq(&pool->lock);
 
mutex_lock(&pool->manager_mutex);
-   if (!idr_is_empty(&pool->worker_idr))
+   if (!list_empty(&pool->workers))
pool->detach_completion = &detach_completion;
mutex_unlock(&pool->manager_mutex);
 
@@ -4530,7 +4520,6 @@ static void wq_unbind_fn(struct work_struct *work)
int cpu = smp_processor_id();
struct worker_pool *pool;
struct worker *worker;
-   int wi;
 
for_each_cpu_worker_pool(pool, cpu) {
WARN_ON_ONCE(cpu != smp_processor_id());
@@ -4545,7 +4534,7 @@ static void wq_unbind_fn(struct work_struct *work)
 * before the last CPU down must be on the cpu.  After
 * this, they may become diasporas.
 */
-   for_each_pool_worker(worker, wi, pool)
+   list_for_each_entry(worker, &pool->workers, node)
worker->flags |= WORKER_UNBOUND;
 
pool->flags |= POOL_DISASSOCIATED;
@@ -4591,7 +4580,6 @@ static void wq_unbind_fn(struct work_struct *work)
 static void rebind_workers(struct worker_pool *pool)
 {
struct worker *worker;
-   int wi;
 
lockdep_assert_held(&pool->manager_mutex);
 
@@ -4602,13 +4590,13 @@ static void rebind_workers(stru

[PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex

2014-05-11 Thread Lai Jiangshan
manager_mutex is now only used to protect attaching to the pool and
the pool->workers list. It protects pool->workers and the operations
based on this list, such as:
	cpu-binding for the workers on pool->workers
	concurrency management for the workers on pool->workers

So we can simply rename manager_mutex to attach_mutex without any
functional change.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c  |   38 +++---
 kernel/workqueue_internal.h |2 +-
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 663de70..e6d9725 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -65,7 +65,7 @@ enum {
 * be executing on any CPU.  The pool behaves as an unbound one.
 *
 * Note that DISASSOCIATED should be flipped only while holding
-* manager_mutex to avoid changing binding state while
+* attach_mutex to avoid changing binding state while
 * create_worker() is in progress.
 */
POOL_DISASSOCIATED  = 1 << 2,   /* cpu can't serve workers */
@@ -122,7 +122,7 @@ enum {
  *cpu or grabbing pool->lock is enough for read access.  If
  *POOL_DISASSOCIATED is set, it's identical to L.
  *
- * M: pool->manager_mutex protected.
+ * A: pool->attach_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -160,8 +160,8 @@ struct worker_pool {
 
/* see manage_workers() for details on the two manager mutexes */
struct mutexmanager_arb;/* manager arbitration */
-   struct mutexmanager_mutex;  /* manager exclusion */
-   struct list_headworkers;/* M: attached workers */
+   struct mutexattach_mutex;   /* attach/detach exclusion */
+   struct list_headworkers;/* A: attached workers */
struct completion   *detach_completion; /* all workers detached */
 
struct ida  worker_ida; /* worker IDs for task name */
@@ -1681,11 +1681,11 @@ static void worker_detach_from_pool(struct worker 
*worker,
 {
struct completion *detach_completion = NULL;
 
-   mutex_lock(&pool->manager_mutex);
+   mutex_lock(&pool->attach_mutex);
list_del(&worker->node);
if (list_empty(&pool->workers))
detach_completion = pool->detach_completion;
-   mutex_unlock(&pool->manager_mutex);
+   mutex_unlock(&pool->attach_mutex);
 
if (detach_completion)
complete(detach_completion);
@@ -1738,7 +1738,7 @@ static struct worker *create_worker(struct worker_pool 
*pool)
/* prevent userland from meddling with cpumask of workqueue workers */
worker->task->flags |= PF_NO_SETAFFINITY;
 
-   mutex_lock(&pool->manager_mutex);
+   mutex_lock(&pool->attach_mutex);
 
/*
 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
@@ -1747,7 +1747,7 @@ static struct worker *create_worker(struct worker_pool 
*pool)
set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
/*
-* The pool->manager_mutex ensures %POOL_DISASSOCIATED
+* The pool->attach_mutex ensures %POOL_DISASSOCIATED
 * remains stable across this function.  See the comments above the
 * flag definition for details.
 */
@@ -1757,7 +1757,7 @@ static struct worker *create_worker(struct worker_pool *pool)
/* successful, attach the worker to the pool */
list_add_tail(&worker->node, &pool->workers);
 
-   mutex_unlock(&pool->manager_mutex);
+   mutex_unlock(&pool->attach_mutex);
 
return worker;
 
@@ -3439,7 +3439,7 @@ static int init_worker_pool(struct worker_pool *pool)
(unsigned long)pool);
 
mutex_init(&pool->manager_arb);
-   mutex_init(&pool->manager_mutex);
+   mutex_init(&pool->attach_mutex);
INIT_LIST_HEAD(&pool->workers);
 
ida_init(&pool->worker_ida);
@@ -3505,10 +3505,10 @@ static void put_unbound_pool(struct worker_pool *pool)
WARN_ON(pool->nr_workers || pool->nr_idle);
spin_unlock_irq(&pool->lock);
 
-   mutex_lock(&pool->manager_mutex);
+   mutex_lock(&pool->attach_mutex);
if (!list_empty(&pool->workers))
pool->detach_completion = &detach_completion;
-   mutex_unlock(&pool->manager_mutex);
+   mutex_unlock(&pool->attach_mutex);
 
if (pool->detach_completion)
wait_for_completion(pool->detach_completion);
@@ -4495,11 +4495,11 @@ static void wq_unbind_fn(struct work_struct *work)
for_each_cpu_worker_pool(pool, cpu) {
WARN_ON_ONCE(cpu != smp_processor_id());
 
-   mutex_lock(&pool->manager_mutex);
+   mutex_lock(&pool->attach_mutex);
spin_lock_irq(&pool->lock);
 
/*
-* We've blocked all manager operations.  Make all workers
+
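The detach path above is a "last one out signals the waiter" pattern: each exiting worker unlinks itself under attach_mutex, and whichever worker empties the list completes detach_completion so that put_unbound_pool() can proceed. A minimal userspace sketch of the same idea, using pthreads and illustrative demo_* names (not the kernel API):

```c
#include <assert.h>
#include <pthread.h>

/* Toy stand-in for a worker_pool: a counter replaces the workers list,
 * a condvar replaces the detach_completion. */
struct demo_pool {
	pthread_mutex_t attach_mutex;
	int nr_attached;		/* stand-in for the workers list */
	pthread_cond_t all_detached;	/* stand-in for detach_completion */
};

/* ~ worker_detach_from_pool(): unlink under the mutex; the worker that
 * empties the pool signals the waiter. */
static void demo_detach(struct demo_pool *pool)
{
	pthread_mutex_lock(&pool->attach_mutex);
	if (--pool->nr_attached == 0)
		pthread_cond_signal(&pool->all_detached);	/* last one out */
	pthread_mutex_unlock(&pool->attach_mutex);
}

/* ~ the wait in put_unbound_pool(): block until every worker detached. */
static void demo_wait_all_detached(struct demo_pool *pool)
{
	pthread_mutex_lock(&pool->attach_mutex);
	while (pool->nr_attached > 0)
		pthread_cond_wait(&pool->all_detached, &pool->attach_mutex);
	pthread_mutex_unlock(&pool->attach_mutex);
}
```

The kernel uses a completion rather than a bare condvar, but the ordering guarantee is the same: the waiter cannot return while any worker is still linked.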

[PATCH 04/10 V2] workqueue: destroy worker directly in the idle timeout handler

2014-05-11 Thread Lai Jiangshan
Since destroy_worker() neither needs to sleep nor requires manager_mutex,
it can be called directly from the idle timeout handler. This lets us
remove POOL_MANAGE_WORKERS and maybe_destroy_worker() and simplify
manage_workers().

With POOL_MANAGE_WORKERS gone, worker_thread() no longer needs to test
whether it should manage the pool after processing works, so that test
branch is removed as well.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   69 
 1 files changed, 5 insertions(+), 64 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 465e751..95695c3 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -68,7 +68,6 @@ enum {
 * manager_mutex to avoid changing binding state while
 * create_worker() is in progress.
 */
-   POOL_MANAGE_WORKERS = 1 << 0,   /* need to manage workers */
POOL_DISASSOCIATED  = 1 << 2,   /* cpu can't serve workers */
POOL_FREEZING   = 1 << 3,   /* freeze in progress */
 
@@ -752,13 +751,6 @@ static bool need_to_create_worker(struct worker_pool *pool)
return need_more_worker(pool) && !may_start_working(pool);
 }
 
-/* Do I need to be the manager? */
-static bool need_to_manage_workers(struct worker_pool *pool)
-{
-   return need_to_create_worker(pool) ||
-   (pool->flags & POOL_MANAGE_WORKERS);
-}
-
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
@@ -1867,8 +1859,7 @@ static void idle_worker_timeout(unsigned long __pool)
struct worker_pool *pool = (void *)__pool;
 
spin_lock_irq(&pool->lock);
-
-   if (too_many_workers(pool)) {
+   while (too_many_workers(pool)) {
struct worker *worker;
unsigned long expires;
 
@@ -1876,15 +1867,13 @@ static void idle_worker_timeout(unsigned long __pool)
worker = list_entry(pool->idle_list.prev, struct worker, entry);
expires = worker->last_active + IDLE_WORKER_TIMEOUT;
 
-   if (time_before(jiffies, expires))
+   if (time_before(jiffies, expires)) {
mod_timer(&pool->idle_timer, expires);
-   else {
-   /* it's been idle for too long, wake up manager */
-   pool->flags |= POOL_MANAGE_WORKERS;
-   wake_up_worker(pool);
+   break;
}
-   }
 
+   destroy_worker(worker);
+   }
spin_unlock_irq(&pool->lock);
 }
 
@@ -2001,44 +1990,6 @@ restart:
 }
 
 /**
- * maybe_destroy_worker - destroy workers which have been idle for a while
- * @pool: pool to destroy workers for
- *
- * Destroy @pool workers which have been idle for longer than
- * IDLE_WORKER_TIMEOUT.
- *
- * LOCKING:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times.  Called only from manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
- */
-static bool maybe_destroy_workers(struct worker_pool *pool)
-{
-   bool ret = false;
-
-   while (too_many_workers(pool)) {
-   struct worker *worker;
-   unsigned long expires;
-
-   worker = list_entry(pool->idle_list.prev, struct worker, entry);
-   expires = worker->last_active + IDLE_WORKER_TIMEOUT;
-
-   if (time_before(jiffies, expires)) {
-   mod_timer(&pool->idle_timer, expires);
-   break;
-   }
-
-   destroy_worker(worker);
-   ret = true;
-   }
-
-   return ret;
-}
-
-/**
  * manage_workers - manage worker pool
  * @worker: self
  *
@@ -2101,13 +2052,6 @@ static bool manage_workers(struct worker *worker)
ret = true;
}
 
-   pool->flags &= ~POOL_MANAGE_WORKERS;
-
-   /*
-* Destroy and then create so that may_start_working() is true
-* on return.
-*/
-   ret |= maybe_destroy_workers(pool);
ret |= maybe_create_worker(pool);
 
mutex_unlock(&pool->manager_mutex);
@@ -2349,9 +2293,6 @@ recheck:
 
worker_set_flags(worker, WORKER_PREP, false);
 sleep:
-   if (unlikely(need_to_manage_workers(pool)) && manage_workers(worker))
-   goto recheck;
-
/*
 * pool->lock is held and there's no work to process and no need to
 * manage, sleep.  Workers are woken up only while holding
-- 
1.7.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
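The new idle_worker_timeout() loop above keeps destroying the oldest idle worker while the pool has too many and that worker's idle period has expired, and re-arms the timer otherwise. A userspace sketch of the policy with illustrative demo_* names; demo_too_many() mirrors the shape of the kernel's too_many_workers() check (MAX_IDLE_WORKERS_RATIO == 4, ignoring the manager adjustment), and DEMO_IDLE_TIMEOUT is an assumed constant:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_IDLE_TIMEOUT 300	/* assumed, in abstract time units */

struct demo_state {
	int nr_workers, nr_idle;
	unsigned long oldest_last_active;	/* all idle workers, simplified */
};

/* ~ too_many_workers(): idle > 2 and (idle - 2) * 4 >= busy */
static bool demo_too_many(const struct demo_state *s)
{
	int nr_busy = s->nr_workers - s->nr_idle;

	return s->nr_idle > 2 && (s->nr_idle - 2) * 4 >= nr_busy;
}

/* ~ the timer handler: destroy expired idle workers until the pool is
 * trimmed or the oldest one hasn't expired yet (then it would mod_timer()
 * and stop).  Returns how many workers a tick at @now destroys. */
static int demo_idle_timeout(struct demo_state *s, unsigned long now)
{
	int destroyed = 0;

	while (demo_too_many(s)) {
		if (now < s->oldest_last_active + DEMO_IDLE_TIMEOUT)
			break;			/* re-arm timer, done */
		s->nr_workers--;		/* ~ destroy_worker() */
		s->nr_idle--;
		destroyed++;
	}
	return destroyed;
}
```

With 10 workers of which 5 are idle and all long expired, the loop trims two workers and then stops, because (3 - 2) * 4 is below the remaining busy count.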


[PATCH 02/10 V2] workqueue: destroy_worker() should destroy idle workers only

2014-05-11 Thread Lai Jiangshan
We used to have a CPU online failure path where a worker was created
and then destroyed without ever being started: a worker was created for
the CPU coming online, and if the online operation failed, the worker
was shut down without being started.  That behavior has since changed;
the first worker is now created and started at the same time for the
CPU coming online.

This means the code already guarantees that destroy_worker() destroys
idle workers only.  And we don't want to allow it to destroy any
non-idle worker in the future; that would be bug-prone and extremely
hard to verify.  So we should explicitly force destroy_worker() to
destroy idle workers only.

Since destroy_worker() already destroys idle workers only, this patch
does not change any functionality; it only updates the comments and the
sanity checks.

In the sanity checks, we now refuse to destroy the worker
if !(worker->flags & WORKER_IDLE).

A worker that has entered idle has necessarily been started, so the
"worker->flags & WORKER_STARTED" check is removed; after that removal
WORKER_STARTED is entirely unused, so WORKER_STARTED itself is removed
too.

In the comments for create_worker(), "Create a new worker which is bound..."
becomes "... which is attached...", since this behavior is now called
attaching.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   17 +++--
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d38d07c..752e109 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -73,7 +73,6 @@ enum {
POOL_FREEZING   = 1 << 3,   /* freeze in progress */
 
/* worker flags */
-   WORKER_STARTED  = 1 << 0,   /* started */
WORKER_DIE  = 1 << 1,   /* die die die */
WORKER_IDLE = 1 << 2,   /* is idle */
WORKER_PREP = 1 << 3,   /* preparing to run works */
@@ -1692,9 +1691,8 @@ static struct worker *alloc_worker(void)
  * create_worker - create a new workqueue worker
  * @pool: pool the new worker will belong to
  *
- * Create a new worker which is bound to @pool.  The returned worker
- * can be started by calling start_worker() or destroyed using
- * destroy_worker().
+ * Create a new worker which is attached to @pool.
+ * The new worker must be started and enter idle via start_worker().
  *
  * CONTEXT:
  * Might sleep.  Does GFP_KERNEL allocations.
@@ -1778,7 +1776,6 @@ fail:
  */
 static void start_worker(struct worker *worker)
 {
-   worker->flags |= WORKER_STARTED;
worker->pool->nr_workers++;
worker_enter_idle(worker);
wake_up_process(worker->task);
@@ -1815,6 +1812,7 @@ static int create_and_start_worker(struct worker_pool *pool)
  * @worker: worker to be destroyed
  *
  * Destroy @worker and adjust @pool stats accordingly.
+ * The worker should be idle.
  *
  * CONTEXT:
  * spin_lock_irq(pool->lock) which is released and regrabbed.
@@ -1828,13 +1826,12 @@ static void destroy_worker(struct worker *worker)
 
/* sanity check frenzy */
if (WARN_ON(worker->current_work) ||
-   WARN_ON(!list_empty(&worker->scheduled)))
+   WARN_ON(!list_empty(&worker->scheduled)) ||
+   WARN_ON(!(worker->flags & WORKER_IDLE)))
return;
 
-   if (worker->flags & WORKER_STARTED)
-   pool->nr_workers--;
-   if (worker->flags & WORKER_IDLE)
-   pool->nr_idle--;
+   pool->nr_workers--;
+   pool->nr_idle--;
 
/*
 * Once WORKER_DIE is set, the kworker may destroy itself at any
-- 
1.7.4.4

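The tightened sanity check above makes "idle" a precondition of destruction, which in turn makes the nr_workers/nr_idle adjustments unconditional. A toy model of the same control flow, with the flag values mirroring the patch and everything else illustrative:

```c
#include <assert.h>

/* flag values as in the patch; WORKER_STARTED (1 << 0) is gone */
enum {
	WORKER_DIE  = 1 << 1,
	WORKER_IDLE = 1 << 2,
};

struct demo_worker { unsigned flags; int has_work; };

/* ~ destroy_worker() after the patch: returns 0 on success, -1 where
 * the kernel would WARN_ON() and bail out. */
static int demo_destroy_worker(struct demo_worker *w,
			       int *nr_workers, int *nr_idle)
{
	if (w->has_work || !(w->flags & WORKER_IDLE))
		return -1;		/* sanity check frenzy */

	(*nr_workers)--;		/* unconditional: idle implies started */
	(*nr_idle)--;
	w->flags |= WORKER_DIE;
	return 0;
}
```

A busy worker is rejected outright; an idle one is destroyed and both counters drop, with no WORKER_STARTED test anywhere.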


[PATCH 01/10 V2] workqueue: use manager lock only to protect worker_idr

2014-05-11 Thread Lai Jiangshan
worker_idr is tightly bound to the manager and is always and only
accessed in manager lock context, so we don't need pool->lock for it.

Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   34 ++
 1 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c3f076f..d38d07c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -124,8 +124,7 @@ enum {
  *cpu or grabbing pool->lock is enough for read access.  If
  *POOL_DISASSOCIATED is set, it's identical to L.
  *
- * MG: pool->manager_mutex and pool->lock protected.  Writes require both
- * locks.  Reads can happen under either lock.
+ * M: pool->manager_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -164,7 +163,7 @@ struct worker_pool {
/* see manage_workers() for details on the two manager mutexes */
struct mutexmanager_arb;/* manager arbitration */
struct mutexmanager_mutex;  /* manager exclusion */
-   struct idr  worker_idr; /* MG: worker IDs and iteration */
+   struct idr  worker_idr; /* M: worker IDs and iteration */
 
struct workqueue_attrs  *attrs; /* I: worker attributes */
struct hlist_node   hash_node;  /* PL: unbound_pool_hash node */
@@ -340,16 +339,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
   lockdep_is_held(&wq->mutex), \
   "sched RCU or wq->mutex should be held")
 
-#ifdef CONFIG_LOCKDEP
-#define assert_manager_or_pool_lock(pool)  \
-   WARN_ONCE(debug_locks &&\
- !lockdep_is_held(&(pool)->manager_mutex) &&   \
- !lockdep_is_held(&(pool)->lock),  \
- "pool->manager_mutex or ->lock should be held")
-#else
-#define assert_manager_or_pool_lock(pool)  do { } while (0)
-#endif
-
 #define for_each_cpu_worker_pool(pool, cpu)\
for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0];   \
 (pool) < &per_cpu(cpu_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
@@ -378,14 +367,14 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
  * @wi: integer used for iteration
  * @pool: worker_pool to iterate workers of
  *
- * This must be called with either @pool->manager_mutex or ->lock held.
+ * This must be called with @pool->manager_mutex.
  *
  * The if/else clause exists only for the lockdep assertion and can be
  * ignored.
  */
 #define for_each_pool_worker(worker, wi, pool) \
idr_for_each_entry(&(pool)->worker_idr, (worker), (wi)) \
-   if (({ assert_manager_or_pool_lock((pool)); false; })) { } \
+   if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
else
 
 /**
@@ -1725,13 +1714,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 * ID is needed to determine kthread name.  Allocate ID first
 * without installing the pointer.
 */
-   idr_preload(GFP_KERNEL);
-   spin_lock_irq(&pool->lock);
-
-   id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
-
-   spin_unlock_irq(&pool->lock);
-   idr_preload_end();
+   id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
if (id < 0)
goto fail;
 
@@ -1773,18 +1756,13 @@ static struct worker *create_worker(struct worker_pool *pool)
worker->flags |= WORKER_UNBOUND;
 
/* successful, commit the pointer to idr */
-   spin_lock_irq(&pool->lock);
idr_replace(&pool->worker_idr, worker, worker->id);
-   spin_unlock_irq(&pool->lock);
 
return worker;
 
 fail:
-   if (id >= 0) {
-   spin_lock_irq(&pool->lock);
+   if (id >= 0)
idr_remove(&pool->worker_idr, id);
-   spin_unlock_irq(&pool->lock);
-   }
kfree(worker);
return NULL;
 }
-- 
1.7.4.4

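The create_worker() change above keeps the classic two-step idr idiom even after dropping pool->lock: allocate the ID with a NULL pointer first (now simply with GFP_KERNEL, since manager_mutex allows sleeping), and commit the real pointer only once the worker is fully set up. A toy array-backed model of that idiom, with demo_* names standing in for the idr API:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_IDS 8

static void *demo_slots[DEMO_IDS];
static int demo_used[DEMO_IDS];

/* ~ idr_alloc(&idr, ptr, 0, 0, GFP_KERNEL): reserve the lowest free ID,
 * optionally with a NULL placeholder pointer. */
static int demo_idr_alloc(void *ptr)
{
	for (int id = 0; id < DEMO_IDS; id++) {
		if (!demo_used[id]) {
			demo_used[id] = 1;
			demo_slots[id] = ptr;
			return id;
		}
	}
	return -1;
}

/* ~ idr_replace(): commit the real pointer for an already-reserved ID. */
static void demo_idr_replace(void *ptr, int id)
{
	demo_slots[id] = ptr;
}

/* ~ idr_remove(): release the ID on the failure path. */
static void demo_idr_remove(int id)
{
	demo_used[id] = 0;
	demo_slots[id] = NULL;
}
```

Reserving the ID up front means the kthread name can embed it before the worker is visible to iteration; lookups between alloc and replace simply see NULL.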


Re: [PATCH v3 0/3] TI CPSW Cleanup

2014-05-11 Thread Mugunthan V N
On Monday 12 May 2014 10:21 AM, George Cherian wrote:
> This series does some minimal cleanups.
>   -Conversion of pr_*() to dev_*()
>   -Convert kzalloc to devm_kzalloc.
>
> No functional changes.
>
> v1 -> v2 Address review comments.
> v2 -> v3 Remove a stale commit comment.
>
> George Cherian (3):
>   driver net: cpsw: Convert pr_*() to dev_*() calls
>   net: davinci_mdio: Convert pr_err() to dev_err() call
>   drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().
>
>  drivers/net/ethernet/ti/cpsw.c  | 50 
> -
>  drivers/net/ethernet/ti/davinci_cpdma.c | 35 ---
>  drivers/net/ethernet/ti/davinci_mdio.c  |  2 +-
>  3 files changed, 38 insertions(+), 49 deletions(-)
>
Acked-by: Mugunthan V N 

Regards
Mugunthan V N


Re: [PATCH] sched: Distinguish sched_wakeup event when wake up a task which did schedule out or not.

2014-05-11 Thread Peter Zijlstra
On Sun, May 11, 2014 at 02:52:24PM -0400, Steven Rostedt wrote:
> On Sun, 11 May 2014 18:35:31 +0200
> Peter Zijlstra  wrote:
> 
> 
> > So if the wait side has already observed cond==false, then without the
> > wakeup, which still potentially has ->on_rq == true, it would block.
> > Therefore the wakeup is a _real_ wakeup.
> > 
> > We fundamentally cannot know, on the wake side, if the wait side has or
> > has not observed cond, and therefore the distinction you're trying to
> > make is a false one.
> 
> I believe you may be misunderstanding Dongsheng. It has nothing to do
> with the wake condition. But the "success" is basically saying, "did I
> move the task on to the run queue?". That's a relevant piece of
> information that the wake up event isn't currently showing.
> 
> Let me ask you this; with Donsheng's patch, will there ever be a
> sched_switch event when the wakeup event sees 'false' and the
> sched_switch event see the task with a state other than "R"? And if so,
> how did the task doing the wakeup event, wake up that task?

But that has nothing what so fucking ever to do with 'success'. Reusing
that trace argument for something entirely different is just retarded.




Re: [RFC PATCH 00/12 v2] A new CPU load metric for power-efficient scheduler: CPU ConCurrency

2014-05-11 Thread Peter Zijlstra
On Mon, May 12, 2014 at 02:16:49AM +0800, Yuyang Du wrote:

Yes, just what we need, more patches while we haven't had the time to
look at the old set yet :-(




Re: [Intel-gfx] 3.15-rc5: Regression in i915 driver?

2014-05-11 Thread Chris Wilson
On Sun, May 11, 2014 at 07:40:57PM +0200, Daniel Vetter wrote:
> On Sun, May 11, 2014 at 11:02 AM, Dave Airlie  wrote:
> > On 11 May 2014 18:28, Thomas Meyer  wrote:
> >> Hi,
> >>
> >> 3.14.3 works as expected.
> >> 3.15-rc5 shows a strange behaviour: When resuming from ram the X server
> >> seems to be disfunctional.
> >>
> >> I see this WARNING in the kernel log before suspend to ram in the early
> >> boot process:
> 
> Doesn't ring a bell really.

Same symptoms as
https://bugs.freedesktop.org/show_bug.cgi?id=76554
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH 03/20] perf tools: Use hpp formats to sort hist entries

2014-05-11 Thread Namhyung Kim
The sort entries are now wrapped in hpp functions, so use the hpp sort
list to sort hist entries.

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/hist.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 7f0236cea4fe..38373c986e97 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -432,11 +432,11 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 int64_t
 hist_entry__cmp(struct hist_entry *left, struct hist_entry *right)
 {
-   struct sort_entry *se;
+   struct perf_hpp_fmt *fmt;
int64_t cmp = 0;
 
-   list_for_each_entry(se, &hist_entry__sort_list, list) {
-   cmp = se->se_cmp(left, right);
+   perf_hpp__for_each_sort_list(fmt) {
+   cmp = fmt->cmp(left, right);
if (cmp)
break;
}
@@ -447,15 +447,11 @@ hist_entry__cmp(struct hist_entry *left, struct hist_entry *right)
 int64_t
 hist_entry__collapse(struct hist_entry *left, struct hist_entry *right)
 {
-   struct sort_entry *se;
+   struct perf_hpp_fmt *fmt;
int64_t cmp = 0;
 
-   list_for_each_entry(se, &hist_entry__sort_list, list) {
-   int64_t (*f)(struct hist_entry *, struct hist_entry *);
-
-   f = se->se_collapse ?: se->se_cmp;
-
-   cmp = f(left, right);
+   perf_hpp__for_each_sort_list(fmt) {
+   cmp = fmt->collapse(left, right);
if (cmp)
break;
}
-- 
1.9.2



[PATCH 01/20] perf tools: Add ->cmp(), ->collapse() and ->sort() to perf_hpp_fmt

2014-05-11 Thread Namhyung Kim
Those function pointers will be used to sort report output based on
the selected fields.  This is preparation for a later change.

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/hist.c   | 39 +++
 tools/perf/util/hist.h |  3 +++
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 0912805c08f4..d4a4f2e7eb43 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -192,6 +192,14 @@ static int hpp__entry_##_type(struct perf_hpp_fmt *_fmt __maybe_unused,\
  hpp_entry_scnprintf, true);   \
 }
 
+#define __HPP_SORT_FN(_type, _field)   \
+static int64_t hpp__sort_##_type(struct hist_entry *a, struct hist_entry *b)   \
+{  \
+   s64 __a = he_get_##_field(a);   \
+   s64 __b = he_get_##_field(b);   \
+   return __a - __b;   \
+}
+
 #define __HPP_ENTRY_RAW_FN(_type, _field)  \
 static u64 he_get_raw_##_field(struct hist_entry *he)  \
 {  \
@@ -206,16 +214,27 @@ static int hpp__entry_##_type(struct perf_hpp_fmt *_fmt __maybe_unused,   \
  hpp_entry_scnprintf, false);  \
 }
 
+#define __HPP_SORT_RAW_FN(_type, _field)   \
+static int64_t hpp__sort_##_type(struct hist_entry *a, struct hist_entry *b)   \
+{  \
+   s64 __a = he_get_raw_##_field(a);   \
+   s64 __b = he_get_raw_##_field(b);   \
+   return __a - __b;   \
+}
+
+
 #define HPP_PERCENT_FNS(_type, _str, _field, _min_width, _unit_width)  \
 __HPP_HEADER_FN(_type, _str, _min_width, _unit_width)  \
 __HPP_WIDTH_FN(_type, _min_width, _unit_width) \
 __HPP_COLOR_PERCENT_FN(_type, _field)  \
-__HPP_ENTRY_PERCENT_FN(_type, _field)
+__HPP_ENTRY_PERCENT_FN(_type, _field)  \
+__HPP_SORT_FN(_type, _field)
 
 #define HPP_RAW_FNS(_type, _str, _field, _min_width, _unit_width)  \
 __HPP_HEADER_FN(_type, _str, _min_width, _unit_width)  \
 __HPP_WIDTH_FN(_type, _min_width, _unit_width) \
-__HPP_ENTRY_RAW_FN(_type, _field)
+__HPP_ENTRY_RAW_FN(_type, _field)  \
+__HPP_SORT_RAW_FN(_type, _field)
 
 
 HPP_PERCENT_FNS(overhead, "Overhead", period, 8, 8)
@@ -227,19 +246,31 @@ HPP_PERCENT_FNS(overhead_guest_us, "guest usr", period_guest_us, 9, 8)
 HPP_RAW_FNS(samples, "Samples", nr_events, 12, 12)
 HPP_RAW_FNS(period, "Period", period, 12, 12)
 
+static int64_t hpp__nop_cmp(struct hist_entry *a __maybe_unused,
+   struct hist_entry *b __maybe_unused)
+{
+   return 0;
+}
+
 #define HPP__COLOR_PRINT_FNS(_name)\
{   \
.header = hpp__header_ ## _name,\
.width  = hpp__width_ ## _name, \
.color  = hpp__color_ ## _name, \
-   .entry  = hpp__entry_ ## _name  \
+   .entry  = hpp__entry_ ## _name, \
+   .cmp= hpp__nop_cmp, \
+   .collapse = hpp__nop_cmp,   \
+   .sort   = hpp__sort_ ## _name,  \
}
 
 #define HPP__PRINT_FNS(_name)  \
{   \
.header = hpp__header_ ## _name,\
.width  = hpp__width_ ## _name, \
-   .entry  = hpp__entry_ ## _name  \
+   .entry  = hpp__entry_ ## _name, \
+   .cmp= hpp__nop_cmp, \
+   .collapse = hpp__nop_cmp,   \
+   .sort   = hpp__sort_ ## _name,  \
}
 
 struct perf_hpp_fmt perf_hpp__format[] = {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 38c3e874c164..36dbe00e3cc8 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -160,6 +160,9 @@ struct perf_hpp_fmt {
 struct hist_entry *he);
int (*entry)(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 struct hist_entry *he);
+   int64_t (*cmp)(struct hist_entry *a, struct hist_entry *b);
+   int64_t (*collapse)(struct hist_entry *a, struct hist_entry *b);
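The __HPP_SORT_FN/__HPP_SORT_RAW_FN macros above stamp out one comparator function per field from a single template. A small self-contained sketch of that token-pasting technique, with illustrative demo_* names rather than the perf ones:

```c
#include <assert.h>
#include <stdint.h>

struct demo_entry { uint64_t period; uint64_t nr_events; };

/* ~ __HPP_SORT_FN: one macro expansion produces one per-field comparator,
 * named demo_sort_<field> via token pasting. */
#define DEMO_SORT_FN(_field)						\
static int64_t demo_sort_##_field(const struct demo_entry *a,		\
				  const struct demo_entry *b)		\
{									\
	return (int64_t)(a->_field - b->_field);			\
}

DEMO_SORT_FN(period)	/* defines demo_sort_period() */
DEMO_SORT_FN(nr_events)	/* defines demo_sort_nr_events() */
```

Each generated function can then be stored in a `->sort` function pointer, which is exactly how the perf_hpp_fmt table wires the comparators in.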

[PATCH 02/20] perf tools: Convert sort entries to hpp formats

2014-05-11 Thread Namhyung Kim
This is preparation for consolidating management of output fields and
sort keys.

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/hist.c   |  6 
 tools/perf/util/hist.h |  6 
 tools/perf/util/sort.c | 80 +++---
 3 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index d4a4f2e7eb43..a6eea666b443 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -284,6 +284,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
 };
 
 LIST_HEAD(perf_hpp__list);
+LIST_HEAD(perf_hpp__sort_list);
 
 
 #undef HPP__COLOR_PRINT_FNS
@@ -325,6 +326,11 @@ void perf_hpp__column_register(struct perf_hpp_fmt *format)
list_add_tail(&format->list, &perf_hpp__list);
 }
 
+void perf_hpp__register_sort_field(struct perf_hpp_fmt *format)
+{
+   list_add_tail(&format->sort_list, &perf_hpp__sort_list);
+}
+
 void perf_hpp__column_enable(unsigned col)
 {
BUG_ON(col >= PERF_HPP__MAX_INDEX);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 36dbe00e3cc8..eee154a41723 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -165,13 +165,18 @@ struct perf_hpp_fmt {
int64_t (*sort)(struct hist_entry *a, struct hist_entry *b);
 
struct list_head list;
+   struct list_head sort_list;
 };
 
 extern struct list_head perf_hpp__list;
+extern struct list_head perf_hpp__sort_list;
 
 #define perf_hpp__for_each_format(format) \
list_for_each_entry(format, &perf_hpp__list, list)
 
+#define perf_hpp__for_each_sort_list(format) \
+   list_for_each_entry(format, &perf_hpp__sort_list, sort_list)
+
 extern struct perf_hpp_fmt perf_hpp__format[];
 
 enum {
@@ -190,6 +195,7 @@ enum {
 void perf_hpp__init(void);
 void perf_hpp__column_register(struct perf_hpp_fmt *format);
 void perf_hpp__column_enable(unsigned col);
+void perf_hpp__register_sort_field(struct perf_hpp_fmt *format);
 
 typedef u64 (*hpp_field_fn)(struct hist_entry *he);
 typedef int (*hpp_callback_fn)(struct perf_hpp *hpp, bool front);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 635cd8f8b22e..b2829f947053 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2,6 +2,7 @@
 #include "hist.h"
 #include "comm.h"
 #include "symbol.h"
+#include "evsel.h"
 
 regex_tparent_regex;
 const char default_parent_pattern[] = "^sys_|^do_page_fault";
@@ -1027,10 +1028,80 @@ static struct sort_dimension memory_sort_dimensions[] = {
 
 #undef DIM
 
-static void __sort_dimension__add(struct sort_dimension *sd, enum sort_type idx)
+struct hpp_sort_entry {
+   struct perf_hpp_fmt hpp;
+   struct sort_entry *se;
+};
+
+static int __sort__hpp_header(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct perf_evsel *evsel)
+{
+   struct hpp_sort_entry *hse;
+   size_t len;
+
+   hse = container_of(fmt, struct hpp_sort_entry, hpp);
+   len = hists__col_len(&evsel->hists, hse->se->se_width_idx);
+
+   return scnprintf(hpp->buf, hpp->size, "%*s", len, hse->se->se_header);
+}
+
+static int __sort__hpp_width(struct perf_hpp_fmt *fmt,
+struct perf_hpp *hpp __maybe_unused,
+struct perf_evsel *evsel)
+{
+   struct hpp_sort_entry *hse;
+
+   hse = container_of(fmt, struct hpp_sort_entry, hpp);
+
+   return hists__col_len(&evsel->hists, hse->se->se_width_idx);
+}
+
+static int __sort__hpp_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+struct hist_entry *he)
+{
+   struct hpp_sort_entry *hse;
+   size_t len;
+
+   hse = container_of(fmt, struct hpp_sort_entry, hpp);
+   len = hists__col_len(he->hists, hse->se->se_width_idx);
+
+   return hse->se->se_snprintf(he, hpp->buf, hpp->size, len);
+}
+
+static int __sort_dimension__add_hpp(struct sort_dimension *sd)
+{
+   struct hpp_sort_entry *hse;
+
+   hse = malloc(sizeof(*hse));
+   if (hse == NULL) {
+   pr_err("Memory allocation failed\n");
+   return -1;
+   }
+
+   hse->se = sd->entry;
+   hse->hpp.header = __sort__hpp_header;
+   hse->hpp.width = __sort__hpp_width;
+   hse->hpp.entry = __sort__hpp_entry;
+   hse->hpp.color = NULL;
+
+   hse->hpp.cmp = sd->entry->se_cmp;
+   hse->hpp.collapse = sd->entry->se_collapse ? : sd->entry->se_cmp;
+   hse->hpp.sort = hse->hpp.collapse;
+
+   INIT_LIST_HEAD(&hse->hpp.list);
+   INIT_LIST_HEAD(&hse->hpp.sort_list);
+
+   perf_hpp__register_sort_field(&hse->hpp);
+   return 0;
+}
+
+static int __sort_dimension__add(struct sort_dimension *sd, enum sort_type idx)
 {
if (sd->taken)
-   return;
+   return 0;
+
+   if (__sort_dimension__add_hpp(sd) < 0)
+   return -1;
 
if (sd->entry->se_collapse)
sort__need_collapse = 1;
@@ -1040,6 +,8 @@ s
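The wrapping in __sort_dimension__add_hpp() above embeds the generic format struct (perf_hpp_fmt) inside a wrapper (hpp_sort_entry), and the callbacks recover the wrapper with container_of() to reach the underlying sort_entry. A minimal sketch of that embedding trick, with illustrative demo_* names:

```c
#include <assert.h>
#include <stddef.h>

/* userspace stand-in for the kernel's container_of() */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* ~ perf_hpp_fmt: the generic interface, just one callback here */
struct demo_fmt {
	int (*width)(struct demo_fmt *fmt);
};

/* ~ hpp_sort_entry: generic struct embedded first, private data after */
struct demo_sort_entry {
	struct demo_fmt hpp;
	int col_width;		/* ~ the wrapped sort_entry's width */
};

/* callback receives the generic pointer, recovers the wrapper */
static int demo_width(struct demo_fmt *fmt)
{
	struct demo_sort_entry *hse =
		demo_container_of(fmt, struct demo_sort_entry, hpp);

	return hse->col_width;
}
```

Because callers only ever see `struct demo_fmt *`, the same callback table can serve both native hpp columns and wrapped sort entries, which is the point of the consolidation.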

[PATCH 06/20] perf tools: Consolidate output field handling to hpp format routines

2014-05-11 Thread Namhyung Kim
Until now the hpp and sort functions have done similar jobs in
different ways.  Now that the sort functions are converted/wrapped into
hpp formats, the job can be done uniformly.

The perf_hpp__sort_list holds the hpp formats used to sort entries, and
perf_hpp__list holds the hpp formats used to print the output.

For backward compatibility, the 'overhead' field is automatically
prepended to the sort list, and then every field in the sort list is
added to the output list (if it's not already there).

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/hists.c |  4 ++--
 tools/perf/ui/gtk/hists.c  | 31 -
 tools/perf/ui/hist.c   | 28 +++
 tools/perf/ui/stdio/hist.c | 52 +-
 tools/perf/util/hist.c |  2 +-
 5 files changed, 47 insertions(+), 70 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index b0861e3e50a5..7bd8c0e81658 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -760,8 +760,8 @@ static int hist_browser__show_entry(struct hist_browser *browser,
if (!browser->b.navkeypressed)
width += 1;
 
-   hist_entry__sort_snprintf(entry, s, sizeof(s), browser->hists);
-   slsmg_write_nstring(s, width);
+   slsmg_write_nstring("", width);
+
++row;
++printed;
} else
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 91f10f3f6dd1..d5c336e1bb14 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -153,7 +153,6 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
struct perf_hpp_fmt *fmt;
GType col_types[MAX_COLUMNS];
GtkCellRenderer *renderer;
-   struct sort_entry *se;
GtkTreeStore *store;
struct rb_node *nd;
GtkWidget *view;
@@ -172,16 +171,6 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
perf_hpp__for_each_format(fmt)
col_types[nr_cols++] = G_TYPE_STRING;
 
-   list_for_each_entry(se, &hist_entry__sort_list, list) {
-   if (se->elide)
-   continue;
-
-   if (se == &sort_sym)
-   sym_col = nr_cols;
-
-   col_types[nr_cols++] = G_TYPE_STRING;
-   }
-
store = gtk_tree_store_newv(nr_cols, col_types);
 
view = gtk_tree_view_new();
@@ -199,16 +188,6 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
col_idx++, NULL);
}
 
-   list_for_each_entry(se, &hist_entry__sort_list, list) {
-   if (se->elide)
-   continue;
-
-   gtk_tree_view_insert_column_with_attributes(GTK_TREE_VIEW(view),
-   -1, se->se_header,
-   renderer, "text",
-   col_idx++, NULL);
-   }
-
for (col_idx = 0; col_idx < nr_cols; col_idx++) {
GtkTreeViewColumn *column;
 
@@ -253,16 +232,6 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
gtk_tree_store_set(store, &iter, col_idx++, s, -1);
}
 
-   list_for_each_entry(se, &hist_entry__sort_list, list) {
-   if (se->elide)
-   continue;
-
-   se->se_snprintf(h, s, ARRAY_SIZE(s),
-   hists__col_len(hists, se->se_width_idx));
-
-   gtk_tree_store_set(store, &iter, col_idx++, s, -1);
-   }
-
if (symbol_conf.use_callchain && sort__has_sym) {
if (callchain_param.mode == CHAIN_GRAPH_REL)
total = h->stat.period;
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index c65a7fd744c6..32d2dfd794d9 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -352,8 +352,18 @@ LIST_HEAD(perf_hpp__sort_list);
 #undef __HPP_ENTRY_RAW_FN
 
 
+void perf_hpp__setup_output_field(void);
+
 void perf_hpp__init(void)
 {
+   struct list_head *list;
+   int i;
+
+   for (i = 0; i < PERF_HPP__MAX_INDEX; i++) {
+   INIT_LIST_HEAD(&perf_hpp__format[i].list);
+   INIT_LIST_HEAD(&perf_hpp__format[i].sort_list);
+   }
+
perf_hpp__column_enable(PERF_HPP__OVERHEAD);
 
if (symbol_conf.show_cpu_utilization) {
@@ -371,6 +381,13 @@ void perf_hpp__init(void)
 
if (symbol_conf.show_total_period)
perf_hpp__column_enable(PERF_HPP__PERIOD);
+
+   /* prepend overhead field for backward compatiblity.  */
+   list = &perf_hpp__format[PERF_HPP__OVERHEAD].sort_list;
+  
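The consolidation described above ends with every sort key also becoming an output column unless it is already one (after 'overhead' has been prepended for compatibility). A sketch of that merge step with plain string arrays standing in for the two linked lists; all names are illustrative:

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX 8

static int demo_contains(const char **list, int n, const char *field)
{
	for (int i = 0; i < n; i++)
		if (strcmp(list[i], field) == 0)
			return 1;
	return 0;
}

/* ~ the intent of perf_hpp__setup_output_field(): append each sort
 * field missing from the output list; returns the new output length. */
static int demo_setup_output(const char **sort, int ns,
			     const char **out, int no)
{
	for (int i = 0; i < ns; i++)
		if (!demo_contains(out, no, sort[i]))
			out[no++] = sort[i];
	return no;
}
```

With output starting as {"overhead"} and sort keys {"overhead", "comm", "dso"}, the duplicate is skipped and the output list grows to three columns.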

[PATCH 04/20] perf tools: Support event grouping in hpp ->sort()

2014-05-11 Thread Namhyung Kim
Move the logic of hist_entry__sort_on_period into __hpp__sort() in
order to support event group reporting.

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/hist.c | 64 +++-
 1 file changed, 58 insertions(+), 6 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index a6eea666b443..c65a7fd744c6 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -116,6 +116,62 @@ int __hpp__fmt(struct perf_hpp *hpp, struct hist_entry *he,
return ret;
 }
 
+static int field_cmp(u64 field_a, u64 field_b)
+{
+   if (field_a > field_b)
+   return 1;
+   if (field_a < field_b)
+   return -1;
+   return 0;
+}
+
+static int __hpp__sort(struct hist_entry *a, struct hist_entry *b,
+  hpp_field_fn get_field)
+{
+   s64 ret;
+   int i, nr_members;
+   struct perf_evsel *evsel;
+   struct hist_entry *pair;
+   u64 *fields_a, *fields_b;
+
+   ret = field_cmp(get_field(a), get_field(b));
+   if (ret || !symbol_conf.event_group)
+   return ret;
+
+   evsel = hists_to_evsel(a->hists);
+   if (!perf_evsel__is_group_event(evsel))
+   return ret;
+
+   nr_members = evsel->nr_members;
+   fields_a = calloc(sizeof(*fields_a), nr_members);
+   fields_b = calloc(sizeof(*fields_b), nr_members);
+
+   if (!fields_a || !fields_b)
+   goto out;
+
+   list_for_each_entry(pair, &a->pairs.head, pairs.node) {
+   evsel = hists_to_evsel(pair->hists);
+   fields_a[perf_evsel__group_idx(evsel)] = get_field(pair);
+   }
+
+   list_for_each_entry(pair, &b->pairs.head, pairs.node) {
+   evsel = hists_to_evsel(pair->hists);
+   fields_b[perf_evsel__group_idx(evsel)] = get_field(pair);
+   }
+
+   for (i = 1; i < nr_members; i++) {
+   ret = fields_a[i] - fields_b[i];
+   if (ret)
+   break;
+   }
+
+out:
+   free(fields_a);
+   free(fields_b);
+
+   return ret;
+}
+
 #define __HPP_HEADER_FN(_type, _str, _min_width, _unit_width)		\
 static int hpp__header_##_type(struct perf_hpp_fmt *fmt __maybe_unused,	\
			       struct perf_hpp *hpp,			\
@@ -195,9 +251,7 @@ static int hpp__entry_##_type(struct perf_hpp_fmt *_fmt __maybe_unused,	\
 #define __HPP_SORT_FN(_type, _field)					\
 static int64_t hpp__sort_##_type(struct hist_entry *a, struct hist_entry *b)	\
 {									\
-	s64 __a = he_get_##_field(a);					\
-	s64 __b = he_get_##_field(b);					\
-	return __a - __b;						\
+	return __hpp__sort(a, b, he_get_##_field);			\
 }
 
 #define __HPP_ENTRY_RAW_FN(_type, _field)				\
@@ -217,9 +271,7 @@ static int hpp__entry_##_type(struct perf_hpp_fmt *_fmt __maybe_unused,	\
 #define __HPP_SORT_RAW_FN(_type, _field)				\
 static int64_t hpp__sort_##_type(struct hist_entry *a, struct hist_entry *b)	\
 {									\
-	s64 __a = he_get_raw_##_field(a);				\
-	s64 __b = he_get_raw_##_field(b);				\
-	return __a - __b;						\
+	return __hpp__sort(a, b, he_get_raw_##_field);			\
 }
 
 
-- 
1.9.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCHSET 00/20] perf report: Add -F option for specifying output fields (v5)

2014-05-11 Thread Namhyung Kim
Hello,

This is a patchset implementing the -F/--fields option to set up output
fields/columns, as Ingo requested.

The -F option can receive any sort keys that the -s option recognizes,
plus the following fields (names can be changed):

  overhead, overhead_sys, overhead_us, sample, period

The overhead_guest_sys and overhead_guest_us fields might be available
when you profile guest machines.

Output will be sorted in the order of the fields, and any sort keys
passed by the -s option will be added to the output field list
automatically.  If you want to change the sort order, you can give the
-s option in addition to -F.  To preserve the old behavior, the
'overhead' field is also prepended to the sort keys unless you give the
-F option explicitly.


  $ perf report -s dso,sym
  ...
  # Overhead  Shared Object  Symbol
  # ........  .............  ........................
  #
     13.75%  ld-2.17.so     [.] strcmp
     10.00%  abc            [.] a
     10.00%  abc            [.] b
     10.00%  abc            [.] c
      8.75%  abc            [.] main
      7.50%  libc-2.17.so   [.] _setjmp
      6.25%  abc            [.] _init
      6.25%  abc            [.] frame_dummy
      5.00%  abc            [.] __libc_csu_init
      5.00%  ld-2.17.so     [.] _dl_name_match_p
      3.75%  libc-2.17.so   [.] __new_exitfn
      2.50%  libc-2.17.so   [.] __cxa_atexit
      1.25%  ld-2.17.so     [.] _dl_check_map_versions
      1.25%  ld-2.17.so     [.] _dl_setup_hash
      1.25%  ld-2.17.so     [.] _dl_sysdep_start
      1.25%  ld-2.17.so     [.] brk
      1.25%  ld-2.17.so     [.] calloc@plt
      1.25%  ld-2.17.so     [.] dl_main
      1.25%  ld-2.17.so     [.] match_symbol
      1.25%  ld-2.17.so     [.] sbrk
      1.25%  ld-2.17.so     [.] strlen


  $ perf report -F sym,sample,overhead
  ...
  # Symbol   Samples  Overhead
  # ........................  .......  ........
  #
    [.] __cxa_atexit                2     2.50%
    [.] __libc_csu_init             4     5.00%
    [.] __new_exitfn                3     3.75%
    [.] _dl_check_map_versions      1     1.25%
    [.] _dl_name_match_p            4     5.00%
    [.] _dl_setup_hash              1     1.25%
    [.] _dl_sysdep_start            1     1.25%
    [.] _init                       5     6.25%
    [.] _setjmp                     6     7.50%
    [.] a                           8    10.00%
    [.] b                           8    10.00%
    [.] brk                         1     1.25%
    [.] c                           8    10.00%
    [.] calloc@plt                  1     1.25%
    [.] dl_main                     1     1.25%
    [.] frame_dummy                 5     6.25%
    [.] main                        7     8.75%
    [.] match_symbol                1     1.25%
    [.] sbrk                        1     1.25%
    [.] strcmp                     11    13.75%
    [.] strlen                      1     1.25%


  $ perf report -F sym,sample -s overhead
  ...
  # Symbol   Samples  Overhead
  # ........................  .......  ........
  #
    [.] strcmp                     11    13.75%
    [.] a                           8    10.00%
    [.] b                           8    10.00%
    [.] c                           8    10.00%
    [.] main                        7     8.75%
    [.] _setjmp                     6     7.50%
    [.] _init                       5     6.25%
    [.] frame_dummy                 5     6.25%
    [.] __libc_csu_init             4     5.00%
    [.] _dl_name_match_p            4     5.00%
    [.] __new_exitfn                3     3.75%
    [.] __cxa_atexit                2     2.50%
    [.] _dl_check_map_versions      1     1.25%
    [.] _dl_setup_hash              1     1.25%
    [.] _dl_sysdep_start            1     1.25%
    [.] brk                         1     1.25%
    [.] calloc@plt                  1     1.25%
    [.] dl_main                     1     1.25%
    [.] match_symbol                1     1.25%
    [.] sbrk                        1     1.25%
    [.] strlen                      1     1.25%


 * changes in v5:
  - add a testcase for hist output sorting

 * changes in v4:
  - fix a tui navigation bug
  - fix a bug in output change of perf diff
  - move call to perf_hpp__init() out of setup_browser()
  - fix alignment of some output fields on stdio

 * changes in v3:
  - rename to --fields option for consistency

[PATCH 10/20] perf tools: Call perf_hpp__init() before setting up GUI browsers

2014-05-11 Thread Namhyung Kim
So that it can be set up properly prior to setting up the output
fields.  That makes it easy to handle/warn about errors during the
setup since it doesn't need to be bothered with the GUI.

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-report.c| 6 +++---
 tools/perf/builtin-top.c   | 2 ++
 tools/perf/ui/browsers/hists.c | 2 --
 tools/perf/ui/gtk/hists.c  | 2 --
 tools/perf/ui/setup.c  | 2 --
 5 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8c9fbbdc6505..76d8d0b4f7f5 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -825,16 +825,16 @@ repeat:
goto error;
}
 
+   perf_hpp__init();
+
/* Force tty output for header output. */
if (report.header || report.header_only)
use_browser = 0;
 
if (strcmp(input_name, "-") != 0)
setup_browser(true);
-   else {
+   else
use_browser = 0;
-   perf_hpp__init();
-   }
 
if (report.header || report.header_only) {
perf_session__fprintf_info(session, stdout,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index bb2aa6645a7e..9309629394dd 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1147,6 +1147,8 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
/* display thread wants entries to be collapsed in a different tree */
sort__need_collapse = 1;
 
+   perf_hpp__init();
+
if (top.use_stdio)
use_browser = 0;
else if (top.use_tui)
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 3ed9212d2a63..69c2b0e536ab 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -661,8 +661,6 @@ __HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us)
 
 void hist_browser__init_hpp(void)
 {
-   perf_hpp__init();
-
perf_hpp__format[PERF_HPP__OVERHEAD].color =
hist_browser__hpp_color_overhead;
perf_hpp__format[PERF_HPP__OVERHEAD_SYS].color =
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 2237245bfac0..fd52669018ee 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -58,8 +58,6 @@ __HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us)
 
 void perf_gtk__init_hpp(void)
 {
-   perf_hpp__init();
-
perf_hpp__format[PERF_HPP__OVERHEAD].color =
perf_gtk__hpp_color_overhead;
perf_hpp__format[PERF_HPP__OVERHEAD_SYS].color =
diff --git a/tools/perf/ui/setup.c b/tools/perf/ui/setup.c
index 5df5140a9f29..ba51fa8a1176 100644
--- a/tools/perf/ui/setup.c
+++ b/tools/perf/ui/setup.c
@@ -86,8 +86,6 @@ void setup_browser(bool fallback_to_pager)
use_browser = 0;
if (fallback_to_pager)
setup_pager();
-
-   perf_hpp__init();
break;
}
 }
-- 
1.9.2



[PATCH 15/20] perf diff: Add missing setup_output_field()

2014-05-11 Thread Namhyung Kim
Although the perf diff command itself doesn't make use of the --fields
option, it still needs to call the function since the output only
works that way.

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-diff.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index f3b10dcf6838..670d191bec31 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -1155,6 +1155,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix __maybe_unused)
if (setup_sorting() < 0)
usage_with_options(diff_usage, options);
 
+   if (setup_output_field() < 0)
+   usage_with_options(diff_usage, options);
+
setup_pager();
 
sort__setup_elide(NULL);
-- 
1.9.2



[PATCH 14/20] perf top: Add --fields option to specify output fields

2014-05-11 Thread Namhyung Kim
The --fields option allows the user to set up output fields in any
order.  It can receive any sort keys and the following (hpp) fields:

  overhead, overhead_sys, overhead_us, sample and period

If guest profiling is enabled, overhead_guest_{sys,us} will be
available too.

For more information, please see the previous patch "perf report:
Add -F option to specify output fields".

Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-top.txt | 9 +
 tools/perf/builtin-top.c  | 7 +++
 2 files changed, 16 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 64ed79c43639..feac28017419 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -115,6 +115,15 @@ Default is to monitor all CPUS.
Sort by key(s): pid, comm, dso, symbol, parent, srcline, weight,
local_weight, abort, in_tx, transaction
 
+--fields=::
+   Specify output field - multiple keys can be specified in CSV format.
+   Following fields are available:
+   overhead, overhead_sys, overhead_us, sample and period.
+   Also it can contain any sort key(s).
+
+   By default, every sort key not specified in --fields will be appended
+   automatically.
+
 -n::
 --show-nr-samples::
Show a column with the number of samples.
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 9309629394dd..7d133dff5e15 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1085,6 +1085,8 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
   "sort by key(s): pid, comm, dso, symbol, parent, weight, local_weight,"
   " abort, in_tx, transaction"),
+   OPT_STRING(0, "fields", &field_order, "key[,keys...]",
+  "output field(s): overhead, period, sample plus all of sort keys"),
OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
"Show a column with the number of samples"),
OPT_CALLBACK_NOOPT('g', NULL, &top.record_opts,
@@ -1149,6 +1151,11 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 
perf_hpp__init();
 
+   if (setup_output_field() < 0) {
+   parse_options_usage(top_usage, options, "fields", 0);
+   goto out_delete_evlist;
+   }
+
if (top.use_stdio)
use_browser = 0;
else if (top.use_tui)
-- 
1.9.2



[PATCH 05/20] perf tools: Use hpp formats to sort final output

2014-05-11 Thread Namhyung Kim
Convert output sorting function to use ->sort hpp functions.

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/hist.c | 62 +++---
 1 file changed, 8 insertions(+), 54 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 38373c986e97..c99ae4dd973e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -564,64 +564,18 @@ void hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
}
 }
 
-/*
- * reverse the map, sort on period.
- */
-
-static int period_cmp(u64 period_a, u64 period_b)
+static int hist_entry__sort(struct hist_entry *a, struct hist_entry *b)
 {
-   if (period_a > period_b)
-   return 1;
-   if (period_a < period_b)
-   return -1;
-   return 0;
-}
-
-static int hist_entry__sort_on_period(struct hist_entry *a,
- struct hist_entry *b)
-{
-   int ret;
-   int i, nr_members;
-   struct perf_evsel *evsel;
-   struct hist_entry *pair;
-   u64 *periods_a, *periods_b;
-
-   ret = period_cmp(a->stat.period, b->stat.period);
-   if (ret || !symbol_conf.event_group)
-   return ret;
-
-   evsel = hists_to_evsel(a->hists);
-   nr_members = evsel->nr_members;
-   if (nr_members <= 1)
-   return ret;
-
-   periods_a = zalloc(sizeof(periods_a) * nr_members);
-   periods_b = zalloc(sizeof(periods_b) * nr_members);
-
-   if (!periods_a || !periods_b)
-   goto out;
-
-   list_for_each_entry(pair, &a->pairs.head, pairs.node) {
-   evsel = hists_to_evsel(pair->hists);
-   periods_a[perf_evsel__group_idx(evsel)] = pair->stat.period;
-   }
-
-   list_for_each_entry(pair, &b->pairs.head, pairs.node) {
-   evsel = hists_to_evsel(pair->hists);
-   periods_b[perf_evsel__group_idx(evsel)] = pair->stat.period;
-   }
+   struct perf_hpp_fmt *fmt;
+   int64_t cmp = 0;
 
-   for (i = 1; i < nr_members; i++) {
-   ret = period_cmp(periods_a[i], periods_b[i]);
-   if (ret)
+   perf_hpp__for_each_format(fmt) {
+   cmp = fmt->sort(a, b);
+   if (cmp)
break;
}
 
-out:
-   free(periods_a);
-   free(periods_b);
-
-   return ret;
+   return cmp;
 }
 
 static void hists__reset_filter_stats(struct hists *hists)
@@ -669,7 +623,7 @@ static void __hists__insert_output_entry(struct rb_root *entries,
parent = *p;
iter = rb_entry(parent, struct hist_entry, rb_node);
 
-   if (hist_entry__sort_on_period(he, iter) > 0)
+   if (hist_entry__sort(he, iter) > 0)
p = &(*p)->rb_left;
else
p = &(*p)->rb_right;
-- 
1.9.2



[PATCH 17/20] perf hists: Reset width of output fields with header length

2014-05-11 Thread Namhyung Kim
Some fields missed setting a default column length, which broke
alignment in --stdio output.  Add perf_hpp__reset_width() to set it to
a sane default value.

Note that this change will ignore -w/--column-widths option for now.

Before:
  $ perf report -F cpu,comm,overhead --stdio
  ...
  # CPU  Command  Overhead
  #   ...  
  #
0  firefox 2.65%
0  kworker/0:0 1.45%
0  swapper 5.52%
0 synergys 0.92%
1  firefox 4.54%

After:
  # CPU  Command  Overhead
  # ...  ...  
  #
  0  firefox 2.65%
  0  kworker/0:0 1.45%
  0  swapper 5.52%
  0 synergys 0.92%
  1  firefox 4.54%

Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/stdio/hist.c | 21 +++--
 tools/perf/util/hist.h |  1 +
 tools/perf/util/sort.c | 12 
 3 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index d2934659fd07..0363b19930ed 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -370,12 +370,10 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
  int max_cols, float min_pcnt, FILE *fp)
 {
struct perf_hpp_fmt *fmt;
-   struct sort_entry *se;
struct rb_node *nd;
size_t ret = 0;
unsigned int width;
const char *sep = symbol_conf.field_sep;
-   const char *col_width = symbol_conf.col_width_list_str;
int nr_rows = 0;
char bf[96];
struct perf_hpp dummy_hpp = {
@@ -388,22 +386,9 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
 
init_rem_hits();
 
-   list_for_each_entry(se, &hist_entry__sort_list, list) {
-   if (se->elide)
-   continue;
-   width = strlen(se->se_header);
-   if (symbol_conf.col_width_list_str) {
-   if (col_width) {
-   hists__set_col_len(hists, se->se_width_idx,
-  atoi(col_width));
-   col_width = strchr(col_width, ',');
-   if (col_width)
-   ++col_width;
-   }
-   }
-   if (!hists__new_col_len(hists, se->se_width_idx, width))
-   width = hists__col_len(hists, se->se_width_idx);
-   }
+
+   perf_hpp__for_each_format(fmt)
+   perf_hpp__reset_width(fmt, hists);
 
if (!show_header)
goto print_entries;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index f67feb432a44..034db761630e 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -202,6 +202,7 @@ void perf_hpp__append_sort_keys(void);
 bool perf_hpp__is_sort_entry(struct perf_hpp_fmt *format);
 bool perf_hpp__same_sort_entry(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b);
 bool perf_hpp__should_skip(struct perf_hpp_fmt *format);
+void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists);
 
 typedef u64 (*hpp_field_fn)(struct hist_entry *he);
 typedef int (*hpp_callback_fn)(struct perf_hpp *hpp, bool front);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 9dc33df4f9a6..d4502db36cba 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1088,6 +1088,18 @@ bool perf_hpp__same_sort_entry(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b)
return hse_a->se == hse_b->se;
 }
 
+void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists)
+{
+   struct hpp_sort_entry *hse;
+
+   if (!perf_hpp__is_sort_entry(fmt))
+   return;
+
+   hse = container_of(fmt, struct hpp_sort_entry, hpp);
+   hists__new_col_len(hists, hse->se->se_width_idx,
+  strlen(hse->se->se_header));
+}
+
 static int __sort__hpp_header(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
  struct perf_evsel *evsel)
 {
-- 
1.9.2



[PATCH 11/20] perf report: Add -F option to specify output fields

2014-05-11 Thread Namhyung Kim
The -F/--fields option allows the user to set up output fields in any
order.  It can receive any sort keys and the following (hpp) fields:

  overhead, overhead_sys, overhead_us, sample and period

If guest profiling is enabled, overhead_guest_{sys,us} will be
available too.

The output fields also affect the sort order unless you give the
-s/--sort option.  Any keys specified with the -s option will also be
added to the output field list automatically.

  $ perf report -F sym,sample,overhead
  ...
  # Symbol   Samples  Overhead
  # ........................  .......  ........
  #
    [.] __cxa_atexit                2     2.50%
    [.] __libc_csu_init             4     5.00%
    [.] __new_exitfn                3     3.75%
    [.] _dl_check_map_versions      1     1.25%
    [.] _dl_name_match_p            4     5.00%
    [.] _dl_setup_hash              1     1.25%
    [.] _dl_sysdep_start            1     1.25%
    [.] _init                       5     6.25%
    [.] _setjmp                     6     7.50%
    [.] a                           8    10.00%
    [.] b                           8    10.00%
    [.] brk                         1     1.25%
    [.] c                           8    10.00%

Note that the example output above was captured after applying the
next patch, which fixes the sort/comparison behavior.

Requested-by: Ingo Molnar 
Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-report.txt |  10 ++
 tools/perf/builtin-report.c  |   7 ++
 tools/perf/ui/hist.c |  61 +--
 tools/perf/util/hist.h   |   5 +
 tools/perf/util/sort.c   | 180 ++-
 tools/perf/util/sort.h   |   2 +
 6 files changed, 255 insertions(+), 10 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 09af66298564..8adbadf34b37 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -98,6 +98,16 @@ OPTIONS
And default sort keys are changed to comm, dso_from, symbol_from, dso_to
and symbol_to, see '--branch-stack'.
 
+-F::
+--fields=::
+   Specify output field - multiple keys can be specified in CSV format.
+   Following fields are available:
+   overhead, overhead_sys, overhead_us, sample and period.
+   Also it can contain any sort key(s).
+
+   By default, every sort key not specified in -F will be appended
+   automatically.
+
 -p::
 --parent=::
 A regex filter to identify parent. The parent is a caller of this
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 76d8d0b4f7f5..39c9b3d2054c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -703,6 +703,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
   " dso_to, dso_from, symbol_to, symbol_from, mispredict,"
   " weight, local_weight, mem, symbol_daddr, dso_daddr, tlb, "
   "snoop, locked, abort, in_tx, transaction"),
+   OPT_STRING('F', "fields", &field_order, "key[,keys...]",
+  "output field(s): overhead, period, sample plus all of sort keys"),
OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
"Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", &parent_pattern, "regex",
@@ -827,6 +829,11 @@ repeat:
 
perf_hpp__init();
 
+   if (setup_output_field() < 0) {
+   parse_options_usage(report_usage, options, "F", 1);
+   goto error;
+   }
+
/* Force tty output for header output. */
if (report.header || report.header_only)
use_browser = 0;
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index f3e96463550b..f51cba43e9e7 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -340,8 +340,6 @@ LIST_HEAD(perf_hpp__sort_list);
 #undef __HPP_ENTRY_RAW_FN
 
 
-void perf_hpp__setup_output_field(void);
-
 void perf_hpp__init(void)
 {
struct list_head *list;
@@ -357,6 +355,12 @@ void perf_hpp__init(void)
INIT_LIST_HEAD(&fmt->sort_list);
}
 
+   /*
+* If user specified field order, no need to setup default fields.
+*/
+   if (field_order)
+   return;
+
perf_hpp__column_enable(PERF_HPP__OVERHEAD);
 
if (symbol_conf.show_cpu_utilization) {
@@ -379,8 +383,6 @@ void perf_hpp__init(void)
list = &perf_hpp__format[PERF_HPP__OVERHEAD].sort_list;
if (list_empty(list))
list_add(list, &perf_hpp__sort_list);
-
-   perf_hpp__setup_output_field();
 }
 
 void perf_hpp__column_register(struct perf_hpp_fmt *format)
@@ -405,8 +407,55 @@ void perf_hpp__setup_output_fiel

[PATCH 12/20] perf tools: Add ->sort() member to struct sort_entry

2014-05-11 Thread Namhyung Kim
Currently, what a sort_entry does is just identify hist entries so
that they can be grouped properly.  However, with -F option support,
it needs to sort entries appropriately so they can be shown to users.
So add a ->sort() member to do it.

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/sort.c | 27 ++-
 tools/perf/util/sort.h |  1 +
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 639dd49f2884..1e7b80e517d5 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -98,6 +98,12 @@ sort__comm_collapse(struct hist_entry *left, struct hist_entry *right)
return comm__str(right->comm) - comm__str(left->comm);
 }
 
+static int64_t
+sort__comm_sort(struct hist_entry *left, struct hist_entry *right)
+{
+   return strcmp(comm__str(right->comm), comm__str(left->comm));
+}
+
 static int hist_entry__comm_snprintf(struct hist_entry *he, char *bf,
 size_t size, unsigned int width)
 {
@@ -108,6 +114,7 @@ struct sort_entry sort_comm = {
.se_header  = "Command",
.se_cmp = sort__comm_cmp,
.se_collapse= sort__comm_collapse,
+   .se_sort= sort__comm_sort,
.se_snprintf= hist_entry__comm_snprintf,
.se_width_idx   = HISTC_COMM,
 };
@@ -121,7 +128,7 @@ static int64_t _sort__dso_cmp(struct map *map_l, struct map *map_r)
const char *dso_name_l, *dso_name_r;
 
if (!dso_l || !dso_r)
-   return cmp_null(dso_l, dso_r);
+   return cmp_null(dso_r, dso_l);
 
if (verbose) {
dso_name_l = dso_l->long_name;
@@ -137,7 +144,7 @@ static int64_t _sort__dso_cmp(struct map *map_l, struct map *map_r)
 static int64_t
 sort__dso_cmp(struct hist_entry *left, struct hist_entry *right)
 {
-   return _sort__dso_cmp(left->ms.map, right->ms.map);
+   return _sort__dso_cmp(right->ms.map, left->ms.map);
 }
 
 static int _hist_entry__dso_snprintf(struct map *map, char *bf,
@@ -209,6 +216,15 @@ sort__sym_cmp(struct hist_entry *left, struct hist_entry *right)
return _sort__sym_cmp(left->ms.sym, right->ms.sym);
 }
 
+static int64_t
+sort__sym_sort(struct hist_entry *left, struct hist_entry *right)
+{
+   if (!left->ms.sym || !right->ms.sym)
+   return cmp_null(left->ms.sym, right->ms.sym);
+
+   return strcmp(right->ms.sym->name, left->ms.sym->name);
+}
+
 static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
 u64 ip, char level, char *bf, size_t size,
 unsigned int width)
@@ -255,6 +271,7 @@ static int hist_entry__sym_snprintf(struct hist_entry *he, char *bf,
 struct sort_entry sort_sym = {
.se_header  = "Symbol",
.se_cmp = sort__sym_cmp,
+   .se_sort= sort__sym_sort,
.se_snprintf= hist_entry__sym_snprintf,
.se_width_idx   = HISTC_SYMBOL,
 };
@@ -282,7 +299,7 @@ sort__srcline_cmp(struct hist_entry *left, struct hist_entry *right)
map__rip_2objdump(map, right->ip));
}
}
-   return strcmp(left->srcline, right->srcline);
+   return strcmp(right->srcline, left->srcline);
 }
 
 static int hist_entry__srcline_snprintf(struct hist_entry *he, char *bf,
@@ -310,7 +327,7 @@ sort__parent_cmp(struct hist_entry *left, struct hist_entry *right)
if (!sym_l || !sym_r)
return cmp_null(sym_l, sym_r);
 
-   return strcmp(sym_l->name, sym_r->name);
+   return strcmp(sym_r->name, sym_l->name);
 }
 
 static int hist_entry__parent_snprintf(struct hist_entry *he, char *bf,
@@ -1125,7 +1142,7 @@ __sort_dimension__alloc_hpp(struct sort_dimension *sd)
 
hse->hpp.cmp = sd->entry->se_cmp;
hse->hpp.collapse = sd->entry->se_collapse ? : sd->entry->se_cmp;
-   hse->hpp.sort = hse->hpp.collapse;
+   hse->hpp.sort = sd->entry->se_sort ? : hse->hpp.collapse;
 
INIT_LIST_HEAD(&hse->hpp.list);
INIT_LIST_HEAD(&hse->hpp.sort_list);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 02706c9766d6..cd679a56c81d 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -181,6 +181,7 @@ struct sort_entry {
 
int64_t (*se_cmp)(struct hist_entry *, struct hist_entry *);
int64_t (*se_collapse)(struct hist_entry *, struct hist_entry *);
+   int64_t (*se_sort)(struct hist_entry *, struct hist_entry *);
int (*se_snprintf)(struct hist_entry *he, char *bf, size_t size,
   unsigned int width);
u8  se_width_idx;
-- 
1.9.2



[PATCH 16/20] perf tools: Skip elided sort entries

2014-05-11 Thread Namhyung Kim
When the sort entries were converted to hpp formats, the se->elide
handling was missed, so add it back for compatibility.

Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/hists.c |  3 +++
 tools/perf/ui/gtk/hists.c  |  6 ++
 tools/perf/ui/stdio/hist.c |  9 +
 tools/perf/util/hist.c |  9 +
 tools/perf/util/hist.h |  1 +
 tools/perf/util/sort.c | 11 +++
 6 files changed, 39 insertions(+)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 20b200f88129..fa46e592b588 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -711,6 +711,9 @@ static int hist_browser__show_entry(struct hist_browser *browser,
ui_browser__gotorc(&browser->b, row, 0);
 
perf_hpp__for_each_format(fmt) {
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
if (current_entry && browser->b.navkeypressed) {
ui_browser__set_color(&browser->b,
  HE_COLORSET_SELECTED);
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index fd52669018ee..9d90683914d4 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -178,6 +178,9 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
col_idx = 0;
 
perf_hpp__for_each_format(fmt) {
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
fmt->header(fmt, &hpp, hists_to_evsel(hists));
 
gtk_tree_view_insert_column_with_attributes(GTK_TREE_VIEW(view),
@@ -222,6 +225,9 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
col_idx = 0;
 
perf_hpp__for_each_format(fmt) {
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
if (fmt->color)
fmt->color(fmt, &hpp, h);
else
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index e6920d124c60..d2934659fd07 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -319,6 +319,9 @@ static int hist_entry__period_snprintf(struct perf_hpp *hpp,
return 0;
 
perf_hpp__for_each_format(fmt) {
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
/*
 * If there's no field_sep, we still need
 * to display initial '  '.
@@ -408,6 +411,9 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
fprintf(fp, "# ");
 
perf_hpp__for_each_format(fmt) {
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
if (!first)
fprintf(fp, "%s", sep ?: "  ");
else
@@ -431,6 +437,9 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
perf_hpp__for_each_format(fmt) {
unsigned int i;
 
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
if (!first)
fprintf(fp, "%s", sep ?: "  ");
else
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ae13c2dbd27a..b262b44b7a65 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -436,6 +436,9 @@ hist_entry__cmp(struct hist_entry *left, struct hist_entry *right)
int64_t cmp = 0;
 
perf_hpp__for_each_sort_list(fmt) {
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
cmp = fmt->cmp(left, right);
if (cmp)
break;
@@ -451,6 +454,9 @@ hist_entry__collapse(struct hist_entry *left, struct hist_entry *right)
int64_t cmp = 0;
 
perf_hpp__for_each_sort_list(fmt) {
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
cmp = fmt->collapse(left, right);
if (cmp)
break;
@@ -570,6 +576,9 @@ static int hist_entry__sort(struct hist_entry *a, struct hist_entry *b)
int64_t cmp = 0;
 
perf_hpp__for_each_sort_list(fmt) {
+   if (perf_hpp__should_skip(fmt))
+   continue;
+
cmp = fmt->sort(a, b);
if (cmp)
break;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index f3713b79742d..f67feb432a44 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -201,6 +201,7 @@ void perf_hpp__append_sort_keys(void);
 
 bool perf_hpp__is_sort_entry(struct perf_hpp_fmt *format);
 bool perf_hpp__same_sort_entry(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b);
+bool perf_hpp__should_skip(struct perf_hpp_fmt *format);
 
 typedef u64 (*hpp_field_fn

[PATCH 18/20] perf tools: Introduce reset_output_field()

2014-05-11 Thread Namhyung Kim
The reset_output_field() function is for clearing output field
settings and will be used by test code in a later patch.
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/hist.c   | 17 +
 tools/perf/util/hist.h |  7 +++
 tools/perf/util/sort.c | 18 ++
 tools/perf/util/sort.h |  1 +
 4 files changed, 43 insertions(+)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index f51cba43e9e7..265c8a17c4ca 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -459,6 +459,23 @@ next:
}
 }
 
+void perf_hpp__reset_output_field(void)
+{
+   struct perf_hpp_fmt *fmt, *tmp;
+
+   /* reset output fields */
+   perf_hpp__for_each_format_safe(fmt, tmp) {
+   list_del_init(&fmt->list);
+   list_del_init(&fmt->sort_list);
+   }
+
+   /* reset sort keys */
+   perf_hpp__for_each_sort_list_safe(fmt, tmp) {
+   list_del_init(&fmt->list);
+   list_del_init(&fmt->sort_list);
+   }
+}
+
 int hist_entry__sort_snprintf(struct hist_entry *he, char *s, size_t size,
  struct hists *hists)
 {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 034db761630e..a8418d19808d 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -174,9 +174,15 @@ extern struct list_head perf_hpp__sort_list;
 #define perf_hpp__for_each_format(format) \
list_for_each_entry(format, &perf_hpp__list, list)
 
+#define perf_hpp__for_each_format_safe(format, tmp)\
+   list_for_each_entry_safe(format, tmp, &perf_hpp__list, list)
+
 #define perf_hpp__for_each_sort_list(format) \
list_for_each_entry(format, &perf_hpp__sort_list, sort_list)
 
+#define perf_hpp__for_each_sort_list_safe(format, tmp) \
+   list_for_each_entry_safe(format, tmp, &perf_hpp__sort_list, sort_list)
+
 extern struct perf_hpp_fmt perf_hpp__format[];
 
 enum {
@@ -197,6 +203,7 @@ void perf_hpp__column_register(struct perf_hpp_fmt *format);
 void perf_hpp__column_enable(unsigned col);
 void perf_hpp__register_sort_field(struct perf_hpp_fmt *format);
 void perf_hpp__setup_output_field(void);
+void perf_hpp__reset_output_field(void);
 void perf_hpp__append_sort_keys(void);
 
 bool perf_hpp__is_sort_entry(struct perf_hpp_fmt *format);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index d4502db36cba..aa52066b9a74 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1540,3 +1540,21 @@ out:
 
return ret;
 }
+
+void reset_output_field(void)
+{
+   struct sort_entry *pos, *tmp;
+
+   sort__need_collapse = 0;
+   sort__has_parent = 0;
+   sort__has_sym = 0;
+   sort__has_dso = 0;
+
+   sort__first_dimension = 0;
+
+   list_for_each_entry_safe(pos, tmp, &hist_entry__sort_list, list)
+   list_del_init(&pos->list);
+
+   reset_dimensions();
+   perf_hpp__reset_output_field();
+}
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index cd679a56c81d..f7acf9dedf57 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -193,6 +193,7 @@ extern struct list_head hist_entry__sort_list;
 
 int setup_sorting(void);
 int setup_output_field(void);
+void reset_output_field(void);
 extern int sort_dimension__add(const char *);
 void sort__setup_elide(FILE *fp);
 
-- 
1.9.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 13/20] perf report/tui: Fix a bug when --fields/sort is given

2014-05-11 Thread Namhyung Kim
The hists__filter_entries() function is called when down arrow key is
pressed for navigating through the entries in TUI.  It has a check for
filtering out entries that have very small overhead (under min_pcnt).

However, it assumed the entries are sorted by overhead, so when it saw
such a small-overhead entry it simply stopped navigating as an
optimization.  That assumption no longer holds with the new --fields
and --sort option behavior, and in this case users cannot move down to
the next entry if there's an entry with small overhead in between.
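
The difference between the old early return and the fixed skip can be sketched with a hypothetical standalone filter (names invented for illustration, not perf's actual API):

```c
#include <assert.h>
#include <stddef.h>

/* Find the next entry with pct[idx] >= min_pcnt, starting at idx.
 * The old behavior returned NULL on the first small entry, hiding any
 * larger entries after it; the fix skips small entries and keeps
 * scanning, matching unsorted output orders. */
static const double *next_visible(const double *pct, size_t n,
				  size_t idx, double min_pcnt)
{
	for (; idx < n; idx++) {
		if (pct[idx] >= min_pcnt)
			return &pct[idx];
		/* old code effectively did: return NULL; */
	}
	return NULL;
}
```

With entries {40.0, 0.01, 30.0} and min_pcnt 0.5, starting at index 1 now reaches the 30.0 entry instead of stopping at the 0.01 one.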

Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/hists.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 69c2b0e536ab..20b200f88129 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -812,10 +812,7 @@ static struct rb_node *hists__filter_entries(struct rb_node *nd,
if (total)
percent = h->stat.period * 100.0 / total;
 
-   if (percent < min_pcnt)
-   return NULL;
-
-   if (!h->filtered)
+   if (!h->filtered && percent >= min_pcnt)
return nd;
 
nd = rb_next(nd);
-- 
1.9.2



[PATCH 19/20] perf tests: Factor out print_hists_*()

2014-05-11 Thread Namhyung Kim
These print helper functions can be reused by later hist test cases,
so factor them out to a common location.

Signed-off-by: Namhyung Kim 
---
 tools/perf/tests/hists_common.c | 57 +
 tools/perf/tests/hists_common.h |  3 +++
 tools/perf/tests/hists_filter.c | 37 --
 tools/perf/tests/hists_link.c   | 29 +
 4 files changed, 66 insertions(+), 60 deletions(-)

diff --git a/tools/perf/tests/hists_common.c b/tools/perf/tests/hists_common.c
index 44655b395bb9..040a85b17aee 100644
--- a/tools/perf/tests/hists_common.c
+++ b/tools/perf/tests/hists_common.c
@@ -146,3 +146,60 @@ out:
machine__delete(machine);
return NULL;
 }
+
+void print_hists_in(struct hists *hists)
+{
+   int i = 0;
+   struct rb_root *root;
+   struct rb_node *node;
+
+   if (sort__need_collapse)
+   root = &hists->entries_collapsed;
+   else
+   root = hists->entries_in;
+
+   pr_info("- %s \n", __func__);
+   node = rb_first(root);
+   while (node) {
+   struct hist_entry *he;
+
+   he = rb_entry(node, struct hist_entry, rb_node_in);
+
+   if (!he->filtered) {
+   pr_info("%2d: entry: %-8s [%-8s] %20s: period = %"PRIu64"\n",
+   i, thread__comm_str(he->thread),
+   he->ms.map->dso->short_name,
+   he->ms.sym->name, he->stat.period);
+   }
+
+   i++;
+   node = rb_next(node);
+   }
+}
+
+void print_hists_out(struct hists *hists)
+{
+   int i = 0;
+   struct rb_root *root;
+   struct rb_node *node;
+
+   root = &hists->entries;
+
+   pr_info("- %s \n", __func__);
+   node = rb_first(root);
+   while (node) {
+   struct hist_entry *he;
+
+   he = rb_entry(node, struct hist_entry, rb_node);
+
+   if (!he->filtered) {
+   pr_info("%2d: entry: %-8s [%-8s] %20s: period = %"PRIu64"\n",
+   i, thread__comm_str(he->thread),
+   he->ms.map->dso->short_name,
+   he->ms.sym->name, he->stat.period);
+   }
+
+   i++;
+   node = rb_next(node);
+   }
+}
diff --git a/tools/perf/tests/hists_common.h b/tools/perf/tests/hists_common.h
index 2528b8fc105a..1415ae69d7b6 100644
--- a/tools/perf/tests/hists_common.h
+++ b/tools/perf/tests/hists_common.h
@@ -41,4 +41,7 @@ struct machines;
  */
 struct machine *setup_fake_machine(struct machines *machines);
 
+void print_hists_in(struct hists *hists);
+void print_hists_out(struct hists *hists);
+
 #endif /* __PERF_TESTS__HISTS_COMMON_H__ */
diff --git a/tools/perf/tests/hists_filter.c b/tools/perf/tests/hists_filter.c
index 4617a8bee29b..13c8cf49225e 100644
--- a/tools/perf/tests/hists_filter.c
+++ b/tools/perf/tests/hists_filter.c
@@ -98,33 +98,6 @@ out:
return TEST_FAIL;
 }
 
-static void print_hists(struct hists *hists)
-{
-   int i = 0;
-   struct rb_root *root;
-   struct rb_node *node;
-
-   root = &hists->entries;
-
-   pr_info("- %s \n", __func__);
-   node = rb_first(root);
-   while (node) {
-   struct hist_entry *he;
-
-   he = rb_entry(node, struct hist_entry, rb_node);
-
-   if (!he->filtered) {
-   pr_info("%2d: entry: %-8s [%-8s] %20s: period = %"PRIu64"\n",
-   i, thread__comm_str(he->thread),
-   he->ms.map->dso->short_name,
-   he->ms.sym->name, he->stat.period);
-   }
-
-   i++;
-   node = rb_next(node);
-   }
-}
-
 int test__hists_filter(void)
 {
int err = TEST_FAIL;
@@ -169,7 +142,7 @@ int test__hists_filter(void)
 
if (verbose > 2) {
pr_info("Normal histogram\n");
-   print_hists(hists);
+   print_hists_out(hists);
}
 
TEST_ASSERT_VAL("Invalid nr samples",
@@ -193,7 +166,7 @@ int test__hists_filter(void)
 
if (verbose > 2) {
pr_info("Histogram for thread filter\n");
-   print_hists(hists);
+   print_hists_out(hists);
}
 
/* normal stats should be invariant */
@@ -222,7 +195,7 @@ int test__hists_filter(void)
 
if (verbose > 2) {
pr_info("Histogram for dso filter\n");
-   print_hists(hists);
+   print_hists_out(hists);
}
 
/* normal stats should be invariant */
@@ -257,7 +230,7 @@ int test__hists_filter(void)
 
if (verbose > 2) {
pr_info("His

[PATCH 20/20] perf tests: Add a testcase for histogram output sorting

2014-05-11 Thread Namhyung Kim
The new output fields option changed the internal implementation, so
add a new testcase to verify that it doesn't break anything.

Signed-off-by: Namhyung Kim 
---
 tools/perf/Makefile.perf|   1 +
 tools/perf/tests/builtin-test.c |   4 +
 tools/perf/tests/hists_common.c |   4 +-
 tools/perf/tests/hists_output.c | 628 
 tools/perf/tests/tests.h|   1 +
 5 files changed, 636 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/tests/hists_output.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 2baf61cec7ff..25a5d46eb08c 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -399,6 +399,7 @@ LIB_OBJS += $(OUTPUT)tests/pmu.o
 LIB_OBJS += $(OUTPUT)tests/hists_common.o
 LIB_OBJS += $(OUTPUT)tests/hists_link.o
 LIB_OBJS += $(OUTPUT)tests/hists_filter.o
+LIB_OBJS += $(OUTPUT)tests/hists_output.o
 LIB_OBJS += $(OUTPUT)tests/python-use.o
 LIB_OBJS += $(OUTPUT)tests/bp_signal.o
 LIB_OBJS += $(OUTPUT)tests/bp_signal_overflow.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 0d5afaf72944..6f39cb80fdc2 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -136,6 +136,10 @@ static struct test {
.func = test__thread_mg_share,
},
{
+   .desc = "Test output sorting of hist entries",
+   .func = test__hists_output,
+   },
+   {
.func = NULL,
},
 };
diff --git a/tools/perf/tests/hists_common.c b/tools/perf/tests/hists_common.c
index 040a85b17aee..e4e01aadc3be 100644
--- a/tools/perf/tests/hists_common.c
+++ b/tools/perf/tests/hists_common.c
@@ -193,8 +193,8 @@ void print_hists_out(struct hists *hists)
he = rb_entry(node, struct hist_entry, rb_node);
 
if (!he->filtered) {
-   pr_info("%2d: entry: %-8s [%-8s] %20s: period = %"PRIu64"\n",
-   i, thread__comm_str(he->thread),
+   pr_info("%2d: entry: %8s:%5d [%-8s] %20s: period = %"PRIu64"\n",
+   i, thread__comm_str(he->thread), he->thread->tid,
he->ms.map->dso->short_name,
he->ms.sym->name, he->stat.period);
}
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
new file mode 100644
index ..c91352760b16
--- /dev/null
+++ b/tools/perf/tests/hists_output.c
@@ -0,0 +1,628 @@
+#include "perf.h"
+#include "util/debug.h"
+#include "util/symbol.h"
+#include "util/sort.h"
+#include "util/evsel.h"
+#include "util/evlist.h"
+#include "util/machine.h"
+#include "util/thread.h"
+#include "util/parse-events.h"
+#include "tests/tests.h"
+#include "tests/hists_common.h"
+
+struct sample {
+   u32 cpu;
+   u32 pid;
+   u64 ip;
+   struct thread *thread;
+   struct map *map;
+   struct symbol *sym;
+};
+
+/* For the numbers, see hists_common.c */
+static struct sample fake_samples[] = {
+   /* perf [kernel] schedule() */
+   { .cpu = 0, .pid = 100, .ip = 0xf + 700, },
+   /* perf [perf]   main() */
+   { .cpu = 1, .pid = 100, .ip = 0x4 + 700, },
+   /* perf [perf]   cmd_record() */
+   { .cpu = 1, .pid = 100, .ip = 0x4 + 900, },
+   /* perf [libc]   malloc() */
+   { .cpu = 1, .pid = 100, .ip = 0x5 + 700, },
+   /* perf [libc]   free() */
+   { .cpu = 2, .pid = 100, .ip = 0x5 + 800, },
+   /* perf [perf]   main() */
+   { .cpu = 2, .pid = 200, .ip = 0x4 + 700, },
+   /* perf [kernel] page_fault() */
+   { .cpu = 2, .pid = 200, .ip = 0xf + 800, },
+   /* bash [bash]   main() */
+   { .cpu = 3, .pid = 300, .ip = 0x4 + 700, },
+   /* bash [bash]   xmalloc() */
+   { .cpu = 0, .pid = 300, .ip = 0x4 + 800, },
+   /* bash [kernel] page_fault() */
+   { .cpu = 1, .pid = 300, .ip = 0xf + 800, },
+};
+
+static int add_hist_entries(struct hists *hists, struct machine *machine)
+{
+   struct addr_location al;
+   struct hist_entry *he;
+   struct perf_sample sample = { .period = 100, };
+   size_t i;
+
+   for (i = 0; i < ARRAY_SIZE(fake_samples); i++) {
+   const union perf_event event = {
+   .header = {
+   .misc = PERF_RECORD_MISC_USER,
+   },
+   };
+
+   sample.cpu = fake_samples[i].cpu;
+   sample.pid = fake_samples[i].pid;
+   sample.tid = fake_samples[i].pid;
+   sample.ip = fake_samples[i].ip;
+
+   if (perf_event__preprocess_sample(&event, machine, &al,
+ &sample) < 0)
+   goto out;
+
+   he = __hists__add_entry(hists, &al, NULL, NULL, NULL,
+   sample.period, 1, 0);
+ 

[PATCH 09/20] perf tools: Consolidate management of default sort orders

2014-05-11 Thread Namhyung Kim
perf uses different default sort orders for different use cases, and
this was scattered throughout the code.  Add a get_default_sort_order()
function to handle this and change the initial value of sort_order to
NULL to distinguish it from a user-given one.
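
The per-mode lookup the patch introduces can be sketched standalone (simplified names, not the perf sources; note that a strict `>=` bound keeps a mode equal to the array size out of bounds, whereas the patch's BUG_ON uses `>`):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sort_mode { MODE_NORMAL, MODE_BRANCH, MODE_MEMORY, MODE_TOP };

/* Map a sort mode to its default sort-key string; returns NULL for an
 * out-of-range mode instead of reading past the table. */
static const char *mode_default(enum sort_mode mode)
{
	static const char *const defaults[] = {
		[MODE_NORMAL] = "comm,dso,symbol",
		[MODE_BRANCH] = "comm,dso_from,symbol_from,dso_to,symbol_to",
		[MODE_MEMORY] = "local_weight,mem,sym,dso,symbol_daddr,"
				"dso_daddr,snoop,tlb,locked",
		[MODE_TOP]    = "dso,symbol",
	};

	if ((size_t)mode >= sizeof(defaults) / sizeof(defaults[0]))
		return NULL;
	return defaults[mode];
}
```

A caller would consult this table only when no user-supplied sort order is present, exactly as setup_sorting() does with the NULL sentinel.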

Cc: Stephane Eranian 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-report.c | 18 --
 tools/perf/builtin-top.c|  3 +--
 tools/perf/util/sort.c  | 25 +++--
 tools/perf/util/sort.h  |  1 +
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 89c95289fd51..8c9fbbdc6505 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -807,30 +807,12 @@ repeat:
if (branch_mode == -1 && has_br_stack)
sort__mode = SORT_MODE__BRANCH;
 
-   /* sort__mode could be NORMAL if --no-branch-stack */
-   if (sort__mode == SORT_MODE__BRANCH) {
-   /*
-* if no sort_order is provided, then specify
-* branch-mode specific order
-*/
-   if (sort_order == default_sort_order)
-   sort_order = "comm,dso_from,symbol_from,"
-"dso_to,symbol_to";
-
-   }
if (report.mem_mode) {
if (sort__mode == SORT_MODE__BRANCH) {
pr_err("branch and mem mode incompatible\n");
goto error;
}
sort__mode = SORT_MODE__MEMORY;
-
-   /*
-* if no sort_order is provided, then specify
-* branch-mode specific order
-*/
-   if (sort_order == default_sort_order)
-   sort_order = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
}
 
if (setup_sorting() < 0) {
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 37d30460bada..bb2aa6645a7e 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1137,8 +1137,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
if (argc)
usage_with_options(top_usage, options);
 
-   if (sort_order == default_sort_order)
-   sort_order = "dso,symbol";
+   sort__mode = SORT_MODE__TOP;
 
if (setup_sorting() < 0) {
parse_options_usage(top_usage, options, "s", 1);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 916652af8304..2f83965ab2c0 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -8,7 +8,10 @@ regex_t parent_regex;
 const char default_parent_pattern[] = "^sys_|^do_page_fault";
 const char *parent_pattern = default_parent_pattern;
 const char default_sort_order[] = "comm,dso,symbol";
-const char *sort_order = default_sort_order;
+const char default_branch_sort_order[] = "comm,dso_from,symbol_from,dso_to,symbol_to";
+const char default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
+const char default_top_sort_order[] = "dso,symbol";
+const char *sort_order;
 regex_t ignore_callees_regex;
 int have_ignore_callees = 0;
 int sort__need_collapse = 0;
@@ -1218,11 +1221,29 @@ int sort_dimension__add(const char *tok)
return -ESRCH;
 }
 
+static const char *get_default_sort_order(void)
+{
+   const char *default_sort_orders[] = {
+   default_sort_order,
+   default_branch_sort_order,
+   default_mem_sort_order,
+   default_top_sort_order,
+   };
+
+   BUG_ON(sort__mode > ARRAY_SIZE(default_sort_orders));
+
+   return default_sort_orders[sort__mode];
+}
+
 int setup_sorting(void)
 {
-   char *tmp, *tok, *str = strdup(sort_order);
+   char *tmp, *tok, *str;
int ret = 0;
 
+   if (sort_order == NULL)
+   sort_order = get_default_sort_order();
+
+   str = strdup(sort_order);
if (str == NULL) {
error("Not enough memory to setup sort keys");
return -ENOMEM;
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 43e5ff42a609..35b53cc56feb 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -133,6 +133,7 @@ enum sort_mode {
SORT_MODE__NORMAL,
SORT_MODE__BRANCH,
SORT_MODE__MEMORY,
+   SORT_MODE__TOP,
 };
 
 enum sort_type {
-- 
1.9.2



[PATCH 08/20] perf tools: Allow hpp fields to be sort keys

2014-05-11 Thread Namhyung Kim
Add overhead{,_sys,_us,_guest_sys,_guest_us}, sample and period sort
keys so that they can be selected with the --sort/-s option.

  $ perf report -s period,comm --stdio
  ...
  # Overhead        Period          Command
  # ........  ............  ...............
  #
      47.06%           152          swapper
      13.93%            45  qemu-system-arm
      12.38%            40         synergys
       3.72%            12          firefox
       2.48%             8            xchat

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/hist.c   |  9 +++--
 tools/perf/util/sort.c | 39 +++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 400dad8c41e4..f3e96463550b 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -348,8 +348,13 @@ void perf_hpp__init(void)
int i;
 
for (i = 0; i < PERF_HPP__MAX_INDEX; i++) {
-   INIT_LIST_HEAD(&perf_hpp__format[i].list);
-   INIT_LIST_HEAD(&perf_hpp__format[i].sort_list);
+   struct perf_hpp_fmt *fmt = &perf_hpp__format[i];
+
+   INIT_LIST_HEAD(&fmt->list);
+
+   /* sort_list may be linked by setup_sorting() */
+   if (fmt->sort_list.next == NULL)
+   INIT_LIST_HEAD(&fmt->sort_list);
}
 
perf_hpp__column_enable(PERF_HPP__OVERHEAD);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index b2829f947053..916652af8304 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -1028,6 +1028,26 @@ static struct sort_dimension memory_sort_dimensions[] = {
 
 #undef DIM
 
+struct hpp_dimension {
+   const char  *name;
+   struct perf_hpp_fmt *fmt;
+   int taken;
+};
+
+#define DIM(d, n) { .name = n, .fmt = &perf_hpp__format[d], }
+
+static struct hpp_dimension hpp_sort_dimensions[] = {
+   DIM(PERF_HPP__OVERHEAD, "overhead"),
+   DIM(PERF_HPP__OVERHEAD_SYS, "overhead_sys"),
+   DIM(PERF_HPP__OVERHEAD_US, "overhead_us"),
+   DIM(PERF_HPP__OVERHEAD_GUEST_SYS, "overhead_guest_sys"),
+   DIM(PERF_HPP__OVERHEAD_GUEST_US, "overhead_guest_us"),
+   DIM(PERF_HPP__SAMPLES, "sample"),
+   DIM(PERF_HPP__PERIOD, "period"),
+};
+
+#undef DIM
+
 struct hpp_sort_entry {
struct perf_hpp_fmt hpp;
struct sort_entry *se;
@@ -1115,6 +1135,16 @@ static int __sort_dimension__add(struct sort_dimension *sd, enum sort_type idx)
return 0;
 }
 
+static int __hpp_dimension__add(struct hpp_dimension *hd)
+{
+   if (!hd->taken) {
+   hd->taken = 1;
+
+   perf_hpp__register_sort_field(hd->fmt);
+   }
+   return 0;
+}
+
 int sort_dimension__add(const char *tok)
 {
unsigned int i;
@@ -1144,6 +1174,15 @@ int sort_dimension__add(const char *tok)
return __sort_dimension__add(sd, i);
}
 
+   for (i = 0; i < ARRAY_SIZE(hpp_sort_dimensions); i++) {
+   struct hpp_dimension *hd = &hpp_sort_dimensions[i];
+
+   if (strncasecmp(tok, hd->name, strlen(tok)))
+   continue;
+
+   return __hpp_dimension__add(hd);
+   }
+
for (i = 0; i < ARRAY_SIZE(bstack_sort_dimensions); i++) {
struct sort_dimension *sd = &bstack_sort_dimensions[i];
 
-- 
1.9.2



[PATCH 07/20] perf ui: Get rid of callback from __hpp__fmt()

2014-05-11 Thread Namhyung Kim
The callback was used by the TUI to determine the color of the folded
sign using the percent of the first field/column.  But it cannot be
used anymore since the code now supports dynamic reordering of output
fields.

So move the logic into hist_browser__show_entry().

Acked-by: Ingo Molnar 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browsers/hists.c | 62 --
 tools/perf/ui/gtk/hists.c  |  2 +-
 tools/perf/ui/hist.c   | 28 ++-
 tools/perf/util/hist.h |  4 +--
 4 files changed, 34 insertions(+), 62 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 7bd8c0e81658..3ed9212d2a63 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -616,35 +616,6 @@ struct hpp_arg {
bool current_entry;
 };
 
-static int __hpp__overhead_callback(struct perf_hpp *hpp, bool front)
-{
-   struct hpp_arg *arg = hpp->ptr;
-
-   if (arg->current_entry && arg->b->navkeypressed)
-   ui_browser__set_color(arg->b, HE_COLORSET_SELECTED);
-   else
-   ui_browser__set_color(arg->b, HE_COLORSET_NORMAL);
-
-   if (front) {
-   if (!symbol_conf.use_callchain)
-   return 0;
-
-   slsmg_printf("%c ", arg->folded_sign);
-   return 2;
-   }
-
-   return 0;
-}
-
-static int __hpp__color_callback(struct perf_hpp *hpp, bool front __maybe_unused)
-{
-   struct hpp_arg *arg = hpp->ptr;
-
-   if (!arg->current_entry || !arg->b->navkeypressed)
-   ui_browser__set_color(arg->b, HE_COLORSET_NORMAL);
-   return 0;
-}
-
static int __hpp__slsmg_color_printf(struct perf_hpp *hpp, const char *fmt, ...)
 {
struct hpp_arg *arg = hpp->ptr;
@@ -665,7 +636,7 @@ static int __hpp__slsmg_color_printf(struct perf_hpp *hpp, const char *fmt, ...)
return ret;
 }
 
-#define __HPP_COLOR_PERCENT_FN(_type, _field, _cb) \
+#define __HPP_COLOR_PERCENT_FN(_type, _field)  \
 static u64 __hpp_get_##_field(struct hist_entry *he)   \
 {  \
return he->stat._field; \
@@ -676,15 +647,15 @@ hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt __maybe_unused,\
struct perf_hpp *hpp,   \
struct hist_entry *he)  \
 {  \
-   return __hpp__fmt(hpp, he, __hpp_get_##_field, _cb, " %6.2f%%", \
+   return __hpp__fmt(hpp, he, __hpp_get_##_field, " %6.2f%%",  \
  __hpp__slsmg_color_printf, true); \
 }
 
-__HPP_COLOR_PERCENT_FN(overhead, period, __hpp__overhead_callback)
-__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys, __hpp__color_callback)
-__HPP_COLOR_PERCENT_FN(overhead_us, period_us, __hpp__color_callback)
-__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys, __hpp__color_callback)
-__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us, __hpp__color_callback)
+__HPP_COLOR_PERCENT_FN(overhead, period)
+__HPP_COLOR_PERCENT_FN(overhead_sys, period_sys)
+__HPP_COLOR_PERCENT_FN(overhead_us, period_us)
+__HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys)
+__HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us)
 
 #undef __HPP_COLOR_PERCENT_FN
 
@@ -729,7 +700,7 @@ static int hist_browser__show_entry(struct hist_browser *browser,
 
if (row_offset == 0) {
struct hpp_arg arg = {
-   .b  = &browser->b,
+   .b  = &browser->b,
.folded_sign= folded_sign,
.current_entry  = current_entry,
};
@@ -742,11 +713,24 @@ static int hist_browser__show_entry(struct hist_browser *browser,
ui_browser__gotorc(&browser->b, row, 0);
 
perf_hpp__for_each_format(fmt) {
-   if (!first) {
+   if (current_entry && browser->b.navkeypressed) {
+   ui_browser__set_color(&browser->b,
+ HE_COLORSET_SELECTED);
+   } else {
+   ui_browser__set_color(&browser->b,
+ HE_COLORSET_NORMAL);
+   }
+
+   if (first) {
+   if (symbol_conf.use_callchain) {
+   slsmg_printf("%c ", folded_sign);
+   width -= 2;
+   }
+   first = false;
+   } else {
slsmg_printf("  ");
width -= 2;
 

[PATCH v13 01/19] iommu/exynos: fix build errors

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

Commit 25e9d28d92 (ARM: EXYNOS: remove system mmu initialization from
exynos tree) removed arch/arm/mach-exynos/mach/sysmmu.h header without
removing remaining use of it from exynos-iommu driver, thus causing a
compilation error.

This patch fixes the error by removing respective include line
from exynos-iommu.c.

Use of the __pa and __va macros is changed to virt_to_phys and
phys_to_virt, which are recommended in driver code.  printk formatting
of physical addresses is also fixed to use %pa.

Also, the System MMU driver is changed to control only a single
instance of System MMU at a time.  Since a single instance of System
MMU has only a single clock descriptor for its clock gating and a
single address range for its control registers, there is no need to
obtain two or more clock descriptors and ioremapped regions.

CC: Tomasz Figa 
Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |  255 ++
 1 file changed, 85 insertions(+), 170 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 0740189..8d7c3f9 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -29,8 +29,6 @@
 #include 
 #include 
 
-#include 
-
 /* We does not consider super section mapping (16MB) */
 #define SECT_ORDER 20
 #define LPAGE_ORDER 16
@@ -108,7 +106,8 @@ static unsigned long *section_entry(unsigned long *pgtable, unsigned long iova)
 
 static unsigned long *page_entry(unsigned long *sent, unsigned long iova)
 {
-   return (unsigned long *)__va(lv2table_base(sent)) + lv2ent_offset(iova);
+   return (unsigned long *)phys_to_virt(
+   lv2table_base(sent)) + lv2ent_offset(iova);
 }
 
 enum exynos_sysmmu_inttype {
@@ -132,7 +131,7 @@ enum exynos_sysmmu_inttype {
  * translated. This is 0 if @itype is SYSMMU_BUSERROR.
  */
 typedef int (*sysmmu_fault_handler_t)(enum exynos_sysmmu_inttype itype,
-   unsigned long pgtable_base, unsigned long fault_addr);
+   phys_addr_t pgtable_base, unsigned long fault_addr);
 
 static unsigned short fault_reg_offset[SYSMMU_FAULTS_NUM] = {
REG_PAGE_FAULT_ADDR,
@@ -170,14 +169,13 @@ struct sysmmu_drvdata {
struct device *sysmmu;  /* System MMU's device descriptor */
struct device *dev; /* Owner of system MMU */
char *dbgname;
-   int nsfrs;
-   void __iomem **sfrbases;
-   struct clk *clk[2];
+   void __iomem *sfrbase;
+   struct clk *clk;
int activations;
rwlock_t lock;
struct iommu_domain *domain;
sysmmu_fault_handler_t fault_handler;
-   unsigned long pgtable;
+   phys_addr_t pgtable;
 };
 
 static bool set_sysmmu_active(struct sysmmu_drvdata *data)
@@ -266,17 +264,17 @@ void exynos_sysmmu_set_fault_handler(struct device *dev,
 }
 
 static int default_fault_handler(enum exynos_sysmmu_inttype itype,
-unsigned long pgtable_base, unsigned long fault_addr)
+   phys_addr_t pgtable_base, unsigned long fault_addr)
 {
unsigned long *ent;
 
if ((itype >= SYSMMU_FAULTS_NUM) || (itype < SYSMMU_PAGEFAULT))
itype = SYSMMU_FAULT_UNKNOWN;
 
-   pr_err("%s occurred at 0x%lx(Page table base: 0x%lx)\n",
-   sysmmu_fault_name[itype], fault_addr, pgtable_base);
+   pr_err("%s occurred at 0x%lx(Page table base: %pa)\n",
+   sysmmu_fault_name[itype], fault_addr, &pgtable_base);
 
-   ent = section_entry(__va(pgtable_base), fault_addr);
+   ent = section_entry(phys_to_virt(pgtable_base), fault_addr);
pr_err("\tLv1 entry: 0x%lx\n", *ent);
 
if (lv1ent_page(ent)) {
@@ -295,56 +293,39 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
 {
/* SYSMMU is in blocked when interrupt occurred. */
struct sysmmu_drvdata *data = dev_id;
-   struct resource *irqres;
-   struct platform_device *pdev;
enum exynos_sysmmu_inttype itype;
unsigned long addr = -1;
-
-   int i, ret = -ENOSYS;
+   int ret = -ENOSYS;
 
read_lock(&data->lock);
 
WARN_ON(!is_sysmmu_active(data));
 
-   pdev = to_platform_device(data->sysmmu);
-   for (i = 0; i < (pdev->num_resources / 2); i++) {
-   irqres = platform_get_resource(pdev, IORESOURCE_IRQ, i);
-   if (irqres && ((int)irqres->start == irq))
-   break;
-   }
-
-   if (i == pdev->num_resources) {
+   itype = (enum exynos_sysmmu_inttype)
+   __ffs(__raw_readl(data->sfrbase + REG_INT_STATUS));
+   if (WARN_ON(!((itype >= 0) && (itype < SYSMMU_FAULT_UNKNOWN))))
itype = SYSMMU_FAULT_UNKNOWN;
-   } else {
-   itype = (enum exynos_sysmmu_inttype)
-   __ffs(__raw_readl(data->sfrbases[i] + REG_INT_STATUS));
-   if (WARN_ON(!((itype >= 0) && (it

[PATCH v13 02/19] iommu/exynos: change error handling when page table update is failed

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This patch changes the page table update code not to panic on errors.
Instead, it prints an error message with a call stack.
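
The patch also switches the lv2 allocator from returning NULL to the kernel's ERR_PTR convention, so distinct failures (-ENOMEM vs. -EADDRINUSE) survive the single pointer return. A simplified standalone sketch of that idiom (stand-in definitions, not the kernel's include/linux/err.h):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(): small negative
 * errno values map to the top of the address space, which no valid
 * allocation occupies, so one pointer carries either data or an error. */
#define MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)err; }
static int is_err(const void *p)
{
	return (unsigned long)p >= (unsigned long)-MAX_ERRNO;
}
static long ptr_err(const void *p) { return (long)p; }

/* Hypothetical allocator mirroring alloc_lv2entry()'s error reporting:
 * an in-use slot and an allocation failure are now distinguishable. */
static void *alloc_entry(int already_mapped)
{
	if (already_mapped)
		return err_ptr(-EADDRINUSE);
	void *p = malloc(64);
	return p ? p : err_ptr(-ENOMEM);
}
```

The caller then tests IS_ERR() and propagates PTR_ERR(), exactly as exynos_iommu_map() does after this patch.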

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   58 --
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 8d7c3f9..aec7fd7 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -728,13 +728,18 @@ finish:
 static unsigned long *alloc_lv2entry(unsigned long *sent, unsigned long iova,
short *pgcounter)
 {
+   if (lv1ent_section(sent)) {
+   WARN(1, "Trying mapping on %#08lx mapped with 1MiB page", iova);
+   return ERR_PTR(-EADDRINUSE);
+   }
+
if (lv1ent_fault(sent)) {
unsigned long *pent;
 
pent = kzalloc(LV2TABLE_SIZE, GFP_ATOMIC);
BUG_ON((unsigned long)pent & (LV2TABLE_SIZE - 1));
if (!pent)
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 
*sent = mk_lv1ent_page(virt_to_phys(pent));
*pgcounter = NUM_LV2ENTRIES;
@@ -745,14 +750,21 @@ static unsigned long *alloc_lv2entry(unsigned long *sent, unsigned long iova,
return page_entry(sent, iova);
 }
 
-static int lv1set_section(unsigned long *sent, phys_addr_t paddr, short *pgcnt)
+static int lv1set_section(unsigned long *sent, unsigned long iova,
+ phys_addr_t paddr, short *pgcnt)
 {
-   if (lv1ent_section(sent))
+   if (lv1ent_section(sent)) {
+   WARN(1, "Trying mapping on 1MiB@%#08lx that is mapped",
+   iova);
return -EADDRINUSE;
+   }
 
if (lv1ent_page(sent)) {
-   if (*pgcnt != NUM_LV2ENTRIES)
+   if (*pgcnt != NUM_LV2ENTRIES) {
+   WARN(1, "Trying mapping on 1MiB@%#08lx that is mapped",
+   iova);
return -EADDRINUSE;
+   }
 
kfree(page_entry(sent, 0));
 
@@ -770,8 +782,10 @@ static int lv2set_page(unsigned long *pent, phys_addr_t paddr, size_t size,
short *pgcnt)
 {
if (size == SPAGE_SIZE) {
-   if (!lv2ent_fault(pent))
+   if (!lv2ent_fault(pent)) {
+   WARN(1, "Trying mapping on 4KiB where mapping exists");
return -EADDRINUSE;
+   }
 
*pent = mk_lv2ent_spage(paddr);
pgtable_flush(pent, pent + 1);
@@ -780,7 +794,10 @@ static int lv2set_page(unsigned long *pent, phys_addr_t paddr, size_t size,
int i;
for (i = 0; i < SPAGES_PER_LPAGE; i++, pent++) {
if (!lv2ent_fault(pent)) {
-   memset(pent, 0, sizeof(*pent) * i);
+   WARN(1,
+   "Trying mapping on 64KiB where mapping exists");
+   if (i > 0)
+   memset(pent - i, 0, sizeof(*pent) * i);
return -EADDRINUSE;
}
 
@@ -808,7 +825,7 @@ static int exynos_iommu_map(struct iommu_domain *domain, unsigned long iova,
entry = section_entry(priv->pgtable, iova);
 
if (size == SECT_SIZE) {
-   ret = lv1set_section(entry, paddr,
+   ret = lv1set_section(entry, iova, paddr,
&priv->lv2entcnt[lv1ent_offset(iova)]);
} else {
unsigned long *pent;
@@ -816,17 +833,16 @@ static int exynos_iommu_map(struct iommu_domain *domain, unsigned long iova,
pent = alloc_lv2entry(entry, iova,
&priv->lv2entcnt[lv1ent_offset(iova)]);
 
-   if (!pent)
-   ret = -ENOMEM;
+   if (IS_ERR(pent))
+   ret = PTR_ERR(pent);
else
ret = lv2set_page(pent, paddr, size,
&priv->lv2entcnt[lv1ent_offset(iova)]);
}
 
-   if (ret) {
+   if (ret)
pr_debug("%s: Failed to map iova 0x%lx/0x%x bytes\n",
__func__, iova, size);
-   }
 
spin_unlock_irqrestore(&priv->pgtablelock, flags);
 
@@ -840,6 +856,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain *domain,
struct sysmmu_drvdata *data;
unsigned long flags;
unsigned long *ent;
+   size_t err_pgsize;
 
BUG_ON(priv->pgtable == NULL);
 
@@ -848,7 +865,10 @@ static size_t exynos_iommu_unmap(struct iommu_domain *domain,
ent = section_entry(priv->pgtable, iova);
 
i

[PATCH v13 05/19] iommu/exynos: remove prefetch buffer setting

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

The prefetch buffer is a cache in System MMU 3.x that caches a block
of page table entries, giving the effect of a larger page built from
small pages.  However, both the way prefetch buffers are controlled
and their specifications differ between minor versions of System MMU
v3, and prefetch buffers must be controlled with care because of
restrictions in the H/W design.

The interface and implementation to initialize prefetch buffers will
be prepared later.

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   14 --
 1 file changed, 14 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 06fc70e..4fc31fc 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -245,13 +245,6 @@ static void __sysmmu_set_ptbase(void __iomem *sfrbase,
__sysmmu_tlb_invalidate(sfrbase);
 }
 
-static void __sysmmu_set_prefbuf(void __iomem *sfrbase, unsigned long base,
-   unsigned long size, int idx)
-{
-   __raw_writel(base, sfrbase + REG_PB0_SADDR + idx * 8);
-   __raw_writel(size - 1 + base,  sfrbase + REG_PB0_EADDR + idx * 8);
-}
-
 static void __set_fault_handler(struct sysmmu_drvdata *data,
sysmmu_fault_handler_t handler)
 {
@@ -401,13 +394,6 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata 
*data,
data->pgtable = pgtable;
 
__sysmmu_set_ptbase(data->sfrbase, pgtable);
-   if ((readl(data->sfrbase + REG_MMU_VERSION) >> 28) == 3) {
-   /* System MMU version is 3.x */
-   __raw_writel((1 << 12) | (2 << 28),
-   data->sfrbase + REG_MMU_CFG);
-   __sysmmu_set_prefbuf(data->sfrbase, 0, -1, 0);
-   __sysmmu_set_prefbuf(data->sfrbase, 0, -1, 1);
-   }
 
__raw_writel(CTRL_ENABLE, data->sfrbase + REG_MMU_CTRL);
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v13 07/19] iommu/exynos: always enable runtime PM

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

Checking whether the probing device has a parent device was only a way
to discover whether the device belongs to a power domain back when power
domains were controlled by Samsung's custom implementation. Now that the
generic I/O power domain framework is in use, the parent-device check
must be removed.

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 6915235..ef771a2 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -558,8 +558,7 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, data);
 
-   if (dev->parent)
-   pm_runtime_enable(dev);
+   pm_runtime_enable(dev);
 
dev_dbg(dev, "(%s) Initialized\n", data->dbgname);
return 0;
-- 
1.7.9.5



[PATCH v13 08/19] iommu/exynos: remove dbgname from drvdata of a System MMU

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This patch removes the dbgname member from the sysmmu_drvdata structure.
Kernel debug messages already carry the name of the System MMU node, so
dbgname is redundant. Removing it also fixes some compilation warnings.

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   32 +---
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index ef771a2..be7a7b9 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -170,7 +170,6 @@ struct sysmmu_drvdata {
struct list_head node; /* entry of exynos_iommu_domain.clients */
struct device *sysmmu;  /* System MMU's device descriptor */
struct device *dev; /* Owner of system MMU */
-   char *dbgname;
void __iomem *sfrbase;
struct clk *clk;
int activations;
@@ -321,8 +320,8 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
if (!ret && (itype != SYSMMU_FAULT_UNKNOWN))
__raw_writel(1 << itype, data->sfrbase + REG_INT_CLEAR);
else
-   dev_dbg(data->sysmmu, "(%s) %s is not handled.\n",
-   data->dbgname, sysmmu_fault_name[itype]);
+   dev_dbg(data->sysmmu, "%s is not handled.\n",
+   sysmmu_fault_name[itype]);
 
if (itype != SYSMMU_FAULT_UNKNOWN)
sysmmu_unblock(data->sfrbase);
@@ -354,10 +353,10 @@ finish:
write_unlock_irqrestore(&data->lock, flags);
 
if (disabled)
-   dev_dbg(data->sysmmu, "(%s) Disabled\n", data->dbgname);
+   dev_dbg(data->sysmmu, "Disabled\n");
else
-   dev_dbg(data->sysmmu, "(%s) %d times left to be disabled\n",
-   data->dbgname, data->activations);
+   dev_dbg(data->sysmmu, "%d times left to be disabled\n",
+   data->activations);
 
return disabled;
 }
@@ -384,7 +383,7 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata 
*data,
ret = 1;
}
 
-   dev_dbg(data->sysmmu, "(%s) Already enabled\n", data->dbgname);
+   dev_dbg(data->sysmmu, "Already enabled\n");
goto finish;
}
 
@@ -399,7 +398,7 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata 
*data,
 
data->domain = domain;
 
-   dev_dbg(data->sysmmu, "(%s) Enabled\n", data->dbgname);
+   dev_dbg(data->sysmmu, "Enabled\n");
 finish:
write_unlock_irqrestore(&data->lock, flags);
 
@@ -415,16 +414,15 @@ int exynos_sysmmu_enable(struct device *dev, unsigned 
long pgtable)
 
ret = pm_runtime_get_sync(data->sysmmu);
if (ret < 0) {
-   dev_dbg(data->sysmmu, "(%s) Failed to enable\n", data->dbgname);
+   dev_dbg(data->sysmmu, "Failed to enable\n");
return ret;
}
 
ret = __exynos_sysmmu_enable(data, pgtable, NULL);
if (WARN_ON(ret < 0)) {
pm_runtime_put(data->sysmmu);
-   dev_err(data->sysmmu,
-   "(%s) Already enabled with page table %#x\n",
-   data->dbgname, data->pgtable);
+   dev_err(data->sysmmu, "Already enabled with page table %#x\n",
+   data->pgtable);
} else {
data->dev = dev;
}
@@ -474,9 +472,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, 
unsigned long iova,
sysmmu_unblock(data->sfrbase);
}
} else {
-   dev_dbg(data->sysmmu,
-   "(%s) Disabled. Skipping invalidating TLB.\n",
-   data->dbgname);
+   dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
 
read_unlock_irqrestore(&data->lock, flags);
@@ -495,9 +491,7 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
sysmmu_unblock(data->sfrbase);
}
} else {
-   dev_dbg(data->sysmmu,
-   "(%s) Disabled. Skipping invalidating TLB.\n",
-   data->dbgname);
+   dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
 
read_unlock_irqrestore(&data->lock, flags);
@@ -560,7 +554,7 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 
pm_runtime_enable(dev);
 
-   dev_dbg(dev, "(%s) Initialized\n", data->dbgname);
+   dev_dbg(dev, "Initialized\n");
return 0;
 err_irq:
free_irq(platform_get_irq(pdev, 0), data);
-- 
1.7.9.5



[PATCH v13 06/19] iommu/exynos: add missing cache flush for removed page table entries

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This commit adds a cache flush for removed small and large page entries
in exynos_iommu_unmap(). A missing cache flush of removed page table
entries can cause a missed page fault interrupt when a master IP
accesses an unmapped area.

Reviewed-by: Tomasz Figa 
Tested-by: Grant Grundler 
Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 4fc31fc..6915235 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -904,6 +904,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain 
*domain,
if (lv2ent_small(ent)) {
*ent = 0;
size = SPAGE_SIZE;
+   pgtable_flush(ent, ent + 1);
priv->lv2entcnt[lv1ent_offset(iova)] += 1;
goto done;
}
@@ -915,6 +916,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain 
*domain,
}
 
memset(ent, 0, sizeof(*ent) * SPAGES_PER_LPAGE);
+   pgtable_flush(ent, ent + SPAGES_PER_LPAGE);
 
size = LPAGE_SIZE;
priv->lv2entcnt[lv1ent_offset(iova)] += SPAGES_PER_LPAGE;
-- 
1.7.9.5



[PATCH v13 09/19] iommu/exynos: use managed device helper functions

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This patch converts probe() to the managed device (devm_*) helper
functions, which removes the hand-rolled error paths.

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   68 --
 1 file changed, 25 insertions(+), 43 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index be7a7b9..c86e374 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -343,8 +343,7 @@ static bool __exynos_sysmmu_disable(struct sysmmu_drvdata 
*data)
 
__raw_writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL);
 
-   if (!IS_ERR(data->clk))
-   clk_disable(data->clk);
+   clk_disable(data->clk);
 
disabled = true;
data->pgtable = 0;
@@ -387,8 +386,7 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata 
*data,
goto finish;
}
 
-   if (!IS_ERR(data->clk))
-   clk_enable(data->clk);
+   clk_enable(data->clk);
 
data->pgtable = pgtable;
 
@@ -499,49 +497,43 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
 
 static int exynos_sysmmu_probe(struct platform_device *pdev)
 {
-   int ret;
+   int irq, ret;
struct device *dev = &pdev->dev;
struct sysmmu_drvdata *data;
struct resource *res;
 
-   data = kzalloc(sizeof(*data), GFP_KERNEL);
-   if (!data) {
-   dev_dbg(dev, "Not enough memory\n");
-   ret = -ENOMEM;
-   goto err_alloc;
-   }
+   data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   if (!res) {
-   dev_dbg(dev, "Unable to find IOMEM region\n");
-   ret = -ENOENT;
-   goto err_init;
-   }
+   data->sfrbase = devm_ioremap_resource(dev, res);
+   if (IS_ERR(data->sfrbase))
+   return PTR_ERR(data->sfrbase);
 
-   data->sfrbase = ioremap(res->start, resource_size(res));
-   if (!data->sfrbase) {
-   dev_dbg(dev, "Unable to map IOMEM @ PA:%#x\n", res->start);
-   ret = -ENOENT;
-   goto err_res;
-   }
-
-   ret = platform_get_irq(pdev, 0);
-   if (ret <= 0) {
+   irq = platform_get_irq(pdev, 0);
+   if (irq <= 0) {
dev_dbg(dev, "Unable to find IRQ resource\n");
-   goto err_irq;
+   return irq;
}
 
-   ret = request_irq(ret, exynos_sysmmu_irq, 0,
+   ret = devm_request_irq(dev, irq, exynos_sysmmu_irq, 0,
dev_name(dev), data);
if (ret) {
-   dev_dbg(dev, "Unabled to register interrupt handler\n");
-   goto err_irq;
+   dev_err(dev, "Unabled to register handler of irq %d\n", irq);
+   return ret;
}
 
-   if (dev_get_platdata(dev)) {
-   data->clk = clk_get(dev, "sysmmu");
-   if (IS_ERR(data->clk))
-   dev_dbg(dev, "No clock descriptor registered\n");
+   data->clk = devm_clk_get(dev, "sysmmu");
+   if (IS_ERR(data->clk)) {
+   dev_err(dev, "Failed to get clock!\n");
+   return PTR_ERR(data->clk);
+   } else  {
+   ret = clk_prepare(data->clk);
+   if (ret) {
+   dev_err(dev, "Failed to prepare clk\n");
+   return ret;
+   }
}
 
data->sysmmu = dev;
@@ -554,17 +546,7 @@ static int exynos_sysmmu_probe(struct platform_device 
*pdev)
 
pm_runtime_enable(dev);
 
-   dev_dbg(dev, "Initialized\n");
return 0;
-err_irq:
-   free_irq(platform_get_irq(pdev, 0), data);
-err_res:
-   iounmap(data->sfrbase);
-err_init:
-   kfree(data);
-err_alloc:
-   dev_err(dev, "Failed to initialize\n");
-   return ret;
 }
 
 static struct platform_driver exynos_sysmmu_driver = {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v13 11/19] iommu/exynos: remove custom fault handler

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This commit removes the custom fault handler. Device drivers that need
to register a fault handler can do so with iommu_set_fault_handler().

CC: Grant Grundler 
Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   80 +-
 1 file changed, 24 insertions(+), 56 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 5af5c5c..c1be65f 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -125,16 +125,6 @@ enum exynos_sysmmu_inttype {
SYSMMU_FAULTS_NUM
 };
 
-/*
- * @itype: type of fault.
- * @pgtable_base: the physical address of page table base. This is 0 if @itype
- *is SYSMMU_BUSERROR.
- * @fault_addr: the device (virtual) address that the System MMU tried to
- * translated. This is 0 if @itype is SYSMMU_BUSERROR.
- */
-typedef int (*sysmmu_fault_handler_t)(enum exynos_sysmmu_inttype itype,
-   phys_addr_t pgtable_base, unsigned long fault_addr);
-
 static unsigned short fault_reg_offset[SYSMMU_FAULTS_NUM] = {
REG_PAGE_FAULT_ADDR,
REG_AR_FAULT_ADDR,
@@ -176,7 +166,6 @@ struct sysmmu_drvdata {
int activations;
rwlock_t lock;
struct iommu_domain *domain;
-   sysmmu_fault_handler_t fault_handler;
phys_addr_t pgtable;
 };
 
@@ -245,34 +234,17 @@ static void __sysmmu_set_ptbase(void __iomem *sfrbase,
__sysmmu_tlb_invalidate(sfrbase);
 }
 
-static void __set_fault_handler(struct sysmmu_drvdata *data,
-   sysmmu_fault_handler_t handler)
-{
-   unsigned long flags;
-
-   write_lock_irqsave(&data->lock, flags);
-   data->fault_handler = handler;
-   write_unlock_irqrestore(&data->lock, flags);
-}
-
-void exynos_sysmmu_set_fault_handler(struct device *dev,
-   sysmmu_fault_handler_t handler)
-{
-   struct sysmmu_drvdata *data = dev_get_drvdata(dev->archdata.iommu);
-
-   __set_fault_handler(data, handler);
-}
-
-static int default_fault_handler(enum exynos_sysmmu_inttype itype,
-   phys_addr_t pgtable_base, unsigned long fault_addr)
+static void show_fault_information(const char *name,
+   enum exynos_sysmmu_inttype itype,
+   phys_addr_t pgtable_base, unsigned long fault_addr)
 {
unsigned long *ent;
 
if ((itype >= SYSMMU_FAULTS_NUM) || (itype < SYSMMU_PAGEFAULT))
itype = SYSMMU_FAULT_UNKNOWN;
 
-   pr_err("%s occurred at 0x%lx(Page table base: %pa)\n",
-   sysmmu_fault_name[itype], fault_addr, &pgtable_base);
+   pr_err("%s occurred at %#lx by %s(Page table base: %pa)\n",
+   sysmmu_fault_name[itype], fault_addr, name, &pgtable_base);
 
ent = section_entry(phys_to_virt(pgtable_base), fault_addr);
pr_err("\tLv1 entry: 0x%lx\n", *ent);
@@ -281,12 +253,6 @@ static int default_fault_handler(enum 
exynos_sysmmu_inttype itype,
ent = page_entry(ent, fault_addr);
pr_err("\t Lv2 entry: 0x%lx\n", *ent);
}
-
-   pr_err("Generating Kernel OOPS... because it is unrecoverable.\n");
-
-   BUG();
-
-   return 0;
 }
 
 static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
@@ -310,24 +276,28 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void 
*dev_id)
else
addr = __raw_readl(data->sfrbase + fault_reg_offset[itype]);
 
-   if (data->domain)
-   ret = report_iommu_fault(data->domain, data->dev, addr, itype);
-
-   if ((ret == -ENOSYS) && data->fault_handler) {
-   unsigned long base = data->pgtable;
-   if (itype != SYSMMU_FAULT_UNKNOWN)
-   base = __raw_readl(data->sfrbase + REG_PT_BASE_ADDR);
-   ret = data->fault_handler(itype, base, addr);
+   if (itype == SYSMMU_FAULT_UNKNOWN) {
+   pr_err("%s: Fault is not occurred by System MMU '%s'!\n",
+   __func__, dev_name(data->sysmmu));
+   pr_err("%s: Please check if IRQ is correctly configured.\n",
+   __func__);
+   BUG();
+   } else {
+   unsigned long base =
+   __raw_readl(data->sfrbase + REG_PT_BASE_ADDR);
+   show_fault_information(dev_name(data->sysmmu),
+   itype, base, addr);
+   if (data->domain)
+   ret = report_iommu_fault(data->domain,
+   data->dev, addr, itype);
}
 
-   if (!ret && (itype != SYSMMU_FAULT_UNKNOWN))
-   __raw_writel(1 << itype, data->sfrbase + REG_INT_CLEAR);
-   else
-   dev_dbg(data->sysmmu, "%s is not handled.\n",
-   sysmmu_fault_name[itype]);
+   /* fault is not recovered by fault han

[PATCH v13 13/19] iommu/exynos: use exynos-iommu specific typedef

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This commit introduces sysmmu_pte_t for page table entries and
sysmmu_iova_t for the I/O virtual addresses manipulated by the
exynos-iommu driver. The purpose of the typedefs is to keep the driver
code independent of a change of CPU architecture from 32 bit to 64 bit.

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |  101 --
 1 file changed, 59 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index d89ad5f..3291619 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -29,6 +29,9 @@
 #include 
 #include 
 
+typedef u32 sysmmu_iova_t;
+typedef u32 sysmmu_pte_t;
+
 /* We does not consider super section mapping (16MB) */
 #define SECT_ORDER 20
 #define LPAGE_ORDER 16
@@ -50,20 +53,32 @@
 #define lv2ent_small(pent) ((*(pent) & 2) == 2)
 #define lv2ent_large(pent) ((*(pent) & 3) == 1)
 
+static u32 sysmmu_page_offset(sysmmu_iova_t iova, u32 size)
+{
+   return iova & (size - 1);
+}
+
 #define section_phys(sent) (*(sent) & SECT_MASK)
-#define section_offs(iova) ((iova) & 0xFFFFF)
+#define section_offs(iova) sysmmu_page_offset((iova), SECT_SIZE)
 #define lpage_phys(pent) (*(pent) & LPAGE_MASK)
-#define lpage_offs(iova) ((iova) & 0xFFFF)
+#define lpage_offs(iova) sysmmu_page_offset((iova), LPAGE_SIZE)
 #define spage_phys(pent) (*(pent) & SPAGE_MASK)
-#define spage_offs(iova) ((iova) & 0xFFF)
-
-#define lv1ent_offset(iova) ((iova) >> SECT_ORDER)
-#define lv2ent_offset(iova) (((iova) & 0xFF000) >> SPAGE_ORDER)
+#define spage_offs(iova) sysmmu_page_offset((iova), SPAGE_SIZE)
 
 #define NUM_LV1ENTRIES 4096
-#define NUM_LV2ENTRIES 256
+#define NUM_LV2ENTRIES (SECT_SIZE / SPAGE_SIZE)
 
-#define LV2TABLE_SIZE (NUM_LV2ENTRIES * sizeof(long))
+static u32 lv1ent_offset(sysmmu_iova_t iova)
+{
+   return iova >> SECT_ORDER;
+}
+
+static u32 lv2ent_offset(sysmmu_iova_t iova)
+{
+   return (iova >> SPAGE_ORDER) & (NUM_LV2ENTRIES - 1);
+}
+
+#define LV2TABLE_SIZE (NUM_LV2ENTRIES * sizeof(sysmmu_pte_t))
 
 #define SPAGES_PER_LPAGE (LPAGE_SIZE / SPAGE_SIZE)
 
@@ -101,14 +116,14 @@
 
 static struct kmem_cache *lv2table_kmem_cache;
 
-static unsigned long *section_entry(unsigned long *pgtable, unsigned long iova)
+static sysmmu_pte_t *section_entry(sysmmu_pte_t *pgtable, sysmmu_iova_t iova)
 {
return pgtable + lv1ent_offset(iova);
 }
 
-static unsigned long *page_entry(unsigned long *sent, unsigned long iova)
+static sysmmu_pte_t *page_entry(sysmmu_pte_t *sent, sysmmu_iova_t iova)
 {
-   return (unsigned long *)phys_to_virt(
+   return (sysmmu_pte_t *)phys_to_virt(
lv2table_base(sent)) + lv2ent_offset(iova);
 }
 
@@ -150,7 +165,7 @@ static char *sysmmu_fault_name[SYSMMU_FAULTS_NUM] = {
 
 struct exynos_iommu_domain {
struct list_head clients; /* list of sysmmu_drvdata.node */
-   unsigned long *pgtable; /* lv1 page table, 16KB */
+   sysmmu_pte_t *pgtable; /* lv1 page table, 16KB */
short *lv2entcnt; /* free lv2 entry counter for each section */
spinlock_t lock; /* lock for this structure */
spinlock_t pgtablelock; /* lock for modifying page table @ pgtable */
@@ -215,7 +230,7 @@ static void __sysmmu_tlb_invalidate(void __iomem *sfrbase)
 }
 
 static void __sysmmu_tlb_invalidate_entry(void __iomem *sfrbase,
-   unsigned long iova, unsigned int num_inv)
+   sysmmu_iova_t iova, unsigned int num_inv)
 {
unsigned int i;
for (i = 0; i < num_inv; i++) {
@@ -226,7 +241,7 @@ static void __sysmmu_tlb_invalidate_entry(void __iomem 
*sfrbase,
 }
 
 static void __sysmmu_set_ptbase(void __iomem *sfrbase,
-  unsigned long pgd)
+  phys_addr_t pgd)
 {
__raw_writel(0x1, sfrbase + REG_MMU_CFG); /* 16KB LV1, LRU */
__raw_writel(pgd, sfrbase + REG_PT_BASE_ADDR);
@@ -236,22 +251,22 @@ static void __sysmmu_set_ptbase(void __iomem *sfrbase,
 
 static void show_fault_information(const char *name,
enum exynos_sysmmu_inttype itype,
-   phys_addr_t pgtable_base, unsigned long fault_addr)
+   phys_addr_t pgtable_base, sysmmu_iova_t fault_addr)
 {
-   unsigned long *ent;
+   sysmmu_pte_t *ent;
 
if ((itype >= SYSMMU_FAULTS_NUM) || (itype < SYSMMU_PAGEFAULT))
itype = SYSMMU_FAULT_UNKNOWN;
 
-   pr_err("%s occurred at %#lx by %s(Page table base: %pa)\n",
+   pr_err("%s occurred at %#x by %s(Page table base: %pa)\n",
sysmmu_fault_name[itype], fault_addr, name, &pgtable_base);
 
ent = section_entry(phys_to_virt(pgtable_base), fault_addr);
-   pr_err("\tLv1 entry: 0x%lx\n", *ent);
+   pr_err("\tLv1 entry: %#x\n", *ent);
 
if (lv1ent_page(ent)) {
ent = page_entry(ent, fault_addr);
-  

[PATCH v13 15/19] iommu/exynos: enhanced error messages

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

Some redundant error messages are removed and some messages are
promoted from debug level to error level.

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index d18dc37..7188b47 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -525,7 +525,7 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 
irq = platform_get_irq(pdev, 0);
if (irq <= 0) {
-   dev_dbg(dev, "Unable to find IRQ resource\n");
+   dev_err(dev, "Unable to find IRQ resource\n");
return irq;
}
 
@@ -787,10 +787,8 @@ static int lv2set_page(sysmmu_pte_t *pent, phys_addr_t 
paddr, size_t size,
short *pgcnt)
 {
if (size == SPAGE_SIZE) {
-   if (!lv2ent_fault(pent)) {
-   WARN(1, "Trying mapping on 4KiB where mapping exists");
+   if (WARN_ON(!lv2ent_fault(pent)))
return -EADDRINUSE;
-   }
 
*pent = mk_lv2ent_spage(paddr);
pgtable_flush(pent, pent + 1);
@@ -798,9 +796,7 @@ static int lv2set_page(sysmmu_pte_t *pent, phys_addr_t 
paddr, size_t size,
} else { /* size == LPAGE_SIZE */
int i;
for (i = 0; i < SPAGES_PER_LPAGE; i++, pent++) {
-   if (!lv2ent_fault(pent)) {
-   WARN(1,
-   "Trying mapping on 64KiB where mapping exists");
+   if (WARN_ON(!lv2ent_fault(pent))) {
if (i > 0)
memset(pent - i, 0, sizeof(*pent) * i);
return -EADDRINUSE;
@@ -847,8 +843,8 @@ static int exynos_iommu_map(struct iommu_domain *domain, 
unsigned long l_iova,
}
 
if (ret)
-   pr_debug("%s: Failed to map iova %#x/%#zx bytes\n",
-   __func__, iova, size);
+   pr_err("%s: Failed(%d) to map %#zx bytes @ %#x\n",
+   __func__, ret, size, iova);
 
spin_unlock_irqrestore(&priv->pgtablelock, flags);
 
@@ -872,7 +868,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain 
*domain,
ent = section_entry(priv->pgtable, iova);
 
if (lv1ent_section(ent)) {
-   if (size < SECT_SIZE) {
+   if (WARN_ON(size < SECT_SIZE)) {
err_pgsize = SECT_SIZE;
goto err;
}
@@ -907,7 +903,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain 
*domain,
}
 
/* lv1ent_large(ent) == true here */
-   if (size < LPAGE_SIZE) {
+   if (WARN_ON(size < LPAGE_SIZE)) {
err_pgsize = LPAGE_SIZE;
goto err;
}
@@ -929,9 +925,8 @@ done:
 err:
spin_unlock_irqrestore(&priv->pgtablelock, flags);
 
-   WARN(1,
-   "%s: Failed due to size(%#zx) @ %#x is smaller than page size %#zx\n",
-   __func__, size, iova, err_pgsize);
+   pr_err("%s: Failed: size(%#zx) @ %#x is smaller than page size %#zx\n",
+   __func__, size, iova, err_pgsize);
 
return 0;
 }
-- 
1.7.9.5



Re: [PATCH 1/3] PM / OPP: Add support for descending order for cpufreq table

2014-05-11 Thread Viresh Kumar
On 11 May 2014 17:08, jonghwan Choi  wrote:
> I already considered it.
> (But it only passes on  what cpufreq driver has to do to clock framework.
> For changing clock rate, if changing operation just divides a rate of
> parent it can be solved easily
> But exycpufreq driver is  more complicated.
>
> Previously, to change frequency, pll value and clk divider value were
> changed in cpufreq driver.
> Later someone moved the code which changes pll value to clock framework.
> In there, pll values are maintained as table per frequency. And if
> frequency is added/removed, values of
> pll table should be changed.
> when we change the pll value through clk_set_rate, internally  to find
> proper pll value,  pll table is searched.
> If proper pll value is found, that value is written into the register)
>
> My suggestion is that all these change details should be removed
> according to adding/removing frequency.
> I believe that cpufreq driver just writes a specific value per
> frequency  into the register for dvfs(Maybe other work is also needed)
>
> If we just describe the specific value per frequency in dts file, the
> driver will get that information through DT, and use it for DVFS.)
> Then when a new chip is  released(if the chip has the same h/w
> interface - register map), we only have to do as above.

We also want to make your life simple, but adding this field to OPP
table isn't the right approach for sure.

Can't you calculate the divider values at run time based on the frequency?
I think it should work. That way you can just code these calculations
in the clock driver and things would work smoothly.

If there are problems, tell us what they are and we will try to find a
solution for you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v13 14/19] iommu/exynos: add devices attached to the System MMU to an IOMMU group

2014-05-11 Thread Shaik Ameer Basha
From: Antonios Motakis 

Patch written by Antonios Motakis :

IOMMU groups are expected by certain users of the IOMMU API,
e.g. VFIO. Since each device is behind its own System MMU, we
can allocate a new IOMMU group for each device.

Reviewed-by: Cho KyongHo 
Signed-off-by: Antonios Motakis 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 3291619..d18dc37 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -964,6 +964,32 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct 
iommu_domain *domain,
return phys;
 }
 
+static int exynos_iommu_add_device(struct device *dev)
+{
+   struct iommu_group *group;
+   int ret;
+
+   group = iommu_group_get(dev);
+
+   if (!group) {
+   group = iommu_group_alloc();
+   if (IS_ERR(group)) {
+   dev_err(dev, "Failed to allocate IOMMU group\n");
+   return PTR_ERR(group);
+   }
+   }
+
+   ret = iommu_group_add_device(group, dev);
+   iommu_group_put(group);
+
+   return ret;
+}
+
+static void exynos_iommu_remove_device(struct device *dev)
+{
+   iommu_group_remove_device(dev);
+}
+
 static struct iommu_ops exynos_iommu_ops = {
.domain_init = &exynos_iommu_domain_init,
.domain_destroy = &exynos_iommu_domain_destroy,
@@ -972,6 +998,8 @@ static struct iommu_ops exynos_iommu_ops = {
.map = &exynos_iommu_map,
.unmap = &exynos_iommu_unmap,
.iova_to_phys = &exynos_iommu_iova_to_phys,
+   .add_device = &exynos_iommu_add_device,
+   .remove_device = &exynos_iommu_remove_device,
.pgsize_bitmap = SECT_SIZE | LPAGE_SIZE | SPAGE_SIZE,
 };
 
-- 
1.7.9.5



[PATCH v13 16/19] documentation: iommu: add binding document of Exynos System MMU

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This patch adds a description of the device tree binding for the
Samsung Exynos System MMU.

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 .../devicetree/bindings/iommu/samsung,sysmmu.txt   |   65 
 1 file changed, 65 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/samsung,sysmmu.txt

diff --git a/Documentation/devicetree/bindings/iommu/samsung,sysmmu.txt 
b/Documentation/devicetree/bindings/iommu/samsung,sysmmu.txt
new file mode 100644
index 000..15b2a2b
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/samsung,sysmmu.txt
@@ -0,0 +1,65 @@
+Samsung Exynos IOMMU H/W, System MMU (System Memory Management Unit)
+
+Samsung's Exynos architecture contains System MMUs that enables scattered
+physical memory chunks visible as a contiguous region to DMA-capable peripheral
+devices like MFC, FIMC, FIMD, GScaler, FIMC-IS and so forth.
+
+System MMU is an IOMMU and supports identical translation table format to
+ARMv7 translation tables with minimum set of page properties including access
+permissions, shareability and security protection. In addition, System MMU has
+another capabilities like L2 TLB or block-fetch buffers to minimize translation
+latency.
+
+System MMUs are in many to one relation with peripheral devices, i.e. single
+peripheral device might have multiple System MMUs (usually one for each bus
+master), but one System MMU can handle transactions from only one peripheral
+device. The relation between a System MMU and the peripheral device needs to be
+defined in device node of the peripheral device.
+
+MFC in all Exynos SoCs and FIMD, M2M Scalers and G2D in Exynos5420 has 2 System
+MMUs.
+* MFC has one System MMU on its left and right bus.
+* FIMD in Exynos5420 has one System MMU for window 0 and 4, the other system 
MMU
+  for window 1, 2 and 3.
+* M2M Scalers and G2D in Exynos5420 has one System MMU on the read channel and
+  the other System MMU on the write channel.
+The drivers must consider how to handle those System MMUs. One of the idea is
+to implement child devices or sub-devices which are the client devices of the
+System MMU.
+
+Required properties:
+- compatible: Should be "samsung,exynos-sysmmu"
+- reg: A tuple of base address and size of System MMU registers.
+- interrupt-parent: The phandle of the interrupt controller of System MMU
+- interrupts: An interrupt specifier for interrupt signal of System MMU,
+ according to the format defined by a particular interrupt
+ controller.
+- clock-names: Should be "sysmmu" if the System MMU is needed to gate its 
clock.
+  Optional "master" if the clock to the System MMU is gated by
+  another gate clock other than "sysmmu".
+  Exynos4 SoCs, there needs no "master" clock.
+  Exynos5 SoCs, some System MMUs must have "master" clocks.
+- clocks: Required if the System MMU is needed to gate its clock.
+- samsung,power-domain: Required if the System MMU is needed to gate its power.
+ Please refer to the following document:
+ Documentation/devicetree/bindings/arm/exynos/power_domain.txt
+
+Examples:
+   gsc_0: gsc@13e0 {
+   compatible = "samsung,exynos5-gsc";
+   reg = <0x13e0 0x1000>;
+   interrupts = <0 85 0>;
+   samsung,power-domain = <&pd_gsc>;
+   clocks = <&clock CLK_GSCL0>;
+   clock-names = "gscl";
+   };
+
+   sysmmu_gsc0: sysmmu@13E8 {
+   compatible = "samsung,exynos-sysmmu";
+   reg = <0x13E8 0x1000>;
+   interrupt-parent = <&combiner>;
+   interrupts = <2 0>;
+   clock-names = "sysmmu", "master";
+   clocks = <&clock CLK_SMMU_GSCL0>, <&clock CLK_GSCL0>;
+   samsung,power-domain = <&pd_gsc>;
+   };
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v13 18/19] iommu/exynos: turn on useful configuration options

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This turns on FLPD_CACHE, ACGEN and SYSSEL.

FLPD_CACHE is a cache of 1st level page table entries that contains
the address of a 2nd level page table to reduce latency of page table
walking.

ACGEN is architectural clock gating that gates clocks by System MMU
itself if it is not active. Note that ACGEN is different from clock
gating by the CPU. ACGEN just gates clocks to the internal logic of
System MMU while clock gating by the CPU gates clocks to the System
MMU.

SYSSEL selects the System MMU version in some Exynos SoCs. Some Exynos
SoCs have an option to select between System MMU versions exclusively
because those SoCs adopt the new System MMU version experimentally.

This also always selects LRU as the TLB replacement policy. Selecting the
TLB replacement policy is deprecated since System MMU 3.2. The TLB in
System MMU 3.3 has a single replacement policy, LRU, and the MMU_CFG bit
that selected the TLB replacement policy remains reserved.

The QoS value of page table walking is set to 15 (the highest value).
System MMU 3.3 can inherit the QoS value of page table walking from its
master H/W's transaction. This new feature is enabled by default, and
then the QoS value written to MMU_CFG is ignored.

This patch also simplifies the sysmmu version checking by introducing
some macros.
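For reference, the version packing implemented by the macros this patch adds (MMU_MAJ_VER, MMU_MIN_VER, MMU_RAW_VER, MAKE_MMU_VER) can be exercised in isolation. A stand-alone sketch, compilable outside the kernel:

```c
#include <assert.h>

/* Mirror of the macros this patch adds: within the version value the
 * major number sits above bit 7 (4 bits used) and the minor number in
 * the low 7 bits; REG_MMU_VERSION keeps the raw value in bits [31:21]. */
#define MMU_MAJ_VER(val)	((val) >> 7)
#define MMU_MIN_VER(val)	((val) & 0x7F)
#define MMU_RAW_VER(reg)	(((reg) >> 21) & ((1 << 11) - 1)) /* 11 bits */
#define MAKE_MMU_VER(maj, min)	((((maj) & 0xF) << 7) | ((min) & 0x7F))
```

Packing with MAKE_MMU_VER() and then unpacking with MMU_MAJ_VER()/MMU_MIN_VER() round-trips both fields, which is what the driver relies on when comparing against `MAKE_MMU_VER(3, 3)`.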

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   38 ++
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index b937490..26fb4d7 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -93,6 +93,13 @@ static u32 lv2ent_offset(sysmmu_iova_t iova)
 #define CTRL_BLOCK 0x7
 #define CTRL_DISABLE   0x0
 
+#define CFG_LRU0x1
+#define CFG_QOS(n) ((n & 0xF) << 7)
+#define CFG_MASK   0x0150 /* Selecting bit 0-15, 20, 22 and 24 */
+#define CFG_ACGEN  (1 << 24) /* System MMU 3.3 only */
+#define CFG_SYSSEL (1 << 22) /* System MMU 3.2 only */
+#define CFG_FLPDCACHE  (1 << 20) /* System MMU 3.2+ only */
+
 #define REG_MMU_CTRL   0x000
 #define REG_MMU_CFG0x004
 #define REG_MMU_STATUS 0x008
@@ -109,6 +116,12 @@ static u32 lv2ent_offset(sysmmu_iova_t iova)
 
 #define REG_MMU_VERSION0x034
 
+#define MMU_MAJ_VER(val)   ((val) >> 7)
+#define MMU_MIN_VER(val)   ((val) & 0x7F)
+#define MMU_RAW_VER(reg)   (((reg) >> 21) & ((1 << 11) - 1)) /* 11 bits */
+
+#define MAKE_MMU_VER(maj, min) ((((maj) & 0xF) << 7) | ((min) & 0x7F))
+
 #define REG_PB0_SADDR  0x04C
 #define REG_PB0_EADDR  0x050
 #define REG_PB1_SADDR  0x054
@@ -219,6 +232,11 @@ static void sysmmu_unblock(void __iomem *sfrbase)
__raw_writel(CTRL_ENABLE, sfrbase + REG_MMU_CTRL);
 }
 
+static unsigned int __raw_sysmmu_version(struct sysmmu_drvdata *data)
+{
+   return MMU_RAW_VER(__raw_readl(data->sfrbase + REG_MMU_VERSION));
+}
+
 static bool sysmmu_block(void __iomem *sfrbase)
 {
int i = 120;
@@ -374,7 +392,21 @@ static bool __sysmmu_disable(struct sysmmu_drvdata *data)
 
 static void __sysmmu_init_config(struct sysmmu_drvdata *data)
 {
-   unsigned int cfg = 0;
+   unsigned int cfg = CFG_LRU | CFG_QOS(15);
+   unsigned int ver;
+
+   ver = __raw_sysmmu_version(data);
+   if (MMU_MAJ_VER(ver) == 3) {
+   if (MMU_MIN_VER(ver) >= 2) {
+   cfg |= CFG_FLPDCACHE;
+   if (MMU_MIN_VER(ver) == 3) {
+   cfg |= CFG_ACGEN;
+   cfg &= ~CFG_LRU;
+   } else {
+   cfg |= CFG_SYSSEL;
+   }
+   }
+   }
 
__raw_writel(cfg, data->sfrbase + REG_MMU_CFG);
 }
@@ -494,13 +526,11 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, sysmmu_iova_t iova,
 
spin_lock_irqsave(&data->lock, flags);
if (is_sysmmu_active(data)) {
-   unsigned int maj;
unsigned int num_inv = 1;
 
if (!IS_ERR(data->clk_master))
clk_enable(data->clk_master);
 
-   maj = __raw_readl(data->sfrbase + REG_MMU_VERSION);
/*
 * L2TLB invalidation required
 * 4KB page: 1 invalidation
@@ -511,7 +541,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, sysmmu_iova_t iova,
 * 1MB page can be cached in one of all sets.
 * 64KB page can be one of 16 consecutive sets.
 */
-   if ((maj >> 28) == 2) /* major version number */
+   if (MMU_MAJ_VER(__raw_sysmmu_version(data)) == 2)
num_inv = min_t(unsigned int, size / PAGE_SIZE, 64);
 
if (sysmmu_block(data->sfrbase)) {
-- 
1.7.9.5


[PATCH v13 17/19] iommu/exynos: support for device tree

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This commit adds device tree support for System MMU.

Also, System MMU handling is improved. Previously, an IOMMU domain was
bound to a System MMU, which is not correct. This patch binds an IOMMU
domain to the master device of a System MMU.

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |  283 +++---
 1 file changed, 158 insertions(+), 125 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 7188b47..b937490 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -114,6 +114,8 @@ static u32 lv2ent_offset(sysmmu_iova_t iova)
 #define REG_PB1_SADDR  0x054
 #define REG_PB1_EADDR  0x058
 
+#define has_sysmmu(dev)(dev->archdata.iommu != NULL)
+
 static struct kmem_cache *lv2table_kmem_cache;
 
 static sysmmu_pte_t *section_entry(sysmmu_pte_t *pgtable, sysmmu_iova_t iova)
@@ -163,6 +165,16 @@ static char *sysmmu_fault_name[SYSMMU_FAULTS_NUM] = {
"UNKNOWN FAULT"
 };
 
+/* attached to dev.archdata.iommu of the master device */
+struct exynos_iommu_owner {
+   struct list_head client; /* entry of exynos_iommu_domain.clients */
+   struct device *dev;
+   struct device *sysmmu;
+   struct iommu_domain *domain;
+   void *vmm_data; /* IO virtual memory manager's data */
+   spinlock_t lock;/* Lock to preserve consistency of System MMU */
+};
+
 struct exynos_iommu_domain {
struct list_head clients; /* list of sysmmu_drvdata.node */
sysmmu_pte_t *pgtable; /* lv1 page table, 16KB */
@@ -172,9 +184,8 @@ struct exynos_iommu_domain {
 };
 
 struct sysmmu_drvdata {
-   struct list_head node; /* entry of exynos_iommu_domain.clients */
struct device *sysmmu;  /* System MMU's device descriptor */
-   struct device *dev; /* Owner of system MMU */
+   struct device *master;  /* Owner of system MMU */
void __iomem *sfrbase;
struct clk *clk;
struct clk *clk_master;
@@ -243,7 +254,6 @@ static void __sysmmu_tlb_invalidate_entry(void __iomem *sfrbase,
 static void __sysmmu_set_ptbase(void __iomem *sfrbase,
   phys_addr_t pgd)
 {
-   __raw_writel(0x1, sfrbase + REG_MMU_CFG); /* 16KB LV1, LRU */
__raw_writel(pgd, sfrbase + REG_PT_BASE_ADDR);
 
__sysmmu_tlb_invalidate(sfrbase);
@@ -305,7 +315,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
itype, base, addr);
if (data->domain)
ret = report_iommu_fault(data->domain,
-   data->dev, addr, itype);
+   data->master, addr, itype);
}
 
/* fault is not recovered by fault handler */
@@ -323,120 +333,152 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-static bool __exynos_sysmmu_disable(struct sysmmu_drvdata *data)
+static void __sysmmu_disable_nocount(struct sysmmu_drvdata *data)
 {
-   unsigned long flags;
-   bool disabled = false;
-
-   spin_lock_irqsave(&data->lock, flags);
-
-   if (!set_sysmmu_inactive(data))
-   goto finish;
-
if (!IS_ERR(data->clk_master))
clk_enable(data->clk_master);
 
__raw_writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL);
+   __raw_writel(0, data->sfrbase + REG_MMU_CFG);
 
clk_disable(data->clk);
if (!IS_ERR(data->clk_master))
clk_disable(data->clk_master);
-
-   disabled = true;
-   data->pgtable = 0;
-   data->domain = NULL;
-finish:
-   spin_unlock_irqrestore(&data->lock, flags);
-
-   if (disabled)
-   dev_dbg(data->sysmmu, "Disabled\n");
-   else
-   dev_dbg(data->sysmmu, "%d times left to be disabled\n",
-   data->activations);
-
-   return disabled;
 }
 
-/* __exynos_sysmmu_enable: Enables System MMU
- *
- * returns -error if an error occurred and System MMU is not enabled,
- * 0 if the System MMU has been just enabled and 1 if System MMU was already
- * enabled before.
- */
-static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
-   phys_addr_t pgtable, struct iommu_domain *domain)
+static bool __sysmmu_disable(struct sysmmu_drvdata *data)
 {
-   int ret = 0;
+   bool disabled;
unsigned long flags;
 
spin_lock_irqsave(&data->lock, flags);
 
-   if (!set_sysmmu_active(data)) {
-   if (WARN_ON(pgtable != data->pgtable)) {
-   ret = -EBUSY;
-   set_sysmmu_inactive(data);
-   } else {
-   ret = 1;
-   }
+   disabled = set_sysmmu_inactive(data);
+
+   if (disabled) {
+   data->pgtable = 0;
+   data->domain 

[PATCH v13 19/19] iommu/exynos: apply workaround of caching fault page table entries

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This patch contains 2 workaround for the System MMU v3.x.

System MMU v3.2 and v3.3 has FLPD cache that caches first level page
table entries to reduce page table walking latency. However, the
FLPD cache is filled with a first level page table entry even though
it is not accessed by a master H/W because System MMU v3.3
speculatively prefetches page table entries that may be accessed
in the near future by the master H/W.
The prefetched FLPD cache entries are not invalidated by iommu_unmap()
because iommu_unmap() only unmaps and invalidates the page table
entries that is mapped.

Because exynos-iommu driver discards a second level page table when
it needs to be replaced with another second level page table or
a first level page table entry with 1MB mapping, It is required to
invalidate FLPD cache that may contain the first level page table
entry that points to the second level page table.

Another workaround for System MMU v3.3 is initializing the first level
page table entries with a second level page table which is filled
with all zeros. This prevents the System MMU from prefetching a 'fault'
first level page table entry, which may otherwise lead to page faults
on accesses across a 16MiB-wide region.

System MMU 3.x fetches consecutive page table entries by a page
table walking to maximize bus utilization and to minimize TLB miss
penalty.
Unfortunately, a functional problem arises with this fetching behavior
because it fetches 'fault' page table entries that specify no
translation information but that valid translation information will be
written to in the near future. The logic in the System MMU then
generates page faults from the cached fault entries, which are no
longer coherent with the updated page table.

There is another workaround that must be implemented by I/O virtual
memory manager: any two consecutive I/O virtual memory area must have
a hole between the two that is larger than or equal to 128KiB.
Also, next I/O virtual memory area must be started from the next
128KiB boundary.

0       128K    256K    384K    512K
|-------|-------|--------|-------|
|<--- area1 --->|..hole..|<--- area2 -

The constraint is depicted above.
The size is selected by the following calculation:
 - System MMU can fetch 64 consecutive page table entries at once:
   64 * 4KiB = 256KiB. This is the size between 128K ~ 384K in the
   picture above. This style of fetching is 'block fetch'. It fetches a
   predefined number of consecutive page table entries including the
   entry that caused the page table walk.
 - System MMU can prefetch up to 32 consecutive page table entries.
   This is the size between 256K ~ 384K.
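The boundary rule stated above can be sketched as a small helper. Note that `next_area_start()` is a hypothetical illustration, not part of this patch: it places a new I/O virtual memory area past at least a 128KiB hole and on the next 128KiB boundary.

```c
#include <stddef.h>

/* Hypothetical helper for an I/O virtual memory manager honouring the
 * System MMU 3.x prefetch workaround described above: a new area must
 * leave a hole of at least 128KiB after the previous area's end, and
 * must itself start on a 128KiB boundary. */
#define SYSMMU_PREFETCH_GUARD	(128 * 1024)

static size_t next_area_start(size_t prev_end)
{
	/* skip the guard hole, then round up to a 128KiB boundary */
	size_t candidate = prev_end + SYSMMU_PREFETCH_GUARD;

	return (candidate + SYSMMU_PREFETCH_GUARD - 1) &
	       ~((size_t)SYSMMU_PREFETCH_GUARD - 1);
}
```

For example, an area ending at 100K yields a next start of 256K (a 156KiB hole), matching the area1/area2 picture above.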

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |  163 +-
 1 file changed, 146 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 26fb4d7..82aecd0 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -45,8 +45,12 @@ typedef u32 sysmmu_pte_t;
 #define LPAGE_MASK (~(LPAGE_SIZE - 1))
 #define SPAGE_MASK (~(SPAGE_SIZE - 1))
 
-#define lv1ent_fault(sent) (((*(sent) & 3) == 0) || ((*(sent) & 3) == 3))
-#define lv1ent_page(sent) ((*(sent) & 3) == 1)
+#define lv1ent_fault(sent) ((*(sent) == ZERO_LV2LINK) || \
+  ((*(sent) & 3) == 0) || ((*(sent) & 3) == 3))
+#define lv1ent_zero(sent) (*(sent) == ZERO_LV2LINK)
+#define lv1ent_page_zero(sent) ((*(sent) & 3) == 1)
+#define lv1ent_page(sent) ((*(sent) != ZERO_LV2LINK) && \
+ ((*(sent) & 3) == 1))
 #define lv1ent_section(sent) ((*(sent) & 3) == 2)
 
 #define lv2ent_fault(pent) ((*(pent) & 3) == 0)
@@ -130,6 +134,8 @@ static u32 lv2ent_offset(sysmmu_iova_t iova)
 #define has_sysmmu(dev)(dev->archdata.iommu != NULL)
 
 static struct kmem_cache *lv2table_kmem_cache;
+static sysmmu_pte_t *zero_lv2_table;
+#define ZERO_LV2LINK mk_lv1ent_page(virt_to_phys(zero_lv2_table))
 
 static sysmmu_pte_t *section_entry(sysmmu_pte_t *pgtable, sysmmu_iova_t iova)
 {
@@ -515,6 +521,32 @@ static bool exynos_sysmmu_disable(struct device *dev)
return disabled;
 }
 
+static void __sysmmu_tlb_invalidate_flpdcache(struct sysmmu_drvdata *data,
+ sysmmu_iova_t iova)
+{
+   if (__raw_sysmmu_version(data) == MAKE_MMU_VER(3, 3))
+   __raw_writel(iova | 0x1, data->sfrbase + REG_MMU_FLUSH_ENTRY);
+}
+
+static void sysmmu_tlb_invalidate_flpdcache(struct device *dev,
+   sysmmu_iova_t iova)
+{
+   unsigned long flags;
+   struct exynos_iommu_owner *owner = dev->archdata.iommu;
+   struct sysmmu_drvdata *data = dev_get_drvdata(owner->sysmmu);
+
+   if (!IS_ERR(data->clk_master))
+   clk_enable(data->clk_master);
+
+   spin_lock_irqsave(&data

[PATCH v13 12/19] iommu/exynos: change rwlock to spinlock

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

Since read_lock is not acquired more frequently than write_lock, using
an rwlock is not beneficial; this commit changes the rwlock to a spinlock.

Reviewed-by: Grant Grundler 
Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index c1be65f..d89ad5f 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -164,7 +164,7 @@ struct sysmmu_drvdata {
struct clk *clk;
struct clk *clk_master;
int activations;
-   rwlock_t lock;
+   spinlock_t lock;
struct iommu_domain *domain;
phys_addr_t pgtable;
 };
@@ -263,12 +263,13 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
unsigned long addr = -1;
int ret = -ENOSYS;
 
-   read_lock(&data->lock);
-
WARN_ON(!is_sysmmu_active(data));
 
+   spin_lock(&data->lock);
+
if (!IS_ERR(data->clk_master))
clk_enable(data->clk_master);
+
itype = (enum exynos_sysmmu_inttype)
__ffs(__raw_readl(data->sfrbase + REG_INT_STATUS));
if (WARN_ON(!((itype >= 0) && (itype < SYSMMU_FAULT_UNKNOWN
@@ -302,7 +303,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
if (!IS_ERR(data->clk_master))
clk_disable(data->clk_master);
 
-   read_unlock(&data->lock);
+   spin_unlock(&data->lock);
 
return IRQ_HANDLED;
 }
@@ -312,7 +313,7 @@ static bool __exynos_sysmmu_disable(struct sysmmu_drvdata *data)
unsigned long flags;
bool disabled = false;
 
-   write_lock_irqsave(&data->lock, flags);
+   spin_lock_irqsave(&data->lock, flags);
 
if (!set_sysmmu_inactive(data))
goto finish;
@@ -330,7 +331,7 @@ static bool __exynos_sysmmu_disable(struct sysmmu_drvdata *data)
data->pgtable = 0;
data->domain = NULL;
 finish:
-   write_unlock_irqrestore(&data->lock, flags);
+   spin_unlock_irqrestore(&data->lock, flags);
 
if (disabled)
dev_dbg(data->sysmmu, "Disabled\n");
@@ -353,7 +354,7 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
int ret = 0;
unsigned long flags;
 
-   write_lock_irqsave(&data->lock, flags);
+   spin_lock_irqsave(&data->lock, flags);
 
if (!set_sysmmu_active(data)) {
if (WARN_ON(pgtable != data->pgtable)) {
@@ -384,7 +385,7 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
 
dev_dbg(data->sysmmu, "Enabled\n");
 finish:
-   write_unlock_irqrestore(&data->lock, flags);
+   spin_unlock_irqrestore(&data->lock, flags);
 
return ret;
 }
@@ -431,7 +432,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
unsigned long flags;
struct sysmmu_drvdata *data = dev_get_drvdata(dev->archdata.iommu);
 
-   read_lock_irqsave(&data->lock, flags);
+   spin_lock_irqsave(&data->lock, flags);
 
if (is_sysmmu_active(data)) {
unsigned int maj;
@@ -465,7 +466,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
 
-   read_unlock_irqrestore(&data->lock, flags);
+   spin_unlock_irqrestore(&data->lock, flags);
 }
 
 void exynos_sysmmu_tlb_invalidate(struct device *dev)
@@ -473,7 +474,7 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
unsigned long flags;
struct sysmmu_drvdata *data = dev_get_drvdata(dev->archdata.iommu);
 
-   read_lock_irqsave(&data->lock, flags);
+   spin_lock_irqsave(&data->lock, flags);
 
if (is_sysmmu_active(data)) {
if (!IS_ERR(data->clk_master))
@@ -488,7 +489,7 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
 
-   read_unlock_irqrestore(&data->lock, flags);
+   spin_unlock_irqrestore(&data->lock, flags);
 }
 
 static int exynos_sysmmu_probe(struct platform_device *pdev)
@@ -543,7 +544,7 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
}
 
data->sysmmu = dev;
-   rwlock_init(&data->lock);
+   spin_lock_init(&data->lock);
INIT_LIST_HEAD(&data->node);
 
platform_set_drvdata(pdev, data);
-- 
1.7.9.5



[PATCH v13 10/19] iommu/exynos: gating clocks of master H/W

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

This patch gates clocks of master H/W as well as clocks of System MMU
if master clocks are specified.

Some Exynos SoCs (e.g. GScalers in Exynos5250) have dependencies between
the gating clocks of a master H/W and its System MMU. For such H/W,
accessing the control registers of the System MMU is prohibited unless
both the gating clocks of the System MMU and its master H/W are enabled.
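The access pattern this patch enforces can be sketched with a userspace mock (all names here are illustrative, not the driver's code): the optional "master" gate clock, when present, is enabled around every System MMU register access. In the driver, devm_clk_get() returns an ERR_PTR when no "master" clock is specified, hence the IS_ERR() guards; a NULL pointer plays that role below.

```c
/* Mock of the clock-gating discipline added by this patch. */
struct mock_clk { int enable_count; };

static void mock_clk_enable(struct mock_clk *clk)  { clk->enable_count++; }
static void mock_clk_disable(struct mock_clk *clk) { clk->enable_count--; }

static void sysmmu_reg_access(struct mock_clk *clk_master, int *reg_touched)
{
	if (clk_master)			/* stands in for !IS_ERR(clk_master) */
		mock_clk_enable(clk_master);

	*reg_touched = 1;		/* SFR access happens only here */

	if (clk_master)
		mock_clk_disable(clk_master);
}
```

The enable/disable calls are balanced around the register access, so the master clock is gated again as soon as the access completes.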

CC: Tomasz Figa 
Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   40 ++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index c86e374..5af5c5c 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -172,6 +172,7 @@ struct sysmmu_drvdata {
struct device *dev; /* Owner of system MMU */
void __iomem *sfrbase;
struct clk *clk;
+   struct clk *clk_master;
int activations;
rwlock_t lock;
struct iommu_domain *domain;
@@ -300,6 +301,8 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
 
WARN_ON(!is_sysmmu_active(data));
 
+   if (!IS_ERR(data->clk_master))
+   clk_enable(data->clk_master);
itype = (enum exynos_sysmmu_inttype)
__ffs(__raw_readl(data->sfrbase + REG_INT_STATUS));
if (WARN_ON(!((itype >= 0) && (itype < SYSMMU_FAULT_UNKNOWN
@@ -326,6 +329,9 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
if (itype != SYSMMU_FAULT_UNKNOWN)
sysmmu_unblock(data->sfrbase);
 
+   if (!IS_ERR(data->clk_master))
+   clk_disable(data->clk_master);
+
read_unlock(&data->lock);
 
return IRQ_HANDLED;
@@ -341,9 +347,14 @@ static bool __exynos_sysmmu_disable(struct sysmmu_drvdata *data)
if (!set_sysmmu_inactive(data))
goto finish;
 
+   if (!IS_ERR(data->clk_master))
+   clk_enable(data->clk_master);
+
__raw_writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL);
 
clk_disable(data->clk);
+   if (!IS_ERR(data->clk_master))
+   clk_disable(data->clk_master);
 
disabled = true;
data->pgtable = 0;
@@ -386,14 +397,19 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
goto finish;
}
 
-   clk_enable(data->clk);
-
data->pgtable = pgtable;
 
+   if (!IS_ERR(data->clk_master))
+   clk_enable(data->clk_master);
+   clk_enable(data->clk);
+
__sysmmu_set_ptbase(data->sfrbase, pgtable);
 
__raw_writel(CTRL_ENABLE, data->sfrbase + REG_MMU_CTRL);
 
+   if (!IS_ERR(data->clk_master))
+   clk_disable(data->clk_master);
+
data->domain = domain;
 
dev_dbg(data->sysmmu, "Enabled\n");
@@ -450,6 +466,10 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
if (is_sysmmu_active(data)) {
unsigned int maj;
unsigned int num_inv = 1;
+
+   if (!IS_ERR(data->clk_master))
+   clk_enable(data->clk_master);
+
maj = __raw_readl(data->sfrbase + REG_MMU_VERSION);
/*
 * L2TLB invalidation required
@@ -469,6 +489,8 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
data->sfrbase, iova, num_inv);
sysmmu_unblock(data->sfrbase);
}
+   if (!IS_ERR(data->clk_master))
+   clk_disable(data->clk_master);
} else {
dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
@@ -484,10 +506,14 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
read_lock_irqsave(&data->lock, flags);
 
if (is_sysmmu_active(data)) {
+   if (!IS_ERR(data->clk_master))
+   clk_enable(data->clk_master);
if (sysmmu_block(data->sfrbase)) {
__sysmmu_tlb_invalidate(data->sfrbase);
sysmmu_unblock(data->sfrbase);
}
+   if (!IS_ERR(data->clk_master))
+   clk_disable(data->clk_master);
} else {
dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
@@ -536,6 +562,16 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
}
}
 
+   data->clk_master = devm_clk_get(dev, "master");
+   if (!IS_ERR(data->clk_master)) {
+   ret = clk_prepare(data->clk_master);
+   if (ret) {
+   clk_unprepare(data->clk);
+   dev_err(dev, "Failed to prepare master's clk\n");
+   return ret;
+   }
+   }
+
data->sysmmu = dev;
rwlock_init(&data->lock);
INIT_LIST_HEAD(&data->node);
--

[PATCH v13 04/19] iommu/exynos: fix L2TLB invalidation

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

The L2TLB is an 8-way set-associative TLB with 512 entries, i.e. 64 sets.
A single 4KB (small page) translation is cached only in the set whose
index equals the lower 6 bits of the page frame number.
A single 64KB (large page) translation can be cached in any of the 16
sets whose top two index bits equal bits [5:4] of the page frame number.
A single 1MB (section) or larger translation can be cached in any set
in the TLB.

It is required to invalidate entire sets that may cache the target
translation information to guarantee that the L2TLB has no stale data.
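The invalidation counts implied by the set layout above (4KB page: 1, 64KB page: 16, 1MB page: 64) reduce to a simple calculation, mirrored by the num_inv logic this patch adds. A stand-alone sketch, assuming a fixed 4KiB small-page size:

```c
#include <stddef.h>

#define SPAGE_SIZE_4K	4096u

/* Stand-alone mirror of the invalidation count this patch computes for
 * System MMU v2: one FLUSH_ENTRY write per 4KiB page covered by the
 * unmapped range, capped at 64 because the L2TLB has only 64 sets
 * (invalidating 64 consecutive entries already touches every set). */
static unsigned int l2tlb_num_inv(size_t size)
{
	unsigned int n = size / SPAGE_SIZE_4K;

	if (n == 0)		/* sub-page sizes still need one flush */
		n = 1;
	return n < 64 ? n : 64;
}
```

A 1MB section would naively need 256 flushes, but the 64-set cap makes 64 writes sufficient to cover every set it could occupy.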

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   32 +++-
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 4ff4b0b..06fc70e 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -226,9 +226,14 @@ static void __sysmmu_tlb_invalidate(void __iomem *sfrbase)
 }
 
 static void __sysmmu_tlb_invalidate_entry(void __iomem *sfrbase,
-   unsigned long iova)
+   unsigned long iova, unsigned int num_inv)
 {
-   __raw_writel((iova & SPAGE_MASK) | 1, sfrbase + REG_MMU_FLUSH_ENTRY);
+   unsigned int i;
+   for (i = 0; i < num_inv; i++) {
+   __raw_writel((iova & SPAGE_MASK) | 1,
+   sfrbase + REG_MMU_FLUSH_ENTRY);
+   iova += SPAGE_SIZE;
+   }
 }
 
 static void __sysmmu_set_ptbase(void __iomem *sfrbase,
@@ -452,7 +457,8 @@ static bool exynos_sysmmu_disable(struct device *dev)
return disabled;
 }
 
-static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova)
+static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
+   size_t size)
 {
unsigned long flags;
struct sysmmu_drvdata *data = dev_get_drvdata(dev->archdata.iommu);
@@ -460,9 +466,25 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova)
read_lock_irqsave(&data->lock, flags);
 
if (is_sysmmu_active(data)) {
+   unsigned int maj;
+   unsigned int num_inv = 1;
+   maj = __raw_readl(data->sfrbase + REG_MMU_VERSION);
+   /*
+* L2TLB invalidation required
+* 4KB page: 1 invalidation
+* 64KB page: 16 invalidation
+* 1MB page: 64 invalidation
+* because it is set-associative TLB
+* with 8-way and 64 sets.
+* 1MB page can be cached in one of all sets.
+* 64KB page can be one of 16 consecutive sets.
+*/
+   if ((maj >> 28) == 2) /* major version number */
+   num_inv = min_t(unsigned int, size / PAGE_SIZE, 64);
+
if (sysmmu_block(data->sfrbase)) {
__sysmmu_tlb_invalidate_entry(
-   data->sfrbase, iova);
+   data->sfrbase, iova, num_inv);
sysmmu_unblock(data->sfrbase);
}
} else {
@@ -915,7 +937,7 @@ done:
 
spin_lock_irqsave(&priv->lock, flags);
list_for_each_entry(data, &priv->clients, node)
-   sysmmu_tlb_invalidate_entry(data->dev, iova);
+   sysmmu_tlb_invalidate_entry(data->dev, iova, size);
spin_unlock_irqrestore(&priv->lock, flags);
 
return size;
-- 
1.7.9.5



[PATCH v13 03/19] iommu/exynos: allocate lv2 page table from own slab

2014-05-11 Thread Shaik Ameer Basha
From: Cho KyongHo 

Since kmalloc() does not guarantee 1KiB alignment when it allocates
1KiB, it is required to allocate lv2 page tables from their own slab
cache that guarantees 1KiB alignment.
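The reason 1KiB alignment matters is that a first-level entry stores the second-level table's base address in its upper bits, so the low bits of that address must be zero; the BUG_ON() added in alloc_lv2entry() checks exactly this. A userspace sketch (posix_memalign stands in for the kmem_cache allocation; it does not zero memory, unlike kmem_cache_zalloc()):

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

#define LV2TABLE_SIZE	1024	/* 256 entries x 4 bytes */

/* An lv2 table pointer is usable in an lv1 entry only when its low
 * 10 bits are clear, i.e. the table is 1KiB-aligned. */
static int lv2table_is_aligned(const void *pent)
{
	return ((uintptr_t)pent & (LV2TABLE_SIZE - 1)) == 0;
}

/* Userspace stand-in for allocating from the driver's slab cache,
 * which is created with both size and alignment of LV2TABLE_SIZE. */
static void *alloc_lv2table(void)
{
	void *pent = NULL;

	if (posix_memalign(&pent, LV2TABLE_SIZE, LV2TABLE_SIZE))
		return NULL;
	return pent;
}
```

A plain malloc()/kmalloc() of 1KiB carries no such alignment promise, which is why the patch below creates lv2table_kmem_cache with LV2TABLE_SIZE alignment.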

Signed-off-by: Cho KyongHo 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/iommu/exynos-iommu.c |   34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index aec7fd7..4ff4b0b 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -99,6 +99,8 @@
 #define REG_PB1_SADDR  0x054
 #define REG_PB1_EADDR  0x058
 
+static struct kmem_cache *lv2table_kmem_cache;
+
 static unsigned long *section_entry(unsigned long *pgtable, unsigned long iova)
 {
return pgtable + lv1ent_offset(iova);
@@ -637,7 +639,8 @@ static void exynos_iommu_domain_destroy(struct iommu_domain *domain)
 
for (i = 0; i < NUM_LV1ENTRIES; i++)
if (lv1ent_page(priv->pgtable + i))
-   kfree(phys_to_virt(lv2table_base(priv->pgtable + i)));
+   kmem_cache_free(lv2table_kmem_cache,
+   phys_to_virt(lv2table_base(priv->pgtable + i)));
 
free_pages((unsigned long)priv->pgtable, 2);
free_pages((unsigned long)priv->lv2entcnt, 1);
@@ -736,7 +739,7 @@ static unsigned long *alloc_lv2entry(unsigned long *sent, unsigned long iova,
if (lv1ent_fault(sent)) {
unsigned long *pent;
 
-   pent = kzalloc(LV2TABLE_SIZE, GFP_ATOMIC);
+   pent = kmem_cache_zalloc(lv2table_kmem_cache, GFP_ATOMIC);
BUG_ON((unsigned long)pent & (LV2TABLE_SIZE - 1));
if (!pent)
return ERR_PTR(-ENOMEM);
@@ -766,8 +769,7 @@ static int lv1set_section(unsigned long *sent, unsigned long iova,
return -EADDRINUSE;
}
 
-   kfree(page_entry(sent, 0));
-
+   kmem_cache_free(lv2table_kmem_cache, page_entry(sent, 0));
*pgcnt = 0;
}
 
@@ -970,11 +972,31 @@ static int __init exynos_iommu_init(void)
 {
int ret;
 
+   lv2table_kmem_cache = kmem_cache_create("exynos-iommu-lv2table",
+   LV2TABLE_SIZE, LV2TABLE_SIZE, 0, NULL);
+   if (!lv2table_kmem_cache) {
+   pr_err("%s: Failed to create kmem cache\n", __func__);
+   return -ENOMEM;
+   }
+
ret = platform_driver_register(&exynos_sysmmu_driver);
+   if (ret) {
+   pr_err("%s: Failed to register driver\n", __func__);
+   goto err_reg_driver;
+   }
 
-   if (ret == 0)
-   bus_set_iommu(&platform_bus_type, &exynos_iommu_ops);
+   ret = bus_set_iommu(&platform_bus_type, &exynos_iommu_ops);
+   if (ret) {
+   pr_err("%s: Failed to register exynos-iommu driver.\n",
+   __func__);
+   goto err_set_iommu;
+   }
 
+   return 0;
+err_set_iommu:
+   platform_driver_unregister(&exynos_sysmmu_driver);
+err_reg_driver:
+   kmem_cache_destroy(lv2table_kmem_cache);
return ret;
 }
 subsys_initcall(exynos_iommu_init);
-- 
1.7.9.5



[PATCH v13 00/19] iommu/exynos: Fixes and Enhancements of System MMU driver with DT

2014-05-11 Thread Shaik Ameer Basha
This is a subset of the previous v12 series and includes only the fixes and
enhancements, leaving out the private DT bindings as discussed in the thread
below:
-- http://www.gossamer-threads.com/lists/linux/kernel/1918178

This patch series includes,
1] fixes for exynos-iommu driver build break
2] includes several bug fixes and enhancements for the exynos-iommu driver
3] code to handle multiple exynos sysmmu versions
4] adding support for device tree
Documentation/devicetree/bindings/iommu/samsung,sysmmu.txt

Change log:
v13:
- Rebased to the latest 3.15-rc4 master branch
  git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master (3.15-rc4)
- This patch series is the subset of the previous patch series
v12: iommu/exynos: Fixes and Enhancements of System MMU driver with dt
Changes include:
- Removed dt bindings and code specific to "mmu-masters" property
- Dropped patch 18/31 from previous patch series as suggested by 'Tomasz Figa'.
- Fixes build break issue in patch 01/19 by merging the following patches
  from the previous series
iommu/exynos: do not include removed header
iommu/exynos: fix address handling
iommu/exynos: handle one instance of sysmmu with a device descriptor
- Shuffled the patches to bring all the fixes and enhancement to the start
  of the patch series

v12:
- Rebased to the latest 3.15-rc2 master branch
  git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master (3.15-rc2)
- Addressed v11 review comments from
'Sachin Kamat', 'Tomasz Figa' and 'Shaik Ameer Basha'
- Uses macro names instead of magic numbers for clock description in DT
- Moved DT binding document to separate patch
- dtsi changes are separated into multiple patches
- patch descriptions of some patches are updated according to the review comments
- removed the macros which hides the clock operations
- review comments related to compatible strings will be fixed in follow-up patches

v11:
- Rebased on the latest works on clock, arm/samsung, iommu branches
- Change the property to link System MMU and its master H/W
  'iommu' in the master's node -> 'mmu-masters' in the System MMU's node
- Changed compatible string:
  "samsung,sysmmu-v1"
  "samsung,sysmmu-v2"
  "samsung,sysmmu-v3.1"
  "samsung,sysmmu-v3.2"
  "samsung,sysmmu-v3.3"
- Change the implementation of retrieving System MMU version -> simpler
- Check NULL pointer before call to clk_enable() and clk_disable()
- Allow a single master to link to multiple System MMUs.
  (fimc-is, fimd/g2d/Scaler in Exynos5420)
- Workarounds of known problems of System MMU
- Code enhancements:
  * Compilable for 64-bit
  * Enhanced error messages

v10:
- Rebased on the following branches
  git.linaro.org/git-ro/people/mturquette/linux.git/clk-next
  git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git/for-next
  git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/next
  git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master (3.12-rc3)
- Set parent clock to all System MMU clocks.
- Add clock and DT descriptors for Exynos5420
- Modified error handling in exynos_iommu_init()
- Split "iommu/exynos: support for device tree" patch into the following 6 
patches
  iommu/exynos: handle only one instance of System MMU
  iommu/exynos: always enable runtime PM
  iommu/exynos: always use a single clock descriptor
  iommu/exynos: remove dbgname from drvdata of a System MMU
  iommu/exynos: use managed driver helper functions
  iommu/exynos: support for device tree
- Remove 'interrupt-names' and 'status' properties from DT
- Change n:1 relationship between master:System MMU into 1:1 relationship.
- Removed custom fault handler and print the status of System MMU
  whenever a System MMU fault occurs.
- Post Antonios Motakis's commit together:
  "iommu/exynos: add devices attached to the System MMU to an IOMMU group"

v9:
- Rebased on the following branches
  git.linaro.org/git-ro/people/mturquette/linux.git/clk-next
  git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git/samsung-next
  git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master (3.11-rc4)
- Split "add bus notifier for registering System MMU" into 5 patches
- Call clk_prepare() that was missing in v8.
- Fixed base address of sysmmu_tv in exynos4210.dtsi
- BUG_ON() instead of return -EADDRINUSE when trying to map an already-mapped area
- Moved camif_top to 317 in drivers/clk/samsung/clk-exynos5250.c
- Removed 'iommu' property from 'codec'(mfc) node
- Does not make the 'master' clock the parent of the 'sysmmu' clock.
   The 'master' clock is enabled before accessing control registers of the
   System MMU and disabled after the access.

v8:
- Reordered patch list: moved "change rwlock to spinlock" to the last.
- Fixed remained bug in "fix page table maintenance".
- Always return 0 from exynos_iommu_attach_device().
- Removed prefetch buffer setting when System MMU is enabled
  due to the restriction of prefetch buffers:
  A prefetch buffer must not hit fro

Re: [PATCH] drivers/hwmon/emc1403.c: add support for emc1412

2014-05-11 Thread Jean Delvare
Hi Guenter, Josef,

On Sun, 11 May 2014 15:40:21 -0700, Guenter Roeck wrote:
> On 05/11/2014 06:00 AM, Josef Gajdusek wrote:
> > @@ -366,14 +433,19 @@ static int emc1403_probe(struct i2c_client *client,
> >   }
> >
> >   static const unsigned short emc1403_address_list[] = {
> > -   0x18, 0x29, 0x4c, 0x4d, I2C_CLIENT_END
> > +   /* emc1403/emc1404/emc1423/emc1424 */
> > +   0x4c, 0x4d, 0x18, 0x29,
> > +   /* emc1412 */
> > +   0x5c, 0x4c, 0x6c, 0x1c, 0x3c, I2C_CLIENT_END
> 
> No duplication of addresses, and addresses are by convention in order.
> Jean, any addresses which should not be scanned?

0x3c and 0x6c should indeed not be scanned; sensors-detect does not
scan them as they aren't typically used by hwmon devices. 0x5c is
questionable (currently scanned, but used only by a limited number of
chips; we may drop it at some point). 0x1c and 0x4c are OK to scan.

-- 
Jean Delvare
SUSE L3 Support
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: build failure after merge of the fsl tree

2014-05-11 Thread Stephen Rothwell
Hi Scott,

After merging the fsl tree, today's linux-next build (powerpc
allyesconfig) failed like this:

arch/powerpc/kernel/epapr_paravirt.c: In function 'epapr_idle_init':
arch/powerpc/kernel/epapr_paravirt.c:77:23: error: 'epapr_ev_idle' undeclared (first use in this function)
   ppc_md.power_save = epapr_ev_idle;
   ^

Caused by commit 7762b1ed7aae ("powerpc: move epapr paravirt init of
power_save to an initcall").

I have reverted that commit for today.

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: [RFC][PATCH] af_key: return error when meet errors on sendmsg() syscall

2014-05-11 Thread Xufeng Zhang

On 05/12/2014 01:11 PM, David Miller wrote:



So it makes sense to return errors for send() syscall.

Signed-off-by: Xufeng Zhang
 

I disagree.

If pfkey_error() is successful, the error will be reported in the AF_KEY
message that is broadcast, so there is no reason for sendmsg to return an
error.  The message was successfully sent; there was no problem with its
passage into the AF_KEY layer.

Like netlink, operational responses come in packets, not error codes.

However, if pfkey_error() fails, we must pass back the original
error code because it's a last-ditch effort to prevent information
from being lost.

That's why 'err' must be preserved when pfkey_error() returns zero.
   


I know what you mean, but isn't the kernel API meant to facilitate the
implementation of user space?
Since sending the message to the kernel and receiving the error report
message are asynchronous, I don't think it's easy to recover from the EINTR
error by parsing the error report message.


Thanks,
Xufeng






Re: [PATCH 1/2] hrtimer: reprogram event for expires=KTIME_MAX in hrtimer_force_reprogram()

2014-05-11 Thread Viresh Kumar
On 10 May 2014 21:47, Preeti U Murthy  wrote:
> On 05/09/2014 04:27 PM, Viresh Kumar wrote:
>> On 9 May 2014 16:04, Preeti U Murthy  wrote:

>> Ideally, the device should have stopped events as we programmed it in
>> ONESHOT mode. And it should have waited for the kernel to set it again..
>>
>> But probably that device doesn't have a ONESHOT mode and is firing
>> again and again. Anyway, the real problem I was trying to solve wasn't
>> infinite interrupts coming from the event dev, but the first extra event that
>> we should have got rid of. It just happened that we got more problems
>> on this particular board.
>
> So on a timer interrupt the tick device, irrespective of whether it is in
> ONESHOT mode or not, is in an expired state. Thus it will continue to
> fire. What has ONESHOT mode got to do with this?

So the arch-specific timer handler must be clearing it, I suppose, and it
shouldn't have fired again after 5 ms as it was not reprogrammed.

Probably that's implementation-specific stuff.. I have seen timers which
have two modes, periodic: they fire continuously, and oneshot: they get
disabled after firing and have to be reprogrammed.

>>> The reason this got exposed in NOHZ_FULL config is because in a normal
>>> NOHZ scenario when the cpu goes idle, and there are no pending timers in
>>> timer_list, even then tick_sched_timer gets cancelled. Precisely the
>>> scenario that you have described.
>>
>> I haven't tried, but it looks like this problem will exist there as well..
>> Who is disabling the event device in that case when the tick_sched timer
>> goes off? The same question is applicable in this case as well..
>>
>>>But we don't get continuous interrupts then because the first time we
>>> get an interrupt, we queue the tick_sched_timer and program the tick
>>> device to the time of its expiry and therefore *push* the time at which
>>> your tick device should fire further.
>>
>> Probably not.. We don't get continuous interrupts because that's a special
>> case for my platform. But I am quite sure you would be getting one extra
>> interrupt after the tick period, but because we didn't have anything to service
>
> Hmm? I didn't get this. Why would we?  We ensure that if there are no
> pending timers in timer_list the tick_sched_timer is cancelled. We
> cannot get spurious interrupts when there are no pending timers in NOHZ
> mode.

Okay, there are no pending timers to fire and we have even disabled the
tick_sched_timer as well.. But the event dev isn't in SHUTDOWN mode or
reprogrammed, and so it must fire after the tick interval? Exactly the same
issue we are getting here in NO_HZ_FULL..

And the worst part is we aren't seeing these interrupts in the traces either.
Somebody probably needs to revisit the trace_irq_handler_entry part as well
to catch such problems.

> Hmm yeah looking at the problem that you are trying to solve, that being
> completely disabling timer interrupts on cpus that are running just one
> process, it appears to me that setting the tick device in SHUTDOWN mode
> is the only way to do so. And you are right. We use SHUTDOWN mode to
> imply that the device can be switched off. It's up to the arch to react to
> it appropriately.

So, from the mail where tglx blasted me off, we have a better solution to
implement now :)

> My concern is on powerpc today when we set the device to SHUTDOWN mode
> we set the decrementer to a MAX value. Which means we will get
> interrupts only spaced out more widely in time. But on NOHZ_FULL mode if
> you are looking at completely disabling tick_sched_timer as long as a
> single process runs then we might need to change the semantics here.

Let's see if we can do some nice stuff with the ONESHOT_STOPPED state..


Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Benjamin Herrenschmidt
On Sun, 2014-05-11 at 21:52 -0700, Guenter Roeck wrote:
> Oh well, it was worth a try. Can you give me an example for a failing
> configuration ?

My g5 config which is close to g5_defconfig with PR KVM enabled.

In any case, see my other messages. I'm waiting for all my test builders
to come back and if it's clear I'll post a new patch.

Cheers,
Ben.




Re: bug: acpi ata_bay dock remains undocked

2014-05-11 Thread Pali Rohár
On Monday 12 May 2014 02:00:29 Rafael J. Wysocki wrote:
> On Sunday, May 11, 2014 05:49:17 PM Pali Rohár wrote:
> > On Wednesday 30 April 2014 11:24:50 Pali Rohár wrote:
> > > On Tuesday 29 April 2014 23:35:42 Rafael J. Wysocki wrote:
> > > > On Tuesday, April 29, 2014 11:00:01 PM Pali Rohár wrote:
> > > > > On Tuesday 29 April 2014 22:55:07 Rafael J. Wysocki wrote:
> > > > > > Which kernel version(s) have you tried?
> > > > >
> > > > > 3.15-rc3
> > > >
> > > > Does it work with 3.14(.x) by chance?
> > >
> > > Tested with 3.14 and 3.8. Same problem, not working.
> > 
> > BUMP!
> > 
> > Rafael, do you need some other information?
> 
> I'll take care of this when I have the time, OK?

Ok, I will wait.

-- 
Pali Rohár
pali.ro...@gmail.com




Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver

2014-05-11 Thread Sascha Hauer
On Mon, May 12, 2014 at 09:03:26AM +0400, Alexander Shiyan wrote:
> Mon, 12 May 2014 06:51:13 +0200 from Sascha Hauer:
> > On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote:
> > > This patch adds a pin control driver for Freescale i.MX1 SoCs.
> > > 
> > > Signed-off-by: Alexander Shiyan 
> > > ---
> > >  drivers/pinctrl/Kconfig|   7 ++
> > >  drivers/pinctrl/Makefile   |   1 +
> > >  drivers/pinctrl/pinctrl-imx1.c | 279 +
> > >  3 files changed, 287 insertions(+)
> > >  create mode 100644 drivers/pinctrl/pinctrl-imx1.c
> > 
> > Nice. I thought about adding devicetree support for i.MX1 as well.
> > 
> > Don't we need a imx1-pinfunc.h file to make use of this patch?
> 
> It will be added along with the DTS template for that CPU architecture.

Ok.

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |


Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Guenter Roeck

On 05/11/2014 10:37 PM, Benjamin Herrenschmidt wrote:

On Mon, 2014-05-12 at 14:12 +1000, Benjamin Herrenschmidt wrote:

On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:

Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
allyesconfig build by moving machine_check_common to a different location.
While this fixes most of the errors, both allmodconfig and allyesconfig still
fail as follows.

arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards

Fix by moving machine_check_common after the offending address.


This suffers from the same problem as previous attempts, on some of my
test configs I get:

arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

IE, it breaks currently working configs.

So we need to move more things around and I haven't had a chance to
sort it out.


Ok, I think I sorted it out for now. It's a mess and likely to break
again until we do something more drastic like moving everything that's
after 0x8000 to a separate file but for now that will do. Patch on its
way, I'll also shoot it to Linus today along with a few other things.



Great, thanks a lot!

Guenter



Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Benjamin Herrenschmidt
On Mon, 2014-05-12 at 14:12 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
> > Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
> > allyesconfig build by moving machine_check_common to a different location.
> > While this fixes most of the errors, both allmodconfig and allyesconfig still
> > fail as follows.
> > 
> > arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards
> > 
> > Fix by moving machine_check_common after the offending address.
> 
> This suffers from the same problem as previous attempts, on some of my
> test configs I get:
> 
> arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: R_PPC64_REL14 against `.text'+1c90
> make[1]: *** [vmlinux] Error 1
> make: *** [sub-make] Error 2
> 
> IE, it breaks currently working configs.
> 
> So we need to move more things around and I haven't had a chance to
> sort it out.

Ok, I think I sorted it out for now. It's a mess and likely to break
again until we do something more drastic like moving everything that's
after 0x8000 to a separate file but for now that will do. Patch on its
way, I'll also shoot it to Linus today along with a few other things.

Cheers,
Ben.




Re: [PATCH 2/2] tick: SHUTDOWN event-dev if no events are required for KTIME_MAX

2014-05-11 Thread Viresh Kumar
Thanks for blasting me off, it might be very helpful going forward :)

On 10 May 2014 01:39, Thomas Gleixner  wrote:
> On Fri, 9 May 2014, Viresh Kumar wrote:

>> diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c

>>  int tick_program_event(ktime_t expires, int force)
>>  {
>>   struct clock_event_device *dev = 
>> __this_cpu_read(tick_cpu_device.evtdev);
>> + int ret = 0;
>>
>> - return clockevents_program_event(dev, expires, force);
>> + /* Shut down event device if it is not required for long */
>> + if (unlikely(expires.tv64 == KTIME_MAX)) {
>> + dev->last_mode = dev->mode;
>> + clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
>
> No, we are not doing a state change behind the scene and a magic
> restore. And I know at least one way to make this fall flat on its
> nose, because you are blindly doing dev->last_mode = dev->mode on
> every invocation. So if that gets called twice without a restore in
> between, the device is going to be in shutdown mode forever.

During my tests I had this as well:

if (unlikely(expires.tv64 == KTIME_MAX)) {
+   WARN_ON(dev->mode == CLOCK_EVT_MODE_SHUTDOWN);

But it never got hit and I thought it might never happen, so I removed it.
But yes, there should be some check here for that.

> It's moronic anyway as the clock event device has the state
> CLOCK_EVT_MODE_ONESHOT if its active, otherwise we would not be in
> that code path.

Yeah, missed that earlier.

> But what's even worse: you just define that it's the best way for all
> implementations of clockevents to handle this.
>
> It's definitely NOT. Some startup/shutdown implementations are rather
> complex, so that would burden them with rather big latencies and some
> of them will even outright break.
>
> There is a world outside of YOUR favourite subarch.

:)

> We do not hijack stuff just because we can and it works on some
> machines. We think about it proper.

Agreed..

> If we hijack some existing facility then we audit ALL implementation
> sites and document that we did so and why we are sure that it won't
> break stuff. It still might break some oddball case, but that's not a
> big issue.

Because SHUTDOWN was an existing old API, I thought it would work
without breaking stuff. Yes, I should have done some auditing, or at least
made this an RFC series to get the discussion going forward..

> In the clockevents case we do not even need a new interface, but this
> must be made OPT-in and not a flagday change for all users.
>
> And no we are not going to abuse a feature flag for this. It's not a
> feature.

Okay.

> I'd rather have a new state for this, simply because it is NOT
> shutdown. It is in ONESHOT_STOPPED state. Whether a specific
> implementation will use the SHUTDOWN code for it or not does not
> matter.

Correct.

> That requires a full tree update of all implementations because most
> of them have a switch case for the mode. And adding a state will cause
> all of them which do not have a default clause to omit warnings
> because the mode is an enum for this very reason.
>
> And even if all of them would have a default clause, you'd need a way
> to OPT-In, because some of the defaults have a BUG() in there. Again,
> no feature flag exclusion. See above.

Okay..

> So the right thing to do this is:
>
> 1A) Change the prototype of the set_mode callback to return int and
> fixup all users. Either add the missing default clause or remove
> the existing BUG()/ pr_err()/whatever handling in the existing
> default clause and return a UNIQUE error code.
>
> I know I should have done that from the very beginning, but in
> hindsight one could have done everything better.
>
> coccinelle is your friend (if you need help ask me or Julia
> Lawall). But it's going to be quite some manual work on top.

Sure.

> 1B) Audit the changes and look at the implementations. If the patch is
> just adding the default clause or replacing some BUG/printk error
> handling goto #1C
>
> If it looks like it needs some preparatory care or if you find
> bugs in a particular implementation, roll back the changes and do
> the bug fixes and preparatory changes first as separate patches.
>
> Go back to #1A until the coccinelle patches are just squeaky
> clean.
>
> 1C) Add proper error handling for the various modes to the set_mode
> callback call sites, only two AFAIK.
>
> 2A) Add a new mode ONESHOT_STOPPED. That's safe now as all error
> handling will be done in the core code.
>
> 2B) Implement the ONESHOT_STOPPED logic and make sure all of the core
> code is aware of it.

Okay..

> And don't tell me it can't be done.

No way :)

> I've done it I don't know how many
> times with interrupts, timers, locking and some more. It's hard work,
> but it's valuable and way better than the brainless "make it work for
> me" hackery.

I didn't mean that actually. I just pinpointed how badly things can go

Re: [PATCH V4 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag

2014-05-11 Thread Fabian Frederick
On Mon, 12 May 2014 11:24:26 +0800
Ming Lei  wrote:

> On Sun, May 11, 2014 at 1:06 AM, Fabian Frederick  wrote:
> > generic_file_fsync has been updated to issue a flush for
> > older filesystems.
> >
> > This patch tests for barrier flag in ext4 mount flags
> > and calls the right function.
> >
> > Suggested-by: Jan Kara 
> > Suggested-by: Christoph Hellwig 
> > Cc: Jan Kara 
> > Cc: Christoph Hellwig 
> > Cc: Alexander Viro 
> > Cc: "Theodore Ts'o" 
> > Cc: Andrew Morton 
> > Signed-off-by: Fabian Frederick 
> > ---
> >  fs/ext4/fsync.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> > index a8bc47f..fa82c0a 100644
> > --- a/fs/ext4/fsync.c
> > +++ b/fs/ext4/fsync.c
> > @@ -108,6 +108,10 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
> >
> > if (!journal) {
> > ret = generic_file_fsync(file, start, end, datasync);
> 
> Forget to remove above line?
Oops, of course! Thanks a lot, I've sent a new version :)

Regards,
Fabian

> 
> > +   if (test_opt(inode->i_sb, BARRIER))
> > +   ret = generic_file_fsync(file, start, end, datasync);
> > +   else
> > +   ret = __generic_file_fsync(file, start, end, datasync);
> > if (!ret && !hlist_empty(&inode->i_dentry))
> > ret = ext4_sync_parent(inode);
> > goto out;
> 
> 
> 
> Thanks,
> -- 
> Ming Lei


[PATCH V5 2/2] fs/ext4/fsync.c: generic_file_fsync call based on barrier flag

2014-05-11 Thread Fabian Frederick
generic_file_fsync has been updated to issue a flush for
older filesystems.

This patch tests for the barrier flag in the ext4 mount flags
and calls the right function.

Suggested-by: Jan Kara 
Suggested-by: Christoph Hellwig 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Cc: Alexander Viro 
Cc: "Theodore Ts'o" 
Cc: Andrew Morton 
Signed-off-by: Fabian Frederick 
---
 fs/ext4/fsync.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index a8bc47f..5b6e9f2 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -107,7 +107,10 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
}
 
if (!journal) {
-   ret = generic_file_fsync(file, start, end, datasync);
+   if (test_opt(inode->i_sb, BARRIER))
+   ret = generic_file_fsync(file, start, end, datasync);
+   else
+   ret = __generic_file_fsync(file, start, end, datasync);
if (!ret && !hlist_empty(&inode->i_dentry))
ret = ext4_sync_parent(inode);
goto out;
-- 
1.8.4.5



[PATCH V5 1/2] FS: Add generic data flush to fsync

2014-05-11 Thread Fabian Frederick
This patch issues a flush in generic_file_fsync.
(Modern filesystems already do it.)

The behaviour can be reversed using /sys/devices/.../cache_type
or by calling __generic_file_fsync directly.

Suggested-by: Jan Kara 
Suggested-by: Christoph Hellwig 
Cc: Jan Kara 
Cc: Christoph Hellwig 
Cc: Alexander Viro 
Cc: "Theodore Ts'o" 
Cc: Andrew Morton 
Signed-off-by: Fabian Frederick 
---
v5: patch2/2 ext4 patch fix (Thanks to Ming Lei)
V4: update description
V3: __generic_file_fsync = no flush
V2: No additional flag
V1: First version with MS_BARRIER flag

 fs/libfs.c | 36 +---
 include/linux/fs.h |  1 +
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index a184424..4877906 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -3,6 +3,7 @@
  * Library for filesystems writers.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -923,16 +924,19 @@ struct dentry *generic_fh_to_parent(struct super_block *sb, struct fid *fid,
 EXPORT_SYMBOL_GPL(generic_fh_to_parent);
 
 /**
- * generic_file_fsync - generic fsync implementation for simple filesystems
+ * __generic_file_fsync - generic fsync implementation for simple filesystems
+ *
  * @file:  file to synchronize
+ * @start: start offset in bytes
+ * @end:   end offset in bytes (inclusive)
  * @datasync:  only synchronize essential metadata if true
  *
  * This is a generic implementation of the fsync method for simple
  * filesystems which track all non-inode metadata in the buffers list
  * hanging off the address_space structure.
  */
-int generic_file_fsync(struct file *file, loff_t start, loff_t end,
-  int datasync)
+int __generic_file_fsync(struct file *file, loff_t start, loff_t end,
+int datasync)
 {
struct inode *inode = file->f_mapping->host;
int err;
@@ -952,10 +956,36 @@ int generic_file_fsync(struct file *file, loff_t start, loff_t end,
err = sync_inode_metadata(inode, 1);
if (ret == 0)
ret = err;
+
 out:
mutex_unlock(&inode->i_mutex);
return ret;
 }
+EXPORT_SYMBOL(__generic_file_fsync);
+
+/**
+ * generic_file_fsync - generic fsync implementation for simple filesystems
+ * with flush
+ * @file:  file to synchronize
+ * @start: start offset in bytes
+ * @end:   end offset in bytes (inclusive)
+ * @datasync:  only synchronize essential metadata if true
+ *
+ */
+
+int generic_file_fsync(struct file *file, loff_t start, loff_t end,
+  int datasync)
+{
+   struct inode *inode = file->f_mapping->host;
+   int err;
+
+   err = __generic_file_fsync(file, start, end, datasync);
+   if (err)
+   return err;
+
+   return blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
+
+}
 EXPORT_SYMBOL(generic_file_fsync);
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8780312..c3f46e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2590,6 +2590,7 @@ extern ssize_t simple_read_from_buffer(void __user *to, size_t count,
 extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
const void __user *from, size_t count);
 
+extern int __generic_file_fsync(struct file *, loff_t, loff_t, int);
 extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
 
 extern int generic_check_addressable(unsigned, u64);
-- 
1.8.4.5



linux-next: manual merge of the gpio tree with the net-next tree

2014-05-11 Thread Stephen Rothwell
Hi Linus,

Today's linux-next merge of the gpio tree got a conflict in
Documentation/driver-model/devres.txt between commit 6d48f44b7b2a
("mdio_bus: implement devm_mdiobus_alloc/devm_mdiobus_free") from the
net-next tree and commit f9748ef13b6a ("gpio: Add missing
device-managed documentation") from the gpio tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc Documentation/driver-model/devres.txt
index d483f2cf221b,8ff1167cfedf..
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@@ -310,7 -309,7 +310,12 @@@ SLAVE DMA ENGIN
  SPI
devm_spi_register_master()
  
 +MDIO
 +  devm_mdiobus_alloc()
 +  devm_mdiobus_alloc_size()
 +  devm_mdiobus_free()
++
+ GPIO
+   devm_gpiod_get()
+   devm_gpiod_get_index()
+   devm_gpiod_put()




Re: [PATCHv2 0/2] remap_file_pages() decommission

2014-05-11 Thread Konstantin Khlebnikov
On Mon, May 12, 2014 at 7:36 AM, Andi Kleen  wrote:
> Armin Rigo  writes:
>
>> Here is a note from the PyPy project (mentioned earlier in this
>> thread, and at https://lwn.net/Articles/587923/ ).
>
> Your use is completely bogus. remap_file_pages() pins everything
> and disables any swapping for the area.

Wait, what's wrong with swapping pages from non-linear vmas?
try_to_unmap() can handle them, though not very effectively.

Some time ago I was thinking about tracking rmap for non-linear vmas,
something like a second-level tree of sub-vmas stored in the non-linear vma.
This could be done using the existing vm_area_struct, and in the rmap tree
everything would look just as normal. We'd waste some kernel memory, but it
would also remove complexity from rmap and make non-linear vmas usable for
all filesystems, not just shmem.

But it's not worth it. I ACK killing it.

Maybe we should keep the flag on the vma and hide/merge them in /proc/maps.
Bloating files/dirs in proc might be a bigger problem than a non-existent
performance regression.

>
> -Andi
> --
> a...@linux.intel.com -- Speaking for myself only
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org


Re: [PATCH] zram: remove global tb_lock by using lock-free CAS

2014-05-11 Thread Minchan Kim
On Sat, May 10, 2014 at 02:10:08PM +0800, Weijie Yang wrote:
> On Thu, May 8, 2014 at 2:24 PM, Minchan Kim  wrote:
> > On Wed, May 07, 2014 at 11:52:59PM +0900, Joonsoo Kim wrote:
> >> >> The most popular use of zram is as in-memory swap for small embedded
> >> >> systems, so I don't want to increase the memory footprint without good
> >> >> reason, even if it helps a synthetic benchmark. Although it's 1M per 1G,
> >> >> it isn't small if we consider the compression ratio and real free
> >> >> memory after boot
> >>
> >> We can use a bit spinlock, and this would not increase the memory
> >> footprint on 32-bit platforms.
> >
> > Sounds like an idea.
> > Weijie, do you mind testing with the bit spinlock?
> 
> Yes, I re-tested them.
> This time, I tested each case 10 times and took the average (KB/s).
> (The test machine and method are the same as in the previous mail.)
> 
> Iozone test result:
> 
>   Test   BASE CAS   spinlock   rwlock  bit_spinlock
> --
>  Initial write  1381094   1425435   1422860   1423075   1421521
>Rewrite  1529479   1641199   1668762   1672855   1654910
>   Read  8468009  11324979  11305569  7273  10997202
>Re-read  8467476  11260914  11248059  11145336  10906486
>   Reverse Read  6821393   8106334   8282174   8279195   8109186
>Stride read  7191093   8994306   9153982   8961224   9004434
>Random read  7156353   8957932   9167098   8980465   8940476
> Mixed workload  4172747   5680814   5927825   5489578   5972253
>   Random write  1483044   1605588   1594329   1600453   1596010
> Pwrite  1276644   1303108   1311612   1314228   1300960
>  Pread  4324337   4632869   4618386   4457870   4500166
> 
> Fio test result:
> 
> Test base CASspinlockrwlock  bit_spinlock
> -
> seq-write   933789   999357   1003298995961   1001958
>  seq-read  5634130  6577930   6380861   6243912   6230006
>seq-rw  1405687  1638117   1640256   1633903   1634459
>   rand-rw  1386119  1614664   1617211   1609267   1612471
> 
> 
> The base is v3.15.0-rc3, the others are per-meta entry lock.
> Every optimization method shows higher performance than the base, however,
> it is hard to say which method is the most appropriate.

The difference between CAS and bit_spinlock is not too big, so I prefer the
general method.

> 
> To bit_spinlock, the modified code is mainly like this:
> 
> +#define ZRAM_FLAG_SHIFT 16
> +
> enum zram_pageflags {
>   /* Page consists entirely of zeros */
> - ZRAM_ZERO,
> + ZRAM_ZERO = ZRAM_FLAG_SHIFT + 1,
> + ZRAM_ACCESS,
>  
>   __NR_ZRAM_PAGEFLAGS,
>  };
>  
>  /* Allocated for each disk page */
>  struct table {
>   unsigned long handle;
> - u16 size;   /* object size (excluding header) */
> - u8 flags;
> + unsigned long value;

Why do we need to change flags and size to "unsigned long value"?
Couldn't we use the existing flags and just add a new ZRAM_TABLE_LOCK?


>  } __aligned(4);
> 
> The lower ZRAM_FLAG_SHIFT bits of table.value hold the size; the higher bits
> are for zram_pageflags. This way, it doesn't increase any memory
> overhead on either 32-bit or 64-bit systems.
> 
> Any complaint or suggestions are welcomed.

Anyway, I'd like to go this way.
Please resend a formal patch with a version number.

Thanks!

> 
> >>
> >> Thanks.
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majord...@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: em...@kvack.org
> >
> > --
> > Kind regards,
> > Minchan Kim
> 
> 

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] af_key: return error when meet errors on sendmsg() syscall

2014-05-11 Thread David Miller
From: Xufeng Zhang 
Date: Fri, 9 May 2014 13:47:35 +0800

> The current implementation of pfkey_sendmsg() always returns success
> no matter whether an error happens during this syscall;
> this is incompatible with the general send()/sendmsg() API:
>   man send
> RETURN VALUE
>   On success, these calls return the number of characters sent.
>   On error, -1 is returned, and errno is set appropriately.
> 
> One side effect of this problem is that we can't determine
> when to resend the message if the previous send() failed because
> it was interrupted by signals.
> We detected such a problem when racoon was sending an SADB_ADD message to
> add an SAD entry in the kernel, but sometimes the kernel responded with
> an "Interrupted system call" (-EINTR) error.
> 
> Check the send implementation of strongswan; it has the logic below:
>   pfkey_send_socket()
>   {
>   ...
>   while (TRUE)
>   {
>   len = send(socket, in, in_len, 0);
> 
>   if (len != in_len)
>   {
>   switch (errno)
>   {
>   case EINTR:
>   /* interrupted, try again */
>   continue;
>   ...
>   }
>   }
>   }
>   ...
>   }
> So it makes sense to return errors from the send() syscall.
> 
> Signed-off-by: Xufeng Zhang 

I disagree.

If pfkey_error() is successful, the error will be reported in the AF_KEY
message that is broadcast, so there is no reason for sendmsg to return an
error.  The message was successfully sent; there was no problem with its
passage into the AF_KEY layer.

Like netlink, operational responses come in packets, not error codes.

However, if pfkey_error() fails, we must pass back the original
error code because it's a last-ditch effort to prevent information
from being lost.

That's why 'err' must be preserved when pfkey_error() returns zero.


Re: [PATCH net-next,v3] Add support for netvsc build without CONFIG_SYSFS flag

2014-05-11 Thread David Miller
From: Haiyang Zhang 
Date: Thu,  8 May 2014 15:14:10 -0700

> This change ensures the driver can be built successfully without the
> CONFIG_SYSFS flag.
> MS-TFS: 182270
> 
> Signed-off-by: Haiyang Zhang 
> Reviewed-by: K. Y. Srinivasan 

Applied, thanks.


Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver

2014-05-11 Thread Alexander Shiyan
Mon, 12 May 2014 06:51:13 +0200 от Sascha Hauer :
> On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote:
> > This patch adds pincontrol driver for Freescale i.MX1 SOCs.
> > 
> > Signed-off-by: Alexander Shiyan 
> > ---
> >  drivers/pinctrl/Kconfig|   7 ++
> >  drivers/pinctrl/Makefile   |   1 +
> >  drivers/pinctrl/pinctrl-imx1.c | 279 
> > +
> >  3 files changed, 287 insertions(+)
> >  create mode 100644 drivers/pinctrl/pinctrl-imx1.c
> 
> Nice. I thought about adding devicetree support for i.MX1 as well.
> 
> Don't we need a imx1-pinfunc.h file to make use of this patch?

It will be added along with the DTS template for that CPU architecture.

---

Re: [PATCH 3.14 27/83] ARC: !PREEMPT: Ensure Return to kernel mode is IRQ safe

2014-05-11 Thread Vineet Gupta

On Monday 12 May 2014 12:51 AM, Greg Kroah-Hartman wrote:
> 3.14-stable review patch.  If anyone has any objections, please let me know.
>
> --
>
> From: Vineet Gupta 
>
> commit 8aa9e85adac609588eeec356e5a85059b3b819ba upstream.

Hi Greg,

This one was also marked for stable 3.10; however, because the two
prerequisite patches were not in yet, applying it would have failed, and
AFAIR I described the state of things in that failure report. Anyhow, can
you please queue this one up for the next 3.10 stable?

Thx,
-Vineet

>
> There was a very small race window where resume to kernel mode from a
> Exception Path (or pure kernel mode which is true for most of ARC
> exceptions anyways), was not disabling interrupts in restore_regs,
> clobbering the exception regs
>
> Anton found the culprit call flow (after many sleepless nights)
>
> | 1. we got a Trap from user land
> | 2. started to service it.
> | 3. While doing some stuff on user-land memory (I think it is padzero()),
> | we got a DataTlbMiss
> | 4. On return from it we are taking "resume_kernel_mode" path
> | 5. NEED_RESHED is not set, so we go to "return from exception" path in
> | restore regs.
> | 6. there seems to be IRQ happening
>
> Signed-off-by: Vineet Gupta 
> Cc: Anton Kolesov 
> Cc: Francois Bedard 
> Signed-off-by: Linus Torvalds 
> Signed-off-by: Greg Kroah-Hartman 
>
> ---
>  arch/arc/kernel/entry.S |8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> --- a/arch/arc/kernel/entry.S
> +++ b/arch/arc/kernel/entry.S
> @@ -614,11 +614,13 @@ resume_user_mode_begin:
>  
>  resume_kernel_mode:
>  
> -#ifdef CONFIG_PREEMPT
> -
> - ; This is a must for preempt_schedule_irq()
> + ; Disable Interrupts from this point on
> + ; CONFIG_PREEMPT: This is a must for preempt_schedule_irq()
> + ; !CONFIG_PREEMPT: To ensure restore_regs is intr safe
>   IRQ_DISABLE r9
>  
> +#ifdef CONFIG_PREEMPT
> +
>   ; Can't preempt if preemption disabled
>   GET_CURR_THR_INFO_FROM_SP   r10
>   ld  r8, [r10, THREAD_INFO_PREEMPT_COUNT]
>
>
>



[PATCH v3 0/3] TI CPSW Cleanup

2014-05-11 Thread George Cherian
This series does some minimal cleanups.
- Conversion of pr_*() to dev_*()
- Conversion of kzalloc() to devm_kzalloc()

No functional changes.

v1 -> v2 Address review comments.
v2 -> v3 Remove a stale commit comment.

George Cherian (3):
  driver net: cpsw: Convert pr_*() to dev_*() calls
  net: davinci_mdio: Convert pr_err() to dev_err() call
  drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().

 drivers/net/ethernet/ti/cpsw.c  | 50 -
 drivers/net/ethernet/ti/davinci_cpdma.c | 35 ---
 drivers/net/ethernet/ti/davinci_mdio.c  |  2 +-
 3 files changed, 38 insertions(+), 49 deletions(-)

-- 
1.8.3.1



[PATCH v3 1/3] driver net: cpsw: Convert pr_*() to dev_*() calls

2014-05-11 Thread George Cherian
Convert all pr_*() calls to dev_*() calls.
No functional changes.

Signed-off-by: George Cherian 
Reviewed-by: Felipe Balbi 
---
 drivers/net/ethernet/ti/cpsw.c | 50 +-
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index d14c8da..9512738 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1808,25 +1808,25 @@ static int cpsw_probe_dt(struct cpsw_platform_data 
*data,
return -EINVAL;
 
if (of_property_read_u32(node, "slaves", &prop)) {
-   pr_err("Missing slaves property in the DT.\n");
+   dev_err(&pdev->dev, "Missing slaves property in the DT.\n");
return -EINVAL;
}
data->slaves = prop;
 
if (of_property_read_u32(node, "active_slave", &prop)) {
-   pr_err("Missing active_slave property in the DT.\n");
+   dev_err(&pdev->dev, "Missing active_slave property in the 
DT.\n");
return -EINVAL;
}
data->active_slave = prop;
 
if (of_property_read_u32(node, "cpts_clock_mult", &prop)) {
-   pr_err("Missing cpts_clock_mult property in the DT.\n");
+   dev_err(&pdev->dev, "Missing cpts_clock_mult property in the 
DT.\n");
return -EINVAL;
}
data->cpts_clock_mult = prop;
 
if (of_property_read_u32(node, "cpts_clock_shift", &prop)) {
-   pr_err("Missing cpts_clock_shift property in the DT.\n");
+   dev_err(&pdev->dev, "Missing cpts_clock_shift property in the 
DT.\n");
return -EINVAL;
}
data->cpts_clock_shift = prop;
@@ -1838,31 +1838,31 @@ static int cpsw_probe_dt(struct cpsw_platform_data 
*data,
return -ENOMEM;
 
if (of_property_read_u32(node, "cpdma_channels", &prop)) {
-   pr_err("Missing cpdma_channels property in the DT.\n");
+   dev_err(&pdev->dev, "Missing cpdma_channels property in the 
DT.\n");
return -EINVAL;
}
data->channels = prop;
 
if (of_property_read_u32(node, "ale_entries", &prop)) {
-   pr_err("Missing ale_entries property in the DT.\n");
+   dev_err(&pdev->dev, "Missing ale_entries property in the 
DT.\n");
return -EINVAL;
}
data->ale_entries = prop;
 
if (of_property_read_u32(node, "bd_ram_size", &prop)) {
-   pr_err("Missing bd_ram_size property in the DT.\n");
+   dev_err(&pdev->dev, "Missing bd_ram_size property in the 
DT.\n");
return -EINVAL;
}
data->bd_ram_size = prop;
 
if (of_property_read_u32(node, "rx_descs", &prop)) {
-   pr_err("Missing rx_descs property in the DT.\n");
+   dev_err(&pdev->dev, "Missing rx_descs property in the DT.\n");
return -EINVAL;
}
data->rx_descs = prop;
 
if (of_property_read_u32(node, "mac_control", &prop)) {
-   pr_err("Missing mac_control property in the DT.\n");
+   dev_err(&pdev->dev, "Missing mac_control property in the 
DT.\n");
return -EINVAL;
}
data->mac_control = prop;
@@ -1876,7 +1876,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
ret = of_platform_populate(node, NULL, NULL, &pdev->dev);
/* We do not want to force this, as in some cases may not have child */
if (ret)
-   pr_warn("Doesn't have any child node\n");
+   dev_warn(&pdev->dev, "Doesn't have any child node\n");
 
for_each_child_of_node(node, slave_node) {
struct cpsw_slave_data *slave_data = data->slave_data + i;
@@ -1893,7 +1893,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 
parp = of_get_property(slave_node, "phy_id", &lenp);
if ((parp == NULL) || (lenp != (sizeof(void *) * 2))) {
-   pr_err("Missing slave[%d] phy_id property\n", i);
+   dev_err(&pdev->dev, "Missing slave[%d] phy_id 
property\n", i);
return -EINVAL;
}
mdio_node = of_find_node_by_phandle(be32_to_cpup(parp));
@@ -1918,18 +1918,18 @@ static int cpsw_probe_dt(struct cpsw_platform_data 
*data,
 
slave_data->phy_if = of_get_phy_mode(slave_node);
if (slave_data->phy_if < 0) {
-   pr_err("Missing or malformed slave[%d] phy-mode 
property\n",
-  i);
+   dev_err(&pdev->dev, "Missing or malformed slave[%d] 
phy-mode property\n",
+   i);
return slave_data->phy_if;
}
 
if (data->dual_emac) {
if (of_property_read_u32(slave_node, 
"dual_emac_res_vlan",
  

[PATCH v3 2/3] net: davinci_mdio: Convert pr_err() to dev_err() call

2014-05-11 Thread George Cherian
Convert the lone pr_err() to dev_err() call.

Signed-off-by: George Cherian 
Reviewed-by: Felipe Balbi 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c 
b/drivers/net/ethernet/ti/davinci_mdio.c
index 34e97ec..735dc53 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -303,7 +303,7 @@ static int davinci_mdio_probe_dt(struct mdio_platform_data 
*data,
return -EINVAL;
 
if (of_property_read_u32(node, "bus_freq", &prop)) {
-   pr_err("Missing bus_freq property in the DT.\n");
+   dev_err(&pdev->dev, "Missing bus_freq property in the DT.\n");
return -EINVAL;
}
data->bus_freq = prop;
-- 
1.8.3.1



[PATCH v3 3/3] drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().

2014-05-11 Thread George Cherian
Convert kzalloc() to devm_kzalloc().

Signed-off-by: George Cherian 
Reviewed-by: Felipe Balbi 
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 35 +++--
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index 88ef270..539dbde 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -158,9 +158,9 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 
hw_addr,
int bitmap_size;
struct cpdma_desc_pool *pool;
 
-   pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+   pool = devm_kzalloc(dev, sizeof(*pool), GFP_KERNEL);
if (!pool)
-   return NULL;
+   goto fail;
 
spin_lock_init(&pool->lock);
 
@@ -170,7 +170,7 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 
hw_addr,
pool->num_desc  = size / pool->desc_size;
 
bitmap_size  = (pool->num_desc / BITS_PER_LONG) * sizeof(long);
-   pool->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+   pool->bitmap = devm_kzalloc(dev, bitmap_size, GFP_KERNEL);
if (!pool->bitmap)
goto fail;
 
@@ -187,10 +187,7 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 
hw_addr,
 
if (pool->iomap)
return pool;
-
 fail:
-   kfree(pool->bitmap);
-   kfree(pool);
return NULL;
 }
 
@@ -203,7 +200,6 @@ static void cpdma_desc_pool_destroy(struct cpdma_desc_pool 
*pool)
 
spin_lock_irqsave(&pool->lock, flags);
WARN_ON(pool->used_desc);
-   kfree(pool->bitmap);
if (pool->cpumap) {
dma_free_coherent(pool->dev, pool->mem_size, pool->cpumap,
  pool->phys);
@@ -211,7 +207,6 @@ static void cpdma_desc_pool_destroy(struct cpdma_desc_pool 
*pool)
iounmap(pool->iomap);
}
spin_unlock_irqrestore(&pool->lock, flags);
-   kfree(pool);
 }
 
 static inline dma_addr_t desc_phys(struct cpdma_desc_pool *pool,
@@ -276,7 +271,7 @@ struct cpdma_ctlr *cpdma_ctlr_create(struct cpdma_params 
*params)
 {
struct cpdma_ctlr *ctlr;
 
-   ctlr = kzalloc(sizeof(*ctlr), GFP_KERNEL);
+   ctlr = devm_kzalloc(params->dev, sizeof(*ctlr), GFP_KERNEL);
if (!ctlr)
return NULL;
 
@@ -468,7 +463,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
 
cpdma_desc_pool_destroy(ctlr->pool);
spin_unlock_irqrestore(&ctlr->lock, flags);
-   kfree(ctlr);
return ret;
 }
 EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
@@ -507,21 +501,22 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr 
*ctlr, int chan_num,
 cpdma_handler_fn handler)
 {
struct cpdma_chan *chan;
-   int ret, offset = (chan_num % CPDMA_MAX_CHANNELS) * 4;
+   int offset = (chan_num % CPDMA_MAX_CHANNELS) * 4;
unsigned long flags;
 
if (__chan_linear(chan_num) >= ctlr->num_chan)
return NULL;
 
-   ret = -ENOMEM;
-   chan = kzalloc(sizeof(*chan), GFP_KERNEL);
+   chan = devm_kzalloc(ctlr->dev, sizeof(*chan), GFP_KERNEL);
if (!chan)
-   goto err_chan_alloc;
+   return ERR_PTR(-ENOMEM);
 
spin_lock_irqsave(&ctlr->lock, flags);
-   ret = -EBUSY;
-   if (ctlr->channels[chan_num])
-   goto err_chan_busy;
+   if (ctlr->channels[chan_num]) {
+   spin_unlock_irqrestore(&ctlr->lock, flags);
+   devm_kfree(ctlr->dev, chan);
+   return ERR_PTR(-EBUSY);
+   }
 
chan->ctlr  = ctlr;
chan->state = CPDMA_STATE_IDLE;
@@ -551,12 +546,6 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr 
*ctlr, int chan_num,
ctlr->channels[chan_num] = chan;
spin_unlock_irqrestore(&ctlr->lock, flags);
return chan;
-
-err_chan_busy:
-   spin_unlock_irqrestore(&ctlr->lock, flags);
-   kfree(chan);
-err_chan_alloc:
-   return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(cpdma_chan_create);
 
-- 
1.8.3.1



Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Guenter Roeck

On 05/11/2014 09:12 PM, Benjamin Herrenschmidt wrote:

On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:

Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
allyesconfig build by moving machine_check_common to a different location.
While this fixes most of the errors, both allmodconfig and allyesconfig still
fail as follows.

arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org backwards

Fix by moving machine_check_common after the offending address.


This suffers from the same problem as previous attempts, on some of my
test configs I get:

arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: 
R_PPC64_REL14 against `.text'+1c90
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

IE, it breaks currently working configs.


Oh well, it was worth a try. Can you give me an example of a failing
configuration?

Thanks,
Guenter



Re: [PATCH] pinctrl: Add i.MX1 pincontrol driver

2014-05-11 Thread Sascha Hauer
On Fri, May 09, 2014 at 08:16:33PM +0400, Alexander Shiyan wrote:
> This patch adds pincontrol driver for Freescale i.MX1 SOCs.
> 
> Signed-off-by: Alexander Shiyan 
> ---
>  drivers/pinctrl/Kconfig|   7 ++
>  drivers/pinctrl/Makefile   |   1 +
>  drivers/pinctrl/pinctrl-imx1.c | 279 
> +
>  3 files changed, 287 insertions(+)
>  create mode 100644 drivers/pinctrl/pinctrl-imx1.c

Nice. I thought about adding devicetree support for i.MX1 as well.

Don't we need a imx1-pinfunc.h file to make use of this patch?

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: icmp: account for ICMP out errors because of socket limit

2014-05-11 Thread zhuyj

Hi, Eric && David

This patch is similar to the following patch.

commit 1f8438a853667d48055ad38384c63e94b32c6578
Author: Eric Dumazet 
Date:   Sat Apr 3 15:09:04 2010 -0700

icmp: Account for ICMP out errors

When ip_append() fails because of socket limit or memory shortage,
increment ICMP_MIB_OUTERRORS counter, so that "netstat -s" can report
these errors.

LANG=C netstat -s | grep "ICMP messages failed"
0 ICMP messages failed

For IPV6, implement ICMP6_MIB_OUTERRORS counter as well.

# grep Icmp6OutErrors /proc/net/dev_snmp6/*
/proc/net/dev_snmp6/eth0:Icmp6OutErrors 0
/proc/net/dev_snmp6/lo:Icmp6OutErrors   0

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 

Best Regards!
Zhu Yanjun

On 05/12/2014 11:19 AM, zhuyj wrote:

Hi, Eric && David

 ____          ______________
|    |        |              |
| PC |<------>| MIPS 32 core |
|____|        |______________|

When pinging from a PC to a board (MIPS 32 core), the ping echo will fail
because of the socket limit, but the ICMP_MIB_OUTERRORS counter is not
incremented. In this case, "netstat -s" cannot report these errors.

This patch fixes the problem. It is in the attachment; please check it.


Best Regards!
Zhu Yanjun


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] clk: samsung: out: Add infrastructure to register CLKOUT

2014-05-11 Thread Tushar Behera
On 05/10/2014 09:21 AM, Pankaj Dubey wrote:
> On 05/09/2014 10:00 PM, Tushar Behera wrote:
>> All SoC in Exynos-series have a clock with name XCLKOUT to provide
>> debug information about various clocks available in the SoC. The register
>> controlling the MUX and GATE of this clock is provided within PMU domain.
>> Since PMU domain can't be dedicatedly mapped by every driver, the
>> register
>> needs to be handled through a regmap handle provided by PMU syscon
>> controller. Right now, CCF doesn't allow regmap based MUX and GATE
>> clocks,
>> hence a dedicated clock provider for XCLKOUT is added here.
>>
>> Signed-off-by: Tushar Behera 
>> CC: Tomasz Figa 
>> ---
>>   drivers/clk/samsung/Makefile  |2 +-
>>   drivers/clk/samsung/clk-out.c |  181
>> +
>>   drivers/clk/samsung/clk.h |   33 
>>   3 files changed, 215 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/clk/samsung/clk-out.c
>>

[ ... ]

>> +/**
>> + * struct samsung_clkout_soc_data: SoC specific register details
>> + * @reg: Offset of CLKOUT register from PMU base
> 
> How about naming this variable "offset" instead of "reg"?
> 

Okay, I will change that.

[ ... ]

>> +u8 samsung_clkout_get_parent(struct clk_hw *hw)
>> +{
>> +struct samsung_clkout *clkout = to_clk_out(hw);
>> +const struct samsung_clkout_soc_data *soc_data = clkout->soc_data;
>> +unsigned int parent_mask = BIT(soc_data->mux_width) - 1;
>> +unsigned int val;
>> +int ret;
>> +
>> +ret = regmap_read(clkout->regmap, soc_data->reg, &val);
> 
> Do we really need to keep the return value in "ret"? I can't see you
> using it anywhere.
> 

Right, we are not using that and can be removed.

>> +
>> +return (val >> soc_data->mux_shift) & parent_mask;
>> +}
>> +

[ ... ]

>> +/* All existing Exynos serial of SoCs have common values for this
>> offsets. */
> typo: serial/series/

Sure. Thanks for your review.

-- 
Tushar Behera


Re: [PATCH 0/4] Add framework to support clkout

2014-05-11 Thread Tushar Behera
On 05/10/2014 09:09 AM, Pankaj Dubey wrote:
> Hi Tushar,
> 
[ ... ]
>> Also we need to find a suitable place to call early_syscon_init(), after
>> the device tree has been unflattened and before clock initialization.
>>
>> While testing, I called this before of_clk_init() in
>> arch/arm/kernel/time.c,
>> but that place is too generic. Calling anywhere from exynos.c is not
>> working ATM.
> 
> IMO we do not need to; if I am not wrong, we should not change time.c.
> 

The above solution is definitely a hack, just to test my stuff. The
solution below looks good.

> It's possible if we have an Exynos-specific init_time with the following
> changes. FYI, in my patch series for Exynos PMU [1], I am currently handling
> this in exynos_dt_machine_init. But it can definitely be handled as below,
> and it works without any side effects; I have tested it. The only reason I
> did not adopt this is that for Exynos PMU patch support I had other options.
> But if required, and if the following change is acceptable, I can include it
> in my next version of the Exynos PMU patch series.
> 
> [1]: https://lkml.org/lkml/2014/4/30/18
> 
> 
> +static void __init exynos_init_time(void)
> +{
> +/* Nothing timer-specific to do here.
> + * early_syscon_init() requires the DT to be unflattened and
> + * the system to be able to allocate memory, so we need to
> + * postpone it until init_time, but it must be done before
> + * init_machine: secondary core boot starts before
> + * init_machine and uses PMU registers.
> + */
> +
> +exynos_map_pmu();
> +

Instead of calling early_syscon_init() from within exynos_map_pmu(), it
would be good to call it explicitly here before exynos_map_pmu().

> +of_clk_init(NULL);
> +clocksource_of_init();
> +
> +}
> +

-- 
Tushar Behera


RE: [PATCH] ARM: dts: at91-sama5d3_xplained: add the regulator device node

2014-05-11 Thread Yang, Wenyou


> -Original Message-
> From: Ferre, Nicolas
> Sent: Friday, May 09, 2014 11:31 PM
> To: Yang, Wenyou; Alexandre Belloni
> Cc: devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> robh...@kernel.org; broo...@kernel.org; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH] ARM: dts: at91-sama5d3_xplained: add the regulator
> device node
> 
> On 22/04/2014 03:37, Yang, Wenyou :
> > Hi,
> >
> >> -Original Message-
> >> From: Alexandre Belloni [mailto:alexandre.bell...@free-electrons.com]
> >> Sent: Monday, April 21, 2014 8:22 PM
> >> To: Yang, Wenyou
> >> Cc: devicet...@vger.kernel.org; Ferre, Nicolas; linux-
> >> ker...@vger.kernel.org; robh...@kernel.org; broo...@kernel.org;
> >> linux- arm-ker...@lists.infradead.org
> >> Subject: Re: [PATCH] ARM: dts: at91-sama5d3_xplained: add the
> >> regulator device node
> >>
> >> On 21/04/2014 at 11:54:43 +0200, Alexandre Belloni wrote :
> >>> Hi,
> >>>
> >>> On 21/04/2014 at 12:29:07 +0800, Wenyou Yang wrote :
>  +
>  +vddana_reg: LDO_REG2 {
>  +        regulator-name = "VDDANA";
>  +        regulator-min-microvolt = <330>;
>  +        regulator-max-microvolt = <330>;
>  +        regulator-always-on;
> >>>
> >>> I'm pretty sure that one is not always on as you actually have to
> >>> configure it to get any voltage. Are you sure you want to set the
> >>> regulator-always-on property here ?
> >>>
> >>
> >> Just to clarify my though, wouldn't it be better to make the ADC
> >> driver handle that regulator instead of using regulator-always-on ?
> > Yes, you are right.
> > It should not use regulator-always-on property for this regulator.
> > It is ADC driver and ISI driver to handle it(The ISI takes PCK for
> clock).
> 
> Hi Wenyou and Alexandre,
> 
> After talking to our system engineers: it is not usual to leave the VDDANA
> rail unpowered. In fact, doing so will prevent you from using all the pads
> that are powered by VDDANA: PD20-PD31. Moreover, even if you do not activate
> the ADC output on these lines, you won't be able to use them as plain
> GPIOs... (Cf. the package and pinout section of the datasheet.)
> 
> As the ADVREF pin of the SoC is connected to the VDDANA on this board
> (even if this default configuration can be modified with a soldering
> iron), we have to note that we may consume a little bit more power.
> 
> But still, I would recommend keeping the "regulator-always-on" property
> on this node. Do you agree, and do you allow me to take the first revision
> of your patch?
I agree.

> 
> 
> Bye,
> --
> Nicolas Ferre

Best Regards,
Wenyou Yang


Re: [PATCH v2] cpufreq: powernow-k8: Suppress checkpatch warnings

2014-05-11 Thread Viresh Kumar
On 11 May 2014 22:56, Stratos Karafotis  wrote:
> Suppress the following checkpatch.pl warnings:
>
> - WARNING: Prefer pr_err(... to printk(KERN_ERR ...
> - WARNING: Prefer pr_info(... to printk(KERN_INFO ...
> - WARNING: Prefer pr_warn(... to printk(KERN_WARNING ...
> - WARNING: quoted string split across lines
> - WARNING: please, no spaces at the start of a line
>
> Also, define the pr_fmt macro instead of PFX for the module name.
>
> Signed-off-by: Stratos Karafotis 
> ---
>
> Changes v1 -> v2
> - Use pr_err_once instead of printk_once
> - Change missing_pss_msg to macro (because pr_err_once
> doesn't compile otherwise)
> - Put one pr_err message in a single line instead of two
> - Ignore "line over 80 characters" warnings
> - Change the word "Fix" in the subject of the patch to
> "Suppress" as the patch doesn't really fix anything
>
>  drivers/cpufreq/powernow-k8.c | 180 
> +-
>  drivers/cpufreq/powernow-k8.h |   2 +-
>  2 files changed, 74 insertions(+), 108 deletions(-)

Acked-by: Viresh Kumar 


Re: [PATCH] powerpc: Fix "attempt to move .org backwards" error (again)

2014-05-11 Thread Benjamin Herrenschmidt
On Fri, 2014-05-09 at 17:07 -0700, Guenter Roeck wrote:
> Commit 4e243b7 (powerpc: Fix "attempt to move .org backwards" error) fixes the
> allyesconfig build by moving machine_check_common to a different location.
> While this fixes most of the errors, both allmodconfig and allyesconfig still
> fail as follows.
> 
> arch/powerpc/kernel/exceptions-64s.S:1315: Error: attempt to move .org 
> backwards
> 
> Fix by moving machine_check_common after the offending address.

This suffers from the same problem as previous attempts, on some of my
test configs I get:

arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xb0): relocation truncated to fit: 
R_PPC64_REL14 against `.text'+1c90
make[1]: *** [vmlinux] Error 1
make: *** [sub-make] Error 2

IE, it breaks currently working configs.

So we need to move more things around and I haven't had a chance to
sort it out.

Cheers,
Ben.




Re: [PATCH] fs: cifs: new helper: file_inode(file)

2014-05-11 Thread Steve French
merged into cifs-2.6.git for-next

On Tue, Dec 10, 2013 at 9:02 PM, Libo Chen  wrote:
>
> Signed-off-by: Libo Chen 
> ---
>  fs/cifs/ioctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
> index 7749230..45cb59b 100644
> --- a/fs/cifs/ioctl.c
> +++ b/fs/cifs/ioctl.c
> @@ -85,7 +85,7 @@ static long cifs_ioctl_clone(unsigned int xid, struct file 
> *dst_file,
> goto out_fput;
> }
>
> -   src_inode = src_file.file->f_dentry->d_inode;
> +   src_inode = file_inode(src_file.file);
>
> /*
>  * Note: cifs case is easier than btrfs since server responsible for
> --
> 1.8.2.2
>



-- 
Thanks,

Steve


3.14.3 i915 dead display under X11

2014-05-11 Thread Carbonated Beverage
Hi all,

I rarely upgrade kernels these days -- so when updating to 3.14.3, I found
the X display was blank -- switching to a text console appears to work, but
I still have to type blind.

Symptoms:

Text mode and KMS work correctly to bring up the text console.  Running
X (whether through xdm or /usr/bin/Xorg) causes the display to go blank
and apparently turn off.  Switching to a text console via Control-Alt-F#
leaves a mostly blank screen, but there are brief flashes where it looks
like the contents of the text console get rendered once every 5 seconds or so,
but so fast that no words or letters can be recognized.

System:

* Thinkpad R61
* 00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 
Integrated Graphics Controller (primary) (rev 0c)
* Debian/wheezy
* xserver-xorg-video-intel 2:2.19.0-6

A diff of the Xorg.0.log (with timestamps removed, as it made almost every line
show up in the diff) trimmed down shows:

@@ -53,7 +53,7 @@
  (==) |-->Input Device ""
  (==) The core keyboard device wasn't specified explicitly in the layout.
Using the default keyboard configuration.
- (II) Loader magic: 0x7f2813451ae0
+ (II) Loader magic: 0x7f492c35aae0
  (II) Module ABI versions:
X.Org ANSI C Emulation: 0.4
X.Org Video Driver: 12.1
@@ -170,15 +170,17 @@
Sandybridge Server, Ivybridge Mobile (GT1), Ivybridge Mobile (GT2),
Ivybridge Desktop (GT1), Ivybridge Desktop (GT2), Ivybridge Server,
Ivybridge Server (GT2)
- (--) using VT number 7
+ (++) using VT number 7
 
+ (WW) xf86OpenConsole: setpgid failed: Operation not permitted
+ (WW) xf86OpenConsole: setsid failed: Operation not permitted
  (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
  drmOpenDevice: node name is /dev/dri/card0
- drmOpenDevice: open result is 10, (OK)
+ drmOpenDevice: open result is 8, (OK)
  drmOpenByBusid: Searching for BusID pci::00:02.0
  drmOpenDevice: node name is /dev/dri/card0
- drmOpenDevice: open result is 10, (OK)
- drmOpenByBusid: drmOpenMinor returns 10
+ drmOpenDevice: open result is 8, (OK)
+ drmOpenByBusid: drmOpenMinor returns 8
  drmOpenByBusid: drmGetBusid reports pci::00:02.0
  (**) intel(0): Depth 16, (--) framebuffer bpp 16
  (==) intel(0): RGB weight 565
@@ -387,7 +389,17 @@
  (II) AutoAddDevices is off - not adding device.
  (II) config/udev: Adding input device ThinkPad Extra Buttons 
(/dev/input/event5)
  (II) AutoAddDevices is off - not adding device.
- (II) AIGLX: Suspending AIGLX clients for VT switch
- (II) UnloadModule: "kbd"
- (II) UnloadModule: "mouse"
- Server terminated successfully (0). Closing log file.
+ (II) intel(0): EDID vendor "LEN", prod id 16435
+ (II) intel(0): Printing DDC gathered Modelines:
+ (II) intel(0): Modeline "1440x900"x0.0   97.78  1440 1488 1520 1760  900 903 
909 926 -hsync -vsync (55.6 kHz eP)
+ (II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 
909 926 -hsync -vsync (46.3 kHz e)
+ (II) intel(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 
628 +hsync +vsync (37.9 kHz e)
+ (II) intel(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 
525 -hsync -vsync (31.5 kHz e)
+ (II) intel(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 
777 806 -hsync -vsync (48.4 kHz e)
+ (II) intel(0): EDID vendor "LEN", prod id 16435
+ (II) intel(0): Printing DDC gathered Modelines:
+ (II) intel(0): Modeline "1440x900"x0.0   97.78  1440 1488 1520 1760  900 903 
909 926 -hsync -vsync (55.6 kHz eP)
+ (II) intel(0): Modeline "1440x900"x0.0   81.49  1440 1488 1520 1760  900 903 
909 926 -hsync -vsync (46.3 kHz e)
+ (II) intel(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 
628 +hsync +vsync (37.9 kHz e)
+ (II) intel(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 
525 -hsync -vsync (31.5 kHz e)
+ (II) intel(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 
777 806 -hsync -vsync (48.4 kHz e)

Bisecting from 3.13.6 (good) to 3.14.3 (bad) ended up with...

commit b35684b8fa94e04f55fd38bf672b737741d2f9e2
Author: Jani Nikula 
Date:   Thu Nov 14 12:13:41 2013 +0200

drm/i915: do full backlight setup at enable time

We should now have all the information we need to do a full
initialization of the backlight registers.

v2: Keep QUIRK_NO_PCH_PWM_ENABLE for now (Imre).

Signed-off-by: Jani Nikula 
Reviewed-by: Imre Deak 
Signed-off-by: Daniel Vetter 

Which is in 3.12.0

I'm not sure how that came to be.  Does that look right?  What other
information would be required to track this down?

Thanks,

-- DN
Daniel Nobuto


Re: arch_random_refill

2014-05-11 Thread H. Peter Anvin
On 05/11/2014 08:36 PM, Stephan Mueller wrote:
> 
> But in our current predicament, not everybody trusts a few potentially easily 
> manipulated gates that have no other purpose than to produce white noise, and 
> which are developed by the biggest chip vendor in the US. Gates which have other 
> purposes may not be that easily manipulated.
>

Incidentally, I disagree with the "easily manipulated" bit.  Yes, I have
seen the paper which says that you can do it in such a way that it
doesn't show up on *visual* examination.  However, put an electrical
probe on it and it shows up immediately.

-hpa




Re: arch_random_refill

2014-05-11 Thread H. Peter Anvin
On 05/11/2014 08:36 PM, Stephan Mueller wrote:
> 
> Ohh, ok, thanks for fixing that. :-) 
> 
> Though what makes me wonder is the following: why are some RNGs forced to use 
> the hw_random framework whereas some others are not? What is the driver for 
> that?
> 
> The current state of random.c vs. drivers/char/hw_random and the strong in-
> kernel separation between both makes me wonder. Isn't that all kind of 
> inconsistent?
> 

The main differences are speed of access, trivial interface, and
architectural guarantees.  You also don't have to deal with enumeration,
DMA engines, interrupts, indirect access, or bus drivers, which all are
utterly unacceptable on a synchronous path.

That being said, it is getting clear that we most likely would be better
off with the kernel directly feeding from at least a subset of the
hw_random drivers, rather than waiting for user space to come along and
launch a daemon... after $DEITY knows how many other processes have
already been launched.  There are patches being worked on to make that
happen, although there are a fair number of potential issues, including
the fact that some of the hw_random drivers are believed to be dodgy --
for example, the TPM driver: some TPMs are believed to not contain any
entropy element and simply rely on a factory-seeded nonvolatile counter
(since the TPM has to have support for nonvolatile counters anyway, this
hardware is already present.)

-hpa





[PATCH] staging/lustre: fix sparse warnings in o2iblnd_cb.c

2014-05-11 Thread Zi Shen Lim
This patch fixes the following sparse warnings:

drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:44:1: warning: symbol 
'kiblnd_tx_done' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:102:10: warning: symbol 
'kiblnd_get_idle_tx' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:131:1: warning: symbol 
'kiblnd_drop_rx' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:212:10: warning: symbol 
'kiblnd_find_waiting_tx_locked' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:238:1: warning: symbol 
'kiblnd_handle_completion' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:277:1: warning: symbol 
'kiblnd_send_completion' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:296:1: warning: symbol 
'kiblnd_handle_rx' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:457:1: warning: symbol 
'kiblnd_rx_complete' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:527:13: warning: symbol 
'kiblnd_kvaddr_to_page' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:699:1: warning: symbol 
'kiblnd_setup_rd_iov' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:752:1: warning: symbol 
'kiblnd_setup_rd_kiov' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:792:1: warning: symbol 
'kiblnd_post_tx_locked' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:996:1: warning: symbol 
'kiblnd_tx_complete' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1270:1: warning: symbol 
'kiblnd_connect_peer' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1627:1: warning: symbol 
'kiblnd_reply' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1814:1: warning: symbol 
'kiblnd_thread_fini' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1828:1: warning: symbol 
'kiblnd_peer_notify' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1934:1: warning: symbol 
'kiblnd_handle_early_rxs' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1957:1: warning: symbol 
'kiblnd_abort_txs' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:1993:1: warning: symbol 
'kiblnd_finalise_conn' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2167:1: warning: symbol 
'kiblnd_reject' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2178:1: warning: symbol 
'kiblnd_passive_connect' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2452:1: warning: symbol 
'kiblnd_reconnect' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2516:1: warning: symbol 
'kiblnd_rejected' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2655:1: warning: symbol 
'kiblnd_check_connreply' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:2754:1: warning: symbol 
'kiblnd_active_connect' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3025:1: warning: symbol 
'kiblnd_check_conns' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3108:1: warning: symbol 
'kiblnd_disconnect_conn' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:3247:1: warning: symbol 
'kiblnd_complete' was not declared. Should it be static?
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c:904:20: warning: context 
imbalance in 'kiblnd_post_tx_locked' - unexpected unlock

Signed-off-by: Zi Shen Lim 
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 60 +++---
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 9bf6c94..dfd16e7 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -40,7 +40,7 @@
 
 #include "o2iblnd.h"
 
-void
+static void
 kiblnd_tx_done (lnet_ni_t *ni, kib_tx_t *tx)
 {
lnet_msg_t *lntmsg[2];
@@ -99,7 +99,7 @@ kiblnd_txlist_done (lnet_ni_t *ni, struct list_head *txlist, 
int status)
}
 }
 
-kib_tx_t *
+static kib_tx_t *
 kiblnd_get_idle_

Re: arch_random_refill

2014-05-11 Thread Stephan Mueller
Am Sonntag, 11. Mai 2014, 20:22:28 schrieb H. Peter Anvin:

Hi Peter,
> 
> > Note, I do not see an issue with the patch that adds RDSEED as part of
> > add_interrupt_randomness outlined in [2]. The reason is that this patch
> > does not monopolizes the noise sources.
> > 
> > I do not want to imply that Intel (or any other chip manufacturer that
> > will
> > hook into arch_random_refill) intentionally provides bad entropy (and this
> > email shall not start a discussion about entropy again), but I would like
> > to be able to only use noise sources that I can fully audit. As it is
> > with hardware, I am not able to see what it is doing.
> 
> I have to point out the irony in this given your previous proposals,
> however...

I guess that is the funny nature of entropy :-)

But in our current predicament, not everybody trusts a few potentially easily 
manipulated gates that have no other purpose than to produce white noise, and 
which are developed by the biggest chip vendor in the US. Gates which have other 
purposes may not be that easily manipulated.
> 
> > Thus, may I ask that arch_random_refill is revised such that it will not
> > monopolize the noise sources? If somebody wants that, he can easily use
> > rngd.
> Feel free to build the kernel without CONFIG_ARCH_RANDOM, or use the
> "nordrand" option to the kernel.  These options are there for a reason.
> 
> Now when you mention it, though, the nordrand option should turn off
> RDSEED as well as RDRAND.  It currently doesn't; that is a bug, plain
> and simple.

Ohh, ok, thanks for fixing that. :-) 

Though what makes me wonder is the following: why are some RNGs forced to use 
the hw_random framework whereas some others are not? What is the driver for 
that?

The current state of random.c vs. drivers/char/hw_random and the strong in-
kernel separation between both makes me wonder. Isn't that all kind of 
inconsistent?

Ciao
Stephan
-- 
| Cui bono? |


Re: [PATCHv2 0/2] remap_file_pages() decommission

2014-05-11 Thread Andi Kleen
Armin Rigo  writes:

> Here is a note from the PyPy project (mentioned earlier in this
> thread, and at https://lwn.net/Articles/587923/ ).

Your use is completely bogus. remap_file_pages() pins everything 
and disables any swapping for the area.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


Re: kmemcheck: got WARNING when dynamicly adjust /proc/sys/kernel/kmemcheck to 0/1

2014-05-11 Thread Xishi Qiu
On 2014/5/9 18:02, Vegard Nossum wrote:

> On 05/09/2014 11:52 AM, Xishi Qiu wrote:
>> On 2014/5/9 15:57, Xishi Qiu wrote:
>>
>>> OS boot with kmemcheck=0, then set 1, do something, set 0, do something, 
>>> set 1...
>>> then I got the WARNING log. Does kmemcheck support dynamic adjustment?
>>>
>>> Thanks,
>>> Xishi Qiu
>>>
>>> [   20.200305] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow 
>>> Control: RX
>>> [   20.208652] ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   20.216504] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [   22.647385] auditd (3116): /proc/3116/oom_adj is deprecated, please use 
>>> /proc/3116/oom_score_adj instead.
>>> [   24.845214] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
>>> [   30.434764] eth0: no IPv6 routers present
>>> [  340.154608] NOHZ: local_softirq_pending 01
>>> [  340.154639] WARNING: kmemcheck: Caught 64-bit read from uninitialized 
>>> memory (88083f43a550)
>>> [  340.154644] 
>>> c20080ff5d0100c9400ed34e0888
>>> [  340.154667]  u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u 
>>> u u
>>> [  340.154687]  ^
>>> [  340.154690]
>>> [  340.154694] Pid: 3, comm: ksoftirqd/0 Tainted: G C   
>>> 3.4.24-qiuxishi.19-0.1-default+ #2 Huawei Technologies Co., Ltd. Tecal 
>>> RH2285 V2-24S/BC11SRSC1
>>> [  340.154702] RIP: 0010:[]  [] 
>>> d_namespace_path+0x132/0x270
>>> [  340.154714] RSP: 0018:8808515a1c88  EFLAGS: 00010202
>>> [  340.154718] RAX: 88083f43a540 RBX: 880852e718f3 RCX: 
>>> 0001
>>> [  340.154721] RDX: 8808515a1d28 RSI:  RDI: 
>>> 881053855a60
>>> [  340.154725] RBP: 8808515a1ce8 R08: 8808515a1c50 R09: 
>>> 880852e75800
>>> [  340.154728] R10: 000156f0 R11:  R12: 
>>> 0001
>>> [  340.154731] R13: 0100 R14: 880852e71510 R15: 
>>> 880852e71800
>>> [  340.154736] FS:  () GS:88085f60() 
>>> knlGS:
>>> [  340.154740] CS:  0010 DS:  ES:  CR0: 8005003b
>>> [  340.154743] CR2: 880852e71570 CR3: 0008513f2000 CR4: 
>>> 000407f0
>>> [  340.154746] DR0:  DR1:  DR2: 
>>> 
>>> [  340.154750] DR3:  DR6: 4ff0 DR7: 
>>> 0400
>>> [  340.154753]  [] aa_path_name+0x85/0x180
>>> [  340.154758]  [] apparmor_bprm_set_creds+0x126/0x520
>>> [  340.154763]  [] security_bprm_set_creds+0xe/0x10
>>> [  340.154771]  [] prepare_binprm+0xa5/0x100
>>> [  340.154777]  [] do_execve_common+0x232/0x430
>>> [  340.154781]  [] do_execve+0x3a/0x40
>>> [  340.154785]  [] sys_execve+0x49/0x70
>>> [  340.154793]  [] stub_execve+0x6c/0xc0
>>> [  340.154801]  [] 0x
>>> [  340.154813] WARNING: kmemcheck: Caught 64-bit read from uninitialized 
>>> memory (88083f43a570)
>>> [  340.154817] 
>>> 746f730078a5433f0888f86d433f0888746f7073
>>> [  340.154839]  u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u 
>>> u u
>>> [  340.154858]  ^
>>> [  340.154861]
>>> [  340.154864] Pid: 3, comm: ksoftirqd/0 Tainted: G C   
>>> 3.4.24-qiuxishi.19-0.1-default+ #2 Huawei Technologies Co., Ltd. Tecal 
>>> RH2285 V2-24S/BC11SRSC1
>>> [  340.154871] RIP: 0010:[]  [] 
>>> rw_verify_area+0x24/0x100
>>> [  340.154880] RSP: 0018:8808515a1dc8  EFLAGS: 00010202
>>> [  340.154883] RAX: 88083f43a540 RBX: 0080 RCX: 
>>> 0080
>>> [  340.154887] RDX: 8808515a1e30 RSI: 880852e71500 RDI: 
>>> 
>>> [  340.154890] RBP: 8808515a1de8 R08: 880852e73200 R09: 
>>> 88085f004900
>>> [  340.154894] R10: 880852e72600 R11:  R12: 
>>> 880852e71500
>>> [  340.154897] R13:  R14: 880852e73200 R15: 
>>> 0001
>>> [  340.154901] FS:  () GS:88085f60() 
>>> knlGS:
>>> [  340.154905] CS:  0010 DS:  ES:  CR0: 8005003b
>>> [  340.154908] CR2: 880852e71570 CR3: 0008513f2000 CR4: 
>>> 000407f0
>>> [  340.154911] DR0:  DR1:  DR2: 
>>> 
>>> [  340.154914] DR3:  DR6: 4ff0 DR7: 
>>> 0400
>>> [  340.154917]  [] vfs_read+0xa4/0x130
>>> [  340.154922]  [] kernel_read+0x44/0x60
>>> [  340.154926]  [] prepare_binprm+0xd0/0x100
>>> [  340.154931]  [] do_execve_common+0x232/0x430
>>> [  340.154935]  [] do_execve+0x3a/0x40
>>> [  340.154939]  [] sys_execve+0x49/0x70
>>> [  340.154944]  [] stub_execve+0x6c/0xc0
>>> [  340.154950]  [] 0x
>>> [  340.154955] WARNING: kmemcheck: Caught 32-bit read from uninitialized 
>>> memory (88083f43a540)
>>> [  340.154959] 
>>> c20080ff5d0100c9400ed34e0888
>>> [  340.154981]  u u u u u u u u u u u u u u u u i i i 

[tip:x86/urgent] x86, rdrand: When nordrand is specified, disable RDSEED as well

2014-05-11 Thread tip-bot for H. Peter Anvin
Commit-ID:  7a5091d58419b4e5222abce58a40c072786ea1d6
Gitweb: http://git.kernel.org/tip/7a5091d58419b4e5222abce58a40c072786ea1d6
Author: H. Peter Anvin 
AuthorDate: Sun, 11 May 2014 20:25:20 -0700
Committer:  H. Peter Anvin 
CommitDate: Sun, 11 May 2014 20:25:20 -0700

x86, rdrand: When nordrand is specified, disable RDSEED as well

One can logically expect that when the user has specified "nordrand",
the user doesn't want any use of the CPU random number generator,
neither RDRAND nor RDSEED, so disable both.

Reported-by: Stephan Mueller 
Cc: Theodore Ts'o 
Link: http://lkml.kernel.org/r/21542339.0lfnpsy...@myon.chronox.de
Signed-off-by: H. Peter Anvin 
---
 Documentation/kernel-parameters.txt | 8 
 arch/x86/kernel/cpu/rdrand.c| 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 4384217..30a8ad0d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2218,10 +2218,10 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
noreplace-smp   [X86-32,SMP] Don't replace SMP instructions
with UP alternatives
 
-   nordrand[X86] Disable the direct use of the RDRAND
-   instruction even if it is supported by the
-   processor.  RDRAND is still available to user
-   space applications.
+   nordrand[X86] Disable kernel use of the RDRAND and
+   RDSEED instructions even if they are supported
+   by the processor.  RDRAND and RDSEED are still
+   available to user space applications.
 
noresume[SWSUSP] Disables resume and restores original swap
space.
diff --git a/arch/x86/kernel/cpu/rdrand.c b/arch/x86/kernel/cpu/rdrand.c
index 384df51..136ac74 100644
--- a/arch/x86/kernel/cpu/rdrand.c
+++ b/arch/x86/kernel/cpu/rdrand.c
@@ -27,6 +27,7 @@
 static int __init x86_rdrand_setup(char *s)
 {
setup_clear_cpu_cap(X86_FEATURE_RDRAND);
+   setup_clear_cpu_cap(X86_FEATURE_RDSEED);
return 1;
 }
 __setup("nordrand", x86_rdrand_setup);

