Re: [PATCH 1/5] perf tests: Add tip/pid mmap automated tests

2014-03-16 Thread Jiri Olsa
On Fri, Mar 14, 2014 at 05:24:21PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Mar 14, 2014 at 03:00:02PM +0100, Jiri Olsa escreveu:
> > Adding an automated test for memory map lookups within
> > multiple machines' threads.
> 
>   CC   /tmp/build/perf/arch/x86/util/tsc.o
> tests/mmap-events.c: In function ‘mmap_events’:
> tests/mmap-events.c:157:18: error: declaration of ‘thread’ shadows a global 
> declaration [-Werror=shadow]
> tests/mmap-events.c:46:14: error: shadowed declaration is here 
> [-Werror=shadow]

hum, I thought this one was fixed already.. will check

> tests/mmap-events.c: In function ‘thread’:
> tests/mmap-events.c:54:7: error: ignoring return value of ‘write’, declared 
> with attribute warn_unused_result [-Werror=unused-result]

got this one fixed in the new version
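
For reference, a minimal sketch of the kind of fix these two errors usually take (hypothetical code, not the actual test): rename the local symbol that shadows the global thread(), and check write()'s return value, which glibc declares warn_unused_result.

#include <stdio.h>
#include <unistd.h>

static void *thread_fn(void *arg)	/* was a function named 'thread' */
{
	int fd = *(int *)arg;
	char c = 0;

	if (write(fd, &c, sizeof(c)) != sizeof(c)) {	/* result now checked */
		fprintf(stderr, "failed to signal parent\n");
		return NULL;
	}
	return arg;
}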

thanks,
jirka


[PATCH 4/8] printk: Remove separate printk_sched buffers and use printk buf instead

2014-03-16 Thread Jan Kara
From: Steven Rostedt 

To prevent deadlocks when doing a printk inside the scheduler,
printk_sched() was created. The issue is that printk has a console_sem
that it can grab and release. The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq lock that is
held in the scheduler. This leads to a possible deadlock if the wake up
uses the same rq as the one whose rq lock is already held.

What printk_sched() does is save the printk output in a per-cpu buffer
and set the PRINTK_PENDING_SCHED flag. On a timer tick, if this flag is
set, the printk() is done against the buffer.
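
For context, the pre-patch printk_sched() looked roughly like this (a condensed sketch of the contemporary code with the flush trigger elided, not a verbatim quote):

int printk_sched(const char *fmt, ...)
{
	unsigned long flags;
	va_list args;
	char *buf;
	int r;

	local_irq_save(flags);
	buf = __get_cpu_var(printk_sched_buf);	/* 512-byte per-cpu buffer */

	va_start(args, fmt);
	/* Each call overwrites the previous message -- issue 1) below. */
	r = vsnprintf(buf, PRINTK_BUF_SIZE, fmt, args);
	va_end(args);

	__this_cpu_or(printk_pending, PRINTK_PENDING_SCHED);
	local_irq_restore(flags);

	return r;
}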

There are a couple of issues with this approach.

1) If two printk_sched()s are called before the tick, the second one
will overwrite the first one.

2) The temporary buffer is 512 bytes and is per cpu. This is quite a
bit of space wasted for something that is seldom used.

In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.

Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups. Unfortunately, printk()
also has a console_sem that it uses, and on release, the
up(&console_sem) may do a wake up of any pending waiters. This must be
avoided while holding the logbuf_lock.
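
With that constraint, the patch leaves all console work to the queued irq_work. A sketch of the resulting handler, consistent with the description above (a reconstruction for context, not the hunk itself):

static void wake_up_klogd_work_func(struct irq_work *irq_work)
{
	int pending = __this_cpu_xchg(printk_pending, 0);

	if (pending & PRINTK_PENDING_OUTPUT) {
		/* If trylock fails, someone else is already printing. */
		if (console_trylock())
			console_unlock();
	}

	if (pending & PRINTK_PENDING_WAKEUP)
		wake_up_interruptible(&log_wait);
}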

Signed-off-by: Steven Rostedt 
Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 47 +--
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 56649edfae9c..91c554e027c5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -68,6 +68,9 @@ int console_printk[4] = {
DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
 };
 
+/* Deferred messages from sched code are marked by this special level */
+#define SCHED_MESSAGE_LOGLEVEL -2
+
 /*
  * Low level drivers may need that to know if they can schedule in
  * their unblank() callback or not. So let's export it.
@@ -206,7 +209,9 @@ struct printk_log {
 };
 
 /*
- * The logbuf_lock protects kmsg buffer, indices, counters.
+ * The logbuf_lock protects kmsg buffer, indices, counters.  This can be taken
+ * within the scheduler's rq lock. It must be released before calling
+ * console_unlock() or anything else that might wake up a process.
  */
 static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
@@ -1473,14 +1478,19 @@ asmlinkage int vprintk_emit(int facility, int level,
static int recursion_bug;
static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
-   size_t text_len;
+   size_t text_len = 0;
enum log_flags lflags = 0;
unsigned long flags;
int this_cpu;
int printed_len = 0;
+   bool in_sched = false;
/* cpu currently holding logbuf_lock in this function */
static volatile unsigned int logbuf_cpu = UINT_MAX;
 
+   if (level == SCHED_MESSAGE_LOGLEVEL) {
+   level = -1;
+   in_sched = true;
+   }
 
boot_delay_msec(level);
printk_delay();
@@ -1527,7 +1537,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 * The printf needs to come first; we need the syslog
 * prefix which might be passed-in as a parameter.
 */
-   text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
+   if (in_sched)
+   text_len = scnprintf(text, sizeof(textbuf),
+KERN_WARNING "[sched_delayed] ");
+
+   text_len += vscnprintf(text + text_len,
+  sizeof(textbuf) - text_len, fmt, args);
 
/* mark and strip a trailing newline */
if (text_len && text[text_len-1] == '\n') {
@@ -1602,6 +1617,10 @@ asmlinkage int vprintk_emit(int facility, int level,
lockdep_on();
local_irq_restore(flags);
 
+   /* If called from the scheduler, we can not call up(). */
+   if (in_sched)
+   return printed_len;
+
/*
 * Disable preemption to avoid being preempted while holding
 * console_sem which would prevent anyone from printing to console
@@ -2423,21 +2442,19 @@ late_initcall(printk_late_init);
 /*
  * Delayed printk version, for scheduler-internal messages:
  */
-#define PRINTK_BUF_SIZE512
-
 #define PRINTK_PENDING_WAKEUP  0x01
-#define PRINTK_PENDING_SCHED   0x02
+#define PRINTK_PENDING_OUTPUT  0x02
 
 static DEFINE_PER_CPU(int, printk_pending);
-static DEFINE_PER_CPU(char [PRINTK_BUF_SIZE], printk_sched_buf);
 
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
int pending = __this_cpu_xchg(printk_pending, 0);
 
-   if (pending & PRINTK_PENDING_SCHED) {
-   char *buf = __get_cpu_var(printk_sched_buf);
-   pr_warn("[sched_delayed] %s", buf);
+   if (pending & PRINTK_PENDING_OUTPUT) {
+ 

[PATCH 2/8] printk: Release logbuf_lock before calling console_trylock_for_printk()

2014-03-16 Thread Jan Kara
There's no reason to hold logbuf_lock when entering
console_trylock_for_printk(). The first thing this function does is
call down_trylock(console_sem), and if that fails it immediately
unlocks logbuf_lock. So logbuf_lock isn't needed for that branch.
When down_trylock() succeeds, the rest of console_trylock() is OK
without logbuf_lock (it is called without it from other places), and
the only remaining thing in console_trylock_for_printk() is the
can_use_console() call. For that call console_sem is enough (it
iterates all consoles and checks the CON_ANYTIME flag).

So we drop logbuf_lock before entering console_trylock_for_printk(),
which simplifies the code.

Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 49 +
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bd7ee2a9f960..7a8ffd89875c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -249,9 +249,6 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
-/* cpu currently holding logbuf_lock */
-static volatile unsigned int logbuf_cpu = UINT_MAX;
-
 /* human readable text of the record */
 static char *log_text(const struct printk_log *msg)
 {
@@ -1332,36 +1329,22 @@ static inline int can_use_console(unsigned int cpu)
  * messages from a 'printk'. Return true (and with the
  * console_lock held, and 'console_locked' set) if it
  * is successful, false otherwise.
- *
- * This gets called with the 'logbuf_lock' spinlock held and
- * interrupts disabled. It should return with 'lockbuf_lock'
- * released but interrupts still disabled.
  */
 static int console_trylock_for_printk(unsigned int cpu)
-   __releases(&logbuf_lock)
 {
-   int retval = 0, wake = 0;
-
-   if (console_trylock()) {
-   retval = 1;
-
-   /*
-* If we can't use the console, we need to release
-* the console semaphore by hand to avoid flushing
-* the buffer. We need to hold the console semaphore
-* in order to do this test safely.
-*/
-   if (!can_use_console(cpu)) {
-   console_locked = 0;
-   wake = 1;
-   retval = 0;
-   }
-   }
-   logbuf_cpu = UINT_MAX;
-   raw_spin_unlock(&logbuf_lock);
-   if (wake)
+   if (!console_trylock())
+   return 0;
+   /*
+* If we can't use the console, we need to release the console
+* semaphore by hand to avoid flushing the buffer. We need to hold the
+* console semaphore in order to do this test safely.
+*/
+   if (!can_use_console(cpu)) {
+   console_locked = 0;
up(&console_sem);
-   return retval;
+   return 0;
+   }
+   return 1;
 }
 
 int printk_delay_msec __read_mostly;
@@ -1494,6 +1477,9 @@ asmlinkage int vprintk_emit(int facility, int level,
unsigned long flags;
int this_cpu;
int printed_len = 0;
+   /* cpu currently holding logbuf_lock in this function */
+   static volatile unsigned int logbuf_cpu = UINT_MAX;
+
 
boot_delay_msec(level);
printk_delay();
@@ -1609,13 +1595,12 @@ asmlinkage int vprintk_emit(int facility, int level,
}
printed_len += text_len;
 
+   logbuf_cpu = UINT_MAX;
+   raw_spin_unlock(&logbuf_lock);
/*
 * Try to acquire and then immediately release the console semaphore.
 * The release will print out buffers and wake up /dev/kmsg and syslog()
 * users.
-*
-* The console_trylock_for_printk() function will release 'logbuf_lock'
-* regardless of whether it actually gets the console semaphore or not.
 */
if (console_trylock_for_printk(this_cpu))
console_unlock();
-- 
1.8.1.4



[PATCH 3/8] printk: Enable interrupts before calling console_trylock_for_printk()

2014-03-16 Thread Jan Kara
We need interrupts disabled when calling console_trylock_for_printk()
only so that the cpu id we pass to can_use_console() remains valid (for
other things console_sem provides all the exclusion we need, and
deadlocks on console_sem due to interrupts are impossible because we use
down_trylock()). However, if we are rescheduled, we are guaranteed to
run on an online cpu, so we can easily just get the cpu id in
can_use_console().

We can lose a bit of performance by enabling interrupts in
vprintk_emit() and then disabling them again in console_unlock(), but on
the other hand it can somewhat reduce the interrupt latency caused by
console_unlock(), especially since later in the patch series we will
want to spin on console_sem in console_trylock_for_printk().

Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7a8ffd89875c..56649edfae9c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1314,10 +1314,9 @@ static int have_callable_console(void)
 /*
  * Can we actually use the console at this time on this cpu?
  *
- * Console drivers may assume that per-cpu resources have
- * been allocated. So unless they're explicitly marked as
- * being able to cope (CON_ANYTIME) don't call them until
- * this CPU is officially up.
+ * Console drivers may assume that per-cpu resources have been allocated. So
+ * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
+ * call them until this CPU is officially up.
  */
 static inline int can_use_console(unsigned int cpu)
 {
@@ -1330,8 +1329,10 @@ static inline int can_use_console(unsigned int cpu)
  * console_lock held, and 'console_locked' set) if it
  * is successful, false otherwise.
  */
-static int console_trylock_for_printk(unsigned int cpu)
+static int console_trylock_for_printk(void)
 {
+   unsigned int cpu = smp_processor_id();
+
if (!console_trylock())
return 0;
/*
@@ -1501,7 +1502,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 */
if (!oops_in_progress && !lockdep_recursing(current)) {
recursion_bug = 1;
-   goto out_restore_irqs;
+   local_irq_restore(flags);
+   return 0;
}
zap_locks();
}
@@ -1597,17 +1599,22 @@ asmlinkage int vprintk_emit(int facility, int level,
 
logbuf_cpu = UINT_MAX;
raw_spin_unlock(&logbuf_lock);
+   lockdep_on();
+   local_irq_restore(flags);
+
+   /*
+* Disable preemption to avoid being preempted while holding
+* console_sem which would prevent anyone from printing to console
+*/
+   preempt_disable();
/*
 * Try to acquire and then immediately release the console semaphore.
 * The release will print out buffers and wake up /dev/kmsg and syslog()
 * users.
 */
-   if (console_trylock_for_printk(this_cpu))
+   if (console_trylock_for_printk())
console_unlock();
-
-   lockdep_on();
-out_restore_irqs:
-   local_irq_restore(flags);
+   preempt_enable();
 
return printed_len;
 }
-- 
1.8.1.4



[PATCH 6/8] printk: Start printing handover kthreads on demand

2014-03-16 Thread Jan Kara
Start kthreads for handing over printing only when printk.offload_chars
is set to a value > 0 (i.e., when print offloading gets enabled).

Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 64 +++---
 1 file changed, 50 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0910e1894224..d75e7e8c915a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -109,6 +109,10 @@ static long printk_handover_state;
  * CPUs.
  */
 #define PRINTING_TASKS 2
+/* Pointers to printing kthreads */
+static struct task_struct *printing_kthread[PRINTING_TASKS];
+/* Serialization of changes to printk_offload_chars and kthread creation */
+static DEFINE_MUTEX(printk_kthread_mutex);
 
 /* Wait queue printing kthreads sleep on when idle */
 static DECLARE_WAIT_QUEUE_HEAD(print_queue); 
@@ -280,6 +284,12 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+static int offload_chars_set(const char *val, const struct kernel_param *kp);
+static struct kernel_param_ops offload_chars_ops = {
+   .set = offload_chars_set,
+   .get = param_get_uint,
+};
+
 /*
  * How many characters can we print in one call of printk before asking
  * other cpus to continue printing. 0 means infinity. Tunable via
@@ -288,7 +298,7 @@ static u32 log_buf_len = __LOG_BUF_LEN;
  */
 static unsigned int __read_mostly printk_offload_chars = 1000;
 
-module_param_named(offload_chars, printk_offload_chars, uint,
+module_param_cb(offload_chars, &offload_chars_ops, &printk_offload_chars,
   S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
" cpu after this number of characters");
@@ -2571,30 +2581,56 @@ static int printing_task(void *arg)
return 0;
 }
 
-static int __init printk_late_init(void)
+static void printk_start_offload_kthreads(void)
 {
-   struct console *con;
int i;
struct task_struct *task;
 
-   for_each_console(con) {
-   if (!keep_bootcon && con->flags & CON_BOOT) {
-   unregister_console(con);
-   }
-   }
-   hotcpu_notifier(console_cpu_notify, 0);
-
-   /* Does any handover of printing have any sence? */
-   if (num_possible_cpus() <= 1)
-   return 0;
-
+   /* Does handover of printing make any sense? */
+   if (printk_offload_chars == 0 || num_possible_cpus() <= 1)
+   return;
for (i = 0; i < PRINTING_TASKS; i++) {
+   if (printing_kthread[i])
+   continue;
task = kthread_run(printing_task, NULL, "print/%d", i);
if (IS_ERR(task)) {
pr_err("printk: Cannot create printing thread: %ld\n",
   PTR_ERR(task));
}
+   printing_kthread[i] = task;
}
+}
+
+static int offload_chars_set(const char *val, const struct kernel_param *kp)
+{
+   int ret;
+
+   /* Protect against parallel change of printk_offload_chars */
+   mutex_lock(&printk_kthread_mutex);
+   ret = param_set_uint(val, kp);
+   if (ret) {
+   mutex_unlock(&printk_kthread_mutex);
+   return ret;
+   }
+   printk_start_offload_kthreads();
+   mutex_unlock(&printk_kthread_mutex);
+   return 0;
+}
+
+static int __init printk_late_init(void)
+{
+   struct console *con;
+
+   for_each_console(con) {
+   if (!keep_bootcon && con->flags & CON_BOOT) {
+   unregister_console(con);
+   }
+   }
+   hotcpu_notifier(console_cpu_notify, 0);
+
+   mutex_lock(&printk_kthread_mutex);
+   printk_start_offload_kthreads();
+   mutex_unlock(&printk_kthread_mutex);
 
return 0;
 }
-- 
1.8.1.4



[PATCH 5/8] printk: Hand over printing to console if printing too long

2014-03-16 Thread Jan Kara
Currently, console_unlock() prints messages from kernel printk buffer to
console while the buffer is non-empty. When serial console is attached,
printing is slow and thus other CPUs in the system have plenty of time
to append new messages to the buffer while one CPU is printing. Thus the
CPU can spend unbounded amount of time doing printing in console_unlock().
This is an especially serious problem if the printk() calling
console_unlock() was called with interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Also, during artificial stress testing, a SATA disk
disappears from the system because its interrupts aren't served for too
long.

This patch implements a mechanism where, after printing a specified
number of characters (tunable as a kernel parameter,
printk.offload_chars), the printing CPU asks for help by setting the
PRINTK_HANDOVER_B bit in the printk_handover_state variable and waking
up one of the dedicated kthreads. As soon as the printing CPU notices
the kthread got scheduled and is spinning on console_sem, it drops
console_sem and exits console_unlock(). The kthread then takes over
printing instead. This way no CPU should spend too long printing even
if there is heavy printk traffic.
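
A condensed sketch of that handover, using the state bits this patch introduces (the loop details and the printed_chars/msg_len names are assumptions based on the description above, not a quote of the patch):

/* In console_unlock()'s print loop, after emitting each record: */
printed_chars += msg_len;
if (printk_offload_chars && printed_chars > printk_offload_chars) {
	/* Ask for help and wake an idle printing kthread. */
	set_bit(PRINTK_HANDOVER_B, &printk_handover_state);
	wake_up(&print_queue);
	/* Once a kthread spins on console_sem, hand over and leave. */
	if (test_bit(PRINTK_CONSOLE_SPIN_B, &printk_handover_state)) {
		clear_bit(PRINTK_HANDOVER_B, &printk_handover_state);
		up(&console_sem);
		return;
	}
}

/* The kthread side (sketch; cf. printing_task() in patch 6/8): */
static int printing_task(void *arg)
{
	while (!kthread_should_stop()) {
		wait_event_interruptible(print_queue,
			test_bit(PRINTK_HANDOVER_B, &printk_handover_state));
		console_lock();		/* the series uses a spinning variant */
		console_unlock();	/* print, possibly handing over again */
	}
	return 0;
}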

Signed-off-by: Jan Kara 
---
 Documentation/kernel-parameters.txt |  15 +++
 kernel/printk/printk.c  | 196 
 2 files changed, 194 insertions(+), 17 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 7116fda7077f..74826b1e2529 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2621,6 +2621,21 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
Format:   (1/Y/y=enable, 0/N/n=disable)
default: disabled
 
+   printk.offload_chars=
+   Printing to console can be relatively slow especially
+   in case of serial console. When there is intensive
+   printing happening from several cpus (as is the case
+   during boot), a cpu can be spending significant time
+   (seconds or more) doing printing. To avoid softlockups,
+   lost interrupts, and similar problems other cpus
+   will take over printing after the currently printing
+   cpu has printed 'printk.offload_chars' characters.
+   Higher value means possibly longer interrupt and other
+   latencies but lower overhead of printing due to handing
+   over of printing.
+   Format:  (0 = disabled)
+   default: 1000
+
printk.time=Show timing data prefixed to each printk message line
Format:   (1/Y/y=enable, 0/N/n=disable)
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 91c554e027c5..0910e1894224 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -87,6 +88,31 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+/* State machine for handing over printing */
+enum {
+   /*
+* Set by the holder of console_sem if currently printing task wants to
+* hand over printing. Cleared before console_sem is released.
+*/
+   PRINTK_HANDOVER_B,
+   /*
+* Set if there's someone spinning on console_sem to take over printing.
+* Cleared after acquiring console_sem.
+*/
+   PRINTK_CONSOLE_SPIN_B,
+};
+static long printk_handover_state;
+
+/*
+ * Number of kernel threads for offloading printing. We need at least two so
+ * that they can hand over printing from one to another one and thus switch
+ * CPUs.
+ */
+#define PRINTING_TASKS 2
+
+/* Wait queue printing kthreads sleep on when idle */
+static DECLARE_WAIT_QUEUE_HEAD(print_queue); 
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
.name = "console_lock"
@@ -254,6 +280,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * How many characters can we print in one call of printk before asking
+ * other cpus to continue printing. 0 means infinity. Tunable

[PATCH 8/8] printk: Add config option for disabling printk offloading

2014-03-16 Thread Jan Kara
The necessity of offloading printing was observed only on large
systems. So add a config option which, when disabled, removes most
of the overhead added by this functionality.

Signed-off-by: Jan Kara 
---
 Documentation/kernel-parameters.txt | 13 +++--
 init/Kconfig| 14 ++
 kernel/printk/printk.c  | 14 ++
 3 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 74826b1e2529..3c6d5aec501a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2621,18 +2621,19 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
Format:   (1/Y/y=enable, 0/N/n=disable)
default: disabled
 
-   printk.offload_chars=
+   printk.offload_chars=   [KNL]
Printing to console can be relatively slow especially
in case of serial console. When there is intensive
printing happening from several cpus (as is the case
during boot), a cpu can be spending significant time
(seconds or more) doing printing. To avoid softlockups,
lost interrupts, and similar problems other cpus
-   will take over printing after the currently printing
-   cpu has printed 'printk.offload_chars' characters.
-   Higher value means possibly longer interrupt and other
-   latencies but lower overhead of printing due to handing
-   over of printing.
+   will take over printing (if CONFIG_PRINTK_OFFLOAD=y)
+   after the currently printing cpu has printed
+   'printk.offload_chars' characters. Higher value means
+   possibly longer interrupt and other latencies but
+   lower overhead of printing due to handing over of
+   printing.
Format:  (0 = disabled)
default: 1000
 
diff --git a/init/Kconfig b/init/Kconfig
index 009a797dd242..45aa7368d92f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1343,6 +1343,20 @@ config PRINTK
  very difficult to diagnose system problems, saying N here is
  strongly discouraged.
 
+config PRINTK_OFFLOAD
+   default y
+   bool "Enable support for offloading printing to different CPU"
+   depends on PRINTK
+   help
+ Printing to console can be relatively slow especially in case of
+ serial console. On large machines when there is intensive printing
+ happening from several cpus (as is the case during boot), a cpu can
+ be spending significant time (seconds or more) doing printing. To
+ avoid softlockups, lost interrupts, and similar problems other cpus
+ will take over printing after the currently printing cpu has printed
+ certain number of characters (tunable via 'printk.offload_chars'
+ kernel parameter).
+
 config BUG
bool "BUG() support" if EXPERT
default y
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 06e39c661a00..679805e98843 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -103,6 +103,7 @@ enum {
 };
 static long printk_handover_state;
 
+#ifdef CONFIG_PRINTK_OFFLOAD
 /*
  * Number of kernel threads for offloading printing. We need at least two so
  * that they can hand over printing from one to another one and thus switch
@@ -116,6 +117,7 @@ static DEFINE_MUTEX(printk_kthread_mutex);
 
 /* Wait queue printing kthreads sleep on when idle */
 static DECLARE_WAIT_QUEUE_HEAD(print_queue); 
+#endif /* CONFIG_PRINTK_OFFLOAD */
 
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
@@ -284,6 +286,7 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+#ifdef CONFIG_PRINTK_OFFLOAD
 static int offload_chars_set(const char *val, const struct kernel_param *kp);
 static struct kernel_param_ops offload_chars_ops = {
.set = offload_chars_set,
@@ -302,6 +305,7 @@ module_param_cb(offload_chars, &offload_chars_ops, 
&printk_offload_chars,
   S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
" cpu after this number of characters");
+#endif
 
 /* human readable text of the record */
 static char *log_text(const struct printk_log *msg)
@@ -2021,6 +2025,7 @@ int console_trylock(void)
 }
 EXPORT_SYMBOL(console_trylock);
 
+#ifdef CONFIG_PRINTK_OFFLOAD
 /*
  * This is a version of console_lock() which spins to acquire console_sem.
  * It is only for use by threads that take care of flushing printk buffer so
@@ -2052,6 +2057,7 @@ static int 

[PATCH 0/8 v3] printk: Cleanups and softlockup avoidance

2014-03-16 Thread Jan Kara
  Hello,

  this is another revision of the printk softlockup series. Since the
previous version I have fixed up some small problems pointed out by Andrew,
added the possibility to configure out the printk offloading logic (for small
systems), and offload kthreads are now started only once printk.offload_chars
is set to a value > 0.

Intro for the newcomers to the series below.

---

Currently, console_unlock() prints messages from kernel printk buffer to
console while the buffer is non-empty. When serial console is attached,
printing is slow and thus other CPUs in the system have plenty of time
to append new messages to the buffer while one CPU is printing. Thus the
CPU can spend unbounded amount of time doing printing in console_unlock().
This is especially serious since vprintk_emit() calls console_unlock()
with interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Also, during artificial stress testing, a SATA disk
disappears from the system because its interrupts aren't served for too
long.

This is a revised series using my new approach to the problem, which doesn't
let a CPU out of console_unlock() until there's someone else to take over the
printing. The main difference since the last version is that instead of
passing printing duty to different CPUs via IPIs we use dedicated kthreads.
This method is somewhat less reliable (in the sense that there are more
situations in which handover needn't work at all - e.g. when the currently
printing CPU holds a spinlock and the CPU where the kthread is scheduled to
run is spinning on this spinlock) but the code is much simpler, and in my
practical testing the kthread approach was good enough to avoid any problems
(with one exception - see patch 8/8).

Honza


[PATCH 1/8] printk: Remove outdated comment

2014-03-16 Thread Jan Kara
The comment about interesting interlocking between logbuf_lock and
console_sem is outdated. It was added in 2002 by commit
a880f45a48be2956d2c78a839c472287d54435c1 during the conversion of
console_lock to console_sem + logbuf_lock. At that time
release_console_sem() (today's equivalent is console_unlock()) was
indeed using logbuf_lock to avoid races between trylock on console_sem
in printk() and unlock of console_sem. However, these days the
interlocking is gone and the races are avoided by rechecking the logbuf
state after releasing console_sem.
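
The rechecking referred to here is the tail of console_unlock(), which at the time looked roughly like this (a sketch for context, not part of this diff):

	up(&console_sem);

	/*
	 * Someone could have filled the buffer again while we were
	 * printing; if so, and if we can retake console_sem, go again.
	 */
	raw_spin_lock_irqsave(&logbuf_lock, flags);
	retry = console_seq != log_next_seq;
	raw_spin_unlock_irqrestore(&logbuf_lock, flags);

	if (retry && console_trylock())
		goto again;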

Signed-off-by: Jan Kara 
---
 kernel/printk/printk.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 4dae9cbe9259..bd7ee2a9f960 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -206,8 +206,7 @@ struct printk_log {
 };
 
 /*
- * The logbuf_lock protects kmsg buffer, indices, counters. It is also
- * used in interesting ways to provide interlocking in console_unlock();
+ * The logbuf_lock protects kmsg buffer, indices, counters.
  */
 static DEFINE_RAW_SPINLOCK(logbuf_lock);
 
-- 
1.8.1.4



[PATCH 7/8] kernel: Avoid softlockups in stop_machine() during heavy printing

2014-03-16 Thread Jan Kara
When there are lots of messages accumulated in printk buffer, printing
them (especially over serial console) can take a long time (tens of
seconds). stop_machine() will effectively make all cpus spin in
multi_cpu_stop() waiting for the CPU doing printing to print all the
messages which triggers NMI softlockup watchdog and RCU stall detector
which add even more messages to print. Since the machine doesn't do
anything (except serving interrupts) during this time, network
connections are also dropped and other disturbances may happen.

Paper over the problem by waiting for the printk buffer to be empty
before starting to stop CPUs. In theory a burst of new messages can be
appended to the printk buffer before CPUs enter multi_cpu_stop(), so this
isn't a 100% solution, but it works OK in practice and I'm not aware of a
reasonably simple better solution.

Signed-off-by: Jan Kara 
---
 include/linux/console.h |  1 +
 kernel/printk/printk.c  | 22 ++
 kernel/stop_machine.c   |  9 +
 3 files changed, 32 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index 7571a16bd653..c61c169f85b3 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -150,6 +150,7 @@ extern int console_trylock(void);
 extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
+extern void console_flush(void);
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d75e7e8c915a..06e39c661a00 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2306,6 +2306,28 @@ struct tty_driver *console_device(int *index)
 }
 
 /*
+ * Wait until all messages accumulated in the printk buffer are printed to
+ * console. Note that as soon as this function returns, new messages may be
+ * added to the printk buffer by other CPUs.
+ */
+void console_flush(void)
+{
+   bool retry;
+   unsigned long flags;
+
+   while (1) {
+   raw_spin_lock_irqsave(&logbuf_lock, flags);
+   retry = console_seq != log_next_seq;
+   raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+   if (!retry)
+   break;
+   /* Cycle console_sem to wait for outstanding printing */
+   console_lock();
+   console_unlock();
+   }
+}
+
+/*
  * Prevent further output on the passed console device so that (for example)
  * serial drivers can disable console output before suspending a port, and can
  * re-enable output afterwards.
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 84571e09c907..14ac740e0c7f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Structure to determine completion condition and record errors.  May
@@ -574,6 +575,14 @@ int __stop_machine(int (*fn)(void *), void *data, const 
struct cpumask *cpus)
return ret;
}
 
+   /*
+* If there are lots of outstanding messages, printing them can take a
+* long time and all cpus would be spinning waiting for the printing to
+* finish thus triggering NMI watchdog, RCU lockups etc. Wait for the
+* printing here to avoid these.
+*/
+   console_flush();
+
/* Set the initial state and stop all online cpus. */
set_state(&msdata, MULTI_STOP_PREPARE);
return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
-- 
1.8.1.4



Re: [PATCH] perf sched latency: prettify printed table

2014-03-16 Thread Jiri Olsa
On Sat, Mar 15, 2014 at 12:17:38PM -0400, Ramkumar Ramachandra wrote:
> Cc: Frederic Weisbecker 
> Cc: David Ahern 
> Cc: Jiri Olsa 
> Cc: Arnaldo Carvalho de Melo 
> Signed-off-by: Ramkumar Ramachandra 
> ---
>  tools/perf/builtin-sched.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

ENOCHANGELOG ;-)

please provide changelog details on how the output is prettified
(before/after example)

thanks,
jirka


Re: [RESEND v2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and accessing ring pages.

2014-03-16 Thread Tang Chen

On 03/14/2014 11:14 PM, Benjamin LaHaise wrote:
..

What about the following patch? It adds an additional reference to protect
the page from being freed while we are reading it.
P.S. It is applied on linux-next (3-13).


I think that's even worse than the spinlock approach since we'll end up
bouncing around the struct page's cacheline in addition to the spinlock
we're going to end up taking anyway.



Hi Benjamin,

I'm sorry, I don't quite understand the cacheline problem you mentioned 
above.

Would you please explain more ?

Thanks.



Re: [PATCH net V2] vhost: net: switch to use data copy if pending DMAs exceed the limit

2014-03-16 Thread Ronen Hod

On 03/13/2014 09:28 AM, Jason Wang wrote:

On 03/10/2014 04:03 PM, Michael S. Tsirkin wrote:

On Fri, Mar 07, 2014 at 01:28:27PM +0800, Jason Wang wrote:

We used to stop the handling of tx when the number of pending DMAs
exceeded VHOST_MAX_PEND. This was used to reduce the memory occupation
of both host and guest. But it was too aggressive in some cases, since
any delay or blocking of a single packet may delay or block the guest
transmission. Consider the following setup:

 +-----+      +-----+
 | VM1 |      | VM2 |
 +--+--+      +--+--+
    |            |
 +--+--+      +--+--+
 | tap0|      | tap1|
 +--+--+      +--+--+
    |            |
 pfifo_fast   htb(10Mbit/s)
    |            |
 +--+------------+--+
 |      bridge      |
 +---------+--------+
           |
       pfifo_fast
           |
       +---+---+
       | eth0  | (100Mbit/s)
       +-------+

- start two VMs and connect them to a bridge
- add a physical card (100Mbit/s) to that bridge
- setup htb on tap1 and limit its throughput to 10Mbit/s
- run two netperfs at the same time, one from VM1 to VM2, another
   from VM1 to an external host through eth0
- the result shows that not only was the VM1 to VM2 traffic throttled,
   but the VM1 to external host traffic through eth0 was also throttled

This is because the delay added by htb may delay the completion of DMAs
and cause the pending DMAs for tap0 to exceed the limit
(VHOST_MAX_PEND). In this case vhost stops handling tx requests until
htb sends some packets. The problem here is that all packet
transmission is blocked, even if it does not go to VM2.

We can solve this issue by relaxing it a little bit: switching to use
data copy instead of stopping tx when the number of pending DMAs
exceeds half of the vq size. This is safe because:

- The number of pending DMAs is still limited (half of the vq size)
- The out of order completion during mode switch can make sure that
   most of the tx buffers are freed in time in the guest.

So even if about 50% of packets were delayed in the zero-copy case, vhost
could continue to do the transmission through data copy in this case.
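
A minimal sketch of that check (a hypothetical helper; the field names follow the vhost-net conventions of the time, and the actual patch body is not quoted in this thread):

/* Decide whether a tx packet may still use zero-copy (sketch). */
static bool tx_can_zerocopy(struct vhost_net_virtqueue *nvq)
{
	struct vhost_virtqueue *vq = &nvq->vq;
	int pend = (nvq->upend_idx - nvq->done_idx + UIO_MAXIOV)
		   % UIO_MAXIOV;

	/* Past half the vq size, fall back to data copy instead of
	 * stopping tx, so unrelated flows keep making progress. */
	return pend < vq->num / 2;
}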

Test result:

Before this patch:
VM1 to VM2 throughput is 9.3Mbit/s
VM1 to External throughput is 40Mbit/s
CPU utilization is 7%

After this patch:
VM1 to VM2 throughput is 9.3Mbit/s
VM1 to External throughput is 93Mbit/s
CPU utilization is 16%

Completed performance test on 40gbe shows no obvious changes in both
throughput and cpu utilization with this patch.

The patch only solves this issue for unlimited sndbuf. We still need a
solution for limited sndbuf.

Cc: Michael S. Tsirkin 
Cc: Qin Chuanyu 
Signed-off-by: Jason Wang 

I thought hard about this.
Here's what worries me: if there are still head of line
blocking issues lurking in the stack, they will still
hurt guests such as windows which rely on timely
completion of buffers, but it makes it
that much harder to reproduce the problems with
linux guests which don't.
And this will make it even harder to figure out
whether zero copy is actually active to diagnose
high cpu utilization cases.

Yes.


So I think this is a good trick, but let's make
this path conditional on a new debugging module parameter:
how about head_of_line_blocking with default off?

Sure. But the head of line blocking was only partially solved in the
patch since we only support in-order completion of zerocopy packets.
Maybe we need to consider switching to out of order completion even for
zerocopy skbs?


Yan, Dima,

I remember that there is an issue with out-of-order packets and WHQL.

Ronen.


This way if we suspect packets are delayed forever
somewhere, we can enable that and see guest networking block.

Additionally, I think we should add a way to count zero copy
and non zero copy packets.
I see two ways to implement this: add tracepoints in vhost-net
or add counters in tun accessible with ethtool.
This can be a patch on top and does not have to block
this one though.


Yes, I post a RFC about 2 years ago, see
https://lkml.org/lkml/2012/4/9/478 which only traces generic vhost
behaviours. I can refresh this and add some -net specific tracepoints.


Re: [RFA][PATCH 3/4] tracing/module: Remove include of tracepoint.h from module.h

2014-03-16 Thread Steven Rostedt
On Mon, 2014-03-17 at 13:04 +1030, Rusty Russell wrote:
> Steven Rostedt  writes:
> > On Wed, 26 Feb 2014 14:01:43 -0500
> > Hi Rusty,
> >
> > This patch doesn't need to be stable, and can wait till v3.15. But I
> > have other patches that will break with this patch (headers that needed
> > to include tracepoint.h and not depend on a header chain to include it).
> >
> > Can you give me your Acked-by for this, and I'll just add it to my 3.15
> > queue?
> 
> Cleaning up old mail, in case I didn't ack this:
> 
> Acked-by: Rusty Russell 

Yep, you already gave me your ack. But I appreciate the double ack :-)

-- Steve




[PATCH V3] NTFS: Logging clean-up

2014-03-16 Thread Fabian Frederick
-Convert spinlock/static array to va_format (inspired by Joe Perches' help
on previous logging patches).
-Convert printk(KERN_ERR to pr_warn in __ntfs_warning.
-Convert printk(KERN_ERR to pr_err in __ntfs_error.
-Convert printk(KERN_DEBUG to pr_debug in __ntfs_debug.
(Note that __ntfs_debug is still guarded by #if DEBUG)
-Improve !DEBUG to parse all arguments (Joe Perches).
-Sparse pr_foo() conversions in super.c

The NTFS and NTFS-fs prefixes as well as 'warning' and 'error'
were removed: pr_foo() automatically adds the module name,
and the error level is already specified.

Signed-off-by: Fabian Frederick 
---
 fs/ntfs/debug.c | 58 -
 fs/ntfs/debug.h |  7 ++-
 fs/ntfs/super.c | 28 
 3 files changed, 42 insertions(+), 51 deletions(-)

diff --git a/fs/ntfs/debug.c b/fs/ntfs/debug.c
index 807150e..dd6103c 100644
--- a/fs/ntfs/debug.c
+++ b/fs/ntfs/debug.c
@@ -18,16 +18,9 @@
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
-
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include "debug.h"
 
-/*
- * A static buffer to hold the error string being displayed and a spinlock
- * to protect concurrent accesses to it.
- */
-static char err_buf[1024];
-static DEFINE_SPINLOCK(err_buf_lock);
-
 /**
  * __ntfs_warning - output a warning to the syslog
  * @function:  name of function outputting the warning
@@ -50,6 +43,7 @@ static DEFINE_SPINLOCK(err_buf_lock);
 void __ntfs_warning(const char *function, const struct super_block *sb,
const char *fmt, ...)
 {
+   struct va_format vaf;
va_list args;
int flen = 0;
 
@@ -59,17 +53,15 @@ void __ntfs_warning(const char *function, const struct 
super_block *sb,
 #endif
if (function)
flen = strlen(function);
-   spin_lock(&err_buf_lock);
va_start(args, fmt);
-   vsnprintf(err_buf, sizeof(err_buf), fmt, args);
-   va_end(args);
+   vaf.fmt = fmt;
+   vaf.va = &args;
if (sb)
-   printk(KERN_ERR "NTFS-fs warning (device %s): %s(): %s\n",
-   sb->s_id, flen ? function : "", err_buf);
+   pr_warn("(device %s): %s(): %pV\n",
+   sb->s_id, flen ? function : "", &vaf);
else
-   printk(KERN_ERR "NTFS-fs warning: %s(): %s\n",
-   flen ? function : "", err_buf);
-   spin_unlock(&err_buf_lock);
+   pr_warn("%s(): %pV\n", flen ? function : "", &vaf);
+   va_end(args);
 }
 
 /**
@@ -94,6 +86,7 @@ void __ntfs_warning(const char *function, const struct 
super_block *sb,
 void __ntfs_error(const char *function, const struct super_block *sb,
const char *fmt, ...)
 {
+   struct va_format vaf;
va_list args;
int flen = 0;
 
@@ -103,17 +96,15 @@ void __ntfs_error(const char *function, const struct 
super_block *sb,
 #endif
if (function)
flen = strlen(function);
-   spin_lock(&err_buf_lock);
va_start(args, fmt);
-   vsnprintf(err_buf, sizeof(err_buf), fmt, args);
-   va_end(args);
+   vaf.fmt = fmt;
+   vaf.va = &args;
if (sb)
-   printk(KERN_ERR "NTFS-fs error (device %s): %s(): %s\n",
-   sb->s_id, flen ? function : "", err_buf);
+   pr_err("(device %s): %s(): %pV\n",
+  sb->s_id, flen ? function : "", &vaf);
else
-   printk(KERN_ERR "NTFS-fs error: %s(): %s\n",
-   flen ? function : "", err_buf);
-   spin_unlock(&err_buf_lock);
+   pr_err("%s(): %pV\n", flen ? function : "", &vaf);
+   va_end(args);
 }
 
 #ifdef DEBUG
@@ -124,6 +115,7 @@ int debug_msgs = 0;
 void __ntfs_debug (const char *file, int line, const char *function,
const char *fmt, ...)
 {
+   struct va_format vaf;
va_list args;
int flen = 0;
 
@@ -131,13 +123,11 @@ void __ntfs_debug (const char *file, int line, const char 
*function,
return;
if (function)
flen = strlen(function);
-   spin_lock(&err_buf_lock);
va_start(args, fmt);
-   vsnprintf(err_buf, sizeof(err_buf), fmt, args);
+   vaf.fmt = fmt;
+   vaf.va = &args;
+   pr_debug("(%s, %d): %s(): %pV", file, line, flen ? function : "", &vaf);
va_end(args);
-   printk(KERN_DEBUG "NTFS-fs DEBUG (%s, %d): %s(): %s\n", file, line,
-   flen ? function : "", err_buf);
-   spin_unlock(&err_buf_lock);
 }
 
 /* Dump a runlist. Caller has to provide synchronisation for @rl. */
@@ -149,12 +139,12 @@ void ntfs_debug_dump_runlist(const runlist_element *rl)
 
if (!debug_msgs)
return;
-   printk(KERN_DEBUG "NTFS-fs DEBUG: Dumping runlist (values in hex):\n");
+   pr_debug("Dumping runlist (val

[PATCH v4 02/16] perf, core: introduce pmu context switch callback

2014-03-16 Thread Yan, Zheng
The callback is invoked when a process is scheduled in or out.
It provides a mechanism for later patches to save/restore the LBR
stack. For the schedule-in case, the callback is invoked at
the same place that the flush branch stack callback is invoked,
so it can also replace the flush branch stack callback. To
avoid unnecessary overhead, the callback is enabled only when
there are events that use the LBR stack.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c |  7 +
 arch/x86/kernel/cpu/perf_event.h |  2 ++
 include/linux/perf_event.h   |  8 ++
 kernel/events/core.c | 60 +++-
 4 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index ae407f7..03e977f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1872,6 +1872,12 @@ static const struct attribute_group 
*x86_pmu_attr_groups[] = {
NULL,
 };
 
+static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
+{
+   if (x86_pmu.sched_task)
+   x86_pmu.sched_task(ctx, sched_in);
+}
+
 static void x86_pmu_flush_branch_stack(void)
 {
if (x86_pmu.flush_branch_stack)
@@ -1905,6 +1911,7 @@ static struct pmu pmu = {
 
.event_idx  = x86_pmu_event_idx,
.flush_branch_stack = x86_pmu_flush_branch_stack,
+   .sched_task = x86_pmu_sched_task,
 };
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index b58c0ba..1e2118a 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -429,6 +429,8 @@ struct x86_pmu {
 
void(*check_microcode)(void);
void(*flush_branch_stack)(void);
+   void(*sched_task)(struct perf_event_context *ctx,
+ bool sched_in);
 
/*
 * Intel Arch Perfmon v2+
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..adc20f2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -251,6 +251,12 @@ struct pmu {
 * flush branch stack on context-switches (needed in cpu-wide mode)
 */
void (*flush_branch_stack)  (void);
+
+   /*
+* PMU callback for context-switches. optional
+*/
+   void (*sched_task)  (struct perf_event_context *ctx,
+bool sched_in);
 };
 
 /**
@@ -544,6 +550,8 @@ extern void perf_event_delayed_put(struct task_struct 
*task);
 extern void perf_event_print_debug(void);
 extern void perf_pmu_disable(struct pmu *pmu);
 extern void perf_pmu_enable(struct pmu *pmu);
+extern void perf_sched_cb_disable(struct pmu *pmu);
+extern void perf_sched_cb_enable(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
 extern int perf_event_refresh(struct perf_event *event, int refresh);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 661951a..5a24193 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -142,6 +142,7 @@ enum event_type_t {
 struct static_key_deferred perf_sched_events __read_mostly;
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
+static DEFINE_PER_CPU(int, perf_sched_cb_usages);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
@@ -151,6 +152,7 @@ static atomic_t nr_freq_events __read_mostly;
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
+static struct idr pmu_idr;
 
 /*
  * perf event paranoia level:
@@ -2358,6 +2360,57 @@ unlock:
}
 }
 
+void perf_sched_cb_disable(struct pmu *pmu)
+{
+   __get_cpu_var(perf_sched_cb_usages)--;
+}
+
+void perf_sched_cb_enable(struct pmu *pmu)
+{
+   __get_cpu_var(perf_sched_cb_usages)++;
+}
+
+/*
+ * This function provides the context switch callback to the lower code
+ * layer. It is invoked ONLY when the context switch callback is enabled.
+ */
+static void perf_pmu_sched_task(struct task_struct *prev,
+   struct task_struct *next,
+   bool sched_in)
+{
+   struct perf_cpu_context *cpuctx;
+   struct pmu *pmu;
+   unsigned long flags;
+
+   if (prev == next)
+   return;
+
+   local_irq_save(flags);
+
+   rcu_read_lock();
+
+   pmu = idr_find(&pmu_idr, PERF_TYPE_RAW);
+
+   if (pmu && pmu->sched_task) {
+   cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+   pmu = cpuctx->ctx.pmu;
+
+   perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+
+   perf_pmu_disable(pmu);
+
+   pmu->sched_task(cpuctx->task_ctx, sched_in);
+
+   perf_pmu_enable(pmu);
+
+  

[PATCH v4 00/16] perf, x86: Haswell LBR call stack support

2014-03-16 Thread Yan, Zheng
For many profiling tasks we need the callgraph. For example we often
need to see the caller of a lock or the caller of a memcpy or other
library function to actually tune the program. Frame pointer unwinding
is efficient and works well. But frame pointers are off by default on
64bit code (and on modern 32bit gccs), so there are many binaries around
that do not use frame pointers. Profiling unchanged production code is
very useful in practice. On some CPUs the frame pointer also has a high
cost. Dwarf2 unwinding also does not always work and is extremely slow
(up to 20% overhead).

Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
calls are collected as normal, but as return instructions are
executed the last captured branch record is popped from the on-chip LBR
registers. The LBR call stack facility provides an alternative way to
get the callgraph. It has some limitations too, but it should work in
most cases and is significantly faster than dwarf. Frame pointer
unwinding is still the best default, but the LBR call stack is a good
alternative when nothing else works.

When profiling bc(1) on Fedora 19:
 echo 'scale=2000; 4*a(1)' > cmd; perf record -g bc -l < cmd

If this feature is enabled, perf report output looks like:
50.36%   bc  bc [.] bc_divide
 |
 --- bc_divide
 execute
 run_code
 yyparse
 main
 __libc_start_main
 _start

33.66%   bc  bc [.] _one_mult
 |
 --- _one_mult
 bc_divide
 execute
 run_code
 yyparse
 main
 __libc_start_main
 _start

 7.62%   bc  bc [.] _bc_do_add
 |
 --- _bc_do_add
|
|--99.89%-- 0x2000186a8
 --0.11%-- [...]

 6.83%   bc  bc [.] _bc_do_sub
 |
 --- _bc_do_sub
|
|--99.94%-- bc_add
|  execute
|  run_code
|  yyparse
|  main
|  __libc_start_main
|  _start
 --0.06%-- [...]

 0.46%   bc  libc-2.17.so   [.] __memset_sse2
 |
 --- __memset_sse2
|
|--54.13%-- bc_new_num
|  |
|  |--51.00%-- bc_divide
|  |  execute
|  |  run_code
|  |  yyparse
|  |  main
|  |  __libc_start_main
|  |  _start
|  |
|  |--30.46%-- _bc_do_sub
|  |  bc_add
|  |  execute
|  |  run_code
|  |  yyparse
|  |  main
|  |  __libc_start_main
|  |  _start
|  |
|   --18.55%-- _bc_do_add
| bc_add
| execute
| run_code
| yyparse
| main
| __libc_start_main
| _start
|
 --45.87%-- bc_divide
   execute
   run_code
   yyparse
   main
   __libc_start_main
   _start

If this feature is disabled, perf report output looks like:
50.49%   bc  bc [.] bc_divide
 |
 --- bc_divide

33.57%   bc  bc [.] _one_mult
 |
 --- _one_mult

 7.61%   bc  bc [.] _bc_do_add
 |
 --- _bc_do_add
 0x2000186a8

 6.88%   bc  bc [.] _bc_do_sub
 |
 --- _bc_do_sub

 0.42%   bc  libc-2.17.so   [.] __memcpy_ssse3_back
 |
 --- __memcpy_ssse3_back

The LBR call stack h

RE: [PATCH net-next v3 1/2] r8152: add RTL8152_EARLY_AGG_TIMEOUT_SUPER

2014-03-16 Thread hayeswang
 From: Francois Romieu [mailto:rom...@fr.zoreil.com] 
> Sent: Saturday, March 15, 2014 7:43 AM
[...]
> > Besides, I don't wish to modify the setting by ethtool when re-loading
> > the driver or rebooting every time.
> 
> Why ?
> 
> The recipe is different but there isn't much setup difference 
> between a module param and an ethtool command that is run through udev.
> The latter is more versatile though.

Thanks for your suggestion. I think I have to understand how to use them first.
 
Best Regards,
Hayes



Re: Re: [PATCH V3] serial/uart/8250: Add tunable RX interrupt trigger I/F of FIFO buffers

2014-03-16 Thread Yoshihiro YUNOMAE

Hi Alan,

Thank you for your reply.

(2014/03/14 21:04), One Thousand Gnomes wrote:

@@ -2325,10 +2323,19 @@ serial8250_do_set_termios(struct uart_port *port, 
struct ktermios *termios,
if ((baud < 2400 && !up->dma) || fifo_bug) {
fcr &= ~UART_FCR_TRIGGER_MASK;
fcr |= UART_FCR_TRIGGER_1;
+   /* Don't use user setting RX trigger */
+   up->fcr = 0;


This breaks

set fcr via sysfs
set baud rate below 2400
set baud rate higher

If baud < 2400 and the user has set a value then probably we should honour


OK, I'll add !up->fcr in this flow as follows:

/* NOTE: If fifo_bug is not set, a user can set RX trigger. */
if ((baud < 2400 && !up->dma && !up->fcr) || fifo_bug) {
 fcr &= ~UART_FCR_TRIGGER_MASK;
 fcr |= UART_FCR_TRIGGER_1;
 up->fcr = 0;
}


it. If fifo_bug is set then we should never honour it (and should perhaps
eventually error it in the sysfs set).


When fifo_bug is set to "1", we need to check not only
whether (up->bugs & UART_BUG_PARITY) but also whether parity is enabled.
We can currently check whether parity is enabled only in this function,
so I think we need to store fifo_bug's value into up->fifo_bug and
refer to it in the sysfs set (do_set_rx_int_trig()) as follows:

@do_set_rx_int_trig()
if (!(up->capabilities & UART_CAP_FIFO) || uport->fifosize <= 1
|| (up->fifo_bug & UART_BUG_PARITY))
return -EINVAL;




+static unsigned char convert_fcr2val(struct uart_8250_port *up,
+unsigned char fcr)
+{
+   unsigned char val = 0, trig_raw = fcr & UART_FCR_TRIGGER_MASK;
+
+   switch (up->port.type) {
+   case PORT_16550A:
+   if (trig_raw == UART_FCR_R_TRIG_00)
+   val = 1;
+   else if (trig_raw == UART_FCR_R_TRIG_01)
+   val = 4;
+   else if (trig_raw == UART_FCR_R_TRIG_10)
+   val = 8;
+   else if (trig_raw == UART_FCR_R_TRIG_11)
+   val = 14;
+   break;


Surely the default case should be returning 1 not 0 ?


In the default case, it returns "0", meaning error, because "1" has
another meaning (1-byte RX trigger). But "0" is not an intuitive value
for an error, so it should return -EOPNOTSUPP here.




+static int convert_val2rxtrig(struct uart_8250_port *up, unsigned char val)
+{
+   switch (up->port.type) {
+   case PORT_16550A:
+   if (val == 1)
+   return UART_FCR_R_TRIG_00;
+   else if (val == 4)
+   return UART_FCR_R_TRIG_01;
+   else if (val == 8)
+   return UART_FCR_R_TRIG_10;
+   else if (val == 14)
+   return UART_FCR_R_TRIG_11;


What happens if you specify a meaningless value? Doing exact matching
means that you have to know the hardware exactly. How about

if (val < 4)
return UART_FCR_R_TRIG_00;
else if (val < 8)
return UART_FCR_R_TRIG_01;
else if (val < 14)
return UART_FCR_R_TRIG_10;
else
return UART_FCR_R_TRIG_11;

so you get the nearest lower value that the hardware can provide ?


It is a good idea. I was concerned about the same thing, that users
must know the HW exactly.

I'll implement it as you say.


+   break;
+   default:
+   pr_info("Not support RX-trigger setting for this serial %u\n",
+   up->port.type);


That lets users spew into the logs. I think actually you just want

default:
return -EOPNOTSUPP;


OK, I'll use this.

Thank you,

Yoshihiro YUNOMAE

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com




[PATCH v4 07/16] perf, x86: track number of events that use LBR callstack

2014-03-16 Thread Yan, Zheng
When enabling/disabling an event, check if the event uses the LBR
callstack feature and adjust the LBR callstack usage count accordingly.
A later patch will use the usage count to decide if the LBR stack should
be saved/restored.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 32423ff..34b121b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -198,9 +198,15 @@ void intel_pmu_lbr_sched_task(struct perf_event_context 
*ctx, bool sched_in)
}
 }
 
+static inline bool branch_user_callstack(unsigned br_sel)
+{
+   return (br_sel & X86_BR_USER) && (br_sel & X86_BR_CALL_STACK);
+}
+
 void intel_pmu_lbr_enable(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct x86_perf_task_context *task_ctx;
 
if (!x86_pmu.lbr_nr)
return;
@@ -214,6 +220,10 @@ void intel_pmu_lbr_enable(struct perf_event *event)
}
cpuc->br_sel = event->hw.branch_reg.reg;
 
+   task_ctx = event->ctx ? event->ctx->task_ctx_data : NULL;
+   if (task_ctx && branch_user_callstack(cpuc->br_sel))
+   task_ctx->lbr_callstack_users++;
+
cpuc->lbr_users++;
if (cpuc->lbr_users == 1)
perf_sched_cb_enable(event->ctx->pmu);
@@ -222,10 +232,15 @@ void intel_pmu_lbr_enable(struct perf_event *event)
 void intel_pmu_lbr_disable(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct x86_perf_task_context *task_ctx;
 
if (!x86_pmu.lbr_nr)
return;
 
+   task_ctx = event->ctx ? event->ctx->task_ctx_data : NULL;
+   if (task_ctx && branch_user_callstack(cpuc->br_sel))
+   task_ctx->lbr_callstack_users--;
+
cpuc->lbr_users--;
WARN_ON_ONCE(cpuc->lbr_users < 0);
 
-- 
1.8.5.3



[PATCH v4 06/16] perf, core: always switch pmu specific data during context switch

2014-03-16 Thread Yan, Zheng
If two tasks were both forked from the same parent task, the events in
their perf task contexts can be the same, and perf core may skip
switching the perf event contexts.

The previous patch introduces PMU specific data. The data is for saving
the LBR stack, and it is task specific. So we need to swap the data
even when the context switch is optimized out.

Signed-off-by: Yan, Zheng 
---
 kernel/events/core.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f262c57..20da73c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2336,6 +2336,7 @@ static void perf_event_context_sched_out(struct 
task_struct *task, int ctxn,
raw_spin_lock(&ctx->lock);
raw_spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING);
if (context_equiv(ctx, next_ctx)) {
+   void *ctx_data;
/*
 * XXX do we need a memory barrier of sorts
 * wrt to rcu_dereference() of perf_event_ctxp
@@ -2344,6 +2345,11 @@ static void perf_event_context_sched_out(struct 
task_struct *task, int ctxn,
next->perf_event_ctxp[ctxn] = ctx;
ctx->task = next;
next_ctx->task = task;
+
+   ctx_data = next_ctx->task_ctx_data;
+   next_ctx->task_ctx_data = ctx->task_ctx_data;
+   ctx->task_ctx_data = ctx_data;
+
do_switch = 0;
 
perf_event_sync_stat(ctx, next_ctx);
-- 
1.8.5.3



[PATCH v4 09/16] perf, x86: Save/restore LBR stack during context switch

2014-03-16 Thread Yan, Zheng
When the LBR call stack is enabled, it is necessary to save/restore
the LBR stack on context switch. The solution is saving/restoring
the LBR stack to/from the task's perf event context.

The LBR stack is saved/restored only when there are events that use
the LBR call stack. If no event uses the LBR call stack, the LBR stack
is reset when the task is scheduled in.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 81 --
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 34b121b..d8ab484 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -180,21 +180,82 @@ void intel_pmu_lbr_reset(void)
intel_pmu_lbr_reset_64();
 }
 
+/*
+ * TOS = most recently recorded branch
+ */
+static inline u64 intel_pmu_lbr_tos(void)
+{
+   u64 tos;
+   rdmsrl(x86_pmu.lbr_tos, tos);
+   return tos;
+}
+
+enum {
+   LBR_NONE,
+   LBR_VALID,
+};
+
+static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
+{
+   int i;
+   unsigned lbr_idx, mask = x86_pmu.lbr_nr - 1;
+   u64 tos = intel_pmu_lbr_tos();
+
+   for (i = 0; i < x86_pmu.lbr_nr; i++) {
+   lbr_idx = (tos - i) & mask;
+   wrmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
+   wrmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
+   }
+   task_ctx->lbr_stack_state = LBR_NONE;
+}
+
+static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
+{
+   int i;
+   unsigned lbr_idx, mask = x86_pmu.lbr_nr - 1;
+   u64 tos = intel_pmu_lbr_tos();
+
+   for (i = 0; i < x86_pmu.lbr_nr; i++) {
+   lbr_idx = (tos - i) & mask;
+   rdmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
+   rdmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
+   }
+   task_ctx->lbr_stack_state = LBR_VALID;
+}
+
+
 void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct x86_perf_task_context *task_ctx;
 
if (!x86_pmu.lbr_nr)
return;
 
/*
-* It is necessary to flush the stack on context switch. This happens
-* when the branch stack does not tag its entries with the pid of the
-* current task.
+* If LBR callstack feature is enabled and the stack was saved when
+* the task was scheduled out, restore the stack. Otherwise flush
+* the LBR stack.
 */
+   task_ctx = ctx ? ctx->task_ctx_data : NULL;
if (sched_in) {
-   intel_pmu_lbr_reset();
+   if (task_ctx &&
+   task_ctx->lbr_callstack_users > 0 &&
+   task_ctx->lbr_stack_state == LBR_VALID)
+   __intel_pmu_lbr_restore(task_ctx);
+   else
+   intel_pmu_lbr_reset();
+
cpuc->lbr_context = ctx;
+   return;
+   }
+
+   /* schedule out */
+   if (task_ctx) {
+   if (task_ctx->lbr_callstack_users)
+   __intel_pmu_lbr_save(task_ctx);
+   else
+   task_ctx->lbr_stack_state = LBR_NONE;
}
 }
 
@@ -270,18 +331,6 @@ void intel_pmu_lbr_disable_all(void)
__intel_pmu_lbr_disable();
 }
 
-/*
- * TOS = most recently recorded branch
- */
-static inline u64 intel_pmu_lbr_tos(void)
-{
-   u64 tos;
-
-   rdmsrl(x86_pmu.lbr_tos, tos);
-
-   return tos;
-}
-
 static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
 {
unsigned long mask = x86_pmu.lbr_nr - 1;
-- 
1.8.5.3



[PATCH v4 05/16] perf, core: pmu specific data for perf task context

2014-03-16 Thread Yan, Zheng
Introduce a new field to 'struct pmu' to specify the size of PMU
specific data. If the size is not zero, also allocate memory for
the PMU specific data when allocating the perf task context. The PMU
specific data is initialized to zero. Later patches will use the
PMU specific data to save the LBR stack.

Signed-off-by: Yan, Zheng 
Reviewed-by: Stephane Eranian 
---
 include/linux/perf_event.h |  5 +
 kernel/events/core.c   | 19 ++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 80ddc0c..3da433d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -252,6 +252,10 @@ struct pmu {
 */
void (*sched_task)  (struct perf_event_context *ctx,
 bool sched_in);
+   /*
+* PMU specific data size
+*/
+   size_t  task_ctx_size;
 };
 
 /**
@@ -493,6 +497,7 @@ struct perf_event_context {
u64 generation;
int pin_count;
int nr_cgroups;  /* cgroup evts */
+   void*task_ctx_data; /* pmu specific data */
struct rcu_head rcu_head;
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d07700a..f262c57 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -903,6 +903,15 @@ static void get_ctx(struct perf_event_context *ctx)
WARN_ON(!atomic_inc_not_zero(&ctx->refcount));
 }
 
+static void free_ctx(struct rcu_head *head)
+{
+   struct perf_event_context *ctx;
+
+   ctx = container_of(head, struct perf_event_context, rcu_head);
+   kfree(ctx->task_ctx_data);
+   kfree(ctx);
+}
+
 static void put_ctx(struct perf_event_context *ctx)
 {
if (atomic_dec_and_test(&ctx->refcount)) {
@@ -910,7 +919,7 @@ static void put_ctx(struct perf_event_context *ctx)
put_ctx(ctx->parent_ctx);
if (ctx->task)
put_task_struct(ctx->task);
-   kfree_rcu(ctx, rcu_head);
+   call_rcu(&ctx->rcu_head, free_ctx);
}
 }
 
@@ -3048,6 +3057,14 @@ alloc_perf_context(struct pmu *pmu, struct task_struct 
*task)
if (!ctx)
return NULL;
 
+   if (task && pmu->task_ctx_size > 0) {
+   ctx->task_ctx_data = kzalloc(pmu->task_ctx_size, GFP_KERNEL);
+   if (!ctx->task_ctx_data) {
+   kfree(ctx);
+   return NULL;
+   }
+   }
+
__perf_event_init_context(ctx);
if (task) {
ctx->task = task;
-- 
1.8.5.3



[PATCH v4 08/16] perf, x86: allocate space for storing LBR stack

2014-03-16 Thread Yan, Zheng
When the LBR call stack is enabled, it is necessary to save/restore
the LBR stack on context switch. We can use the PMU specific data to
store the LBR stack when the task is scheduled out. This patch adds the
code that allocates the PMU specific data.

Signed-off-by: Yan, Zheng 
Reviewed-by: Stephane Eranian 
---
 arch/x86/kernel/cpu/perf_event.c | 1 +
 arch/x86/kernel/cpu/perf_event.h | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index b643823..978c055 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1905,6 +1905,7 @@ static struct pmu pmu = {
 
.event_idx  = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
+   .task_ctx_size  = sizeof(struct x86_perf_task_context),
 };
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index cc25819..b0eabca 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -472,6 +472,13 @@ struct x86_pmu {
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
 };
 
+struct x86_perf_task_context {
+   u64 lbr_from[MAX_LBR_ENTRIES];
+   u64 lbr_to[MAX_LBR_ENTRIES];
+   int lbr_callstack_users;
+   int lbr_stack_state;
+};
+
 enum {
PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = PERF_SAMPLE_BRANCH_MAX_SHIFT,
PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE,
-- 
1.8.5.3



[PATCH v4 04/16] perf, x86: Basic Haswell LBR call stack support

2014-03-16 Thread Yan, Zheng
Haswell has a new feature that utilizes the existing LBR facility to
record call chains. To enable this feature, bits (JCC, NEAR_IND_JMP,
NEAR_REL_JMP, FAR_BRANCH, EN_CALLSTACK) in LBR_SELECT must be set to 1,
and bits (NEAR_REL_CALL, NEAR_IND_CALL, NEAR_RET) must be cleared. Due
to a hardware bug in Haswell, this feature doesn't work well with
FREEZE_LBRS_ON_PMI.

When the call stack feature is enabled, the LBR stack will capture
unfiltered call data normally, but as return instructions are executed,
the last captured branch record is flushed from the on-chip registers
in a last-in first-out (LIFO) manner. Thus, branch information relative
to leaf functions will not be captured, while preserving the call stack
information of the main line execution path.

This patch defines a separate lbr_sel map for Haswell. The map contains
a new entry for the call stack feature.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.h   | 13 -
 arch/x86/kernel/cpu/perf_event_intel.c |  2 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 89 ++
 3 files changed, 80 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 8821f9e..cc25819 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -473,7 +473,10 @@ struct x86_pmu {
 };
 
 enum {
-   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+   PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE,
+
+   PERF_SAMPLE_BRANCH_CALL_STACK = 1U << 
PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT,
 };
 
 #define x86_add_quirk(func_)   \
@@ -507,6 +510,12 @@ static struct perf_pmu_events_attr event_attr_##v = {  
\
 
 extern struct x86_pmu x86_pmu __read_mostly;
 
+static inline bool x86_pmu_has_lbr_callstack(void)
+{
+   return  x86_pmu.lbr_sel_map &&
+   x86_pmu.lbr_sel_map[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] > 0;
+}
+
 DECLARE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
 
 int x86_perf_event_set_period(struct perf_event *event);
@@ -710,6 +719,8 @@ void intel_pmu_lbr_init_atom(void);
 
 void intel_pmu_lbr_init_snb(void);
 
+void intel_pmu_lbr_init_hsw(void);
+
 int intel_pmu_setup_lbr_filter(struct perf_event *event);
 
 int p4_pmu_init(void);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 46848a0..be0e59e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2489,7 +2489,7 @@ __init int intel_pmu_init(void)
memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, 
sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, 
sizeof(hw_cache_extra_regs));
 
-   intel_pmu_lbr_init_snb();
+   intel_pmu_lbr_init_hsw();
 
x86_pmu.event_constraints = intel_hsw_event_constraints;
x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index b0a4f03..32423ff 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -39,6 +39,7 @@ static enum {
 #define LBR_IND_JMP_BIT6 /* do not capture indirect jumps */
 #define LBR_REL_JMP_BIT7 /* do not capture relative jumps */
 #define LBR_FAR_BIT8 /* do not capture far branches */
+#define LBR_CALL_STACK_BIT 9 /* enable call stack */
 
 #define LBR_KERNEL (1 << LBR_KERNEL_BIT)
 #define LBR_USER   (1 << LBR_USER_BIT)
@@ -49,6 +50,7 @@ static enum {
 #define LBR_REL_JMP(1 << LBR_REL_JMP_BIT)
 #define LBR_IND_JMP(1 << LBR_IND_JMP_BIT)
 #define LBR_FAR(1 << LBR_FAR_BIT)
+#define LBR_CALL_STACK (1 << LBR_CALL_STACK_BIT)
 
 #define LBR_PLM (LBR_KERNEL | LBR_USER)
 
@@ -74,24 +76,25 @@ static enum {
  * x86control flow changes include branches, interrupts, traps, faults
  */
 enum {
-   X86_BR_NONE = 0,  /* unknown */
-
-   X86_BR_USER = 1 << 0, /* branch target is user */
-   X86_BR_KERNEL   = 1 << 1, /* branch target is kernel */
-
-   X86_BR_CALL = 1 << 2, /* call */
-   X86_BR_RET  = 1 << 3, /* return */
-   X86_BR_SYSCALL  = 1 << 4, /* syscall */
-   X86_BR_SYSRET   = 1 << 5, /* syscall return */
-   X86_BR_INT  = 1 << 6, /* sw interrupt */
-   X86_BR_IRET = 1 << 7, /* return from interrupt */
-   X86_BR_JCC  = 1 << 8, /* conditional */
-   X86_BR_JMP  = 1 << 9, /* jump */
-   X86_BR_IRQ  = 1 << 10,/* hw interrupt or trap or fault */
-   X86_BR_IND_CALL = 1 << 11,/* indirect calls */
-   X86_BR_ABORT= 1 << 12,/* transaction abort */
-   X86_BR_IN_TX= 1 << 13,/* in transaction */
X86_BR_NO_TX= 1 << 14,/* not in transaction */
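
For reference, the new map entry for the call-stack branch type that the
description implies would look roughly like this, a sketch assuming the
LBR_* capture bits defined earlier in this patch rather than the
verbatim diff:

static const int hsw_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
	/* ... entries as in snb_lbr_sel_map ... */
	[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT]	= LBR_REL_CALL | LBR_IND_CALL
						| LBR_RETURN | LBR_CALL_STACK,
};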

[PATCH v4 14/16] perf, x86: enable LBR callstack when recording callchain

2014-03-16 Thread Yan, Zheng
If a task specific event wants a user space callchain but does not want
branch stack sampling, enable the LBR call stack facility implicitly.
The LBR call stack facility can help perf get the user space callchain
in case there is no frame pointer.

Note: this feature only affects how the user callchain is obtained. The
kernel callchain is always obtained via frame pointers.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index ec972ea..92a61a6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -429,6 +429,18 @@ int x86_pmu_hw_config(struct perf_event *event)
if (!event->attr.exclude_kernel)
*br_type |= PERF_SAMPLE_BRANCH_KERNEL;
}
+   } else if ((event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
+  !has_branch_stack(event) &&
+  !event->attr.exclude_user &&
+  (event->attach_state & PERF_ATTACH_TASK)) {
+   /*
+* user did not specify branch_sample_type,
+* try using the LBR call stack facility to
+* record call chains of user program.
+*/
+   event->attr.branch_sample_type =
+   PERF_SAMPLE_BRANCH_USER |
+   PERF_SAMPLE_BRANCH_CALL_STACK;
}
 
/*
-- 
1.8.5.3



[PATCH v4 12/16] perf, x86: use LBR call stack to get user callchain

2014-03-16 Thread Yan, Zheng
Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
calls are collected as normal, but as return instructions are executed
the last captured branch record is popped from the on-chip LBR registers.
The LBR call stack facility can help perf get call chains of a program
without a frame pointer.

This patch makes x86's perf_callchain_user() fall back to using the LBR
call stack data when there is no frame pointer in the user program. The
'from' address of a branch entry is used as the 'return' address of the
function call.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c   | 33 ++
 arch/x86/kernel/cpu/perf_event_intel.c |  2 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |  2 ++
 include/linux/perf_event.h |  1 +
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index c7e9665..d520576 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1997,12 +1997,28 @@ static unsigned long get_segment_base(unsigned int 
segment)
return get_desc_base(desc + idx);
 }
 
+static inline void
+perf_callchain_lbr_callstack(struct perf_callchain_entry *entry,
+struct perf_sample_data *data)
+{
+   struct perf_branch_stack *br_stack = data->br_stack;
+
+   if (br_stack && br_stack->user_callstack) {
+   int i = 0;
+   while (i < br_stack->nr && entry->nr < PERF_MAX_STACK_DEPTH) {
+   perf_callchain_store(entry, br_stack->entries[i].from);
+   i++;
+   }
+   }
+}
+
 #ifdef CONFIG_COMPAT
 
 #include 
 
 static inline int
-perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+perf_callchain_user32(struct perf_callchain_entry *entry,
+ struct pt_regs *regs, struct perf_sample_data *data)
 {
/* 32-bit process in 64-bit kernel. */
unsigned long ss_base, cs_base;
@@ -2031,11 +2047,16 @@ perf_callchain_user32(struct pt_regs *regs, struct 
perf_callchain_entry *entry)
perf_callchain_store(entry, cs_base + frame.return_address);
fp = compat_ptr(ss_base + frame.next_frame);
}
+
+   if (fp == compat_ptr(regs->bp))
+   perf_callchain_lbr_callstack(entry, data);
+
return 1;
 }
 #else
 static inline int
-perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+perf_callchain_user32(struct perf_callchain_entry *entry,
+ struct pt_regs *regs, struct perf_sample_data *data)
 {
 return 0;
 }
@@ -2065,12 +2086,12 @@ void perf_callchain_user(struct perf_callchain_entry 
*entry,
if (!current->mm)
return;
 
-   if (perf_callchain_user32(regs, entry))
+   if (perf_callchain_user32(entry, regs, data))
return;
 
while (entry->nr < PERF_MAX_STACK_DEPTH) {
unsigned long bytes;
-   frame.next_frame = NULL;
+   frame.next_frame = NULL;
frame.return_address = 0;
 
bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
@@ -2083,6 +2104,10 @@ void perf_callchain_user(struct perf_callchain_entry 
*entry,
perf_callchain_store(entry, frame.return_address);
fp = frame.next_frame;
}
+
+   /* try LBR callstack if there is no frame pointer */
+   if (fp == (void __user *)regs->bp)
+   perf_callchain_lbr_callstack(entry, data);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 4cf0d6d..dd106c1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1396,7 +1396,7 @@ again:
 
perf_sample_data_init(&data, 0, event->hw.last_period);
 
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
 
if (perf_event_overflow(event, &data, regs))
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d8ab484..bf8bdf9 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -718,6 +718,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
int i, j, type;
bool compress = false;
 
+   cpuc->lbr_stack.user_callstack = branch_user_callstack(br_sel);
+
/* if sampling all branches, then nothing to filter */
if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
return;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b87974a..517c34a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -74,6 +74,7 @@ struct perf_raw_record {
  * 

RE: [PATCH net-next v3 1/2] r8152: add RTL8152_EARLY_AGG_TIMEOUT_SUPER

2014-03-16 Thread hayeswang
 From: David Miller [mailto:da...@davemloft.net] 
> Sent: Saturday, March 15, 2014 2:43 AM
[...]
> > Besides, I don't wish to modify the setting by ethtool when re-loading
> > the driver or rebooting every time.
> 
> You have code to reset the driver, you can do it when the user asks
> for the setting to be changed via ethtool.  I do not see this as
> a problem.
> 
> The ethtool change can occur while the driver is already up, you'll
> just need to reset the chip and make the new configuration, this is
> not a problem.

Thanks for your answer. I will study how to set it via ethtool.
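
For reference, a rough sketch of the shape this could take, assuming
the timeout is mapped onto the standard coalescing knobs (the
tp->coalesce field here is hypothetical, for illustration only):

static int rtl8152_get_coalesce(struct net_device *netdev,
				struct ethtool_coalesce *coalesce)
{
	struct r8152 *tp = netdev_priv(netdev);

	coalesce->rx_coalesce_usecs = tp->coalesce;	/* hypothetical */
	return 0;
}

The matching .set_coalesce handler would validate the new value, reset
the chip as suggested above, and re-apply the configuration.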

Best Regards,
Hayes



[PATCH v4 10/16] perf, core: simplify need branch stack check

2014-03-16 Thread Yan, Zheng
event->attr.branch_sample_type is non-zero whether branch stack
sampling is enabled explicitly or implicitly. We can use it to
replace intel_pmu_needs_lbr_smpl(). This avoids duplicating the code
that implicitly enables the LBR.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 20 +++-
 include/linux/perf_event.h |  5 +
 kernel/events/core.c   |  3 +++
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index be0e59e..4cf0d6d 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1030,20 +1030,6 @@ static __initconst const u64 slm_hw_cache_event_ids
  },
 };
 
-static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
-{
-   /* user explicitly requested branch sampling */
-   if (has_branch_stack(event))
-   return true;
-
-   /* implicit branch sampling to correct PEBS skid */
-   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
-   x86_pmu.intel_cap.pebs_format < 2)
-   return true;
-
-   return false;
-}
-
 static void intel_pmu_disable_all(void)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -1208,7 +1194,7 @@ static void intel_pmu_disable_event(struct perf_event 
*event)
 * must disable before any actual event
 * because any event may be combined with LBR
 */
-   if (intel_pmu_needs_lbr_smpl(event))
+   if (needs_branch_stack(event))
intel_pmu_lbr_disable(event);
 
if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
@@ -1269,7 +1255,7 @@ static void intel_pmu_enable_event(struct perf_event 
*event)
 * must enabled before any actual event
 * because any event may be combined with LBR
 */
-   if (intel_pmu_needs_lbr_smpl(event))
+   if (needs_branch_stack(event))
intel_pmu_lbr_enable(event);
 
if (event->attr.exclude_host)
@@ -1739,7 +1725,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip && x86_pmu.pebs_aliases)
x86_pmu.pebs_aliases(event);
 
-   if (intel_pmu_needs_lbr_smpl(event)) {
+   if (needs_branch_stack(event)) {
ret = intel_pmu_setup_lbr_filter(event);
if (ret)
return ret;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3da433d..6afc675 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -763,6 +763,11 @@ static inline bool has_branch_stack(struct perf_event 
*event)
return event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK;
 }
 
+static inline bool needs_branch_stack(struct perf_event *event)
+{
+   return event->attr.branch_sample_type != 0;
+}
+
 extern int perf_output_begin(struct perf_output_handle *handle,
 struct perf_event *event, unsigned int size);
 extern void perf_output_end(struct perf_output_handle *handle);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 20da73c..89940f6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6769,6 +6769,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
goto err_ns;
 
+   if (!has_branch_stack(event))
+   event->attr.branch_sample_type = 0;
+
pmu = perf_init_event(event);
if (!pmu)
goto err_ns;
-- 
1.8.5.3



[PATCH v4 13/16] perf, x86: re-organize code that implicitly enables LBR/PEBS

2014-03-16 Thread Yan, Zheng
Make a later patch more readable; no logic change.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c | 59 
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index d520576..ec972ea 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -399,36 +399,35 @@ int x86_pmu_hw_config(struct perf_event *event)
 
if (event->attr.precise_ip > precise)
return -EOPNOTSUPP;
-   /*
-* check that PEBS LBR correction does not conflict with
-* whatever the user is asking with attr->branch_sample_type
-*/
-   if (event->attr.precise_ip > 1 &&
-   x86_pmu.intel_cap.pebs_format < 2) {
-   u64 *br_type = &event->attr.branch_sample_type;
-
-   if (has_branch_stack(event)) {
-   if (!precise_br_compat(event))
-   return -EOPNOTSUPP;
-
-   /* branch_sample_type is compatible */
-
-   } else {
-   /*
-* user did not specify  branch_sample_type
-*
-* For PEBS fixups, we capture all
-* the branches at the priv level of the
-* event.
-*/
-   *br_type = PERF_SAMPLE_BRANCH_ANY;
-
-   if (!event->attr.exclude_user)
-   *br_type |= PERF_SAMPLE_BRANCH_USER;
-
-   if (!event->attr.exclude_kernel)
-   *br_type |= PERF_SAMPLE_BRANCH_KERNEL;
-   }
+   }
+   /*
+* check that PEBS LBR correction does not conflict with
+* whatever the user is asking with attr->branch_sample_type
+*/
+   if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format < 2) {
+   u64 *br_type = &event->attr.branch_sample_type;
+
+   if (has_branch_stack(event)) {
+   if (!precise_br_compat(event))
+   return -EOPNOTSUPP;
+
+   /* branch_sample_type is compatible */
+
+   } else {
+   /*
+* user did not specify  branch_sample_type
+*
+* For PEBS fixups, we capture all
+* the branches at the priv level of the
+* event.
+*/
+   *br_type = PERF_SAMPLE_BRANCH_ANY;
+
+   if (!event->attr.exclude_user)
+   *br_type |= PERF_SAMPLE_BRANCH_USER;
+
+   if (!event->attr.exclude_kernel)
+   *br_type |= PERF_SAMPLE_BRANCH_KERNEL;
}
}
 
-- 
1.8.5.3



[PATCH v4 11/16] perf, core: Pass perf_sample_data to perf_callchain()

2014-03-16 Thread Yan, Zheng
Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
calls are collected as normal, but as return instructions are executed
the last captured branch record is popped from the on-chip LBR registers.
The LBR call stack facility can help perf get call chains of a program
without a frame pointer.

This patch modifies various architectures' perf_callchain() to accept
perf sample data. A later patch will add code that uses the sample data
to get call chains.

Signed-off-by: Yan, Zheng 
---
 arch/arm/kernel/perf_event.c | 4 ++--
 arch/powerpc/perf/callchain.c| 4 ++--
 arch/sparc/kernel/perf_event.c   | 4 ++--
 arch/x86/kernel/cpu/perf_event.c | 4 ++--
 include/linux/perf_event.h   | 3 ++-
 kernel/events/callchain.c| 8 +---
 kernel/events/core.c | 2 +-
 kernel/events/internal.h | 3 ++-
 8 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 789d846..276b13b 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -562,8 +562,8 @@ user_backtrace(struct frame_tail __user *tail,
return buftail.fp - 1;
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
struct frame_tail __user *tail;
 
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 74d1e78..b379ebc 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -482,8 +482,8 @@ static void perf_callchain_user_32(struct 
perf_callchain_entry *entry,
}
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
if (current_is_64bit())
perf_callchain_user_64(entry, regs);
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index b5c38fa..cba0306 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1785,8 +1785,8 @@ static void perf_callchain_user_32(struct 
perf_callchain_entry *entry,
} while (entry->nr < PERF_MAX_STACK_DEPTH);
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
perf_callchain_store(entry, regs->tpc);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 978c055..c7e9665 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2041,8 +2041,8 @@ perf_callchain_user32(struct pt_regs *regs, struct 
perf_callchain_entry *entry)
 }
 #endif
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
struct stack_frame frame;
const void __user *fp;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6afc675..b87974a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -706,7 +706,8 @@ extern void perf_event_fork(struct task_struct *tsk);
 /* Callchains */
 DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
 
-extern void perf_callchain_user(struct perf_callchain_entry *entry, struct 
pt_regs *regs);
+extern void perf_callchain_user(struct perf_callchain_entry *entry, struct 
pt_regs *regs,
+   struct perf_sample_data *data);
 extern void perf_callchain_kernel(struct perf_callchain_entry *entry, struct 
pt_regs *regs);
 
 static inline void perf_callchain_store(struct perf_callchain_entry *entry, 
u64 ip)
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 97b67df..19d497c 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -30,7 +30,8 @@ __weak void perf_callchain_kernel(struct perf_callchain_entry 
*entry,
 }
 
 __weak void perf_callchain_user(struct perf_callchain_entry *entry,
-   struct pt_regs *regs)
+   struct pt_regs *regs,
+   struct perf_sample_data *data)
 {
 }
 
@@ -157,7 +158,8 @@ put_callchain_entry(int rctx)
 }
 
 struct perf_callchain_entry *
-perf_callchain(struct perf_event *event, struct pt_regs *regs)
+perf_callchain(struct perf_event *event, struct pt_regs *regs,
+  struct perf_sample_data *data)
 {
int rctx;
struct perf_callchain_entry *entry;
@@ -198,7 +200,7 @@ perf_callchain(struct perf_event *event, struct pt_regs 
*regs)
   

[PATCH v4 01/16] perf, x86: Reduce lbr_sel_map size

2014-03-16 Thread Yan, Zheng
The index of lbr_sel_map is the bit value of perf branch_sample_type.
PERF_SAMPLE_BRANCH_MAX is 1024 at present, so each lbr_sel_map uses
4096 bytes (1024 entries * 4 bytes). By using the bit shift as the
index, only one entry per branch type is needed, reducing the
lbr_sel_map size to 40 bytes (10 entries * 4 bytes). This patch defines
a 'bit shift' for each branch type, and uses the 'bit shift' to define
the lbr_sel_maps.

Signed-off-by: Yan, Zheng 
Reviewed-by: Stephane Eranian 
---
 arch/x86/kernel/cpu/perf_event.h   |  4 +++
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 50 ++
 include/uapi/linux/perf_event.h| 42 +
 3 files changed, 56 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3b2f9bd..b58c0ba 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -471,6 +471,10 @@ struct x86_pmu {
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
 };
 
+enum {
+   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+};
+
 #define x86_add_quirk(func_)   \
 do {   \
static struct x86_pmu_quirk __quirk __initdata = {  \
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..1ae2ec5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -69,10 +69,6 @@ static enum {
 #define LBR_FROM_FLAG_IN_TX(1ULL << 62)
 #define LBR_FROM_FLAG_ABORT(1ULL << 61)
 
-#define for_each_branch_sample_type(x) \
-   for ((x) = PERF_SAMPLE_BRANCH_USER; \
-(x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
-
 /*
  * x86control flow change classification
  * x86control flow changes include branches, interrupts, traps, faults
@@ -400,14 +396,14 @@ static int intel_pmu_setup_hw_lbr_filter(struct 
perf_event *event)
 {
struct hw_perf_event_extra *reg;
u64 br_type = event->attr.branch_sample_type;
-   u64 mask = 0, m;
-   u64 v;
+   u64 mask = 0, v;
+   int i;
 
-   for_each_branch_sample_type(m) {
-   if (!(br_type & m))
+   for (i = 0; i < PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE; i++) {
+   if (!(br_type & (1ULL << i)))
continue;
 
-   v = x86_pmu.lbr_sel_map[m];
+   v = x86_pmu.lbr_sel_map[i];
if (v == LBR_NOT_SUPP)
return -EOPNOTSUPP;
 
@@ -662,33 +658,33 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 /*
  * Map interface branch filters onto LBR filters
  */
-static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
-   [PERF_SAMPLE_BRANCH_ANY]= LBR_ANY,
-   [PERF_SAMPLE_BRANCH_USER]   = LBR_USER,
-   [PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
-   [PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
-   [PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_REL_JMP
-   | LBR_IND_JMP | LBR_FAR,
+static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
+   [PERF_SAMPLE_BRANCH_ANY_SHIFT]  = LBR_ANY,
+   [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
+   [PERF_SAMPLE_BRANCH_KERNEL_SHIFT]   = LBR_KERNEL,
+   [PERF_SAMPLE_BRANCH_HV_SHIFT]   = LBR_IGN,
+   [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]   = LBR_RETURN | LBR_REL_JMP
+   | LBR_IND_JMP | LBR_FAR,
/*
 * NHM/WSM erratum: must include REL_JMP+IND_JMP to get CALL branches
 */
-   [PERF_SAMPLE_BRANCH_ANY_CALL] =
+   [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] =
 LBR_REL_CALL | LBR_IND_CALL | LBR_REL_JMP | LBR_IND_JMP | LBR_FAR,
/*
 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 */
-   [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+   [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL | LBR_IND_JMP,
 };
 
-static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
-   [PERF_SAMPLE_BRANCH_ANY]= LBR_ANY,
-   [PERF_SAMPLE_BRANCH_USER]   = LBR_USER,
-   [PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
-   [PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
-   [PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_FAR,
-   [PERF_SAMPLE_BRANCH_ANY_CALL]   = LBR_REL_CALL | LBR_IND_CALL
-   | LBR_FAR,
-   [PERF_SAMPLE_BRANCH_IND_CALL]   = LBR_IND_CALL,
+static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
+   [PERF_SAMPLE_BRANCH_ANY_SHIFT]  = LBR_ANY,
+   [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
+   [PERF_SAMPLE_BRANCH_KERNEL_SHIFT]   = LBR_KERNEL,
+   [PERF_SAMPLE_BRANCH_HV_SHIFT]   = LBR_IGN,
+   [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]   = LBR_RETURN | LBR_FAR,
+   [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR,

[PATCH v4 15/16] perf, x86: disable FREEZE_LBRS_ON_PMI when LBR operates in callstack mode

2014-03-16 Thread Yan, Zheng
Due to a hardware bug in Haswell, the LBR callstack does not work well
with FREEZE_LBRS_ON_PMI. If FREEZE_LBRS_ON_PMI is set, PMIs near
call/return instructions may cause a superfluous increase/decrease of
LBR_TOS.

This patch modifies __intel_pmu_lbr_enable() to not enable
FREEZE_LBRS_ON_PMI when the LBR operates in callstack mode. We currently
don't use the LBR callstack to capture kernel space callchains, so
disabling FREEZE_LBRS_ON_PMI should not be a problem.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index bf8bdf9..5a6ea1c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -138,7 +138,14 @@ static void __intel_pmu_lbr_enable(void)
wrmsrl(MSR_LBR_SELECT, cpuc->lbr_sel->config);
 
rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
-   debugctl |= (DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
+   debugctl |= DEBUGCTLMSR_LBR;
+   /*
+* LBR callstack does not work well with FREEZE_LBRS_ON_PMI.
+* If FREEZE_LBRS_ON_PMI is set, PMI near call/return instructions
+* may cause superfluous increase/decrease of LBR_TOS.
+*/
+   if (!cpuc->lbr_sel || !(cpuc->lbr_sel->config & LBR_CALL_STACK))
+   debugctl |= DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
 }
 
-- 
1.8.5.3



[PATCH v4 16/16] perf, x86: Discard zero length call entries in LBR call stack

2014-03-16 Thread Yan, Zheng
"Zero length call" uses the attribute of the call instruction to push
the immediate instruction pointer on to the stack and then pops off
that address into a register. This is accomplished without any matching
return instruction. It confuses the hardware and make the recorded call
stack incorrect.

We can partially resolve this issue by: decode call instructions and
discard any zero length call entry in the LBR stack.
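
For context, the sequence being filtered is the classic get-PC idiom.
A minimal sketch of it as GCC inline assembly (illustrative only):

static unsigned long current_ip(void)
{
	unsigned long ip;

	/*
	 * "call 1f" targets the very next instruction, so its relative
	 * displacement is zero; "pop" then moves the pushed address into
	 * a register, and no matching ret is ever executed.
	 */
	asm volatile("call 1f\n\t"
		     "1: pop %0" : "=r" (ip));
	return ip;
}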

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 5a6ea1c..10bcda6 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -94,7 +94,8 @@ enum {
X86_BR_ABORT= 1 << 12,/* transaction abort */
X86_BR_IN_TX= 1 << 13,/* in transaction */
X86_BR_NO_TX= 1 << 14,/* not in transaction */
-   X86_BR_CALL_STACK   = 1 << 15,/* call stack */
+   X86_BR_ZERO_CALL= 1 << 15,/* zero length call */
+   X86_BR_CALL_STACK   = 1 << 16,/* call stack */
 };
 
 #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@@ -111,13 +112,15 @@ enum {
 X86_BR_JMP  |\
 X86_BR_IRQ  |\
 X86_BR_ABORT|\
-X86_BR_IND_CALL)
+X86_BR_IND_CALL |\
+X86_BR_ZERO_CALL)
 
 #define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
 
 #define X86_BR_ANY_CALL \
(X86_BR_CALL|\
 X86_BR_IND_CALL|\
+X86_BR_ZERO_CALL   |\
 X86_BR_SYSCALL |\
 X86_BR_IRQ |\
 X86_BR_INT)
@@ -659,6 +662,12 @@ static int branch_type(unsigned long from, unsigned long 
to, int abort)
ret = X86_BR_INT;
break;
case 0xe8: /* call near rel */
+   insn_get_immediate(&insn);
+   if (insn.immediate1.value == 0) {
+   /* zero length call */
+   ret = X86_BR_ZERO_CALL;
+   break;
+   }
case 0x9a: /* call far absolute */
ret = X86_BR_CALL;
break;
-- 
1.8.5.3



[PATCH v4 03/16] perf, x86: use context switch callback to flush LBR stack

2014-03-16 Thread Yan, Zheng
A previous commit introduced the context switch callback, whose
function overlaps with the flush branch stack callback. So we can use
the context switch callback to flush the LBR stack.

This patch adds code that uses the context switch callback to flush
the LBR stack when a task is being scheduled in. The callback is enabled
only when there are events that use the LBR hardware. This patch also
removes all the old flush branch stack code.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c   |  7 ---
 arch/x86/kernel/cpu/perf_event.h   |  3 +-
 arch/x86/kernel/cpu/perf_event_intel.c | 14 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 32 +++--
 include/linux/perf_event.h |  6 ---
 kernel/events/core.c   | 77 --
 6 files changed, 30 insertions(+), 109 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 03e977f..b643823 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1878,12 +1878,6 @@ static void x86_pmu_sched_task(struct perf_event_context 
*ctx, bool sched_in)
x86_pmu.sched_task(ctx, sched_in);
 }
 
-static void x86_pmu_flush_branch_stack(void)
-{
-   if (x86_pmu.flush_branch_stack)
-   x86_pmu.flush_branch_stack();
-}
-
 void perf_check_microcode(void)
 {
if (x86_pmu.check_microcode)
@@ -1910,7 +1904,6 @@ static struct pmu pmu = {
.commit_txn = x86_pmu_commit_txn,
 
.event_idx  = x86_pmu_event_idx,
-   .flush_branch_stack = x86_pmu_flush_branch_stack,
.sched_task = x86_pmu_sched_task,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 1e2118a..8821f9e 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -428,7 +428,6 @@ struct x86_pmu {
void(*cpu_dead)(int cpu);
 
void(*check_microcode)(void);
-   void(*flush_branch_stack)(void);
void(*sched_task)(struct perf_event_context *ctx,
  bool sched_in);
 
@@ -689,6 +688,8 @@ void intel_pmu_pebs_disable_all(void);
 
 void intel_ds_init(void);
 
+void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
+
 void intel_pmu_lbr_reset(void);
 
 void intel_pmu_lbr_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index aa333d9..46848a0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2036,18 +2036,6 @@ static void intel_pmu_cpu_dying(int cpu)
fini_debug_store_on_cpu(cpu);
 }
 
-static void intel_pmu_flush_branch_stack(void)
-{
-   /*
-* Intel LBR does not tag entries with the
-* PID of the current task, then we need to
-* flush it on ctxsw
-* For now, we simply reset it
-*/
-   if (x86_pmu.lbr_nr)
-   intel_pmu_lbr_reset();
-}
-
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
 
 PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -2099,7 +2087,7 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_starting   = intel_pmu_cpu_starting,
.cpu_dying  = intel_pmu_cpu_dying,
.guest_get_msrs = intel_guest_get_msrs,
-   .flush_branch_stack = intel_pmu_flush_branch_stack,
+   .sched_task = intel_pmu_lbr_sched_task,
 };
 
 static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 1ae2ec5..b0a4f03 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -177,7 +177,7 @@ void intel_pmu_lbr_reset(void)
intel_pmu_lbr_reset_64();
 }
 
-void intel_pmu_lbr_enable(struct perf_event *event)
+void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 
@@ -185,6 +185,23 @@ void intel_pmu_lbr_enable(struct perf_event *event)
return;
 
/*
+* It is necessary to flush the stack on context switch. This happens
+* when the branch stack does not tag its entries with the pid of the
+* current task.
+*/
+   if (sched_in) {
+   intel_pmu_lbr_reset();
+   cpuc->lbr_context = ctx;
+   }
+}
+
+void intel_pmu_lbr_enable(struct perf_event *event)
+{
+   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+   if (!x86_pmu.lbr_nr)
+   return;
+   /*
 * Reset the LBR stack if we changed task context to
 * avoid data leaks.
 */
@@ -195,6 +212,8 @@ void intel_pmu_lbr_enable(struct perf_event *event)
cpuc->br_sel = event->hw.branch_reg.reg;
 
cpuc->l

Re: [PATCHv2 3/8] devfreq: exynos4: Add ppmu's clock control and code clean about regulator control

2014-03-16 Thread Chanwoo Choi
Hi Tomasz,

On 03/17/2014 11:51 AM, Chanwoo Choi wrote:
> Hi Tomasz,
> 
> On 03/15/2014 02:42 AM, Tomasz Figa wrote:
>> Hi Chanwoo,
>>
>> On 13.03.2014 09:17, Chanwoo Choi wrote:
>>> There are not the clock controller of ppmudmc0/1. This patch control the 
>>> clock
>>> of ppmudmc0/1 which is used for monitoring memory bus utilization.
>>>
>>> Also, this patch code clean about regulator control and free resource
>>> when calling exit/remove function.
>>>
>>> For example,
>>> busfreq@106A {
>>> compatible = "samsung,exynos4x12-busfreq";
>>>
>>> /* Clock for PPMUDMC0/1 */
>>> clocks = <&clock CLK_PPMUDMC0>, <&clock CLK_PPMUDMC1>;
>>> clock-names = "ppmudmc0", "ppmudmc1";
>>>
>>> /* Regulator for MIF/INT block */
>>> vdd_mif-supply = <&buck1_reg>;
>>> vdd_int-supply = <&buck3_reg>;
>>> };
>>>
>>> Signed-off-by: Chanwoo Choi 

I modified this patch according to your comments, as follows:

Best Regards,
Chanwoo Choi

From c8f2fbc4c1166ec02fb2ad46164bc7ed9118721b Mon Sep 17 00:00:00 2001
From: Chanwoo Choi 
Date: Fri, 14 Mar 2014 12:05:54 +0900
Subject: [PATCH] devfreq: exynos4: Add ppmu's clock control and code clean
 about regulator control

Previously, the driver did not control the clocks of ppmudmc0/1. This
patch controls the clocks of ppmudmc0/1, which are used for monitoring
memory bus utilization.

Also, this patch cleans up the regulator control code and frees
resources when the exit/remove functions are called.

For example,
busfreq@106A {
compatible = "samsung,exynos4x12-busfreq";

/* Clock for PPMUDMC0/1 */
clocks = <&clock CLK_PPMUDMC0>, <&clock CLK_PPMUDMC1>;
clock-names = "ppmudmc0", "ppmudmc1";

/* Regulator for MIF/INT block */
vdd_mif-supply = <&buck1_reg>;
vdd_int-supply = <&buck3_reg>;
};

Signed-off-by: Chanwoo Choi 
---
 drivers/devfreq/exynos/exynos4_bus.c | 97 +++-
 1 file changed, 84 insertions(+), 13 deletions(-)

diff --git a/drivers/devfreq/exynos/exynos4_bus.c 
b/drivers/devfreq/exynos/exynos4_bus.c
index 4c630fb..3956bcc 100644
--- a/drivers/devfreq/exynos/exynos4_bus.c
+++ b/drivers/devfreq/exynos/exynos4_bus.c
@@ -62,6 +62,11 @@ enum exynos_ppmu_idx {
PPMU_END,
 };
 
+static const char *exynos_ppmu_clk_name[] = {
+   [PPMU_DMC0] = "ppmudmc0",
+   [PPMU_DMC1] = "ppmudmc1",
+};
+
 #define EX4210_LV_MAX  LV_2
 #define EX4x12_LV_MAX  LV_4
 #define EX4210_LV_NUM  (LV_2 + 1)
@@ -86,6 +91,7 @@ struct busfreq_data {
struct regulator *vdd_mif; /* Exynos4412/4212 only */
struct busfreq_opp_info curr_oppinfo;
struct exynos_ppmu ppmu[PPMU_END];
+   struct clk *clk_ppmu[PPMU_END];
 
struct notifier_block pm_notifier;
struct mutex lock;
@@ -724,6 +730,17 @@ static void exynos4_bus_exit(struct device *dev)
struct busfreq_data *data = dev_get_drvdata(dev);
int i;
 
+   /*
+* Un-map memory map and disable regulator/clocks
+* to prevent power leakage.
+*/
+   regulator_disable(data->vdd_int);
+   if (data->type == TYPE_BUSF_EXYNOS4x12)
+   regulator_disable(data->vdd_mif);
+
+   for (i = 0; i < PPMU_END; i++)
+   clk_disable_unprepare(data->clk_ppmu[i]);
+
for (i = 0; i < PPMU_END; i++)
iounmap(data->ppmu[i].hw_base);
 }
@@ -989,6 +1006,7 @@ static int exynos4_busfreq_parse_dt(struct busfreq_data 
*data)
 {
struct device *dev = data->dev;
struct device_node *np = dev->of_node;
+   const char **clk_name = exynos_ppmu_clk_name;
int i, ret = 0;
 
if (!np) {
@@ -1006,8 +1024,67 @@ static int exynos4_busfreq_parse_dt(struct busfreq_data 
*data)
}
}
 
+   for (i = 0; i < PPMU_END; i++) {
+   data->clk_ppmu[i] = devm_clk_get(dev, clk_name[i]);
+   if (IS_ERR(data->clk_ppmu[i])) {
+   dev_warn(dev, "Failed to get %s clock\n", clk_name[i]);
+   goto err_clocks;
+   }
+
+   ret = clk_prepare_enable(data->clk_ppmu[i]);
+   if (ret < 0) {
+   dev_warn(dev, "Failed to enable %s clock\n", 
clk_name[i]);
+   data->clk_ppmu[i] = NULL;
+   goto err_clocks;
+   }
+   }
+
+   /* Get regulator to control voltage of int block */
+   data->vdd_int = devm_regulator_get(dev, "vdd_int");
+   if (IS_ERR(data->vdd_int)) {
+   dev_err(dev, "Failed to get the regulator of vdd_int\n");
+   ret = PTR_ERR(data->vdd_int);
+   goto err_clocks;
+   }
+   ret = regulator_enable(data->vdd_int);
+   if (ret < 0) {
+   dev_err(dev, "Failed to enable regulator of vdd_int\n");
+   goto err_clocks;
+   }
+
+   switch (data->type) {
+   case TYPE_BUSF_EXYNOS4210:
+   break;
+   case TYPE_BUSF_EXYNOS4x12:
+   /* Get regulator to

[RFCv2 1/7] virtio-scsi.h: Add virtio_scsi_cmd_req_pi header definition

2014-03-16 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch adds a virtio_scsi_cmd_req_pi header, as recommended by
Paolo, that contains do_pi_niov + di_pi_niov elements used for
signaling when protection information buffers are expected to
precede the data buffers.

Cc: Paolo Bonzini 
Cc: Michael S. Tsirkin 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Sagi Grimberg 
Cc: H. Peter Anvin 
Signed-off-by: Nicholas Bellinger 
---
 include/linux/virtio_scsi.h |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
index 4195b97..4dc5998 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -35,11 +35,23 @@ struct virtio_scsi_cmd_req {
u8 lun[8];  /* Logical Unit Number */
u64 tag;/* Command identifier */
u8 task_attr;   /* Task attribute */
-   u8 prio;
+   u8 prio;/* SAM command priority field */
u8 crn;
u8 cdb[VIRTIO_SCSI_CDB_SIZE];
 } __packed;
 
+/* SCSI command request, followed by protection information */
+struct virtio_scsi_cmd_req_pi {
+   u8 lun[8];  /* Logical Unit Number */
+   u64 tag;/* Command identifier */
+   u8 task_attr;   /* Task attribute */
+   u8 prio;/* SAM command priority field */
+   u8 crn;
+   u16 do_pi_niov; /* DataOUT PI Number of iovecs */
+   u16 di_pi_niov; /* DataIN PI Number of iovecs */
+   u8 cdb[VIRTIO_SCSI_CDB_SIZE];
+} __packed;
+
 /* Response, followed by sense data and data-in */
 struct virtio_scsi_cmd_resp {
u32 sense_len;  /* Sense data length */
-- 
1.7.2.5



[RFCv2 0/7] vhost/scsi: Add T10 PI SGL passthrough support

2014-03-16 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Hi MST, MKP, Paolo & Co,

This is an updated -v2 series for adding T10 protection information (PI)
SGL passthrough support between virtio-scsi LLD + vhost-scsi fabric
endpoints.

The patch series is available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git vhost-dif

Following Paolo's recommendations, this series adds a new virtio_scsi
command header (virtio_scsi_cmd_req_pi) with the following elements to
signal the existence of protection information:

 ->do_pi_niov (DataOUT PI number of iovecs)
 ->di_pi_niov (DataIN PI number of iovecs)

Also included is the change to attach protection information preceding
the actual DataOUT + DataIN data payload, thus making a future
improvement of processing virtio buffers inline a possibility.

vhost-scsi code has also been updated to determine virtio_scsi_cmd_req or
virtio_scsi_cmd_req_pi usage based upon the first iovec's (header) length,
and then continues to process in either mode accordingly.
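
In code terms, the detection amounts to something like the following
sketch, simplified and using the variable names from the vhost-scsi
patch in this series:

	/* pick the request layout from the size of the header iovec */
	if (vq->iov[0].iov_len == sizeof(struct virtio_scsi_cmd_req_pi)) {
		req = (unsigned char *)&v_req_pi;
		req_size = sizeof(v_req_pi);
		hdr_pi = true;
	} else {
		req = (unsigned char *)&v_req;
		req_size = sizeof(v_req);
		hdr_pi = false;
	}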

As with the original RFC, the virtio-scsi patch still contains a hack
to force DIX/DIF to be enabled, regardless of host provided feature bits.
This regression bug still needs to be tracked down.

v2 changes:
  - Add virtio_scsi_cmd_req_pi header (Paolo + nab)
  - Use virtio_scsi_cmd_req_pi instead of existing ->prio (Paolo + nab)
  - Make protection buffer come before data buffer (Paolo + nab)
  - Update vhost_scsi_get_tag() parameter usage (nab)

Please review.

Thanks!

--nab

Nicholas Bellinger (7):
  virtio-scsi.h: Add virtio_scsi_cmd_req_pi header definition
  vhost/scsi: Move sanity check into vhost_scsi_map_iov_to_sgl
  vhost/scsi: Add preallocation of protection SGLs
  vhost/scsi: Add T10 PI IOV -> SGL memory mapping logic
  vhost/scsi: Enable T10 PI IOV -> SGL memory mapping
  vhost/scsi: Add new VIRTIO_SCSI_F_T10_PI feature bit
  virtio-scsi: Enable DIF/DIX modes in SCSI host LLD

 drivers/scsi/virtio_scsi.c  |   79 +---
 drivers/vhost/scsi.c|  289 +--
 include/linux/virtio_scsi.h |   15 ++-
 3 files changed, 273 insertions(+), 110 deletions(-)

-- 
1.7.2.5



[RFCv2 5/7] vhost/scsi: Enable T10 PI IOV -> SGL memory mapping

2014-03-16 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch updates vhost_scsi_handle_vq() to check for the existence
of virtio_scsi_cmd_req_pi by comparing vq->iov[0].iov_len, in order to
calculate separate data + protection SGLs from data_num.

Also update tcm_vhost_submission_work() to pass the pre-allocated
cmd->tvc_prot_sgl[] memory into target_submit_cmd_map_sgls(), and
update vhost_scsi_get_tag() parameters to accept scsi_tag, lun, and
task_attr.

v2 changes:
  - Use virtio_scsi_cmd_req_pi instead of existing ->prio (Paolo)
  - Make protection buffer come before data buffer (Paolo)
  - Update vhost_scsi_get_tag() parameter usage

Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Sagi Grimberg 
Cc: H. Peter Anvin 
Signed-off-by: Nicholas Bellinger 
---
 drivers/vhost/scsi.c |  164 +
 1 files changed, 110 insertions(+), 54 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index f720709..00903dc 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -715,11 +715,9 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work 
*work)
 }
 
 static struct tcm_vhost_cmd *
-vhost_scsi_get_tag(struct vhost_virtqueue *vq,
-   struct tcm_vhost_tpg *tpg,
-   struct virtio_scsi_cmd_req *v_req,
-   u32 exp_data_len,
-   int data_direction)
+vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct tcm_vhost_tpg *tpg,
+  unsigned char *cdb, u64 scsi_tag, u16 lun, u8 task_attr,
+  u32 exp_data_len, int data_direction)
 {
struct tcm_vhost_cmd *cmd;
struct tcm_vhost_nexus *tv_nexus;
@@ -751,13 +749,16 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,
cmd->tvc_prot_sgl = prot_sg;
cmd->tvc_upages = pages;
cmd->tvc_se_cmd.map_tag = tag;
-   cmd->tvc_tag = v_req->tag;
-   cmd->tvc_task_attr = v_req->task_attr;
+   cmd->tvc_tag = scsi_tag;
+   cmd->tvc_lun = lun;
+   cmd->tvc_task_attr = task_attr;
cmd->tvc_exp_data_len = exp_data_len;
cmd->tvc_data_direction = data_direction;
cmd->tvc_nexus = tv_nexus;
cmd->inflight = tcm_vhost_get_inflight(vq);
 
+   memcpy(cmd->tvc_cdb, cdb, TCM_VHOST_MAX_CDB_SIZE);
+
return cmd;
 }
 
@@ -908,18 +909,15 @@ static void tcm_vhost_submission_work(struct work_struct 
*work)
container_of(work, struct tcm_vhost_cmd, work);
struct tcm_vhost_nexus *tv_nexus;
struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
-   struct scatterlist *sg_ptr, *sg_bidi_ptr = NULL;
-   int rc, sg_no_bidi = 0;
+   struct scatterlist *sg_ptr, *sg_prot_ptr = NULL;
+   int rc;
 
+   /* FIXME: BIDI operation */
if (cmd->tvc_sgl_count) {
sg_ptr = cmd->tvc_sgl;
-/* FIXME: Fix BIDI operation in tcm_vhost_submission_work() */
-#if 0
-   if (se_cmd->se_cmd_flags & SCF_BIDI) {
-   sg_bidi_ptr = NULL;
-   sg_no_bidi = 0;
-   }
-#endif
+
+   if (cmd->tvc_prot_sgl_count)
+   sg_prot_ptr = cmd->tvc_prot_sgl;
} else {
sg_ptr = NULL;
}
@@ -930,7 +928,7 @@ static void tcm_vhost_submission_work(struct work_struct 
*work)
cmd->tvc_lun, cmd->tvc_exp_data_len,
cmd->tvc_task_attr, cmd->tvc_data_direction,
TARGET_SCF_ACK_KREF, sg_ptr, cmd->tvc_sgl_count,
-   sg_bidi_ptr, sg_no_bidi, NULL, 0);
+   NULL, 0, sg_prot_ptr, cmd->tvc_prot_sgl_count);
if (rc < 0) {
transport_send_check_condition_and_sense(se_cmd,
TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE, 0);
@@ -962,12 +960,18 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
 {
struct tcm_vhost_tpg **vs_tpg;
struct virtio_scsi_cmd_req v_req;
+   struct virtio_scsi_cmd_req_pi v_req_pi;
struct tcm_vhost_tpg *tpg;
struct tcm_vhost_cmd *cmd;
-   u32 exp_data_len, data_first, data_num, data_direction;
+   u64 tag;
+   u32 exp_data_len, data_first, data_num, data_direction, prot_first;
unsigned out, in, i;
-   int head, ret;
-   u8 target;
+   int head, ret, data_niov, prot_niov;
+   size_t req_size;
+   u16 lun;
+   u8 *target, task_attr;
+   bool hdr_pi;
+   unsigned char *req, *cdb;
 
mutex_lock(&vq->mutex);
/*
@@ -998,7 +1002,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
break;
}
 
-/* FIXME: BIDI operation */
+   /* FIXME: BIDI operation */
if (out == 1 && in == 1) {
data_direction = DMA_NONE;
data_first = 0;
@@ -1028,23 +1032,31 @@ vhost_scsi_han

[RFCv2 3/7] vhost/scsi: Add preallocation of protection SGLs

2014-03-16 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch updates tcm_vhost_make_nexus() to pre-allocate per descriptor
tcm_vhost_cmd->tvc_prot_sgl[] used to expose protection SGLs from within
virtio-scsi guest memory to vhost-scsi.

Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Signed-off-by: Nicholas Bellinger 
---
 drivers/vhost/scsi.c |   15 ++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 8c88ce9..a2cb289 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -58,6 +58,7 @@
 #define TCM_VHOST_DEFAULT_TAGS 256
 #define TCM_VHOST_PREALLOC_SGLS 2048
 #define TCM_VHOST_PREALLOC_UPAGES 2048
+#define TCM_VHOST_PREALLOC_PROT_SGLS 512
 
 struct vhost_scsi_inflight {
/* Wait for the flush operation to finish */
@@ -83,6 +84,7 @@ struct tcm_vhost_cmd {
u32 tvc_lun;
/* Pointer to the SGL formatted memory from virtio-scsi */
struct scatterlist *tvc_sgl;
+   struct scatterlist *tvc_prot_sgl;
struct page **tvc_upages;
/* Pointer to response */
struct virtio_scsi_cmd_resp __user *tvc_resp;
@@ -717,7 +719,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,
struct tcm_vhost_cmd *cmd;
struct tcm_vhost_nexus *tv_nexus;
struct se_session *se_sess;
-   struct scatterlist *sg;
+   struct scatterlist *sg, *prot_sg;
struct page **pages;
int tag;
 
@@ -736,10 +738,12 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,
 
cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[tag];
sg = cmd->tvc_sgl;
+   prot_sg = cmd->tvc_prot_sgl;
pages = cmd->tvc_upages;
memset(cmd, 0, sizeof(struct tcm_vhost_cmd));
 
cmd->tvc_sgl = sg;
+   cmd->tvc_prot_sgl = prot_sg;
cmd->tvc_upages = pages;
cmd->tvc_se_cmd.map_tag = tag;
cmd->tvc_tag = v_req->tag;
@@ -1692,6 +1696,7 @@ static void tcm_vhost_free_cmd_map_res(struct 
tcm_vhost_nexus *nexus,
tv_cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[i];
 
kfree(tv_cmd->tvc_sgl);
+   kfree(tv_cmd->tvc_prot_sgl);
kfree(tv_cmd->tvc_upages);
}
 }
@@ -1750,6 +1755,14 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg 
*tpg,
pr_err("Unable to allocate tv_cmd->tvc_upages\n");
goto out;
}
+
+   tv_cmd->tvc_prot_sgl = kzalloc(sizeof(struct scatterlist) *
+   TCM_VHOST_PREALLOC_PROT_SGLS, 
GFP_KERNEL);
+   if (!tv_cmd->tvc_prot_sgl) {
+   mutex_unlock(&tpg->tv_tpg_mutex);
+   pr_err("Unable to allocate tv_cmd->tvc_prot_sgl\n");
+   goto out;
+   }
}
/*
 * Since we are running in 'demo mode' this call with generate a
-- 
1.7.2.5



[RFCv2 4/7] vhost/scsi: Add T10 PI IOV -> SGL memory mapping logic

2014-03-16 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch adds vhost_scsi_map_iov_to_prot() to perform the mapping of
T10 data integrity memory between virtio iov + struct scatterlist using
get_user_pages_fast() following existing code.

As with vhost_scsi_map_iov_to_sgl(), this does sanity checks against the
total prot_sgl_count vs. pre-allocated SGLs, and loops across protection
iovs using vhost_scsi_map_to_sgl() to perform the actual memory mapping.

Also update tcm_vhost_release_cmd() to release associated tvc_prot_sgl[]
struct page.

Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Sagi Grimberg 
Cc: H. Peter Anvin 
Signed-off-by: Nicholas Bellinger 
---
 drivers/vhost/scsi.c |   48 +++-
 1 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index a2cb289..f720709 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -80,6 +80,7 @@ struct tcm_vhost_cmd {
u64 tvc_tag;
/* The number of scatterlists associated with this cmd */
u32 tvc_sgl_count;
+   u32 tvc_prot_sgl_count;
/* Saved unpacked SCSI LUN for tcm_vhost_submission_work() */
u32 tvc_lun;
/* Pointer to the SGL formatted memory from virtio-scsi */
@@ -458,12 +459,16 @@ static void tcm_vhost_release_cmd(struct se_cmd *se_cmd)
struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
struct tcm_vhost_cmd, tvc_se_cmd);
struct se_session *se_sess = se_cmd->se_sess;
+   int i;
 
if (tv_cmd->tvc_sgl_count) {
-   u32 i;
for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
put_page(sg_page(&tv_cmd->tvc_sgl[i]));
}
+   if (tv_cmd->tvc_prot_sgl_count) {
+   for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++)
+   put_page(sg_page(&tv_cmd->tvc_prot_sgl[i]));
+   }
 
tcm_vhost_put_inflight(tv_cmd->inflight);
percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
@@ -856,6 +861,47 @@ vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *cmd,
return 0;
 }
 
+static int
+vhost_scsi_map_iov_to_prot(struct tcm_vhost_cmd *cmd,
+  struct iovec *iov,
+  int niov,
+  bool write)
+{
+   struct scatterlist *prot_sg = cmd->tvc_prot_sgl;
+   unsigned int prot_sgl_count = 0;
+   int ret, i;
+
+   for (i = 0; i < niov; i++)
+   prot_sgl_count += iov_num_pages(&iov[i]);
+
+   if (prot_sgl_count > TCM_VHOST_PREALLOC_PROT_SGLS) {
+   pr_err("vhost_scsi_map_iov_to_prot() sgl_count: %u greater than"
+   " preallocated TCM_VHOST_PREALLOC_PROT_SGLS: %u\n",
+   prot_sgl_count, TCM_VHOST_PREALLOC_PROT_SGLS);
+   return -ENOBUFS;
+   }
+
+   pr_debug("%s prot_sg %p prot_sgl_count %u\n", __func__,
+prot_sg, prot_sgl_count);
+   sg_init_table(prot_sg, prot_sgl_count);
+   cmd->tvc_prot_sgl_count = prot_sgl_count;
+
+   for (i = 0; i < niov; i++) {
+   ret = vhost_scsi_map_to_sgl(cmd, prot_sg, prot_sgl_count, 
&iov[i],
+   cmd->tvc_upages, write);
+   if (ret < 0) {
+   for (i = 0; i < cmd->tvc_prot_sgl_count; i++)
+   put_page(sg_page(&cmd->tvc_prot_sgl[i]));
+
+   cmd->tvc_prot_sgl_count = 0;
+   return ret;
+   }
+   prot_sg += ret;
+   prot_sgl_count -= ret;
+   }
+   return 0;
+}
+
 static void tcm_vhost_submission_work(struct work_struct *work)
 {
struct tcm_vhost_cmd *cmd =
-- 
1.7.2.5



[RFCv2 6/7] vhost/scsi: Add new VIRTIO_SCSI_F_T10_PI feature bit

2014-03-16 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch adds a VIRTIO_SCSI_F_T10_PI feature bit for signaling
host support of accepting T10 protection information SGLs from
the virtio-scsi guest.
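
On the guest side this would presumably be consumed along these lines
(hypothetical sketch; the actual wiring is in patch 7/7):

	/* virtio-scsi guest: only set up DIF/DIX if the host offered it */
	if (virtio_has_feature(vdev, VIRTIO_SCSI_F_T10_PI)) {
		scsi_host_set_prot(shost, SHOST_DIF_TYPE1_PROTECTION |
					  SHOST_DIX_TYPE1_PROTECTION);
		scsi_host_set_guard(shost, SHOST_DIX_GUARD_CRC);
	}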

Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Sagi Grimberg 
Cc: H. Peter Anvin 
Signed-off-by: Nicholas Bellinger 
---
 drivers/vhost/scsi.c|3 ++-
 include/linux/virtio_scsi.h |1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 00903dc..f9bbc5e6 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -169,7 +169,8 @@ enum {
 };
 
 enum {
-   VHOST_SCSI_FEATURES = VHOST_FEATURES | (1ULL << VIRTIO_SCSI_F_HOTPLUG)
+   VHOST_SCSI_FEATURES = VHOST_FEATURES | (1ULL << VIRTIO_SCSI_F_HOTPLUG) |
+  (1ULL << VIRTIO_SCSI_F_T10_PI)
 };
 
 #define VHOST_SCSI_MAX_TARGET  256
diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
index 4dc5998..e674b2b 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -109,6 +109,7 @@ struct virtio_scsi_config {
 #define VIRTIO_SCSI_F_INOUT0
 #define VIRTIO_SCSI_F_HOTPLUG  1
 #define VIRTIO_SCSI_F_CHANGE   2
+#define VIRTIO_SCSI_F_T10_PI   3
 
 /* Response codes */
 #define VIRTIO_SCSI_S_OK   0
-- 
1.7.2.5



[RFCv2 2/7] vhost/scsi: Move sanity check into vhost_scsi_map_iov_to_sgl

2014-03-16 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Move the overflow check for sgl_count > TCM_VHOST_PREALLOC_SGLS into
vhost_scsi_map_iov_to_sgl() so that it's based on the total number
of SGLs for all IOVs, instead of single IOVs.

Also, rename TCM_VHOST_PREALLOC_PAGES -> TCM_VHOST_PREALLOC_UPAGES
to better describe pointers to user-space pages.

Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Signed-off-by: Nicholas Bellinger 
---
 drivers/vhost/scsi.c |   59 +
 1 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0a025b8..8c88ce9 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -57,7 +57,7 @@
 #define TCM_VHOST_MAX_CDB_SIZE 32
 #define TCM_VHOST_DEFAULT_TAGS 256
 #define TCM_VHOST_PREALLOC_SGLS 2048
-#define TCM_VHOST_PREALLOC_PAGES 2048
+#define TCM_VHOST_PREALLOC_UPAGES 2048
 
 struct vhost_scsi_inflight {
/* Wait for the flush operation to finish */
@@ -762,35 +762,28 @@ vhost_scsi_map_to_sgl(struct tcm_vhost_cmd *tv_cmd,
  struct scatterlist *sgl,
  unsigned int sgl_count,
  struct iovec *iov,
- int write)
+ struct page **pages,
+ bool write)
 {
unsigned int npages = 0, pages_nr, offset, nbytes;
struct scatterlist *sg = sgl;
void __user *ptr = iov->iov_base;
size_t len = iov->iov_len;
-   struct page **pages;
int ret, i;
 
-   if (sgl_count > TCM_VHOST_PREALLOC_SGLS) {
-   pr_err("vhost_scsi_map_to_sgl() psgl_count: %u greater than"
-  " preallocated TCM_VHOST_PREALLOC_SGLS: %u\n",
-   sgl_count, TCM_VHOST_PREALLOC_SGLS);
-   return -ENOBUFS;
-   }
-
pages_nr = iov_num_pages(iov);
-   if (pages_nr > sgl_count)
+   if (pages_nr > sgl_count) {
+   pr_err("vhost_scsi_map_to_sgl() pages_nr: %u greater than"
+  " sgl_count: %u\n", pages_nr, sgl_count);
return -ENOBUFS;
-
-   if (pages_nr > TCM_VHOST_PREALLOC_PAGES) {
+   }
+   if (pages_nr > TCM_VHOST_PREALLOC_UPAGES) {
pr_err("vhost_scsi_map_to_sgl() pages_nr: %u greater than"
-  " preallocated TCM_VHOST_PREALLOC_PAGES: %u\n",
-   pages_nr, TCM_VHOST_PREALLOC_PAGES);
+  " preallocated TCM_VHOST_PREALLOC_UPAGES: %u\n",
+   pages_nr, TCM_VHOST_PREALLOC_UPAGES);
return -ENOBUFS;
}
 
-   pages = tv_cmd->tvc_upages;
-
ret = get_user_pages_fast((unsigned long)ptr, pages_nr, write, pages);
/* No pages were pinned */
if (ret < 0)
@@ -820,33 +813,32 @@ out:
 static int
 vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *cmd,
  struct iovec *iov,
- unsigned int niov,
- int write)
+ int niov,
+ bool write)
 {
-   int ret;
-   unsigned int i;
-   u32 sgl_count;
-   struct scatterlist *sg;
+   struct scatterlist *sg = cmd->tvc_sgl;
+   unsigned int sgl_count = 0;
+   int ret, i;
 
-   /*
-* Find out how long sglist needs to be
-*/
-   sgl_count = 0;
for (i = 0; i < niov; i++)
sgl_count += iov_num_pages(&iov[i]);
 
-   /* TODO overflow checking */
+   if (sgl_count > TCM_VHOST_PREALLOC_SGLS) {
+   pr_err("vhost_scsi_map_iov_to_sgl() sgl_count: %u greater than"
+   " preallocated TCM_VHOST_PREALLOC_SGLS: %u\n",
+   sgl_count, TCM_VHOST_PREALLOC_SGLS);
+   return -ENOBUFS;
+   }
 
-   sg = cmd->tvc_sgl;
pr_debug("%s sg %p sgl_count %u\n", __func__, sg, sgl_count);
sg_init_table(sg, sgl_count);
-
cmd->tvc_sgl_count = sgl_count;
 
-   pr_debug("Mapping %u iovecs for %u pages\n", niov, sgl_count);
+   pr_debug("Mapping iovec %p for %u pages\n", &iov[0], sgl_count);
+
for (i = 0; i < niov; i++) {
ret = vhost_scsi_map_to_sgl(cmd, sg, sgl_count, &iov[i],
-   write);
+   cmd->tvc_upages, write);
if (ret < 0) {
for (i = 0; i < cmd->tvc_sgl_count; i++)
put_page(sg_page(&cmd->tvc_sgl[i]));
@@ -854,7 +846,6 @@ vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *cmd,
cmd->tvc_sgl_count = 0;
return ret;
}
-
sg += ret;
sgl_count -= ret;
}
@@ -1753,7 +1744,7 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
}
 
tv_cmd->tvc_upages = kzalloc(sizeof(str

[RFCv2 7/7] virtio-scsi: Enable DIF/DIX modes in SCSI host LLD

2014-03-16 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch updates virtscsi_probe() to set up the necessary Scsi_Host
level protection resources (currently hardcoded to 1).

It changes virtscsi_add_cmd() to attach outgoing / incoming
protection SGLs preceding the data payload, and uses the
new virtio_scsi_cmd_req_pi->d[oi]_pi_niov fields to signal
to vhost/scsi how many prot_sgs to expect.

v2 changes:
  - Make protection buffer come before data buffer (Paolo)
  - Enable virtio_scsi_cmd_req_pi usage (Paolo)

Cc: Paolo Bonzini 
Cc: Michael S. Tsirkin 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Sagi Grimberg 
Cc: H. Peter Anvin 
Signed-off-by: Nicholas Bellinger 
---
 drivers/scsi/virtio_scsi.c |   79 ++--
 1 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 16bfd50..4cccfed 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -37,6 +37,7 @@ struct virtio_scsi_cmd {
struct completion *comp;
union {
struct virtio_scsi_cmd_req   cmd;
+   struct virtio_scsi_cmd_req_picmd_pi;
struct virtio_scsi_ctrl_tmf_req  tmf;
struct virtio_scsi_ctrl_an_req   an;
} req;
@@ -440,7 +441,7 @@ static int virtscsi_add_cmd(struct virtqueue *vq,
size_t req_size, size_t resp_size, gfp_t gfp)
 {
struct scsi_cmnd *sc = cmd->sc;
-   struct scatterlist *sgs[4], req, resp;
+   struct scatterlist *sgs[6], req, resp;
struct sg_table *out, *in;
unsigned out_num = 0, in_num = 0;
 
@@ -458,16 +459,24 @@ static int virtscsi_add_cmd(struct virtqueue *vq,
sgs[out_num++] = &req;
 
/* Data-out buffer.  */
-   if (out)
+   if (out) {
+   /* Place WRITE protection SGLs before Data OUT payload */
+   if (scsi_prot_sg_count(sc))
+   sgs[out_num++] = scsi_prot_sglist(sc);
sgs[out_num++] = out->sgl;
+   }
 
/* Response header.  */
sg_init_one(&resp, &cmd->resp, resp_size);
sgs[out_num + in_num++] = &resp;
 
/* Data-in buffer */
-   if (in)
+   if (in) {
+   /* Place READ protection SGLs before Data IN payload */
+   if (scsi_prot_sg_count(sc))
+   sgs[out_num + in_num++] = scsi_prot_sglist(sc);
sgs[out_num + in_num++] = in->sgl;
+   }
 
return virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, gfp);
 }
@@ -492,12 +501,36 @@ static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
return err;
 }
 
+static void virtio_scsi_init_hdr(struct virtio_scsi_cmd_req *cmd,
+struct scsi_cmnd *sc)
+{
+   cmd->lun[0] = 1;
+   cmd->lun[1] = sc->device->id;
+   cmd->lun[2] = (sc->device->lun >> 8) | 0x40;
+   cmd->lun[3] = sc->device->lun & 0xff;
+   cmd->tag = (unsigned long)sc;
+   cmd->task_attr = VIRTIO_SCSI_S_SIMPLE;
+   cmd->prio = 0;
+   cmd->crn = 0;
+}
+
+static void virtio_scsi_init_hdr_pi(struct virtio_scsi_cmd_req_pi *cmd_pi,
+   struct scsi_cmnd *sc)
+{
+   virtio_scsi_init_hdr((struct virtio_scsi_cmd_req *)cmd_pi, sc);
+
+   if (sc->sc_data_direction == DMA_TO_DEVICE)
+   cmd_pi->do_pi_niov = scsi_prot_sg_count(sc);
+   else if (sc->sc_data_direction == DMA_FROM_DEVICE)
+   cmd_pi->di_pi_niov = scsi_prot_sg_count(sc);
+}
+
 static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 struct virtio_scsi_vq *req_vq,
 struct scsi_cmnd *sc)
 {
struct virtio_scsi_cmd *cmd;
-   int ret;
+   int ret, req_size;
 
struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
@@ -515,22 +548,20 @@ static int virtscsi_queuecommand(struct virtio_scsi 
*vscsi,
 
memset(cmd, 0, sizeof(*cmd));
cmd->sc = sc;
-   cmd->req.cmd = (struct virtio_scsi_cmd_req){
-   .lun[0] = 1,
-   .lun[1] = sc->device->id,
-   .lun[2] = (sc->device->lun >> 8) | 0x40,
-   .lun[3] = sc->device->lun & 0xff,
-   .tag = (unsigned long)sc,
-   .task_attr = VIRTIO_SCSI_S_SIMPLE,
-   .prio = 0,
-   .crn = 0,
-   };
 
BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE);
-   memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len);
 
-   if (virtscsi_kick_cmd(req_vq, cmd,
- sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
+   if (scsi_prot_sg_count(sc)) {
+   virtio_scsi_init_hdr_pi(&cmd->req.cmd_pi, sc);
+   memcpy(cmd->req.cmd_pi.cdb, sc->cmnd, sc->cmd_len);
+   req_size = sizeof(cmd->req.cmd_pi);
+   } else {
+   virtio_scsi_init_hdr(&cmd-

Re: [PATCH v4 1/1] drm/i915: Enabling 128x128 and 256x256 ARGB Cursor Support

2014-03-16 Thread Sagar Arun Kamble
Gentle reminder for reviewing this and i-g-t patch.
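
For testing from userspace, the new caps can be queried via libdrm. A
rough sketch, assuming DRM_CAP_CURSOR_WIDTH/HEIGHT are exported by the
headers in use:

	uint64_t w = 64, h = 64;

	/* fall back to the old 64x64 default if the kernel lacks the caps */
	if (drmGetCap(fd, DRM_CAP_CURSOR_WIDTH, &w) < 0 ||
	    drmGetCap(fd, DRM_CAP_CURSOR_HEIGHT, &h) < 0)
		w = h = 64;
	printf("max cursor: %llux%llu\n",
	       (unsigned long long)w, (unsigned long long)h);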

On Mon, 2014-03-10 at 17:06 +0530, sagar.a.kam...@intel.com wrote:
> From: Sagar Kamble 
> 
> With this patch we allow larger cursor planes of sizes 128x128
> and 256x256.
> 
> v2: Added more precise check on size while setting cursor plane.
> 
> v3: Changes related to restructuring cursor size restrictions
> and DRM_DEBUG usage.
> 
> v4: Indentation related changes for setting cursor control and
> implementing DRM_CAP_CURSOR_WIDTH and DRM_CAP_CURSOR_HEIGHT
> 
> Testcase: igt/kms_cursor_crc
> Cc: Daniel Vetter 
> Cc: Jani Nikula 
> Cc: David Airlie 
> Cc: dri-de...@lists.freedesktop.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: G, Pallavi 
> Signed-off-by: Sagar Kamble 
> ---
>  drivers/gpu/drm/i915/i915_reg.h  |  4 +++
>  drivers/gpu/drm/i915/intel_display.c | 53 
> 
>  drivers/gpu/drm/i915/intel_drv.h |  7 +
>  3 files changed, 59 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 146609a..aee8258 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -3551,7 +3551,11 @@ enum punit_power_well {
>  /* New style CUR*CNTR flags */
>  #define   CURSOR_MODE0x27
>  #define   CURSOR_MODE_DISABLE   0x00
> +#define   CURSOR_MODE_128_32B_AX 0x02
> +#define   CURSOR_MODE_256_32B_AX 0x03
>  #define   CURSOR_MODE_64_32B_AX 0x07
> +#define   CURSOR_MODE_128_ARGB_AX ((1 << 5) | CURSOR_MODE_128_32B_AX)
> +#define   CURSOR_MODE_256_ARGB_AX ((1 << 5) | CURSOR_MODE_256_32B_AX)
>  #define   CURSOR_MODE_64_ARGB_AX ((1 << 5) | CURSOR_MODE_64_32B_AX)
>  #define   MCURSOR_PIPE_SELECT(1 << 28)
>  #define   MCURSOR_PIPE_A 0x00
> diff --git a/drivers/gpu/drm/i915/intel_display.c 
> b/drivers/gpu/drm/i915/intel_display.c
> index 0868afb..ec6a073 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -7440,10 +7440,26 @@ static void i9xx_update_cursor(struct drm_crtc *crtc, 
> u32 base)
>   bool visible = base != 0;
>  
>   if (intel_crtc->cursor_visible != visible) {
> + int16_t width = intel_crtc->cursor_width;
>   uint32_t cntl = I915_READ(CURCNTR(pipe));
>   if (base) {
>   cntl &= ~(CURSOR_MODE | MCURSOR_PIPE_SELECT);
> - cntl |= CURSOR_MODE_64_ARGB_AX | MCURSOR_GAMMA_ENABLE;
> + cntl |= MCURSOR_GAMMA_ENABLE;
> +
> + switch (width) {
> + case 64:
> + cntl |= CURSOR_MODE_64_ARGB_AX;
> + break;
> + case 128:
> + cntl |= CURSOR_MODE_128_ARGB_AX;
> + break;
> + case 256:
> + cntl |= CURSOR_MODE_256_ARGB_AX;
> + break;
> + default:
> + WARN_ON(1);
> + return;
> + }
>   cntl |= pipe << 28; /* Connect to correct pipe */
>   } else {
>   cntl &= ~(CURSOR_MODE | MCURSOR_GAMMA_ENABLE);
> @@ -7468,10 +7484,25 @@ static void ivb_update_cursor(struct drm_crtc *crtc, 
> u32 base)
>   bool visible = base != 0;
>  
>   if (intel_crtc->cursor_visible != visible) {
> + int16_t width = intel_crtc->cursor_width;
>   uint32_t cntl = I915_READ(CURCNTR_IVB(pipe));
>   if (base) {
>   cntl &= ~CURSOR_MODE;
> - cntl |= CURSOR_MODE_64_ARGB_AX | MCURSOR_GAMMA_ENABLE;
> + cntl |= MCURSOR_GAMMA_ENABLE;
> + switch (width) {
> + case 64:
> + cntl |= CURSOR_MODE_64_ARGB_AX;
> + break;
> + case 128:
> + cntl |= CURSOR_MODE_128_ARGB_AX;
> + break;
> + case 256:
> + cntl |= CURSOR_MODE_256_ARGB_AX;
> + break;
> + default:
> + WARN_ON(1);
> + return;
> + }
>   } else {
>   cntl &= ~(CURSOR_MODE | MCURSOR_GAMMA_ENABLE);
>   cntl |= CURSOR_MODE_DISABLE;
> @@ -7567,9 +7598,11 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
>   goto finish;
>   }
>  
> - /* Currently we only support 64x64 cursors */
> - if (width != 64 || height != 64) {
> - DRM_ERROR("we currently only support 64x64 cursors\n");
> + /* Check for which cursor types we support */
> + if (!((width == 64 && height == 64) ||
> + (width == 128 && height == 128 &&

Re: [PATCH] virtio-blk: make the queue depth the max supportable by the hypervisor

2014-03-16 Thread tytso
On Mon, Mar 17, 2014 at 11:12:15AM +1030, Rusty Russell wrote:
> 
> Note that with indirect descriptors (which is supported by Almost
> Everyone), we can actually use the full index, so this value is a bit
> pessimistic.  But it's OK as a starting point.

So is this something that can go upstream with perhaps a slight
adjustment in the commit description?  Do you think we need to be able
to dynamically adjust the queue depth after the module has been loaded
or the kernel has been booted?  If so, a hint from anyone about the best
way to do that would be much appreciated.
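
One possible shape for runtime adjustment (an untested sketch, assuming
the depth can actually be renegotiated with the device after load):

	static unsigned int queue_depth;

	static int queue_depth_set(const char *val, const struct kernel_param *kp)
	{
		/* validate and apply the new depth here before storing it */
		return param_set_uint(val, kp);
	}

	static const struct kernel_param_ops queue_depth_ops = {
		.set	= queue_depth_set,
		.get	= param_get_uint,
	};
	module_param_cb(queue_depth, &queue_depth_ops, &queue_depth, 0644);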

Thanks,

- Ted


linux-next: manual merge of the tip tree with the arm-soc tree

2014-03-16 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in
drivers/clocksource/Kconfig between commit 3f8e8cee2f4b ("clocksource:
qcom: Move clocksource code out of mach-msm") from the arm-soc tree and
commit fd3f1270d237 ("clocksource: Add Kconfig entries for CMT, MTU2, TMU
and STI") from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc drivers/clocksource/Kconfig
index 6510ec4f45ff,4f754a972139..
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@@ -141,5 -141,46 +141,49 @@@ config VF_PIT_TIME
help
  Support for Period Interrupt Timer on Freescale Vybrid Family SoCs.
  
 +config CLKSRC_QCOM
 +  bool
++
+ config SYS_SUPPORTS_SH_CMT
+ bool
+ 
+ config SYS_SUPPORTS_SH_MTU2
+ bool
+ 
+ config SYS_SUPPORTS_SH_TMU
+ bool
+ 
+ config SYS_SUPPORTS_EM_STI
+ bool
+ 
+ config SH_TIMER_CMT
+   bool "Renesas CMT timer driver" if COMPILE_TEST
+   default SYS_SUPPORTS_SH_CMT
+   help
+ This enables build of a clocksource and clockevent driver for
+ the Compare Match Timer (CMT) hardware available in 16/32/48-bit
+ variants on a wide range of Mobile and Automotive SoCs from Renesas.
+ 
+ config SH_TIMER_MTU2
+   bool "Renesas MTU2 timer driver" if COMPILE_TEST
+   default SYS_SUPPORTS_SH_MTU2
+   help
+ This enables build of a clockevent driver for the Multi-Function
+ Timer Pulse Unit 2 (MTU2) hardware available on SoCs from Renesas.
+ This hardware comes with 16 bit-timer registers.
+ 
+ config SH_TIMER_TMU
+   bool "Renesas TMU timer driver" if COMPILE_TEST
+   default SYS_SUPPORTS_SH_TMU
+   help
+ This enables build of a clocksource and clockevent driver for
+ the 32-bit Timer Unit (TMU) hardware available on a wide range
+ SoCs from Renesas.
+ 
+ config EM_TIMER_STI
+   bool "Renesas STI timer driver" if COMPILE_TEST
+   default SYS_SUPPORTS_EM_STI
+   help
+ This enables build of a clocksource and clockevent driver for
+ the 48-bit System Timer (STI) hardware available on SoCs
+ such as EMEV2 from former NEC Electronics.




linux-next: manual merge of the tip tree with the arm-soc tree

2014-03-16 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in
arch/arm/mach-zynq/Kconfig between commit ddb902cc3459 ("ARM: centralize
common multi-platform kconfig options") from the arm-soc tree and commit
cd325295871f ("arm: zynq: Add support for cpufreq") from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc arch/arm/mach-zynq/Kconfig
index 105d39b72a25,f03e75bd0b2b..
--- a/arch/arm/mach-zynq/Kconfig
+++ b/arch/arm/mach-zynq/Kconfig
@@@ -2,10 -2,19 +2,12 @@@ config ARCH_ZYN
bool "Xilinx Zynq ARM Cortex A9 Platform" if ARCH_MULTI_V7
select ARM_AMBA
select ARM_GIC
+   select ARCH_HAS_CPUFREQ
+   select ARCH_HAS_OPP
 -  select COMMON_CLK
 -  select CPU_V7
 -  select GENERIC_CLOCKEVENTS
select HAVE_ARM_SCU if SMP
select HAVE_ARM_TWD if SMP
select ICST
 -  select MIGHT_HAVE_CACHE_L2X0
 -  select USE_OF
 -  select HAVE_SMP
 -  select SPARSE_IRQ
select CADENCE_TTC_TIMER
-   select ARM_GLOBAL_TIMER
+   select ARM_GLOBAL_TIMER if !CPU_FREQ
help
  Support for Xilinx Zynq ARM Cortex A9 Platform




Re: [PATCHv2 3/8] devfreq: exynos4: Add ppmu's clock control and code clean about regulator control

2014-03-16 Thread Chanwoo Choi
Hi Tomasz,

On 03/17/2014 11:51 AM, Chanwoo Choi wrote:
> Hi Tomasz,
> 
> On 03/15/2014 02:42 AM, Tomasz Figa wrote:
>> Hi Chanwoo,
>>
>> On 13.03.2014 09:17, Chanwoo Choi wrote:
>>> There is no clock controller for ppmudmc0/1. This patch controls the clock
>>> of ppmudmc0/1, which is used for monitoring memory bus utilization.
>>>
>>> Also, this patch code clean about regulator control and free resource
>>> when calling exit/remove function.
>>>
>>> For example,
>>> busfreq@106A {
>>> compatible = "samsung,exynos4x12-busfreq";
>>>
>>> /* Clock for PPMUDMC0/1 */
>>> clocks = <&clock CLK_PPMUDMC0>, <&clock CLK_PPMUDMC1>;
>>> clock-names = "ppmudmc0", "ppmudmc1";
>>>
>>> /* Regulator for MIF/INT block */
>>> vdd_mif-supply = <&buck1_reg>;
>>> vdd_int-supply = <&buck3_reg>;
>>> };
>>>
>>> Signed-off-by: Chanwoo Choi 
>>> ---
>>>   drivers/devfreq/exynos/exynos4_bus.c | 114 
>>> ++-
>>>   1 file changed, 100 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/exynos/exynos4_bus.c 
>>> b/drivers/devfreq/exynos/exynos4_bus.c
>>> index 1a0effa..a2a3a47 100644
>>> --- a/drivers/devfreq/exynos/exynos4_bus.c
>>> +++ b/drivers/devfreq/exynos/exynos4_bus.c
>>> @@ -62,6 +62,11 @@ enum exynos_ppmu_idx {
>>>   PPMU_END,
>>>   };
>>>
>>> +static const char *exynos_ppmu_clk_name[] = {
>>> +[PPMU_DMC0]= "ppmudmc0",
>>> +[PPMU_DMC1]= "ppmudmc1",
>>> +};
>>> +
>>>   #define EX4210_LV_MAXLV_2
>>>   #define EX4x12_LV_MAXLV_4
>>>   #define EX4210_LV_NUM(LV_2 + 1)
>>> @@ -86,6 +91,7 @@ struct busfreq_data {
>>>   struct regulator *vdd_mif; /* Exynos4412/4212 only */
>>>   struct busfreq_opp_info curr_oppinfo;
>>>   struct exynos_ppmu ppmu[PPMU_END];
>>> +struct clk *clk_ppmu[PPMU_END];
>>>
>>>   struct notifier_block pm_notifier;
>>>   struct mutex lock;
>>> @@ -722,8 +728,26 @@ static int exynos4_bus_get_dev_status(struct device 
>>> *dev,
>>>   static void exynos4_bus_exit(struct device *dev)
>>>   {
>>>   struct busfreq_data *data = dev_get_drvdata(dev);
>>> +int i;
>>> +
>>> +/*
>>> + * Un-map memory map and disable regulator/clocks
>>> + * to prevent power leakage.
>>> + */
>>> +regulator_disable(data->vdd_int);
>>> +if (data->type == TYPE_BUSF_EXYNOS4x12)
>>> +regulator_disable(data->vdd_mif);
>>> +
>>> +for (i = 0; i < PPMU_END; i++) {
>>> +if (data->clk_ppmu[i])
>>
>> This check is invalid. Clock pointers must be checked for validity using the 
>> IS_ERR() macro, because NULL is a valid clock pointer value indicating a 
>> dummy clock.
> 
> OK, I'll check it by using the IS_ERR() macro as following:
> 

I'll modify it as follows:

for (i = 0; i < PPMU_END; i++) {
	if (IS_ERR(data->clk_ppmu[i]))
		continue;

	clk_disable_unprepare(data->clk_ppmu[i]);
}


>   if (IS_ERR(data->clk_ppmu[i])) {
> 
> 
>>
>>> +clk_disable_unprepare(data->clk_ppmu[i]);
>>> +}
>>>
>>> -devfreq_unregister_opp_notifier(dev, data->devfreq);
>>> +for (i = 0; i < PPMU_END; i++) {
>>> +if (data->ppmu[i].hw_base)
>>
>> Can this even happen? Is there a PPMU without registers?

OK, I'll always unmap the ppmu address.

>>
>>> +iounmap(data->ppmu[i].hw_base);
>>> +
>>> +}
>>>   }
>>>
>>>   static struct devfreq_dev_profile exynos4_devfreq_profile = {
>>> @@ -987,6 +1011,7 @@ static int exynos4_busfreq_parse_dt(struct 
>>> busfreq_data *data)
>>>   {
>>>   struct device *dev = data->dev;
>>>   struct device_node *np = dev->of_node;
>>> +const char **clk_name = exynos_ppmu_clk_name;
>>>   int i, ret;
>>>
>>>   if (!np) {
>>> @@ -1005,8 +1030,70 @@ static int exynos4_busfreq_parse_dt(struct 
>>> busfreq_data *data)
>>>   }
>>>   }
>>>
>>> +/*
>>> + * Get PPMU's clocks to control them. But, if PPMU's clocks
>>> + * is default 'pass' state, this driver don't need control
>>> + * PPMU's clock.
>>> + */
>>> +for (i = 0; i < PPMU_END; i++) {
>>> +data->clk_ppmu[i] = devm_clk_get(dev, clk_name[i]);
>>> +if (IS_ERR_OR_NULL(data->clk_ppmu[i])) {
>>
>> Again, this check is invalid. Only IS_ERR() is the correct way to check 
>> whether returned clock pointer is valid.
> 
> ditto.
>   if (IS_ERR(data->clk_ppmu[i])) {
> 
>>
>>> +dev_warn(dev, "Cannot get %s clock\n", clk_name[i]);
>>> +data->clk_ppmu[i] = NULL;
>>
>> This assignment is wrong. To allow further checking whether the clock was 
>> found the value returned from devm_clk_get() must be retained and then 
>> IS_ERR() used in further code.
>>
>> However, I believe it should be an error if a clock is not provided. The 
>> driver must make sure that PPMU clocks are ungated before trying to access 
>> them, otherwise the system might hang.
> 
> OK, I'll use IS_ERR() macro when checki

linux-next: manual merge of the tip tree with the renesas tree

2014-03-16 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in
arch/arm/mach-shmobile/Kconfig between commit 4a51856b4267 ("ARM:
shmobile: Use 64-bit dma_addr_t on r8a7790/r8a7791") from the renesas
tree and commit aeb8fb7910fc ("ARM: shmobile: Remove CMT, TMU and STI
Kconfig entries") from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc arch/arm/mach-shmobile/Kconfig
index 5249ff0511a8,f6db7dcae3f4..
--- a/arch/arm/mach-shmobile/Kconfig
+++ b/arch/arm/mach-shmobile/Kconfig
@@@ -116,7 -137,7 +130,8 @@@ config ARCH_R8A779
select MIGHT_HAVE_PCI
select SH_CLK_CPG
select RENESAS_IRQC
 +  select ARCH_DMA_ADDR_T_64BIT if ARM_LPAE
+   select SYS_SUPPORTS_SH_CMT
  
  config ARCH_R8A7791
bool "R-Car M2 (R8A77910)"
@@@ -126,7 -147,7 +141,8 @@@
select MIGHT_HAVE_PCI
select SH_CLK_CPG
select RENESAS_IRQC
 +  select ARCH_DMA_ADDR_T_64BIT if ARM_LPAE
+   select SYS_SUPPORTS_SH_CMT
  
  config ARCH_EMEV2
bool "Emma Mobile EV2"




[RFC PATCH v2] edac: synopsys: Added EDAC support for zynq ddr ecc controller

2014-03-16 Thread Punnaiah Choudary Kalluri
Added EDAC support for reporting the ECC errors of the Synopsys DDR controller.
The DDR ECC controller corrects single bit errors and detects double bit
errors.

Signed-off-by: Punnaiah Choudary Kalluri 
---
Changes for v2:
- Updated the commit header and message
- Renamed the filenames to synopsys_edac
- Corrected the compatible string, comments
- Renamed the macros, functions and data structures
---
 .../devicetree/bindings/edac/synopsys_edac.txt |   18 +
 drivers/edac/Kconfig   |7 +
 drivers/edac/Makefile  |1 +
 drivers/edac/synopsys_edac.c   |  614 
 4 files changed, 640 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/edac/synopsys_edac.txt
 create mode 100644 drivers/edac/synopsys_edac.c

diff --git a/Documentation/devicetree/bindings/edac/synopsys_edac.txt 
b/Documentation/devicetree/bindings/edac/synopsys_edac.txt
new file mode 100644
index 000..c4a559b
--- /dev/null
+++ b/Documentation/devicetree/bindings/edac/synopsys_edac.txt
@@ -0,0 +1,18 @@
+Synopsys EDAC driver. It reports the DDR ECC single bit errors that are
+corrected and double bit ECC errors that are detected by the DDR ECC controller.
+ECC support for DDR is available in half-bus width(16 bit) configuration only.
+
+Required properties:
+- compatible: Should be "xlnx,zynq-ddrc-1.04"
+- reg: Should contain DDR controller registers location and length.
+
+Example:
+
+
+ddrc0: ddrc@f8006000 {
+   compatible = "xlnx,zynq-ddrc-1.04";
+   reg = <0xf8006000 0x1000>;
+};
+
+Synopsys EDAC driver detects the DDR ECC enable state by reading the appropriate
+control register.
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 878f090..58b69b1 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -368,4 +368,11 @@ config EDAC_OCTEON_PCI
  Support for error detection and correction on the
  Cavium Octeon family of SOCs.
 
+config EDAC_SYNOPSYS
+   tristate "Synopsys DDR Memory Controller"
+   depends on EDAC_MM_EDAC && ARCH_ZYNQ
+   help
+ This enables support for EDAC on the ECC memory used
+ with the Synopsys DDR memory controller.
+
 endif # EDAC
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 4154ed6..5628a6f 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -64,3 +64,4 @@ obj-$(CONFIG_EDAC_OCTEON_PC)  += octeon_edac-pc.o
 obj-$(CONFIG_EDAC_OCTEON_L2C)  += octeon_edac-l2c.o
 obj-$(CONFIG_EDAC_OCTEON_LMC)  += octeon_edac-lmc.o
 obj-$(CONFIG_EDAC_OCTEON_PCI)  += octeon_edac-pci.o
+obj-$(CONFIG_EDAC_SYNOPSYS)+= synopsys_edac.o
diff --git a/drivers/edac/synopsys_edac.c b/drivers/edac/synopsys_edac.c
new file mode 100644
index 000..7cec331
--- /dev/null
+++ b/drivers/edac/synopsys_edac.c
@@ -0,0 +1,614 @@
+/*
+ * Synopsys DDR ECC Driver
+ * This driver is based on ppc4xx_edac.c drivers
+ *
+ * Copyright (C) 2012 - 2014 Xilinx, Inc.
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+
+#include "edac_core.h"
+
+/* Number of cs_rows needed per memory controller */
+#define SYNOPSYS_EDAC_NR_CSROWS1
+
+/* Number of channels per memory controller */
+#define SYNOPSYS_EDAC_NR_CHANS 1
+
+/* Granularity of reported error in bytes */
+#define SYNOPSYS_EDAC_ERROR_GRAIN  1
+
+#define SYNOPSYS_EDAC_MESSAGE_SIZE 256
+
+/* Synopsys DDR memory controller registers that are relevant to ECC */
+#define SYNOPSYS_DDRC_CONTROL_REG_OFFSET   0x0 /* Control register */
+#define SYNOPSYS_DDRC_T_ZQ_REG_OFFSET  0xA4 /* ZQ register */
+
+/* ECC control register */
+#define SYNOPSYS_DDRC_ECC_CONTROL_REG_OFFSET   0xC4
+/* ECC log register */
+#define SYNOPSYS_DDRC_ECC_CE_LOG_REG_OFFSET0xC8
+/* ECC address register */
+#define SYNOPSYS_DDRC_ECC_CE_ADDR_REG_OFFSET   0xCC
+/* ECC data[31:0] register */
+#define SYNOPSYS_DDRC_ECC_CE_DATA_31_0_REG_OFFSET  0xD0
+
+/* Uncorrectable error info registers */
+#define SYNOPSYS_DDRC_ECC_UE_LOG_REG_OFFSET0xDC /* ECC log register */
+#define SYNOPSYS_DDRC_ECC_UE_ADDR_REG_OFFSET   0xE0 /* ECC address register */
+#define SYNOPSYS_DDRC_ECC_UE_DATA_31_0_REG_OFFSET  0xE4 /* ECC data reg */
+
+#define SYNOPSYS_DDRC_ECC_STAT_REG_OFFSET  0xF0 /* ECC stats 

Re: [PATCHv2 8/8] devfreq: exynos4: Add busfreq driver for exynos4210/exynos4x12

2014-03-16 Thread Chanwoo Choi
Hi Tomasz,

On 03/15/2014 02:35 AM, Tomasz Figa wrote:
> Hi Chanwoo, Mark,
> 
> On 14.03.2014 11:56, Chanwoo Choi wrote:
>> Hi Mark,
>>
>> On 03/14/2014 07:35 PM, Mark Rutland wrote:
>>> On Fri, Mar 14, 2014 at 07:14:37AM +, Chanwoo Choi wrote:
 Hi Mark,

 On 03/14/2014 02:53 AM, Mark Rutland wrote:
> On Thu, Mar 13, 2014 at 08:17:29AM +, Chanwoo Choi wrote:
>> This patch add busfreq driver for Exynos4210/Exynos4x12 memory interface
>> and bus to support DVFS(Dynamic Voltage Frequency Scaling) according to 
>> PPMU
>> counters. PPMU (Performance Profiling Monitorings Units) of Exynos4 SoC 
>> provides
>> PPMU counters for DMC(Dynamic Memory Controller) to check memory bus 
>> utilization
>> and then busfreq driver adjusts dynamically the operating 
>> frequency/voltage
>> by using DEVFREQ Subsystem.
>>
>> Signed-off-by: Chanwoo Choi 
>> ---
>>   .../devicetree/bindings/devfreq/exynos4_bus.txt| 49 
>> ++
>>   1 file changed, 49 insertions(+)
>>   create mode 100644 
>> Documentation/devicetree/bindings/devfreq/exynos4_bus.txt
>>
>> diff --git a/Documentation/devicetree/bindings/devfreq/exynos4_bus.txt 
>> b/Documentation/devicetree/bindings/devfreq/exynos4_bus.txt
>> new file mode 100644
>> index 000..2a83fcc
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/devfreq/exynos4_bus.txt
>> @@ -0,0 +1,49 @@
>> +
>> +Exynos4210/4x12 busfreq driver
>> +-
>> +
>> +Exynos4210/4x12 Soc busfreq driver with devfreq for Memory bus 
>> frequency/voltage
>> +scaling according to PPMU counters of memory controllers
>> +
>> +Required properties:
>> +- compatible: should contain Exynos4 SoC type as follwoing:
>> +  - "samsung,exynos4x12-busfreq" for Exynos4x12
>> +  - "samsung,exynos4210-busfreq" for Exynos4210
>
> Is there a device called "busfreq"? What device does this binding
> describe?

 I'll add a detailed description of busfreq as follows:

 "busfreq(bus frequendcy)" driver means that busfreq driver control 
 dynamically
 memory bus frequency/voltage by checking memory bus utilization to optimize
 power-consumption. When checking memeory bus utilization, exynos4_busfreq 
 driver
 would use PPMU(Performance Profiling Monitoring Units).
>>>
>>> This still sounds like a description of the _driver_, not the _device_.
>>> The binding should describe the hardware, now the high level abstraction
>>> that software is going to build atop of it.
>>>
>>> It sounds like this is a binding for the DMC PPMU?
>>>
>>> Is the PPMU a component of the DMC, or is it bolted on the side?
>>
>> PPMU (Performance Profiling Monitoring Unit) is used to profile performance
>> events of various IPs on Exynos4. Each PPMU provides performance events for
>> its IP.
>> We can check various PPMU as following:
>>
>> PPMU_3D
>> PPMU_ACP
>> PPMU_CAMIF
>> PPMU_CPU
>> PPMU_DMC0
>> PPMU_DMC1
>> PPMU_FSYS
>> PPMU_IMAGE
>> PPMU_LCD0
>> PPMU_LCD1
>> PPMU_MFC_L
>> PPMU_MFC_R
>> PPMU_TV
>> PPMU_LEFT_BUS
>> PPMU_RIGHT_BUS
>>
>> DMC (Dynamic Memory Controller) controls the operation of DRAM in the
>> Exynos4 SoC. If we need the memory bus utilization of the DMC, we can get
>> it from PPMU_DMC0/PPMU_DMC1.
>>
>> So, Exynos4's busfreq uses two (PPMU_DMC0/PPMU_DMC1) of the various PPMUs
>> listed above.
> 
> Well, PPMUs and DMCs are separate hardware blocks found inside Exynos SoCs. 
> Busfreq/devfreq is just a Linux-specific abstraction responsible for 
> collecting data using PPMUs and controlling frequencies and voltages of 
> appropriate power planes, vdd_int responsible for powering DMC0 and DMC1 
> blocks in this case.
> 

I knew already.

> I'm afraid that the binding you're proposing is unfortunately incorrect, 
> because it represents the software abstraction, not the real hardware.

What exactly is the incorrect part of this patch?

> 
> Instead, this should be separated into several independent bindings:
> 
>  - PPMU bindings to list all the PPMU instances present in the SoC and 
> resources they need,
> 
>  - power plane bindings, which define a power plane in which multiple IP 
> blocks might reside, can be monitored by one or more PPMU units and frequency 
> and voltage of which can be configured according to determined performance 
> level. Needed resources will be clocks and regulators to scale and probably 
> also operating points.
> 
> Then, exynos-busfreq driver should bind to such power planes, parse necessary 
> data from DT (list of PPMUs and IP blocks, clocks, regulators and operating 
> points) and register a devfreq entity.

What is a 'power plane'? I'm not familiar with the 'power plane' concept.
If you propose the 'power plane' concept and it gets merged to mainline,
then I will apply the 'power plane' concept to Exynos4's b

Re: [PATCHv2 4/8] devfreq: exynos4: Fix bug of resource leak and code clean on probe()

2014-03-16 Thread Chanwoo Choi
Hi Tomasz,

On 03/15/2014 02:49 AM, Tomasz Figa wrote:
> Hi Chanwoo,
> 
> On 13.03.2014 09:17, Chanwoo Choi wrote:
>> This patch fix bug about resource leak when happening probe fail and code 
>> clean
>> to add debug message.
>>
>> Signed-off-by: Chanwoo Choi 
>> ---
>>   drivers/devfreq/exynos/exynos4_bus.c | 32 ++--
>>   1 file changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/devfreq/exynos/exynos4_bus.c 
>> b/drivers/devfreq/exynos/exynos4_bus.c
>> index a2a3a47..152a3e9 100644
>> --- a/drivers/devfreq/exynos/exynos4_bus.c
>> +++ b/drivers/devfreq/exynos/exynos4_bus.c
>> @@ -1152,8 +1152,11 @@ static int exynos4_busfreq_probe(struct 
>> platform_device *pdev)
>>   dev_err(dev, "Cannot determine the device id %d\n", data->type);
>>   err = -EINVAL;
>>   }
>> -if (err)
>> +if (err) {
>> +dev_err(dev, "Cannot initialize busfreq table %d\n",
>> + data->type);
>>   return err;
>> +}
>>
>>   rcu_read_lock();
>>   opp = dev_pm_opp_find_freq_floor(dev,
>> @@ -1176,7 +1179,7 @@ static int exynos4_busfreq_probe(struct 
>> platform_device *pdev)
>>   if (IS_ERR(data->devfreq)) {
>>   dev_err(dev, "Failed to add devfreq device\n");
>>   err = PTR_ERR(data->devfreq);
>> -goto err_opp;
>> +goto err_devfreq;
>>   }
>>
>>   /*
>> @@ -1185,18 +1188,35 @@ static int exynos4_busfreq_probe(struct 
>> platform_device *pdev)
>>*/
>>   busfreq_mon_reset(data);
>>
>> -devfreq_register_opp_notifier(dev, data->devfreq);
>> +/* Register opp_notifier for Exynos4 busfreq */
>> +err = devfreq_register_opp_notifier(dev, data->devfreq);
>> +if (err < 0) {
>> +dev_err(dev, "Failed to register opp notifier\n");
>> +goto err_notifier_opp;
>> +}
>>
>> +/* Register pm_notifier for Exynos4 busfreq */
>>   err = register_pm_notifier(&data->pm_notifier);
>>   if (err) {
>>   dev_err(dev, "Failed to setup pm notifier\n");
>> -devfreq_remove_device(data->devfreq);
>> -return err;
>> +goto err_notifier_pm;
>>   }
>>
>>   return 0;
>>
>> -err_opp:
>> +err_notifier_pm:
>> +devfreq_unregister_opp_notifier(dev, data->devfreq);
>> +err_notifier_opp:
>> +/*
>> + * The devfreq_remove_device() would execute finally devfreq->profile
>> + * ->exit(). To avoid duplicate resource free operation, return directly
>> + * before executing resource free below 'err_devfreq' goto statement.
>> + */
> 
> I'm not quite sure about this. I believe that in this case 
> devfreq->profile->exit() would be exynos4_bus_exit() and all it does is 
> devfreq_unregister_opp_notifier(dev, data->devfreq), so all remaining 
> resources (regulators, clocks, etc.) would get leaked.

This patch executes the following sequence to probe exynos4_busfreq.c:

1. Parse the dt node to get resources (regulator/clock/memory address).
2. Enable the regulator/clock and map memory.
3. Add the devfreq device using devfreq_add_device().
   devfreq_add_device() returns a devfreq instance (data->devfreq).
4. Register the opp_notifier using the devfreq instance (data->devfreq)
   created in step #3.

Case 1:
If an error happens in step #3, in devfreq_add_device(),

this case can't execute devfreq->profile->exit() to free resources,
because devfreq->profile was never registered on the devfreq list.

So we must jump to the 'err_devfreq' label to free the resources
(regulator/clock/memory).


Case 2:
If an error happens in step #4, when registering the opp_notifier,

this case, in contrast, can execute devfreq->profile->exit() to free
resources. But after devfreq->profile->exit() has run, the 'err_devfreq'
label must not be executed, otherwise the resources would be freed twice.

If my explanation is wrong, please reply with your comments.
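
In code form, the intended unwind ordering is roughly (a sketch of the
shape only, not the exact patch):

err_notifier_pm:
	devfreq_unregister_opp_notifier(dev, data->devfreq);
err_notifier_opp:
	/* devfreq_remove_device() ends up running profile->exit(),
	 * so return here instead of falling through to err_devfreq */
	devfreq_remove_device(data->devfreq);
	return err;
err_devfreq:
	/* devfreq_add_device() itself failed: free the regulator/clock/
	 * iomap resources directly */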

> 
> I believe the correct thing to do would be to remove the .exit() callback 
> from exynos4_devfreq_profile struct and handle all the clean-up here in error 
> path.
> 

Best Regards,
Chanwoo Choi







Re: [PATCH] virtio-blk: make the queue depth configurable

2014-03-16 Thread Joe Perches
On Mon, 2014-03-17 at 14:25 +1030, Rusty Russell wrote:

> Erk, our tests are insufficient.  Testbuilding an allmodconfig with this
> now:

Good idea.

> diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
[]
> @@ -188,6 +188,9 @@ struct kparam_array
>   /* Default value instead of permissions? */ \
>   static int __param_perm_check_##name __attribute__((unused)) =  \
>   BUILD_BUG_ON_ZERO((perm) < 0 || (perm) > 0777 || ((perm) & 2))  \
> + /* User perms >= group perms >= other perms. */ \
> + + BUILD_BUG_ON_ZERO(((perm) >> 6) < (((perm) >> 3) & 7))\
> + + BUILD_BUG_ON_ZERO((((perm) >> 3) & 7) < ((perm) & 7)) \
>   + BUILD_BUG_ON_ZERO(sizeof(""prefix) > MAX_PARAM_PREFIX_LEN);   \
>   static const char __param_str_##name[] = prefix #name;  \
>   static struct kernel_param __moduleparam_const __param_##name   \

It might make sense to separate this octal permissions
test into a new macro for other checks in macros like
CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2.

Maybe something like:

#define VERIFY_OCTAL_PERMISSIONS(perm)  \
static int __param_perm_check_##name __attribute__((unused)) =  \
BUILD_BUG_ON_ZERO((perm) < 0 || (perm) > 0777 || ((perm) & 2))  \
/* User perms >= group perms >= other perms. */ \
+ BUILD_BUG_ON_ZERO(((perm) >> 6) < (((perm) >> 3) & 7))\
+ BUILD_BUG_ON_ZERO((((perm) >> 3) & 7) < ((perm) & 7));\
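
As written the body still carries the ##name reference from the
moduleparam copy, though, so for the *_ATTR cases an expression-style
variant that simply yields perm back might be easier to drop in (an
untested sketch):

#define VERIFY_OCTAL_PERMISSIONS(perm)					\
	(BUILD_BUG_ON_ZERO((perm) < 0 || (perm) > 0777 || ((perm) & 2)) + \
	 /* User perms >= group perms >= other perms. */		\
	 BUILD_BUG_ON_ZERO(((perm) >> 6) < (((perm) >> 3) & 7)) +	\
	 BUILD_BUG_ON_ZERO((((perm) >> 3) & 7) < ((perm) & 7)) +	\
	 (perm))

e.g. __ATTR(foo, VERIFY_OCTAL_PERMISSIONS(0644), foo_show, foo_store).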



[Question] Linux CFS sched_entity

2014-03-16 Thread lwcheng

Hi,

I have been studying Linux CFS recently. After numerous searches, I am still
unable to resolve the following question, so I have finally decided to bother
this group for the answer.

In group scheduling, sched_entity can represent both "task group" and "task",
indicated by "se->my_q".
When CFS tries to pick a new task, it first selects a "task group" and then
selects a "task" from that "task group":
--
static struct task_struct *pick_next_task_fair(struct rq *rq)
{
struct task_struct *p;
struct cfs_rq *cfs_rq = &rq->cfs;
struct sched_entity *se;

... ...
do {
se = pick_next_entity(cfs_rq);
set_next_entity(cfs_rq, se);
cfs_rq = group_cfs_rq(se);
} while (cfs_rq);

p = task_of(se);
... ...
}
--
Therefore, cfs_rq->curr should *always* represent a "task" here, right?

However, cfs_rq->curr seems *not* to be a "task" sometimes:
--
static void update_curr(struct cfs_rq *cfs_rq)
{
struct sched_entity *curr = cfs_rq->curr;

... ...
if (entity_is_task(curr)) {  /* WHY? */
struct task_struct *curtask = task_of(curr);

trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);
cpuacct_charge(curtask, delta_exec);
account_group_exec_runtime(curtask, delta_exec);
}
... ...
}
--
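
(For reference, entity_is_task() reduces to a my_q check when group
scheduling is enabled; a sketch of the definition from the scheduler
headers of this era:)

#ifdef CONFIG_FAIR_GROUP_SCHED
/* An entity is a task if it doesn't "own" a runqueue */
#define entity_is_task(se)	(!se->my_q)
#else
#define entity_is_task(se)	1
#endif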

My question is:
In what situations does cfs_rq->curr point to a task_group?
What's the sense? I thought it should always contain a "real" task.

Any help would be appreciated. Thanks in advance!

Regards,
Luwei Cheng
--
CS Student
The University of Hong Kong


Re: [PATCH 1/5] perf tests: Add tip/pid mmap automated tests

2014-03-16 Thread Namhyung Kim
Hi Jiri,

On Fri, 14 Mar 2014 15:00:02 +0100, Jiri Olsa wrote:
> +static int thread_init(struct thread_data *td)
> +{
> + void *map;
> +
> + map = mmap(NULL, page_size, PROT_READ|PROT_WRITE,
> +MAP_SHARED|MAP_ANONYMOUS, -1, 0);

Shouldn't it be an executable mapping to be found by MAP__FUNCTION?
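
i.e. presumably something like this (sketch):

	map = mmap(NULL, page_size, PROT_READ|PROT_WRITE|PROT_EXEC,
		   MAP_SHARED|MAP_ANONYMOUS, -1, 0);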

Thanks,
Namhyung


Re: [PATCH] ARM: zynq: Add OCM driver

2014-03-16 Thread Olof Johansson
On Wed, Mar 12, 2014 at 01:00:51PM +0100, Michal Simek wrote:
> Hi Olof,
> 
> >> diff --git a/Documentation/devicetree/bindings/arm/zynq/xlnx,zynq-ocm.txt 
> >> b/Documentation/devicetree/bindings/arm/zynq/xlnx,zynq-ocm.txt
> >> new file mode 100644
> >> index 000..64cb5e8
> >> --- /dev/null
> >> +++ b/Documentation/devicetree/bindings/arm/zynq/xlnx,zynq-ocm.txt
> >> @@ -0,0 +1,17 @@
> >> +Device tree bindings for Zynq's OCM
> >> +
> >> +The OCM is divided into four 64kB segments which can be separately configured
> >> +to low or high location. Location is controlled via SLCR.
> >> +
> >> +Required properties:
> >> + compatible: Compatibility string. Must be "xlnx,zynq-ocm-1.0".
> >> + reg: Specify the base and size of the OCM registers in the memory map.
> >> +  E.g.: reg = <0xf800c000 0x1000>;
> >> +
> >> +Example:
> >> +ocmc: ocmc@f800c000 {
> >> +   compatible =  "xlnx,zynq-ocm-1.0";
> >> +   interrupt-parent = <&intc>;
> >> +   interrupts = <0 3 4>;
> >> +   reg = <0xf800c000 0x1000>;
> >> +} ;
> >> diff --git a/arch/arm/boot/dts/zynq-7000.dtsi 
> >> b/arch/arm/boot/dts/zynq-7000.dtsi
> >> index 1d942e2..4929be5 100644
> >> --- a/arch/arm/boot/dts/zynq-7000.dtsi
> >> +++ b/arch/arm/boot/dts/zynq-7000.dtsi
> >> @@ -66,6 +66,13 @@
> >> cache-level = <2>;
> >> };
> >>
> >> +   ocmc: ocmc@f800c000 {
> >> +   compatible = "xlnx,zynq-ocm-1.0";
> >> +   interrupt-parent = <&intc>;
> >> +   interrupts = <0 3 4>;
> >> +   reg = <0xf800c000 0x1000>;
> >> +   } ;
> >> +
> >> uart0: uart@e000 {
> >> compatible = "xlnx,xuartps";
> >> status = "disabled";
> >> diff --git a/arch/arm/mach-zynq/Kconfig b/arch/arm/mach-zynq/Kconfig
> >> index 323e505..f3e6ce4 100644
> >> --- a/arch/arm/mach-zynq/Kconfig
> >> +++ b/arch/arm/mach-zynq/Kconfig
> >> @@ -15,5 +15,6 @@ config ARCH_ZYNQ
> >> select CADENCE_TTC_TIMER
> >> select ARM_GLOBAL_TIMER
> >> select MFD_SYSCON
> >> +   select GENERIC_ALLOCATOR
> >> help
> >>   Support for Xilinx Zynq ARM Cortex A9 Platform
> >> diff --git a/arch/arm/mach-zynq/Makefile b/arch/arm/mach-zynq/Makefile
> >> index 1b25d92..626f64b 100644
> >> --- a/arch/arm/mach-zynq/Makefile
> >> +++ b/arch/arm/mach-zynq/Makefile
> >> @@ -3,7 +3,7 @@
> >>  #
> >>
> >>  # Common support
> >> -obj-y  := common.o slcr.o
> >> +obj-y  := common.o slcr.o zynq_ocm.o
> >>  CFLAGS_REMOVE_hotplug.o=-march=armv6k
> >>  CFLAGS_hotplug.o   =-Wa,-march=armv7-a -mcpu=cortex-a9
> >>  obj-$(CONFIG_HOTPLUG_CPU)  += hotplug.o
> >> diff --git a/arch/arm/mach-zynq/common.h b/arch/arm/mach-zynq/common.h
> >> index b097844..953f6a1 100644
> >> --- a/arch/arm/mach-zynq/common.h
> >> +++ b/arch/arm/mach-zynq/common.h
> >> @@ -24,6 +24,7 @@ extern int zynq_early_slcr_init(void);
> >>  extern void zynq_slcr_system_reset(void);
> >>  extern void zynq_slcr_cpu_stop(int cpu);
> >>  extern void zynq_slcr_cpu_start(int cpu);
> >> +extern u32 zynq_slcr_get_ocm_config(void);
> >>
> >>  #ifdef CONFIG_SMP
> >>  extern void secondary_startup(void);
> >> diff --git a/arch/arm/mach-zynq/slcr.c b/arch/arm/mach-zynq/slcr.c
> >> index c1f1499..9a37ab3 100644
> >> --- a/arch/arm/mach-zynq/slcr.c
> >> +++ b/arch/arm/mach-zynq/slcr.c
> >> @@ -26,6 +26,7 @@
> >>  #define SLCR_PS_RST_CTRL_OFFSET0x200 /* PS Software Reset 
> >> Control */
> >>  #define SLCR_A9_CPU_RST_CTRL_OFFSET0x244 /* CPU Software Reset 
> >> Control */
> >>  #define SLCR_REBOOT_STATUS_OFFSET  0x258 /* PS Reboot Status */
> >> +#define SLCR_OCM_CFG_OFFSET0x910 /* OCM Address Mapping */
> >>
> >>  #define SLCR_UNLOCK_MAGIC  0xDF0D
> >>  #define SLCR_A9_CPU_CLKSTOP0x10
> >> @@ -107,6 +108,20 @@ void zynq_slcr_system_reset(void)
> >>  }
> >>
> >>  /**
> >> + * zynq_slcr_get_ocm_config - Get SLCR OCM config
> >> + *
> >> + * return: OCM config bits
> >> + */
> >> +u32 zynq_slcr_get_ocm_config(void)
> >> +{
> >> +   u32 val;
> >> +
> >> +   zynq_slcr_read(&val, SLCR_OCM_CFG_OFFSET);
> >> +
> >> +   return val;
> >> +}
> >> +
> >> +/**
> >>   * zynq_slcr_cpu_start - Start cpu
> >>   * @cpu:   cpu number
> >>   */
> >> diff --git a/arch/arm/mach-zynq/zynq_ocm.c b/arch/arm/mach-zynq/zynq_ocm.c
> >> new file mode 100644
> >> index 000..034a65b
> >> --- /dev/null
> >> +++ b/arch/arm/mach-zynq/zynq_ocm.c
> >> @@ -0,0 +1,243 @@
> >> +/*
> >> + * Copyright (C) 2013 Xilinx
> >> + *
> >> + * Based on "Generic on-chip SRAM allocation driver"
> > 
> > We're not adding new drivers under arch/arm, so if you need this
> > driver then you should either merge it under drivers/ somewhere, or
> > look at extending the generic driver in a way that you can reuse it.
> 
> Driver i

Re: [PATCH] dma: Add Keystone Packet DMA Engine driver

2014-03-16 Thread Vinod Koul
On Thu, Mar 13, 2014 at 05:16:52AM +0800, Santosh Shilimkar wrote:
> On Thursday 13 March 2014 12:00 AM, Vinod Koul wrote:
> > On Wed, Mar 12, 2014 at 03:50:32AM +0800, Santosh Shilimkar wrote:
> >> On Tuesday 11 March 2014 06:23 PM, Vinod Koul wrote:
> >>> On Fri, Feb 28, 2014 at 05:56:40PM -0500, Santosh Shilimkar wrote:
>  From: Sandeep Nair 
> 
>  The Packet DMA driver sets up the DMA channels and flows for the
>  QMSS (Queue Manager SubSystem), which triggers the actual data movements
>  across clients using destination queues. Every client module, such as the
>  NETCP (Network Coprocessor), SRIO (Serial Rapid IO) and CRYPTO
>  engines, has its own instance of packet DMA hardware. QMSS also has
>  an internal packet DMA module which is used as an infrastructure
>  DMA with zero copy.
> 
>  Patch adds DMAEngine driver for Keystone Packet DMA hardware.
>  The specifics on the device tree bindings for Packet DMA can be
>  found in:
>   Documentation/devicetree/bindings/dma/keystone-pktdma.txt
> 
>  The driver implements the configuration functions using standard 
>  DMAEngine
>  apis. The data movement is managed by QMSS device driver.
> >>> Please use a subsystem-appropriate name, so here it would have been
> >>> dmaengine: ...
> >>>
> >>> So I am still missing stuff like prepare calls, irqs, and descriptor
> >>> management to call this a dmaengine driver.
> >>>
> >>> I guess you need to explain a bit more why the data movement is handled 
> >>> by some
> >>> other driver and not by this one
> >>>
> >> To expand on the above statement, the Packet DMA hardware blocks on
> >> Keystone SOCs are DMA engines. QMSS is a centralised subsystem that
> >> manages multiple functionalities, including triggering DMA transfers,
> >> descriptor management and completion irqs. There is a separate instance
> >> of the packet DMA hardware block per client device. We program the DMA
> >> hardware to allocate channels and flows, so the packet DMA resources
> >> like DMA channels and DMA flows are configured and managed through the
> >> DMAEngine driver. That's why we implement only the
> >> device_alloc_chan_resources, device_free_chan_resources and
> >> device_control DMAEngine APIs.
> > Sorry, I am a bit lost. If it's a dmaengine then you still need to program
> > the transfers; how does that part work?
> >
> To simplify this a bit more: DMA channels and flows are allocated, and DMA
> channels are enabled, by the DMA engine, and they remain enabled as long as
> the channel is in use. Enabling a DMA channel doesn't actually start the
> DMA transfer; it just sets up the connection/pipe between peripheral and
> memory and vice versa.
> 
> All the descriptor management, triggering, and sending of completion
> interrupts or hardware signals to the DMA engine are managed by the
> centralised QMSS.
> 
> The actual copy of data is still done by the DMA hardware, but it is
> completely transparent to software. The DMA hardware takes care of it in
> the background.
So you will use the dmaengine just for setting up the controller. Not for actual
transfers. Those would be governed by the QMSS, right?

This means that someone expecting to use the dmaengine API will get confused,
doing one part (alloc) through dmaengine and the rest (transfers) using some
other API. This brings me to the design approach: does it really make sense
to create a dmaengine driver for this when we are not fully complying with
the API?
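
A concrete way to picture the split being questioned here: a
configuration-only dmaengine driver registers just the three callbacks
Santosh lists, and nothing else. The sketch below uses hypothetical kdma_*
names (it is not the driver under review); note the absence of any
device_prep_*() or device_issue_pending() hooks:

#include <linux/dmaengine.h>

static int kdma_alloc_chan_resources(struct dma_chan *chan)
{
	/* program the channel/flow registers; the channel then stays
	 * enabled for as long as it is in use */
	return 0;
}

static void kdma_free_chan_resources(struct dma_chan *chan)
{
	/* tear down the channel/flow configuration */
}

static int kdma_device_control(struct dma_chan *chan,
			       enum dma_ctrl_cmd cmd, unsigned long arg)
{
	/* channel configuration commands only */
	return -ENOSYS;
}

static void kdma_setup_ops(struct dma_device *dma)
{
	dma->device_alloc_chan_resources = kdma_alloc_chan_resources;
	dma->device_free_chan_resources = kdma_free_chan_resources;
	dma->device_control = kdma_device_control;
	/* no prep/issue callbacks: transfers are triggered by QMSS */
}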

-- 
~Vinod

> 
> 
>  +#define BITS(x) (BIT(x) - 1)
> >>> this might get confusing, perhaps a better name could be given?
> >>>
> >> Would "BIT_MASK" be ok with you ?
> > something which would imply it's x - 1; I am not really good with names,
> > so no suggestions :)
> > 
> me too. BIT_INVERT_MASK() ?
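
(For reference, <linux/bitops.h> already defines BIT_MASK(nr) as a
single-bit mask, (1UL << ((nr) % BITS_PER_LONG)), so reusing that exact
name would collide. What the macro in question builds is a mask of the x
low bits:)

#define BITS(x)	(BIT(x) - 1)	/* mask of the x least significant bits */
/* e.g. BITS(4) == 0xf, BITS(8) == 0xff, BITS(0) == 0 */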
> 
>  +
>  +#define BUILD_CHECK_REGS()  
>  \
>  +do {
>  \
>  +BUILD_BUG_ON(sizeof(struct reg_global)   != 32);
>  \
>  +BUILD_BUG_ON(sizeof(struct reg_chan) != 32);
>  \
>  +BUILD_BUG_ON(sizeof(struct reg_rx_flow)  != 32);
>  \
>  +BUILD_BUG_ON(sizeof(struct reg_tx_sched) !=  4);
>  \
>  +} while (0)
> >>> why is this required, do you want to use __packed__ to ensure right size?
> >>>
> >> This is just to ensure the register sanity. We should use __packed__ as
> >> well. We can take the BUILD_CHECK_REGS() out if you don't prefer it.
> > putting packed ensures that, so there is no need for this check.
> > 
> OK
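
What "putting packed ensures" means here: __packed forbids
compiler-inserted padding, so the register layout has a guaranteed size
and the sizeof() build checks become redundant. A sketch with illustrative
fields (not the actual Keystone register definitions):

struct reg_tx_sched {
	u32	prio;
} __packed;		/* sizeof() == 4, guaranteed */

struct reg_rx_flow {
	u32	control;
	u32	tags;
	u32	tag_sel;
	u32	fdq_sel[2];
	u32	thresh[3];
} __packed;		/* 8 * 4 == 32 bytes, no hidden padding */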

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio-blk: make the queue depth configurable

2014-03-16 Thread Rusty Russell
Theodore Ts'o  writes:
> On Fri, Mar 14, 2014 at 10:38:40AM -0700, Joe Perches wrote:
>> > +static int queue_depth = 64;
>> > +module_param(queue_depth, int, 444);
>> 
>> 444?  Really Ted?
>
> Oops, *blush*.   Thanks for catching that.

Erk, our tests are insufficient.  Testbuilding an allmodconfig with this
now:

diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 175f6995d1af..626b85888a6b 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -188,6 +188,9 @@ struct kparam_array
/* Default value instead of permissions? */ \
static int __param_perm_check_##name __attribute__((unused)) =  \
BUILD_BUG_ON_ZERO((perm) < 0 || (perm) > 0777 || ((perm) & 2))  \
+   /* User perms >= group perms >= other perms. */ \
+   + BUILD_BUG_ON_ZERO(((perm) >> 6) < (((perm) >> 3) & 7))\
+   + BUILD_BUG_ON_ZERO((((perm) >> 3) & 7) < ((perm) & 7)) \
+ BUILD_BUG_ON_ZERO(sizeof(""prefix) > MAX_PARAM_PREFIX_LEN);   \
static const char __param_str_##name[] = prefix #name;  \
static struct kernel_param __moduleparam_const __param_##name   \
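
The queue_depth bug that started this thread is exactly what the new lines
catch: 444 is a decimal literal, i.e. octal 0674, so the group bits (7)
exceed the user bits (6) and the build now fails. A hedged illustration:

#include <linux/moduleparam.h>

static int queue_depth = 64;

/* module_param(queue_depth, int, 444);  -- 444 == 0674, now a build error */
module_param(queue_depth, int, 0444);	/* intended: world-readable */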


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFA][PATCH 3/4] tracing/module: Remove include of tracepoint.h from module.h

2014-03-16 Thread Rusty Russell
Steven Rostedt  writes:
> On Wed, 26 Feb 2014 14:01:43 -0500
> Hi Rusty,
>
> This patch doesn't need to be stable, and can wait till v3.15. But I
> have other patches that will break with this patch (headers that needed
> to include tracepoint.h and not depend on a header chain to include it).
>
> Can you give me your Acked-by for this, and I'll just add it to my 3.15
> queue?

Cleaning up old mail, in case I didn't ack this:

Acked-by: Rusty Russell 

Cheers,
Rusty.

>> Cc: Rusty Russell 
>> Signed-off-by: Steven Rostedt 
>> ---
>>  include/linux/module.h | 1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/include/linux/module.h b/include/linux/module.h
>> index eaf60ff..6cc28d9 100644
>> --- a/include/linux/module.h
>> +++ b/include/linux/module.h
>> @@ -15,7 +15,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #include 
>>  
>>  #include 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] module: LLVMLinux: Remove unused function warning from __param_check macro

2014-03-16 Thread Rusty Russell
beh...@converseincode.com writes:
> From: Mark Charlebois 
>
> This code makes a compile time type check that is optimized away. Clang
> complains that it generates an unused function:

Thanks, applied.

Cheers,
Rusty.

>
> linux/kernel/panic.c:471:1: warning: unused function '__check_panic'
>   [-Wunused-function]
> core_param(panic, panic_timeout, int, 0644);
> ^
> linux/moduleparam.h:283:2: note: expanded from macro
>   'core_param'
> param_check_##type(name, &(var));   \
> ^
> :87:1: note: expanded from here
> param_check_int
> ^
> linux/moduleparam.h:369:34: note: expanded from macro
>   'param_check_int'
> #define param_check_int(name, p) __param_check(name, p, int)
>  ^
> linux/moduleparam.h:349:22: note: expanded from macro
>   '__param_check'
> static inline type *__check_##name(void) { return(p); }
> ^
> :88:1: note: expanded from here
> __check_panic
>
> GCC won't complain for a static inline function but would if it was just
> a static function.
>
> Adding the unused attribute to the function declaration removes the warning.
> Per request from Rusty Russell it is marked as __always_unused as the code
> is meant to be optimized away.
>
> This code works for both GCC and clang.
>
> Signed-off-by: Mark Charlebois 
> Signed-off-by: Behan Webster 
> ---
>  include/linux/moduleparam.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
> index c3eb102..175f699 100644
> --- a/include/linux/moduleparam.h
> +++ b/include/linux/moduleparam.h
> @@ -346,7 +346,7 @@ static inline void destroy_params(const struct 
> kernel_param *params,
>  /* The macros to do compile-time type checking stolen from Jakub
> Jelinek, who IIRC came up with this idea for the 2.4 module init code. */
>  #define __param_check(name, p, type) \
> - static inline type *__check_##name(void) { return(p); }
> + static inline type __always_unused *__check_##name(void) { return(p); }
>  
>  extern struct kernel_param_ops param_ops_byte;
>  extern int param_set_byte(const char *val, const struct kernel_param *kp);
> -- 
> 1.8.3.2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] kallsyms: handle special absolute symbols

2014-03-16 Thread Rusty Russell
Kees Cook  writes:
> On Wed, Mar 12, 2014 at 8:40 PM, Rusty Russell  wrote:
>> Kees Cook  writes:
>>> Why not just do this with 0-base-address detection like my v2? That
>>> would mean we don't need to remember to add this flag in the future to
>>> imagined new architectures that might want this 0-based per_cpu
>>> feature.
>>
>> Because future architectures will get this right and emit absolute
>> symbols.  I hope!
>>
>> I'm swamped at the moment, but am hoping to investigate that for
>> x86-64.  This is a stop-gap.
>
> Okay, I'm convinced. :)
>
> Acked-by: Kees Cook 
>
> Thanks!

Applied, thanks.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mmotm 2014-03-10-15-35 uploaded (virtio_balloon)

2014-03-16 Thread Rusty Russell
Josh Triplett  writes:
> On Tue, Mar 11, 2014 at 12:31:33PM -0700, Andrew Morton wrote:
> I'd love to do that, but as far as I can tell, VIRTIO_BALLOON has gone
> out of its way to support !CONFIG_BALLOON_COMPACTION.
>
> Could someone who works on VIRTIO_BALLOON provide some details here
> about the distinction?

Balloon gives pages back to the host.  If you want to do compaction,
we'll try to help you, but it's independent.

The normal way to do this would be to put a dummy inline version of
balloon_page_enqueue etc in the header.  If you look at how the virtio
balloon code looked before e22504296d4f64fbbbd741602ab47ee874649c18
you'll see what it should do, eg:

#ifndef CONFIG_BALLOON_COMPACTION
struct balloon_dev_info {
struct list_head pages; /* Pages enqueued & handled to Host */
};

static inline struct page *balloon_page_enqueue(struct balloon_dev_info 
*b_dev_info)
{
struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
   __GFP_NOMEMALLOC | __GFP_NOWARN);
if (page)
list_add(&page->lru, &b_dev_info->pages);

return page;
}

static inline struct page *balloon_page_dequeue(struct balloon_dev_info 
*b_dev_info)
{
struct page *page;
page = list_first_entry(&b_dev_info->pages, struct page, lru);
list_del(&page->lru);
return page;
}

static inline void balloon_page_free(struct page *page)
{
__free_page(page);
}
#else
...

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread H. Peter Anvin
On 03/16/2014 09:12 PM, Sarah Newman wrote:
> 
> Unconditional eager allocation works. Can xen users count on this being 
> included and applied to the
> stable kernels?
> 

I don't know.  If we state that it is a bug fix for Xen it might be
possible, but it would be up to Greg (Cc:'d) and the rest of the stable
team.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH 1/2] ACPICA: Dispatcher: Ignore SyncLevel for auto-serialization mechanism.

2014-03-16 Thread Lv Zheng
From: Robert Moore 

It is reported that the auto-serialization mechanism has broken some
machines.  This patch fixes the issue by ignoring the SyncLevel
attribute of the marked methods.

References: http://www.spinics.net/lists/linux-acpi/msg49496.html
Reported-by: Valdis Kletnieks 
Reported-by: Sabrina Dubroka 
Signed-off-by: Bob Moore 
Signed-off-by: Lv Zheng 
---
 drivers/acpi/acpica/acobject.h |3 ++-
 drivers/acpi/acpica/dsmethod.c |   24 +++-
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/acpica/acobject.h b/drivers/acpi/acpica/acobject.h
index 1a4d618..22fb644 100644
--- a/drivers/acpi/acpica/acobject.h
+++ b/drivers/acpi/acpica/acobject.h
@@ -193,7 +193,8 @@ struct acpi_object_method {
 #define ACPI_METHOD_INTERNAL_ONLY   0x02   /* Method is implemented 
internally (_OSI) */
 #define ACPI_METHOD_SERIALIZED  0x04   /* Method is serialized */
 #define ACPI_METHOD_SERIALIZED_PENDING  0x08   /* Method is to be marked 
serialized */
-#define ACPI_METHOD_MODIFIED_NAMESPACE  0x10   /* Method modified the 
namespace */
+#define ACPI_METHOD_IGNORE_SYNC_LEVEL   0x10   /* Method was auto-serialized 
at table load time */
+#define ACPI_METHOD_MODIFIED_NAMESPACE  0x20   /* Method modified the 
namespace */
 
 /**
  *
diff --git a/drivers/acpi/acpica/dsmethod.c b/drivers/acpi/acpica/dsmethod.c
index 73764c7..3c7f737 100644
--- a/drivers/acpi/acpica/dsmethod.c
+++ b/drivers/acpi/acpica/dsmethod.c
@@ -175,8 +175,15 @@ acpi_ds_detect_named_opcodes(struct acpi_walk_state 
*walk_state,
 * At this point, we know we have a Named object opcode.
 * Mark the method as serialized. Later code will create a mutex for
 * this method to enforce serialization.
+*
+* Note, ACPI_METHOD_IGNORE_SYNC_LEVEL flag means that we will ignore 
the
+* Sync Level mechanism for this method, even though it is now 
serialized.
+* Otherwise, there can be conflicts with existing ASL code that 
actually
+* uses sync levels.
 */
-   walk_state->method_desc->method.info_flags |= ACPI_METHOD_SERIALIZED;
+   walk_state->method_desc->method.sync_level = 0;
+   walk_state->method_desc->method.info_flags |=
+   (ACPI_METHOD_SERIALIZED | ACPI_METHOD_IGNORE_SYNC_LEVEL);
 
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
  "Method serialized [%4.4s] %p - [%s] (%4.4X)\n",
@@ -349,13 +356,19 @@ acpi_ds_begin_method_execution(struct acpi_namespace_node 
*method_node,
/*
 * The current_sync_level (per-thread) must be less than or 
equal to
 * the sync level of the method. This mechanism provides some
-* deadlock prevention
+* deadlock prevention.
+*
+* If the method was auto-serialized, we just ignore the sync 
level
+* mechanism, because auto-serialization of methods can 
interfere
+* with ASL code that actually uses sync levels.
 *
 * Top-level method invocation has no walk state at this point
 */
if (walk_state &&
-   (walk_state->thread->current_sync_level >
-obj_desc->method.mutex->mutex.sync_level)) {
+   (!(obj_desc->method.
+  info_flags & ACPI_METHOD_IGNORE_SYNC_LEVEL))
+   && (walk_state->thread->current_sync_level >
+   obj_desc->method.mutex->mutex.sync_level)) {
ACPI_ERROR((AE_INFO,
"Cannot acquire Mutex for method [%4.4s], 
current SyncLevel is too large (%u)",
acpi_ut_get_node_name(method_node),
@@ -800,7 +813,8 @@ acpi_ds_terminate_control_method(union acpi_operand_object 
*method_desc,
method_desc->method.info_flags &=
~ACPI_METHOD_SERIALIZED_PENDING;
method_desc->method.info_flags |=
-   ACPI_METHOD_SERIALIZED;
+   (ACPI_METHOD_SERIALIZED |
+ACPI_METHOD_IGNORE_SYNC_LEVEL);
method_desc->method.sync_level = 0;
}
 
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH 2/2] ACPI: Default disable auto-serialization.

2014-03-16 Thread Lv Zheng
This feature, enabled by the following commit, is still under development.

  Commit: cd52379678785b02d7a357988cfba214fdaf92f4
  Subject: ACPICA: Add global option to disable method auto-serialization.
  This change adds an option to disable the auto-serialization of
  methods that create named objects.

Based on the bug reports, this patch temporarily disables it by default.

References: http://www.spinics.net/lists/linux-acpi/msg49496.html
Reported-by: Valdis Kletnieks 
Reported-by: Sabrina Dubroka 
Signed-off-by: Lv Zheng 
---
 Documentation/kernel-parameters.txt |8 
 drivers/acpi/acpica/acglobal.h  |2 +-
 drivers/acpi/osl.c  |   12 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 91f0be8..a159537 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -229,13 +229,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
use by PCI
Format: ,...
 
-   acpi_no_auto_serialize  [HW,ACPI]
-   Disable auto-serialization of AML methods
+   acpi_auto_serialize [HW,ACPI]
+   Enable auto-serialization of AML methods
AML control methods that contain the opcodes to create
named objects will be marked as "Serialized" by the
auto-serialization feature.
-   This feature is enabled by default.
-   This option allows to turn off the feature.
+   This feature is disabled by default.
+   This option allows to turn on the feature.
 
acpi_no_auto_ssdt   [HW,ACPI] Disable automatic loading of SSDT
 
diff --git a/drivers/acpi/acpica/acglobal.h b/drivers/acpi/acpica/acglobal.h
index 49bbc71..ea0f838 100644
--- a/drivers/acpi/acpica/acglobal.h
+++ b/drivers/acpi/acpica/acglobal.h
@@ -99,7 +99,7 @@ ACPI_INIT_GLOBAL(u8, acpi_gbl_enable_interpreter_slack, 
FALSE);
  * that create named objects are marked Serialized in order to prevent
  * possible run-time problems if they are entered by more than one thread.
  */
-ACPI_INIT_GLOBAL(u8, acpi_gbl_auto_serialize_methods, TRUE);
+ACPI_INIT_GLOBAL(u8, acpi_gbl_auto_serialize_methods, FALSE);
 
 /*
  * Create the predefined _OSI method in the namespace? Default is TRUE
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index d40d6dc..928f0c2 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -1538,20 +1538,20 @@ static int __init osi_setup(char *str)
 __setup("acpi_osi=", osi_setup);
 
 /*
- * Disable the auto-serialization of named objects creation methods.
+ * Enable the auto-serialization of named objects creation methods.
  *
- * This feature is enabled by default.  It marks the AML control methods
+ * This feature is disabled by default.  It marks the AML control methods
  * that contain the opcodes to create named objects as "Serialized".
  */
-static int __init acpi_no_auto_serialize_setup(char *str)
+static int __init acpi_auto_serialize_setup(char *str)
 {
-   acpi_gbl_auto_serialize_methods = FALSE;
-   pr_info("ACPI: auto-serialization disabled\n");
+   acpi_gbl_auto_serialize_methods = TRUE;
+   pr_info("ACPI: auto-serialization enabled\n");
 
return 1;
 }
 
-__setup("acpi_no_auto_serialize", acpi_no_auto_serialize_setup);
+__setup("acpi_auto_serialize", acpi_auto_serialize_setup);
 
 /* Check of resource interference between native drivers and ACPI
  * OperationRegions (SystemIO and System Memory only).
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drm/exynos: Fix (more) freeing issues in exynos_drm_drv.c

2014-03-16 Thread Sachin Kamat
Hi Daniel,

Thanks for the patch.

On 17 March 2014 08:58, Daniel Kurtz  wrote:
> The following commit [0] fixed a use-after-free, but left the subdrv open
> in the error path.
>
> [0] commit 6ca605f7c70895a35737435f17ae9cc5e36f1466
> drm/exynos: Fix freeing issues in exynos_drm_drv.c
>
> Signed-off-by: Daniel Kurtz 

Acked-by: Sachin Kamat 

-- 
With warm regards,
Sachin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH 0/2] ACPICA: Updates for auto-serialization mechanism.

2014-03-16 Thread Lv Zheng
It is reported that the auto-serialization mechanism has broken some
machines.  A patch in this series tries to fix the reported issue; it has
been sent to the reporter for verification, and is thus marked as RFC.

Since this feature may still need to be validated, we can disable it
temporarily in order not to break users.  The patch making it disabled by
default is also marked as RFC.  We can determine whether it is needed
according to the verification result.

References: https://bugzilla.kernel.org/show_bug.cgi?id=52191
References: http://www.spinics.net/lists/linux-acpi/msg49496.html

Lv Zheng (1):
  ACPI: Default disable auto-serialization.

Robert Moore (1):
  ACPICA: Dispatcher: Add sync_level support for auto-serialization
mechanism.

 Documentation/kernel-parameters.txt |8 
 drivers/acpi/acpica/acglobal.h  |2 +-
 drivers/acpi/acpica/acobject.h  |3 ++-
 drivers/acpi/acpica/dsmethod.c  |   24 +++-
 drivers/acpi/osl.c  |   12 ++--
 5 files changed, 32 insertions(+), 17 deletions(-)

-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread Sarah Newman
On 03/16/2014 08:43 PM, H. Peter Anvin wrote:
> On 03/16/2014 08:35 PM, Sarah Newman wrote:
>> Can you please review my patch first?  It's only enabled when absolutely 
>> required.
> 
> It doesn't help.  It means you're running on Xen, and you will have
> processes subjected to random SIGKILL because they happen to touch the
> FPU when the atomic pool is low.
> 
> However, there is probably a happy medium: you don't actually need eager
> FPU restore, you just need eager FPU *allocation*.  We have been
> intending to allocate the FPU state at task creation time for eagerfpu,
> and Suresh Siddha has already produced such a patch; it just needs some
> minor fixups due to an __init failure.
> 
> http://lkml.kernel.org/r/1391325599.6481.5.camel@europa
> 
> In the Xen case we could turn on eager allocation but not eager fpu.  In
> fact, it might be justified to *always* do eager allocation...

Unconditional eager allocation works. Can xen users count on this being 
included and applied to the
stable kernels?

Thanks, Sarah
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 3/3] kmemleak: change some global variables to int

2014-03-16 Thread Li Zefan
They don't have to be atomic_t, because they are simple boolean
toggles.

Signed-off-by: Li Zefan 
---
 mm/kmemleak.c | 78 +--
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 54270f2..c352c63 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -192,15 +192,15 @@ static struct kmem_cache *object_cache;
 static struct kmem_cache *scan_area_cache;
 
 /* set if tracing memory operations is enabled */
-static atomic_t kmemleak_enabled = ATOMIC_INIT(0);
+static int kmemleak_enabled;
 /* set in the late_initcall if there were no errors */
-static atomic_t kmemleak_initialized = ATOMIC_INIT(0);
+static int kmemleak_initialized;
 /* enables or disables early logging of the memory operations */
-static atomic_t kmemleak_early_log = ATOMIC_INIT(1);
+static int kmemleak_early_log = 1;
 /* set if a kmemleak warning was issued */
-static atomic_t kmemleak_warning = ATOMIC_INIT(0);
+static int kmemleak_warning;
 /* set if a fatal kmemleak error has occurred */
-static atomic_t kmemleak_error = ATOMIC_INIT(0);
+static int kmemleak_error;
 
 /* minimum and maximum address that may be valid pointers */
 static unsigned long min_addr = ULONG_MAX;
@@ -267,7 +267,7 @@ static void kmemleak_disable(void);
 #define kmemleak_warn(x...)do {\
pr_warning(x);  \
dump_stack();   \
-   atomic_set(&kmemleak_warning, 1);   \
+   kmemleak_warning = 1;   \
 } while (0)
 
 /*
@@ -805,7 +805,7 @@ static void __init log_early(int op_type, const void *ptr, 
size_t size,
unsigned long flags;
struct early_log *log;
 
-   if (atomic_read(&kmemleak_error)) {
+   if (kmemleak_error) {
/* kmemleak stopped recording, just count the requests */
crt_early_log++;
return;
@@ -840,7 +840,7 @@ static void early_alloc(struct early_log *log)
unsigned long flags;
int i;
 
-   if (!atomic_read(&kmemleak_enabled) || !log->ptr || IS_ERR(log->ptr))
+   if (!kmemleak_enabled || !log->ptr || IS_ERR(log->ptr))
return;
 
/*
@@ -893,9 +893,9 @@ void __ref kmemleak_alloc(const void *ptr, size_t size, int 
min_count,
 {
pr_debug("%s(0x%p, %zu, %d)\n", __func__, ptr, size, min_count);
 
-   if (atomic_read(&kmemleak_enabled) && ptr && !IS_ERR(ptr))
+   if (kmemleak_enabled && ptr && !IS_ERR(ptr))
create_object((unsigned long)ptr, size, min_count, gfp);
-   else if (atomic_read(&kmemleak_early_log))
+   else if (kmemleak_early_log)
log_early(KMEMLEAK_ALLOC, ptr, size, min_count);
 }
 EXPORT_SYMBOL_GPL(kmemleak_alloc);
@@ -919,11 +919,11 @@ void __ref kmemleak_alloc_percpu(const void __percpu 
*ptr, size_t size)
 * Percpu allocations are only scanned and not reported as leaks
 * (min_count is set to 0).
 */
-   if (atomic_read(&kmemleak_enabled) && ptr && !IS_ERR(ptr))
+   if (kmemleak_enabled && ptr && !IS_ERR(ptr))
for_each_possible_cpu(cpu)
create_object((unsigned long)per_cpu_ptr(ptr, cpu),
  size, 0, GFP_KERNEL);
-   else if (atomic_read(&kmemleak_early_log))
+   else if (kmemleak_early_log)
log_early(KMEMLEAK_ALLOC_PERCPU, ptr, size, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_alloc_percpu);
@@ -939,9 +939,9 @@ void __ref kmemleak_free(const void *ptr)
 {
pr_debug("%s(0x%p)\n", __func__, ptr);
 
-   if (atomic_read(&kmemleak_enabled) && ptr && !IS_ERR(ptr))
+   if (kmemleak_enabled && ptr && !IS_ERR(ptr))
delete_object_full((unsigned long)ptr);
-   else if (atomic_read(&kmemleak_early_log))
+   else if (kmemleak_early_log)
log_early(KMEMLEAK_FREE, ptr, 0, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free);
@@ -959,9 +959,9 @@ void __ref kmemleak_free_part(const void *ptr, size_t size)
 {
pr_debug("%s(0x%p)\n", __func__, ptr);
 
-   if (atomic_read(&kmemleak_enabled) && ptr && !IS_ERR(ptr))
+   if (kmemleak_enabled && ptr && !IS_ERR(ptr))
delete_object_part((unsigned long)ptr, size);
-   else if (atomic_read(&kmemleak_early_log))
+   else if (kmemleak_early_log)
log_early(KMEMLEAK_FREE_PART, ptr, size, 0);
 }
 EXPORT_SYMBOL_GPL(kmemleak_free_part);
@@ -979,11 +979,11 @@ void __ref kmemleak_free_percpu(const void __percpu *ptr)
 
pr_debug("%s(0x%p)\n", __func__, ptr);
 
-   if (atomic_read(&kmemleak_enabled) && ptr && !IS_ERR(ptr))
+   if (kmemleak_enabled && ptr && !IS_ERR(ptr))
for_each_possible_cpu(cpu)
delete_object_full((unsigned long)per_cpu_ptr(ptr,
  cpu));
-   else if (atomic_read(&kmemleak_early_log))
+   else if (kmemleak_ear

[PATCH v2 2/3] kmemleak: remove redundant code

2014-03-16 Thread Li Zefan
- remove kmemleak_padding().
- remove kmemleak_release().

Signed-off-by: Li Zefan 
---
 include/linux/kmemleak.h | 2 --
 mm/kmemleak.c| 7 +--
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h
index 2a5e554..5bb4246 100644
--- a/include/linux/kmemleak.h
+++ b/include/linux/kmemleak.h
@@ -30,8 +30,6 @@ extern void kmemleak_alloc_percpu(const void __percpu *ptr, 
size_t size) __ref;
 extern void kmemleak_free(const void *ptr) __ref;
 extern void kmemleak_free_part(const void *ptr, size_t size) __ref;
 extern void kmemleak_free_percpu(const void __percpu *ptr) __ref;
-extern void kmemleak_padding(const void *ptr, unsigned long offset,
-size_t size) __ref;
 extern void kmemleak_not_leak(const void *ptr) __ref;
 extern void kmemleak_ignore(const void *ptr) __ref;
 extern void kmemleak_scan_area(const void *ptr, size_t size, gfp_t gfp) __ref;
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 7fc030e..54270f2 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1545,11 +1545,6 @@ static int kmemleak_open(struct inode *inode, struct 
file *file)
return seq_open(file, &kmemleak_seq_ops);
 }
 
-static int kmemleak_release(struct inode *inode, struct file *file)
-{
-   return seq_release(inode, file);
-}
-
 static int dump_str_object_info(const char *str)
 {
unsigned long flags;
@@ -1680,7 +1675,7 @@ static const struct file_operations kmemleak_fops = {
.read   = seq_read,
.write  = kmemleak_write,
.llseek = seq_lseek,
-   .release= kmemleak_release,
+   .release= seq_release,
 };
 
 /*
-- 
1.8.0.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 1/3] kmemleak: allow freeing internal objects after kmemleak was disabled

2014-03-16 Thread Li Zefan
Currently if kmemleak is disabled, the kmemleak objects can never be freed,
no matter if it's disabled by a user or due to fatal errors.

Those objects can be a big waste of memory.

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1200264 1197433  99%    0.30K  46164       26    369312K kmemleak_object

With this patch, internal objects will be freed immediately if kmemleak is
disabled explicitly by a user. If it's disabled due to a kmemleak error,
the user will be informed, and then he/she can reclaim memory with:

# echo off > /sys/kernel/debug/kmemleak

v2: use "off" handler instead of "clear" handler to do this, suggested
by Catalin.

Signed-off-by: Li Zefan 
---
 Documentation/kmemleak.txt | 14 +-
 mm/kmemleak.c  | 21 -
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt
index 6dc8013..00aa013 100644
--- a/Documentation/kmemleak.txt
+++ b/Documentation/kmemleak.txt
@@ -42,7 +42,8 @@ objects to be reported as orphan.
 Memory scanning parameters can be modified at run-time by writing to the
 /sys/kernel/debug/kmemleak file. The following parameters are supported:
 
-  off  - disable kmemleak (irreversible)
+  off  - disable kmemleak, or free all kmemleak objects if kmemleak
+ has been disabled due to fatal errors. (irreversible).
   stack=on - enable the task stacks scanning (default)
   stack=off- disable the tasks stacks scanning
   scan=on  - start the automatic memory scanning thread (default)
@@ -118,6 +119,17 @@ Then as usual to get your report with:
 
   # cat /sys/kernel/debug/kmemleak
 
+Freeing kmemleak internal objects
+-
+
+To allow access to previously found memory leaks even when an error fatal
+to kmemleak happens, internal kmemleak objects won't be freed in this case.
+Those objects may occupy a large part of physical memory.
+
+You can reclaim memory from those objects with:
+
+  # echo off > /sys/kernel/debug/kmemleak
+
 Kmemleak API
 
 
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 31f01c5..7fc030e 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1616,9 +1616,6 @@ static ssize_t kmemleak_write(struct file *file, const 
char __user *user_buf,
int buf_size;
int ret;
 
-   if (!atomic_read(&kmemleak_enabled))
-   return -EBUSY;
-
buf_size = min(size, (sizeof(buf) - 1));
if (strncpy_from_user(buf, user_buf, buf_size) < 0)
return -EFAULT;
@@ -1628,9 +1625,18 @@ static ssize_t kmemleak_write(struct file *file, const 
char __user *user_buf,
if (ret < 0)
return ret;
 
-   if (strncmp(buf, "off", 3) == 0)
+   if (strncmp(buf, "off", 3) == 0) {
+   stop_scan_thread();
kmemleak_disable();
-   else if (strncmp(buf, "stack=on", 8) == 0)
+   goto out;
+   }
+
+   if (!atomic_read(&kmemleak_enabled)) {
+   ret = -EBUSY;
+   goto out;
+   }
+
+   if (strncmp(buf, "stack=on", 8) == 0)
kmemleak_stack_scan = 1;
else if (strncmp(buf, "stack=off", 9) == 0)
kmemleak_stack_scan = 0;
@@ -1695,6 +1701,11 @@ static void kmemleak_do_cleanup(struct work_struct *work)
list_for_each_entry_rcu(object, &object_list, object_list)
delete_object_full(object->pointer);
rcu_read_unlock();
+   } else {
+   pr_info("Disable kmemleak without freeing internal objects, "
+   "so you may still check information on memory leak. "
+   "You may reclaim memory by writing \"off\" to "
+   "/sys/kernel/debug/kmemleak\n");
}
mutex_unlock(&scan_mutex);
 }
-- 
1.8.0.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: manual merge of the block tree with Linus' tree

2014-03-16 Thread Stephen Rothwell
Hi Jens,

Today's linux-next merge of the block tree got a conflict in
fs/bio-integrity.c between commit eec70897d81b ("bio-integrity: Drop
bio_integrity_verify BUG_ON in post bip->bip_iter world") from Linus'
tree and commit bf36f9cfa6d3 ("fs/bio-integrity: remove duplicate code")
from the block tree.

I fixed it up (by using the latter - however, I am not sure that this is
completely correct as the BUG_ON more or less returns?) and can carry the
fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: [PATCH] spi: sc18is602: Don't be that restrictive with the maximum transfer speed

2014-03-16 Thread Guenter Roeck

On 03/16/2014 07:07 PM, Axel Lin wrote:

2014-03-17 9:47 GMT+08:00 Guenter Roeck :

Commit 09e99bca8 (spi: sc18is602: Convert to let spi core validate
transfer speed) made the maximum transfer speed much more restrictive
than before. The transfer speed used to be adjusted to 1/4 of the chip
clock rate if a higher transfer speed was requested. Now such transfers are
simply rejected. With default settings, this causes, for example, a transfer
request at 2 Mbps to be rejected because the maximum speed with the default
chip clock is 1.843 Mbps.

This is unnecessarily restrictive and causes unnecessary failures. Loosen
the limit to accept transfers up to 50% of the clock rate and adjust
the speed as needed when setting up the actual transfer.


I suppose this controller can only be set to SC18IS602_MODE_CLOCK_DIV_4 for
the highest transfer speed. If that is the case, master->max_speed_hz should
be hw->freq / 4.



That really depends on one's point of view. The chip does not support a transfer
speed of, say, hw->freq / 5 or hw->freq / 6 either, but adjusts it to the next
available speed. Following your logic, every non-exact speed should be rejected,
which would make it a pain for a user to find a working speed.


Now I'm wondering whether it is ok to use master->max_speed_hz as the
transfer speed when xfer->speed_hz > master->max_speed_hz. That should be
handled in the spi core. I'm sending an RFC patch now.


That is an acceptable alternate solution for me.

Guenter
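
For readers following along, the "adjust to the next available speed"
behaviour looks roughly like this, assuming the chip's four dividers (the
SC18IS602_MODE_CLOCK_DIV_* constants are from spi-sc18is602.c; the helper
itself is only a sketch, not the driver code):

static u8 sc18is602_pick_divider(u32 freq, u32 hz)
{
	/* clamp anything at or above freq/4 to the fastest setting */
	if (hz >= freq / 4)
		return SC18IS602_MODE_CLOCK_DIV_4;
	if (hz >= freq / 16)
		return SC18IS602_MODE_CLOCK_DIV_16;
	if (hz >= freq / 64)
		return SC18IS602_MODE_CLOCK_DIV_64;
	return SC18IS602_MODE_CLOCK_DIV_128;	/* slowest available */
}

With the default 7.372 MHz clock this maps a 2 Mbps request onto
freq / 4 = 1.843 Mbps instead of rejecting it.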

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3 0/4] ASoC: simple-card: multi DAI links extension

2014-03-16 Thread li.xi...@freescale.com
> Subject: [PATCH v3 0/4] ASoC: simple-card: multi DAI links extension
> 
> This patch series extends the simple card driver to handle
> many DAI links as this exists in the Cubox audio subsystem.
> 
> -v3
>   - remove 'Fix the reference count of device nodes'
>   which is applied (Mark Brown)
>   - new patch 'Simplify code'
>   - dynamically allocate and use properties for all DAI links
>   (Jyri Sarha and Li Xiubo)


This patch series looks good to me.

For this patch series:
Reviewed-by: Xiubo Li 


Thanks,

--
Best Regards,
Xiubo


> - v2
>   - change subject/comment about device node reference count
>   (Mark Brown)
>   - use a null size array instead of an implicit area for the DAI links
>   (Li Xiubo)
>   - update the reference count of the device node at end of probe
> 
> Jean-Francois Moine (4):
>   ASoC: simple-card: Simplify code
>   ASoC: simple-card: dynamically allocate the DAI link and properties
>   ASoC: simple-card: Handle many DAI links
>   ASoC: simple-card: Add DT documentation for multi-DAI links
> 
>  .../devicetree/bindings/sound/simple-card.txt  |  34 +++-
>  sound/soc/generic/simple-card.c| 181 +---
> -
>  2 files changed, 145 insertions(+), 70 deletions(-)
> 
> --
> 1.9.0
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Congratulations, Winner - Ref No: Sp/229/0-01/07/5-02/ES.

2014-03-16 Thread Terry K. Halladay




Congratulations, Winner - Ref No: Sp/229/0-01/07/5-02/ES.

Your e-mail ID has just won €450,000.00 Euro (four hundred and fifty
thousand Euro) in the Uplift International Charity Programme. Ref No
Sp/229/0-01/07/5-02/ES. Lucky No 9/11/13/24/40.

For more information and claim procedures, contact:

CAPITAL CLAIM AGENCY
Mr.  John Carlos.
E-mail: infoass...@aol.com
Tel: +34-672-594-567 (English spoken only)

Include your full name, address, age, occupation and telephone numbers.

Send your reply to this e-mail: infoass...@aol.com

Note: This is an international lottery programme. This message was
translated from Latvian to English.

Congratulations!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread H. Peter Anvin
On 03/16/2014 08:35 PM, Sarah Newman wrote:
> Can you please review my patch first?  It's only enabled when absolutely 
> required.

It doesn't help.  It means you're running on Xen, and you will have
processes subjected to random SIGKILL because they happen to touch the
FPU when the atomic pool is low.

However, there is probably a happy medium: you don't actually need eager
FPU restore, you just need eager FPU *allocation*.  We have been
intending to allocate the FPU state at task creation time for eagerfpu,
and Suresh Siddha has already produced such a patch; it just needs some
minor fixups due to an __init failure.

http://lkml.kernel.org/r/1391325599.6481.5.camel@europa

In the Xen case we could turn on eager allocation but not eager fpu.  In
fact, it might be justified to *always* do eager allocation...
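
A sketch of what eager FPU *allocation* (as opposed to eager restore)
amounts to -- this is not Suresh's actual patch, just the shape of the
idea: allocate the xstate buffer unconditionally at fork time, in process
context where GFP_KERNEL is safe, so the #NM path never has to allocate:

int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
{
	int src_has_fpu = fpu_allocated(&src->thread.fpu);
	int err;

	*dst = *src;
	/* never share the parent's xstate buffer */
	memset(&dst->thread.fpu, 0, sizeof(dst->thread.fpu));

	err = fpu_alloc(&dst->thread.fpu);	/* GFP_KERNEL, may sleep */
	if (err)
		return err;
	if (src_has_fpu)
		fpu_copy(dst, src);
	return 0;
}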

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread Sarah Newman
Can you please review my patch first?  It's only enabled when absolutely 
required.

On 03/16/2014 08:33 PM, H. Peter Anvin wrote:
> No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
> workaround for the legacy hypervisor versions.
> 
> GFP_ATOMIC -> SIGKILL is definitely a NAK.
> 
> On March 16, 2014 8:13:05 PM PDT, Sarah Newman  wrote:
>> On 03/10/2014 10:15 AM, David Vrabel wrote:
>>> On 10/03/14 16:40, H. Peter Anvin wrote:
 On 03/10/2014 09:17 AM, David Vrabel wrote:
> math_state_restore() is called from the #NM exception handler.  It
>> may
> do a GFP_KERNEL allocation (in init_fpu()) which may schedule.
>
> Change this allocation to GFP_ATOMIC, but leave all the other
>> callers
> of init_fpu() or fpu_alloc() using GFP_KERNEL.

 And what the [Finnish] do you do if GFP_ATOMIC fails?
>>>
>>> The same thing it used to do -- kill the task with SIGKILL.  I
>> haven't
>>> changed this behaviour.
>>>
 Sarah's patchset switches Xen PV to use eagerfpu unconditionally,
>> which
 removes the dependency on #NM and is the right thing to do.
>>>
>>> Ok. I'll wait for this series and not pursue this patch any further.
>>
>> Sorry, this got swallowed by my mail filter.
>>
>> I did some more testing and I think eagerfpu is going to noticeably
>> slow things down. When I ran
>> "time sysbench --num-threads=64 --test=threads run" I saw on the order
>> of 15% more time spent in
>> system mode and this seemed consistent over different runs.
>>
>> As for GFP_ATOMIC, unfortunately I don't know a sanctioned test here so
>> I rolled my own. This test
>> sequentially allocated math-using processes in the background until it
>> could not any more.  On a
>> 64MB instance, I saw 10% fewer processes allocated with GFP_ATOMIC
>> compared to GFP_KERNEL when I
>> continually allocated new processes up to OOM conditions (256 vs 228.) 
>> A similar test on a
>> different RFS and a kernel using GFP_NOWAIT showed pretty much no
>> difference in how many processes I
>> could allocate. This doesn't seem too bad unless there is some kind of
>> fragmentation over time which
>> would cause worse performance.
>>
>> Since performance degradation applies at all times and not just under
>> extreme conditions, I think
>> the lesser evil will actually be GFP_ATOMIC.  But it's not necessary to
>> always use GFP_ATOMIC, only
>> under certain conditions - IE when the xen PVABI forces us to.
>>
>> Patches will be supplied shortly.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread H. Peter Anvin
No, the right thing is to unf*ck the Xen braindamage and use eagerfpu as a 
workaround for the legacy hypervisor versions.

GFP_ATOMIC -> SIGKILL is definitely a NAK.

On March 16, 2014 8:13:05 PM PDT, Sarah Newman  wrote:
>On 03/10/2014 10:15 AM, David Vrabel wrote:
>> On 10/03/14 16:40, H. Peter Anvin wrote:
>>> On 03/10/2014 09:17 AM, David Vrabel wrote:
 math_state_restore() is called from the #NM exception handler.  It
>may
 do a GFP_KERNEL allocation (in init_fpu()) which may schedule.

 Change this allocation to GFP_ATOMIC, but leave all the other
>callers
 of init_fpu() or fpu_alloc() using GFP_KERNEL.
>>>
>>> And what the [Finnish] do you do if GFP_ATOMIC fails?
>> 
>> The same thing it used to do -- kill the task with SIGKILL.  I
>haven't
>> changed this behaviour.
>> 
>>> Sarah's patchset switches Xen PV to use eagerfpu unconditionally,
>which
>>> removes the dependency on #NM and is the right thing to do.
>> 
>> Ok. I'll wait for this series and not pursue this patch any further.
>
>Sorry, this got swallowed by my mail filter.
>
>I did some more testing and I think eagerfpu is going to noticeably
>slow things down. When I ran
>"time sysbench --num-threads=64 --test=threads run" I saw on the order
>of 15% more time spent in
>system mode and this seemed consistent over different runs.
>
>As for GFP_ATOMIC, unfortunately I don't know a sanctioned test here so
>I rolled my own. This test
>sequentially allocated math-using processes in the background until it
>could not any more.  On a
>64MB instance, I saw 10% fewer processes allocated with GFP_ATOMIC
>compared to GFP_KERNEL when I
>continually allocated new processes up to OOM conditions (256 vs 228.) 
>A similar test on a
>different RFS and a kernel using GFP_NOWAIT showed pretty much no
>difference in how many processes I
>could allocate. This doesn't seem too bad unless there is some kind of
>fragmentation over time which
>would cause worse performance.
>
>Since performance degradation applies at all times and not just under
>extreme conditions, I think
>the lesser evil will actually be GFP_ATOMIC.  But it's not necessary to
>always use GFP_ATOMIC, only
>under certain conditions - IE when the xen PVABI forces us to.
>
>Patches will be supplied shortly.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] drm/exynos: Fix (more) freeing issues in exynos_drm_drv.c

2014-03-16 Thread Daniel Kurtz
The following commit [0] fixed a use-after-free, but left the subdrv open
in the error path.

[0] commit 6ca605f7c70895a35737435f17ae9cc5e36f1466
drm/exynos: Fix freeing issues in exynos_drm_drv.c

Signed-off-by: Daniel Kurtz 
---
Hi, I noticed this when reviewing some recent patches.
I am only able to compile test this patch.

 drivers/gpu/drm/exynos/exynos_drm_drv.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c 
b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 215131a..c204b4e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -172,20 +172,24 @@ static int exynos_drm_open(struct drm_device *dev, struct 
drm_file *file)
 
ret = exynos_drm_subdrv_open(dev, file);
if (ret)
-   goto out;
+   goto err_file_priv_free;
 
anon_filp = anon_inode_getfile("exynos_gem", &exynos_drm_gem_fops,
NULL, 0);
if (IS_ERR(anon_filp)) {
ret = PTR_ERR(anon_filp);
-   goto out;
+   goto err_subdrv_close;
}
 
anon_filp->f_mode = FMODE_READ | FMODE_WRITE;
file_priv->anon_filp = anon_filp;
 
return ret;
-out:
+
+err_subdrv_close:
+   exynos_drm_subdrv_close(dev, file);
+
+err_file_priv_free:
kfree(file_priv);
file->driver_priv = NULL;
return ret;
-- 
1.9.0.279.gdc9e3eb

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] drm/exynos: Fix (more) freeing issues in exynos_drm_drv.c

2014-03-16 Thread Daniel Kurtz
The following commit [0] fixed a use-after-free, but left the subdrv open
in the error path.

[0] commit 6ca605f7c70895a35737435f17ae9cc5e36f1466
drm/exynos: Fix freeing issues in exynos_drm_drv.c

Change-Id: I452e944bf090fb11434d9e34213c890c41c15d73
Signed-off-by: Daniel Kurtz 
---
 drivers/gpu/drm/exynos/exynos_drm_drv.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c 
b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 215131a..c204b4e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -172,20 +172,24 @@ static int exynos_drm_open(struct drm_device *dev, struct 
drm_file *file)
 
ret = exynos_drm_subdrv_open(dev, file);
if (ret)
-   goto out;
+   goto err_file_priv_free;
 
anon_filp = anon_inode_getfile("exynos_gem", &exynos_drm_gem_fops,
NULL, 0);
if (IS_ERR(anon_filp)) {
ret = PTR_ERR(anon_filp);
-   goto out;
+   goto err_subdrv_close;
}
 
anon_filp->f_mode = FMODE_READ | FMODE_WRITE;
file_priv->anon_filp = anon_filp;
 
return ret;
-out:
+
+err_subdrv_close:
+   exynos_drm_subdrv_close(dev, file);
+
+err_file_priv_free:
kfree(file_priv);
file->driver_priv = NULL;
return ret;
-- 
1.9.0.279.gdc9e3eb

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] x86, fpu, xen: Allocate fpu state for xen pv based on PVABI behavior

2014-03-16 Thread Sarah Newman
The xen PVABI dictates that CR0 TS will be automatically cleared for
the device not available trap.  This means it is not safe to task
switch with the default PVABI behavior.

One method of working around this is to disallow scheduling when
allocating memory for the fpu state, but in extremely low memory
circumstances this may fail. Therefore only require this behavior
when xen pv mode is active and the xen PVABI does not allow task
switching.

One other solution, enabling eagerfpu, was explored but eventually
discarded due to notable performance impact.

Reported-by: Zhu Yanhai 
Signed-off-by: Sarah Newman 
---
 arch/x86/include/asm/fpu-internal.h |2 +-
 arch/x86/include/asm/processor.h|5 +
 arch/x86/kernel/i387.c  |   13 +
 arch/x86/kernel/traps.c |2 --
 arch/x86/xen/enlighten.c|1 +
 arch/x86/xen/setup.c|   27 +++
 6 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h 
b/arch/x86/include/asm/fpu-internal.h
index cea1c76..9ec236c 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -571,7 +571,7 @@ static inline int fpu_alloc(struct fpu *fpu)
 {
if (fpu_allocated(fpu))
return 0;
-   fpu->state = kmem_cache_alloc(task_xstate_cachep, GFP_KERNEL);
+   fpu_ops.fpu_state_alloc(fpu);
if (!fpu->state)
return -ENOMEM;
WARN_ON((unsigned long)fpu->state & 15);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index fdedd38..941b55d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -413,6 +413,11 @@ struct fpu {
union thread_xstate *state;
 };
 
+struct fpu_ops {
+   void (*fpu_state_alloc)(struct fpu *fpu);
+};
+extern struct fpu_ops fpu_ops;
+
 #ifdef CONFIG_X86_64
 DECLARE_PER_CPU(struct orig_ist, orig_ist);
 
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index d5dd808..24ce161 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -157,6 +157,19 @@ static void init_thread_xstate(void)
xstate_size = sizeof(struct i387_fsave_struct);
 }
 
+static void native_fpu_state_alloc(struct fpu *fpu)
+{
+   unsigned long flags;
+   local_save_flags(flags);
+   local_irq_enable();
+   fpu->state = kmem_cache_alloc(task_xstate_cachep, GFP_KERNEL);
+   local_irq_restore(flags);
+}
+
+struct fpu_ops fpu_ops = {
+   .fpu_state_alloc = native_fpu_state_alloc,
+};
+
 /*
  * Called at bootup to set up the initial FPU state that is later cloned
  * into all processes.
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 57409f6..97479d6 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -624,7 +624,6 @@ void math_state_restore(void)
struct task_struct *tsk = current;
 
if (!tsk_used_math(tsk)) {
-   local_irq_enable();
/*
 * does a slab alloc which can sleep
 */
@@ -635,7 +634,6 @@ void math_state_restore(void)
do_group_exit(SIGKILL);
return;
}
-   local_irq_disable();
}
 
__thread_fpu_begin(tsk);
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 201d09a..fb3aa30 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_ACPI
 #include 
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 0982233..4e65b52 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -18,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -598,6 +600,28 @@ void __init xen_pvmmu_arch_setup(void)
xen_enable_nmi();
 }
 
+static void xen_fpu_state_alloc(struct fpu *fpu)
+{
+   fpu->state = kmem_cache_alloc(task_xstate_cachep, GFP_NOWAIT);
+}
+
+static const struct fpu_ops xen_fpu_ops __initconst = {
+   .fpu_state_alloc = xen_fpu_state_alloc,
+};
+
+#define _XEN_CPUID_FEAT1_DEV_NA_TS_ALLOWED 1
+#define XEN_CPUID_FEAT1_DEV_NA_TS_ALLOWED \
+   (1u<<_XEN_CPUID_FEAT1_DEV_NA_TS_ALLOWED)
+static bool __init xen_check_dev_na_ts_allowed(void)
+{
+   uint32_t pages, msr, feat1, feat2, base;
+
+   base = xen_cpuid_base();
+   cpuid(base + 2, &pages, &msr, &feat1, &feat2);
+
+   return !!(feat1 & XEN_CPUID_FEAT1_DEV_NA_TS_ALLOWED);
+}
+
 /* This function is not called for HVM domains */
 void __init xen_arch_setup(void)
 {
@@ -605,6 +629,9 @@ void __init xen_arch_setup(void)
if (!xen_feature(XENFEAT_auto_translated_physmap))
xen_pvmmu_arch_setup();
 
+   if (!xen_check_dev_na_ts_allowed())
+   fpu_ops = xen_fpu_ops;
+
 #ifdef CONFIG_ACPI

Re: [ PATCH 0/8] sched: remove cpu_load array

2014-03-16 Thread Alex Shi
On 03/13/2014 01:57 PM, Alex Shi wrote:
> In the cpu_load decay usage, we mixed the long term and short term load
> with balance bias, randomly picking a big/small value from them according
> to the balance destination or source. This mix is wrong; the balance bias
> should be based on the task moving cost between cpu groups, not on random
> history or instant load. History load may diverge a lot from the real
> load, which leads to incorrect bias.
> 
> In fact, the cpu_load decays can be replaced by the sched_avg decay, which
> also decays load over time. The balance bias part can fully use a fixed
> bias -- imbalance_pct, which is already used in the newly idle, wake,
> forkexec and numa balancing scenarios.
> 
> Currently the only working idxs are busy_idx and idle_idx.
> As to busy_idx:
> We mix history load decay and bias together. The ridiculous thing is that
> when all cpu loads are continuously stable, the long and short term loads
> are the same; then we lose the meaning of the bias, so any minimal
> imbalance may cause unnecessary task moving. To prevent this from
> happening, we have to reuse imbalance_pct again in find_busiest_group().
> But that clearly causes over-bias in normal times, and if there is some
> burst load in the system it is even worse.
> 

Any comments?

> As to idle_idx:
> Though I have some concern about its usage correctness,
> https://lkml.org/lkml/2014/3/12/247, since we are working on moving cpu
> idle handling into the scheduler, the problem will be reconsidered. We
> don't need to care about it now.
> 
> This patchset removes the cpu_load idx decay, since it can be replaced by
> the sched_avg feature, and leaves the imbalance_pct bias untouched; only
> idle_idx misses it, but that is fine and will be reconsidered soon.
> 
> 
> V5,
> 1, remove unify bias patch and biased_load function. Thanks for PeterZ's 
> comments!
> 2, remove get_sd_load_idx() in the 1st patch as SrikarD's suggestion.
> 3, remove LB_BIAS feature, it is not needed now.
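
For anyone wondering what the fixed imbalance_pct bias buys over the
cpu_load history: find_busiest_group() treats a group pair as balanced
while 100 * busiest_load <= imbalance_pct * local_load, and imbalance_pct
defaults to 125 at the CPU domain level, i.e. a stable 25% hysteresis.
A small worked sketch (standalone, not the scheduler code itself):

static int group_balanced(unsigned long local_load,
			  unsigned long busiest_load,
			  unsigned int imbalance_pct)
{
	return 100 * busiest_load <= imbalance_pct * local_load;
}

/* group_balanced(1000, 1200, 125) == 1: within the 25% slack, no move
 * group_balanced(1000, 1300, 125) == 0: bias exceeded, rebalance    */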


-- 
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv1] x86: don't schedule when handling #NM exception

2014-03-16 Thread Sarah Newman
On 03/10/2014 10:15 AM, David Vrabel wrote:
> On 10/03/14 16:40, H. Peter Anvin wrote:
>> On 03/10/2014 09:17 AM, David Vrabel wrote:
>>> math_state_restore() is called from the #NM exception handler.  It may
>>> do a GFP_KERNEL allocation (in init_fpu()) which may schedule.
>>>
>>> Change this allocation to GFP_ATOMIC, but leave all the other callers
>>> of init_fpu() or fpu_alloc() using GFP_KERNEL.
>>
>> And what the [Finnish] do you do if GFP_ATOMIC fails?
> 
> The same thing it used to do -- kill the task with SIGKILL.  I haven't
> changed this behaviour.
> 
>> Sarah's patchset switches Xen PV to use eagerfpu unconditionally, which
>> removes the dependency on #NM and is the right thing to do.
> 
> Ok. I'll wait for this series and not pursue this patch any further.

Sorry, this got swallowed by my mail filter.

I did some more testing and I think eagerfpu is going to noticeably slow
things down. When I ran "time sysbench --num-threads=64 --test=threads run"
I saw on the order of 15% more time spent in system mode, and this seemed
consistent over different runs.

As for GFP_ATOMIC, unfortunately I don't know of a sanctioned test here, so I
rolled my own. This test sequentially allocated math-using processes in the
background until it could not any more. On a 64MB instance, I saw 10% fewer
processes allocated with GFP_ATOMIC compared to GFP_KERNEL when I continually
allocated new processes up to OOM conditions (256 vs. 228). A similar test on
a different RFS and a kernel using GFP_NOWAIT showed pretty much no
difference in how many processes I could allocate. This doesn't seem too bad
unless there is some kind of fragmentation over time which would cause worse
performance.

Since the performance degradation applies at all times and not just under
extreme conditions, I think the lesser evil will actually be GFP_ATOMIC. But
it's not necessary to always use GFP_ATOMIC, only under certain conditions --
i.e. when the Xen PV ABI forces us to.
Patches will be supplied shortly.
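
A minimal sketch of the conditional allocation being described -- GFP_ATOMIC
only when the Xen PV ABI forces the allocation into atomic (#NM handler)
context, GFP_KERNEL everywhere else. The helper name and the atomic_ctx flag
are illustrative assumptions, not the actual patches:

	/* Sketch: choose the allocation mode based on caller context. */
	static int fpu_alloc_ctx(struct fpu *fpu, bool atomic_ctx)
	{
		gfp_t gfp = atomic_ctx ? GFP_ATOMIC : GFP_KERNEL;

		fpu->state = kmem_cache_alloc(task_xstate_cachep, gfp);
		if (!fpu->state)
			return -ENOMEM;	/* caller kills the task, as before */
		return 0;
	}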


Re: [PATCH 1/3] kmemleak: allow freeing internal objects after disabling kmemleak

2014-03-16 Thread Li Zefan
On 2014/3/13 20:14, Catalin Marinas wrote:
> On Thu, Mar 13, 2014 at 06:47:46AM +, Li Zefan wrote:
>> +Freeing kmemleak internal objects
>> +---------------------------------
>> +
>> +To allow access to previously found memory leaks even when an error fatal
>> +to kmemleak happens, internal kmemleak objects won't be freed when kmemleak
>> +is disabled, and those objects may occupy a large part of physical
>> +memory.
>> +
>> +If you want to make sure they're freed before disabling kmemleak:
>> +
>> +  # echo scan=off > /sys/kernel/debug/kmemleak
>> +  # echo off > /sys/kernel/debug/kmemleak
> 
> I would actually change the code to do a stop_scan_thread() as part of
> the "off" handling so that scan=off is not required (we can't put it as
> part of the kmemleak_disable because we need scan_mutex held).
> 

Sounds reasonable.
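
A sketch of what that would look like -- the "off" command handling in
kmemleak_write() stopping the scan thread itself (scan_mutex is already held
there); an illustration of the idea, not the final patch:

	} else if (strncmp(buf, "off", 3) == 0) {
		stop_scan_thread();	/* makes a separate "scan=off" unnecessary */
		kmemleak_disable();
	}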



Re: Re: [PATCH V3] serial/uart/8250: Add tunable RX interrupt trigger I/F of FIFO buffers

2014-03-16 Thread Yoshihiro YUNOMAE

Hi Heikki,

Thank you for your reply.

(2014/03/14 23:16), Heikki Krogerus wrote:

Hi,

On Fri, Mar 14, 2014 at 11:21:54AM +0900, Yoshihiro YUNOMAE wrote:

  void serial8250_clear_and_reinit_fifos(struct uart_8250_port *p)
  {
-   unsigned char fcr;
-
serial8250_clear_fifos(p);
-   fcr = uart_config[p->port.type].fcr;
-   serial_out(p, UART_FCR, fcr);
+   p->fcr = uart_config[p->port.type].fcr;
+   serial_out(p, UART_FCR, p->fcr);


You should allow also the probe drivers to set this..

 if (!p->fcr)
 p->fcr = uart_config[p->port.type].fcr;


Oh, I'll fix it.


  }
  EXPORT_SYMBOL_GPL(serial8250_clear_and_reinit_fifos);

@@ -2325,10 +2323,19 @@ serial8250_do_set_termios(struct uart_port *port, struct ktermios *termios,
if ((baud < 2400 && !up->dma) || fifo_bug) {
fcr &= ~UART_FCR_TRIGGER_MASK;
fcr |= UART_FCR_TRIGGER_1;
+   /* Don't use user setting RX trigger */
+   up->fcr = 0;


I don't know about this but..


}
}

/*
+* If up->fcr exists, a user has opened this port, changed RX trigger,
+* or read RX trigger before. So, we don't need to change up->fcr here.
+*/
+   if (!up->fcr)
+   up->fcr = fcr;


Why not just set fcr = up->fcr in the beginning of the function?


Ah, we don't need to set up->fcr = 0 in the previous flow if we implement it
as follows:

unsigned char fcr = up->fcr;

 ...  ...

/*
 * If fcr exists, a user has opened this port, changed RX
 * trigger, or read RX trigger before. If so, we do not change
 * fcr.
 */
if (!fcr)
fcr = uart_config[port_type].fcr;
if ((baud < 2400 && !up->dma) || fifo_bug) {
fcr &= ~UART_FCR_TRIGGER_MASK;
fcr |= UART_FCR_TRIGGER_1;
}

up->fcr = fcr;





+static int do_set_rx_int_trig(struct tty_port *port, unsigned char val)
+{
+   struct uart_state *state = container_of(port, struct uart_state, port);
+   struct uart_port *uport = state->uart_port;
+   struct uart_8250_port *up =
+   container_of(uport, struct uart_8250_port, port);
+   unsigned char fcr;
+   int rx_trig;
+
+   if (!(up->capabilities & UART_CAP_FIFO) || uport->fifosize <= 1)
+   return -EINVAL;
+
+   rx_trig = convert_val2rxtrig(up, val);
+   if (rx_trig < 0)
+   return rx_trig;
+
+   serial8250_clear_fifos(up);
+   if (!up->fcr)
+   /* termios is not set yet */
+   fcr = uart_config[up->port.type].fcr;
+   else
+   fcr = up->fcr;
+   fcr &= ~UART_FCR_TRIGGER_MASK;
+   fcr |= (unsigned char)rx_trig;
+   up->fcr = fcr;
+   serial_out(up, UART_FCR, up->fcr);
+   return 0;
+}


Where are you setting UART_FCR_ENABLE_FIFO bit? Am I missing
something?


In this implementation, the driver first sets up->fcr to
uart_config[up->port.type].fcr and then changes only the RX trigger bits, so
we don't need to set that bit explicitly.
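
For illustration: in serial_reg.h, UART_FCR_ENABLE_FIFO is bit 0 (0x01) while
UART_FCR_TRIGGER_MASK covers bits 7:6 (0xc0), so the read-modify-write above
never clears the FIFO enable bit:

	fcr = uart_config[up->port.type].fcr;	/* UART_FCR_ENABLE_FIFO set */
	fcr &= ~UART_FCR_TRIGGER_MASK;		/* clears only bits 7:6 */
	fcr |= (unsigned char)rx_trig;		/* bit 0 stays set */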

Thank you,
Yoshihiro YUNOMAE

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com




Re: [PATCHv2 6/8] devfreq: exynos4: Fix power-leakage of clock on suspend state

2014-03-16 Thread Chanwoo Choi
Hi Tomasz,

On 03/15/2014 02:52 AM, Tomasz Figa wrote:
> Hi Chanwoo,
> 
> On 13.03.2014 09:17, Chanwoo Choi wrote:
>> This patch disables ppmu clocks before entering the suspend state to
>> remove power leakage, and enables them again in the resume function.
> 
> I don't think there is any need for this, because all the clocks are stopped 
> anyway in SLEEP mode.

OK, I'll discard this patch on next patchset.

Best Regards,
Chanwoo Choi





Re: [PATCHv2 3/8] devfreq: exynos4: Add ppmu's clock control and code clean about regulator control

2014-03-16 Thread Chanwoo Choi
Hi Tomasz,

On 03/15/2014 02:42 AM, Tomasz Figa wrote:
> Hi Chanwoo,
> 
> On 13.03.2014 09:17, Chanwoo Choi wrote:
>> There is no clock control for ppmudmc0/1. This patch controls the clocks
>> of ppmudmc0/1, which are used for monitoring memory bus utilization.
>>
>> Also, this patch cleans up the regulator control code and frees resources
>> when calling the exit/remove functions.
>>
>> For example,
>> busfreq@106A {
>> compatible = "samsung,exynos4x12-busfreq";
>>
>> /* Clock for PPMUDMC0/1 */
>> clocks = <&clock CLK_PPMUDMC0>, <&clock CLK_PPMUDMC1>;
>> clock-names = "ppmudmc0", "ppmudmc1";
>>
>> /* Regulator for MIF/INT block */
>> vdd_mif-supply = <&buck1_reg>;
>> vdd_int-supply = <&buck3_reg>;
>> };
>>
>> Signed-off-by: Chanwoo Choi 
>> ---
>>   drivers/devfreq/exynos/exynos4_bus.c | 114 
>> ++-
>>   1 file changed, 100 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/devfreq/exynos/exynos4_bus.c 
>> b/drivers/devfreq/exynos/exynos4_bus.c
>> index 1a0effa..a2a3a47 100644
>> --- a/drivers/devfreq/exynos/exynos4_bus.c
>> +++ b/drivers/devfreq/exynos/exynos4_bus.c
>> @@ -62,6 +62,11 @@ enum exynos_ppmu_idx {
>>   PPMU_END,
>>   };
>>
>> +static const char *exynos_ppmu_clk_name[] = {
>> +[PPMU_DMC0]= "ppmudmc0",
>> +[PPMU_DMC1]= "ppmudmc1",
>> +};
>> +
>>   #define EX4210_LV_MAXLV_2
>>   #define EX4x12_LV_MAXLV_4
>>   #define EX4210_LV_NUM(LV_2 + 1)
>> @@ -86,6 +91,7 @@ struct busfreq_data {
>>   struct regulator *vdd_mif; /* Exynos4412/4212 only */
>>   struct busfreq_opp_info curr_oppinfo;
>>   struct exynos_ppmu ppmu[PPMU_END];
>> +struct clk *clk_ppmu[PPMU_END];
>>
>>   struct notifier_block pm_notifier;
>>   struct mutex lock;
>> @@ -722,8 +728,26 @@ static int exynos4_bus_get_dev_status(struct device 
>> *dev,
>>   static void exynos4_bus_exit(struct device *dev)
>>   {
>>   struct busfreq_data *data = dev_get_drvdata(dev);
>> +int i;
>> +
>> +/*
>> + * Un-map memory map and disable regulator/clocks
>> + * to prevent power leakage.
>> + */
>> +regulator_disable(data->vdd_int);
>> +if (data->type == TYPE_BUSF_EXYNOS4x12)
>> +regulator_disable(data->vdd_mif);
>> +
>> +for (i = 0; i < PPMU_END; i++) {
>> +if (data->clk_ppmu[i])
> 
> This check is invalid. Clock pointers must be checked for validity using the 
> IS_ERR() macro, because NULL is a valid clock pointer value indicating a 
> dummy clock.

OK, I'll check it using the IS_ERR() macro, as follows:

if (IS_ERR(data->clk_ppmu[i])) {


> 
>> +clk_disable_unprepare(data->clk_ppmu[i]);
>> +}
>>
>> -devfreq_unregister_opp_notifier(dev, data->devfreq);
>> +for (i = 0; i < PPMU_END; i++) {
>> +if (data->ppmu[i].hw_base)
> 
> Can this even happen? Is there a PPMU without registers?
> 
>> +iounmap(data->ppmu[i].hw_base);
>> +
>> +}
>>   }
>>
>>   static struct devfreq_dev_profile exynos4_devfreq_profile = {
>> @@ -987,6 +1011,7 @@ static int exynos4_busfreq_parse_dt(struct busfreq_data 
>> *data)
>>   {
>>   struct device *dev = data->dev;
>>   struct device_node *np = dev->of_node;
>> +const char **clk_name = exynos_ppmu_clk_name;
>>   int i, ret;
>>
>>   if (!np) {
>> @@ -1005,8 +1030,70 @@ static int exynos4_busfreq_parse_dt(struct 
>> busfreq_data *data)
>>   }
>>   }
>>
>> +/*
>> + * Get PPMU's clocks to control them. But, if PPMU's clocks
>> + * is default 'pass' state, this driver don't need control
>> + * PPMU's clock.
>> + */
>> +for (i = 0; i < PPMU_END; i++) {
>> +data->clk_ppmu[i] = devm_clk_get(dev, clk_name[i]);
>> +if (IS_ERR_OR_NULL(data->clk_ppmu[i])) {
> 
> Again, this check is invalid. Only IS_ERR() is the correct way to check 
> whether returned clock pointer is valid.

ditto.
if (IS_ERR(data->clk_ppmu[i])) {

> 
>> +dev_warn(dev, "Cannot get %s clock\n", clk_name[i]);
>> +data->clk_ppmu[i] = NULL;
> 
> This assignment is wrong. To allow further checking whether the clock was 
> found the value returned from devm_clk_get() must be retained and then 
> IS_ERR() used in further code.
> 
> However, I believe it should be an error if a clock is not provided. The 
> driver must make sure that PPMU clocks are ungated before trying to access 
> them, otherwise the system might hang.

OK, I'll use the IS_ERR() macro when checking and handling the clock
instances in 'data->clk_ppmu[i]', and if this driver can't get a ppmu clock,
handle it as an error.

> 
>> +}
>> +
>> +ret = clk_prepare_enable(data->clk_ppmu[i]);
> 
> The code above allows the clock to be skipped, but this line doesn't check 
> whether it is valid. Still, I think the clock should be always required.

OK, I'll make the ppmu clocks required, without exception.

> 
>> +if (ret < 0) {
>> +dev_warn(dev, "Cannot enable %s c
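
Putting the review comments together, the corrected probe-time pattern would
look roughly like this -- a sketch assuming the clocks become mandatory, not
the final patch:

	/* Sketch: mandatory PPMU clocks, validated with IS_ERR() only. */
	for (i = 0; i < PPMU_END; i++) {
		data->clk_ppmu[i] = devm_clk_get(dev, clk_name[i]);
		if (IS_ERR(data->clk_ppmu[i])) {
			dev_err(dev, "Cannot get %s clock\n", clk_name[i]);
			return PTR_ERR(data->clk_ppmu[i]);
		}

		ret = clk_prepare_enable(data->clk_ppmu[i]);
		if (ret < 0) {
			dev_err(dev, "Cannot enable %s clock\n", clk_name[i]);
			return ret;
		}
	}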



Re: [PATCH] fixing some coding style issues on line6 driver.c

2014-03-16 Thread Davide Berardi
Thanks for your replies, I've applied your suggestions to the patch.

Davide

Fixed some line over 80 characters.

From: Davide Berardi 

Signed-off-by: Davide Berardi 

---
 drivers/staging/line6/driver.c  | 54 ++---
 drivers/staging/line6/usbdefs.h |  2 +-
 2 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/line6/driver.c b/drivers/staging/line6/driver.c
index e7a9d8d..77f1b42 100644
--- a/drivers/staging/line6/driver.c
+++ b/drivers/staging/line6/driver.c
@@ -57,28 +57,32 @@ static const struct usb_device_id line6_id_table[] = {

 MODULE_DEVICE_TABLE(usb, line6_id_table);

+#define L6PROP(dev_bit, dev_id, dev_name, dev_cap)\
+   {.device_bit = LINE6_BIT_##dev_bit, .id = dev_id,\
+.name = dev_name, .capabilities = LINE6_BIT_##dev_cap}
+
 /* *INDENT-OFF* */
-static struct line6_properties line6_properties_table[] = {
-	{ LINE6_BIT_BASSPODXT,     "BassPODxt",     "BassPODxt",         LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_BASSPODXTLIVE, "BassPODxtLive", "BassPODxt Live",    LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_BASSPODXTPRO,  "BassPODxtPro",  "BassPODxt Pro",     LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_GUITARPORT,    "GuitarPort",    "GuitarPort",        LINE6_BIT_PCM   },
-	{ LINE6_BIT_POCKETPOD,     "PocketPOD",     "Pocket POD",        LINE6_BIT_CONTROL   },
-	{ LINE6_BIT_PODHD300,      "PODHD300",      "POD HD300",         LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_PODHD400,      "PODHD400",      "POD HD400",         LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_PODHD500,      "PODHD500",      "POD HD500",         LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_PODSTUDIO_GX,  "PODStudioGX",   "POD Studio GX",     LINE6_BIT_PCM   },
-	{ LINE6_BIT_PODSTUDIO_UX1, "PODStudioUX1",  "POD Studio UX1",    LINE6_BIT_PCM   },
-	{ LINE6_BIT_PODSTUDIO_UX2, "PODStudioUX2",  "POD Studio UX2",    LINE6_BIT_PCM   },
-	{ LINE6_BIT_PODX3,         "PODX3",         "POD X3",            LINE6_BIT_PCM   },
-	{ LINE6_BIT_PODX3LIVE,     "PODX3Live",     "POD X3 Live",       LINE6_BIT_PCM   },
-	{ LINE6_BIT_PODXT,         "PODxt",         "PODxt",             LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_PODXTLIVE,     "PODxtLive",     "PODxt Live",        LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_PODXTPRO,      "PODxtPro",      "PODxt Pro",         LINE6_BIT_CONTROL_PCM_HWMON },
-	{ LINE6_BIT_TONEPORT_GX,   "TonePortGX",    "TonePort GX",       LINE6_BIT_PCM   },
-	{ LINE6_BIT_TONEPORT_UX1,  "TonePortUX1",   "TonePort UX1",      LINE6_BIT_PCM   },
-	{ LINE6_BIT_TONEPORT_UX2,  "TonePortUX2",   "TonePort UX2",      LINE6_BIT_PCM   },
-	{ LINE6_BIT_VARIAX,        "Variax",        "Variax Workbench",  LINE6_BIT_CONTROL   },
+static const struct line6_properties line6_properties_table[] = {
+   L6PROP(BASSPODXT, "BassPODxt", "BassPODxt",CTRL_PCM_HW),
+   L6PROP(BASSPODXTLIVE, "BassPODxtLive", "BassPODxt Live",   CTRL_PCM_HW),
+   L6PROP(BASSPODXTPRO,  "BassPODxtPro",  "BassPODxt Pro",CTRL_PCM_HW),
+   L6PROP(GUITARPORT,"GuitarPort","GuitarPort",   PCM),
+   L6PROP(POCKETPOD, "PocketPOD", "Pocket POD",   CONTROL),
+   L6PROP(PODHD300,  "PODHD300",  "POD HD300",CTRL_PCM_HW),
+   L6PROP(PODHD400,  "PODHD400",  "POD HD400",CTRL_PCM_HW),
+   L6PROP(PODHD500,  "PODHD500",  "POD HD500",CTRL_PCM_HW),
+   L6PROP(PODSTUDIO_GX,  "PODStudioGX",   "POD Studio GX",PCM),
+   L6PROP(PODSTUDIO_UX1, "PODStudioUX1",  "POD Studio UX1",   PCM),
+   L6PROP(PODSTUDIO_UX2, "PODStudioUX2",  "POD Studio UX2",   PCM),
+   L6PROP(PODX3, "PODX3", "POD X3",   PCM),
+   L6PROP(PODX3LIVE, "PODX3Live", "POD X3 Live",  PCM),
+   L6PROP(PODXT, "PODxt", "PODxt",CTRL_PCM_HW),
+   L6PROP(PODXTLIVE, "PODxtLive", "PODxt Live",   CTRL_PCM_HW),
+   L6PROP(PODXTPRO,  "PODxtPro",  "PODxt Pro",CTRL_PCM_HW),
+   L6PROP(TONEPORT_GX,   "TonePortGX","TonePort GX",  PCM),
+   L6PROP(TONEPORT_UX1,  "TonePortUX1",   "TonePort UX1", PCM),
+   L6PROP(TONEPORT_UX2,  "TonePortUX2",   "TonePort UX2", PCM),
+   L6PROP(VARIAX,"Variax","Variax Workbench", CONTROL),
 };
 /* *INDENT-ON* */

@@ -152,10 +156,10 @@ int line6_send_raw_message(struct usb_line6 *line6, const char *buffer,
int retval;

retval = usb_interrupt_msg(line6->usbdev,
-				   usb_sndintpipe(line6->usbdev,
-						  line6->ep_control_write),

linux-next: manual merge of the hid tree with Linus' tree

2014-03-16 Thread Stephen Rothwell
Hi Jiri,

Today's linux-next merge of the hid tree got a conflict in
drivers/hid/i2c-hid/i2c-hid.c between commit 9d27f43274e4 ("HID: fix
buffer allocations") from Linus' tree and commit 649f94790314 ("HID:
i2c-hid: use generic .request() implementation") from the hid tree.

I fixed it up (the latter removes the code modified by the former, so I
just used the latter) and can carry the fix as necessary (no action is
required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




[f2fs-dev] [PATCH] f2fs: print type for each segment in segment_info's show

2014-03-16 Thread Chao Yu
The original segment_info show looks badly formatted:
cat /proc/fs/f2fs/loop0/segment_info
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 512
512 512 512 512 512 512 512 0 0 512
348 0 263 0 0 512 0 0 512 512
512 512 0 512 512 512 512 512 512 512
512 512 511 328 512 512 512 512 512 512
512 512 512 512 512 512 512 0 0 175

Let's fix this and show the type of each segment.
cat /proc/fs/f2fs/loop0/segment_info
format: segment_type|valid_blocks
segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN)
0    2|0   1|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
10   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
20   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
30   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
40   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0   0|0
50   3|0   3|0   3|0   3|0   3|0   3|0   3|0   0|0   3|0   3|0
60   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|0   3|512
70   3|512 3|512 3|512 3|512 3|512 3|512 3|512 3|0   3|0   3|512
80   3|0   3|0   3|0   3|0   3|0   3|512 3|0   3|0   3|512 3|512
90   3|512 0|512 3|274 0|512 0|512 0|512 0|512 0|512 0|512 3|512
100  3|512 0|512 3|511 0|328 3|512 0|512 0|512 3|512 0|512 0|512
110  0|512 0|512 0|512 0|512 0|512 0|512 0|512 5|0   4|0   3|512

Signed-off-by: Chao Yu 
---
 fs/f2fs/super.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 3a51d7a..057a3ef 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -544,8 +544,16 @@ static int segment_info_seq_show(struct seq_file *seq, 
void *offset)
le32_to_cpu(sbi->raw_super->segment_count_main);
int i;
 
+   seq_puts(seq, "format: segment_type|valid_blocks\n"
+   "segment_type(0:HD, 1:WD, 2:CD, 3:HN, 4:WN, 5:CN)\n");
+
for (i = 0; i < total_segs; i++) {
-   seq_printf(seq, "%u", get_valid_blocks(sbi, i, 1));
+   struct seg_entry *se = get_seg_entry(sbi, i);
+
+   if ((i % 10) == 0)
+   seq_printf(seq, "%-5d", i);
+   seq_printf(seq, "%d|%-3u", se->type,
+   get_valid_blocks(sbi, i, 1));
if ((i % 10) == 9 || i == (total_segs - 1))
seq_putc(seq, '\n');
else
-- 
1.7.9.5




Linux 3.14-rc7

2014-03-16 Thread Linus Torvalds
What a difference a week makes. In a good way. A week ago, cutting
rc6, I was not a happy person: the release had much too much noise in
it, and I felt that an rc8 and even an rc9 might well be a real
possibility.

Now it's a week later, and rc7 looks much better. Yeah, there's random
stuff all over (biggest contribution: networking, both core and
drivers), but it's a lot smaller than rc6, and none of it makes me
feel unhappy. Pretty much all the patches are tiny, and I think the
patch that removes the old useless x86 Centaur OOSTORE option is
probably the biggest in the lot of them. And the only other patch that
migth challenge it in size is also just removing code ("r8152: disable
the ECM mode").

Now, things might change, and maybe next week ends up being another
ugly week, but with some luck that won't happen and this is the last
rc.

Go out and test. It all looks good..

   Linus

---

Aaro Koskinen (1):
  ASoC: n810: fix init with DT boot

Al Viro (3):
  ocfs2 syncs the wrong range...
  sockfd_lookup_light(): switch to fdget^W^Waway from fget_light
  get rid of fget_light()

Ales Novak (1):
  [SCSI] storvsc: NULL pointer dereference fix

Alex Deucher (4):
  drm/radeon: fix runpm disabling on non-PX harder
  drm/radeon/cik: properly set sdma ring status on disable
  drm/radeon/cik: stop the sdma engines in the enable() function
  drm/radeon/cik: properly set compute ring status on disable

Alexander Aring (1):
  at86rf230: fix lockdep splats

Alexei Starovoitov (1):
  x86: bpf_jit: support negative offsets

Amir Vadai (2):
  net/mlx4_core: Fix memory access error in mlx4_QUERY_DEV_CAP_wrapper()
  net/mlx4_core: mlx4_init_slave() shouldn't access comm channel
before PF is ready

Amitkumar Karwar (2):
  mwifiex: copy AP's HT capability info correctly
  mwifiex: save and copy AP's VHT capability info correctly

Andrew Lutomirski (1):
  net: Improve SO_TIMESTAMPING documentation and fix a minor code bug

Andrew Morton (1):
  revert "kallsyms: fix absolute addresses for kASLR"

Annie Li (1):
  Xen-netback: Fix issue caused by using gso_type wrongly

Anton Blanchard (2):
  net: unix socket code abuses csum_partial
  ibmveth: Fix endian issues with MAC addresses

Anton Nayshtut (1):
  ipv6: Fix exthdrs offload registration.

Arnd Bergmann (1):
  vmxnet3: fix building without CONFIG_PCI_MSI

Artem Fetishev (1):
  fs/proc/base.c: fix GPF in /proc/$PID/map_files

Ben Hutchings (3):
  perf trace: Decode architecture-specific signal numbers
  bna: Replace large udelay() with mdelay()
  mm/Kconfig: fix URL for zsmalloc benchmark

Bjorn Helgaas (3):
  phy: fix compiler array bounds warning on settings[]
  PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled
  PCI: Don't check resource_size() in pci_bus_alloc_resource()

Chad Dupuis (1):
  [SCSI] qla2xxx: Fix multiqueue MSI-X registration.

Colin Ian King (1):
  tools/testing/selftests/ipc/msgque.c: handle msgget failure
return correctly

Dan Williams (1):
  [SCSI] isci: fix reset timeout handling

Daniel Borkmann (2):
  net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk
  MAINTAINERS: add networking selftests to NETWORKING

Daniel J Blueman (1):
  x86/amd/numa: Fix northbridge quirk to assign correct NUMA node

Dave Jones (2):
  perf/x86: Fix leak in uncore_type_init failure paths
  x86: Remove CONFIG_X86_OOSTORE

Don Zickus (1):
  perf machine: Use map as success in ip__resolve_ams

Eliad Peller (1):
  mac80211: consider virtual mon when calculating min_def

Emmanuel Grumbach (1):
  iwlwifi: mvm: don't WARN when statistics are handled late

Eric Dumazet (3):
  pkt_sched: move the sanity test in qdisc_list_add()
  pkt_sched: fq: do not hold qdisc lock while allocating memory
  tcp: tcp_release_cb() should release socket ownership

Eric W. Biederman (3):
  audit: Use struct net not pid_t to remember the network namespce
to reply in
  audit: Send replies in the proper network namespace.
  audit: Update kdoc for audit_send_reply and audit_list_rules_send

Erik Hugne (3):
  tipc: drop subscriber connection id invalidation
  tipc: fix memory leak during module removal
  tipc: don't log disabled tasklet handler errors

Fernando Luis Vazquez Cao (1):
  sched/clock: Prevent tracing recursion in sched_clock_cpu()

Florian Westphal (1):
  inet: frag: make sure forced eviction removes all frags

Gavin Shan (1):
  net/mlx4: Support shutdown() interface

Geert Uytterhoeven (2):
  cris: convert ffs from an object-like macro to a function-like macro
  packet: doc: Spelling s/than/that/

Giridhar Malavali (1):
  [SCSI] qla2xxx: Poll during initialization for ISP25xx and ISP83xx

Giuseppe CAVALLARO (4):
  stmmac: disable at run-time the EEE if not supported
  stmmac: fix and better tune the default bu

RE: [f2fs-dev] [PATCH 3/5] f2fs: format segment_info's show for better legibility

2014-03-16 Thread Chao Yu
Hi Gu,

> -Original Message-
> From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
> Sent: Thursday, March 13, 2014 5:58 PM
> To: Chao Yu
> Cc: 'Kim'; 'linux-kernel'; 'f2fs'
> Subject: Re: [f2fs-dev] [PATCH 3/5] f2fs: format segment_info's show for 
> better legibility
> 
> Hi,
> On 03/13/2014 05:13 PM, Chao Yu wrote:
> 
> > Hi Gu,
> >
> >> -Original Message-
> >> From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
> >> Sent: Friday, March 07, 2014 6:44 PM
> >> To: Kim
> >> Cc: linux-kernel; f2fs
> >> Subject: [f2fs-dev] [PATCH 3/5] f2fs: format segment_info's show for 
> >> better legibility
> >>
> >> The original segment_info's show is a bit out-of-format:
> >>
> >> [root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
> >> 0 0 0 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0 0 0
> >> ..
> >> 0 0 0 0 0 0 0 0 0 0
> >> 0 0 1 0 0 1 [root@guz Demoes]#
> >>
> >> so we fix it here for better legibility.
> >> [root@guz Demoes]# cat /proc/fs/f2fs/loop0/segment_info
> >> 0 0 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0 0 0
> >> ..
> >> 0 0 0 0 0 0 0 0 0 0
> >> 0 0 1 0 0 1
> >> [root@guz Demoes]#
> >
> > Here is one case that doesn't look legible:
> >
> > 0 0 0 0 0 0 0 0 0 0
> > 1 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 509 512 512 512
> > 512 512 512 512 512 512 512 512 512 512
> > 512 512 512 512 331 278 0 0 0 0
> > 0 0 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 512 512 512
> > 512 512 512 512 512 512 512 512 512 512
> > 0 512 512 512 512 512 512 1 0 512
> >
> > So how about modifying the code as follows?
> 
> >
> >>
> >> Signed-off-by: Gu Zheng 
> >
> > Reviewed-by: Chao Yu 
> >
> >> ---
> >>  fs/f2fs/super.c |7 ---
> >>  1 files changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >> index 72df734..6e4851c 100644
> >> --- a/fs/f2fs/super.c
> >> +++ b/fs/f2fs/super.c
> >> @@ -546,11 +546,12 @@ static int segment_info_seq_show(struct seq_file 
> >> *seq, void *offset)
> >>
> >>for (i = 0; i < total_segs; i++) {
> >>seq_printf(seq, "%u", get_valid_blocks(sbi, i, 1));
> >
> > seq_printf(seq, "%-3u ", get_valid_blocks(sbi, i, 1));
> 
> Hmm, this patch has been applied to the f2fs-dev tree, so maybe you can
> send a patch to enhance it directly! :)

Alright, I will send another patch to fix it. :)
Thanks.

> 
> Regards,
> Gu
> 
> >
> >> -  if (i != 0 && (i % 10) == 0)
> >> -  seq_puts(seq, "\n");
> >> +  if ((i % 10) == 9 || i == (total_segs - 1))
> >> +  seq_putc(seq, '\n');
> >>else
> >> -  seq_puts(seq, " ");
> >> +  seq_putc(seq, ' ');
> >>}
> >> +
> >>return 0;
> >>  }
> >>
> >> --
> >> 1.7.7
> >>
> >>
> >
> >




Re: [PATCH][RESEND 3] hwrng: add randomness to system from rng sources

2014-03-16 Thread H. Peter Anvin
On 03/04/2014 02:39 PM, Matt Mackall wrote:
> 
> [temporarily coming out of retirement to provide a clue]
> 
> The pool mixing function is intentionally _reversible_. This is a
> crucial security property.
> 
> That means, if I have an initial secret pool state X, and hostile
> attacker controlled data Y, then we can do:
> 
> X' = mix(X, Y)
> 
>  and 
> 
> X = unmix(X', Y)
> 
> We can see from this that the combination of (X' and Y) still contain
> the information that was originally in X. Since it's clearly not in Y..
> it must all remain in X'.
> 

This of course assumes that the attacker doesn't know the state of the
pool X.

The other thing to note is that reversible doesn't necessarily mean
linear (the current mixing function is linear). AES, for example, is
reversible (if and only if you possess the key) but is highly nonlinear.

I'm not saying we should use AES to mix the pool -- it is almost
guaranteed to be too expensive.

-hpa
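
A toy illustration of the reversibility property described above, using XOR
mixing -- purely illustrative, since the real pool mixer is a more elaborate
linear function, but XOR exhibits the same invertibility:

	#include <assert.h>
	#include <stdint.h>

	/*
	 * mix() and unmix() are the same operation for XOR: applying Y a
	 * second time recovers X, so (X', Y) together still contain all
	 * the information that was in X.
	 */
	static uint64_t mix(uint64_t x, uint64_t y)    { return x ^ y; }
	static uint64_t unmix(uint64_t xp, uint64_t y) { return xp ^ y; }

	int main(void)
	{
		uint64_t x = 0xdeadbeefcafef00dULL; /* secret pool state */
		uint64_t y = 0x123456789abcdef0ULL; /* attacker-controlled */

		assert(unmix(mix(x, y), y) == x);
		return 0;
	}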




Re: [sched/balance] BUG: unable to handle kernel paging request at ffffffff00000058

2014-03-16 Thread Alex Shi
On 03/15/2014 09:46 AM, Fengguang Wu wrote:
> Alex, we noticed the below changes in
> 
> https://github.com/alexshi/power-scheduling.git single-balance
> commit e1f728f230025ba2f2ed71e19b156291f53b68fe ("sched/balance: replace 
> idle_balance")
> 


Thanks a lot for your data! But I am wondering whether the bisected commit is
right, since the following panic looks like it comes from the original
kernel, not from my code: I neither touched any function on the call path
nor changed the struct used in the panicking function.

Could you use addr2line to locate the faulting line in
wq_worker_waking_up()?

void wq_worker_waking_up(struct task_struct *task, int cpu)
{
struct worker *worker = kthread_data(task);

if (!(worker->flags & WORKER_NOT_RUNNING)) {
WARN_ON_ONCE(worker->pool->cpu != cpu);
atomic_inc(&worker->pool->nr_running);
}
}
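
For reference, one way to resolve a symbol+offset like
wq_worker_waking_up+0x14 to a source line, given a vmlinux built with debug
info, is (a generic recipe, not specific to this report):

	$ gdb -batch -ex 'info line *(wq_worker_waking_up+0x14)' vmlinux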

> [3.814901] PM: Registering ACPI NVS region [mem 0x650a-0x65375fff] 
> (2973696 bytes)
> [3.816130] PM: Registering ACPI NVS region [mem 0x65df8000-0x66df7fff] 
> (16777216 bytes)
> [3.820540] PM: Registering ACPI NVS region [mem 0x7accf000-0x7b6fefff] 
> (10682368 bytes)
> [3.824649] BUG: unable to handle kernel paging request at ffffffff00000058
> [3.828000] IP: [] wq_worker_waking_up+0x14/0x5b
> [3.828000] PGD 220d067 PUD 0 
> [3.828000] Oops:  [#1] SMP 
> [3.828000] Modules linked in:
> [3.828000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 3.14.0-rc6-9-ge1f728f #1
> [3.828000] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS 
> BKLDSDP1.86B.0031.R01.1304221600 04/22/2013
> [3.828000] task: 8808541f8000 ti: 8808541e6000 task.ti: 
> 8808541e6000
> [3.828000] RIP: 0010:[]  [] 
> wq_worker_waking_up+0x14/0x5b
> [3.828000] RSP: :8808541e7cf0  EFLAGS: 00010002
> [3.828000] RAX:  RBX: 8808543b0910 RCX: 
> 824ee1c0
> [3.828000] RDX: 0005811058a8 RSI: 0001 RDI: 
> 8808543b0910
> [3.828000] RBP: 8808541e7d00 R08: 064d R09: 
> b7f9
> [3.828000] R10: 0027 R11: 6bd9 R12: 
> 0001
> [3.828000] R13: 00013000 R14: 0001 R15: 
> 0001
> [3.828000] FS:  () GS:88085f80() 
> knlGS:
> [3.828000] CS:  0010 DS:  ES:  CR0: 80050033
> [3.828000] CR2: 0058 CR3: 0220c000 CR4: 
> 001407f0
> [3.828000] Stack:
> [3.828000]  8808543b0910 88085f833000 8808541e7d20 
> 81105a04
> [3.828000]  8808543b0910 88085f833000 8808541e7d68 
> 81108259
> [3.828000]  0046 8808543b0f24 8808543b0910 
> 88085f838560
> [3.828000] Call Trace:
> [3.828000]  [] ttwu_do_activate.constprop.88+0x4f/0x61
> [3.828000]  [] try_to_wake_up+0x1f7/0x228
> [3.828000]  [] wake_up_process+0x34/0x37
> [3.828000]  [] wake_up_worker+0x24/0x26
> [3.828000]  [] pwq_adjust_max_active+0x7f/0xaa
> [3.828000]  [] link_pwq+0x2f/0x4a
> [3.828000]  [] __alloc_workqueue_key+0x29c/0x459
> [3.828000]  [] ? pm_debugfs_init+0x24/0x24
> [3.828000]  [] pm_init+0x1e/0x7b
> [3.828000]  [] do_one_initcall+0xa4/0x13a
> [3.828000]  [] ? parse_args+0x25f/0x33d
> [3.828000]  [] kernel_init_freeable+0x1a9/0x22e
> [3.828000]  [] ? do_early_param+0x88/0x88
> [3.828000]  [] ? rest_init+0x89/0x89
> [3.828000]  [] kernel_init+0xe/0xdf
> [3.828000]  [] ret_from_fork+0x7c/0xb0
> [3.828000]  [] ? rest_init+0x89/0x89
> [3.828000] Code: 46 40 83 60 14 df 48 83 c4 28 31 c0 5b 41 5c 41 5d 41 5e 
> 41 5f 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 41 89 f4 53 e8 c6 53 00 00  
> 40 58 c8 01 00 00 48 89 c3 75 36 48 8b 40 48 44 39 60 04 74 
> [3.828000] RIP  [] wq_worker_waking_up+0x14/0x5b
> [3.828000]  RSP 
> [3.828000] CR2: 0058
> [3.828000] ---[ end trace 5d87af0bad79b4cd ]---


-- 
Thanks
Alex


Re: [PATCH V6 ] mm readahead: Fix readahead fail for memoryless cpu and limit readahead pages

2014-03-16 Thread Madper Xie

Raghavendra K T  writes:

> On 02/18/2014 03:19 PM, Jan Kara wrote:
>> On Tue 18-02-14 12:55:38, Raghavendra K T wrote:
>>> Currently max_sane_readahead() returns zero on a cpu that has no local
>>> memory node, which leads to readahead failure. Fix the readahead failure
>>> by returning the minimum of (requested pages, 512). Users running
>>> applications that need readahead, such as streaming applications, on a
>>> memoryless cpu see a considerable boost in performance.
>>>
>>> Result:
>>> fadvise experiment with FADV_WILLNEED on a PPC machine having memoryless CPU
>>> with 1GB testfile ( 12 iterations) yielded around 46.66% improvement.
>>>
>>> fadvise experiment with FADV_WILLNEED on a x240 machine with 1GB testfile
>>> 32GB* 4G RAM  numa machine ( 12 iterations) showed no impact on the normal
>>> NUMA cases w/ patch.
>>Can you try one more thing please? Compare startup time of some big
>> executable (Firefox or LibreOffice come to my mind) for the patched and
>> normal kernel on a machine which wasn't hit by this NUMA issue. And don't
>> forget to do "echo 3 >/proc/sys/vm/drop_caches" before each test to flush
>> the caches. If this doesn't show significant differences, I'm OK with the
>> patch.
>>
>
> Thanks Honza, I checked with firefox (starting to particular point)..
> I do not see any difference. Both the case took around 14sec.
>
>   (Sometimes it is even faster, maybe because we skip the free page
> calculation?)
Hi. Just a concern: will performance be reduced on some special storage
backends, e.g. tape? Existing applications may be using readahead for
userspace I/O scheduling to decrease seek time.
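
For context, the fix under discussion amounts to clamping rather than
returning zero -- roughly the following shape (a sketch based on the patch
description above, not the exact submitted code):

	/* Sketch: never return 0 on memoryless nodes; clamp instead. */
	#define MAX_REMOTE_READAHEAD	512UL

	unsigned long max_sane_readahead(unsigned long nr)
	{
		return min(nr, MAX_REMOTE_READAHEAD);
	}
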
-- 
Thanks,
Madper


Re: [PATCH] spi: sc18is602: Don't be that restrictive with the maximum transfer speed

2014-03-16 Thread Axel Lin
2014-03-17 9:47 GMT+08:00 Guenter Roeck :
> Commit 09e99bca8 (spi: sc18is602: Convert to let spi core validate
> transfer speed) made the maximum transfer speed much more restrictive
> than before. The transfer speed used to be adjusted to 1/4 of the chip
> clock rate if a higher transfer speed was requested. Now such transfers are
> simply rejected. With default settings, this causes, for example, a transfer
> request at 2 Mbps to be rejected because the maximum speed with the default
> chip clock is 1.843 Mbps.
>
> This is unnecessarily restrictive and causes unnecessary failures. Loosen
> the limit to accept transfers up to 50% of the clock rate and adjust
> the speed as needed when setting up the actual transfer.

I suppose this controller can only be set to SC18IS602_MODE_CLOCK_DIV_4 for
the highest transfer speed. If that is the case, master->max_speed_hz should
be hw->freq / 4.

Now I'm wondering if it is OK to use master->max_speed_hz as the transfer
speed when xfer->speed_hz > master->max_speed_hz, and whether that should be
handled in the spi core. I'm sending an RFC patch now.
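
i.e., something along these lines in the core's transfer validation -- a
sketch of the idea (the helper name is illustrative), not the actual RFC
patch:

	/* Sketch: clamp a too-fast transfer to the controller limit
	 * instead of rejecting it outright. */
	static void spi_clamp_transfer_speed(struct spi_master *master,
					     struct spi_transfer *xfer)
	{
		if (!xfer->speed_hz ||
		    (master->max_speed_hz && xfer->speed_hz > master->max_speed_hz))
			xfer->speed_hz = master->max_speed_hz;
	}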

Regards,
Axel


Re: [PATCHv2 0/8] devfreq: exynos4: Support dt and use common ppmu driver

2014-03-16 Thread Chanwoo Choi
Hi Tomasz,

On 03/15/2014 02:58 AM, Tomasz Figa wrote:
> Hi Chanwoo,
> 
> On 13.03.2014 09:17, Chanwoo Choi wrote:
>> This patchset supports devicetree and uses the common ppmu driver instead
>> of the individual code in exynos4_bus.c, removing duplicate code. It also
>> gets the resources for busfreq from dt data by using DT helper functions:
>> - PPMU register address
>> - PPMU clock
>> - Regulator for INT/MIF block
>>
>> This patchset uses the SET_SYSTEM_SLEEP_PM_OPS macro instead of the legacy
>> method. To remove power leakage in the suspend state, it disables the ppmu
>> clocks before entering suspend.
>>
>> Changes from v1:
>> - Add exynos4_bus.txt documentation for devicetree guide
>> - Fix probe failure if CONFIG_PM_OPP is disabled
>> - Fix typo and resource leak(regulator/clock/memory) when happening probe 
>> failure
>> - Add additionally comment for PPMU usage instead of previous PPC
>> - Split separate patch to remove ambiguous of patch
>>
>> Chanwoo Choi (8):
>>devfreq: exynos4: Support devicetree to get device id of Exynos4 SoC
>>devfreq: exynos4: Use common ppmu driver and get ppmu address from dt data
>>devfreq: exynos4: Add ppmu's clock control and code clean about regulator 
>> control
>>devfreq: exynos4: Fix bug of resource leak and code clean on probe()
>>devfreq: exynos4: Use SET_SYSTEM_SLEEP_PM_OPS macro
>>devfreq: exynos4: Fix power-leakage of clock on suspend state
>>devfreq: exynos4: Add CONFIG_PM_OPP dependency to fix probe fail
>>devfreq: exynos4: Add busfreq driver for exynos4210/exynos4x12
>>
>>   .../devicetree/bindings/devfreq/exynos4_bus.txt|  49 +++
>>   drivers/devfreq/Kconfig|   1 +
>>   drivers/devfreq/exynos/Makefile|   2 +-
>>   drivers/devfreq/exynos/exynos4_bus.c   | 415 
>> ++---
>>   4 files changed, 341 insertions(+), 126 deletions(-)
>>   create mode 100644 
>> Documentation/devicetree/bindings/devfreq/exynos4_bus.txt
>>
> 
> I have reviewed this series and there are several comments that I'd like to 
> ask you to address. Please see my replies to particular patches.

OK, I'll fix it per your comments.

> 
> However, this driver, even after applying your series, is still far from a 
> state that would allow it to be enabled. The most important issue is direct 
> access to CMU registers, based on static mapping, which is not allowed on 
> multiplatform kernels and multiplatform-awareness for drivers is currently a 
> must.
> 
> To allow this driver to be enabled, it needs to be converted to use common 
> clock framework functions to configure all clocks, e.g. clk_set_rate(), 
> clk_set_parent(), etc., without accessing CMU registers directly.
> 
> Of course as long as the driver is effectively unusable, to keep development, 
> we can proceed with refactoring it step-by-step and your series would be 
> basically the first step, after addressing the review comments.
> 

I agree with your opinion. When setting the memory bus frequency, this driver
accesses the CMU registers directly. I know it should be modified to use the
common clk framework, as you comment.

As a next step, I'll send a patch set that uses the common clk framework
instead of statically mapped CMU registers, once the review and application
of this patch set is finished.
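
A sketch of the CCF-based conversion being asked for -- setting the bus
frequency through clk_set_rate() instead of poking CMU registers; the clock
name "sclk_dmc" and the target_freq_khz variable are assumptions for
illustration:

	/* Sketch: configure the memory bus clock via the common clock
	 * framework rather than via statically mapped CMU registers.
	 * target_freq_khz is the frequency of the chosen OPP. */
	struct clk *dmc_clk;
	int ret;

	dmc_clk = devm_clk_get(dev, "sclk_dmc");
	if (IS_ERR(dmc_clk))
		return PTR_ERR(dmc_clk);

	ret = clk_set_rate(dmc_clk, target_freq_khz * 1000);
	if (ret < 0)
		return ret;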

Thanks for your review.

Best Regards,
Chanwoo Choi



Re: [PATCHv2 0/8] devfreq: exynos4: Support dt and use common ppmu driver

2014-03-16 Thread Chanwoo Choi
Hi,
On 03/14/2014 07:47 PM, Bartlomiej Zolnierkiewicz wrote:
> On Friday, March 14, 2014 12:14:03 PM Chanwoo Choi wrote:
>> Hi,
>>
>> On 03/14/2014 01:43 AM, Bartlomiej Zolnierkiewicz wrote:
>>>
>>> Hi,
>>>
>>> On Thursday, March 13, 2014 05:17:21 PM Chanwoo Choi wrote:
 This patchset supports devicetree and uses the common ppmu driver instead
 of the individual code in exynos4_bus.c, removing duplicate code. It also
 gets the resources for busfreq from dt data by using DT helper functions:
 - PPMU register address
 - PPMU clock
 - Regulator for INT/MIF block

 This patchset uses the SET_SYSTEM_SLEEP_PM_OPS macro instead of the legacy
 method. To remove power leakage in the suspend state, it disables the ppmu
 clocks before entering suspend.

 Changes from v1:
 - Add exynos4_bus.txt documentation for devicetree guide
 - Fix probe failure if CONFIG_PM_OPP is disabled
 - Fix typo and resource leak(regulator/clock/memory) when happening probe 
 failure
 - Add additionally comment for PPMU usage instead of previous PPC
 - Split separate patch to remove ambiguous of patch

 Chanwoo Choi (8):
   devfreq: exynos4: Support devicetree to get device id of Exynos4 SoC
   devfreq: exynos4: Use common ppmu driver and get ppmu address from dt 
 data
   devfreq: exynos4: Add ppmu's clock control and code clean about 
 regulator control
   devfreq: exynos4: Fix bug of resource leak and code clean on probe()
   devfreq: exynos4: Use SET_SYSTEM_SLEEP_PM_OPS macro
   devfreq: exynos4: Fix power-leakage of clock on suspend state
   devfreq: exynos4: Add CONFIG_PM_OPP dependency to fix probe fail
   devfreq: exynos4: Add busfreq driver for exynos4210/exynos4x12

  .../devicetree/bindings/devfreq/exynos4_bus.txt|  49 +++
  drivers/devfreq/Kconfig|   1 +
  drivers/devfreq/exynos/Makefile|   2 +-
  drivers/devfreq/exynos/exynos4_bus.c   | 415 
 ++---
  4 files changed, 341 insertions(+), 126 deletions(-)
  create mode 100644 
 Documentation/devicetree/bindings/devfreq/exynos4_bus.txt
>>>
>>> Thanks for updating this patchset.  There are still some minor issues
>>> left though:
>>>
>>> - patch #4 should be at beginning of the patch series
>>>
>>> - moving of devfreq_unregister_opp_notifier(dev, data->devfreq) from
>>>   exynos4_bus_exit() to exynos4_busfreq_remove() should be in patch #4
>>>   (which should really be at the beggining of patch series) not #3
>>>
>>> - handling of iounmap(data->ppmu[i].hw_base) should be added to
>>>   exynos4_bus_exit() in patch #2 not #3
>>>
>>> - patch #8 summary and description should mention fact that it adds DT
>>>   binding documentation (not the driver itself) and the patch itself
>>>   can be slighlty polished
>>
>> OK, I'll reorder the patch series and fix the minor issues you commented
>> on. Also, I'll modify the patch description for patch 8.
>>
>>>
>>> One important note about this patchset, not mentioned in the cover
>>> letter, is that it is improving a currently unused driver (because of the
>>> DT-only mach-exynos conversion the only user was removed in June 2013,
>>> and from reading the code I suspect that even that user hadn't worked
>>> previously).  As such this patch series should not cause any regressions.
>>
>> I don't quite understand your meaning. I explained DT support in the
>> patchset description above, using DT helper functions, and I added a PPMU
>> description. Also, each patch includes a detailed description of its
>> content.
> 
> Everything is okay, I just noted that since there are no users of this
> driver currently (the only user was NURI and it was removed by DT
> conversion of mach-exynos) it should be okay to merge the patch series
> quickly once reviewed and acked by the respective maintainers.
> 
>> What is more needed?
> 
> Users of the driver? ;)
> 
> Your patchset adds DT support and fixes to the driver but it doesn't
> add actual users of the driver to arch/arm/boot/dts/ files.

Ah, I didn't understand the meaning of 'users'.

Right now the clk-exynos4.c driver in mainline doesn't provide the clocks for
the PPMU IP, so I can't add a dt node for exynos4_busfreq to
exynos4210.dtsi/exynos4x12.dtsi/exynos4210-trats.dts/exynos4412-trats2.dts.

First of all, I will add the ppmu clocks to the clk-exynos4.c driver and then
modify the dts files for exynos4_busfreq, as you comment. The patch that adds
the ppmu clocks will be separate from this patch set.

Thanks for your comment.

Best Regards,
Chanwoo Choi


