Re: [PATCH] powerpc 8xx: Fixing memory init issue with CONFIG_PIN_TLB

2013-10-15 Thread leroy christophe


On 15/10/2013 22:33, Scott Wood wrote:
> On Tue, 2013-10-15 at 18:27 +0200, leroy christophe wrote:
>> On 11/10/2013 17:13, Joakim Tjernlund wrote:
>>> "Linuxppc-dev" wrote on 2013/10/11 14:56:40:
>>>> Activating CONFIG_PIN_TLB allows access to the first 24 Mbytes of memory
>>>> at bootup instead of 8. It is needed for "big" kernels, for instance when
>>>> activating CONFIG_LOCKDEP_SUPPORT. This needs to be taken into account in
>>>> init_32 too, otherwise memory allocation soon fails after startup.
>>>>
>>>> Signed-off-by: Christophe Leroy
>>>>
>>>> diff -ur linux-3.11.org/arch/powerpc/kernel/head_8xx.S linux-3.11/arch/powerpc/kernel/head_8xx.S
>>>> --- linux-3.11.org/arch/powerpc/mm/init_32.c   2013-09-02 22:46:10.0 +0200
>>>> +++ linux-3.11/arch/powerpc/mm/init_32.c   2013-09-09 11:28:54.0 +0200
>>>> @@ -213,7 +213,12 @@
>>>>    */
>>>>   BUG_ON(first_memblock_base != 0);
>>>>
>>>> +#ifdef CONFIG_PIN_TLB
>>>> +   /* 8xx can only access 24MB at the moment */
>>>> +   memblock_set_current_limit(min_t(u64, first_memblock_size, 0x01800000));
>>>> +#else
>>>>   /* 8xx can only access 8MB at the moment */
>>>>   memblock_set_current_limit(min_t(u64, first_memblock_size, 0x00800000));
>>>> +#endif
>>>>    }
>>>>    #endif /* CONFIG_8xx */
>>>
>>> hmm, I think you should always map 24 MB (or less if RAM < 24 MB) and do
>>> the same in head_8xx.S.
>>>
>>> Or to keep it simple, just always map at least 16 MB here and in
>>> head_8xx.S, assuming that 16 MB is min RAM for any 8xx system running
>>> 3.x kernels.
>>
>> Yes, we could do a more elaborate modification in the future. However, it
>> also has an impact on the boot loader, so I'm not sure we should make it
>> the default without thinking twice.
>>
>> In the meantime, my patch does take into account the existing situation
>> where you have 8 MB by default and 24 MB when you activate CONFIG_PIN_TLB.
>> I see it as a bug fix and I believe we should include it, at least in
>> order to allow including it in the stable releases.
>>
>> Do you see any issue with this approach?
>
> The patch is fine, but I don't think it's stable material (BTW, if it
> were, you should have marked it as such when submitting).  If I
> understand the situation correctly, there's no regression, and nothing
> fails to work with CONFIG_PIN_TLB that would have worked without it.
> It's just making CONFIG_PIN_TLB more useful.



Yes, the patch is definitely stable material. How should I have marked it?

The situation is that in 2010, I discovered that I was not able to start
a big kernel because of the 8 MB limit.
You told me (see attached mail) that in order to get rid of that limit I
should use CONFIG_PIN_TLB. That was the first step: it helped get past the
memory zeroing at init, but it was not enough, as I then ran into problems
with the device tree being overwritten because it was also placed inside
the first 8 MB area. I temporarily gave up at that point.


Recently I started again. After fixing my bootloader to put the device
tree somewhere else, I discovered this 8 MB limit hardcoded in
mm/init_32.c for the 8xx.

With the patch I submitted I can now boot a kernel which is bigger than 8 MB.

So, I'm a bit lost here on what to do.
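
For reference, the limit calculation discussed in this thread can be sketched in user space as follows. This is not the kernel code: early_limit() and its pin_tlb flag are hypothetical stand-ins for the #ifdef CONFIG_PIN_TLB selection, and the two constants are simply the 8 MB and 24 MB sizes mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_8M   0x00800000ULL	/* mapped at boot without CONFIG_PIN_TLB */
#define SZ_24M  0x01800000ULL	/* mapped at boot with CONFIG_PIN_TLB */

/*
 * Hypothetical helper: upper bound for early allocations.
 * Mirrors min_t(u64, first_memblock_size, <mapped size>) from the patch.
 */
static uint64_t early_limit(uint64_t first_memblock_size, bool pin_tlb)
{
	uint64_t mapped = pin_tlb ? SZ_24M : SZ_8M;

	assert(first_memblock_size > 0);
	return first_memblock_size < mapped ? first_memblock_size : mapped;
}
```

With 64 MB of RAM this gives 8 MB without pinning and 24 MB with it; a board with only 16 MB of RAM stays capped at 16 MB in both the pinned case and any "always map 24 MB" variant.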
--- Begin Message ---
On Tue, 2013-10-15 at 18:27 +0200, leroy christophe wrote:
> On 11/10/2013 17:13, Joakim Tjernlund wrote:
> > "Linuxppc-dev"
> > 
> > wrote on 2013/10/11 14:56:40:
> >> Activating CONFIG_PIN_TLB allows access to the 24 first Mbytes of memory
> > at
> >> bootup instead of 8. It is needed for "big" kernels for instance when
> > activating
> >> CONFIG_LOCKDEP_SUPPORT. This needs to be taken into account in init_32
> > too,
> >> otherwise memory allocation soon fails after startup.
> >>
> >> Signed-off-by: Christophe Leroy 
> >>
> >> diff -ur linux-3.11.org/arch/powerpc/kernel/head_8xx.S
> > linux-3.11/arch/powerpc/kernel/head_8xx.S
> >> --- linux-3.11.org/arch/powerpc/mm/init_32.c   2013-09-02
> > 22:46:10.0 +0200
> >> +++ linux-3.11/arch/powerpc/mm/init_32.c   2013-09-09 11:28:54.0
> > +0200
> >> @@ -213,7 +213,12 @@
> >>   */
> >>  BUG_ON(first_memblock_base != 0);
> >>
> >> +#ifdef CONFIG_PIN_TLB
> >> +   /* 8xx can only access 24MB at the moment */
> >> +   memblock_set_current_limit(min_t(u64, first_memblock_size,
> > 0x01800000));
> >> +#else
> >>  /* 8xx can only access 8MB at the moment */
> >>  memblock_set_current_limit(min_t(u64, first_memblock_size,
> > 0x00800000));
> >> +#endif
> >>   }
> >>   #endif /* CONFIG_8xx */
> > hmm, I think you should always map 24 MB (or less if RAM < 24 MB) and do
> > the same
> > in head_8xx.S.
> >
> > Or to keep it simple, just always map at least 16 MB here and in
> > head_8xx.S, assuming
> > that 16 MB is min RAM for any 8xx system running 3.x kernels.
> Yes we could do a more elaborated modification in the future. However it 
> also has an impact on the boot loader, so I'm not sure we should make it 
> the default without thinking twice.
> 
> In the meantime, my patch does take into account the existing situation 
> where you have 8Mb by default and 24Mb when you activate CONFIG_PIN_TLB.
> I see it as a bug fix and 

[PATCH TRIVIAL] sched/fair: simple cleanup in update_sg_lb_stats()

2013-10-15 Thread Kamalesh Babulal
Add rq->nr_running to sgs->sum_nr_running directly instead of
assigning it through an intermediate variable nr_running.

Signed-off-by: Kamalesh Babulal 
---
 kernel/sched/fair.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4aa0b10889d0..c7ebad6c40af 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5465,7 +5465,6 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			struct sched_group *group, int load_idx,
 			int local_group, struct sg_lb_stats *sgs)
 {
-	unsigned long nr_running;
 	unsigned long load;
 	int i;
 
@@ -5474,8 +5473,6 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
 		struct rq *rq = cpu_rq(i);
 
-		nr_running = rq->nr_running;
-
 		/* Bias balancing toward cpus of our domain */
 		if (local_group)
 			load = target_load(i, load_idx);
@@ -5483,7 +5480,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
-		sgs->sum_nr_running += nr_running;
+		sgs->sum_nr_running += rq->nr_running;
 #ifdef CONFIG_NUMA_BALANCING
 		sgs->nr_numa_running += rq->nr_numa_running;
 		sgs->nr_preferred_running += rq->nr_preferred_running;
-- 
1.8.4.474.g128a96c

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v12] i2c: exynos5: add High Speed I2C controller driver

2013-10-15 Thread Naveen Krishna Chatradhi
Add support for the High Speed I2C controller found in Exynos5 and
later SoCs from Samsung.

The driver only supports the Device Tree method.

Signed-off-by: Naveen Krishna Chatradhi 
Signed-off-by: Taekgyun Ko 
Reviewed-by: Simon Glass 
Tested-by: Andrew Bresticker 
Signed-off-by: Yuvaraj Kumar C D 
Signed-off-by: Andrew Bresticker 
---
Changes since v10:
1. Remove the incomplete runtime_pm code
2. Correct the error checks as suggested by Thomas
3. Use i2c_add_numbered_adapter() as suggested
4. Modified the irq routine to handle the specific interrupts
5. Added spinlocks around the irq code
6. Remove the "mode" of operation field from device tree node and use the
   clock-frequency to decide the mode.

Changes since v11:
1. Use SIMPLE_DEV_PM_OPS instead of definition
2. Use i2c_add_adapter and remove i2c->adap.nr = -1;
3. Minor cosmetic changes based on review comments from Wolfram Sang
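
As an illustration of deciding the mode from clock-frequency (item 6 of the v10 changes), here is a small stand-alone sketch based only on the binding text in this patch. It is not taken from the driver: the enum, struct and function names are hypothetical, and a value of 0 is used here to mean "property not present".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the driver in this patch may use different ones. */
enum hsi2c_mode { HSI2C_FAST_SPD, HSI2C_HIGH_SPD };

#define HSI2C_DEFAULT_HZ   100000U	/* binding: used when not specified */
#define HSI2C_HIGH_SPD_HZ  1000000U	/* binding: high-speed if >= 1 MHz */

struct hsi2c_cfg {
	uint32_t hz;
	enum hsi2c_mode mode;
};

/* clock_frequency_hz == 0 means the DT property was absent (assumption). */
static struct hsi2c_cfg hsi2c_pick_mode(uint32_t clock_frequency_hz)
{
	struct hsi2c_cfg cfg;

	cfg.hz = clock_frequency_hz ? clock_frequency_hz : HSI2C_DEFAULT_HZ;
	assert(cfg.hz != 0);
	cfg.mode = cfg.hz >= HSI2C_HIGH_SPD_HZ ? HSI2C_HIGH_SPD
					       : HSI2C_FAST_SPD;
	return cfg;
}
```

For example, a node without the property ends up at 100 kHz in fast-speed mode, while clock-frequency = <3400000> selects high-speed mode.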

 .../devicetree/bindings/i2c/i2c-exynos5.txt        |   44 ++
 drivers/i2c/busses/Kconfig                         |    7 +
 drivers/i2c/busses/Makefile                        |    1 +
 drivers/i2c/busses/i2c-exynos5.c                   |  776 ++++++++++++++++++++
 4 files changed, 828 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-exynos5.txt
 create mode 100644 drivers/i2c/busses/i2c-exynos5.c

diff --git a/Documentation/devicetree/bindings/i2c/i2c-exynos5.txt b/Documentation/devicetree/bindings/i2c/i2c-exynos5.txt
new file mode 100644
index 0000000..056732c
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c-exynos5.txt
@@ -0,0 +1,44 @@
+* Samsung's High Speed I2C controller
+
+The Samsung's High Speed I2C controller is used to interface with I2C devices
+at various speeds ranging from 100khz to 3.4Mhz.
+
+Required properties:
+  - compatible: value should be.
+  -> "samsung,exynos5-hsi2c", for i2c compatible with exynos5 hsi2c.
+  - reg: physical base address of the controller and length of memory mapped
+region.
+  - interrupts: interrupt number to the cpu.
+  - #address-cells: always 1 (for i2c addresses)
+  - #size-cells: always 0
+
+  - Pinctrl:
+- pinctrl-0: Pin control group to be used for this controller.
+- pinctrl-names: Should contain only one value - "default".
+
+Optional properties:
+  - clock-frequency: Desired operating frequency in Hz of the bus.
+-> If not specified, the bus operates in fast-speed mode at
+   100khz.
+-> If specified, the bus operates in high-speed mode only if the
+   clock-frequency is >= 1Mhz.
+
+Example:
+
+hsi2c@12ca {
+   compatible = "samsung,exynos5-hsi2c";
+   reg = <0x12ca 0x100>;
+   interrupts = <56>;
+   clock-frequency = <10>;
+
+   pinctrl-0 = <_bus>;
+   pinctrl-names = "default";
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   s2mps11_pmic@66 {
+   compatible = "samsung,s2mps11-pmic";
+   reg = <0x66>;
+   };
+};
diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index fcdd321..69b1848 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -436,6 +436,13 @@ config I2C_EG20T
  ML7213/ML7223/ML7831 is companion chip for Intel Atom E6xx series.
  ML7213/ML7223/ML7831 is completely compatible for Intel EG20T PCH.
 
+config I2C_EXYNOS5
+   tristate "Exynos5 high-speed I2C driver"
+   depends on ARCH_EXYNOS5 && OF
+   help
+ Say Y here to include support for high-speed I2C controller in the
+ Exynos5 based Samsung SoCs.
+
 config I2C_GPIO
tristate "GPIO-based bitbanging I2C"
depends on GPIOLIB
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index d00997f..d1ad371 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -42,6 +42,7 @@ i2c-designware-platform-objs := i2c-designware-platdrv.o
 obj-$(CONFIG_I2C_DESIGNWARE_PCI)   += i2c-designware-pci.o
 i2c-designware-pci-objs := i2c-designware-pcidrv.o
 obj-$(CONFIG_I2C_EG20T)+= i2c-eg20t.o
+obj-$(CONFIG_I2C_EXYNOS5)  += i2c-exynos5.o
 obj-$(CONFIG_I2C_GPIO) += i2c-gpio.o
 obj-$(CONFIG_I2C_HIGHLANDER)   += i2c-highlander.o
 obj-$(CONFIG_I2C_IBM_IIC)  += i2c-ibm_iic.o
diff --git a/drivers/i2c/busses/i2c-exynos5.c b/drivers/i2c/busses/i2c-exynos5.c
new file mode 100644
index 0000000..0b1e904
--- /dev/null
+++ b/drivers/i2c/busses/i2c-exynos5.c
@@ -0,0 +1,776 @@
+/**
+ * i2c-exynos5.c - Samsung Exynos5 I2C Controller Driver
+ *
+ * Copyright (C) 2013 Samsung Electronics Co., Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+*/
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * HSI2C controller from Samsung supports 2 

Re: linux-next: Tree for Oct 15

2013-10-15 Thread Guenter Roeck

On 10/15/2013 07:02 AM, Thierry Reding wrote:
> Hi all,
>
> I've uploaded today's linux-next tree to the master branch of the
> repository below:
>
>  git://gitorious.org/thierryreding/linux-next.git
>
> A next-20131015 tag is also provided for convenience.
>
> Gained a new conflict, but nothing too exciting. x86 and ARM default
> configurations build fine. I've also used an x86 allmodconfig build to
> check for build errors. Mark fixed most of those in the trees that he
> created last Thursday and Friday, so I've cherry-picked them on top of
> the final merge. There was one new build failure in the staging tree
> that was trivial to fix so I added a patch to the tree as well.

This build does look much better than the previous ones. I 'only' get 12
build failures out of 106 configurations. Worst are powerpc builds, with 7
out of 14 builds failed.

Details are at http://server.roeck-us.net:8010/builders; look for the 'next'
column.

Guenter



[PATCH 03/18] perf tools: Introduce new 'ftrace' tool

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The ftrace command is a simple wrapper around the kernel's ftrace
functionality.  It currently only supports single-thread tracing and
just reads trace_pipe as text and then writes it to stdout.

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
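For readers unfamiliar with the tracing control files, the setup sequence this tool performs can be sketched in stand-alone C as below. This is only an illustration under assumptions: write_value(), write_tracing_value() and start_tracing() are hypothetical helpers, not functions from this patch, and the caller is expected to pass the tracing directory (commonly /sys/kernel/debug/tracing) explicitly.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write val to the file at path; returns 0 on success, -1 on failure. */
static int write_value(const char *path, const char *val)
{
	ssize_t size;
	int fd, ret = -1;

	assert(path != NULL && val != NULL);
	size = (ssize_t)strlen(val);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	if (write(fd, val, (size_t)size) == size)
		ret = 0;
	close(fd);
	return ret;
}

/* Write val to <dir>/<name>; fails if the path does not fit. */
static int write_tracing_value(const char *dir, const char *name,
			       const char *val)
{
	char path[256];
	int len = snprintf(path, sizeof(path), "%s/%s", dir, name);

	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;
	return write_value(path, val);
}

/* Same order as in this patch: select tracer, restrict to a pid, enable. */
static int start_tracing(const char *dir, const char *tracer, int pid)
{
	char pid_str[16];

	snprintf(pid_str, sizeof(pid_str), "%d", pid);
	if (write_tracing_value(dir, "current_tracer", tracer) < 0)
		return -1;
	if (write_tracing_value(dir, "set_ftrace_pid", pid_str) < 0)
		return -1;
	return write_tracing_value(dir, "tracing_on", "1");
}
```

After start_tracing() succeeds, the output would be read from the trace_pipe file in the same directory and copied to stdout, which is what __cmd_ftrace() does with poll() and a non-blocking descriptor.
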
 tools/perf/Makefile.perf    |   1 +
 tools/perf/builtin-ftrace.c | 228 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/builtin.h        |   1 +
 tools/perf/command-list.txt |   1 +
 tools/perf/perf.c           |   1 +
 5 files changed, 232 insertions(+)
 create mode 100644 tools/perf/builtin-ftrace.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index c873e039aafb..79058b6a8435 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -431,6 +431,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-kmem.o
 BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
 BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
 BUILTIN_OBJS += $(OUTPUT)builtin-inject.o
+BUILTIN_OBJS += $(OUTPUT)builtin-ftrace.o
 BUILTIN_OBJS += $(OUTPUT)tests/builtin-test.o
 BUILTIN_OBJS += $(OUTPUT)builtin-mem.o
 
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
new file mode 100644
index 000000000000..6f3dd02c5b9a
--- /dev/null
+++ b/tools/perf/builtin-ftrace.c
@@ -0,0 +1,228 @@
+/*
+ * builtin-ftrace.c
+ *
+ * Copyright (c) 2013  LG Electronics,  Namhyung Kim 
+ *
+ * Released under the GPL v2.
+ */
+
+#include "builtin.h"
+#include "perf.h"
+
+#include 
+#include 
+
+#include "util/debug.h"
+#include "util/parse-options.h"
+#include "util/evlist.h"
+#include "util/target.h"
+#include "util/thread_map.h"
+
+
+#define DEFAULT_TRACER  "function_graph"
+
+struct perf_ftrace {
+   struct perf_evlist *evlist;
+   struct perf_target target;
+   const char *tracer;
+};
+
+static bool done;
+
+static void sig_handler(int sig __maybe_unused)
+{
+   done = true;
+}
+
+static int write_tracing_file(const char *name, const char *val)
+{
+   char *file;
+   int fd, ret = -1;
+   ssize_t size = strlen(val);
+
+   file = get_tracing_file(name);
+   if (!file) {
+   pr_debug("cannot get tracing file: %s\n", name);
+   return -1;
+   }
+
+   fd = open(file, O_WRONLY);
+   if (fd < 0) {
+   pr_debug("cannot open tracing file: %s\n", name);
+   goto out;
+   }
+
+   if (write(fd, val, size) == size)
+   ret = 0;
+   else
+   pr_debug("write '%s' to tracing/%s failed\n", val, name);
+
+   close(fd);
+out:
+   put_tracing_file(file);
+   return ret;
+}
+
+static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
+{
+   if (write_tracing_file("tracing_on", "0") < 0)
+   return -1;
+
+   if (write_tracing_file("current_tracer", "nop") < 0)
+   return -1;
+
+   if (write_tracing_file("set_ftrace_pid", " ") < 0)
+   return -1;
+
+   return 0;
+}
+
+static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
+{
+   char *trace_file;
+   int trace_fd;
+   char *trace_pid;
+   char buf[4096];
+   struct pollfd pollfd = {
+   .events = POLLIN,
+   };
+
+   if (geteuid() != 0) {
+   pr_err("ftrace only works for root!\n");
+   return -1;
+   }
+
+   if (argc < 1)
+   return -1;
+
+   signal(SIGINT, sig_handler);
+   signal(SIGUSR1, sig_handler);
+   signal(SIGCHLD, sig_handler);
+
+   reset_tracing_files(ftrace);
+
+   /* reset ftrace buffer */
+   if (write_tracing_file("trace", "0") < 0)
+   goto out;
+
+   if (perf_evlist__prepare_workload(ftrace->evlist, &ftrace->target,
+ argv, false, true) < 0)
+   goto out;
+
+   if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
+   pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+   goto out;
+   }
+
+   if (asprintf(&trace_pid, "%d", ftrace->evlist->threads->map[0]) < 0) {
+   pr_err("failed to allocate pid string\n");
+   goto out;
+   }
+
+   if (write_tracing_file("set_ftrace_pid", trace_pid) < 0) {
+   pr_err("failed to set pid: %s\n", trace_pid);
+   goto out_free_pid;
+   }
+
+   trace_file = get_tracing_file("trace_pipe");
+   if (!trace_file) {
+   pr_err("failed to open trace_pipe\n");
+   goto out_free_pid;
+   }
+
+   trace_fd = open(trace_file, O_RDONLY);
+
+   put_tracing_file(trace_file);
+
+   if (trace_fd < 0) {
+   pr_err("failed to open trace_pipe\n");
+   goto out_free_pid;
+   }
+
+   fcntl(trace_fd, F_SETFL, O_NONBLOCK);
+   pollfd.fd = trace_fd;
+
+   if (write_tracing_file("tracing_on", "1") < 0) {
+   pr_err("can't enable tracing\n");
+   goto out_close_fd;
+   }
+
+   

[PATCH 02/18] perf util: Add more debug message on failure path

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

This is helpful when debugging tracing features.

Signed-off-by: Namhyung Kim 
---
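The single-exit cleanup that this patch introduces can be sketched in isolation as below. This is a simplified illustration, not the perf code: read_then_parse() and the step_fn callbacks are hypothetical stand-ins for do_read() and the parse_*() functions. The point of the pattern is that the "out" label owns the buffer, so no failure path should free it before jumping there. Note that in the read_event_file() hunk as posted, the do_read() failure path still appears to call free(buf) before "goto out", which would free the buffer twice on that path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback type standing in for do_read()/parse_*(). */
typedef int (*step_fn)(char *buf, size_t size);

static int read_then_parse(size_t size, step_fn do_read, step_fn do_parse)
{
	int ret;
	char *buf;

	assert(do_read != NULL && do_parse != NULL);

	buf = malloc(size);
	if (buf == NULL) {
		fprintf(stderr, "memory allocation failure\n");
		return -1;
	}

	ret = do_read(buf, size);
	if (ret < 0) {
		fprintf(stderr, "error reading file\n");
		goto out;	/* no free() here: "out" releases buf */
	}

	ret = do_parse(buf, size);
	if (ret < 0)
		fprintf(stderr, "error parsing file\n");
out:
	free(buf);		/* the only place buf is released */
	return ret;
}

/* Trivial steps used to exercise each path. */
static int step_ok(char *buf, size_t size)
{
	(void)buf; (void)size;
	return 0;
}

static int step_fail(char *buf, size_t size)
{
	(void)buf; (void)size;
	return -1;
}
```
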
 tools/perf/util/header.c           |  4 +++-
 tools/perf/util/trace-event-read.c | 53 ++++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c3e5a3b817ab..f8afefbe99ae 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2721,8 +2721,10 @@ static int perf_evsel__prepare_tracepoint_event(struct perf_evsel *evsel,
}
 
event = pevent_find_event(pevent, evsel->attr.config);
-   if (event == NULL)
+   if (event == NULL) {
+   pr_debug("cannot find event format for %d\n", (int)evsel->attr.config);
return -1;
+   }
 
if (!evsel->name) {
snprintf(bf, sizeof(bf), "%s:%s", event->system, event->name);
diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
index e084e5e654b6..0e3b3f527320 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -262,39 +262,53 @@ static int read_header_files(struct pevent *pevent)
 
 static int read_ftrace_file(struct pevent *pevent, unsigned long long size)
 {
+   int ret;
char *buf;
 
buf = malloc(size);
-   if (buf == NULL)
+   if (buf == NULL) {
+   pr_debug("memory allocation failure\n");
return -1;
+   }
 
-   if (do_read(buf, size) < 0) {
-   free(buf);
-   return -1;
+   ret = do_read(buf, size);
+   if (ret < 0) {
+   pr_debug("error reading ftrace file.\n");
+   goto out;
}
 
-   parse_ftrace_file(pevent, buf, size);
+   ret = parse_ftrace_file(pevent, buf, size);
+   if (ret < 0)
+   pr_debug("error parsing ftrace file.\n");
+out:
free(buf);
-   return 0;
+   return ret;
 }
 
 static int read_event_file(struct pevent *pevent, char *sys,
unsigned long long size)
 {
+   int ret;
char *buf;
 
buf = malloc(size);
-   if (buf == NULL)
+   if (buf == NULL) {
+   pr_debug("memory allocation failure\n");
return -1;
+   }
 
-   if (do_read(buf, size) < 0) {
+   ret = do_read(buf, size);
+   if (ret < 0) {
free(buf);
-   return -1;
+   goto out;
}
 
-   parse_event_file(pevent, buf, size, sys);
+   ret = parse_event_file(pevent, buf, size, sys);
+   if (ret < 0)
+   pr_debug("error parsing event file.\n");
+out:
free(buf);
-   return 0;
+   return ret;
 }
 
 static int read_ftrace_files(struct pevent *pevent)
@@ -347,6 +361,7 @@ static int read_saved_cmdline(struct pevent *pevent)
 {
unsigned long long size;
char *buf;
+   int ret;
 
/* it can have 0 size */
size = read8(pevent);
@@ -354,18 +369,22 @@ static int read_saved_cmdline(struct pevent *pevent)
return 0;
 
buf = malloc(size + 1);
-   if (buf == NULL)
+   if (buf == NULL) {
+   pr_debug("memory allocation failure\n");
return -1;
+   }
 
-   if (do_read(buf, size) < 0) {
-   free(buf);
-   return -1;
+   ret = do_read(buf, size);
+   if (ret < 0) {
+   pr_debug("error reading saved cmdlines\n");
+   goto out;
}
 
parse_saved_cmdline(pevent, buf, size);
-
+   ret = 0;
+out:
free(buf);
-   return 0;
+   return ret;
 }
 
 ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
-- 
1.7.11.7



[PATCH 10/18] perf ftrace: Add dump_printf() for low-level debugging

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

For reusability, rename trace_event() to dump_raw_event() and pass
size as an argument.  And use it in do_ftrace_report() to show raw
data of ftrace entries.

Signed-off-by: Namhyung Kim 
---
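To show why taking the size as an argument makes the dump routine reusable, here is a minimal stand-alone hex dump in the same spirit. It is a sketch only: dump_raw_bytes() is a hypothetical name, it omits the colour handling and ASCII column of the real function, and it returns the number of rows printed purely so the behaviour is easy to check.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Dump "size" bytes starting at "data", 16 bytes per row.
 * The caller supplies the length, so any record type can be dumped.
 * Returns the number of rows printed.
 */
static int dump_raw_bytes(FILE *out, const void *data, int size)
{
	const unsigned char *raw = data;
	int i, rows = 0;

	assert(out != NULL && (data != NULL || size == 0));

	for (i = 0; i < size; i++) {
		if ((i & 15) == 0)
			fprintf(out, "  %04x: ", i);	/* row offset */

		fprintf(out, " %02x", raw[i]);

		/* end the row after 16 bytes or at the last byte */
		if ((i & 15) == 15 || i == size - 1) {
			fputc('\n', out);
			rows++;
		}
	}
	return rows;
}
```

A perf event would be dumped with its header size and an ftrace record with its record size, which is the split this patch makes between dump_event() and do_ftrace_report().
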
 tools/perf/builtin-ftrace.c | 6 ++++++
 tools/perf/util/debug.c     | 8 ++++----
 tools/perf/util/debug.h     | 2 +-
 tools/perf/util/session.c   | 2 +-
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index ba04a402974c..676b2aa7590f 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1350,6 +1350,12 @@ static int do_ftrace_report(struct perf_ftrace *ftrace)
goto out;
}
 
+   dump_printf("  event: %s\n", perf_evsel__name(evsel));
+   dump_raw_event(record, record->size);
+   dump_printf("%3d %llu %#llx [%#x]\n\n",
+   record->cpu, record->ts, record->offset,
+   record->size);
+
/* TODO: update sample.period using calltime */
if (!__hists__add_entry(>hists, , NULL,
sample.period, 0, 0)) {
diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 399e74c34c1a..2c42aaed4528 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -47,7 +47,7 @@ int dump_printf(const char *fmt, ...)
return ret;
 }
 
-void trace_event(union perf_event *event)
+void dump_raw_event(void *event, int size)
 {
unsigned char *raw_event = (void *)event;
const char *color = PERF_COLOR_BLUE;
@@ -58,9 +58,9 @@ void trace_event(union perf_event *event)
 
printf(".");
color_fprintf(stdout, color, "\n. ... raw event: size %d bytes\n",
- event->header.size);
+ size);
 
-   for (i = 0; i < event->header.size; i++) {
+   for (i = 0; i < size; i++) {
if ((i & 15) == 0) {
printf(".");
color_fprintf(stdout, color, "  %04x: ", i);
@@ -68,7 +68,7 @@ void trace_event(union perf_event *event)
 
color_fprintf(stdout, color, " %02x", raw_event[i]);
 
-   if (((i & 15) == 15) || i == event->header.size-1) {
+   if (((i & 15) == 15) || i == size-1) {
color_fprintf(stdout, color, "  ");
for (j = 0; j < 15-(i & 15); j++)
color_fprintf(stdout, color, "   ");
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index efbd98805ad0..9cebb0cdb7bd 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -12,7 +12,7 @@ extern int verbose;
 extern bool quiet, dump_trace;
 
 int dump_printf(const char *fmt, ...) __attribute__((format(printf, 1, 2)));
-void trace_event(union perf_event *event);
+void dump_raw_event(void *event, int size);
 
 int ui__error(const char *format, ...) __attribute__((format(printf, 1, 2)));
 int ui__warning(const char *format, ...) __attribute__((format(printf, 1, 2)));
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index d1e449534b33..a714265ea0c3 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -832,7 +832,7 @@ static void dump_event(struct perf_session *session, union perf_event *event,
printf("\n%#" PRIx64 " [%#x]: event: %d\n",
   file_offset, event->header.size, event->header.type);
 
-   trace_event(event);
+   dump_raw_event(event, event->header.size);
 
if (sample)
perf_session__print_tstamp(session, event, sample);
-- 
1.7.11.7



[PATCH 04/18] perf ftrace: Add support for --pid option

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The -p (--pid) option enables tracing an existing process by its pid.

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
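The append/truncate split added here can be illustrated with the following stand-alone sketch. It mirrors the flag selection in __write_tracing_file() but is otherwise an assumption-laden simplification: tracing_open_flags(), write_value() and write_pid_list() are hypothetical names, and the path is passed in directly rather than resolved through get_tracing_file().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* O_APPEND keeps earlier entries, O_TRUNC replaces them. */
static int tracing_open_flags(bool append)
{
	return O_WRONLY | (append ? O_APPEND : O_TRUNC);
}

/* Write val to path; returns 0 on success, -1 on failure. */
static int write_value(const char *path, const char *val, bool append)
{
	ssize_t size;
	int fd, ret = -1;

	assert(path != NULL && val != NULL);
	size = (ssize_t)strlen(val);

	fd = open(path, tracing_open_flags(append));
	if (fd < 0)
		return -1;
	if (write(fd, val, (size_t)size) == size)
		ret = 0;
	close(fd);
	return ret;
}

/* Append every pid in turn, as set_tracing_pid() does in this patch. */
static int write_pid_list(const char *path, const int *pids, int nr)
{
	char buf[16];
	int i;

	for (i = 0; i < nr; i++) {
		snprintf(buf, sizeof(buf), "%d", pids[i]);
		if (write_value(path, buf, true) < 0)
			return -1;
	}
	return 0;
}
```

On a regular file the appended values would simply be concatenated; the patch relies on the tracing file treating each write as one more pid to trace.
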
 tools/perf/builtin-ftrace.c | 89 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 63 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 6f3dd02c5b9a..bd415f2b1cde 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -11,6 +11,7 @@
 
 #include 
 #include 
+#include 
 
 #include "util/debug.h"
 #include "util/parse-options.h"
@@ -34,11 +35,12 @@ static void sig_handler(int sig __maybe_unused)
done = true;
 }
 
-static int write_tracing_file(const char *name, const char *val)
+static int __write_tracing_file(const char *name, const char *val, bool append)
 {
char *file;
int fd, ret = -1;
ssize_t size = strlen(val);
+   int flags = O_WRONLY;
 
file = get_tracing_file(name);
if (!file) {
@@ -46,7 +48,12 @@ static int write_tracing_file(const char *name, const char *val)
return -1;
}
 
-   fd = open(file, O_WRONLY);
+   if (append)
+   flags |= O_APPEND;
+   else
+   flags |= O_TRUNC;
+
+   fd = open(file, flags);
if (fd < 0) {
pr_debug("cannot open tracing file: %s\n", name);
goto out;
@@ -63,6 +70,16 @@ out:
return ret;
 }
 
+static int write_tracing_file(const char *name, const char *val)
+{
+   return __write_tracing_file(name, val, false);
+}
+
+static int append_tracing_file(const char *name, const char *val)
+{
+   return __write_tracing_file(name, val, true);
+}
+
 static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
 {
if (write_tracing_file("tracing_on", "0") < 0)
@@ -77,11 +94,27 @@ static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
return 0;
 }
 
+static int set_tracing_pid(struct perf_ftrace *ftrace)
+{
+   int i;
+   char buf[16];
+
+   if (perf_target__has_cpu(&ftrace->target))
+   return 0;
+
+   for (i = 0; i < thread_map__nr(ftrace->evlist->threads); i++) {
+   scnprintf(buf, sizeof(buf), "%d",
+ ftrace->evlist->threads->map[i]);
+   if (append_tracing_file("set_ftrace_pid", buf) < 0)
+   return -1;
+   }
+   return 0;
+}
+
 static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
 {
char *trace_file;
int trace_fd;
-   char *trace_pid;
char buf[4096];
struct pollfd pollfd = {
.events = POLLIN,
@@ -92,42 +125,37 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
return -1;
}
 
-   if (argc < 1)
-   return -1;
-
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
 
-   reset_tracing_files(ftrace);
+   if (reset_tracing_files(ftrace) < 0)
+   goto out;
 
/* reset ftrace buffer */
if (write_tracing_file("trace", "0") < 0)
goto out;
 
-   if (perf_evlist__prepare_workload(ftrace->evlist, &ftrace->target,
- argv, false, true) < 0)
-   goto out;
-
-   if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
-   pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+   if (argc && perf_evlist__prepare_workload(ftrace->evlist,
+ &ftrace->target,
+ argv, false, true) < 0) {
goto out;
}
 
-   if (asprintf(&trace_pid, "%d", ftrace->evlist->threads->map[0]) < 0) {
-   pr_err("failed to allocate pid string\n");
-   goto out;
+   if (set_tracing_pid(ftrace) < 0) {
+   pr_err("failed to set ftrace pid\n");
+   goto out_reset;
}
 
-   if (write_tracing_file("set_ftrace_pid", trace_pid) < 0) {
-   pr_err("failed to set pid: %s\n", trace_pid);
-   goto out_free_pid;
+   if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
+   pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
+   goto out_reset;
}
 
trace_file = get_tracing_file("trace_pipe");
if (!trace_file) {
pr_err("failed to open trace_pipe\n");
-   goto out_free_pid;
+   goto out_reset;
}
 
trace_fd = open(trace_file, O_RDONLY);
@@ -136,7 +164,7 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
 
if (trace_fd < 0) {
pr_err("failed to open trace_pipe\n");
-   goto out_free_pid;
+   goto out_reset;
}
 

[PATCH 01/18] perf util: Save pid-cmdline mapping into tracing header

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

Current trace info data lacks the saved cmdline mapping which is
needed for pevent to find out the comm of a task.  Add this and bump
up the version number so that perf can determine its presence when
reading.

This mostly corresponds to trace.dat file version 6, but still
lacks the 4 bytes for the number of cpus and the 10 bytes for the type
string - and I think we don't need those anyway.

Cc: Steven Rostedt 
Signed-off-by: Namhyung Kim 
---
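As an illustration of the pid-to-comm mapping handled here, the following stand-alone sketch parses "pid comm" lines of the kind found in saved_cmdlines. It is not the code from this patch: struct comm_entry and parse_saved_cmdlines() are hypothetical, the result goes into a caller-supplied array instead of pevent_register_comm(), a bounded %15s replaces the glibc-specific %ms, and malformed lines are skipped rather than used. The sketch also assumes the input buffer is NUL-terminated.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct comm_entry {
	int pid;
	char comm[16];	/* 15 characters plus NUL, assumed comm length */
};

/*
 * Parse "pid comm" lines from a writable, NUL-terminated buffer.
 * Returns the number of entries stored in out (at most max).
 */
static int parse_saved_cmdlines(char *text, struct comm_entry *out, int max)
{
	char *next = NULL;
	char *line;
	int n = 0;

	assert(text != NULL && out != NULL);

	for (line = strtok_r(text, "\n", &next); line && n < max;
	     line = strtok_r(NULL, "\n", &next)) {
		/* keep the entry only if both fields were converted */
		if (sscanf(line, "%d %15s", &out[n].pid, out[n].comm) == 2)
			n++;
	}
	return n;
}
```

Because strtok_r() modifies its input, the caller has to pass a private copy of the file contents, which is what the malloc'ed buffer in read_saved_cmdline() provides.
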
 tools/perf/util/trace-event-info.c  | 33 ++++++++++++++++++++++++++++++++-
 tools/perf/util/trace-event-parse.c | 17 +++++++++++++++++
 tools/perf/util/trace-event-read.c  | 36 ++++++++++++++++++++++++++++++++++--
 tools/perf/util/trace-event.h       |  1 +
 4 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/trace-event-info.c b/tools/perf/util/trace-event-info.c
index f3c9e551bd35..48678ca4c4d6 100644
--- a/tools/perf/util/trace-event-info.c
+++ b/tools/perf/util/trace-event-info.c
@@ -41,7 +41,7 @@
 #include 
 #include "evsel.h"
 
-#define VERSION "0.5"
+#define VERSION "0.6"
 
 static int output_fd;
 
@@ -390,6 +390,34 @@ out:
return err;
 }
 
+static int record_saved_cmdline(void)
+{
+   unsigned int size;
+   char *path;
+   struct stat st;
+   int ret, err = 0;
+
+   path = get_tracing_file("saved_cmdlines");
+   if (!path) {
+   pr_debug("can't get tracing/saved_cmdline");
+   return -ENOMEM;
+   }
+
+   ret = stat(path, &st);
+   if (ret < 0) {
+   /* not found */
+   size = 0;
+   if (write(output_fd, &size, 8) != 8)
+   err = -EIO;
+   goto out;
+   }
+   err = record_file(path, 8);
+
+out:
+   put_tracing_file(path);
+   return err;
+}
+
 static void
 put_tracepoints_path(struct tracepoint_path *tps)
 {
@@ -550,6 +578,9 @@ struct tracing_data *tracing_data_get(struct list_head *pattrs,
if (err)
goto out;
err = record_ftrace_printk();
+   if (err)
+   goto out;
+   err = record_saved_cmdline();
 
 out:
/*
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index 6681f71f2f95..94912b7b5027 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -196,6 +196,23 @@ void parse_ftrace_printk(struct pevent *pevent,
}
 }
 
+void parse_saved_cmdline(struct pevent *pevent,
+char *file, unsigned int size __maybe_unused)
+{
+   char *comm;
+   char *line;
+   char *next = NULL;
+   int pid;
+
+   line = strtok_r(file, "\n", &next);
+   while (line) {
+   sscanf(line, "%d %ms", &pid, &comm);
+   pevent_register_comm(pevent, comm, pid);
+   free(comm);
+   line = strtok_r(NULL, "\n", &next);
+   }
+}
+
 int parse_ftrace_file(struct pevent *pevent, char *buf, unsigned long size)
 {
return pevent_parse_event(pevent, buf, size, "ftrace");
diff --git a/tools/perf/util/trace-event-read.c b/tools/perf/util/trace-event-read.c
index f2112270c663..e084e5e654b6 100644
--- a/tools/perf/util/trace-event-read.c
+++ b/tools/perf/util/trace-event-read.c
@@ -343,6 +343,31 @@ static int read_event_files(struct pevent *pevent)
return 0;
 }
 
+static int read_saved_cmdline(struct pevent *pevent)
+{
+   unsigned long long size;
+   char *buf;
+
+   /* it can have 0 size */
+   size = read8(pevent);
+   if (!size)
+   return 0;
+
+   buf = malloc(size + 1);
+   if (buf == NULL)
+   return -1;
+
+   if (do_read(buf, size) < 0) {
+   free(buf);
+   return -1;
+   }
+
+   parse_saved_cmdline(pevent, buf, size);
+
+   free(buf);
+   return 0;
+}
+
 ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
 {
char buf[BUFSIZ];
@@ -383,10 +408,11 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool __repipe)
return -1;
if (show_version)
printf("version = %s\n", version);
-   free(version);
 
-   if (do_read(buf, 1) < 0)
+   if (do_read(buf, 1) < 0) {
+   free(version);
return -1;
+   }
file_bigendian = buf[0];
host_bigendian = bigendian();
 
@@ -422,6 +448,11 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool 
__repipe)
err = read_ftrace_printk(pevent);
if (err)
goto out;
+   if (!strcmp(version, "0.6")) {
+   err = read_saved_cmdline(pevent);
+   if (err)
+   goto out;
+   }
 
size = trace_data_size;
repipe = false;
@@ -438,5 +469,6 @@ ssize_t trace_report(int fd, struct pevent **ppevent, bool 
__repipe)
 out:
if (pevent)
pevent_free(pevent);
+   free(version);
return size;
 }
diff --git 
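
For readers following the saved cmdline handling in this patch: each
line of the saved buffer is a "pid comm" pair.  Below is a minimal
standalone sketch of the same strtok_r()/sscanf() walk.  struct cmdline
and parse_cmdlines() are names made up for this example and are not
part of perf; the sketch stores the pairs in an array instead of
calling pevent_register_comm(), and it assumes the buffer is
NUL-terminated.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one "pid comm" pair. */
struct cmdline {
	int pid;
	char *comm;	/* allocated by sscanf(); the caller frees it */
};

/*
 * Walk a NUL-terminated buffer of "pid comm" lines and fill out[] with
 * at most max entries.  The buffer is modified in place by strtok_r().
 * Returns the number of entries stored.
 */
static int parse_cmdlines(char *buf, struct cmdline *out, int max)
{
	char *next = NULL;
	char *line;
	int count = 0;

	for (line = strtok_r(buf, "\n", &next); line && count < max;
	     line = strtok_r(NULL, "\n", &next)) {
		char *comm = NULL;
		int pid;

		/* %ms asks sscanf() to allocate the string (POSIX.1-2008). */
		if (sscanf(line, "%d %ms", &pid, &comm) == 2) {
			out[count].pid = pid;
			out[count].comm = comm;
			count++;
		} else {
			free(comm);	/* malformed line: skip it */
		}
	}
	return count;
}
```

Checking the sscanf() return value, as above, avoids registering a
stale pid/comm pair when a line is malformed.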

[PATCHSET 00/18] perf tools: Introduce new 'ftrace' command (5)

2013-10-15 Thread Namhyung Kim
Hello,

This patchset implements a front-end tool for the kernel's ftrace.  It
uses the function_graph tracer by default, and the normal function
tracer is also supported.  (Of course you need to enable those tracers
in your kernel first.)

This version is little more than a rebase onto current development; I'm
sending it out mainly so that it doesn't get buried under the piles of
patches. :) So there are some things from previous feedback that have
not been addressed yet.  But I really want to reach an agreement on
multi-file support before going further.

v5 changes:
  * rebase on current acme/perf/core
  * fix bug on record subcommand
  * add basic filter support 

I pushed it out to 'perf/ftrace-v5' branch on my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks,
Namhyung


Namhyung Kim (18):
  perf util: Save pid-cmdline mapping into tracing header
  perf util: Add more debug message on failure path
  perf tools: Introduce new 'ftrace' tool
  perf ftrace: Add support for --pid option
  perf ftrace: Add support for -a and -C option
  perf ftrace: Split "live" sub-command
  perf ftrace: Add 'record' sub-command
  perf ftrace: Add 'show' sub-command
  perf ftrace: Add 'report' sub-command
  perf ftrace: Add dump_printf() for low-level debugging
  perf ftrace: Use pager for displaying result
  perf ftrace: Cleanup using ftrace_setup/teardown()
  perf tools: Add document for perf-ftrace command
  perf ftrace: Add a signal handler for SIGSEGV
  perf ftrace: Add --clock option
  perf ftrace: Show leaf-functions as oneliner
  perf ftrace: Tidy up the function graph output of 'show' subcommand
  perf ftrace: Add --filter option

 tools/perf/Documentation/perf-ftrace.txt |  122 ++
 tools/perf/Makefile.perf |1 +
 tools/perf/builtin-ftrace.c  | 1861 ++
 tools/perf/builtin.h |1 +
 tools/perf/command-list.txt  |1 +
 tools/perf/perf.c|1 +
 tools/perf/util/cpumap.c |   45 +
 tools/perf/util/cpumap.h |1 +
 tools/perf/util/debug.c  |8 +-
 tools/perf/util/debug.h  |2 +-
 tools/perf/util/header.c |4 +-
 tools/perf/util/session.c|2 +-
 tools/perf/util/trace-event-info.c   |   33 +-
 tools/perf/util/trace-event-parse.c  |   17 +
 tools/perf/util/trace-event-read.c   |   77 +-
 tools/perf/util/trace-event.h|1 +
 16 files changed, 2156 insertions(+), 21 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-ftrace.txt
 create mode 100644 tools/perf/builtin-ftrace.c


Following is the original description and example.
-
It consists of 4 subcommands: live, record, show and report.

'perf ftrace live' just triggers ftrace and relays the kernel buffer
contents to stdout.  It does no processing on the tool side.

'perf ftrace record' starts ftrace and saves its result to per-cpu
files and a perf.header file in the perf.data.dir directory.
Recording is done by multiple threads (one thread per cpu) so that
events are not lost to buffer overruns.  The perf.header file is
compatible with the current perf.data file format and contains useful
information and sample data.
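
To make the on-disk layout concrete, here is a small sketch of how the
file names in the example listing further down could be derived from
the directory name.  trace_cpu_path() and header_path() are
hypothetical helpers written for this mail, not functions from the
series; only the "<name>.dir", "perf.header" and "trace-cpuN.buf"
naming is taken from the patches.

```c
#include <stdio.h>

/* Build "<dirname>.dir/trace-cpu<N>.buf"; returns snprintf()'s result. */
static int trace_cpu_path(char *buf, size_t len, const char *dirname, int cpu)
{
	return snprintf(buf, len, "%s.dir/trace-cpu%d.buf", dirname, cpu);
}

/* Build "<dirname>.dir/perf.header"; returns snprintf()'s result. */
static int header_path(char *buf, size_t len, const char *dirname)
{
	return snprintf(buf, len, "%s.dir/perf.header", dirname);
}
```

With the default name "perf.data" this yields
perf.data.dir/trace-cpu0.buf, perf.data.dir/trace-cpu1.buf, and so on,
one file per recording thread.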

The sample data are synthesized for each recorded cpu to provide more
information - I'm not sure it's really needed though.

Once you have run 'perf ftrace record', you can play with the other
subcommands.

'perf ftrace show' displays function traces like the 'live' subcommand
or trace-cmd does.  It's no more useful than those at this time, but it
could be improved soon.

'perf ftrace report' displays the usual 'perf report' style output from
the function trace data.  You can see which function is called most
frequently, for example.  Currently it uses 1 as the period value for
each entry, but we might use funcgraph_exit->calltime to get proper
overhead later.

Example below:

  # perf ftrace record sleep 0.1
  # ls -l perf.data.dir
  total 5568
  -rw-r--r--. 1 root root 3514375 Apr 23 16:43 perf.header
  -rw-r--r--. 1 root root   90112 Apr 23 16:43 trace-cpu0.buf
  -rw-r--r--. 1 root root   0 Apr 23 16:43 trace-cpu1.buf
  -rw-r--r--. 1 root root 2093056 Apr 23 16:43 trace-cpu2.buf
  -rw-r--r--. 1 root root   0 Apr 23 16:43 trace-cpu3.buf

  # perf ftrace show
  overriding event (11) ftrace:funcgraph_entry with new print handler
  overriding event (10) ftrace:funcgraph_exit with new print handler
0)   0.065 us |  __fsnotify_parent();
0)|  fsnotify() {
0)   0.060 us |__srcu_read_lock();
0)   0.040 us |__srcu_read_unlock();
0)   0.652 us |  }
0)   0.040 us |  fput();
0)|  __audit_syscall_exit() {
0)|path_put() {
0)   0.037 us |  dput();
0)   0.032 us |  mntput();
0)   0.563 us |}
0)   0.035 us |unroll_tree_refs();
0)   0.035 us |kfree();
0)   1.284 us |  }
0)|  

[PATCH 09/18] perf ftrace: Add 'report' sub-command

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The ftrace report command analyzes the ftrace result in the usual perf
report style.  Internal processing of the ftrace buffer is similar to
the 'show' sub-command, but it synthesizes the necessary information
(thread, dso, map and symbol) from the saved trace info.

It currently counts the number of samples as the period; this can be
extended to use the calltime of funcgraph_exit in the future.
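
The two ways of weighing an entry mentioned above can be sketched as
follows.  struct fgraph_exit is a simplified stand-in invented for this
example (the real data comes from the funcgraph_exit event's calltime
and rettime fields), and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a funcgraph_exit record. */
struct fgraph_exit {
	uint64_t calltime;	/* ns timestamp at function entry */
	uint64_t rettime;	/* ns timestamp at function return */
};

/* Current behaviour per the description: every entry weighs the same. */
static uint64_t period_by_count(const struct fgraph_exit *e)
{
	(void)e;
	return 1;
}

/* Possible extension: weigh an entry by the time spent in the function. */
static uint64_t period_by_duration(const struct fgraph_exit *e)
{
	return e->rettime - e->calltime;
}
```

With the count-based period the report ranks functions by call
frequency; a duration-based period would rank them by time spent.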

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 283 +++-
 1 file changed, 281 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 3a12fb9d4b94..ba04a402974c 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -21,6 +21,7 @@
 #include "util/target.h"
 #include "util/thread_map.h"
 #include "util/cpumap.h"
+#include "util/sort.h"
 #include "util/trace-event.h"
 #include "../lib/traceevent/kbuffer.h"
 #include "../lib/traceevent/event-parse.h"
@@ -35,6 +36,7 @@ struct perf_ftrace {
const char *tracer;
const char *dirname;
struct pevent *pevent;
+   bool show_full_info;
 };
 
 static bool done;
@@ -1195,6 +1197,215 @@ out:
return ret;
 }
 
+struct cmdline_list {
+   struct cmdline_list *next;
+   char*comm;
+   int pid;
+};
+
+struct func_list {
+   struct func_list*next;
+   unsigned long long  addr;
+   char*func;
+   char*mod;
+};
+
+static int do_ftrace_report(struct perf_ftrace *ftrace)
+{
+   int ret = -1;
+   char buf[PATH_MAX];
+   unsigned long nr_samples;
+   struct perf_session *session;
+   struct perf_evsel *evsel;
+   struct pevent_record *record;
+   struct perf_ftrace_report report = {
+   .ftrace = ftrace,
+   .tool = {
+   .sample = process_sample_event,
+   },
+   };
+   struct cmdline_list *cmdline;
+   struct func_list *func;
+   struct machine *machine;
+   struct dso *dso;
+
+   canonicalize_directory_name(ftrace->dirname);
+
+   scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
+
+   session = perf_session__new(buf, O_RDONLY, false, false, &report.tool);
+   if (session == NULL) {
+   pr_err("failed to create a session\n");
+   return -1;
+   }
+
+   ftrace->pevent = session->pevent;
+
+   if (perf_session__process_events(session, &report.tool) < 0) {
+   pr_err("failed to process events\n");
+   goto out;
+   }
+
+   machine = machines__findnew(&session->machines, HOST_KERNEL_ID);
+
+   /* Synthesize thread info from saved cmdlines */
+   cmdline = ftrace->pevent->cmdlist;
+   while (cmdline) {
+   struct thread *thread;
+
+   thread = machine__findnew_thread(machine, cmdline->pid,
+cmdline->pid);
+   if (thread && !thread->comm_set)
+   thread__set_comm(thread, cmdline->comm);
+
+   cmdline = cmdline->next;
+   }
+
+   /* Synthesize kernel dso and symbol info from saved kallsyms */
+   func = ftrace->pevent->funclist;
+   while (func) {
+   struct symbol *sym;
+
+   scnprintf(buf, sizeof(buf), "[%s]",
+ func->mod ? func->mod : "kernel.kallsyms");
+
+   dso = dso__kernel_findnew(machine, buf, NULL, DSO_TYPE_KERNEL);
+   if (dso == NULL) {
+   pr_debug("can't find or allocate dso %s\n", buf);
+   continue;
+   }
+
+   sym = symbol__new(func->addr, 0, STB_GLOBAL, func->func);
+   if (sym == NULL) {
+   pr_debug("failed to allocate new symbol\n");
+   continue;
+   }
+   symbols__insert(&dso->symbols[MAP__FUNCTION], sym);
+
+   func = func->next;
+   }
+
+   /* Generate kernel maps */
+   list_for_each_entry(dso, &machine->kernel_dsos, node) {
+   struct map *map = map__new2(0, dso, MAP__FUNCTION);
+   if (map == NULL) {
+   pr_debug("failed to allocate new map\n");
+   goto out;
+   }
+
+   symbols__fixup_end(&dso->symbols[MAP__FUNCTION]);
+   map__fixup_start(map);
+   map__fixup_end(map);
+
+   dso__set_loaded(dso, MAP__FUNCTION);
+
+   map_groups__insert(&machine->kmaps, map);
+   if (strcmp(dso->name, "[kernel.kallsyms]") == 0)
+   machine->vmlinux_maps[MAP__FUNCTION] = map;
+   }
+
+   /* FIXME: no need to get ordered */
+   record = get_ordered_record(ftrace);
+   while (record) {
+   int type;
+   struct addr_location al;
+ 

[PATCH 11/18] perf ftrace: Use pager for displaying result

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

It's convenient to use a pager when viewing many lines of output.

Note that setup_pager() should be called after
perf_evlist__prepare_workload() since the two can interfere with each
other through the shared stdio streams.
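
The diff below also installs sig_handler for SIGPIPE.  The reason, as
far as I understand it, is that once the pager exits, the next write to
stdout raises SIGPIPE, which would kill the tool before it can reset
the tracing files.  The standalone sketch below reproduces that
situation with a plain pipe; write_after_reader_exit() is a
hypothetical name and the handler is a simplified copy of the idea, not
the code from builtin-ftrace.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t done;

static void sig_handler(int sig)
{
	(void)sig;
	done = 1;	/* ask the main loop to stop and clean up */
}

/*
 * Write to a pipe whose read end is already closed, which is what
 * happens to stdout once the pager has exited.  Returns 1 if the write
 * failed with EPIPE and the handler ran, 0 if not, -1 on setup failure.
 */
static int write_after_reader_exit(void)
{
	int fds[2];
	ssize_t n;
	int saved_errno;

	if (signal(SIGPIPE, sig_handler) == SIG_ERR || pipe(fds) < 0)
		return -1;

	close(fds[0]);			/* the "pager" goes away */
	n = write(fds[1], "x", 1);	/* raises SIGPIPE, then fails */
	saved_errno = errno;
	close(fds[1]);

	return n < 0 && saved_errno == EPIPE && done;
}
```

With the handler in place the process survives the signal, sees the
failed write and can leave its loop through the normal cleanup path.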

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 676b2aa7590f..83ed6d797087 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -227,6 +227,7 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
+   signal(SIGPIPE, sig_handler);
 
if (setup_tracing_files(ftrace) < 0)
goto out_reset;
@@ -1465,6 +1466,8 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, 
const char **argv)
  argv, false, true) < 0)
goto out_maps;
 
+   setup_pager();
+
ret = do_ftrace_live(ftrace);
 
 out_maps:
@@ -1586,6 +1589,8 @@ __cmd_ftrace_show(struct perf_ftrace *ftrace, int argc, 
const char **argv)
if (ftrace->dirname == NULL)
ftrace->dirname = DEFAULT_DIRNAME;
 
+   setup_pager();
+
ret = do_ftrace_show(ftrace);
 
perf_evlist__delete_maps(ftrace->evlist);
@@ -1646,6 +1651,7 @@ __cmd_ftrace_report(struct perf_ftrace *ftrace, int argc, 
const char **argv)
 
perf_hpp__init();
 
+   setup_pager();
setup_sorting();
 
symbol_conf.exclude_other = false;
-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 06/18] perf ftrace: Split "live" sub-command

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

Separate out the default behavior into a "live" subcommand.  This
prepares for supporting more subcommands like "record" and "report".
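
The patch also replaces the poll() loop with a non-blocking read that
sleeps for 1ms when no data is available.  A minimal sketch of that
pattern, outside of perf, could look like this.  drain_nonblock() and
its max_idle cap are additions for this example only; the patch itself
loops until a signal sets 'done' and writes with fwrite() to stdout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Copy everything readable from a non-blocking fd to out_fd.  When no
 * data is ready, sleep for 1ms and retry, giving up after max_idle
 * consecutive idle rounds.  Returns the number of bytes copied, or -1
 * on a hard error.
 */
static long drain_nonblock(int fd, int out_fd, int max_idle)
{
	const struct timespec req = { .tv_nsec = 1000000 };	/* 1ms */
	char buf[4096];
	long total = 0;
	int idle = 0;

	if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
		return -1;

	while (idle < max_idle) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			if (write(out_fd, buf, n) != n)
				return -1;
			total += n;
			idle = 0;
		} else if (n == 0 || errno == EINTR || errno == EAGAIN) {
			/* nothing to read right now: back off briefly */
			clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);
			idle++;
		} else {
			return -1;
		}
	}
	return total;
}
```

The short sleep keeps the loop from spinning on EAGAIN while still
letting a signal handler interrupt it promptly.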

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 133 ++--
 1 file changed, 80 insertions(+), 53 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 642aa49c66d7..1bb6d1ff0eb1 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -171,19 +171,13 @@ static int reset_tracing_cpu(void)
return 0;
 }
 
-static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char 
**argv)
+static int do_ftrace_live(struct perf_ftrace *ftrace)
 {
char *trace_file;
int trace_fd;
char buf[4096];
-   struct pollfd pollfd = {
-   .events = POLLIN,
-   };
-
-   if (geteuid() != 0) {
-   pr_err("ftrace only works for root!\n");
-   return -1;
-   }
+   /* sleep 1ms if no data read */
+   struct timespec req = { .tv_nsec = 1000000 };
 
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
@@ -196,12 +190,6 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int 
argc, const char **argv)
if (write_tracing_file("trace", "0") < 0)
goto out;
 
-   if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0) {
-   goto out;
-   }
-
if (set_tracing_pid(ftrace) < 0) {
pr_err("failed to set ftrace pid\n");
goto out_reset;
@@ -233,7 +221,6 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int 
argc, const char **argv)
}
 
fcntl(trace_fd, F_SETFL, O_NONBLOCK);
-   pollfd.fd = trace_fd;
 
if (write_tracing_file("tracing_on", "1") < 0) {
pr_err("can't enable tracing\n");
@@ -243,16 +230,18 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int 
argc, const char **argv)
perf_evlist__start_workload(ftrace->evlist);
 
while (!done) {
-   if (poll(&pollfd, 1, -1) < 0)
-   break;
+   int n = read(trace_fd, buf, sizeof(buf));
 
-   if (pollfd.revents & POLLIN) {
-   int n = read(trace_fd, buf, sizeof(buf));
-   if (n < 0)
-   break;
-   if (fwrite(buf, n, 1, stdout) != 1)
+   if (n < 0) {
+   if (errno == EINTR || errno == EAGAIN)
+   goto sleep;
+   else
break;
-   }
+   } else if (n == 0) {
+sleep:
+   clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL);
+   } else if (fwrite(buf, n, 1, stdout) != 1)
+   break;
}
 
write_tracing_file("tracing_on", "0");
@@ -274,61 +263,99 @@ out:
return done ? 0 : -1;
 }
 
-int cmd_ftrace(int argc, const char **argv, const char *prefix __maybe_unused)
+static int
+__cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
 {
-   int ret;
-   struct perf_ftrace ftrace = {
-   .target = { .uid = UINT_MAX, },
-   };
-   const char * const ftrace_usage[] = {
-   "perf ftrace [] []",
-   "perf ftrace [] --  []",
+   int ret = -1;
+   const char * const live_usage[] = {
+   "perf ftrace live [] []",
+   "perf ftrace live [] --  []",
NULL
};
-   const struct option ftrace_options[] = {
-   OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
+   const struct option live_options[] = {
+   OPT_STRING('t', "tracer", &ftrace->tracer, "tracer",
   "tracer to use: function_graph or function"),
-   OPT_STRING('p', "pid", &ftrace.target.pid, "pid",
+   OPT_STRING('p', "pid", &ftrace->target.pid, "pid",
   "trace on existing process id"),
OPT_INCR('v', "verbose", &verbose,
 "be more verbose"),
-   OPT_BOOLEAN('a', "all-cpus", &ftrace.target.system_wide,
+   OPT_BOOLEAN('a', "all-cpus", &ftrace->target.system_wide,
"system-wide collection from all CPUs"),
-   OPT_STRING('C', "cpu", &ftrace.target.cpu_list, "cpu",
+   OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
"list of cpus to monitor"),
OPT_END()
};
 
-   argc = parse_options(argc, argv, ftrace_options, ftrace_usage,
-   PARSE_OPT_STOP_AT_NON_OPTION);
-   if (!argc && perf_target__none(&ftrace.target))
-   usage_with_options(ftrace_usage, ftrace_options);
+   argc = parse_options(argc, argv, live_options, live_usage,
+PARSE_OPT_STOP_AT_NON_OPTION);
+   if (!argc && perf_target__none(&ftrace->target))
+ 

[PATCH 13/18] perf tools: Add document for perf-ftrace command

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-ftrace.txt | 117 +++
 1 file changed, 117 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-ftrace.txt

diff --git a/tools/perf/Documentation/perf-ftrace.txt 
b/tools/perf/Documentation/perf-ftrace.txt
new file mode 100644
index 000000000000..841699f3924d
--- /dev/null
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -0,0 +1,117 @@
+perf-ftrace(1)
+==============
+
+NAME
+----
+perf-ftrace - A front end for kernel's ftrace
+
+SYNOPSIS
+--------
+[verse]
+'perf ftrace' {live|record} [<options>] [<command>]
+'perf ftrace' {show|report} [<options>]
+
+DESCRIPTION
+-----------
+This command reads the ftrace buffer and displays the trace recorded.
+
+There are several variants of perf ftrace:
+
+  'perf ftrace live <command>' to see a live trace of kernel functions
+  via trace_pipe while executing the <command>.  If <command> is not
+  specified, one of the target options (-p, -a or -C) should be given.
+  It just prints out the result to stdout and doesn't save any files.
+
+  'perf ftrace record <command>' to record trace entries of kernel
+  functions during the execution of the <command>.  Like 'perf ftrace
+  live', at least one of <command> and/or target options should be
+  given in order to start recording.  The recorded results are saved
+  under a directory ('perf.data.dir' by default) and will be used by
+  other perf-ftrace tools like 'show' and 'report'.
+
+  'perf ftrace show' to see the trace of recorded functions.  It shows
+  functions sorted by time so the end result might be interleaved if
+  there's concurrent execution.
+
+  'perf ftrace report' to display the result in the usual perf-report
+  style - entries are sorted by given sort keys and output is resorted
+  by its overhead.
+
+OPTIONS
+-------
+<command>...::
+   Any command you can specify in a shell.
+
+-t::
+--tracer=::
+   The ftrace tracer to be used (default: function_graph).
+   Currently, only 'function' and 'function_graph' are supported
+   by 'record' command. 'live' command accepts any available
+   tracer in system and outputs trace result directly from the
+   ftrace's trace_pipe.
+
+-p::
+--pid=::
+   Record events on existing process ID (comma separated list).
+   Used by 'live' and 'record' subcommands only.
+
+-a::
+--all-cpus::
+   Force system-wide collection.  Scripts run without a <command>
+   normally use -a by default, while scripts run with a <command>
+   normally don't - this option allows the latter to be run in
+   system-wide mode.  Used by 'live' and 'record' subcommands only.
+
+-C::
+--cpu:: Only process samples for the list of CPUs provided.
+   Multiple CPUs can be provided as a comma-separated list with
+   no space: 0,1. Ranges of CPUs are specified with -: 0-2.
+   Default is to report samples on all online CPUs.
+
+-s::
+--sort=::
+   Sort histogram entries by given key(s) - multiple keys can be
+   specified in CSV format.  Following sort keys are available:
+   pid, comm, dso, symbol, cpu.
+
+   Each key has following meaning:
+
+   - comm: command (name) of the task
+   - pid: command and tid of the task
+   - dso: name of library or module executed at the time of sample
+   - symbol: name of function executed at the time of sample
+   - cpu: cpu number the task ran at the time of sample
+
+   By default, comm, dso and symbol keys are used.
+   (i.e. --sort comm,dso,symbol)
+   Used by the 'report' subcommand only.
+
+-i::
+--input=::
+   Input directory name excluding '.dir' at the end.
+   (default: perf.data)
+
+-o::
+--output=::
+   Output directory name excluding '.dir' at the end.
+   (default: perf.data)
+
+-v::
+--verbose::
+   Be more verbose (show counter open errors, etc).
+
+-D::
+--dump-raw-trace::
+   Dump raw trace in ASCII.  Used by the 'report' subcommand only.
+
+-I::
+--show-info::
+   Display extended information about the record.  This adds
+   information which may be very large and thus may clutter the
+   display.  It currently includes: cpu and numa topology of the
+   host system.  It can only be used with the 'report' subcommand.
+
+SEE ALSO
+--------
+linkperf:perf-record[1], linkperf:perf-report[1],
+linkperf:perf-script[1]
-- 
1.7.11.7



[PATCH 18/18] perf ftrace: Add --filter option

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The --filter (-l) option restricts tracing to specific functions.  If
this option is given, only these functions (and their children) are
shown in the output.
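
The selection logic in set_tracing_filter() below boils down to picking
the control file by tracer and then appending each requested function
name to it.  Here is a self-contained sketch of those two steps.
filter_file_for() and append_filters() are hypothetical names, and the
sketch writes to an arbitrary FILE * rather than to the real tracing
directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Pick the tracing control file that restricts output for a tracer.
 * Returns NULL for tracers that have no function filter in this scheme.
 */
static const char *filter_file_for(const char *tracer)
{
	if (!strcmp(tracer, "function_graph"))
		return "set_graph_function";
	if (!strcmp(tracer, "function"))
		return "set_ftrace_filter";
	return NULL;
}

/*
 * Append each function of a comma-separated list to fp, one per line.
 * Returns the number of names written, or -1 on error.
 */
static int append_filters(FILE *fp, const char *list)
{
	char *copy = strdup(list);	/* strtok_r() modifies its input */
	char *save = NULL;
	char *name;
	int count = 0;

	if (copy == NULL)
		return -1;

	for (name = strtok_r(copy, ",", &save); name;
	     name = strtok_r(NULL, ",", &save)) {
		if (fprintf(fp, "%s\n", name) < 0) {
			count = -1;
			break;
		}
		count++;
	}
	free(copy);
	return count;
}
```

For example, '-l schedule,do_fork' with the function_graph tracer would
end up as two entries in set_graph_function.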

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-ftrace.txt |  5 +++
 tools/perf/builtin-ftrace.c  | 55 
 2 files changed, 60 insertions(+)

diff --git a/tools/perf/Documentation/perf-ftrace.txt 
b/tools/perf/Documentation/perf-ftrace.txt
index 841699f3924d..b0ae286d3e28 100644
--- a/tools/perf/Documentation/perf-ftrace.txt
+++ b/tools/perf/Documentation/perf-ftrace.txt
@@ -50,6 +50,11 @@ OPTIONS
tracer in system and outputs trace result directly from the
ftrace's trace_pipe.
 
+-l::
+--filter=::
+   Filter given functions to suppress others in the output.
+   Used by 'live' and 'record' subcommands only.
+
 -p::
 --pid=::
Record events on existing process ID (comma separated list).
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 6720d560d6f8..17f37c273b1b 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -22,6 +22,7 @@
 #include "util/thread_map.h"
 #include "util/cpumap.h"
 #include "util/sort.h"
+#include "util/strlist.h"
 #include "util/trace-event.h"
 #include "../lib/traceevent/kbuffer.h"
 #include "../lib/traceevent/event-parse.h"
@@ -36,6 +37,7 @@ struct perf_ftrace {
const char *tracer;
const char *dirname;
const char *clock;
+   struct strlist *filter;
struct pevent *pevent;
bool show_full_info;
 };
@@ -118,6 +120,12 @@ static int reset_tracing_files(struct perf_ftrace *ftrace 
__maybe_unused)
if (write_tracing_file("trace_clock", "local") < 0)
return -1;
 
+   if (write_tracing_file("set_ftrace_filter", " ") < 0)
+   return -1;
+
+   if (write_tracing_file("set_graph_function", " ") < 0)
+   return -1;
+
return 0;
 }
 
@@ -210,6 +218,28 @@ static int set_tracing_clock(struct perf_ftrace *ftrace)
return write_tracing_file("trace_clock", "local");
 }
 
+static int set_tracing_filter(struct perf_ftrace *ftrace)
+{
+   const char *filter_file;
+   struct str_node *func;
+
+   if (ftrace->filter == NULL)
+   return 0;
+
+   if (!strcmp(ftrace->tracer, "function_graph"))
+   filter_file = "set_graph_function";
+   else if (!strcmp(ftrace->tracer, "function"))
+   filter_file = "set_ftrace_filter";
+   else
+   return 0;
+
+   strlist__for_each(func, ftrace->filter) {
+   if (append_tracing_file(filter_file, func->s) < 0)
+   return -1;
+   }
+   return 0;
+}
+
 static int setup_tracing_files(struct perf_ftrace *ftrace)
 {
int ret = -1;
@@ -240,6 +270,11 @@ static int setup_tracing_files(struct perf_ftrace *ftrace)
goto out;
}
 
+   if (set_tracing_filter(ftrace) < 0) {
+   pr_err("failed to set ftrace filter\n");
+   goto out;
+   }
+
if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
goto out;
@@ -1596,6 +1631,22 @@ static void ftrace_teardown(struct perf_ftrace *ftrace)
 {
perf_evlist__delete_maps(ftrace->evlist);
perf_evlist__delete(ftrace->evlist);
+   strlist__delete(ftrace->filter);
+}
+
+static int function_filters(const struct option *opt, const char *str,
+   int unset __maybe_unused)
+{
+   struct perf_ftrace *ftrace = opt->value;
+
+   if (ftrace->filter == NULL) {
+   ftrace->filter = strlist__new(true, NULL);
+   if (ftrace->filter == NULL)
+   return -ENOMEM;
+   }
+
+   strlist__parse_list(ftrace->filter, str);
+   return 0;
 }
 
 static int
@@ -1610,6 +1661,8 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, 
const char **argv)
const struct option live_options[] = {
OPT_STRING('t', "tracer", &ftrace->tracer, "tracer",
   "tracer to use: function_graph or function"),
+   OPT_CALLBACK('l', "filter", ftrace, "function[,function,...]",
+"show only these functions in the trace", 
function_filters),
OPT_STRING('p', "pid", &ftrace->target.pid, "pid",
   "trace on existing process id"),
OPT_INCR('v', "verbose", &verbose,
@@ -1652,6 +1705,8 @@ __cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, 
const char **argv)
const struct option record_options[] = {
OPT_STRING('t', "tracer", &ftrace->tracer, "tracer",
   "tracer to use: function_graph or function"),
+   OPT_CALLBACK('l', "filter", ftrace, "function[,function,...]",
+"show only these functions in the trace", 
function_filters),

[PATCH 12/18] perf ftrace: Cleanup using ftrace_setup/teardown()

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The ftrace sub-commands share some common code, so factor it out into
ftrace_setup() and ftrace_teardown() helpers.
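
The shape of the two helpers is the usual "acquire everything or
nothing" pairing with goto-based unwinding.  A stripped-down sketch of
that shape follows; struct session, session_setup() and
session_teardown() are stand-ins invented for this example, with
malloc() playing the role of perf_evlist__new() and friends, and the
fail_maps argument exists only to exercise the error path.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_ftrace: two owned resources. */
struct session {
	void *evlist;
	void *maps;
};

/* On failure nothing stays allocated, so the caller can just return. */
static int session_setup(struct session *s, int fail_maps)
{
	int ret = -ENOMEM;

	s->evlist = malloc(16);
	if (s->evlist == NULL)
		return ret;

	s->maps = fail_maps ? NULL : malloc(16);
	if (s->maps == NULL)
		goto out;

	return 0;

out:
	free(s->evlist);
	s->evlist = NULL;
	return ret;
}

/* Release in reverse order of acquisition. */
static void session_teardown(struct session *s)
{
	free(s->maps);
	free(s->evlist);
	s->maps = s->evlist = NULL;
}
```

Because setup unwinds itself on error, each sub-command reduces to
"setup, do the work, teardown" without its own out labels.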

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 172 +---
 1 file changed, 65 insertions(+), 107 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 83ed6d797087..fa7a9c59e228 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1413,10 +1413,59 @@ out:
return ret;
 }
 
+static int ftrace_setup(struct perf_ftrace *ftrace, int argc, const char 
**argv)
+{
+   int ret;
+   char errbuf[512];
+
+   ret = perf_target__validate(&ftrace->target);
+   if (ret) {
+   perf_target__strerror(&ftrace->target, ret, errbuf, 512);
+   pr_err("%s\n", errbuf);
+   return -EINVAL;
+   }
+
+   ftrace->evlist = perf_evlist__new();
+   if (ftrace->evlist == NULL)
+   return -ENOMEM;
+
+   ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+   if (ret < 0)
+   goto out;
+
+   if (ftrace->tracer == NULL)
+   ftrace->tracer = DEFAULT_TRACER;
+
+   if (ftrace->dirname == NULL)
+   ftrace->dirname = DEFAULT_DIRNAME;
+
+   if (argc) {
+   ret = perf_evlist__prepare_workload(ftrace->evlist,
+   &ftrace->target,
+   argv, false, true);
+   if (ret < 0)
+   goto out_maps;
+   }
+   return 0;
+
+out_maps:
+   perf_evlist__delete_maps(ftrace->evlist);
+out:
+   perf_evlist__delete(ftrace->evlist);
+
+   return ret;
+}
+
+static void ftrace_teardown(struct perf_ftrace *ftrace)
+{
+   perf_evlist__delete_maps(ftrace->evlist);
+   perf_evlist__delete(ftrace->evlist);
+}
+
 static int
 __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, const char **argv)
 {
-   int ret = -1;
+   int ret;
const char * const live_usage[] = {
"perf ftrace live [] []",
"perf ftrace live [] --  []",
@@ -1441,47 +1490,22 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, 
const char **argv)
if (!argc && perf_target__none(&ftrace->target))
usage_with_options(live_usage, live_options);
 
-   ret = perf_target__validate(&ftrace->target);
-   if (ret) {
-   char errbuf[512];
-
-   perf_target__strerror(&ftrace->target, ret, errbuf, 512);
-   pr_err("%s\n", errbuf);
-   return -EINVAL;
-   }
-
-   ftrace->evlist = perf_evlist__new();
-   if (ftrace->evlist == NULL)
-   return -ENOMEM;
-
-   ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+   ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
-   goto out;
-
-   if (ftrace->tracer == NULL)
-   ftrace->tracer = DEFAULT_TRACER;
-
-   if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0)
-   goto out_maps;
+   return ret;
 
setup_pager();
 
ret = do_ftrace_live(ftrace);
 
-out_maps:
-   perf_evlist__delete_maps(ftrace->evlist);
-out:
-   perf_evlist__delete(ftrace->evlist);
-
+   ftrace_teardown(ftrace);
return ret;
 }
 
 static int
 __cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, const char **argv)
 {
-   int ret = -1;
+   int ret;
const char * const record_usage[] = {
"perf ftrace record [] []",
"perf ftrace record [] --  []",
@@ -1508,48 +1532,20 @@ __cmd_ftrace_record(struct perf_ftrace *ftrace, int 
argc, const char **argv)
if (!argc && perf_target__none(&ftrace->target))
usage_with_options(record_usage, record_options);
 
-   ret = perf_target__validate(&ftrace->target);
-   if (ret) {
-   char errbuf[512];
-
-   perf_target__strerror(&ftrace->target, ret, errbuf, 512);
-   pr_err("%s\n", errbuf);
-   return -EINVAL;
-   }
-
-   ftrace->evlist = perf_evlist__new();
-   if (ftrace->evlist == NULL)
-   return -ENOMEM;
-
-   ret = perf_evlist__create_maps(ftrace->evlist, &ftrace->target);
+   ret = ftrace_setup(ftrace, argc, argv);
if (ret < 0)
-   goto out;
-
-   if (ftrace->tracer == NULL)
-   ftrace->tracer = DEFAULT_TRACER;
-
-   if (ftrace->dirname == NULL)
-   ftrace->dirname = DEFAULT_DIRNAME;
-
-   if (argc && perf_evlist__prepare_workload(ftrace->evlist,
- &ftrace->target,
- argv, false, true) < 0)
-   goto out_maps;
+   return ret;
 

[PATCH 15/18] perf ftrace: Add --clock option

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The --clock (-c) option controls trace_clock.  It defaults to 'perf'
if that clock exists, and to 'local' otherwise.
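
The fallback rule is: a clock named by the user must be accepted or the
command fails; without a user choice, try 'perf' first and quietly fall
back to 'local'.  A standalone sketch of that decision is below.
choose_trace_clock(), the clock_writer callback type and the two sample
writers are hypothetical and only stand in for write_tracing_file().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical writer: returns 0 if the kernel accepted the clock name. */
typedef int (*clock_writer)(const char *name);

/*
 * Returns the clock that ended up selected, or NULL on failure.  A
 * user-requested clock is never replaced by a fallback.
 */
static const char *choose_trace_clock(const char *requested,
				      clock_writer write_clock)
{
	if (requested != NULL)
		return write_clock(requested) == 0 ? requested : NULL;

	if (write_clock("perf") == 0)
		return "perf";

	/* 'perf' clock not supported: fall back to 'local' */
	return write_clock("local") == 0 ? "local" : NULL;
}

/* Sample writers for trying the function out. */
static int accept_all(const char *name)
{
	(void)name;
	return 0;
}

static int reject_perf(const char *name)
{
	return strcmp(name, "perf") == 0 ? -1 : 0;
}
```

Keeping the user-specified case strict means a typo in --clock is
reported instead of being silently replaced by another clock.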

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 8fec8d6df37d..2a7acdbd6985 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -35,6 +35,7 @@ struct perf_ftrace {
struct perf_target target;
const char *tracer;
const char *dirname;
+   const char *clock;
struct pevent *pevent;
bool show_full_info;
 };
@@ -114,6 +115,9 @@ static int reset_tracing_files(struct perf_ftrace *ftrace 
__maybe_unused)
if (reset_tracing_cpu() < 0)
return -1;
 
+   if (write_tracing_file("trace_clock", "local") < 0)
+   return -1;
+
return 0;
 }
 
@@ -188,6 +192,24 @@ static int reset_tracing_cpu(void)
return 0;
 }
 
+static int set_tracing_clock(struct perf_ftrace *ftrace)
+{
+   const char *tclock = ftrace->clock;
+
+   if (tclock == NULL)
+   tclock = "perf";
+
+   if (!write_tracing_file("trace_clock", tclock))
+   return 0;
+
+   /* exit if user specified an invalid clock */
+   if (ftrace->clock)
+   return -1;
+
+   pr_debug("'perf' clock is not supported.. falling back to 'local' 
clock\n");
+   return write_tracing_file("trace_clock", "local");
+}
+
 static int setup_tracing_files(struct perf_ftrace *ftrace)
 {
int ret = -1;
@@ -213,6 +235,11 @@ static int setup_tracing_files(struct perf_ftrace *ftrace)
goto out;
}
 
+   if (set_tracing_clock(ftrace) < 0) {
+   pr_err("failed to set trace clock\n");
+   goto out;
+   }
+
if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
goto out;
@@ -1495,6 +1522,8 @@ __cmd_ftrace_live(struct perf_ftrace *ftrace, int argc, 
const char **argv)
"system-wide collection from all CPUs"),
OPT_STRING('C', "cpu", &ftrace->target.cpu_list, "cpu",
"list of cpus to monitor"),
+   OPT_STRING('c', "clock", &ftrace->clock, "clock",
+   "clock to be used for tracer"),
OPT_END()
};
 
@@ -1537,6 +1566,8 @@ __cmd_ftrace_record(struct perf_ftrace *ftrace, int argc, 
const char **argv)
"list of cpus to monitor"),
OPT_STRING('o', "output", &ftrace->dirname, "dirname",
   "input directory name to use (default: perf.data)"),
+   OPT_STRING('c', "clock", &ftrace->clock, "clock",
+   "clock to be used for tracer"),
OPT_END()
};
 
-- 
1.7.11.7



[PATCH 17/18] perf ftrace: Tidy up the function graph output of 'show' subcommand

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The function graph output no longer goes through pevent_print_event();
it prints the context info itself using print_graph_duration().  Keep
it compact by printing only the cpu number and duration:

  # perf ftrace show
  ...
   10)   0.065 us |  __fsnotify_parent();
   10)|  fsnotify() {
   10)   0.060 us |__srcu_read_lock();
   10)   0.040 us |__srcu_read_unlock();
   10)   0.652 us |  }
   10)   0.040 us |  fput();
   10)|  __audit_syscall_exit() {
   10)|path_put() {
   10)   0.037 us |  dput();
   10)   0.032 us |  mntput();
   10)   0.563 us |}
   10)   0.035 us |unroll_tree_refs();
   10)   0.035 us |kfree();
   10)   1.284 us |  }
   10)|  __audit_syscall_entry() {
   10)   0.029 us |current_kernel_time();
   10)   0.239 us |  }
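
For reference, the prefix of each line above comes from splitting the
nanosecond duration into microseconds and a three-digit remainder.  A
standalone sketch of that formatting follows; format_graph_duration()
is a hypothetical helper that writes into a plain buffer with
snprintf() instead of a trace_seq.

```c
#include <stdio.h>

/*
 * Format "<cpu>) <usec>.<nsec> us |  " from a duration in nanoseconds,
 * using the same format string as print_graph_duration() in the diff.
 * Returns snprintf()'s result.
 */
static int format_graph_duration(char *buf, size_t len, int cpu,
				 unsigned long long duration_ns)
{
	unsigned long usec = duration_ns / 1000;
	unsigned long nsec = duration_ns % 1000;

	return snprintf(buf, len, "%3d) %3lu.%03lu us |  ", cpu, usec, nsec);
}
```

The duration field is 10 characters wide for durations below 1000 us,
which is why the non-leaf entry path pads with "%*s" and a width of 10
to keep the '|' column aligned.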

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 94f911946ef8..6720d560d6f8 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -898,6 +898,27 @@ static struct pevent_record *get_ordered_record(struct 
perf_ftrace *ftrace);
 
 static struct event_format *fgraph_exit_event;
 
+static void
+print_graph_duration(struct trace_seq *s, struct event_format *event,
+struct pevent_record *record)
+{
+   unsigned long long duration;
+   unsigned long long rettime, calltime;
+   unsigned long usec, nsec;
+
+   if (pevent_get_field_val(s, event, "rettime", record, &rettime, 1))
+   return;
+
+   if (pevent_get_field_val(s, event, "calltime", record, &calltime, 1))
+   return;
+
+   duration = rettime - calltime;
+   usec = duration / 1000;
+   nsec = duration % 1000;
+
+   trace_seq_printf(s, "%3d) %3lu.%03lu us |  ", record->cpu, usec, nsec);
+}
+
 static int
 fgraph_ent_handler(struct trace_seq *s, struct pevent_record *record,
   struct event_format *event, void *context)
@@ -930,9 +951,14 @@ fgraph_ent_handler(struct trace_seq *s, struct 
pevent_record *record,
if (next && next->cpu == record->cpu &&
pevent_data_type(event->pevent, next) == fgraph_exit_event->id) {
is_leaf = true;
+
+   print_graph_duration(s, fgraph_exit_event, next);
+
/* consume record */
get_ordered_record(ftrace);
free(next);
+   } else {
+   trace_seq_printf(s, "%3d) %*s |  ", record->cpu, 10, "");
}
 
 nested:
@@ -973,6 +999,8 @@ fgraph_ret_handler(struct trace_seq *s, struct 
pevent_record *record,
unsigned long long depth;
int i;
 
+   print_graph_duration(s, event, record);
+
if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
return trace_seq_putc(s, '!');
 
@@ -1284,9 +1312,12 @@ static int do_ftrace_show(struct perf_ftrace *ftrace)
continue;
}
 
-   pevent_print_event(ftrace->pevent, , record);
-   trace_seq_do_printf();
+   if (!strcmp(ftrace->tracer, "function_graph"))
+   pevent_event_info(, event, record);
+   else
+   pevent_print_event(ftrace->pevent, , record);
 
+   trace_seq_do_printf();
trace_seq_reset();
 
free(record);
-- 
1.7.11.7



[PATCH 16/18] perf ftrace: Show leaf-functions as oneliner

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

Detect leaf functions and print them on a single line.

Note that it only converts leaf functions that don't have any other
records between entry and exit, even on other cpus.  Other leaf
functions are left as is.
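
The rule described above can be sketched on a simplified record stream.
The types and the is_leaf_entry() helper below are assumptions made for
illustration; the patch itself works on struct pevent_record via
peek_ordered_record():

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified record: only what the leaf check needs. */
enum rec_type { REC_ENTRY, REC_EXIT };

struct rec {
	enum rec_type type;
	int cpu;
};

/*
 * Return true when the entry record at index i is immediately followed,
 * in the time-ordered stream, by an exit record on the same cpu.  Any
 * record in between (including one from another cpu) makes it non-leaf.
 */
static bool is_leaf_entry(const struct rec *recs, size_t nr, size_t i)
{
	const struct rec *next;

	if (i >= nr || recs[i].type != REC_ENTRY || i + 1 >= nr)
		return false;

	next = &recs[i + 1];
	return next->type == REC_EXIT && next->cpu == recs[i].cpu;
}
```

An entry that passes this check would be printed as "func();" and its exit
record consumed; every other entry keeps the "func() {" form.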

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 87 +++--
 1 file changed, 76 insertions(+), 11 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 2a7acdbd6985..94f911946ef8 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -893,34 +893,77 @@ function_handler(struct trace_seq *s, struct 
pevent_record *record,
 
 #define TRACE_GRAPH_INDENT  2
 
+static struct pevent_record *peek_ordered_record(struct perf_ftrace *ftrace);
+static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace);
+
+static struct event_format *fgraph_exit_event;
+
 static int
 fgraph_ent_handler(struct trace_seq *s, struct pevent_record *record,
-  struct event_format *event, void *context __maybe_unused)
+  struct event_format *event, void *context)
 {
unsigned long long depth;
unsigned long long val;
const char *func;
+   struct perf_ftrace *ftrace = context;
+   struct pevent_record *next;
+   bool is_leaf = false;
+   bool needs_free = false;
+   void *data;
+   int ret = -1;
int i;
 
+   /*
+* record->data can be invalidated after calling peek_ordered_record()
+* because it can unmap the current kbuffer page.  Make a copy.
+*/
+   data = malloc(record->size);
+   if (data == NULL)
+   goto nested;
+
+   memcpy(data, record->data, record->size);
+   record->data = data;
+   needs_free = true;
+
+   /* detect leaf function and make it one-liner */
+   next = peek_ordered_record(ftrace);
+   if (next && next->cpu == record->cpu &&
+   pevent_data_type(event->pevent, next) == fgraph_exit_event->id) {
+   is_leaf = true;
+   /* consume record */
+   get_ordered_record(ftrace);
+   free(next);
+   }
+
+nested:
if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
-   return trace_seq_putc(s, '!');
+   goto out;
 
/* Function */
for (i = 0; i < (int)(depth * TRACE_GRAPH_INDENT); i++)
trace_seq_putc(s, ' ');
 
if (pevent_get_field_val(s, event, "func", record, &val, 1))
-   return trace_seq_putc(s, '!');
+   goto out;
 
func = pevent_find_function(event->pevent, val);
 
if (func)
-   trace_seq_printf(s, "%s() {", func);
+   trace_seq_printf(s, "%s()", func);
else
-   trace_seq_printf(s, "%llx() {", val);
+   trace_seq_printf(s, "%llx()", val);
 
-   trace_seq_putc(s, '\n');
-   return 0;
+   if (is_leaf)
+   trace_seq_puts(s, ";\n");
+   else
+   trace_seq_puts(s, " {\n");
+
+   ret = 0;
+out:
+   if (needs_free)
+   free(record->data);
+
+   return ret;
 }
 
 static int
@@ -1122,7 +1165,8 @@ get_ftrace_event_record(struct perf_ftrace *ftrace,
return fra->record;
 }
 
-static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace)
+static struct ftrace_report_arg *
+__get_ordered_record(struct perf_ftrace *ftrace)
 {
struct ftrace_report_arg *fra = NULL;
struct ftrace_report_arg *tmp;
@@ -1136,9 +1180,26 @@ static struct pevent_record *get_ordered_record(struct 
perf_ftrace *ftrace)
fra = tmp;
}
}
+   return fra;
+}
+
+static struct pevent_record *peek_ordered_record(struct perf_ftrace *ftrace)
+{
+   struct ftrace_report_arg *fra = __get_ordered_record(ftrace);
+
+   if (fra)
+   return fra->record;
+
+   return NULL;
+}
+
+static struct pevent_record *get_ordered_record(struct perf_ftrace *ftrace)
+{
+   struct ftrace_report_arg *fra = __get_ordered_record(ftrace);
 
if (fra) {
-   record = fra->record;
+   struct pevent_record *record = fra->record;
+
fra->record = NULL;
return record;
}
@@ -1194,10 +1255,10 @@ static int do_ftrace_show(struct perf_ftrace *ftrace)
  function_handler, NULL);
pevent_register_event_handler(ftrace->pevent, -1,
  "ftrace", "funcgraph_entry",
- fgraph_ent_handler, NULL);
+ fgraph_ent_handler, ftrace);
pevent_register_event_handler(ftrace->pevent, -1,
  "ftrace", "funcgraph_exit",
- fgraph_ret_handler, NULL);
+   

[PATCH 14/18] perf ftrace: Add a signal handler for SIGSEGV

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

It's for debugging purposes.
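
The handler reports the signal, restores the default action and re-raises
it, so the process still terminates with SIGSEGV.  A stand-alone sketch of
that behaviour is below; run_child_and_get_signal() is a hypothetical demo
helper that is not part of the patch, and psignal() is kept as in the patch
even though it is not async-signal-safe:

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Report the signal, restore the default action and re-raise it. */
static void sig_exit(int sig)
{
	psignal(sig, "perf");
	signal(sig, SIG_DFL);
	raise(sig);
}

/*
 * Hypothetical demo helper: fork a child that installs the handler and
 * raises SIGSEGV.  Returns the signal that terminated the child, or -1
 * if the child could not be created or did not die from a signal.
 */
static int run_child_and_get_signal(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		signal(SIGSEGV, sig_exit);
		raise(SIGSEGV);
		_exit(0);	/* only reached if the signal did not kill us */
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFSIGNALED(status))
		return -1;

	return WTERMSIG(status);
}
```

Because the default action is restored before re-raising, the parent still
sees a death by SIGSEGV rather than a normal exit.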

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index fa7a9c59e228..8fec8d6df37d 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -41,6 +41,13 @@ struct perf_ftrace {
 
 static bool done;
 
+static void sig_exit(int sig)
+{
+   psignal(sig, "perf");
+   signal(sig, SIG_DFL);
+   raise(sig);
+}
+
 static void sig_handler(int sig __maybe_unused)
 {
done = true;
@@ -228,6 +235,7 @@ static int do_ftrace_live(struct perf_ftrace *ftrace)
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
signal(SIGPIPE, sig_handler);
+   signal(SIGSEGV, sig_exit);
 
if (setup_tracing_files(ftrace) < 0)
goto out_reset;
@@ -630,6 +638,7 @@ static int do_ftrace_record(struct perf_ftrace *ftrace)
signal(SIGINT, sig_handler);
signal(SIGUSR1, sig_handler);
signal(SIGCHLD, sig_handler);
+   signal(SIGSEGV, sig_exit);
 
if (setup_tracing_files(ftrace) < 0)
goto out_reset;
@@ -1139,6 +1148,8 @@ static int do_ftrace_show(struct perf_ftrace *ftrace)
},
};
 
+   signal(SIGSEGV, sig_exit);
+
canonicalize_directory_name(ftrace->dirname);
 
scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
@@ -1230,6 +1241,8 @@ static int do_ftrace_report(struct perf_ftrace *ftrace)
struct machine *machine;
struct dso *dso;
 
+   signal(SIGSEGV, sig_exit);
+
canonicalize_directory_name(ftrace->dirname);
 
scnprintf(buf, sizeof(buf), "%s.dir/perf.header", ftrace->dirname);
-- 
1.7.11.7



[PATCH 05/18] perf ftrace: Add support for -a and -C option

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The -a/--all-cpus and -C/--cpu options control which cpus are traced.
To do that, add and use the cpu_map__sprintf() function.
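
A cpu list ends up as a hexadecimal mask string, with a ',' between every
group of 32 cpus, which is then written to tracing_cpumask.  The sketch
below shows one way to do that conversion on a plain sorted int array;
cpus_to_hex_mask() is a hypothetical helper, and its buffer sizing is this
sketch's own choice rather than the sizing used in the patch:

```c
#include <stdlib.h>

/*
 * Hypothetical helper: convert a sorted, non-empty list of cpu numbers
 * into a hex mask string such as "f" or "1,00000001".  The result is
 * malloc()ed and must be freed by the caller; NULL is returned on error.
 */
static char *cpus_to_hex_mask(const int *cpus, int nr)
{
	static const char hex[] = "0123456789abcdef";
	unsigned char *bitmap;
	char *buf, *ptr;
	int last_cpu, cpu, i;

	if (nr <= 0)
		return NULL;

	last_cpu = cpus[nr - 1];

	/* one bit per cpu, rounded up to whole bytes */
	bitmap = calloc(last_cpu / 8 + 1, 1);
	if (bitmap == NULL)
		return NULL;

	for (i = 0; i < nr; i++)
		bitmap[cpus[i] / 8] |= 1 << (cpus[i] % 8);

	/* one hex digit per 4 cpus, one ',' per 32 cpus, plus the NUL */
	buf = malloc(last_cpu / 4 + 1 + last_cpu / 32 + 1);
	if (buf == NULL) {
		free(bitmap);
		return NULL;
	}

	ptr = buf;
	for (cpu = last_cpu / 4 * 4; cpu >= 0; cpu -= 4) {
		unsigned char bits = bitmap[cpu / 8];

		if (cpu % 8)
			bits >>= 4;	/* upper nibble of the byte */
		else
			bits &= 0xf;	/* lower nibble of the byte */

		*ptr++ = hex[bits];
		if (cpu % 32 == 0 && cpu > 0)
			*ptr++ = ',';
	}
	*ptr = '\0';

	free(bitmap);
	return buf;
}
```

For cpus 0-3 this gives "f"; for cpus 0 and 32 it gives "1,00000001".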

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 69 +
 tools/perf/util/cpumap.c| 45 +
 tools/perf/util/cpumap.h|  1 +
 3 files changed, 115 insertions(+)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index bd415f2b1cde..642aa49c66d7 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -18,6 +18,7 @@
 #include "util/evlist.h"
 #include "util/target.h"
 #include "util/thread_map.h"
+#include "util/cpumap.h"
 
 
 #define DEFAULT_TRACER  "function_graph"
@@ -80,6 +81,8 @@ static int append_tracing_file(const char *name, const char 
*val)
return __write_tracing_file(name, val, true);
 }
 
+static int reset_tracing_cpu(void);
+
 static int reset_tracing_files(struct perf_ftrace *ftrace __maybe_unused)
 {
if (write_tracing_file("tracing_on", "0") < 0)
@@ -91,6 +94,9 @@ static int reset_tracing_files(struct perf_ftrace *ftrace 
__maybe_unused)
if (write_tracing_file("set_ftrace_pid", " ") < 0)
return -1;
 
+   if (reset_tracing_cpu() < 0)
+   return -1;
+
return 0;
 }
 
@@ -111,6 +117,60 @@ static int set_tracing_pid(struct perf_ftrace *ftrace)
return 0;
 }
 
+static int set_tracing_cpu(struct perf_ftrace *ftrace)
+{
+   char *cpumask;
+   size_t mask_size;
+   int ret;
+   int last_cpu;
+   struct cpu_map *cpumap = ftrace->evlist->cpus;
+
+   if (!perf_target__has_cpu(&ftrace->target))
+   return 0;
+
+   last_cpu = cpumap->map[cpumap->nr - 1];
+   mask_size = (last_cpu + 3) / 4 + 1;
+   mask_size += last_cpu / 32; /* ',' is needed for every 32th cpus */
+
+   cpumask = malloc(mask_size);
+   if (cpumask == NULL) {
+   pr_debug("failed to allocate cpu mask\n");
+   return -1;
+   }
+
+   cpu_map__sprintf(cpumap, cpumask);
+
+   ret = write_tracing_file("tracing_cpumask", cpumask);
+
+   free(cpumask);
+   return ret;
+}
+
+static int reset_tracing_cpu(void)
+{
+   char *cpumask;
+   size_t mask_size;
+   int last_cpu;
+   struct cpu_map *cpumap = cpu_map__new(NULL);
+
+   last_cpu = cpumap->map[cpumap->nr - 1];
+   mask_size = (last_cpu + 3) / 4 + 1;
+   mask_size += last_cpu / 32; /* ',' is needed for every 32th cpus */
+
+   cpumask = malloc(mask_size);
+   if (cpumask == NULL) {
+   pr_debug("failed to allocate cpu mask\n");
+   return -1;
+   }
+
+   cpu_map__sprintf(cpumap, cpumask);
+
+   write_tracing_file("tracing_cpumask", cpumask);
+
+   free(cpumask);
+   return 0;
+}
+
 static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char 
**argv)
 {
char *trace_file;
@@ -147,6 +207,11 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int 
argc, const char **argv)
goto out_reset;
}
 
+   if (set_tracing_cpu(ftrace) < 0) {
+   pr_err("failed to set tracing cpumask\n");
+   goto out_reset;
+   }
+
if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
goto out_reset;
@@ -227,6 +292,10 @@ int cmd_ftrace(int argc, const char **argv, const char 
*prefix __maybe_unused)
   "trace on existing process id"),
OPT_INCR('v', "verbose", ,
 "be more verbose"),
+   OPT_BOOLEAN('a', "all-cpus", _wide,
+   "system-wide collection from all CPUs"),
+   OPT_STRING('C', "cpu", _list, "cpu",
+   "list of cpus to monitor"),
OPT_END()
};
 
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index beb8cf9f9976..7933839915ea 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -187,6 +187,51 @@ size_t cpu_map__fprintf(struct cpu_map *map, FILE *fp)
return printed + fprintf(fp, "\n");
 }
 
+static char hex_char(char val)
+{
+   if (0 <= val && val <= 9)
+   return val + '0';
+   if (10 <= val && val < 16)
+   return val - 10 + 'a';
+   return '?';
+}
+
+size_t cpu_map__sprintf(struct cpu_map *map, char *buf)
+{
+   int i, cpu;
+   char *ptr = buf;
+   unsigned char *bitmap;
+   int last_cpu = map->map[map->nr - 1];
+
+   bitmap = zalloc((last_cpu + 7) / 8);
+   if (bitmap == NULL) {
+   buf[0] = '\0';
+   return 0;
+   }
+
+   for (i = 0; i < map->nr; i++) {
+   cpu = map->map[i];
+   bitmap[cpu / 8] |= 1 << (cpu % 8);
+   }
+
+   for (cpu = last_cpu / 4 * 4; cpu >= 0; cpu -= 4) {
+   unsigned char bits = 

[PATCH 08/18] perf ftrace: Add 'show' sub-command

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The ftrace show subcommand is for viewing recorded ftrace files.  It
enters the perf.data.dir directory and opens the perf.header file to find
the necessary information.  It then reads the per-cpu trace records using
the kbuffer helper and prints them to stdout in time order.

It only shows the basic form, so the function graph output doesn't show
duration/overhead, and no leaf entry handling is provided yet.  Maybe this
can be handled by a proper plugin in libtraceevent.
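
Printing per-cpu records in time order amounts to repeatedly picking the
stream whose next record has the smallest timestamp.  A minimal sketch of
that selection follows; struct cpu_stream, next_stream() and
merge_streams() are hypothetical names that work on bare timestamps, not
on the kbuffer records used by the patch:

```c
#include <stddef.h>

/* Hypothetical per-cpu stream: timestamps already sorted within a cpu. */
struct cpu_stream {
	const unsigned long long *ts;
	size_t nr;
	size_t pos;
};

/*
 * Pick the stream whose next record has the smallest timestamp.
 * Returns the stream index, or -1 when every stream is exhausted.
 */
static int next_stream(const struct cpu_stream *streams, int nr_streams)
{
	int best = -1;
	int i;

	for (i = 0; i < nr_streams; i++) {
		const struct cpu_stream *s = &streams[i];

		if (s->pos >= s->nr)
			continue;
		if (best < 0 ||
		    s->ts[s->pos] < streams[best].ts[streams[best].pos])
			best = i;
	}
	return best;
}

/* Merge all streams into out[] in time order; returns the count written. */
static size_t merge_streams(struct cpu_stream *streams, int nr_streams,
			    unsigned long long *out, size_t max)
{
	size_t n = 0;
	int idx;

	while (n < max && (idx = next_stream(streams, nr_streams)) >= 0) {
		struct cpu_stream *s = &streams[idx];

		out[n++] = s->ts[s->pos++];
	}
	return n;
}
```

A linear scan per record is enough for a handful of cpus; a heap would only
pay off with many streams.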

Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 435 +++-
 1 file changed, 433 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 52bb8137daf2..3a12fb9d4b94 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "util/debug.h"
 #include "util/parse-options.h"
@@ -21,6 +22,8 @@
 #include "util/thread_map.h"
 #include "util/cpumap.h"
 #include "util/trace-event.h"
+#include "../lib/traceevent/kbuffer.h"
+#include "../lib/traceevent/event-parse.h"
 
 
 #define DEFAULT_TRACER  "function_graph"
@@ -31,6 +34,7 @@ struct perf_ftrace {
struct perf_target target;
const char *tracer;
const char *dirname;
+   struct pevent *pevent;
 };
 
 static bool done;
@@ -819,6 +823,379 @@ out_reset:
 }
 
 static int
+function_handler(struct trace_seq *s, struct pevent_record *record,
+struct event_format *event, void *context __maybe_unused)
+{
+   struct pevent *pevent = event->pevent;
+   unsigned long long function;
+   const char *func;
+
+   if (pevent_get_field_val(s, event, "ip", record, &function, 1))
+   return trace_seq_putc(s, '!');
+
+   func = pevent_find_function(pevent, function);
+   if (func)
+   trace_seq_printf(s, "%s <-- ", func);
+   else
+   trace_seq_printf(s, "0x%llx", function);
+
+   if (pevent_get_field_val(s, event, "parent_ip", record, &function, 1))
+   return trace_seq_putc(s, '!');
+
+   func = pevent_find_function(pevent, function);
+   if (func)
+   trace_seq_printf(s, "%s", func);
+   else
+   trace_seq_printf(s, "0x%llx", function);
+
+   trace_seq_putc(s, '\n');
+   return 0;
+}
+
+#define TRACE_GRAPH_INDENT  2
+
+static int
+fgraph_ent_handler(struct trace_seq *s, struct pevent_record *record,
+  struct event_format *event, void *context __maybe_unused)
+{
+   unsigned long long depth;
+   unsigned long long val;
+   const char *func;
+   int i;
+
+   if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
+   return trace_seq_putc(s, '!');
+
+   /* Function */
+   for (i = 0; i < (int)(depth * TRACE_GRAPH_INDENT); i++)
+   trace_seq_putc(s, ' ');
+
+   if (pevent_get_field_val(s, event, "func", record, &val, 1))
+   return trace_seq_putc(s, '!');
+
+   func = pevent_find_function(event->pevent, val);
+
+   if (func)
+   trace_seq_printf(s, "%s() {", func);
+   else
+   trace_seq_printf(s, "%llx() {", val);
+
+   trace_seq_putc(s, '\n');
+   return 0;
+}
+
+static int
+fgraph_ret_handler(struct trace_seq *s, struct pevent_record *record,
+  struct event_format *event, void *context __maybe_unused)
+{
+   unsigned long long depth;
+   int i;
+
+   if (pevent_get_field_val(s, event, "depth", record, &depth, 1))
+   return trace_seq_putc(s, '!');
+
+   /* Function */
+   for (i = 0; i < (int)(depth * TRACE_GRAPH_INDENT); i++)
+   trace_seq_putc(s, ' ');
+
+   trace_seq_puts(s, "}\n");
+   return 0;
+}
+
+struct perf_ftrace_report {
+   struct perf_ftrace *ftrace;
+   struct perf_tool tool;
+};
+
+struct ftrace_report_arg {
+   struct list_head node;
+   struct pevent_record *record;
+   struct kbuffer *kbuf;
+   void *map;
+   int cpu;
+   int fd;
+   int done;
+   off_t offset;
+   off_t size;
+};
+
+static LIST_HEAD(ftrace_cpu_buffers);
+
+static int process_sample_event(struct perf_tool *tool,
+   union perf_event * event __maybe_unused,
+   struct perf_sample *sample,
+   struct perf_evsel *evsel __maybe_unused,
+   struct machine *machine __maybe_unused)
+{
+   struct perf_ftrace *ftrace;
+   struct perf_ftrace_report *report;
+   struct ftrace_report_arg *fra;
+   struct stat statbuf;
+   enum kbuffer_long_size long_size;
+   enum kbuffer_endian endian;
+   char buf[PATH_MAX];
+
+   report = container_of(tool, struct perf_ftrace_report, tool);
+   ftrace = report->ftrace;
+
+   if (perf_target__has_cpu(&ftrace->target)) {
+   int i;
+  

[PATCH 07/18] perf ftrace: Add 'record' sub-command

2013-10-15 Thread Namhyung Kim
From: Namhyung Kim 

The ftrace record command is for saving raw ftrace buffer contents,
which can be read from per_cpu/cpuX/trace_pipe_raw.

Since ftrace events are generated very frequently, a single recording
thread mostly resulted in buffer overruns.  Thus it uses per-cpu
recorder threads to prevent such cases, and each thread saves the
contents to its own file.

These per-cpu data files are saved in a directory so that they can be
easily found when needed.  I chose "perf.data.dir" as the default
directory name; the first two components (i.e. "perf.data") can be
changed with the -o option.  The structure of the directory looks like:

  $ tree perf.data.dir
  perf.data.dir/
  |-- perf.header
  |-- trace-cpu0.buf
  |-- trace-cpu1.buf
  |-- trace-cpu2.buf
  `-- trace-cpu3.buf

In addition to the trace-cpuX.buf files, it also has a perf.header file.
The perf.header file is compatible with the existing perf.data format and
contains the usual event information, feature mask and sample data.  The
sample data is synthesized to indicate that a given cpu has a record file.
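
The directory layout above can be expressed as two path-building helpers.
This is only a sketch: header_path() and cpu_buf_path() are hypothetical
names, and the file name patterns are taken from the tree listing:

```c
#include <stdio.h>

/* Build "<dirname>.dir/perf.header"; returns snprintf()'s result. */
static int header_path(char *buf, size_t size, const char *dirname)
{
	return snprintf(buf, size, "%s.dir/perf.header", dirname);
}

/* Build "<dirname>.dir/trace-cpu<N>.buf"; returns snprintf()'s result. */
static int cpu_buf_path(char *buf, size_t size, const char *dirname, int cpu)
{
	return snprintf(buf, size, "%s.dir/trace-cpu%d.buf", dirname, cpu);
}
```

With the default name "perf.data" these produce
"perf.data.dir/perf.header" and, for cpu 3, "perf.data.dir/trace-cpu3.buf".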

Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-ftrace.c | 659 ++--
 1 file changed, 642 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 1bb6d1ff0eb1..52bb8137daf2 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "util/debug.h"
 #include "util/parse-options.h"
@@ -19,14 +20,17 @@
 #include "util/target.h"
 #include "util/thread_map.h"
 #include "util/cpumap.h"
+#include "util/trace-event.h"
 
 
 #define DEFAULT_TRACER  "function_graph"
+#define DEFAULT_DIRNAME  "perf.data"
 
 struct perf_ftrace {
struct perf_evlist *evlist;
struct perf_target target;
const char *tracer;
+   const char *dirname;
 };
 
 static bool done;
@@ -171,40 +175,56 @@ static int reset_tracing_cpu(void)
return 0;
 }
 
-static int do_ftrace_live(struct perf_ftrace *ftrace)
+static int setup_tracing_files(struct perf_ftrace *ftrace)
 {
-   char *trace_file;
-   int trace_fd;
-   char buf[4096];
-   /* sleep 1ms if no data read */
-   struct timespec req = { .tv_nsec = 100 };
-
-   signal(SIGINT, sig_handler);
-   signal(SIGUSR1, sig_handler);
-   signal(SIGCHLD, sig_handler);
+   int ret = -1;
 
-   if (reset_tracing_files(ftrace) < 0)
+   if (reset_tracing_files(ftrace) < 0) {
+   pr_err("failed to reset tracing files\n");
goto out;
+   }
 
/* reset ftrace buffer */
-   if (write_tracing_file("trace", "0") < 0)
+   if (write_tracing_file("trace", "0") < 0) {
+   pr_err("failed to reset ftrace buffer\n");
goto out;
+   }
 
if (set_tracing_pid(ftrace) < 0) {
pr_err("failed to set ftrace pid\n");
-   goto out_reset;
+   goto out;
}
 
if (set_tracing_cpu(ftrace) < 0) {
pr_err("failed to set tracing cpumask\n");
-   goto out_reset;
+   goto out;
}
 
if (write_tracing_file("current_tracer", ftrace->tracer) < 0) {
pr_err("failed to set current_tracer to %s\n", ftrace->tracer);
-   goto out_reset;
+   goto out;
}
 
+   ret = 0;
+out:
+   return ret;
+}
+
+static int do_ftrace_live(struct perf_ftrace *ftrace)
+{
+   char *trace_file;
+   int trace_fd;
+   char buf[4096];
+   /* sleep 1ms if no data read */
+   struct timespec req = { .tv_nsec = 100 };
+
+   signal(SIGINT, sig_handler);
+   signal(SIGUSR1, sig_handler);
+   signal(SIGCHLD, sig_handler);
+
+   if (setup_tracing_files(ftrace) < 0)
+   goto out_reset;
+
trace_file = get_tracing_file("trace_pipe");
if (!trace_file) {
pr_err("failed to open trace_pipe\n");
@@ -259,7 +279,542 @@ out_close_fd:
close(trace_fd);
 out_reset:
reset_tracing_files(ftrace);
+   return done ? 0 : -1;
+}
+
+static int alloc_ftrace_evsel(struct perf_ftrace *ftrace)
+{
+   struct perf_evsel *evsel;
+
+   if (!strcmp(ftrace->tracer, "function")) {
+   if (perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+  "function", NULL) < 0) {
+   pr_err("failed to allocate ftrace event\n");
+   return -1;
+   }
+   } else if (!strcmp(ftrace->tracer, "function_graph")) {
+   if (perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+  "funcgraph_entry", NULL) ||
+   perf_evlist__add_newtp(ftrace->evlist, "ftrace",
+  "funcgraph_exit", NULL)) {
+   pr_err("failed to allocate ftrace 

Re: About [PATCH 1/2] regulator: core: Provide a dummy regulator with full constraints

2013-10-15 Thread Wei Ni
On 10/12/2013 08:14 PM, Mark Brown wrote:
> On Tue, Oct 08, 2013 at 05:46:49PM +0800, Wei Ni wrote:
> 
>> In the regulator_dev_lookup(), it will try to read the "xx-supply" to
>> get the regnode, but I didn't set the vcc-supply in dts file for lm90,
>> so the of_get_regulator() will return NULL, then the
>> regulator_dev_lookup() will set the ret to -ENODEV, and return the rdev
>> as NULL.
> 
> OK, I think the device tree board code just needs to set full
> constraints during machine initialisation.  We can't have multiple

Hi, Mark, do you mean we can call regulator_has_full_constraints()? But
some platforms don't have a board file now, they only have a dts file.
How should this function be called there?

Wei.

> initcalls in the regulator code and doing it there is really a bit of a
> workaround anyway.
> 



[PATCH] usb: misc: usb3503: Fix compile error due to incorrect regmap dependency

2013-10-15 Thread Matthew Dawson
The USB3503 driver had an incorrect dependency on REGMAP, instead of
REGMAP_I2C.  This caused the build to fail since the necessary regmap
i2c pieces were not available.

Signed-off-by: Matthew Dawson 
---
 drivers/usb/misc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/misc/Kconfig b/drivers/usb/misc/Kconfig
index e2b21c1..ba5f70f 100644
--- a/drivers/usb/misc/Kconfig
+++ b/drivers/usb/misc/Kconfig
@@ -246,6 +246,6 @@ config USB_EZUSB_FX2
 config USB_HSIC_USB3503
tristate "USB3503 HSIC to USB20 Driver"
depends on I2C
-   select REGMAP
+   select REGMAP_I2C
help
  This option enables support for SMSC USB3503 HSIC to USB 2.0 Driver.
-- 
1.8.1.5



[PATCH v5 03/12] mrst: Fixed checkpatch warnings

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

Fixed checkpatch warnings in mrst related files.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/platform/mrst/mrst.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/mrst/mrst.c b/arch/x86/platform/mrst/mrst.c
index 235a742..2a45eab 100644
--- a/arch/x86/platform/mrst/mrst.c
+++ b/arch/x86/platform/mrst/mrst.c
@@ -977,7 +977,7 @@ static int __init sfi_parse_devs(struct sfi_table_header 
*table)
case SFI_DEV_TYPE_UART:
case SFI_DEV_TYPE_HSI:
default:
-   ;
+   break;
}
}
return 0;
-- 
1.8.4.rc3



[PATCH v5 02/12] mrst: Fixed indentation issues

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

Fixed indentation issues reported by checkpatch script in
mrst related files.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/platform/mrst/early_printk_mrst.c |  3 ++-
 arch/x86/platform/mrst/mrst.c  | 24 +---
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/x86/platform/mrst/early_printk_mrst.c 
b/arch/x86/platform/mrst/early_printk_mrst.c
index 95880f7..39ecc27 100644
--- a/arch/x86/platform/mrst/early_printk_mrst.c
+++ b/arch/x86/platform/mrst/early_printk_mrst.c
@@ -219,7 +219,8 @@ static void early_mrst_spi_putc(char c)
 }
 
 /* Early SPI only uses polling mode */
-static void early_mrst_spi_write(struct console *con, const char *str, 
unsigned n)
+static void early_mrst_spi_write(struct console *con, const char *str,
+   unsigned n)
 {
int i;
 
diff --git a/arch/x86/platform/mrst/mrst.c b/arch/x86/platform/mrst/mrst.c
index b9aeb54..235a742 100644
--- a/arch/x86/platform/mrst/mrst.c
+++ b/arch/x86/platform/mrst/mrst.c
@@ -131,7 +131,7 @@ struct sfi_timer_table_entry *sfi_get_mtmr(int hint)
int i;
if (hint < sfi_mtimer_num) {
if (!sfi_mtimer_usage[hint]) {
-   pr_debug("hint taken for timer %d irq %d\n",\
+   pr_debug("hint taken for timer %d irq %d\n",
hint, sfi_mtimer_array[hint].irq);
sfi_mtimer_usage[hint] = 1;
return &sfi_mtimer_array[hint];
@@ -679,14 +679,14 @@ static void *msic_thermal_platform_data(void *info)
 /* tc35876x DSI-LVDS bridge chip and panel platform data */
 static void *tc35876x_platform_data(void *data)
 {
-   static struct tc35876x_platform_data pdata;
+   static struct tc35876x_platform_data pdata;
 
-   /* gpio pins set to -1 will not be used by the driver */
-   pdata.gpio_bridge_reset = get_gpio_by_name("LCMB_RXEN");
-   pdata.gpio_panel_bl_en = get_gpio_by_name("6S6P_BL_EN");
-   pdata.gpio_panel_vadd = get_gpio_by_name("EN_VREG_LCD_V3P3");
+   /* gpio pins set to -1 will not be used by the driver */
+   pdata.gpio_bridge_reset = get_gpio_by_name("LCMB_RXEN");
+   pdata.gpio_panel_bl_en = get_gpio_by_name("6S6P_BL_EN");
+   pdata.gpio_panel_vadd = get_gpio_by_name("EN_VREG_LCD_V3P3");
 
-   return &pdata;
+   return &pdata;
 }
 
 static const struct devs_id __initconst device_ids[] = {
@@ -729,7 +729,7 @@ static int i2c_next_dev;
 
 static void __init intel_scu_device_register(struct platform_device *pdev)
 {
-   if(ipc_next_dev == MAX_IPCDEVS)
+   if (ipc_next_dev == MAX_IPCDEVS)
pr_err("too many SCU IPC devices");
else
ipc_devs[ipc_next_dev++] = pdev;
@@ -872,7 +872,8 @@ static void __init sfi_handle_spi_dev(struct spi_board_info 
*spi_info)
 
while (dev->name[0]) {
if (dev->type == SFI_DEV_TYPE_SPI &&
-   !strncmp(dev->name, spi_info->modalias, 
SFI_NAME_LEN)) {
+   !strncmp(dev->name, spi_info->modalias,
+   SFI_NAME_LEN)) {
pdata = dev->get_platform_data(spi_info);
break;
}
@@ -904,7 +905,7 @@ static void __init sfi_handle_i2c_dev(int bus, struct 
i2c_board_info *i2c_info)
intel_scu_i2c_device_register(bus, i2c_info);
else
i2c_register_board_info(bus, i2c_info, 1);
- }
+}
 
 
 static int __init sfi_parse_devs(struct sfi_table_header *table)
@@ -1034,7 +1035,8 @@ static int __init pb_keys_init(void)
num = sizeof(gpio_button) / sizeof(struct gpio_keys_button);
for (i = 0; i < num; i++) {
gb[i].gpio = get_gpio_by_name(gb[i].desc);
-   pr_debug("info[%2d]: name = %s, gpio = %d\n", i, gb[i].desc, 
gb[i].gpio);
+   pr_debug("info[%2d]: name = %s, gpio = %d\n", i, gb[i].desc,
+   gb[i].gpio);
if (gb[i].gpio == -1)
continue;
 
-- 
1.8.4.rc3



[PATCH v5 06/12] intel_mid: Refactored sfi_parse_devs() function

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

The SFI device_id[] table parsing code is duplicated in every SFI
device handler. This patch removes the duplication by adding a
separate function, get_device_id(), to parse through the device
table. It also moves the SPI, I2C and IPC info code from
sfi_parse_devs() to the respective device handlers.
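
The shared lookup can be sketched as one table walk keyed on device type
and name, mirroring the loop removed from each handler.  The struct layout,
the SFI_NAME_LEN value and the get_device_id() signature below are
assumptions made for this sketch, since the real definition is not shown
here:

```c
#include <stddef.h>
#include <string.h>

#define SFI_NAME_LEN 16		/* assumed value for this sketch */

/* Simplified stand-in for the kernel's struct devs_id. */
struct devs_id {
	char name[SFI_NAME_LEN + 1];
	unsigned char type;
};

/*
 * Walk a table terminated by an entry with an empty name and return the
 * first entry matching both type and name, or NULL if none matches.
 */
static const struct devs_id *
get_device_id(const struct devs_id *table, unsigned char type,
	      const char *name)
{
	const struct devs_id *dev;

	for (dev = table; dev->name[0]; dev++) {
		if (dev->type == type &&
		    !strncmp(dev->name, name, SFI_NAME_LEN))
			return dev;
	}
	return NULL;
}
```

Matching on both fields matters because one name can appear under several
bus types, as "pmic_gpio" does for SPI and IPC in device_ids[].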

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/platform/intel-mid/intel-mid.c | 141 
 1 file changed, 71 insertions(+), 70 deletions(-)

diff --git a/arch/x86/platform/intel-mid/intel-mid.c 
b/arch/x86/platform/intel-mid/intel-mid.c
index 742b7bf..f9c4be8 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -831,20 +831,15 @@ static void __init install_irq_resource(struct 
platform_device *pdev, int irq)
platform_device_add_resources(pdev, , 1);
 }
 
-static void __init sfi_handle_ipc_dev(struct sfi_device_table_entry *entry)
+static void __init sfi_handle_ipc_dev(struct sfi_device_table_entry *pentry,
+   struct devs_id *dev)
 {
-   const struct devs_id *dev = device_ids;
struct platform_device *pdev;
void *pdata = NULL;
 
-   while (dev->name[0]) {
-   if (dev->type == SFI_DEV_TYPE_IPC &&
-   !strncmp(dev->name, entry->name, SFI_NAME_LEN)) {
-   pdata = dev->get_platform_data(entry);
-   break;
-   }
-   dev++;
-   }
+   pr_debug("IPC bus, name = %16.16s, irq = 0x%2x\n",
+   pentry->name, pentry->irq);
+   pdata = dev->get_platform_data(pentry);
 
/*
 * On Medfield the platform device creation is handled by the MSIC
@@ -853,68 +848,94 @@ static void __init sfi_handle_ipc_dev(struct 
sfi_device_table_entry *entry)
if (intel_mid_has_msic())
return;
 
-   pdev = platform_device_alloc(entry->name, 0);
+   pdev = platform_device_alloc(pentry->name, 0);
if (pdev == NULL) {
pr_err("out of memory for SFI platform device '%s'.\n",
-   entry->name);
+   pentry->name);
return;
}
-   install_irq_resource(pdev, entry->irq);
+   install_irq_resource(pdev, pentry->irq);
 
pdev->dev.platform_data = pdata;
intel_scu_device_register(pdev);
 }
 
-static void __init sfi_handle_spi_dev(struct spi_board_info *spi_info)
+static void __init sfi_handle_spi_dev(struct sfi_device_table_entry *pentry,
+   struct devs_id *dev)
 {
-   const struct devs_id *dev = device_ids;
+   struct spi_board_info spi_info;
void *pdata = NULL;
 
-   while (dev->name[0]) {
-   if (dev->type == SFI_DEV_TYPE_SPI &&
-   !strncmp(dev->name, spi_info->modalias,
-   SFI_NAME_LEN)) {
-   pdata = dev->get_platform_data(spi_info);
-   break;
-   }
-   dev++;
-   }
-   spi_info->platform_data = pdata;
+   memset(&spi_info, 0, sizeof(spi_info));
+   strncpy(spi_info.modalias, pentry->name, SFI_NAME_LEN);
+   spi_info.irq = ((pentry->irq == (u8)0xff) ? 0 : pentry->irq);
+   spi_info.bus_num = pentry->host_num;
+   spi_info.chip_select = pentry->addr;
+   spi_info.max_speed_hz = pentry->max_freq;
+   pr_debug("SPI bus=%d, name=%16.16s, irq=0x%2x, max_freq=%d, cs=%d\n",
+   spi_info.bus_num,
+   spi_info.modalias,
+   spi_info.irq,
+   spi_info.max_speed_hz,
+   spi_info.chip_select);
+
+   pdata = dev->get_platform_data(&spi_info);
+
+   spi_info.platform_data = pdata;
if (dev->delay)
-   intel_scu_spi_device_register(spi_info);
+   intel_scu_spi_device_register(&spi_info);
else
-   spi_register_board_info(spi_info, 1);
+   spi_register_board_info(&spi_info, 1);
 }
 
-static void __init sfi_handle_i2c_dev(int bus, struct i2c_board_info *i2c_info)
+static void __init sfi_handle_i2c_dev(struct sfi_device_table_entry *pentry,
+   struct devs_id *dev)
 {
-   const struct devs_id *dev = device_ids;
+   struct i2c_board_info i2c_info;
void *pdata = NULL;
 
+   memset(&i2c_info, 0, sizeof(i2c_info));
+   strncpy(i2c_info.type, pentry->name, SFI_NAME_LEN);
+   i2c_info.irq = ((pentry->irq == (u8)0xff) ? 0 : pentry->irq);
+   i2c_info.addr = pentry->addr;
+   pr_debug("I2C bus = %d, name = %16.16s, irq = 0x%2x, addr = 0x%x\n",
+   pentry->host_num,
+   i2c_info.type,
+   i2c_info.irq,
+   i2c_info.addr);
+   pdata = dev->get_platform_data(&i2c_info);
+   i2c_info.platform_data = pdata;
+
+   if (dev->delay)
+   

[PATCH v5 07/12] intel_mid: Added custom device_handler support

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

This patch provides a means to add a custom handler for
SFI devices. If device_handler is set to NULL in the
device_ids table, the standard SFI device handler is used.
If it is not NULL, the custom handler is called.
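
The fallback rule described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: struct sfi_entry,
standard_handler(), custom_handler() and dispatch() are hypothetical
stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sfi_device_table_entry. */
struct sfi_entry {
	int type;
	const char *name;
};

struct devs_id;
typedef void (*device_handler_t)(struct sfi_entry *pentry,
				 struct devs_id *dev);

/* Reduced, hypothetical version of struct devs_id. */
struct devs_id {
	const char *name;
	int type;
	device_handler_t device_handler;	/* NULL selects the standard handler */
};

/* Counters so a caller can observe which path was taken. */
int standard_calls;
int custom_calls;

void standard_handler(struct sfi_entry *pentry, struct devs_id *dev)
{
	(void)pentry;
	(void)dev;
	standard_calls++;
}

void custom_handler(struct sfi_entry *pentry, struct devs_id *dev)
{
	(void)pentry;
	(void)dev;
	custom_calls++;
}

/* Prefer the per-device handler; fall back to the standard one. */
void dispatch(struct sfi_entry *pentry, struct devs_id *dev)
{
	assert(pentry != NULL && dev != NULL);
	if (dev->device_handler)
		dev->device_handler(pentry, dev);
	else
		standard_handler(pentry, dev);
}
```

In the patch itself the fallback branch is the existing switch on
pentry->type rather than a single function.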

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/platform/intel-mid/intel-mid.c | 74 ++---
 1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/arch/x86/platform/intel-mid/intel-mid.c 
b/arch/x86/platform/intel-mid/intel-mid.c
index f9c4be8..7bfd784 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -396,6 +396,9 @@ struct devs_id {
u8 type;
u8 delay;
void *(*get_platform_data)(void *info);
+   /* Custom handler for devices */
+   void (*device_handler)(struct sfi_device_table_entry *pentry,
+   struct devs_id *dev);
 };
 
 /* the offset for the mapping of global gpio pin to irq */
@@ -690,28 +693,27 @@ static void *tc35876x_platform_data(void *data)
 }
 
 static const struct devs_id __initconst device_ids[] = {
-   {"bma023", SFI_DEV_TYPE_I2C, 1, _platform_data},
-   {"pmic_gpio", SFI_DEV_TYPE_SPI, 1, _gpio_platform_data},
-   {"pmic_gpio", SFI_DEV_TYPE_IPC, 1, _gpio_platform_data},
-   {"spi_max3111", SFI_DEV_TYPE_SPI, 0, _platform_data},
-   {"i2c_max7315", SFI_DEV_TYPE_I2C, 1, _platform_data},
-   {"i2c_max7315_2", SFI_DEV_TYPE_I2C, 1, _platform_data},
-   {"tca6416", SFI_DEV_TYPE_I2C, 1, _platform_data},
-   {"emc1403", SFI_DEV_TYPE_I2C, 1, _platform_data},
-   {"i2c_accel", SFI_DEV_TYPE_I2C, 0, _platform_data},
-   {"pmic_audio", SFI_DEV_TYPE_IPC, 1, _platform_data},
-   {"mpu3050", SFI_DEV_TYPE_I2C, 1, _platform_data},
-   {"i2c_disp_brig", SFI_DEV_TYPE_I2C, 0, _platform_data},
+   {"bma023", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
+   {"pmic_gpio", SFI_DEV_TYPE_SPI, 1, _gpio_platform_data, NULL},
+   {"pmic_gpio", SFI_DEV_TYPE_IPC, 1, _gpio_platform_data, NULL},
+   {"spi_max3111", SFI_DEV_TYPE_SPI, 0, _platform_data, NULL},
+   {"i2c_max7315", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
+   {"i2c_max7315_2", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
+   {"tca6416", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
+   {"emc1403", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
+   {"i2c_accel", SFI_DEV_TYPE_I2C, 0, _platform_data, NULL},
+   {"pmic_audio", SFI_DEV_TYPE_IPC, 1, _platform_data, NULL},
+   {"mpu3050", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
+   {"i2c_disp_brig", SFI_DEV_TYPE_I2C, 0, _platform_data, NULL},
 
/* MSIC subdevices */
-   {"msic_battery", SFI_DEV_TYPE_IPC, 1, _battery_platform_data},
-   {"msic_gpio", SFI_DEV_TYPE_IPC, 1, _gpio_platform_data},
-   {"msic_audio", SFI_DEV_TYPE_IPC, 1, _audio_platform_data},
-   {"msic_power_btn", SFI_DEV_TYPE_IPC, 1, _power_btn_platform_data},
-   {"msic_ocd", SFI_DEV_TYPE_IPC, 1, _ocd_platform_data},
-   {"msic_thermal", SFI_DEV_TYPE_IPC, 1, _thermal_platform_data},
-
-   {},
+   {"msic_battery", SFI_DEV_TYPE_IPC, 1, _battery_platform_data, 
NULL},
+   {"msic_gpio", SFI_DEV_TYPE_IPC, 1, _gpio_platform_data, NULL},
+   {"msic_audio", SFI_DEV_TYPE_IPC, 1, _audio_platform_data, NULL},
+   {"msic_power_btn", SFI_DEV_TYPE_IPC, 1, _power_btn_platform_data, 
NULL},
+   {"msic_ocd", SFI_DEV_TYPE_IPC, 1, _ocd_platform_data, NULL},
+   {"msic_thermal", SFI_DEV_TYPE_IPC, 1, _thermal_platform_data, 
NULL},
+   { 0 }
 };
 
 #define MAX_IPCDEVS 24
@@ -965,20 +967,24 @@ static int __init sfi_parse_devs(struct sfi_table_header 
*table)
if ((dev == NULL) || (dev->get_platform_data == NULL))
continue;
 
-   switch (pentry->type) {
-   case SFI_DEV_TYPE_IPC:
-   sfi_handle_ipc_dev(pentry, dev);
-   break;
-   case SFI_DEV_TYPE_SPI:
-   sfi_handle_spi_dev(pentry, dev);
-   break;
-   case SFI_DEV_TYPE_I2C:
-   sfi_handle_i2c_dev(pentry, dev);
-   break;
-   case SFI_DEV_TYPE_UART:
-   case SFI_DEV_TYPE_HSI:
-   default:
-   break;
+   if (dev->device_handler) {
+   dev->device_handler(pentry, dev);
+   } else {
+   switch (pentry->type) {
+   case SFI_DEV_TYPE_IPC:
+   sfi_handle_ipc_dev(pentry, dev);
+   break;
+   case SFI_DEV_TYPE_SPI:
+   sfi_handle_spi_dev(pentry, dev);
+   break;
+   case SFI_DEV_TYPE_I2C:
+   sfi_handle_i2c_dev(pentry, dev);
+

[PATCH v5 01/12] mrst: Fixed printk/pr_* related issues

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

Fixed printk and pr_* related issues in mrst related files.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/platform/mrst/early_printk_mrst.c | 2 +-
 arch/x86/platform/mrst/mrst.c  | 2 +-
 arch/x86/platform/mrst/vrtc.c  | 5 ++---
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/mrst/early_printk_mrst.c 
b/arch/x86/platform/mrst/early_printk_mrst.c
index 028454f..95880f7 100644
--- a/arch/x86/platform/mrst/early_printk_mrst.c
+++ b/arch/x86/platform/mrst/early_printk_mrst.c
@@ -213,7 +213,7 @@ static void early_mrst_spi_putc(char c)
}
 
if (!timeout)
-   pr_warning("MRST earlycon: timed out\n");
+   pr_warn("MRST earlycon: timed out\n");
else
max3110_write_data(c);
 }
diff --git a/arch/x86/platform/mrst/mrst.c b/arch/x86/platform/mrst/mrst.c
index 3ca5957..b9aeb54 100644
--- a/arch/x86/platform/mrst/mrst.c
+++ b/arch/x86/platform/mrst/mrst.c
@@ -328,7 +328,7 @@ static inline int __init setup_x86_mrst_timer(char *arg)
else if (strcmp("lapic_and_apbt", arg) == 0)
mrst_timer_options = MRST_TIMER_LAPIC_APBT;
else {
-   pr_warning("X86 MRST timer option %s not recognised"
+   pr_warn("X86 MRST timer option %s not recognised"
   " use x86_mrst_timer=apbt_only or lapic_and_apbt\n",
   arg);
return -EINVAL;
diff --git a/arch/x86/platform/mrst/vrtc.c b/arch/x86/platform/mrst/vrtc.c
index 5e355b1..ca4f7d9 100644
--- a/arch/x86/platform/mrst/vrtc.c
+++ b/arch/x86/platform/mrst/vrtc.c
@@ -79,7 +79,7 @@ void vrtc_get_time(struct timespec *now)
/* vRTC YEAR reg contains the offset to 1972 */
year += 1972;
 
-   printk(KERN_INFO "vRTC: sec: %d min: %d hour: %d day: %d "
+   pr_info("vRTC: sec: %d min: %d hour: %d day: %d "
"mon: %d year: %d\n", sec, min, hour, mday, mon, year);
 
now->tv_sec = mktime(year, mon, mday, hour, min, sec);
@@ -109,8 +109,7 @@ int vrtc_set_mmss(const struct timespec *now)
vrtc_cmos_write(tm.tm_sec, RTC_SECONDS);
spin_unlock_irqrestore(_lock, flags);
} else {
-   printk(KERN_ERR
-  "%s: Invalid vRTC value: write of %lx to vRTC failed\n",
+   pr_err("%s: Invalid vRTC value: write of %lx to vRTC failed\n",
__FUNCTION__, now->tv_sec);
retval = -EINVAL;
}
-- 
1.8.4.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 05/12] intel_mid: Renamed *mrst* to *intel_mid*

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

mrst is used as a common name to represent all intel_mid type
SoCs, but Moorestown is just one of the intel_mid SoCs. So
renamed them to use intel_mid.

This patch mainly renames the variables and related
functions that use the *mrst* prefix to *intel_mid*.

To ensure that there are no functional changes, I have compared
the objdump of related files before and after rename and found
the only difference is symbol and name changes.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 Documentation/kernel-parameters.txt|   6 +-
 arch/x86/include/asm/intel-mid.h   |  26 ++---
 arch/x86/include/asm/setup.h   |   4 +-
 arch/x86/include/uapi/asm/bootparam.h  |   2 +-
 arch/x86/kernel/apb_timer.c|   8 +-
 arch/x86/kernel/head32.c   |   4 +-
 arch/x86/kernel/rtc.c  |   2 +-
 arch/x86/pci/intel_mid_pci.c   |  12 +--
 .../platform/intel-mid/early_printk_intel_mid.c|   2 +-
 arch/x86/platform/intel-mid/intel-mid.c| 109 ++---
 arch/x86/platform/intel-mid/intel_mid_vrtc.c   |   8 +-
 drivers/platform/x86/intel_scu_ipc.c   |   2 +-
 drivers/watchdog/intel_scu_watchdog.c  |   2 +-
 13 files changed, 93 insertions(+), 94 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index fcbb736..dfaeb0c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3471,11 +3471,11 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
default x2apic cluster mode on platforms
supporting x2apic.
 
-   x86_mrst_timer= [X86-32,APBT]
-   Choose timer option for x86 Moorestown MID platform.
+   x86_intel_mid_timer= [X86-32,APBT]
+   Choose timer option for x86 Intel MID platform.
Two valid options are apbt timer only and lapic timer
plus one apbt timer for broadcast timer.
-   x86_mrst_timer=apbt_only | lapic_and_apbt
+   x86_intel_mid_timer=apbt_only | lapic_and_apbt
 
xen_emul_unplug=[HW,X86,XEN]
Unplug Xen emulated devices
diff --git a/arch/x86/include/asm/intel-mid.h b/arch/x86/include/asm/intel-mid.h
index cc79a4f..beb7a5f 100644
--- a/arch/x86/include/asm/intel-mid.h
+++ b/arch/x86/include/asm/intel-mid.h
@@ -13,7 +13,7 @@
 
 #include 
 
-extern int pci_mrst_init(void);
+extern int intel_mid_pci_init(void);
 extern int __init sfi_parse_mrtc(struct sfi_table_header *table);
 extern int sfi_mrtc_num;
 extern struct sfi_rtc_table_entry sfi_mrtc_array[];
@@ -25,33 +25,33 @@ extern struct sfi_rtc_table_entry sfi_mrtc_array[];
  * we treat Medfield/Penwell as a variant of Moorestown. Penwell can be
  * identified via MSRs.
  */
-enum mrst_cpu_type {
+enum intel_mid_cpu_type {
/* 1 was Moorestown */
-   MRST_CPU_CHIP_PENWELL = 2,
+   INTEL_MID_CPU_CHIP_PENWELL = 2,
 };
 
-extern enum mrst_cpu_type __mrst_cpu_chip;
+extern enum intel_mid_cpu_type __intel_mid_cpu_chip;
 
 #ifdef CONFIG_X86_INTEL_MID
 
-static inline enum mrst_cpu_type mrst_identify_cpu(void)
+static inline enum intel_mid_cpu_type intel_mid_identify_cpu(void)
 {
-   return __mrst_cpu_chip;
+   return __intel_mid_cpu_chip;
 }
 
 #else /* !CONFIG_X86_INTEL_MID */
 
-#define mrst_identify_cpu()(0)
+#define intel_mid_identify_cpu()(0)
 
 #endif /* !CONFIG_X86_INTEL_MID */
 
-enum mrst_timer_options {
-   MRST_TIMER_DEFAULT,
-   MRST_TIMER_APBT_ONLY,
-   MRST_TIMER_LAPIC_APBT,
+enum intel_mid_timer_options {
+   INTEL_MID_TIMER_DEFAULT,
+   INTEL_MID_TIMER_APBT_ONLY,
+   INTEL_MID_TIMER_LAPIC_APBT,
 };
 
-extern enum mrst_timer_options mrst_timer_options;
+extern enum intel_mid_timer_options intel_mid_timer_options;
 
 /*
  * Penwell uses spread spectrum clock, so the freq number is not exactly
@@ -76,6 +76,6 @@ extern void intel_scu_devices_destroy(void);
 #define MRST_VRTC_MAP_SZ   (1024)
 /*#define MRST_VRTC_PGOFFSET   (0xc00) */
 
-extern void mrst_rtc_init(void);
+extern void intel_mid_rtc_init(void);
 
 #endif /* _ASM_X86_INTEL_MID_H */
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 3475554..59bcf4e 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -51,9 +51,9 @@ extern void i386_reserve_resources(void);
 extern void setup_default_timer_irq(void);
 
 #ifdef CONFIG_X86_INTEL_MID
-extern void x86_mrst_early_setup(void);
+extern void x86_intel_mid_early_setup(void);
 #else
-static inline void x86_mrst_early_setup(void) { }
+static inline void x86_intel_mid_early_setup(void) { }
 #endif
 
 #ifdef CONFIG_X86_INTEL_CE
diff --git 

[PATCH v5 04/12] intel_mid: Renamed *mrst* to *intel_mid*

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

The following files contain code that is common to all Intel MID
SoCs, so they are renamed as below.

mrst/mrst.c  -> intel-mid/intel-mid.c
mrst/vrtc.c  -> intel-mid/intel_mid_vrtc.c
mrst/early_printk_mrst.c -> intel-mid/early_printk_intel_mid.c
pci/mrst.c   -> pci/intel_mid_pci.c

Also, renamed the corresponding header files and made changes
to the driver files that included these header files.

To ensure that there are no functional changes, I have compared
the objdump of renamed files before and after rename and found
that the only difference is file name change.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/include/asm/{mrst.h => intel-mid.h}  |  8 
 arch/x86/include/asm/{mrst-vrtc.h => intel_mid_vrtc.h}|  4 ++--
 arch/x86/kernel/apb_timer.c   |  2 +-
 arch/x86/kernel/early_printk.c|  2 +-
 arch/x86/kernel/rtc.c |  2 +-
 arch/x86/pci/Makefile |  2 +-
 arch/x86/pci/{mrst.c => intel_mid_pci.c}  |  2 +-
 arch/x86/platform/Makefile|  2 +-
 arch/x86/platform/intel-mid/Makefile  |  3 +++
 .../early_printk_intel_mid.c} |  4 ++--
 arch/x86/platform/{mrst/mrst.c => intel-mid/intel-mid.c}  | 11 ++-
 arch/x86/platform/{mrst/vrtc.c => intel-mid/intel_mid_vrtc.c} |  6 +++---
 arch/x86/platform/mrst/Makefile   |  3 ---
 drivers/gpu/drm/gma500/mdfld_dsi_output.h |  2 +-
 drivers/gpu/drm/gma500/oaktrail_device.c  |  2 +-
 drivers/gpu/drm/gma500/oaktrail_lvds.c|  2 +-
 drivers/platform/x86/intel_scu_ipc.c  |  2 +-
 drivers/rtc/rtc-mrst.c|  4 ++--
 drivers/watchdog/intel_scu_watchdog.c |  2 +-
 19 files changed, 33 insertions(+), 32 deletions(-)
 rename arch/x86/include/asm/{mrst.h => intel-mid.h} (93%)
 rename arch/x86/include/asm/{mrst-vrtc.h => intel_mid_vrtc.h} (81%)
 rename arch/x86/pci/{mrst.c => intel_mid_pci.c} (99%)
 create mode 100644 arch/x86/platform/intel-mid/Makefile
 rename arch/x86/platform/{mrst/early_printk_mrst.c => 
intel-mid/early_printk_intel_mid.c} (98%)
 rename arch/x86/platform/{mrst/mrst.c => intel-mid/intel-mid.c} (99%)
 rename arch/x86/platform/{mrst/vrtc.c => intel-mid/intel_mid_vrtc.c} (97%)
 delete mode 100644 arch/x86/platform/mrst/Makefile

diff --git a/arch/x86/include/asm/mrst.h b/arch/x86/include/asm/intel-mid.h
similarity index 93%
rename from arch/x86/include/asm/mrst.h
rename to arch/x86/include/asm/intel-mid.h
index fc18bf3..cc79a4f 100644
--- a/arch/x86/include/asm/mrst.h
+++ b/arch/x86/include/asm/intel-mid.h
@@ -1,5 +1,5 @@
 /*
- * mrst.h: Intel Moorestown platform specific setup code
+ * intel-mid.h: Intel MID specific setup code
  *
  * (C) Copyright 2009 Intel Corporation
  *
@@ -8,8 +8,8 @@
  * as published by the Free Software Foundation; version 2
  * of the License.
  */
-#ifndef _ASM_X86_MRST_H
-#define _ASM_X86_MRST_H
+#ifndef _ASM_X86_INTEL_MID_H
+#define _ASM_X86_INTEL_MID_H
 
 #include 
 
@@ -78,4 +78,4 @@ extern void intel_scu_devices_destroy(void);
 
 extern void mrst_rtc_init(void);
 
-#endif /* _ASM_X86_MRST_H */
+#endif /* _ASM_X86_INTEL_MID_H */
diff --git a/arch/x86/include/asm/mrst-vrtc.h 
b/arch/x86/include/asm/intel_mid_vrtc.h
similarity index 81%
rename from arch/x86/include/asm/mrst-vrtc.h
rename to arch/x86/include/asm/intel_mid_vrtc.h
index 1e69a75..86ff468 100644
--- a/arch/x86/include/asm/mrst-vrtc.h
+++ b/arch/x86/include/asm/intel_mid_vrtc.h
@@ -1,5 +1,5 @@
-#ifndef _MRST_VRTC_H
-#define _MRST_VRTC_H
+#ifndef _INTEL_MID_VRTC_H
+#define _INTEL_MID_VRTC_H
 
 extern unsigned char vrtc_cmos_read(unsigned char reg);
 extern void vrtc_cmos_write(unsigned char val, unsigned char reg);
diff --git a/arch/x86/kernel/apb_timer.c b/arch/x86/kernel/apb_timer.c
index c9876ef..9154836 100644
--- a/arch/x86/kernel/apb_timer.c
+++ b/arch/x86/kernel/apb_timer.c
@@ -40,7 +40,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #define APBT_CLOCKEVENT_RATING 110
diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
index d15f575..38ca398 100644
--- a/arch/x86/kernel/early_printk.c
+++ b/arch/x86/kernel/early_printk.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index 0aa2939..a1b52fe 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -12,7 +12,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile
index ee0af58..e063eed 100644
--- 

[PATCH v5 08/12] intel_mid: Added custom handler for ipc devices

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

Added a custom handler for Medfield-based IPC devices and
moved the devs_id structure definition to a header file.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/include/asm/intel-mid.h| 15 ++
 arch/x86/platform/intel-mid/intel-mid.c | 82 -
 2 files changed, 66 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/intel-mid.h b/arch/x86/include/asm/intel-mid.h
index beb7a5f..ad236ae 100644
--- a/arch/x86/include/asm/intel-mid.h
+++ b/arch/x86/include/asm/intel-mid.h
@@ -19,6 +19,21 @@ extern int sfi_mrtc_num;
 extern struct sfi_rtc_table_entry sfi_mrtc_array[];
 
 /*
+ * Here defines the array of devices platform data that IAFW would export
+ * through SFI "DEVS" table, we use name and type to match the device and
+ * its platform data.
+ */
+struct devs_id {
+   char name[SFI_NAME_LEN + 1];
+   u8 type;
+   u8 delay;
+   void *(*get_platform_data)(void *info);
+   /* Custom handler for devices */
+   void (*device_handler)(struct sfi_device_table_entry *pentry,
+   struct devs_id *dev);
+};
+
+/*
  * Medfield is the follow-up of Moorestown, it combines two chip solution into
  * one. Other than that it also added always-on and constant tsc and lapic
  * timers. Medfield is the platform name, and the chip name is called Penwell
diff --git a/arch/x86/platform/intel-mid/intel-mid.c 
b/arch/x86/platform/intel-mid/intel-mid.c
index 7bfd784..40a3ff8 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -78,6 +78,8 @@ int sfi_mtimer_num;
 struct sfi_rtc_table_entry sfi_mrtc_array[SFI_MRTC_MAX];
 EXPORT_SYMBOL_GPL(sfi_mrtc_array);
 int sfi_mrtc_num;
+static void __init ipc_device_handler(struct sfi_device_table_entry *pentry,
+   struct devs_id *dev);
 
 static void intel_mid_power_off(void)
 {
@@ -386,21 +388,6 @@ static int get_gpio_by_name(const char *name)
return -1;
 }
 
-/*
- * Here defines the array of devices platform data that IAFW would export
- * through SFI "DEVS" table, we use name and type to match the device and
- * its platform data.
- */
-struct devs_id {
-   char name[SFI_NAME_LEN + 1];
-   u8 type;
-   u8 delay;
-   void *(*get_platform_data)(void *info);
-   /* Custom handler for devices */
-   void (*device_handler)(struct sfi_device_table_entry *pentry,
-   struct devs_id *dev);
-};
-
 /* the offset for the mapping of global gpio pin to irq */
 #define INTEL_MID_IRQ_OFFSET 0x100
 
@@ -695,24 +682,24 @@ static void *tc35876x_platform_data(void *data)
 static const struct devs_id __initconst device_ids[] = {
{"bma023", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
{"pmic_gpio", SFI_DEV_TYPE_SPI, 1, _gpio_platform_data, NULL},
-   {"pmic_gpio", SFI_DEV_TYPE_IPC, 1, _gpio_platform_data, NULL},
+   {"pmic_gpio", SFI_DEV_TYPE_IPC, 1, _gpio_platform_data, 
_device_handler},
{"spi_max3111", SFI_DEV_TYPE_SPI, 0, _platform_data, NULL},
{"i2c_max7315", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
{"i2c_max7315_2", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
{"tca6416", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
{"emc1403", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
{"i2c_accel", SFI_DEV_TYPE_I2C, 0, _platform_data, NULL},
-   {"pmic_audio", SFI_DEV_TYPE_IPC, 1, _platform_data, NULL},
+   {"pmic_audio", SFI_DEV_TYPE_IPC, 1, _platform_data, 
_device_handler},
{"mpu3050", SFI_DEV_TYPE_I2C, 1, _platform_data, NULL},
{"i2c_disp_brig", SFI_DEV_TYPE_I2C, 0, _platform_data, NULL},
 
/* MSIC subdevices */
-   {"msic_battery", SFI_DEV_TYPE_IPC, 1, _battery_platform_data, 
NULL},
-   {"msic_gpio", SFI_DEV_TYPE_IPC, 1, _gpio_platform_data, NULL},
-   {"msic_audio", SFI_DEV_TYPE_IPC, 1, _audio_platform_data, NULL},
-   {"msic_power_btn", SFI_DEV_TYPE_IPC, 1, _power_btn_platform_data, 
NULL},
-   {"msic_ocd", SFI_DEV_TYPE_IPC, 1, _ocd_platform_data, NULL},
-   {"msic_thermal", SFI_DEV_TYPE_IPC, 1, _thermal_platform_data, 
NULL},
+   {"msic_battery", SFI_DEV_TYPE_IPC, 1, _battery_platform_data, 
_device_handler},
+   {"msic_gpio", SFI_DEV_TYPE_IPC, 1, _gpio_platform_data, 
_device_handler},
+   {"msic_audio", SFI_DEV_TYPE_IPC, 1, _audio_platform_data, 
_device_handler},
+   {"msic_power_btn", SFI_DEV_TYPE_IPC, 1, _power_btn_platform_data, 
_device_handler},
+   {"msic_ocd", SFI_DEV_TYPE_IPC, 1, _ocd_platform_data, 
_device_handler},
+   {"msic_thermal", SFI_DEV_TYPE_IPC, 1, _thermal_platform_data, 
_device_handler},
{ 0 }
 };
 
@@ -843,13 +830,6 @@ static void __init sfi_handle_ipc_dev(struct 
sfi_device_table_entry *pentry,
pentry->name, pentry->irq);
pdata = dev->get_platform_data(pentry);
 
-   /*
-* On Medfield the 

[PATCH v5 10/12] intel-mid: sfi: allow struct devs_id.get_platform_data to be NULL

2013-10-15 Thread David Cohen
Intel mid sfi code doesn't need struct devs_id.get_platform_data != NULL.
If the callback is not set, just assume there is no platform_data.
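
The "optional callback" idea can be tried outside the kernel with a
small sketch. The macro below has the same shape as
intel_mid_sfi_get_pdata() in this patch; the reduced struct devs_id,
echo_platform_data() and lookup_pdata() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical version of struct devs_id. */
struct devs_id {
	void *(*get_platform_data)(void *info);	/* may be NULL */
};

/*
 * Same shape as intel_mid_sfi_get_pdata(): call the callback if it is
 * set, otherwise yield NULL. Note that (dev) is evaluated twice, so it
 * should not be an expression with side effects.
 */
#define get_pdata_or_null(dev, priv) \
	((dev)->get_platform_data ? (dev)->get_platform_data(priv) : NULL)

/* Trivial callback that hands back whatever it was given. */
void *echo_platform_data(void *info)
{
	return info;
}

void *lookup_pdata(const struct devs_id *dev, void *info)
{
	assert(dev != NULL);
	return get_pdata_or_null(dev, info);
}
```

With this in place a table entry may leave get_platform_data unset and
the handlers simply see pdata == NULL.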

Signed-off-by: David Cohen 
Cc: Kuppuswamy Sathyanarayanan 
---
 arch/x86/platform/intel-mid/sfi.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/intel-mid/sfi.c 
b/arch/x86/platform/intel-mid/sfi.c
index 2f8196d..3f1c171 100644
--- a/arch/x86/platform/intel-mid/sfi.c
+++ b/arch/x86/platform/intel-mid/sfi.c
@@ -70,6 +70,9 @@ struct blocking_notifier_head intel_scu_notifier =
BLOCKING_NOTIFIER_INIT(intel_scu_notifier);
 EXPORT_SYMBOL_GPL(intel_scu_notifier);
 
+#define intel_mid_sfi_get_pdata(dev, priv) \
+   ((dev)->get_platform_data ? (dev)->get_platform_data(priv) : NULL)
+
 /* parse all the mtimer info to a static mtimer array */
 int __init sfi_parse_mtmr(struct sfi_table_header *table)
 {
@@ -334,7 +337,7 @@ static void __init sfi_handle_ipc_dev(struct 
sfi_device_table_entry *pentry,
 
pr_debug("IPC bus, name = %16.16s, irq = 0x%2x\n",
pentry->name, pentry->irq);
-   pdata = dev->get_platform_data(pentry);
+   pdata = intel_mid_sfi_get_pdata(dev, pentry);
 
pdev = platform_device_alloc(pentry->name, 0);
if (pdev == NULL) {
@@ -367,7 +370,7 @@ static void __init sfi_handle_spi_dev(struct 
sfi_device_table_entry *pentry,
spi_info.max_speed_hz,
spi_info.chip_select);
 
-   pdata = dev->get_platform_data(&spi_info);
+   pdata = intel_mid_sfi_get_pdata(dev, &spi_info);
 
spi_info.platform_data = pdata;
if (dev->delay)
@@ -391,7 +394,7 @@ static void __init sfi_handle_i2c_dev(struct 
sfi_device_table_entry *pentry,
i2c_info.type,
i2c_info.irq,
i2c_info.addr);
-   pdata = dev->get_platform_data(&i2c_info);
+   pdata = intel_mid_sfi_get_pdata(dev, &i2c_info);
i2c_info.platform_data = pdata;
 
if (dev->delay)
@@ -450,7 +453,7 @@ static int __init sfi_parse_devs(struct sfi_table_header 
*table)
 
dev = get_device_id(pentry->type, pentry->name);
 
-   if ((dev == NULL) || (dev->get_platform_data == NULL))
+   if (!dev)
continue;
 
if (dev->device_handler) {
-- 
1.8.4.rc3



[PATCH v5 11/12] x86: intel-mid: add section for sfi device table

2013-10-15 Thread David Cohen
When Intel mid uses SFI table to enumerate devices, it requires an extra
device table with further information about how to probe such devices.

This patch creates a section where the device table will stay if
CONFIG_X86_INTEL_MID is selected.
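
For readers unfamiliar with section-based tables, here is a userspace
sketch of the general technique. It is not the kernel mechanism: it
assumes a GNU toolchain on an ELF target, where the linker provides
__start_<name> and __stop_<name> symbols for a section whose name is a
valid C identifier, instead of the explicit linker-script symbols added
by this patch. The section name mid_devs, struct dev_entry,
REGISTER_DEV() and find_dev() are all hypothetical, and the section
holds pointers to entries rather than the entries themselves to avoid
padding surprises between objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dev_entry {
	const char *name;
	int type;
};

/* Define an entry and drop a pointer to it into section "mid_devs". */
#define REGISTER_DEV(ident, dev_name, dev_type)				\
	static const struct dev_entry ident##_entry = {			\
		dev_name, dev_type					\
	};								\
	static const struct dev_entry *const ident##_ptr		\
		__attribute__((section("mid_devs"), used)) = &ident##_entry

/* Section bounds, provided by the linker (assumption: GNU ld or lld, ELF). */
extern const struct dev_entry *const __start_mid_devs[];
extern const struct dev_entry *const __stop_mid_devs[];

REGISTER_DEV(bma023, "bma023", 1);
REGISTER_DEV(msic_gpio, "msic_gpio", 2);

/* Walk the section like an array and match on the device name. */
const struct dev_entry *find_dev(const char *name)
{
	const struct dev_entry *const *p;

	assert(name != NULL);
	for (p = __start_mid_devs; p < __stop_mid_devs; p++)
		if (strcmp((*p)->name, name) == 0)
			return *p;
	return NULL;
}
```

The benefit is the same as in the patch: each device file can register
itself without a central table having to list it.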

Signed-off-by: David Cohen 
---
 arch/x86/kernel/vmlinux.lds.S | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 10c4f30..da6b35a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -199,6 +199,15 @@ SECTIONS
__x86_cpu_dev_end = .;
}
 
+#ifdef CONFIG_X86_INTEL_MID
+   .x86_intel_mid_dev.init : AT(ADDR(.x86_intel_mid_dev.init) - \
+   LOAD_OFFSET) {
+   __x86_intel_mid_dev_start = .;
+   *(.x86_intel_mid_dev.init)
+   __x86_intel_mid_dev_end = .;
+   }
+#endif
+
/*
 * start address and size of operations which during runtime
 * can be patched with virtualization friendly instructions or
-- 
1.8.4.rc3



[PATCH v5 09/12] intel_mid: Moved SFI related code to sfi.c

2013-10-15 Thread David Cohen
From: Kuppuswamy Sathyanarayanan 

Moved SFI-specific parsing/handling code to sfi.c. This will enable us
to reuse our intel-mid code for platforms that support firmware
interfaces other than SFI (like ACPI).

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/include/asm/intel-mid.h|   1 +
 arch/x86/platform/intel-mid/Makefile|   2 +
 arch/x86/platform/intel-mid/intel-mid.c | 451 +
 arch/x86/platform/intel-mid/sfi.c   | 485 
 4 files changed, 489 insertions(+), 450 deletions(-)
 create mode 100644 arch/x86/platform/intel-mid/sfi.c

diff --git a/arch/x86/include/asm/intel-mid.h b/arch/x86/include/asm/intel-mid.h
index ad236ae..3b0e7a7 100644
--- a/arch/x86/include/asm/intel-mid.h
+++ b/arch/x86/include/asm/intel-mid.h
@@ -15,6 +15,7 @@
 
 extern int intel_mid_pci_init(void);
 extern int __init sfi_parse_mrtc(struct sfi_table_header *table);
+extern int __init sfi_parse_mtmr(struct sfi_table_header *table);
 extern int sfi_mrtc_num;
 extern struct sfi_rtc_table_entry sfi_mrtc_array[];
 
diff --git a/arch/x86/platform/intel-mid/Makefile 
b/arch/x86/platform/intel-mid/Makefile
index de29635..b11e5b2 100644
--- a/arch/x86/platform/intel-mid/Makefile
+++ b/arch/x86/platform/intel-mid/Makefile
@@ -1,3 +1,5 @@
 obj-$(CONFIG_X86_INTEL_MID) += intel-mid.o
 obj-$(CONFIG_X86_INTEL_MID)+= intel_mid_vrtc.o
 obj-$(CONFIG_EARLY_PRINTK_INTEL_MID)   += early_printk_intel_mid.o
+# SFI specific code
+obj-$(CONFIG_SFI) += sfi.o
diff --git a/arch/x86/platform/intel-mid/intel-mid.c 
b/arch/x86/platform/intel-mid/intel-mid.c
index 40a3ff8..4091569 100644
--- a/arch/x86/platform/intel-mid/intel-mid.c
+++ b/arch/x86/platform/intel-mid/intel-mid.c
@@ -18,19 +18,9 @@
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
 #include 
-#include 
-#include 
-#include 
 
 #include 
 #include 
@@ -68,19 +58,11 @@
 
 enum intel_mid_timer_options intel_mid_timer_options;
 
-static u32 sfi_mtimer_usage[SFI_MTMR_MAX_NUM];
-static struct sfi_timer_table_entry sfi_mtimer_array[SFI_MTMR_MAX_NUM];
 enum intel_mid_cpu_type __intel_mid_cpu_chip;
 EXPORT_SYMBOL_GPL(__intel_mid_cpu_chip);
 
-int sfi_mtimer_num;
-
-struct sfi_rtc_table_entry sfi_mrtc_array[SFI_MRTC_MAX];
-EXPORT_SYMBOL_GPL(sfi_mrtc_array);
-int sfi_mrtc_num;
 static void __init ipc_device_handler(struct sfi_device_table_entry *pentry,
struct devs_id *dev);
-
 static void intel_mid_power_off(void)
 {
 }
@@ -90,114 +72,6 @@ static void intel_mid_reboot(void)
intel_scu_ipc_simple_command(IPCMSG_COLD_BOOT, 0);
 }
 
-/* parse all the mtimer info to a static mtimer array */
-static int __init sfi_parse_mtmr(struct sfi_table_header *table)
-{
-   struct sfi_table_simple *sb;
-   struct sfi_timer_table_entry *pentry;
-   struct mpc_intsrc mp_irq;
-   int totallen;
-
-   sb = (struct sfi_table_simple *)table;
-   if (!sfi_mtimer_num) {
-   sfi_mtimer_num = SFI_GET_NUM_ENTRIES(sb,
-   struct sfi_timer_table_entry);
-   pentry = (struct sfi_timer_table_entry *) sb->pentry;
-   totallen = sfi_mtimer_num * sizeof(*pentry);
-   memcpy(sfi_mtimer_array, pentry, totallen);
-   }
-
-   pr_debug("SFI MTIMER info (num = %d):\n", sfi_mtimer_num);
-   pentry = sfi_mtimer_array;
-   for (totallen = 0; totallen < sfi_mtimer_num; totallen++, pentry++) {
-   pr_debug("timer[%d]: paddr = 0x%08x, freq = %dHz,"
-   " irq = %d\n", totallen, (u32)pentry->phys_addr,
-   pentry->freq_hz, pentry->irq);
-   if (!pentry->irq)
-   continue;
-   mp_irq.type = MP_INTSRC;
-   mp_irq.irqtype = mp_INT;
-/* triggering mode edge bit 2-3, active high polarity bit 0-1 */
-   mp_irq.irqflag = 5;
-   mp_irq.srcbus = MP_BUS_ISA;
-   mp_irq.srcbusirq = pentry->irq; /* IRQ */
-   mp_irq.dstapic = MP_APIC_ALL;
-   mp_irq.dstirq = pentry->irq;
-   mp_save_irq(&mp_irq);
-   }
-
-   return 0;
-}
-
-struct sfi_timer_table_entry *sfi_get_mtmr(int hint)
-{
-   int i;
-   if (hint < sfi_mtimer_num) {
-   if (!sfi_mtimer_usage[hint]) {
-   pr_debug("hint taken for timer %d irq %d\n",
-   hint, sfi_mtimer_array[hint].irq);
-   sfi_mtimer_usage[hint] = 1;
-   return &sfi_mtimer_array[hint];
-   }
-   }
-   /* take the first timer available */
-   for (i = 0; i < sfi_mtimer_num;) {
-   if (!sfi_mtimer_usage[i]) {
-   sfi_mtimer_usage[i] = 1;
-   return 

[PATCH v5 12/12] intel_mid: Moved board related code to a new file

2013-10-15 Thread David Cohen
As Intel is rolling out more SoCs after Moorestown, we need to
restructure the code in a way that is backward compatible and easy to
expand. This patch implements a flexible way to support multiple boards
and devices.

This patch does not add any new functional support. It just refactors
the existing code to increase modularity and decrease code
duplication when supporting multiple SoCs and boards.

Currently intel-mid.c has both board and soc related code in one file.
This patch moves the board related code to new files in the following
order.

1. Moved the device-specific code to arch/x86/platform/intel-mid/device_libs/
   platform_.*. A new device file is added for every supported
   device. This code is conditionally compiled using the
   corresponding device driver CONFIG option.

2. Moved the device_ids location to .x86_intel_mid_dev.init section by
   using new intel_mid_sfi_dev() macro.

This patch was based on previous code from Sathyanarayanan Kuppuswamy.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: David Cohen 
---
 arch/x86/include/asm/intel-mid.h   |  16 +
 arch/x86/platform/intel-mid/Makefile   |   3 +
 arch/x86/platform/intel-mid/device_libs/Makefile   |  22 ++
 .../intel-mid/device_libs/platform_emc1403.c   |  41 ++
 .../intel-mid/device_libs/platform_gpio_keys.c |  83 
 .../platform/intel-mid/device_libs/platform_ipc.c  |  68 
 .../platform/intel-mid/device_libs/platform_ipc.h  |  17 +
 .../intel-mid/device_libs/platform_lis331.c|  39 ++
 .../intel-mid/device_libs/platform_max3111.c   |  35 ++
 .../intel-mid/device_libs/platform_max7315.c   |  79 
 .../intel-mid/device_libs/platform_mpu3050.c   |  36 ++
 .../platform/intel-mid/device_libs/platform_msic.c |  87 +
 .../platform/intel-mid/device_libs/platform_msic.h |  19 +
 .../intel-mid/device_libs/platform_msic_audio.c|  47 +++
 .../intel-mid/device_libs/platform_msic_battery.c  |  37 ++
 .../intel-mid/device_libs/platform_msic_gpio.c |  48 +++
 .../intel-mid/device_libs/platform_msic_ocd.c  |  49 +++
 .../device_libs/platform_msic_power_btn.c  |  36 ++
 .../intel-mid/device_libs/platform_msic_thermal.c  |  37 ++
 .../intel-mid/device_libs/platform_pmic_gpio.c |  54 +++
 .../intel-mid/device_libs/platform_tc35876x.c  |  36 ++
 .../intel-mid/device_libs/platform_tca6416.c   |  57 +++
 arch/x86/platform/intel-mid/intel-mid.c| 419 -
 arch/x86/platform/intel-mid/sfi.c  |  13 +-
 24 files changed, 952 insertions(+), 426 deletions(-)
 create mode 100644 arch/x86/platform/intel-mid/device_libs/Makefile
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_emc1403.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_gpio_keys.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_ipc.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_ipc.h
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_lis331.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_max3111.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_max7315.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_mpu3050.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic.h
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic_audio.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic_battery.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic_gpio.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic_ocd.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic_power_btn.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic_thermal.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_pmic_gpio.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_tc35876x.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_tca6416.c

diff --git a/arch/x86/include/asm/intel-mid.h b/arch/x86/include/asm/intel-mid.h
index 3b0e7a7..553d2e1 100644
--- a/arch/x86/include/asm/intel-mid.h
+++ b/arch/x86/include/asm/intel-mid.h
@@ -12,8 +12,11 @@
 #define _ASM_X86_INTEL_MID_H
 
 #include 
+#include 
 
 extern int intel_mid_pci_init(void);
+extern int get_gpio_by_name(const char *name);
+extern void intel_scu_device_register(struct platform_device *pdev);
 extern int __init sfi_parse_mrtc(struct sfi_table_header *table);
 extern int __init sfi_parse_mtmr(struct sfi_table_header *table);
 extern int sfi_mrtc_num;
@@ -34,6 +37,10 @@ struct devs_id {
struct devs_id *dev);
 };
 
+#define sfi_device(i)   \
+   static const struct devs_id *__intel_mid_sfi_##i##_dev __used \
+  

[PATCH v5 00/12] rework arch/x86/platform/[mrst => intel-mid]

2013-10-15 Thread David Cohen
This patch set does initial rework from arch/x86/platform/mrst to
arch/x86/platform/intel-mid.
These changes are necessary to update the obsolete Intel Atom Moorestown code
to support the newer Atom processors of this family (called 'intel-mid'). 

David Cohen (3):
  intel-mid: sfi: allow struct devs_id.get_platform_data to be NULL
  x86: intel-mid: add section for sfi device table
  intel_mid: Moved board related code to a new file

Kuppuswamy Sathyanarayanan (9):
  mrst: Fixed printk/pr_* related issues
  mrst: Fixed indentation issues
  mrst: Fixed checkpatch warnings
  intel_mid: Renamed *mrst* to *intel_mid*
  intel_mid: Renamed *mrst* to *intel_mid*
  intel_mid: Refactored sfi_parse_devs() function
  intel_mid: Added custom device_handler support
  intel_mid: Added custom handler for ipc devices
  intel_mid: Moved SFI related code to sfi.c

 Documentation/kernel-parameters.txt|6 +-
 arch/x86/include/asm/intel-mid.h   |  113 +++
 .../include/asm/{mrst-vrtc.h => intel_mid_vrtc.h}  |4 +-
 arch/x86/include/asm/mrst.h|   81 --
 arch/x86/include/asm/setup.h   |4 +-
 arch/x86/include/uapi/asm/bootparam.h  |2 +-
 arch/x86/kernel/apb_timer.c|   10 +-
 arch/x86/kernel/early_printk.c |2 +-
 arch/x86/kernel/head32.c   |4 +-
 arch/x86/kernel/rtc.c  |4 +-
 arch/x86/kernel/vmlinux.lds.S  |9 +
 arch/x86/pci/Makefile  |2 +-
 arch/x86/pci/{mrst.c => intel_mid_pci.c}   |   14 +-
 arch/x86/platform/Makefile |2 +-
 arch/x86/platform/intel-mid/Makefile   |8 +
 arch/x86/platform/intel-mid/device_libs/Makefile   |   22 +
 .../intel-mid/device_libs/platform_emc1403.c   |   41 +
 .../intel-mid/device_libs/platform_gpio_keys.c |   83 ++
 .../platform/intel-mid/device_libs/platform_ipc.c  |   68 ++
 .../platform/intel-mid/device_libs/platform_ipc.h  |   17 +
 .../intel-mid/device_libs/platform_lis331.c|   39 +
 .../intel-mid/device_libs/platform_max3111.c   |   35 +
 .../intel-mid/device_libs/platform_max7315.c   |   79 ++
 .../intel-mid/device_libs/platform_mpu3050.c   |   36 +
 .../platform/intel-mid/device_libs/platform_msic.c |   87 ++
 .../platform/intel-mid/device_libs/platform_msic.h |   19 +
 .../intel-mid/device_libs/platform_msic_audio.c|   47 +
 .../intel-mid/device_libs/platform_msic_battery.c  |   37 +
 .../intel-mid/device_libs/platform_msic_gpio.c |   48 +
 .../intel-mid/device_libs/platform_msic_ocd.c  |   49 +
 .../device_libs/platform_msic_power_btn.c  |   36 +
 .../intel-mid/device_libs/platform_msic_thermal.c  |   37 +
 .../intel-mid/device_libs/platform_pmic_gpio.c |   54 +
 .../intel-mid/device_libs/platform_tc35876x.c  |   36 +
 .../intel-mid/device_libs/platform_tca6416.c   |   57 ++
 .../early_printk_intel_mid.c}  |   11 +-
 arch/x86/platform/intel-mid/intel-mid.c|  213 
 .../{mrst/vrtc.c => intel-mid/intel_mid_vrtc.c}|   19 +-
 arch/x86/platform/intel-mid/sfi.c  |  487 +
 arch/x86/platform/mrst/Makefile|3 -
 arch/x86/platform/mrst/mrst.c  | 1052 
 drivers/gpu/drm/gma500/mdfld_dsi_output.h  |2 +-
 drivers/gpu/drm/gma500/oaktrail_device.c   |2 +-
 drivers/gpu/drm/gma500/oaktrail_lvds.c |2 +-
 drivers/platform/x86/intel_scu_ipc.c   |4 +-
 drivers/rtc/rtc-mrst.c |4 +-
 drivers/watchdog/intel_scu_watchdog.c  |4 +-
 47 files changed, 1808 insertions(+), 1187 deletions(-)
 create mode 100644 arch/x86/include/asm/intel-mid.h
 rename arch/x86/include/asm/{mrst-vrtc.h => intel_mid_vrtc.h} (81%)
 delete mode 100644 arch/x86/include/asm/mrst.h
 rename arch/x86/pci/{mrst.c => intel_mid_pci.c} (96%)
 create mode 100644 arch/x86/platform/intel-mid/Makefile
 create mode 100644 arch/x86/platform/intel-mid/device_libs/Makefile
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_emc1403.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_gpio_keys.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_ipc.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_ipc.h
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_lis331.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_max3111.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_max7315.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_mpu3050.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic.c
 create mode 100644 arch/x86/platform/intel-mid/device_libs/platform_msic.h
 create mode 100644 

Re: [PATCH v4 12/12] intel_mid: Moved board related code to a new file

2013-10-15 Thread David Cohen

On 10/15/2013 05:45 PM, H. Peter Anvin wrote:

On 10/15/2013 04:53 PM, David Cohen wrote:

On 10/15/2013 04:44 PM, H. Peter Anvin wrote:

On 10/15/2013 04:42 PM, David Cohen wrote:


+#define intel_mid_sfi_dev(i)   \
+static const struct devs_id *__intel_mid_sfi_##i##_dev __used \
+__attribute__((__section__(".x86_intel_mid_dev.init"))) = 
+


Any reason to not just call this "sfi_device()" or something similar?
"Intel MID SFI" seems a bit redundant...


I had the same thought. But struct devs_id is defined by asm/intel-mid.h.
This function is not meant to be used by any other user besides
intel-mid.
But I can change it if you prefer.



Hm, I guess it doesn't really matter.  After all, no other devices will
probably ever see SFI (we hope).


I hope too :)
Let me send a new version with this change.

Br, David Cohen
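
For readers unfamiliar with the section trick discussed in this thread, here is a minimal user-space sketch of how a macro like sfi_device() can collect device descriptors into one table. It is an illustration under stated assumptions, not the kernel code: it assumes a GNU toolchain on an ELF target, where the linker synthesizes __start_/__stop_ symbols for sections whose names are valid C identifiers (the kernel patch instead gathers .x86_intel_mid_dev.init in vmlinux.lds.S). The section name sfi_dev_table, the cut-down struct devs_id fields, the two sample descriptors, and the helpers sfi_dev_count() and find_sfi_dev() are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for the kernel's struct devs_id (fields are illustrative). */
struct devs_id {
    const char *name;
    int type;
};

/*
 * Emit a pointer to descriptor `i` into a dedicated section. The section
 * name is a valid C identifier, so GNU ld provides __start_/__stop_ symbols
 * that bound the resulting array of pointers (assumption: ELF + GNU ld/lld).
 */
#define sfi_device(i) \
    static const struct devs_id *sfi_##i##_entry \
    __attribute__((used, section("sfi_dev_table"))) = &i

/* Hypothetical descriptors, named after files in the diffstat. */
static const struct devs_id emc1403 = { "emc1403", 1 };
static const struct devs_id lis331 = { "lis331", 1 };

sfi_device(emc1403);
sfi_device(lis331);

/* Linker-provided bounds of the table. */
extern const struct devs_id *__start_sfi_dev_table[];
extern const struct devs_id *__stop_sfi_dev_table[];

/* Count the registered descriptors by walking the section. */
size_t sfi_dev_count(void)
{
    const struct devs_id **p;
    size_t n = 0;

    for (p = __start_sfi_dev_table; p < __stop_sfi_dev_table; p++)
        n++;
    return n;
}

/* Look a descriptor up by name; returns NULL if it was never registered. */
const struct devs_id *find_sfi_dev(const char *name)
{
    const struct devs_id **p;

    for (p = __start_sfi_dev_table; p < __stop_sfi_dev_table; p++)
        if (strcmp((*p)->name, name) == 0)
            return *p;
    return NULL;
}
```

The point of the design is that each device file registers itself just by being compiled in, so the central lookup code needs no per-device #ifdef.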

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] md: Fix skipping recovery for read-only arrays.

2013-10-15 Thread NeilBrown
On Mon, 07 Oct 2013 16:25:51 +0200 Lukasz Dorau 
wrote:

> Since:
> commit 7ceb17e87bde79d285a8b988cfed9eaeebe60b86
> md: Allow devices to be re-added to a read-only array.
> 
> spares are activated on a read-only array. For the raid1 and raid10
> personalities, this causes not-in-sync devices to be marked in-sync
> without checking whether recovery has finished.
> 
> If a read-only array is degraded and one of its devices is not in-sync
> (because the array has been only partially recovered), recovery will be
> skipped.
> 
> This patch adds a check that recovery has finished before marking a device
> in-sync for the raid1 and raid10 personalities. For the raid5 personality,
> such a condition is already present (at raid5.c:6029).
> 
> The bug was introduced in 3.10 and causes data corruption.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Pawel Baldysiak 
> Signed-off-by: Lukasz Dorau 
> ---
>  drivers/md/raid1.c  |1 +
>  drivers/md/raid10.c |1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d60412c..aacf6bf 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1479,6 +1479,7 @@ static int raid1_spare_active(struct mddev *mddev)
>   }
>   }
>   if (rdev
> + && rdev->recovery_offset == MaxSector
>   && !test_bit(Faulty, &rdev->flags)
>   && !test_and_set_bit(In_sync, &rdev->flags)) {
>   count++;
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index df7b0a0..73dc8a3 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1782,6 +1782,7 @@ static int raid10_spare_active(struct mddev *mddev)
>   }
>   sysfs_notify_dirent_safe(tmp->replacement->sysfs_state);
>   } else if (tmp->rdev
> +&& tmp->rdev->recovery_offset == MaxSector
>  && !test_bit(Faulty, &tmp->rdev->flags)
>  && !test_and_set_bit(In_sync, &tmp->rdev->flags)) {
>   count++;

Applied - thanks.

I'll forward it to Linus and -stable shortly.

NeilBrown




Re: [PATCH] random.c: fix a typo in comments

2013-10-15 Thread Theodore Ts'o
On Mon, Oct 07, 2013 at 04:38:49PM +0200, Stefan Beller wrote:
> Also removes (trailing) tabs from an empty line.
> 
> Signed-off-by: Stefan Beller 

Thanks for sending the patch, but the comments in this section have
since been reworded in random.git, so this patch is no longer
applicable to the random.git tree which will be pushed upstream at the
next merge window.

Cheers,

- Ted


[PATCH v3 3/3] arm64: reuse FPSIMD hardware context if possible

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Reuse the FPSIMD hardware context if it hasn't been touched by another
thread yet, so that unnecessary FPSIMD context restores can be avoided.
This is especially useful when switching between kernel threads and user
threads, because kernel threads usually don't touch the FPSIMD registers.

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 arch/arm64/include/asm/fpsimd.h |  2 ++
 arch/arm64/kernel/fpsimd.c  | 35 +--
 arch/arm64/kernel/smp.c |  1 +
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 142084f..4356d6e 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -35,6 +35,7 @@ struct fpsimd_state {
__uint128_t vregs[32];
u32 fpsr;
u32 fpcr;
+   int last_cpu;
};
};
 };
@@ -56,6 +57,7 @@ struct task_struct;
 
 extern void fpsimd_save_state(struct fpsimd_state *state);
 extern void fpsimd_load_state(struct fpsimd_state *state);
+extern void fpsimd_reset_lazy_restore(void);
 
 extern void fpsimd_dup_task_struct(struct task_struct *dst,
   struct task_struct *src);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f43dd58..5e37d86 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -33,6 +34,13 @@
 #define FPEXC_IXF  (1 << 4)
 #define FPEXC_IDF  (1 << 7)
 
+static DEFINE_PER_CPU(struct fpsimd_state *, fpsimd_owner);
+
+static inline void fpsimd_set_last_cpu(struct fpsimd_state *state, int cpu)
+{
+   state->last_cpu = cpu;
+}
+
 static inline void fpsimd_init_hw_state(void)
 {
int val = AARCH64_FPCR_DEFAULT_VAL;
@@ -84,19 +92,41 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
send_sig_info(SIGFPE, &info, current);
 }
 
+static void fpsimd_load_state_lazy(struct fpsimd_state *state)
+{
+   /* Could we reuse the hardware context? */
+   if (state->last_cpu == smp_processor_id() &&
+   __this_cpu_read(fpsimd_owner) == state)
+   return;
+   fpsimd_load_state(state);
+}
+
+static void fpsimd_save_state_lazy(struct fpsimd_state *state)
+{
+   fpsimd_save_state(state);
+   fpsimd_set_last_cpu(state, smp_processor_id());
+   __this_cpu_write(fpsimd_owner, state);
+}
+
+void fpsimd_reset_lazy_restore(void)
+{
+   this_cpu_write(fpsimd_owner, NULL);
+}
+
 void fpsimd_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
fpsimd_save_state(&current->thread.fpsimd_state);
*dst = *src;
+   fpsimd_set_last_cpu(&dst->thread.fpsimd_state, -1);
 }
 
 void fpsimd_thread_switch(struct task_struct *next)
 {
/* check if not kernel threads */
if (current->mm)
-   fpsimd_save_state(&current->thread.fpsimd_state);
+   fpsimd_save_state_lazy(&current->thread.fpsimd_state);
if (next->mm)
-   fpsimd_load_state(&next->thread.fpsimd_state);
+   fpsimd_load_state_lazy(&next->thread.fpsimd_state);
 }
 
 void fpsimd_flush_thread(void)
@@ -107,6 +137,7 @@ void fpsimd_flush_thread(void)
memset(state, 0, sizeof(struct fpsimd_state));
if (AARCH64_FPCR_DEFAULT_VAL)
state->fpcr = AARCH64_FPCR_DEFAULT_VAL;
+   fpsimd_set_last_cpu(state, -1);
fpsimd_load_state(state);
preempt_enable();
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 78db90d..aae15c4 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -183,6 +183,7 @@ asmlinkage void secondary_start_kernel(void)
 */
cpu_set_reserved_ttbr0();
flush_tlb_all();
+   fpsimd_reset_lazy_restore();
 
preempt_disable();
trace_hardirqs_off();
-- 
1.8.1.2
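
The lazy-restore rule in this patch can be sketched as a small user-space model. This is only an illustration of the ownership check, not the kernel code: real register save/restore is replaced by copying a single int, the per-CPU variable is modeled as an array indexed by a CPU number passed in explicitly, and all model_* names, MODEL_NR_CPUS, and the load counter are hypothetical.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* Stand-in for struct fpsimd_state: one int plays the role of the registers. */
struct fpsimd_model {
    int regs;
    int last_cpu;   /* CPU that last saved this state, or -1 */
};

/* Models the per-CPU fpsimd_owner pointer from the patch. */
static struct fpsimd_model *model_owner[MODEL_NR_CPUS];
/* Models the hardware register file of each CPU. */
static int model_hw_regs[MODEL_NR_CPUS];
/* Counts how many real (non-skipped) restores happened. */
static int model_hw_loads;

/* Save: dump the hardware state and record who owns this CPU's registers. */
void model_save_lazy(struct fpsimd_model *st, int cpu)
{
    st->regs = model_hw_regs[cpu];
    st->last_cpu = cpu;
    model_owner[cpu] = st;
}

/* Load: skip the restore if the hardware still holds this task's state. */
void model_load_lazy(struct fpsimd_model *st, int cpu)
{
    if (st->last_cpu == cpu && model_owner[cpu] == st)
        return;
    model_hw_regs[cpu] = st->regs;
    model_hw_loads++;
}

/* Mirrors fpsimd_reset_lazy_restore(): forget the owner of this CPU. */
void model_reset_lazy_restore(int cpu)
{
    model_owner[cpu] = NULL;
}

int model_load_count(void)
{
    return model_hw_loads;
}
```

Both conditions are needed: last_cpu alone cannot tell whether another task has used the registers on that CPU in the meantime, and the owner pointer alone cannot tell whether the task last ran elsewhere.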



[PATCH v3 2/3] arm64: reduce duplicated code when saving/restoring FPSIMD for signal handling

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Reduce duplicated code when saving/restoring FPSIMD state for signal
handling. This also helps to concentrate all FPSIMD hardware-related
code in fpsimd.c.

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 arch/arm64/include/asm/fpsimd.h |  4 
 arch/arm64/kernel/fpsimd.c  | 20 
 arch/arm64/kernel/process.c |  3 +--
 arch/arm64/kernel/signal.c  | 11 +++
 arch/arm64/kernel/signal32.c|  9 +++--
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index b3c12fd..142084f 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -57,10 +57,14 @@ struct task_struct;
 extern void fpsimd_save_state(struct fpsimd_state *state);
 extern void fpsimd_load_state(struct fpsimd_state *state);
 
+extern void fpsimd_dup_task_struct(struct task_struct *dst,
+  struct task_struct *src);
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 
 extern void fpsimd_init_sigctx(struct fpsimd_state *state);
+extern void fpsimd_save_sigctx(struct fpsimd_state *state);
+extern void fpsimd_restore_sigctx(struct fpsimd_state *state);
 
 #endif
 
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 9daee2c..f43dd58 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -84,6 +84,12 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
send_sig_info(SIGFPE, &info, current);
 }
 
+void fpsimd_dup_task_struct(struct task_struct *dst, struct task_struct *src)
+{
+   fpsimd_save_state(&current->thread.fpsimd_state);
+   *dst = *src;
+}
+
 void fpsimd_thread_switch(struct task_struct *next)
 {
/* check if not kernel threads */
@@ -110,6 +116,20 @@ void fpsimd_init_sigctx(struct fpsimd_state *state)
fpsimd_clear_fpsr();
 }
 
+void fpsimd_save_sigctx(struct fpsimd_state *state)
+{
+   /* dump the hardware registers to the fpsimd_state structure */
+   fpsimd_save_state(state);
+}
+
+void fpsimd_restore_sigctx(struct fpsimd_state *state)
+{
+   /* load the hardware registers from the fpsimd_state structure */
+   preempt_disable();
+   fpsimd_load_state(state);
+   preempt_enable();
+}
+
 #ifdef CONFIG_KERNEL_MODE_NEON
 
 /*
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 7ae8a1f..6796080 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -195,8 +195,7 @@ void release_thread(struct task_struct *dead_task)
 
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
-   fpsimd_save_state(&current->thread.fpsimd_state);
-   *dst = *src;
+   fpsimd_dup_task_struct(dst, src);
return 0;
 }
 
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index f2c83e8..596c8cf 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -50,8 +50,7 @@ static int preserve_fpsimd_context(struct fpsimd_context 
__user *ctx)
struct fpsimd_state *fpsimd = &current->thread.fpsimd_state;
int err;
 
-   /* dump the hardware registers to the fpsimd_state structure */
-   fpsimd_save_state(fpsimd);
+   fpsimd_save_sigctx(fpsimd);
 
/* copy the FP and status/control registers */
err = __copy_to_user(ctx->vregs, fpsimd->vregs, sizeof(fpsimd->vregs));
@@ -85,12 +84,8 @@ static int restore_fpsimd_context(struct fpsimd_context 
__user *ctx)
__get_user_error(fpsimd.fpsr, &ctx->fpsr, err);
__get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
 
-   /* load the hardware registers from the fpsimd_state structure */
-   if (!err) {
-   preempt_disable();
-   fpsimd_load_state(&fpsimd);
-   preempt_enable();
-   }
+   if (!err)
+   fpsimd_restore_sigctx(&fpsimd);
 
return err ? -EFAULT : 0;
 }
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index e393174..4ce3768 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -247,7 +247,7 @@ static int compat_preserve_vfp_context(struct 
compat_vfp_sigframe __user *frame)
 * Note that this also saves V16-31, which aren't visible
 * in AArch32.
 */
-   fpsimd_save_state(fpsimd);
+   fpsimd_save_sigctx(fpsimd);
 
/* Place structure header on the stack */
__put_user_error(magic, &frame->magic, err);
@@ -310,11 +310,8 @@ static int compat_restore_vfp_context(struct 
compat_vfp_sigframe __user *frame)
 * We don't need to touch the exception register, so
 * reload the hardware state.
 */
-   if (!err) {
-   preempt_disable();
-   fpsimd_load_state(&fpsimd);
-   preempt_enable();
-   }
+   if (!err)
+   fpsimd_restore_sigctx(&fpsimd);
 
return err ? -EFAULT : 0;
 }
-- 
1.8.1.2


[PATCH v3 1/3] arm64: restore FPSIMD to default state for kernel and signal contexts

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Restore the FPSIMD control and status registers to their default values
when creating a new FPSIMD context for kernel use, and reset the
FPSIMD status register when creating an FPSIMD context for signal
handling. Otherwise, stale values in the FPSIMD control and status
registers may affect the new kernel or signal handling contexts.

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 arch/arm64/include/asm/fpsimd.h |  4 
 arch/arm64/kernel/fpsimd.c  | 30 --
 arch/arm64/kernel/signal.c  |  1 +
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index c43b4ac..b3c12fd 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -50,6 +50,8 @@ struct fpsimd_state {
 #define VFP_STATE_SIZE ((32 * 8) + 4)
 #endif
 
+#define AARCH64_FPCR_DEFAULT_VAL 0
+
 struct task_struct;
 
 extern void fpsimd_save_state(struct fpsimd_state *state);
@@ -58,6 +60,8 @@ extern void fpsimd_load_state(struct fpsimd_state *state);
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 
+extern void fpsimd_init_sigctx(struct fpsimd_state *state);
+
 #endif
 
 #endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index bb785d2..9daee2c 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -33,6 +33,21 @@
 #define FPEXC_IXF  (1 << 4)
 #define FPEXC_IDF  (1 << 7)
 
+static inline void fpsimd_init_hw_state(void)
+{
+   int val = AARCH64_FPCR_DEFAULT_VAL;
+
+   asm volatile ("msr fpcr, %x0\n"
+ "msr fpsr, xzr\n"
+ : : "r"(val) : "memory");
+}
+
+static inline void fpsimd_clear_fpsr(void)
+{
+   asm volatile ("msr fpsr, xzr\n"
+ : : : "memory");
+}
+
 /*
  * Trapped FP/ASIMD access.
  */
@@ -80,12 +95,21 @@ void fpsimd_thread_switch(struct task_struct *next)
 
 void fpsimd_flush_thread(void)
 {
+   struct fpsimd_state *state = &current->thread.fpsimd_state;
+
preempt_disable();
-   memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
-   fpsimd_load_state(&current->thread.fpsimd_state);
+   memset(state, 0, sizeof(struct fpsimd_state));
+   if (AARCH64_FPCR_DEFAULT_VAL)
+   state->fpcr = AARCH64_FPCR_DEFAULT_VAL;
+   fpsimd_load_state(state);
preempt_enable();
 }
 
+void fpsimd_init_sigctx(struct fpsimd_state *state)
+{
+   fpsimd_clear_fpsr();
+}
+
 #ifdef CONFIG_KERNEL_MODE_NEON
 
 /*
@@ -99,6 +123,8 @@ void kernel_neon_begin(void)
 
if (current->mm)
fpsimd_save_state(&current->thread.fpsimd_state);
+
+   fpsimd_init_hw_state();
 }
 EXPORT_SYMBOL(kernel_neon_begin);
 
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 890a591..f2c83e8 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -320,6 +320,7 @@ static void handle_signal(unsigned long sig, struct 
k_sigaction *ka,
 * handler.
 */
user_fastforward_single_step(tsk);
+   fpsimd_init_sigctx(>thread.fpsimd_state);
 
signal_delivered(sig, info, ka, regs, 0);
 }
-- 
1.8.1.2



Re: [PATCH 14/18] net: usb: use wrapper functions of net_ratelimit() to simplify code

2013-10-15 Thread Kefeng Wang
Thanks for your reply.
On 10/16 3:06, Sergei Shtylyov wrote:
> Hello.
> 
> On 10/15/2013 03:45 PM, Kefeng Wang wrote:
> 
>> net_ratelimited_function() is called to simplify code.
> 
>> Signed-off-by: Kefeng Wang 
>> ---
>>   drivers/net/usb/usbnet.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index bf94e10..edf81de 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -450,8 +450,8 @@ void usbnet_defer_kevent (struct usbnet *dev, int work)
>>   {
>>   set_bit (work, &dev->flags);
>>   if (!schedule_work (&dev->kevent)) {
>> -if (net_ratelimit())
>> -netdev_err(dev->net, "kevent %d may have been dropped\n", work);
>> +net_ratelimited_function(netdev_err, dev->net,
>> +"kevent %d may have been dropped\n", work);
> 
>The continuation line should start under 'netdev_err'. Same about the 
> other patches where you didn't change the indentation of the continuation 
> lines though you should have.

Got it, indentation will be changed.

> WBR, Sergei
> 
> 
> .
> 




Re: [PATCH 02/18] net: use wrapper functions of net_ratelimit() to simplify code

2013-10-15 Thread Kefeng Wang
Thanks for your reply.

On 10/16 0:24, Joe Perches wrote:
> On Tue, 2013-10-15 at 19:44 +0800, Kefeng Wang wrote:
>> Wrapper functions net_ratelimited_function() and net_XXX_ratelimited()
>> are called to simplify code.
> []
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> []
>> @@ -465,10 +465,8 @@ void br_fdb_update(struct net_bridge *br, struct 
>> net_bridge_port *source,
>>  if (likely(fdb)) {
>>  /* attempt to update an entry for a local interface */
>>  if (unlikely(fdb->is_local)) {
>> -if (net_ratelimit())
>> -br_warn(br, "received packet on %s with "
>> -"own address as source address\n",
>> -source->dev->name);
>> +net_ratelimited_function(br_warn, br, "received packet 
>> on %s "
>> +"with own address as source address\n", 
>> source->dev->name);
> 
> Hello Kefeng.
> 
> When these types of lines are changed, please coalesce the
> fragmented format pieces into a single string.
> 
> It makes grep a bit easier and 80 columns limits don't
> apply to formats.

Got it, I will coalesce them, but the 80-column limit will be
exceeded.

> I think using net_ratelimited_function is not particularly
> clarifying here.
> 
> Maybe net_ratelimited_function should be removed instead
> of its use sites expanded.
> 
> Perhaps adding macros like #define br_warn_ratelimited()
> would be better.

Yes, I found that dev_emerg_ratelimited already exists. I should
use those and will add some similar macros.

> This comment applies to the whole series.
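
The shape of a br_warn_ratelimited()-style wrapper, as suggested in the review, can be sketched in user space as follows. This is a rough model under loud assumptions: the kernel's net_ratelimit() is time-based, whereas the stand-in here is a plain burst counter, and every model_* name plus MODEL_BURST is hypothetical. The format string is kept as one unbroken literal, in line with the advice about grep.

```c
#include <stdarg.h>
#include <stdio.h>

#define MODEL_BURST 3

/* Simplified stand-in for net_ratelimit(): allow only the first few calls. */
static int model_budget = MODEL_BURST;
/* Number of messages that actually got printed. */
static int model_emitted;

static int model_ratelimit(void)
{
    if (model_budget <= 0)
        return 0;
    model_budget--;
    return 1;
}

/* Stand-in for br_warn(): print to stderr and count the message. */
static void model_warn(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    model_emitted++;
}

/* The wrapper: the rate-limit check lives in one place, not at each call site. */
#define model_warn_ratelimited(fmt, ...)            \
    do {                                            \
        if (model_ratelimit())                      \
            model_warn(fmt, ##__VA_ARGS__);         \
    } while (0)

/* Issue n warnings for one port and report how many got through in total. */
int model_warn_burst(const char *port, int n)
{
    int i;

    for (i = 0; i < n; i++)
        model_warn_ratelimited("received packet on %s with own address as source address\n", port);
    return model_emitted;
}
```

Compared with passing the logging function as an argument, a dedicated *_ratelimited macro keeps call sites looking like ordinary log calls.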


[PATCH v3 7/7] jump_label: use defined macros instead of hard-coding for better readability

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Use the JUMP_LABEL_TYPE_* macros instead of hard-coded values for better
readability.

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 include/linux/jump_label.h | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index a507907..6e54029 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -74,18 +74,21 @@ struct module;
 #include 
 #ifdef HAVE_JUMP_LABEL
 
-#define JUMP_LABEL_TRUE_BRANCH 1UL
+#define JUMP_LABEL_TYPE_FALSE_BRANCH   0UL
+#define JUMP_LABEL_TYPE_TRUE_BRANCH    1UL
+#define JUMP_LABEL_TYPE_MASK           1UL
 
 static
 inline struct jump_entry *jump_label_get_entries(struct static_key *key)
 {
return (struct jump_entry *)((unsigned long)key->entries
-   & ~JUMP_LABEL_TRUE_BRANCH);
+   & ~JUMP_LABEL_TYPE_MASK);
 }
 
 static inline bool jump_label_get_branch_default(struct static_key *key)
 {
-   if ((unsigned long)key->entries & JUMP_LABEL_TRUE_BRANCH)
+   if (((unsigned long)key->entries & JUMP_LABEL_TYPE_MASK) ==
+   JUMP_LABEL_TYPE_TRUE_BRANCH)
return true;
return false;
 }
@@ -116,9 +119,11 @@ extern void static_key_slow_dec(struct static_key *key);
 extern void jump_label_apply_nops(struct module *mod);
 
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
-   { .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
+   { .enabled = ATOMIC_INIT(1), \
+ .entries = (void *)JUMP_LABEL_TYPE_TRUE_BRANCH })
 #define STATIC_KEY_INIT_FALSE ((struct static_key) \
-   { .enabled = ATOMIC_INIT(0), .entries = (void *)0 })
+   { .enabled = ATOMIC_INIT(0), \
+ .entries = (void *)JUMP_LABEL_TYPE_FALSE_BRANCH })
 
 #else  /* !HAVE_JUMP_LABEL */
 
-- 
1.8.1.2
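
As a reading aid for the macros in this patch, here is a small user-space sketch of the encoding they name: the default branch type is stored in the low bit of the entries pointer, which is free because the table is aligned. The three JUMP_LABEL_TYPE_* values are taken from the patch; struct static_key_model and the model_* helpers are hypothetical stand-ins, and the sketch assumes the table is at least 2-byte aligned.

```c
#include <stdbool.h>
#include <stdint.h>

#define JUMP_LABEL_TYPE_FALSE_BRANCH 0UL
#define JUMP_LABEL_TYPE_TRUE_BRANCH  1UL
#define JUMP_LABEL_TYPE_MASK         1UL

/* Same layout as the arm64 struct jump_entry elsewhere in this series. */
struct jump_entry {
    uint64_t code;
    uint64_t target;
    uint64_t key;
};

/* Stand-in for struct static_key, reduced to the tagged pointer. */
struct static_key_model {
    struct jump_entry *entries;
};

/* Store the table pointer together with the branch type in bit 0. */
void model_set_entries(struct static_key_model *key,
                       struct jump_entry *entries, bool true_branch)
{
    uintptr_t type = true_branch ? JUMP_LABEL_TYPE_TRUE_BRANCH
                                 : JUMP_LABEL_TYPE_FALSE_BRANCH;

    key->entries = (struct jump_entry *)((uintptr_t)entries | type);
}

/* Strip the type bit to recover the real table pointer. */
struct jump_entry *model_get_entries(const struct static_key_model *key)
{
    return (struct jump_entry *)((uintptr_t)key->entries &
                                 ~(uintptr_t)JUMP_LABEL_TYPE_MASK);
}

/* Compare the masked type against the named constant, as the patch does. */
bool model_get_branch_default(const struct static_key_model *key)
{
    return ((uintptr_t)key->entries & JUMP_LABEL_TYPE_MASK) ==
           JUMP_LABEL_TYPE_TRUE_BRANCH;
}
```

Separating the mask from the TRUE value makes it clear which constant selects the bits and which one is being compared, even though both happen to be 1.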



[PATCH v3 6/7] arm64, jump label: optimize jump label implementation

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Optimize jump label implementation for ARM64 by dynamically patching
kernel text.

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 arch/arm64/Kconfig  |  1 +
 arch/arm64/include/asm/jump_label.h | 52 
 arch/arm64/kernel/Makefile  |  1 +
 arch/arm64/kernel/jump_label.c  | 60 +
 4 files changed, 114 insertions(+)
 create mode 100644 arch/arm64/include/asm/jump_label.h
 create mode 100644 arch/arm64/kernel/jump_label.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c044548..da388e4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -17,6 +17,7 @@ config ARM64
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select HARDIRQS_SW_RESEND
+   select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_TRACEHOOK
select HAVE_DEBUG_BUGVERBOSE
select HAVE_DEBUG_KMEMLEAK
diff --git a/arch/arm64/include/asm/jump_label.h 
b/arch/arm64/include/asm/jump_label.h
new file mode 100644
index 000..d268fab
--- /dev/null
+++ b/arch/arm64/include/asm/jump_label.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2013 Huawei Ltd.
+ * Author: Jiang Liu 
+ *
+ * Based on arch/arm/include/asm/jump_label.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+#ifndef _ASM_ARM64_JUMP_LABEL_H
+#define _ASM_ARM64_JUMP_LABEL_H
+#include 
+
+#ifdef __KERNEL__
+
+#define JUMP_LABEL_NOP_SIZE 4
+
+static __always_inline bool arch_static_branch(struct static_key *key)
+{
+   asm goto("1:\n\t"
+"nop\n\t"
+".pushsection __jump_table,  \"aw\"\n\t"
+".align 3\n\t"
+".quad 1b, %l[l_yes], %c0\n\t"
+".popsection\n\t"
+:  :  "i"(key) :  : l_yes);
+
+   return false;
+l_yes:
+   return true;
+}
+
+#endif /* __KERNEL__ */
+
+typedef u64 jump_label_t;
+
+struct jump_entry {
+   jump_label_t code;
+   jump_label_t target;
+   jump_label_t key;
+};
+
+#endif /* _ASM_ARM64_JUMP_LABEL_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 9af6cb3..b7db65e 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -18,6 +18,7 @@ arm64-obj-$(CONFIG_SMP)   += smp.o 
smp_spin_table.o smp_psci.o
 arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
 arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)+= hw_breakpoint.o
 arm64-obj-$(CONFIG_EARLY_PRINTK)   += early_printk.o
+arm64-obj-$(CONFIG_JUMP_LABEL) += jump_label.o
 
 obj-y  += $(arm64-obj-y) vdso/
 obj-m  += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
new file mode 100644
index 000..74cbc73
--- /dev/null
+++ b/arch/arm64/kernel/jump_label.c
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2013 Huawei Ltd.
+ * Author: Jiang Liu 
+ *
+ * Based on arch/arm/kernel/jump_label.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+#include 
+#include 
+#include 
+#include 
+
+#ifdef HAVE_JUMP_LABEL
+
+static void __arch_jump_label_transform(struct jump_entry *entry,
+   enum jump_label_type type,
+   bool is_static)
+{
+   void *addr = (void *)entry->code;
+   u32 insn;
+
+   if (type == JUMP_LABEL_ENABLE) {
+   /* no way out if instruction offset is out of range(+/-128M) */
+   insn = aarch64_insn_gen_branch_imm(entry->code,
+  entry->target, 0);
+   BUG_ON(!insn);
+   } else {
+   insn = aarch64_insn_gen_nop();
+   }
+
+   if (is_static)
+   aarch64_insn_patch_text_nosync(&addr, &insn, 1);
+   else
+   aarch64_insn_patch_text(&addr, &insn, 1);
+}
+
+void arch_jump_label_transform(struct 

[PATCH v3 5/7] arm64, jump label: detect %c support for ARM64

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

As commit a9468f30b5eac6 "ARM: 7333/2: jump label: detect %c
support for ARM", this patch detects the same thing for ARM64
because some ARM64 GCC versions have the same issue.

Some versions of ARM64 GCC which do support asm goto, do not
support the %c specifier. Since we need the %c to support jump
labels on ARM64, detect that too in the asm goto detection script
to avoid build errors with these versions.

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 scripts/gcc-goto.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/gcc-goto.sh b/scripts/gcc-goto.sh
index a2af2e8..c9469d3 100644
--- a/scripts/gcc-goto.sh
+++ b/scripts/gcc-goto.sh
@@ -5,7 +5,7 @@
 cat << "END" | $@ -x c - -c -o /dev/null >/dev/null 2>&1 && echo "y"
 int main(void)
 {
-#ifdef __arm__
+#if defined(__arm__) || defined(__aarch64__)
/*
 * Not related to asm goto, but used by jump label
 * and broken on some ARM GCC versions (see GCC Bug 48637).
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 4/7] arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Introduce aarch64_insn_gen_{nop|branch_imm}() helper functions, which
will be used to implement jump label on ARM64.

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 arch/arm64/include/asm/insn.h |  7 +++
 arch/arm64/kernel/insn.c  | 27 +++
 2 files changed, 34 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 8dc0a91..87c44b2 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -61,6 +61,13 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
 enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
 u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
  u32 insn, u64 imm);
+u32 aarch64_insn_gen_branch_imm(unsigned long pc, unsigned long addr,
+   bool link);
+static __always_inline u32 aarch64_insn_gen_nop(void)
+{
+   return aarch64_insn_get_nop_value();
+}
+
 u32 aarch64_insn_read(void *addr);
 void aarch64_insn_write(void *addr, u32 insn);
 bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 90cc312..c63fae6 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -14,6 +14,7 @@
  * You should have received a copy of the GNU General Public License
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
+#include 
 #include 
 #include 
 #include 
@@ -256,3 +257,29 @@ u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
 
return insn;
 }
+
+u32 aarch64_insn_gen_branch_imm(unsigned long pc, unsigned long addr, bool link)
+{
+   u32 insn;
+   long offset;
+
+   /*
+* PC: A 64-bit Program Counter holding the address of the current
+* instruction. A64 instructions must be word-aligned.
+*/
+   BUG_ON((pc & 0x3) || (addr & 0x3));
+
+   /* B/BL support [-128M, 128M) offset */
+   offset = ((long)addr - (long)pc) >> 2;
+   if (abs(offset) > BIT(25) || offset == BIT(25)) {
+   WARN_ON_ONCE(1);
+   return 0;
+   }
+
+   if (link)
+   insn = aarch64_insn_get_bl_value();
+   else
+   insn = aarch64_insn_get_b_value();
+
+   return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_26, insn, offset);
+}
-- 
1.8.1.2
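
The range check and immediate encoding in aarch64_insn_gen_branch_imm() can be sketched in portable userspace C. This is only an illustration under stated assumptions: gen_branch_imm() and the A64_* macro names are hypothetical stand-ins rather than the kernel helpers, and assert() takes the place of BUG_ON(). The opcode values are the architectural B and BL encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define A64_B_OPCODE   UINT32_C(0x14000000) /* B  <label> */
#define A64_BL_OPCODE  UINT32_C(0x94000000) /* BL <label> */
#define A64_IMM26_MASK UINT32_C(0x03FFFFFF) /* low 26 bits hold the word offset */

/*
 * Hypothetical userspace stand-in for aarch64_insn_gen_branch_imm().
 * Returns 0 when the target is out of range, as the patch does.
 */
static uint32_t gen_branch_imm(uint64_t pc, uint64_t addr, bool link)
{
	int64_t offset;

	/* A64 instructions are 4 bytes long and must be word aligned. */
	assert((pc & 0x3) == 0 && (addr & 0x3) == 0);

	/* Signed byte distance from the branch to its target. */
	offset = (int64_t)(addr - pc);

	/* imm26 counts words, so the reachable range is [-128M, 128M) bytes. */
	if (offset < -(INT64_C(1) << 27) || offset >= (INT64_C(1) << 27))
		return 0;

	/*
	 * offset is a multiple of 4, so the division is exact. The mask
	 * keeps the low 26 bits of the two's complement word offset.
	 */
	return (link ? A64_BL_OPCODE : A64_B_OPCODE) |
	       ((uint32_t)(offset / 4) & A64_IMM26_MASK);
}
```

Comparing the byte offset against the two bounds directly avoids mixing abs() with an unsigned BIT() constant, at the cost of differing from the form used in the patch.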



[PATCH v3 2/7] arm64: introduce interfaces to hotpatch kernel and module code

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Introduce three interfaces to patch kernel and module code:
aarch64_insn_patch_text_nosync():
    patch code without synchronization; it is the caller's
    responsibility to synchronize all CPUs if needed.
aarch64_insn_patch_text_sync():
    patch code and always synchronize with stop_machine().
aarch64_insn_patch_text():
    patch code and synchronize with stop_machine() if needed.

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 arch/arm64/include/asm/insn.h |  7 +++-
 arch/arm64/kernel/insn.c  | 95 +++
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index e7d1bc8..2dfcdb4 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -47,7 +47,12 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
 #undef __AARCH64_INSN_FUNCS
 
 enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
-
+u32 aarch64_insn_read(void *addr);
+void aarch64_insn_write(void *addr, u32 insn);
 bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
 
+int aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[], int cnt);
+int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
+int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
+
 #endif /* _ASM_ARM64_INSN_H */
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 1be4d11..ad4185f 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -16,6 +16,8 @@
  */
 #include 
 #include 
+#include 
+#include 
 #include 
 
 /*
@@ -84,3 +86,96 @@ bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
return __aarch64_insn_hotpatch_safe(old_insn) &&
   __aarch64_insn_hotpatch_safe(new_insn);
 }
+
+/*
+ * In ARMv8-A, A64 instructions have a fixed length of 32 bits and are always
+ * little-endian. On the other hand, the SCTLR_EL1.EE (bit 25, Exception
+ * Endianness) flag controls endianness for EL1 explicit data accesses and
+ * stage 1 translation table walks as below:
+ * 0: little-endian
+ * 1: big-endian
+ * So we need to handle endianness when patching kernel code.
+ */
+u32 __kprobes aarch64_insn_read(void *addr)
+{
+   u32 insn;
+
+#ifdef __AARCH64EB__
+   insn = swab32(*(u32 *)addr);
+#else
+   insn = *(u32 *)addr;
+#endif
+
+   return insn;
+}
+
+void __kprobes aarch64_insn_write(void *addr, u32 insn)
+{
+#ifdef __AARCH64EB__
+   *(u32 *)addr = swab32(insn);
+#else
+   *(u32 *)addr = insn;
+#endif
+}
+
+int __kprobes aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[],
+int cnt)
+{
+   int i;
+   u32 *tp;
+
+   if (cnt <= 0)
+   return -EINVAL;
+
+   for (i = 0; i < cnt; i++) {
+   tp = addrs[i];
+   /* A64 instructions must be word aligned */
+   if ((uintptr_t)tp & 0x3)
+   return -EINVAL;
+   aarch64_insn_write(tp, insns[i]);
+   flush_icache_range((uintptr_t)tp, (uintptr_t)tp + sizeof(u32));
+   }
+
+   return 0;
+}
+
+struct aarch64_insn_patch {
+   void**text_addrs;
+   u32 *new_insns;
+   int insn_cnt;
+};
+
+static int __kprobes aarch64_insn_patch_text_cb(void *arg)
+{
+   struct aarch64_insn_patch *pp = arg;
+
+   return aarch64_insn_patch_text_nosync(pp->text_addrs, pp->new_insns,
+ pp->insn_cnt);
+}
+
+int __kprobes aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt)
+{
+   struct aarch64_insn_patch patch = {
+   .text_addrs = addrs,
+   .new_insns = insns,
+   .insn_cnt = cnt,
+   };
+
+   if (cnt <= 0)
+   return -EINVAL;
+
+   /*
+* Run aarch64_insn_patch_text_cb() under stop_machine(), which
+* keeps all other online CPUs stopped while the text is patched.
+*/
+   return stop_machine(aarch64_insn_patch_text_cb, &patch, NULL);
+}
+
+int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
+{
+   if (cnt == 1 && aarch64_insn_hotpatch_safe(aarch64_insn_read(addrs[0]),
+  insns[0]))
+   return aarch64_insn_patch_text_nosync(addrs, insns, cnt);
+   else
+   return aarch64_insn_patch_text_sync(addrs, insns, cnt);
+}
-- 
1.8.1.2
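
The endianness handling in aarch64_insn_read()/aarch64_insn_write() can be illustrated with a small userspace sketch. Note that it swaps in a different technique: instead of the patch's "#ifdef __AARCH64EB__ ... swab32()" approach, it stores and loads the four bytes explicitly, which gives the little-endian instruction layout on any host. The names insn_write_le() and insn_read_le() are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store an A64 instruction in little-endian byte
 * order, regardless of the endianness used for data accesses.
 */
static void insn_write_le(uint8_t *text, uint32_t insn)
{
	text[0] = (uint8_t)(insn & 0xFF);         /* least significant byte first */
	text[1] = (uint8_t)((insn >> 8) & 0xFF);
	text[2] = (uint8_t)((insn >> 16) & 0xFF);
	text[3] = (uint8_t)((insn >> 24) & 0xFF); /* most significant byte last */
}

/* Hypothetical helper: load an A64 instruction stored in little-endian order. */
static uint32_t insn_read_le(const uint8_t *text)
{
	return (uint32_t)text[0] |
	       ((uint32_t)text[1] << 8) |
	       ((uint32_t)text[2] << 16) |
	       ((uint32_t)text[3] << 24);
}
```

For the NOP encoding 0xD503201F this yields the byte sequence 1F 20 03 D5 in memory. A plain u32 store on a big-endian kernel would produce D5 03 20 1F instead, which is the case the swab32() calls in the patch correct.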



[PATCH v3 3/7] arm64: move encode_insn_immediate() from module.c to insn.c

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Function encode_insn_immediate() will be used by other instruction
manipulation functions, so move it into insn.c and rename it
to aarch64_insn_encode_immediate().

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 arch/arm64/include/asm/insn.h |  14 
 arch/arm64/kernel/insn.c  |  77 +
 arch/arm64/kernel/module.c| 151 +-
 3 files changed, 123 insertions(+), 119 deletions(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 2dfcdb4..8dc0a91 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -28,6 +28,18 @@ enum aarch64_insn_class {
 * system instructions */
 };
 
+enum aarch64_insn_imm_type {
+   AARCH64_INSN_IMM_MOVNZ,
+   AARCH64_INSN_IMM_MOVK,
+   AARCH64_INSN_IMM_ADR,
+   AARCH64_INSN_IMM_26,
+   AARCH64_INSN_IMM_19,
+   AARCH64_INSN_IMM_16,
+   AARCH64_INSN_IMM_14,
+   AARCH64_INSN_IMM_12,
+   AARCH64_INSN_IMM_9,
+};
+
 #define __AARCH64_INSN_FUNCS(abbr, mask, val)   \
 static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
 { return (code & (mask)) == (val); }   \
@@ -47,6 +59,8 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
 #undef __AARCH64_INSN_FUNCS
 
 enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
+u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
+ u32 insn, u64 imm);
 u32 aarch64_insn_read(void *addr);
 void aarch64_insn_write(void *addr, u32 insn);
 bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index ad4185f..90cc312 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -179,3 +179,80 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
else
return aarch64_insn_patch_text_sync(addrs, insns, cnt);
 }
+
+u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
+ u32 insn, u64 imm)
+{
+   u32 immlo, immhi, lomask, himask, mask;
+   int shift;
+
+   switch (type) {
+   case AARCH64_INSN_IMM_MOVNZ:
+   /*
+* For signed MOVW relocations, we have to manipulate the
+* instruction encoding depending on whether or not the
+* immediate is less than zero.
+*/
+   insn &= ~(3 << 29);
+   if ((s64)imm >= 0) {
+   /* >=0: Set the instruction to MOVZ (opcode 10b). */
+   insn |= 2 << 29;
+   } else {
+   /*
+* <0: Set the instruction to MOVN (opcode 00b).
+* Since we've masked the opcode already, we
+* don't need to do anything other than
+* inverting the new immediate field.
+*/
+   imm = ~imm;
+   }
+   case AARCH64_INSN_IMM_MOVK:
+   mask = BIT(16) - 1;
+   shift = 5;
+   break;
+   case AARCH64_INSN_IMM_ADR:
+   lomask = 0x3;
+   himask = 0x7ffff;
+   immlo = imm & lomask;
+   imm >>= 2;
+   immhi = imm & himask;
+   imm = (immlo << 24) | (immhi);
+   mask = (lomask << 24) | (himask);
+   shift = 5;
+   break;
+   case AARCH64_INSN_IMM_26:
+   mask = BIT(26) - 1;
+   shift = 0;
+   break;
+   case AARCH64_INSN_IMM_19:
+   mask = BIT(19) - 1;
+   shift = 5;
+   break;
+   case AARCH64_INSN_IMM_16:
+   mask = BIT(16) - 1;
+   shift = 5;
+   break;
+   case AARCH64_INSN_IMM_14:
+   mask = BIT(14) - 1;
+   shift = 5;
+   break;
+   case AARCH64_INSN_IMM_12:
+   mask = BIT(12) - 1;
+   shift = 10;
+   break;
+   case AARCH64_INSN_IMM_9:
+   mask = BIT(9) - 1;
+   shift = 12;
+   break;
+   default:
+   pr_err("aarch64_insn_encode_immediate: unknown immediate encoding %d\n",
+   type);
+   return 0;
+   }
+
+   /* Update the immediate field. */
+   insn &= ~(mask << shift);
+   insn |= (imm & mask) << shift;
+
+   return insn;
+}
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index ca0e3d5..69e3c31 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void *module_alloc(unsigned long size)
 {
@@ -94,96 +95,8 @@ static int reloc_data(enum aarch64_reloc_op op, void *place, u64 val, int len)
return 0;
 }
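
The final step of aarch64_insn_encode_immediate() (clear the field, then OR in the masked immediate) can be sketched on its own. This is a minimal userspace illustration: insn_set_field() is a hypothetical helper that takes the field width and shift directly, whereas the kernel function selects them from the immediate type.

```c
#include <stdint.h>

/*
 * Hypothetical helper: replace the `width`-bit field that starts at bit
 * `shift` of `insn` with the low bits of `imm`. Assumes width < 32.
 */
static uint32_t insn_set_field(uint32_t insn, unsigned int width,
			       unsigned int shift, uint64_t imm)
{
	uint32_t mask = (UINT32_C(1) << width) - 1;

	insn &= ~(mask << shift);                /* clear the old immediate */
	insn |= ((uint32_t)imm & mask) << shift; /* insert the new immediate */

	return insn;
}
```

With width 26 and shift 0 (the AARCH64_INSN_IMM_26 case) a word offset of 2 turns the bare B opcode 0x14000000 into 0x14000002. With width 16 and shift 5 (the AARCH64_INSN_IMM_16 case) an immediate of 0x1234 turns 0xD2800000 into 0xD2824680.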
 

[PATCH v3 1/7] arm64: introduce basic aarch64 instruction decoding helpers

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

Introduce basic aarch64 instruction decoding helpers
aarch64_get_insn_class() and aarch64_insn_hotpatch_safe().

Signed-off-by: Jiang Liu 
Cc: Jiang Liu 
---
 arch/arm64/include/asm/insn.h | 53 ++
 arch/arm64/kernel/Makefile|  2 +-
 arch/arm64/kernel/insn.c  | 86 +++
 3 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/insn.h
 create mode 100644 arch/arm64/kernel/insn.c

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
new file mode 100644
index 0000000..e7d1bc8
--- /dev/null
+++ b/arch/arm64/include/asm/insn.h
@@ -0,0 +1,53 @@
+/*
+ * Copyright (C) 2013 Huawei Ltd.
+ * Author: Jiang Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _ASM_ARM64_INSN_H
+#define _ASM_ARM64_INSN_H
+#include 
+
+enum aarch64_insn_class {
+   AARCH64_INSN_CLS_UNKNOWN,   /* UNALLOCATED */
+   AARCH64_INSN_CLS_DP_IMM,/* Data processing - immediate */
+   AARCH64_INSN_CLS_DP_REG,/* Data processing - register */
+   AARCH64_INSN_CLS_DP_FPSIMD, /* Data processing - SIMD and FP */
+   AARCH64_INSN_CLS_LDST,  /* Loads and stores */
+   AARCH64_INSN_CLS_BR_SYS,/* Branch, exception generation and
+* system instructions */
+};
+
+#define __AARCH64_INSN_FUNCS(abbr, mask, val)   \
+static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
+{ return (code & (mask)) == (val); }   \
+static __always_inline u32 aarch64_insn_get_##abbr##_mask(void) \
+{ return (mask); } \
+static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
+{ return (val); }
+
+__AARCH64_INSN_FUNCS(b,    0xFC000000, 0x14000000)
+__AARCH64_INSN_FUNCS(bl,   0xFC000000, 0x94000000)
+__AARCH64_INSN_FUNCS(svc,  0xFFE0001F, 0xD4000001)
+__AARCH64_INSN_FUNCS(hvc,  0xFFE0001F, 0xD4000002)
+__AARCH64_INSN_FUNCS(smc,  0xFFE0001F, 0xD4000003)
+__AARCH64_INSN_FUNCS(brk,  0xFFE0001F, 0xD4200000)
+__AARCH64_INSN_FUNCS(nop,  0xFFFFFFFF, 0xD503201F)
+
+#undef __AARCH64_INSN_FUNCS
+
+enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
+
+bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
+
+#endif /* _ASM_ARM64_INSN_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 7b4b564..9af6cb3 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -9,7 +9,7 @@ AFLAGS_head.o   := -DTEXT_OFFSET=$(TEXT_OFFSET)
 arm64-obj-y := cputable.o debug-monitors.o entry.o irq.o fpsimd.o   \
   entry-fpsimd.o process.o ptrace.o setup.o signal.o   \
   sys.o stacktrace.o time.o traps.o io.o vdso.o \
-  hyp-stub.o psci.o
+  hyp-stub.o psci.o insn.o
 
 arm64-obj-$(CONFIG_COMPAT) += sys32.o kuser32.o signal32.o \
   sys_compat.o
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
new file mode 100644
index 0000000..1be4d11
--- /dev/null
+++ b/arch/arm64/kernel/insn.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright (C) 2013 Huawei Ltd.
+ * Author: Jiang Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include 
+#include 
+#include 
+
+/*
+ * ARM Architecture Reference Manual ARMv8, Section C3.1
+ * AARCH64 main encoding table
+ *  Bit position
+ *   28 27 26 25   Encoding Group
+ *   0  0  -  -Unallocated
+ *   1  0  0  -Data processing, immediate
+ *   1  0  1  -Branch, exception generation and system instructions
+ *   -  1  -  0Loads and stores
+ *   -  1  0  1Data processing - register
+ *   0  1  1  1Data processing - SIMD and floating 
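
The encoding table in the comment above maps naturally onto a 16-entry lookup indexed by instruction bits [28:25]. The sketch below is a userspace illustration, not the code from the patch (which is cut off in this archive). The enum and function names are hypothetical, and the entry for bit pattern 1111 is an assumption taken from the architecture manual, since the table is truncated before that row.

```c
#include <stdint.h>

enum insn_class {
	CLS_UNKNOWN,   /* unallocated */
	CLS_DP_IMM,    /* data processing, immediate */
	CLS_DP_REG,    /* data processing, register */
	CLS_DP_FPSIMD, /* data processing, SIMD and floating point */
	CLS_LDST,      /* loads and stores */
	CLS_BR_SYS,    /* branch, exception generation, system */
};

/* Hypothetical stand-in for aarch64_get_insn_class(). */
static enum insn_class get_insn_class(uint32_t insn)
{
	/* Indexed by bits 28, 27, 26, 25 of the instruction word. */
	static const enum insn_class table[16] = {
		CLS_UNKNOWN,   /* 0000 */
		CLS_UNKNOWN,   /* 0001 */
		CLS_UNKNOWN,   /* 0010 */
		CLS_UNKNOWN,   /* 0011 */
		CLS_LDST,      /* 0100 */
		CLS_DP_REG,    /* 0101 */
		CLS_LDST,      /* 0110 */
		CLS_DP_FPSIMD, /* 0111 */
		CLS_DP_IMM,    /* 1000 */
		CLS_DP_IMM,    /* 1001 */
		CLS_BR_SYS,    /* 1010 */
		CLS_BR_SYS,    /* 1011 */
		CLS_LDST,      /* 1100 */
		CLS_DP_REG,    /* 1101 */
		CLS_LDST,      /* 1110 */
		CLS_DP_FPSIMD, /* 1111 (assumed, row not shown above) */
	};

	return table[(insn >> 25) & 0xF];
}
```

For example, NOP (0xD503201F) and B (0x14000000) both have bits [28:25] equal to 1010 and so fall into the branch/system class.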

[PATCH v3 0/7] Optimize jump label implementation for ARM64

2013-10-15 Thread Jiang Liu
From: Jiang Liu 

This patchset tries to optimize the arch-specific jump label implementation
for ARM64 by dynamic kernel text patching.

To enable this feature, your toolchain must support the "asm goto" extension
and the "%c" constraint extension. Current GCC for AARCH64 doesn't support
"%c", so you need a GCC patch similar to this:
http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/arm/arm.c?view=patch=175293=175565=175565
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637

It has been tested on the ARM Fast Model and a real hardware platform.

Any comments are welcome!

V2->V3:
1) fix a bug in comparing signed and unsigned values
2) detect big endian by checking __AARCH64EB__

V1->V2: address review comments of V1
1) refine comments
2) add a new interface to always synchronize with stop_machine()
   when patching code
3) handle endian issue when patching code

Jiang Liu (7):
  arm64: introduce basic aarch64 instruction decoding helpers
  arm64: introduce interfaces to hotpatch kernel and module code
  arm64: move encode_insn_immediate() from module.c to insn.c
  arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions
  arm64, jump label: detect %c support for ARM64
  arm64, jump label: optimize jump label implementation
  jump_label: use defined macros instead of hard-coding for better
readability

 arch/arm64/Kconfig  |   1 +
 arch/arm64/include/asm/insn.h   |  79 ++
 arch/arm64/include/asm/jump_label.h |  52 +++
 arch/arm64/kernel/Makefile  |   3 +-
 arch/arm64/kernel/insn.c| 285 
 arch/arm64/kernel/jump_label.c  |  60 
 arch/arm64/kernel/module.c  | 151 ---
 include/linux/jump_label.h  |  15 +-
 scripts/gcc-goto.sh |   2 +-
 9 files changed, 522 insertions(+), 126 deletions(-)
 create mode 100644 arch/arm64/include/asm/insn.h
 create mode 100644 arch/arm64/include/asm/jump_label.h
 create mode 100644 arch/arm64/kernel/insn.c
 create mode 100644 arch/arm64/kernel/jump_label.c

-- 
1.8.1.2



Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-15 Thread Joe Perches
On Tue, 2013-10-15 at 22:46 -0400, Chen Gong wrote:
> On Tue, Oct 15, 2013 at 06:57:18PM -0700, Joe Perches wrote:
> > Date: Tue, 15 Oct 2013 18:57:18 -0700
> > From: Joe Perches 
> > To: Borislav Petkov 
> > Cc: "Chen, Gong" , tony.l...@intel.com,
> >  linux-kernel@vger.kernel.org, linux-a...@vger.kernel.org
> > Subject: Re: [PATCH 2/8] ACPI, CPER: Update cper info
> > X-Mailer: Evolution 3.6.4-0ubuntu1 
> > 
> > On Fri, 2013-10-11 at 17:47 +0200, Borislav Petkov wrote:
> > > On Fri, Oct 11, 2013 at 11:06:30AM +0200, Borislav Petkov wrote:
> > > > > - printk("%s""APEI generic hardware error status\n", pfx);
> > > > > + printk("%s""Generic Hardware Error Status\n", pfx);
> > > > 
> > > > Btw, what's the story with printk not using KERN_x levels in this file?
> > > > Why are we falling back to default printk levels for all printks here
> > > > and shouldn't we rather prioritize them by urgency into, say, KERN_ERR,
> > > > KERN_INFO, etc?
> > > 
> > > Ignore that - checkpatch complained about it but I kinda missed that
> > > we're handing down the prefix.
> > 
> > I think it'd be better to rename pfx to level
> > as that's what printk.h calls them.
> > 
> No. pfx includes log level and prefix string both.

Perhaps it'd be better to separate them.

I haven't looked too hard, but is apei_status_print
the only place it's used with more than KERN_?



Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-15 Thread Chen Gong
On Tue, Oct 15, 2013 at 06:57:18PM -0700, Joe Perches wrote:
> Date: Tue, 15 Oct 2013 18:57:18 -0700
> From: Joe Perches 
> To: Borislav Petkov 
> Cc: "Chen, Gong" , tony.l...@intel.com,
>  linux-kernel@vger.kernel.org, linux-a...@vger.kernel.org
> Subject: Re: [PATCH 2/8] ACPI, CPER: Update cper info
> X-Mailer: Evolution 3.6.4-0ubuntu1 
> 
> On Fri, 2013-10-11 at 17:47 +0200, Borislav Petkov wrote:
> > On Fri, Oct 11, 2013 at 11:06:30AM +0200, Borislav Petkov wrote:
> > > > -   printk("%s""APEI generic hardware error status\n", pfx);
> > > > +   printk("%s""Generic Hardware Error Status\n", pfx);
> > > 
> > > Btw, what's the story with printk not using KERN_x levels in this file?
> > > Why are we falling back to default printk levels for all printks here
> > > and shouldn't we rather prioritize them by urgency into, say, KERN_ERR,
> > > KERN_INFO, etc?
> > 
> > Ignore that - checkpatch complained about it but I kinda missed that
> > we're handing down the prefix.
> 
> I think it'd be better to rename pfx to level
> as that's what printk.h calls them.
> 
No. pfx includes log level and prefix string both.
> 
> 




Re: Kconfig help entry for CONFIG_PARAVIRT_SPINLOCK

2013-10-15 Thread Raghavendra K T

On 10/16/2013 03:15 AM, Sander Eikelenboom wrote:

Hi Raghavendra,

Since the ticketlock series has landed in this merge window (thanks :-) ), the
help accompanying the Kconfig entry doesn't seem to reflect the current state
well.

- Wasn't the whole purpose of the ticketlock series to reduce this 5%
  performance hit to something far less, so distro kernels could enable this
  for their normal kernels?
  I don't have the exact performance figures though.

- Perhaps a suggestion to enable this for supported hypervisors (Xen and KVM?)
  could be added?


You are right. Thanks. I'll send a patch to change the config text
soon.




Re: [PATCH 1/3] perf tools: Add data object to handle perf data file

2013-10-15 Thread Namhyung Kim
Hi Jiri,

On Tue, 15 Oct 2013 16:27:32 +0200, Jiri Olsa wrote:
> This patch is adding 'struct perf_data_file' object as
> a placeholder for all attributes regarding perf.data
> file handling. Changing perf_session__new to take it
> as an argument.
>
> The rest of the functionality will be added later to keep
> this change simple enough, because all the places using
> perf_session are changed now.

All three look good.

Btw, are you planning to support recording to multiple per-cpu files?  As you
know, I suggested a perf.data.dir approach in my perf-ftrace patchset (I'll
resend a new version soonish), something like below.  What do you think?

  perf.data.dir/
|-- perf-cpu0.data
|-- perf-cpu1.data
|-- perf-cpu2.data
`-- perf-cpu3.data

Maybe we could split sample data and other data (e.g. COMM, MMAP or some
user data?) to another file(s).

Thanks,
Namhyung


Re: [patch 0/8] mm: thrash detection-based file cache sizing v5

2013-10-15 Thread Dave Chinner
On Tue, Oct 15, 2013 at 10:05:26PM -0400, Rik van Riel wrote:
> On 10/15/2013 07:41 PM, Dave Chinner wrote:
> > On Tue, Oct 15, 2013 at 01:41:28PM -0400, Johannes Weiner wrote:
> 
> >> I'm not forgetting about them, I just track them very coarsely by
> >> linking up address spaces and then lazily enforce their upper limit
> >> when memory is tight by using the shrinker callback.  The assumption
> >> was that actually scanning them is such a rare event that we trade the
> >> rare computational costs for smaller memory consumption most of the
> >> time.
> > 
> > Sure, I understand the tradeoff that you made. But there's nothing
> > worse than a system that slows down unpredictably because of some
> > magic threshold in some subsystem has been crossed and
> > computationally expensive operations kick in.
> 
> The shadow shrinker should remove the radix nodes with
> the oldest shadow entries first, so true LRU should actually
> work for the radix tree nodes.
> 
> Actually, since we only care about the age of the youngest
> shadow entry in each radix tree node, FIFO will be the same
> as LRU for that list.
> 
> That means the shrinker can always just take the radix tree
> nodes off the end.

Right, but it can't necessarily free the node as it may still have
pointers to pages in it. In that case, it would have to simply
rotate the page to the end of the LRU again.

Unless, of course, we kept track of the number of exceptional
entries in a node and didn't add it to the reclaim list until there
were no non-exceptional entries in the node.
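
A rough sketch of that bookkeeping, in plain userspace C: every name here (struct node_counts, node_reclaimable(), node_evict_page()) is hypothetical and only illustrates the counting idea, not any existing radix tree API. Locking and the reclaim list itself are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node counters. */
struct node_counts {
	unsigned int nr_pages;   /* slots that point to real pages */
	unsigned int nr_shadows; /* slots that hold exceptional (shadow) entries */
};

/* A node is a reclaim candidate only when shadow entries are all it holds. */
static bool node_reclaimable(const struct node_counts *n)
{
	return n->nr_pages == 0 && n->nr_shadows > 0;
}

/*
 * A page was evicted and its slot now holds a shadow entry.
 * Returns true when the node should be added to the reclaim list.
 */
static bool node_evict_page(struct node_counts *n)
{
	assert(n->nr_pages > 0); /* caller must only evict an existing page */
	n->nr_pages--;
	n->nr_shadows++;

	return node_reclaimable(n);
}
```

With this, a node never sits on the reclaim list while it still holds page pointers, so the shrinker would not need to rotate it.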

> >> But it
> >> looks like tracking radix tree nodes with a list and backpointers to
> >> the mapping object for the lock etc. will be a major pain in the ass.
> > 
> > Perhaps so - it may not work out when we get down to the fine
> > details...
> 
> I suspect that a combination of lifetime rules (inode cannot
> disappear until all the radix tree nodes) and using RCU free
> for the radix tree nodes, and the inodes might do the trick.
> 
> That would mean that, while holding the rcu read lock, the
> back pointer from a radix tree node to the inode will always
> point to valid memory.

Yes, that is what I was thinking...

> That allows the shrinker to lock the inode, and verify that
> the inode is still valid, before it attempts to rcu free the
> radix tree node with shadow entries.

Lock the mapping, not the inode. The radix tree is protected by the
mapping_lock, not an inode lock. i.e. I'd hope that this can all be
contained within the struct address_space and not require any
knowledge of inodes or inode lifecycles at all.

> It also means that locking only needs to be in the inode,
> and on the LRU list for shadow radix tree nodes.
> 
> Does that sound sane?
> 
> Am I overlooking something?

It's pretty much along the same lines of what I was thinking, but
let's see what Johannes thinks.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH V2] For for each TSN t being newly acked (Not only cumulatively, but also SELECTIVELY) cacc_saw_newack should be set to 1.

2013-10-15 Thread Vlad Yasevich

On 10/15/2013 02:13 PM, Chang Xiangzhong wrote:

Signed-off-by: Xiangzhong Chang 


Your proposed solution is very nice, but it does 2 things in one
patch.
 1) It fixes the bug
 2) It refactors the code to improve the flow.

While (2) is very nice, it needs a much more careful review.
Can you please split this into 2 patches?  First patch can
make the code look like this:

if (sctp_acked(sack, tsn)) {
...
if (!tchunk->tsn_gap_acked) {
tchunk->tsn_gap_acked = 1;
*highest_new_tsn_in_sack = tsn;
bytes_acked += sctp_data_size(tchunk);
if (!tchunk->transport)
migrate_bytes += sctp_data_size(tchunk);
forward_progress = true;

/*
 * SFR-CACC algorithm:
 * 2) If the SACK contains gap acks
 * and the flag CHANGEOVER_ACTIVE is
 * set the receiver of the SACK MUST
 * take the following action:
...
}
}

Then you can file a second patch to improve the flow/refactor the
function.  You have to be very careful here though and be sure to
run through all the regression tests since you would be modifying
a very critical part of the code.

Thanks
-vlad


---
  net/sctp/outqueue.c |   76 ---
  1 file changed, 35 insertions(+), 41 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 94df758..84ef3b8 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1357,13 +1357,13 @@ static void sctp_check_transmitted(struct sctp_outq *q,

tsn = ntohl(tchunk->subh.data_hdr->tsn);
if (sctp_acked(sack, tsn)) {
-   /* If this queue is the retransmit queue, the
-* retransmit timer has already reclaimed
-* the outstanding bytes for this chunk, so only
-* count bytes associated with a transport.
-*/
-   if (transport) {
-   /* If this chunk is being used for RTT
+   if (!tchunk->tsn_gap_acked) {
+   /* If this queue is the retransmit queue, the
+* retransmit timer has already reclaimed
+* the outstanding bytes for this chunk, so only
+* count bytes associated with a transport.
+*
+* If this chunk is being used for RTT
 * measurement, calculate the RTT and update
 * the RTO using this value.
 *
@@ -1374,28 +1374,44 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 * first instance of the packet or a later
 * instance).
 */
-   if (!tchunk->tsn_gap_acked &&
-   tchunk->rtt_in_progress) {
+   if (transport && tchunk->rtt_in_progress) {
tchunk->rtt_in_progress = 0;
rtt = jiffies - tchunk->sent_at;
sctp_transport_update_rto(transport,
- rtt);
+   rtt);
}
-   }

-   /* If the chunk hasn't been marked as ACKED,
-* mark it and account bytes_acked if the
-* chunk had a valid transport (it will not
-* have a transport if ASCONF had deleted it
-* while DATA was outstanding).
-*/
-   if (!tchunk->tsn_gap_acked) {
+   /* If the chunk hasn't been marked as ACKED,
+* mark it and account bytes_acked if the
+* chunk had a valid transport (it will not
+* have a transport if ASCONF had deleted it
+* while DATA was outstanding).
+*/
tchunk->tsn_gap_acked = 1;
*highest_new_tsn_in_sack = tsn;
bytes_acked += sctp_data_size(tchunk);
if (!tchunk->transport)
migrate_bytes += sctp_data_size(tchunk);
forward_progress = 

[PATCH 01/10][v6] powerpc: Rename branch_opcode() to instr_opcode()

2013-10-15 Thread Sukadev Bhattiprolu
The logic used in branch_opcode() to extract the opcode for an instruction
applies to non-branch instructions as well. So rename it to instr_opcode().

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/lib/code-patching.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 17e5b23..2bc9db3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -72,19 +72,19 @@ unsigned int create_cond_branch(const unsigned int *addr,
return instruction;
 }
 
-static unsigned int branch_opcode(unsigned int instr)
+static unsigned int instr_opcode(unsigned int instr)
 {
return (instr >> 26) & 0x3F;
 }
 
 static int instr_is_branch_iform(unsigned int instr)
 {
-   return branch_opcode(instr) == 18;
+   return instr_opcode(instr) == 18;
 }
 
 static int instr_is_branch_bform(unsigned int instr)
 {
-   return branch_opcode(instr) == 16;
+   return instr_opcode(instr) == 16;
 }
 
 int instr_is_relative_branch(unsigned int instr)
-- 
1.7.9.5
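
As a quick user-space check of the helper renamed above, the following minimal sketch applies the same `(instr >> 26) & 0x3F` extraction to hand-assembled instruction words. The `demo_` names and the sample encodings are illustrative assumptions and are not part of the patch.

```c
#include <stdint.h>

/*
 * Same extraction as instr_opcode() in the patch: the primary opcode
 * occupies the six most significant bits of a 32-bit instruction word.
 */
static unsigned int demo_instr_opcode(uint32_t instr)
{
	return (instr >> 26) & 0x3F;
}

/* I-form branches (b, bl, ba, bla) use primary opcode 18. */
static int demo_is_branch_iform(uint32_t instr)
{
	return demo_instr_opcode(instr) == 18;
}

/* B-form conditional branches (bc and friends) use primary opcode 16. */
static int demo_is_branch_bform(uint32_t instr)
{
	return demo_instr_opcode(instr) == 16;
}
```

With these helpers, 0x48000000 (an unconditional `b`) decodes to opcode 18 and 0x40820008 (a `bc` with BO=4, BI=2) to opcode 16, while 0x80610000 (`lwz r3,0(r1)`) decodes to opcode 32, which shows why the helper is not branch-specific.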

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8.

2013-10-15 Thread Sukadev Bhattiprolu
Power7 and Power8 processors save the memory hierarchy level (eg: L2, L3)
from which a load or store instruction was satisfied. Export this hierarchy
information to the user via the perf_mem_data_src object.

Thanks to input from Stephane Eranian, Michael Ellerman, Michael Neuling
and Anshuman Khandual.

Sukadev Bhattiprolu (10):
  powerpc: Rename branch_opcode() to instr_opcode()
  powerpc/Power7: detect load/store instructions
  tools/perf: silence compiler warnings
  tools/perf: Remove local byteorder.h.
  powerpc/perf: Remove PME_ prefix for power7 events
  powerpc/perf: Export Power8 generic events in sysfs
  powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs.
  powerpc/perf: Define big-endian version of perf_mem_data_src
  powerpc/perf: Export Power8 memory hierarchy info to user space.
  powerpc/perf: Export Power7 memory hierarchy info to user space.

 arch/powerpc/include/asm/code-patching.h |1 +
 arch/powerpc/include/asm/perf_event_server.h |4 +-
 arch/powerpc/lib/code-patching.c |   51 +++-
 arch/powerpc/perf/core-book3s.c  |   11 +++
 arch/powerpc/perf/power7-pmu.c   |  112 +++---
 arch/powerpc/perf/power8-events-list.h   |   21 +
 arch/powerpc/perf/power8-pmu.c   |   97 --
 include/uapi/linux/perf_event.h  |   16 
 tools/perf/Makefile  |1 -
 tools/perf/util/include/asm/byteorder.h  |2 -
 tools/perf/util/include/linux/types.h|   20 +
 tools/perf/util/srcline.c|4 +-
 12 files changed, 316 insertions(+), 24 deletions(-)
 create mode 100644 arch/powerpc/perf/power8-events-list.h
 delete mode 100644 tools/perf/util/include/asm/byteorder.h

-- 
1.7.9.5



RE: Panic and page fault in loop during handling NMI backtrace handler

2013-10-15 Thread Liu, Chuansheng
Hello Steven,

> -Original Message-
> From: Steven Rostedt [mailto:rost...@goodmis.org]
> Sent: Wednesday, October 16, 2013 10:08 AM
> To: Liu, Chuansheng
> Cc: Ingo Molnar (mi...@kernel.org); h...@zytor.com; fweis...@gmail.com;
> a...@linux-foundation.org; paul...@linux.vnet.ibm.com; Peter Zijlstra
> (pet...@infradead.org); x...@kernel.org; 'linux-kernel@vger.kernel.org'
> (linux-kernel@vger.kernel.org); Wang, Xiaoming; Li, Zhuangzhi
> Subject: Re: Panic and page fault in loop during handling NMI backtrace 
> handler
> 
> On Wed, 16 Oct 2013 01:54:51 +
> "Liu, Chuansheng"  wrote:
> 
> 
> > > Since the NMI iretq nesting has been fixed, there's no reason that
> > I think your patch fixes the infinite loop; we will test it soon.
> > BTW, we are using 3.10; could you point out which NMI iretq nesting patch?
> 
> There were many. You can read about what was done here:
> 
>  https://lwn.net/Articles/484932/
> 
> The original is here:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ccd49c2391773ffbf52bb80d75c4a92b16972517
> 
> But more were added.
> 
> But that is back in 3.3, so 3.10 has all the required updates.
Thanks for the info; we are trying your patch now.

> 
> -- Steve


[PATCH 03/10][v6] tools/perf: silence compiler warnings

2013-10-15 Thread Sukadev Bhattiprolu
The uninitialized variables cause warnings which are treated as errors
during build (without WERROR=0).

Signed-off-by: Sukadev Bhattiprolu 
---
 tools/perf/util/srcline.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 10983a9..0477055 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -223,8 +223,8 @@ out:
 
 char *get_srcline(struct dso *dso, unsigned long addr)
 {
-   char *file;
-   unsigned line;
+   char *file = NULL;
+   unsigned line = 0;
char *srcline;
char *dso_name = dso->long_name;
size_t size;
-- 
1.7.9.5



[PATCH 04/10][v6] tools/perf: Remove local byteorder.h.

2013-10-15 Thread Sukadev Bhattiprolu
Remove the local tools/perf/util/include/asm/byteorder.h and add
a few missing typedefs to tools/perf/util/include/linux/types.h.

The local byteorder.h complicates defining big/little endian versions
of data structures in include/uapi/linux/perf_event.h.

Fix proposed by Michael Ellerman.

Signed-off-by: Sukadev Bhattiprolu 
---
 tools/perf/Makefile |1 -
 tools/perf/util/include/asm/byteorder.h |2 --
 tools/perf/util/include/linux/types.h   |   20 
 3 files changed, 20 insertions(+), 3 deletions(-)
 delete mode 100644 tools/perf/util/include/asm/byteorder.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index b62e12d..3c4a7d9 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -225,7 +225,6 @@ LIB_H += util/include/linux/types.h
 LIB_H += util/include/linux/linkage.h
 LIB_H += util/include/asm/asm-offsets.h
 LIB_H += util/include/asm/bug.h
-LIB_H += util/include/asm/byteorder.h
 LIB_H += util/include/asm/hweight.h
 LIB_H += util/include/asm/swab.h
 LIB_H += util/include/asm/system.h
diff --git a/tools/perf/util/include/asm/byteorder.h 
b/tools/perf/util/include/asm/byteorder.h
deleted file mode 100644
index 2a9bdc0..000
--- a/tools/perf/util/include/asm/byteorder.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#include 
-#include "../../../../include/uapi/linux/swab.h"
diff --git a/tools/perf/util/include/linux/types.h 
b/tools/perf/util/include/linux/types.h
index eb46478..775f68e 100644
--- a/tools/perf/util/include/linux/types.h
+++ b/tools/perf/util/include/linux/types.h
@@ -7,10 +7,30 @@
 #define __bitwise
 #endif
 
+#ifndef __le16
+typedef __u16 __bitwise __le16;
+#endif
+
 #ifndef __le32
 typedef __u32 __bitwise __le32;
 #endif
 
+#ifndef __be16
+typedef __u16 __bitwise __be16;
+#endif
+
+#ifndef __le64
+typedef __u64 __bitwise __le64;
+#endif
+
+#ifndef __be32
+typedef __u32 __bitwise __be32;
+#endif
+
+#ifndef __be64
+typedef __u64 __bitwise __be64;
+#endif
+
 #define DECLARE_BITMAP(name,bits) \
unsigned long name[BITS_TO_LONGS(bits)]
 
-- 
1.7.9.5



[PATCH 05/10][v6] powerpc/perf: Remove PME_ prefix for power7 events

2013-10-15 Thread Sukadev Bhattiprolu
We used the PME_ prefix earlier to avoid some macro/variable name
collisions.  We have since changed the way we define/use the event
macros so we no longer need the prefix.

By dropping the prefix, we keep the event macros consistent with
their official names.

Reported-by: Michael Ellerman 
Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/include/asm/perf_event_server.h |2 +-
 arch/powerpc/perf/power7-pmu.c   |   18 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index 3fd2f1b..d7b3419 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -138,7 +138,7 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
 #define EVENT_PTR(_id, _suffix) &EVENT_VAR(_id, _suffix).attr.attr
 
 #define EVENT_ATTR(_name, _id, _suffix) \
-   PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_##_id,   \
+   PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), _id, \
power_events_sysfs_show)
 
 #define GENERIC_EVENT_ATTR(_name, _id)  EVENT_ATTR(_name, _id, _g)
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 56c67bc..ae24dfc 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -54,7 +54,7 @@
  * Power7 event codes.
  */
 #define EVENT(_name, _code) \
-   PME_##_name = _code,
+   _name = _code,
 
 enum {
 #include "power7-events-list.h"
@@ -318,14 +318,14 @@ static void power7_disable_pmc(unsigned int pmc, unsigned 
long mmcr[])
 }
 
 static int power7_generic_events[] = {
-   [PERF_COUNT_HW_CPU_CYCLES] =PME_PM_CYC,
-   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =   PME_PM_GCT_NOSLOT_CYC,
-   [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =PME_PM_CMPLU_STALL,
-   [PERF_COUNT_HW_INSTRUCTIONS] =  PME_PM_INST_CMPL,
-   [PERF_COUNT_HW_CACHE_REFERENCES] =  PME_PM_LD_REF_L1,
-   [PERF_COUNT_HW_CACHE_MISSES] =  PME_PM_LD_MISS_L1,
-   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =   PME_PM_BRU_FIN,
-   [PERF_COUNT_HW_BRANCH_MISSES] = PME_PM_BR_MPRED,
+   [PERF_COUNT_HW_CPU_CYCLES] =PM_CYC,
+   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =   PM_GCT_NOSLOT_CYC,
+   [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =PM_CMPLU_STALL,
+   [PERF_COUNT_HW_INSTRUCTIONS] =  PM_INST_CMPL,
+   [PERF_COUNT_HW_CACHE_REFERENCES] =  PM_LD_REF_L1,
+   [PERF_COUNT_HW_CACHE_MISSES] =  PM_LD_MISS_L1,
+   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =   PM_BRU_FIN,
+   [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED,
 };
 
 #define C(x)   PERF_COUNT_HW_CACHE_##x
-- 
1.7.9.5



Re: [PATCH] kernel/rcutorture.c: use scnprintf() instead of sprintf()

2013-10-15 Thread Chen Gang
On 10/15/2013 10:47 PM, Paul E. McKenney wrote:
> On Tue, Oct 15, 2013 at 08:32:41PM +0800, Chen Gang wrote:
>> Yeah, that is one way to do it. It seems you (as the related maintainer)
>> would like an additional fix for it.
>>
>> Hmm... I will try within this week (although I don't think it is really
>> necessary).
>>
>> :-)
> 
> If you always ensure that the buffer is big enough, do you really need
> the checking?
> 

Since they are all normal static functions: of course they don't need
length checking, nor a return value, nor the local variable 'cnt'.

>>
>> Excuse me, my English is not very good, and I do not quite understand
>> your meaning.
>>
>> I guess your meaning is: "after finding a simple/acceptable solution, we
>> can think about it more; that may be more efficient".
>>
>> If my guess is correct, that is OK with me, since it is at least not an
>> 'urgent' thing (for an 'important' thing your idea is more efficient,
>> although for an 'urgent' thing it is not).
> 
> That is important as well -- the first solution you think of might not
> be the right one.
> 

In my opinion, my first solution is correct, simple, and acceptable
enough for a test module, although it may not be the simplest or the
most acceptable one.

> My point is related.  If you believe you found a bug by inspection,
> it is often worth testing to be sure.  Especially if the code in
> question is at all complex.
> 

Yeah, "it is often worth testing to be sure": that depends on test
cases, which are based on the requirements (so the requirements come
first). In fact, most maintainers do not focus much on a test module.

The reason I will still spend more time on the test module is:
"since the related maintainer wants to focus on it, and it isn't urgent,
I will spend more time on it".

An 'important' but not 'urgent' thing (I assume your request is an
'important' thing) often needs a trigger; without a trigger, it is better
not to touch it now (that would not be very efficient). Now you are the
'trigger'. ;-)


>   Thanx, Paul
> 
> 
> 


Thanks.
-- 
Chen Gang


[PATCH 06/10][v6] powerpc/perf: Export Power8 generic events in sysfs

2013-10-15 Thread Sukadev Bhattiprolu
Export generic perf events for Power8 in sysfs.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v6]:
[Michael Ellerman] Drop PME_ prefix in macros

 arch/powerpc/perf/power8-events-list.h |   20 +++
 arch/powerpc/perf/power8-pmu.c |   44 +++-
 2 files changed, 58 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/perf/power8-events-list.h

diff --git a/arch/powerpc/perf/power8-events-list.h 
b/arch/powerpc/perf/power8-events-list.h
new file mode 100644
index 000..1368547
--- /dev/null
+++ b/arch/powerpc/perf/power8-events-list.h
@@ -0,0 +1,20 @@
+/*
+ * Performance counter support for POWER8 processors.
+ *
+ * Copyright 2013 Sukadev Bhattiprolu, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/*
+ * Some power8 event codes.
+ */
+EVENT(PM_CYC,  0x0001e)
+EVENT(PM_GCT_NOSLOT_CYC,   0x100f8)
+EVENT(PM_CMPLU_STALL,  0x4000a)
+EVENT(PM_INST_CMPL,0x2)
+EVENT(PM_BRU_FIN,  0x10068)
+EVENT(PM_BR_MPRED_CMPL,0x400f6)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 2ee4a70..5141d97 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -18,13 +18,13 @@
 /*
  * Some power8 event codes.
  */
-#define PM_CYC 0x0001e
-#define PM_GCT_NOSLOT_CYC  0x100f8
-#define PM_CMPLU_STALL 0x4000a
-#define PM_INST_CMPL   0x2
-#define PM_BRU_FIN 0x10068
-#define PM_BR_MPRED_CMPL   0x400f6
+#define EVENT(_name, _code)_name = _code,
 
+enum {
+#include "power8-events-list.h"
+};
+
+#undef EVENT
 
 /*
  * Raw event encoding for POWER8:
@@ -510,6 +510,37 @@ static void power8_disable_pmc(unsigned int pmc, unsigned 
long mmcr[])
mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
 }
 
+GENERIC_EVENT_ATTR(cpu-cycles,  PM_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-frontend,PM_GCT_NOSLOT_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-backend, PM_CMPLU_STALL);
+GENERIC_EVENT_ATTR(instructions,   PM_INST_CMPL);
+GENERIC_EVENT_ATTR(branch-instructions,PM_BRU_FIN);
+GENERIC_EVENT_ATTR(branch-misses,  PM_BR_MPRED_CMPL);
+
+#define EVENT(_name, _code)POWER_EVENT_ATTR(_name, _name);
+#include "power8-events-list.h"
+#undef EVENT
+
+#define EVENT(_name, _code)POWER_EVENT_PTR(_name),
+
+static struct attribute *power8_events_attr[] = {
+   GENERIC_EVENT_PTR(PM_CYC),
+   GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
+   GENERIC_EVENT_PTR(PM_CMPLU_STALL),
+   GENERIC_EVENT_PTR(PM_INST_CMPL),
+   GENERIC_EVENT_PTR(PM_BRU_FIN),
+   GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+
+   #include "power8-events-list.h"
+   #undef EVENT
+   NULL
+};
+
+static struct attribute_group power8_pmu_events_group = {
+   .name = "events",
+   .attrs = power8_events_attr,
+};
+
 PMU_FORMAT_ATTR(event, "config:0-49");
 PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
 PMU_FORMAT_ATTR(mark,  "config:8");
@@ -546,6 +577,7 @@ struct attribute_group power8_pmu_format_group = {
 
 static const struct attribute_group *power8_pmu_attr_groups[] = {
&power8_pmu_format_group,
+   &power8_pmu_events_group,
NULL,
 };
 
-- 
1.7.9.5
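
The patch above expands one event list several times by redefining EVENT() before each inclusion of power8-events-list.h. A minimal, self-contained sketch of that pattern follows. To keep it in one file it assumes a list macro (`DEMO_EVENT_LIST`) in place of the separate header, and the `demo_` names are hypothetical; only the three event names and codes are taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for power8-events-list.h: one EVENT() per event. */
#define DEMO_EVENT_LIST \
	EVENT(PM_CYC,            0x0001e) \
	EVENT(PM_GCT_NOSLOT_CYC, 0x100f8) \
	EVENT(PM_CMPLU_STALL,    0x4000a)

/* First expansion: turn every list entry into an enum constant. */
#define EVENT(_name, _code)	_name = _code,
enum { DEMO_EVENT_LIST };
#undef EVENT

/* Second expansion: turn the same list into a name/code table. */
struct demo_event {
	const char *name;
	unsigned int code;
};

#define EVENT(_name, _code)	{ #_name, _code },
static const struct demo_event demo_events[] = { DEMO_EVENT_LIST };
#undef EVENT

/* Look up an event code by name; returns -1 if the name is not listed. */
static long demo_event_code(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_events) / sizeof(demo_events[0]); i++)
		if (strcmp(demo_events[i].name, name) == 0)
			return (long)demo_events[i].code;
	return -1;
}
```

Adding an event then means adding one EVENT() line to the list; the enum and the table pick it up on the next build, which is the property the patch relies on for its sysfs attributes.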



[PATCH 08/10][v6] powerpc/perf: Define big-endian version of perf_mem_data_src

2013-10-15 Thread Sukadev Bhattiprolu
perf_mem_data_src is a union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms, we also need a big-endian representation of perf_mem_data_src.

Cc: Stephane Eranian 
Cc: Michael Ellerman 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog [v6]
- [Michael Ellerman] Use __BIG_ENDIAN_BITFIELD to simplify the
  endian check.

Changelog [v5]:
- perf_event.h includes  which pulls in the local
  byteorder.h when building the perf tool. This local byteorder.h
  leaves __LITTLE_ENDIAN and __BIG_ENDIAN undefined.
  Include  explicitly in the local byteorder.h.

Changelog [v2]:
- [Vince Weaver, Michael Ellerman] No __KERNEL__ in uapi headers.

 include/uapi/linux/perf_event.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ca1d90b..383052b7 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -695,6 +695,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT(1U << 1)
 #define PERF_FLAG_PID_CGROUP   (1U << 2) /* pid=cgroup id, per-cpu 
mode only */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
__u64 val;
struct {
@@ -706,6 +707,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+   __u64 val;
+   struct {
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+   };
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA 0x01 /* not available */
-- 
1.7.9.5



[PATCH 10/10][v6] powerpc/perf: Export Power7 memory hierarchy info to user space.

2013-10-15 Thread Sukadev Bhattiprolu
On Power7, the DCACHE_SRC field in MMCRA register identifies the memory
hierarchy level (eg: L2, L3 etc) from which a data-cache miss for a
marked instruction was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Some memory hierarchy levels in Power7 don't map into the arch-neutral
levels. However, since the newer generation of the processor (i.e. Power8) uses
fewer levels than in Power7, we don't really need to define new hierarchy
levels just for Power7.

Instead, we map as many levels as possible and approximate the rest. See
comments near dcache_src_map[] in the patch.

Usage:

perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' 
perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr

For samples involving load/store instructions, the memory
hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
# or

perf record --data 
perf report -D

Sample records contain a 'data_src' field which encodes the
memory hierarchy level: Eg: data_src 0x442 indicates
MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e load hit L2).

Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completion
events. While some of these are loads and stores, others like 'add'
instructions may also be sampled.

As such, the precise semantics of 'perf mem -t load' or 'perf mem -t store'
(which require sampling only loads or only stores) cannot be implemented on
Power. (Sampling on PM_MRK_GRP_CMPL and throwing away non-load and non-store
samples could yield an inconsistent profile of the application.)

Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.

Cc: Stephane Eranian 
Cc: Michael Ellerman 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v4]:
Drop support for 'perf mem' for Power (use perf-record and perf-report
directly)

Changelog[v3]:
[Michael Ellerman] If newer levels that we defined in [v2] are not
needed for Power8, ignore the new levels for Power7 also, and
approximate them.
Separate the TLB level mapping to a separate patchset.

Changelog[v2]:
[Stephane Eranian] Define new levels rather than ORing the L2 and L3
with REM_CCE1 and REM_CCE2.
[Stephane Eranian] allocate a bit PERF_MEM_XLVL_NA for architectures
that don't use the ->mem_xlvl field.
Insert the TLB patch ahead so the new TLB bits are contiguous with
existing TLB bits.

 arch/powerpc/perf/power7-pmu.c |   94 
 1 file changed, 94 insertions(+)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index ae24dfc..3e86bb8 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -11,8 +11,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 
 /*
  * Bits in event code for POWER7
@@ -317,6 +319,97 @@ static void power7_disable_pmc(unsigned int pmc, unsigned 
long mmcr[])
mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
 }
 
+#define POWER7_MMCRA_DCACHE_MISS   (0x1LL << 55)
+#define POWER7_MMCRA_DCACHE_SRC_SHIFT  51
+#define POWER7_MMCRA_DCACHE_SRC_MASK   (0xFLL << POWER7_MMCRA_DCACHE_SRC_SHIFT)
+
+#define P(a, b)PERF_MEM_S(a, b)
+#define PLH(a, b)  (P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+/*
+ * Map the Power7 DCACHE_SRC field (bits 9..12) in MMCRA register to the
+ * architecture-neutral memory hierarchy levels. For the levels in Power7
+ * that don't map to the arch-neutral levels, approximate to nearest
+ * level.
+ *
+ * 1-hop:  indicates another core on the same chip (2.1 and 3.1 levels).
+ * 2-hops: indicates a different chip on same or different node (remote
+ * and distant levels).
+ *
+ * For consistency with this interpretation of the hops, we dont use
+ * the REM_RAM1 level below.
+ *
+ * The *SHR and *MOD states of the cache are ignored/not exported to user.
+ *
+ * ### Levels marked with ### in comments below are approximated
+ */
+static u64 dcache_src_map[] = {
+   PLH(LVL, L2),   /* 00: FROM_L2 */
+   PLH(LVL, L3),   /* 01: FROM_L3 */
+
+   P(LVL, NA), /* 02: Reserved */
+   P(LVL, NA), /* 03: Reserved */
+
+   PLH(LVL, REM_CCE1), /* 04: FROM_L2.1_SHR ### */
+   PLH(LVL, REM_CCE1), /* 05: FROM_L2.1_MOD ### */
+
+   PLH(LVL, REM_CCE1), /* 06: FROM_L3.1_SHR ### */
+   PLH(LVL, REM_CCE1), /* 07: FROM_L3.1_MOD ### */
+
+   PLH(LVL, REM_CCE2), /* 08: FROM_RL2L3_SHR ### */
+   PLH(LVL, REM_CCE2), /* 09: FROM_RL2L3_MOD ### */
+
+   PLH(LVL, REM_CCE2), /* 10: FROM_DL2L3_SHR ### */
+   PLH(LVL, REM_CCE2), /* 11: FROM_DL2L3_MOD ### */
+
+   PLH(LVL, LOC_RAM),  /* 12: FROM_LMEM 

[PATCH 07/10][v6] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs.

2013-10-15 Thread Sukadev Bhattiprolu
The perf event PM_MRK_GRP_CMPL is useful in analyzing memory hierarchy
of applications.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v6]:
- [Michael Ellerman]: Drop redundant PME_ prefix from event name.

 arch/powerpc/perf/power8-events-list.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/perf/power8-events-list.h 
b/arch/powerpc/perf/power8-events-list.h
index 1368547..b39e117 100644
--- a/arch/powerpc/perf/power8-events-list.h
+++ b/arch/powerpc/perf/power8-events-list.h
@@ -18,3 +18,4 @@ EVENT(PM_CMPLU_STALL, 0x4000a)
 EVENT(PM_INST_CMPL,0x2)
 EVENT(PM_BRU_FIN,  0x10068)
 EVENT(PM_BR_MPRED_CMPL,0x400f6)
+EVENT(PM_MRK_GRP_CMPL, 0x40130)
-- 
1.7.9.5



Re: Panic and page fault in loop during handling NMI backtrace handler

2013-10-15 Thread Steven Rostedt
On Wed, 16 Oct 2013 01:54:51 +
"Liu, Chuansheng"  wrote:


> > Since the NMI iretq nesting has been fixed, there's no reason that
> > I think your patch fixes the infinite loop; we will test it soon.
> BTW, we are using 3.10; could you point out which NMI iretq nesting
> patch?

There were many. You can read about what was done here:

 https://lwn.net/Articles/484932/

The original is here:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ccd49c2391773ffbf52bb80d75c4a92b16972517

But more were added.

But that is back in 3.3, so 3.10 has all the required updates.

-- Steve


[PATCH 09/10][v6] powerpc/perf: Export Power8 memory hierarchy info to user space.

2013-10-15 Thread Sukadev Bhattiprolu
On Power8, the LDST field in SIER identifies the memory hierarchy level
(eg: L1, L2 etc), from which a data-cache miss for a marked instruction
was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Fortunately, the memory hierarchy levels in Power8 map fairly easily
into the arch-neutral levels as described by the ldst_src_map[] table.

Usage:

perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' 
perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr

For samples involving load/store instructions, the memory
hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
# or

perf record --data 
perf report -D

Sample records contain a 'data_src' field which encodes the
memory hierarchy level: Eg: data_src 0x442 indicates
MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e load hit L2).

Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completion
events. While some of these are loads and stores, others like 'add'
instructions may also be sampled. The alternative of sampling on
PM_MRK_GRP_CMPL and throwing away non-load and non-store samples could
yield an inconsistent profile of the application.

As the precise semantics of 'perf mem -t load' or 'perf mem -t store' (which
require sampling only loads or only stores) cannot be implemented on Power,
we don't implement 'perf mem' on Power for now.

Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.

Cc: Stephane Eranian 
Cc: Michael Ellerman 
Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v2]:
Drop support for 'perf mem' for Power (use perf-record and perf-report
directly)

 arch/powerpc/include/asm/perf_event_server.h |2 +
 arch/powerpc/perf/core-book3s.c  |   11 ++
 arch/powerpc/perf/power8-pmu.c   |   53 ++
 3 files changed, 66 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index d7b3419..5f2c449 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -38,6 +38,8 @@ struct power_pmu {
void(*config_bhrb)(u64 pmu_bhrb_filter);
void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
int (*limited_pmc_event)(u64 event_id);
+   void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
+   struct pt_regs *regs);
u32 flags;
const struct attribute_group**attr_groups;
int n_generic;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index eeae308..5221ba1 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1696,6 +1696,13 @@ ssize_t power_events_sysfs_show(struct device *dev,
return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
 }
 
+static inline void power_get_mem_data_src(union perf_mem_data_src *dsrc,
+   struct pt_regs *regs)
+{
+   if  (ppmu->get_mem_data_src)
+   ppmu->get_mem_data_src(dsrc, regs);
+}
+
 struct pmu power_pmu = {
.pmu_enable = power_pmu_enable,
.pmu_disable= power_pmu_disable,
@@ -1777,6 +1784,10 @@ static void record_and_restart(struct perf_event *event, 
unsigned long val,
data.br_stack = &cpuhw->bhrb_stack;
}
 
+   if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
+   ppmu->get_mem_data_src)
+   ppmu->get_mem_data_src(&data.data_src, regs);
+
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
}
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 5141d97..c25b5c3 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -541,6 +541,58 @@ static struct attribute_group power8_pmu_events_group = {
.attrs = power8_events_attr,
 };
 
+#define POWER8_SIER_TYPE_SHIFT 15
+#define POWER8_SIER_TYPE_MASK  (0x7LL << POWER8_SIER_TYPE_SHIFT)
+
+#define POWER8_SIER_LDST_SHIFT 1
+#define POWER8_SIER_LDST_MASK  (0x7LL << POWER8_SIER_LDST_SHIFT)
+
+#define P(a, b)PERF_MEM_S(a, b)
+#define PLH(a, b)  (P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+#define PSM(a, b)  (P(OP, STORE) | P(LVL, MISS) | P(a, b))
+
+/*
+ * Power8 interpretations:
+ * REM_CCE1: 1-hop indicates L2/L3 cache of a different core on same chip
+ * REM_CCE2: 2-hop indicates different chip or different node.
+ */
+static u64 ldst_src_map[] = {
+   /* 000 */   P(LVL, NA),
+
+   /* 001 */   PLH(LVL, L1),
+   /* 010 */   PLH(LVL, L2),
+   /* 011 */   PLH(LVL, L3),
+   /* 100 */   PLH(LVL, LOC_RAM),
+  

[PATCH 02/10][v6] powerpc/Power7: detect load/store instructions

2013-10-15 Thread Sukadev Bhattiprolu
Implement instr_is_load_store_2_06() to detect whether a given instruction
is one of the fixed-point or floating-point load/store instructions in the
POWER Instruction Set Architecture v2.06.

This function will be used in a follow-on patch to save memory hierarchy
information of the load/store on a Power7 system. (Power8 systems set some
bits in the SIER to identify load/store operations and hence don't need a
similar functionality).

Based on optimized code from Michael Ellerman and comments from Tom Musta.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v6]
- [Michael Ellerman, Tom Musta]: Optimize the implementation to
  avoid the for loop.

 arch/powerpc/include/asm/code-patching.h |1 +
 arch/powerpc/lib/code-patching.c |   45 ++
 2 files changed, 46 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..9cc3ef1 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -34,6 +34,7 @@ int instr_is_branch_to_addr(const unsigned int *instr, 
unsigned long addr);
 unsigned long branch_target(const unsigned int *instr);
 unsigned int translate_branch(const unsigned int *dest,
  const unsigned int *src);
+int instr_is_load_store_2_06(const unsigned int *instr);
 
 static inline unsigned long ppc_function_entry(void *func)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 2bc9db3..49fb9d7 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -159,6 +159,51 @@ unsigned int translate_branch(const unsigned int *dest, 
const unsigned int *src)
return 0;
 }
 
+/*
+ * Determine if the op code in the instruction corresponds to a load or
+ * store instruction. Ignore the vector load instructions like evlddepx,
+ * evstddepx for now.
+ *
+ * This function is valid for POWER ISA 2.06.
+ *
+ * Reference:  PowerISA_V2.06B_Public.pdf, Sections 3.3.2 through 3.3.6
+ * and 4.6.2 through 4.6.4, Appendix F (Opcode Maps).
+ */
+int instr_is_load_store_2_06(const unsigned int *instr)
+{
+   unsigned int op, upper, lower;
+
+   op = instr_opcode(*instr);
+
+   if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
+   return true;
+
+   if (op != 31)
+   return false;
+
+   upper = op >> 5;
+   lower = op & 0x1f;
+
+   /* Short circuit as many misses as we can */
+   if (lower < 3 || lower > 23)
+   return false;
+
+   if (lower == 3) {
+   if (upper >= 16)
+   return true;
+
+   return false;
+   }
+
+   if (lower == 7 || lower == 12)
+   return true;
+
+   if (lower >= 20) /* && lower <= 23 (implicit) */
+   return true;
+
+   return false;
+}
+
 
 #ifdef CONFIG_CODE_PATCHING_SELFTEST
 
-- 
1.7.9.5
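
For primary opcode 31, the classification in this helper depends on the 10-bit X-form extended opcode rather than on the primary opcode itself. Below is a minimal userspace sketch of that decode, assuming the extended opcode occupies bits 1..10 of the instruction word (counting from the least significant bit). The names primary_opcode(), extended_opcode() and is_load_store_sketch() are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode: the top 6 bits of the 32-bit instruction word. */
static inline unsigned int primary_opcode(uint32_t instr)
{
	return instr >> 26;
}

/* X-form extended opcode: the 10 bits just above the lowest bit. */
static inline unsigned int extended_opcode(uint32_t instr)
{
	return (instr >> 1) & 0x3ff;
}

/* Hypothetical standalone mirror of the classification in the patch. */
static bool is_load_store_sketch(uint32_t instr)
{
	unsigned int op = primary_opcode(instr);
	unsigned int xop, upper, lower;

	if ((op >= 32 && op <= 58) || op == 61 || op == 62)
		return true;
	if (op != 31)
		return false;

	/* Split the extended opcode into its row/column in the opcode map. */
	xop = extended_opcode(instr);
	upper = xop >> 5;
	lower = xop & 0x1f;

	if (lower < 3 || lower > 23)
		return false;
	if (lower == 3)
		return upper >= 16;
	if (lower == 7 || lower == 12)
		return true;
	return lower >= 20;
}
```

With this decode, lwzx (extended opcode 23) is classified as a load/store, while add (extended opcode 266) is not.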

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/8] mm: thrash detection-based file cache sizing v5

2013-10-15 Thread Rik van Riel
On 10/15/2013 07:41 PM, Dave Chinner wrote:
> On Tue, Oct 15, 2013 at 01:41:28PM -0400, Johannes Weiner wrote:

>> I'm not forgetting about them, I just track them very coarsely by
>> linking up address spaces and then lazily enforce their upper limit
>> when memory is tight by using the shrinker callback.  The assumption
>> was that actually scanning them is such a rare event that we trade the
>> rare computational costs for smaller memory consumption most of the
>> time.
> 
> Sure, I understand the tradeoff that you made. But there's nothing
> worse than a system that slows down unpredictably because of some
> magic threshold in some subsystem has been crossed and
> computationally expensive operations kick in.

The shadow shrinker should remove the radix nodes with
the oldest shadow entries first, so true LRU should actually
work for the radix tree nodes.

Actually, since we only care about the age of the youngest
shadow entry in each radix tree node, FIFO will be the same
as LRU for that list.

That means the shrinker can always just take the radix tree
nodes off the end.

>> But it
>> looks like tracking radix tree nodes with a list and backpointers to
>> the mapping object for the lock etc. will be a major pain in the ass.
> 
> Perhaps so - it may not work out when we get down to the fine
> details...

I suspect that a combination of lifetime rules (the inode cannot
disappear until all of its radix tree nodes are gone) and using RCU
free for both the radix tree nodes and the inodes might do the trick.

That would mean that, while holding the rcu read lock, the
back pointer from a radix tree node to the inode will always
point to valid memory.

That allows the shrinker to lock the inode, and verify that
the inode is still valid, before it attempts to rcu free the
radix tree node with shadow entries.

It also means that locking only needs to be in the inode,
and on the LRU list for shadow radix tree nodes.

Does that sound sane?

Am I overlooking something?

-- 
All rights reversed


still running into WARNING: CPU: at fs/ext4/inode.c:230 ext4_evict_inode+0x4a6/0x4e0

2013-10-15 Thread Davidlohr Bueso
Hello Jan,

Just wanted to let you know I hit this[1] again on Linus' latest. The
setup/workload is *identical* to the one reported a few months ago.

[1] https://lkml.org/lkml/2013/8/1/532

Here's the complete output, I hope it helps...

------------[ cut here ]------------
WARNING: CPU: 42 PID: 74607 at fs/ext4/inode.c:230 ext4_evict_inode+0x4a6/0x4e0 [ext4]()
Modules linked in: fuse autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand 
pcc_cpufreq ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter 
ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack 
ip6table_filter ip6_tables ipv6 uinput iTCO_wdt iTCO_vendor_support ipmi_si 
ipmi_msghandler microcode pcspkr serio_raw lpc_ich mfd_core sg hpilo hpwdt 
i7core_edac edac_core netxen_nic freq_table ext4 jbd2 mbcache sr_mod cdrom 
sd_mod crc_t10dif crct10dif_common qla2xxx scsi_transport_fc scsi_tgt pata_acpi 
ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core 
dm_mirror dm_region_hash dm_log dm_mod
CPU: 42 PID: 74607 Comm: reaim Tainted: G        W    3.12.0-rc5 #40
Hardware name: HP ProLiant DL980 G7, BIOS P66 08/31/
 00e6 8908c6c59d28 81550dd8 00e6
  8908c6c59d68 8104e29c 8908c6c59d78
 880afae83500 880afae83608 880afae83450 880afae83500
Call Trace:
 [] dump_stack+0x49/0x61
 [] warn_slowpath_common+0x8c/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] ext4_evict_inode+0x4a6/0x4e0 [ext4]
 [] evict+0xa7/0x1c0
 [] iput_final+0xef/0x190
 [] iput+0x3e/0x50
 [] d_kill+0xd8/0x110
 [] dput+0x18b/0x1c0
 [] __fput+0x188/0x270
 [] fput+0xe/0x10
 [] task_work_run+0x8f/0xf0
 [] do_notify_resume+0x84/0x90
 [] int_signal+0x12/0x17
---[ end trace ea7184a5fb48e185 ]---


Thanks,
Davidlohr



Re: [PATCH v11 2/3] DMA: Freescale: Add new 8-channel DMA engine device tree nodes

2013-10-15 Thread Hongbo Zhang

On 10/15/2013 09:38 PM, Mark Rutland wrote:

On Tue, Oct 08, 2013 at 04:22:07AM +0100, Hongbo Zhang wrote:

Hi Mark, Stephen and other DT maintainers?

Patch 1/3 has already been acked by Mark; please take a further look
at this patch, 2/3.
The DMA maintainer, Vinod, needs an ack for the DT-related patches so
that he can take the whole patch set.

Sorry for the delay.

This looks ok to me.

Acked-by: Mark Rutland 


Thanks, Mark.


On 09/26/2013 05:33 PM, hongbo.zh...@freescale.com wrote:

From: Hongbo Zhang 

Freescale QorIQ T4 and B4 introduce new 8-channel DMA engines, this patch adds
the device tree nodes for them.

Signed-off-by: Hongbo Zhang 
---
   .../devicetree/bindings/powerpc/fsl/dma.txt|   70 +
   arch/powerpc/boot/dts/fsl/b4si-post.dtsi   |4 +-
   arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi  |   82 

   arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi  |   82 

   arch/powerpc/boot/dts/fsl/t4240si-post.dtsi|4 +-
   5 files changed, 238 insertions(+), 4 deletions(-)
   create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi
   create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt 
b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
index 0584168..7fc1b01 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
@@ -128,6 +128,76 @@ Example:
   };
   };

+** Freescale Elo3 DMA Controller
+   DMA controller which has same function as EloPlus except that Elo3 has 8
+   channels while EloPlus has only 4, it is used in Freescale Txxx and Bxxx
+   series chips, such as t1040, t4240, b4860.
+
+Required properties:
+
+- compatible: must include "fsl,elo3-dma"
+- reg   : contains two entries for DMA General Status Registers,
+  i.e. DGSR0 which includes status for channel 1~4, and
+  DGSR1 for channel 5~8
+- ranges: describes the mapping between the address space of the
+  DMA channels and the address space of the DMA controller
+
+- DMA channel nodes:
+- compatible: must include "fsl,eloplus-dma-channel"
+- reg   : DMA channel specific registers
+- interrupts: interrupt specifier for DMA channel IRQ
+- interrupt-parent  : optional, if needed for interrupt mapping
+
+Example:
+dma@100300 {
+ #address-cells = <1>;
+ #size-cells = <1>;
+ compatible = "fsl,elo3-dma";
+ reg = <0x100300 0x4>,
+   <0x100600 0x4>;
+ ranges = <0x0 0x100100 0x500>;
+ dma-channel@0 {
+ compatible = "fsl,eloplus-dma-channel";
+ reg = <0x0 0x80>;
+ interrupts = <28 2 0 0>;
+ };
+ dma-channel@80 {
+ compatible = "fsl,eloplus-dma-channel";
+ reg = <0x80 0x80>;
+ interrupts = <29 2 0 0>;
+ };
+ dma-channel@100 {
+ compatible = "fsl,eloplus-dma-channel";
+ reg = <0x100 0x80>;
+ interrupts = <30 2 0 0>;
+ };
+ dma-channel@180 {
+ compatible = "fsl,eloplus-dma-channel";
+ reg = <0x180 0x80>;
+ interrupts = <31 2 0 0>;
+ };
+ dma-channel@300 {
+ compatible = "fsl,eloplus-dma-channel";
+ reg = <0x300 0x80>;
+ interrupts = <76 2 0 0>;
+ };
+ dma-channel@380 {
+ compatible = "fsl,eloplus-dma-channel";
+ reg = <0x380 0x80>;
+ interrupts = <77 2 0 0>;
+ };
+ dma-channel@400 {
+ compatible = "fsl,eloplus-dma-channel";
+ reg = <0x400 0x80>;
+ interrupts = <78 2 0 0>;
+ };
+ dma-channel@480 {
+ compatible = "fsl,eloplus-dma-channel";
+ reg = <0x480 0x80>;
+ interrupts = <79 2 0 0>;
+ };
+};
+
   Note on DMA channel compatible properties: The compatible property must say
   "fsl,elo-dma-channel" or "fsl,eloplus-dma-channel" to be used by the Elo DMA
   driver (fsldma).  Any DMA channel used by fsldma cannot be used by another
diff --git a/arch/powerpc/boot/dts/fsl/b4si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/b4si-post.dtsi
index 7399154..ea53ea1 100644
--- a/arch/powerpc/boot/dts/fsl/b4si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/b4si-post.dtsi
@@ -223,13 +223,13 @@
   reg = <0xe2000 0x1000>;
   };

-/include/ "qoriq-dma-0.dtsi"
+/include/ "elo3-dma-0.dtsi"
   dma@100300 {
   fsl,iommu-parent = <>;
   fsl,liodn-reg = < 0x580>; /* DMA1LIODNR */
   };

-/include/ "qoriq-dma-1.dtsi"
+/include/ "elo3-dma-1.dtsi"
   dma@101300 {
   fsl,iommu-parent = <>;
   fsl,liodn-reg = < 0x584>; /* DMA2LIODNR */
diff --git a/arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi 
b/arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi

Re: [PATCH] perf record: mmap output file - v2

2013-10-15 Thread David Ahern

On 10/15/13 7:52 PM, Namhyung Kim wrote:

Aha, okay.  So it mostly matters to syscall tracing, right?  For a
normal record session, it seems that the effect is not that large:


Yes, that's in the description "When recording raw_syscalls for the 
entire system"


There is a small benefit to all record sessions -- mmap+memcpy has less 
overhead than write(). I still need to look at Ingo's suggestion to use 
non-temporal stores which might reduce the overhead of the memcpy.
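
To illustrate the mmap+memcpy path, here is a minimal userspace sketch, not the perf implementation. It assumes the scheme from the patch description in this thread: the output file is mapped one 64M chunk at a time and events are copied into the mapping instead of being passed to write(). The struct and function names are hypothetical, and the file descriptor is assumed to be open O_RDWR.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE (64UL * 1024 * 1024)	/* 64M, as in the patch description */

/* Hypothetical output state; initialise with { .fd = fd } (rest zero). */
struct mmap_out {
	int fd;		/* output file, opened O_RDWR */
	char *map;	/* currently mapped chunk, or NULL */
	off_t base;	/* file offset where the current chunk starts */
	size_t used;	/* bytes already copied into the current chunk */
};

/* Unmap the current chunk (if any), grow the file, map the next chunk. */
static int mmap_out_next_chunk(struct mmap_out *o)
{
	void *p;

	if (o->map) {
		if (munmap(o->map, CHUNK_SIZE) < 0)
			return -1;
		o->base += CHUNK_SIZE;
	}
	o->map = NULL;
	o->used = 0;

	/* The file must cover the whole window before it is touched. */
	if (ftruncate(o->fd, o->base + (off_t)CHUNK_SIZE) < 0)
		return -1;

	p = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
		 o->fd, o->base);
	if (p == MAP_FAILED)
		return -1;
	o->map = p;
	return 0;
}

/* Copy len bytes into the file; no write() call per event. */
static int mmap_out_write(struct mmap_out *o, const void *buf, size_t len)
{
	const char *src = buf;

	while (len) {
		size_t n;

		if (!o->map || o->used == CHUNK_SIZE)
			if (mmap_out_next_chunk(o) < 0)
				return -1;

		n = CHUNK_SIZE - o->used;
		if (n > len)
			n = len;
		memcpy(o->map + o->used, src, n);
		o->used += n;
		src += n;
		len -= n;
	}
	return 0;
}

/* Unmap and trim the file back to the bytes actually written. */
static int mmap_out_close(struct mmap_out *o)
{
	off_t end = o->base + (off_t)o->used;

	if (o->map && munmap(o->map, CHUNK_SIZE) < 0)
		return -1;
	o->map = NULL;
	return ftruncate(o->fd, end);
}
```

In this sketch the only system calls are one ftruncate/mmap/munmap group per 64M of output, which is what keeps the recorder from generating a syscall event for every buffer it flushes.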


Try a workload that generates a HUGE data file -- say a full kernel 
build (e.g., perf record -g -- make O=/tmp/kbuild -j 16). You should see 
a much larger benefit from the mmap route. Be sure to use your callchain 
enhancements to look at that 1+G file. ;-)


David


Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-15 Thread Joe Perches
On Fri, 2013-10-11 at 17:47 +0200, Borislav Petkov wrote:
> On Fri, Oct 11, 2013 at 11:06:30AM +0200, Borislav Petkov wrote:
> > > - printk("%s""APEI generic hardware error status\n", pfx);
> > > + printk("%s""Generic Hardware Error Status\n", pfx);
> > 
> > Btw, what's the story with printk not using KERN_x levels in this file?
> > Why are we falling back to default printk levels for all printks here
> > and shouldn't we rather prioritize them by urgency into, say, KERN_ERR,
> > KERN_INFO, etc?
> 
> Ignore that - checkpatch complained about it but I kinda missed that
> we're handing down the prefix.

I think it'd be better to rename pfx to level
as that's what printk.h calls them.






RE: Panic and page fault in loop during handling NMI backtrace handler

2013-10-15 Thread Liu, Chuansheng
Hello Steven,

> -Original Message-
> From: Steven Rostedt [mailto:rost...@goodmis.org]
> Sent: Wednesday, October 16, 2013 12:40 AM
> To: Liu, Chuansheng
> Cc: Ingo Molnar (mi...@kernel.org); h...@zytor.com; fweis...@gmail.com;
> a...@linux-foundation.org; paul...@linux.vnet.ibm.com; Peter Zijlstra
> (pet...@infradead.org); x...@kernel.org; 'linux-kernel@vger.kernel.org'
> (linux-kernel@vger.kernel.org); Wang, Xiaoming; Li, Zhuangzhi
> Subject: Re: Panic and page fault in loop during handling NMI backtrace 
> handler
> 
> 
> BTW, please do not send out HTML email, as that gets blocked from going
> to LKML.
Thanks for the reminder; I forgot to convert it to a plain-text email.

> 
> On Tue, 15 Oct 2013 02:01:04 +
> "Liu, Chuansheng"  wrote:
> 
> > We meet one issue that during trigger all CPU backtrace, but during in the
> NMI handler arch_trigger_all_cpu_backtrace_handler,
> > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack
> overflow, and system panic.
> >
> > Anyone can give some help? Thanks.
> >
> >
> > Panic log as below:
> > ===
> > [   15.069144] BUG: unable to handle kernel [   15.073635] paging request
> at 1649736d
> > [   15.076379] IP: [] print_context_stack+0x4a/0xa0
> > [   15.082529] *pde = 
> > [   15.085758] Thread overran stack, or stack corrupted
> > [   15.091303] Oops:  [#1] SMP
> > [   15.094932] Modules linked in: atomisp_css2400b0_v2(+) lm3554 ov2722
> imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core bcm_bt_lpm
> bcm43241 kct_daemon(O)
> > [   15.111093] CPU: 2 PID: 2443 Comm: Compiler Tainted: GW  O
> 3.10.1+ #1
> 
> I'm curious, what "Out-of-tree" module was loaded?
We do indeed have some modules that are not upstream :)

> 
> Read the rest from the bottom up, as that's how I wrote it :-)
> 
> 
> > [   15.119075] task: f213f980 ti: f0c42000 task.ti: f0c42000
> > [   15.125116] EIP: 0060:[] EFLAGS: 00210087 CPU: 2
> > [   15.131255] EIP is at print_context_stack+0x4a/0xa0
> > [   15.136712] EAX: 16497ffc EBX: 1649736d ECX: 986736d8 EDX: 1649736d
> > [   15.143722] ESI:  EDI: e000 EBP: f0c4220c ESP: f0c421ec
> > [   15.150732]  DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
> > [   15.156771] CR0: 80050033 CR2: 1649736d CR3: 31245000 CR4:
> 001007d0
> > [   15.163781] DR0:  DR1:  DR2:  DR3:
> 
> > [   15.170789] DR6: 0ff0 DR7: 0400
> > [   15.175076] Stack:
> > [   15.177324]  16497ffc 16496000 986736d8 e000 986736d8 1649736d
> c282c148 16496000
> > [   15.186067]  f0c4223c c20033b0 c282c148 c29ceecf  f0c4222c
> 986736d8 f0c4222c
> > [   15.194810]   c29ceecf   f0c42260
> c20041a7 f0c4229c c282c148
> > [   15.203549] Call Trace:
> > [   15.206295]  [] dump_trace+0x70/0xf0
> > [   15.211274]  [] show_trace_log_lvl+0x47/0x60
> > [   15.217028]  [] show_stack_log_lvl+0x52/0xd0
> > [   15.222782]  [] show_stack+0x21/0x50
> > [   15.227762]  [] dump_stack+0x16/0x18
> > [   15.232742]  [] warn_slowpath_common+0x5f/0x80
> > [   15.238693]  [] ? vmalloc_fault+0x5a/0xcf
> > [   15.244156]  [] ? vmalloc_fault+0x5a/0xcf
> > [   15.249621]  [] ? __do_page_fault+0x4a0/0x4a0
> > [   15.255472]  [] warn_slowpath_null+0x1d/0x20
> > [   15.261228]  [] vmalloc_fault+0x5a/0xcf
> > [   15.266497]  [] __do_page_fault+0x2cf/0x4a0
> > [   15.272154]  [] ? logger_aio_write+0x230/0x230
> > [   15.278106]  [] ? console_unlock+0x314/0x440
> > ... //
> > [   16.885364]  [] ? __do_page_fault+0x4a0/0x4a0
> > [   16.891217]  [] do_page_fault+0x8/0x10
> > [   16.896387]  [] error_code+0x5a/0x60
> > [   16.901367]  [] ? __do_page_fault+0x4a0/0x4a0
> > [   16.907219]  [] ? print_modules+0x20/0x90
> > [   16.912685]  [] warn_slowpath_common+0x5a/0x80
> > [   16.918634]  [] ? vmalloc_fault+0x5a/0xcf
> > [   16.924097]  [] ? vmalloc_fault+0x5a/0xcf
> > [   16.929562]  [] ? __do_page_fault+0x4a0/0x4a0
> > [   16.935415]  [] warn_slowpath_null+0x1d/0x20
> > [   16.941169]  [] vmalloc_fault+0x5a/0xcf
> > [   16.946437]  [] __do_page_fault+0x2cf/0x4a0
> > [   16.952095]  [] ? logger_aio_write+0x230/0x230
> > [   16.958046]  [] ? console_unlock+0x314/0x440
> > [   16.963800]  [] ? sys_modify_ldt+0x2/0x160
> > [   16.969362]  [] ? __do_page_fault+0x4a0/0x4a0
> > [   16.975215]  [] do_page_fault+0x8/0x10
> > [   16.980386]  [] error_code+0x5a/0x60
> > [   16.985366]  [] ? __do_page_fault+0x4a0/0x4a0
> > [   16.991215]  [] ? print_modules+0x20/0x90
> > [   16.996673]  [] warn_slowpath_common+0x5a/0x80
> > [   17.002622]  [] ? vmalloc_fault+0x5a/0xcf
> > [   17.008086]  [] ? vmalloc_fault+0x5a/0xcf
> > [   17.013550]  [] ? __do_page_fault+0x4a0/0x4a0
> > [   17.019403]  [] warn_slowpath_null+0x1d/0x20
> > [   17.025159]  [] vmalloc_fault+0x5a/0xcf
> 
> Oh look, we are constantly warning about this same fault! There's your
> infinite loop.
Yes, it is the real WARN_ON infinite loop.

> 
> Note the WARN_ON_ONCE() does the WARN_ON() first and then updates
> __warned = true. Thus, if the 
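
The ordering described above (report first, set the flag afterwards) is what allows a "once" warning to repeat when the reporting path itself hits the same condition again. The following is a minimal userspace sketch of that effect, not kernel code: report() stands in for the warning path that faults again, and MAX_DEPTH is an assumed stand-in for the point at which the stack would overflow.

```c
#include <stdbool.h>

#define MAX_DEPTH 8	/* stand-in for running out of stack */

static int reports;	/* how many times the warning was printed */
static int depth;	/* current nesting of the faulting path */

static void faulting_path(bool set_flag_first);

/* Stand-in for the warning path: reporting triggers the fault again. */
static void report(bool set_flag_first)
{
	reports++;
	if (depth < MAX_DEPTH)
		faulting_path(set_flag_first);
}

static void faulting_path(bool set_flag_first)
{
	/* One "warned" flag per variant, so the two runs stay independent. */
	static bool warned_after, warned_before;

	depth++;
	if (set_flag_first) {
		/* Flag set before reporting: a nested fault sees it. */
		if (!warned_before) {
			warned_before = true;
			report(true);
		}
	} else {
		/* Report first, flag afterwards: nested faults warn again. */
		if (!warned_after) {
			report(false);
			warned_after = true;
		}
	}
	depth--;
}

/* Call at most once per variant; the flags persist like __warned. */
static int count_reports(bool set_flag_first)
{
	reports = 0;
	depth = 0;
	faulting_path(set_flag_first);
	return reports;
}
```

With the flag set only after reporting, the sketch warns at every nesting level until MAX_DEPTH is reached; with the flag set first, it warns exactly once.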

Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-15 Thread Chen Gong
On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote:
> Date: Tue, 15 Oct 2013 23:47:23 +0530
> From: "Naveen N. Rao" 
> To: "Chen, Gong" 
> Cc: tony.l...@intel.com, b...@alien8.de, linux-kernel@vger.kernel.org,
>  linux-a...@vger.kernel.org
> Subject: Re: [PATCH 2/8] ACPI, CPER: Update cper info
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On 2013/10/11 02:32AM, Chen Gong wrote:
> > To satisfy the necessary of following patches and make related definition
> > more clear, update some definitions about CPER. No functional changes.
> > 
> > Signed-off-by: Chen, Gong 
> > ---
> >  drivers/acpi/apei/apei-internal.h | 12 -
> >  drivers/acpi/apei/cper.c  | 46 -
> >  drivers/acpi/apei/ghes.c  | 54 
> > +++
> >  include/acpi/actbl1.h | 14 +-
> >  include/acpi/ghes.h   |  2 +-
> >  5 files changed, 64 insertions(+), 64 deletions(-)
> > 
> > diff --git a/drivers/acpi/apei/apei-internal.h 
> > b/drivers/acpi/apei/apei-internal.h
> > index f220d64..21ba34a 100644
> > --- a/drivers/acpi/apei/apei-internal.h
> > +++ b/drivers/acpi/apei/apei-internal.h
> > @@ -122,11 +122,11 @@ struct dentry;
> >  struct dentry *apei_get_debugfs_dir(void);
> >  
> >  #define apei_estatus_for_each_section(estatus, section)
> > \
> > -   for (section = (struct acpi_hest_generic_data *)(estatus + 1);  \
> > +   for (section = (struct acpi_generic_data *)(estatus + 1);   \
> 
> This is a good one to rename, though I wonder if acpi_generic_error_data
> is more appropriate?
> 
> >  (void *)section - (void *)estatus < estatus->data_length;  \
> >  section = (void *)(section+1) + section->error_data_length)
> >  
> > -static inline u32 apei_estatus_len(struct acpi_hest_generic_status 
> > *estatus)
> > +static inline u32 cper_estatus_len(struct acpi_generic_status *estatus)
> 
> Not sure I understand the rationale for these changes - we are still
> dealing with ACPI/APEI generic error status/data structures. So, why
> the cper_ prefix?
> 

Because CPER is not APEI-specific: besides APEI, other users such as
eMCA need it too.

> >  {
> > if (estatus->raw_data_length)
> > return estatus->raw_data_offset + \
> > @@ -135,10 +135,10 @@ static inline u32 apei_estatus_len(struct 
> > acpi_hest_generic_status *estatus)
> > return sizeof(*estatus) + estatus->data_length;
> >  }
> >  
> > -void apei_estatus_print(const char *pfx,
> > -   const struct acpi_hest_generic_status *estatus);
> > -int apei_estatus_check_header(const struct acpi_hest_generic_status 
> > *estatus);
> > -int apei_estatus_check(const struct acpi_hest_generic_status *estatus);
> > +void cper_estatus_print(const char *pfx,
> > +   const struct acpi_generic_status *estatus);
> > +int cper_estatus_check_header(const struct acpi_generic_status *estatus);
> > +int cper_estatus_check(const struct acpi_generic_status *estatus);
> 
> Same here. All the above functions work on ACPI structures...
> 
> >  /* Values for block_status flags above */
> >  
> > -#define ACPI_HEST_UNCORRECTABLE (1)
> > -#define ACPI_HEST_CORRECTABLE   (1<<1)
> > -#define ACPI_HEST_MULTIPLE_UNCORRECTABLE(1<<2)
> > -#define ACPI_HEST_MULTIPLE_CORRECTABLE  (1<<3)
> > -#define ACPI_HEST_ERROR_ENTRY_COUNT (0xFF<<4)  /* 8 bits, 
> > error count */
> > +#define ACPI_GEN_ERR_UC(1)
> > +#define ACPI_GEN_ERR_CE(1<<1)
> > +#define ACPI_GEN_ERR_MULTI_UC  (1<<2)
> > +#define ACPI_GEN_ERR_MULTI_CE  (1<<3)
> > +#define ACPI_GEN_ERR_COUNT_SHIFT   (0xFF<<4) /* 8 bits, error count */
> 
> I'd prefer ACPI_GENERIC_ERR_ since ACPI_GEN_ERR sounds far too much like
> ACPI "Generated" :)
> 
> 
> Thanks,
> Naveen
> 
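
The block_status defines quoted above pack four flag bits and an 8-bit error count into one word. A minimal sketch of decoding such a word follows, using the renamed macro values as they appear in the patch. The helper names and the separate GEN_ERR_COUNT_LSB constant are assumptions made for illustration and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the patch. */
#define ACPI_GEN_ERR_UC			(1)
#define ACPI_GEN_ERR_CE			(1 << 1)
#define ACPI_GEN_ERR_MULTI_UC		(1 << 2)
#define ACPI_GEN_ERR_MULTI_CE		(1 << 3)
#define ACPI_GEN_ERR_COUNT_SHIFT	(0xFF << 4)	/* a mask, despite the name */

/* Hypothetical: bit position where the error count starts. */
#define GEN_ERR_COUNT_LSB		4

/* Extract the 8-bit error count from block_status. */
static inline unsigned int gen_err_count(uint32_t block_status)
{
	return (block_status & ACPI_GEN_ERR_COUNT_SHIFT) >> GEN_ERR_COUNT_LSB;
}

/* True if either uncorrectable flag is set. */
static inline bool gen_err_has_uncorrectable(uint32_t block_status)
{
	return (block_status &
		(ACPI_GEN_ERR_UC | ACPI_GEN_ERR_MULTI_UC)) != 0;
}
```

Because the macro named ..._COUNT_SHIFT holds a mask (0xFF<<4) rather than a shift amount, a decoder written this way needs the bit position spelled out separately.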




Re: [PATCH] perf record: mmap output file - v2

2013-10-15 Thread Namhyung Kim
On Tue, 15 Oct 2013 07:35:53 -0600, David Ahern wrote:
> On 10/15/13 1:31 AM, Namhyung Kim wrote:
>> Hi David,
>>
>> On Mon, 14 Oct 2013 20:55:31 -0600, David Ahern wrote:
>>> When recording raw_syscalls for the entire system, e.g.,
>>>  perf record -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
>>>
>>> you end up with a negative feedback loop as perf itself calls
>>> write() fairly often. This patch handles the problem by mmap'ing the
>>> file in chunks of 64M at a time and copies events from the event buffers
>>> to the file avoiding write system calls.
>>>
>>> Before (with write syscall):
>>>
>>> perf record -o /tmp/perf.data -e raw_syscalls:*,sched:sched_switch -a -- 
>>> sleep 1
>>> [ perf record: Woken up 0 times to write data ]
>>> [ perf record: Captured and wrote 81.843 MB /tmp/perf.data (~3575786 
>>> samples) ]
>>>
>>> After (using mmap):
>>>
>>> perf record -o /tmp/perf.data -e raw_syscalls:*,sched:sched_switch -a -- 
>>> sleep 1
>>> [ perf record: Woken up 31 times to write data ]
>>> [ perf record: Captured and wrote 8.203 MB /tmp/perf.data (~358388 samples) 
>>> ]
>>
>> Why do they have that different size?
>
> perf calls write() for each mmap, each time through the loop. Each
> write generates 2 events (syscall entry + exit) -- ie., generates more
> events. That's the negative feedback loop.

Aha, okay.  So it mostly matters to syscall tracing, right?  For a
normal record session, it seems that the effect is not that large:

Before:

  $ perf stat -r3 --null --sync -- perf record -a -- sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.467 MB perf.data (~20420 samples) ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.544 MB perf.data (~23750 samples) ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.482 MB perf.data (~21073 samples) ]
  
   Performance counter stats for 'perf record -a -- sleep 5' (3 runs):
  
 5.174476094 seconds time elapsed   
   ( +-  0.07% )

  $ perf record -- perf bench sched pipe
  # Running sched/pipe benchmark...
  # Executed 1000000 pipe operations between two processes
  
   Total time: 21.271 [sec]
  
21.271357 usecs/op
47011 ops/sec
  [ perf record: Woken up 21 times to write data ]
  [ perf record: Captured and wrote 5.643 MB perf.data (~246524 samples) ]


After:

  $ perf stat -r3 --null --sync -- perf record -a -- sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.459 MB perf.data (~20055 samples) ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.463 MB perf.data (~20230 samples) ]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.467 MB perf.data (~20401 samples) ]
  
   Performance counter stats for 'perf record -a -- sleep 5' (3 runs):
  
 5.085910919 seconds time elapsed   
   ( +-  0.06% )

  $ perf record -- perf bench sched pipe
  # Running sched/pipe benchmark...
  # Executed 1000000 pipe operations between two processes
  
   Total time: 21.175 [sec]
  
21.175406 usecs/op
47224 ops/sec
  [ perf record: Woken up 21 times to write data ]
  [ perf record: Captured and wrote 5.612 MB perf.data (~245206 samples) ]


Thanks,
Namhyung


Re: [PATCH 5/8] ACPI, APEI, CPER: Add UEFI 2.4 support for memory error

2013-10-15 Thread Chen Gong
On Tue, Oct 15, 2013 at 10:56:25PM +0530, Naveen N. Rao wrote:
> Date: Tue, 15 Oct 2013 22:56:25 +0530
> From: "Naveen N. Rao" 
> To: "Chen, Gong" 
> Cc: tony.l...@intel.com, b...@alien8.de, linux-kernel@vger.kernel.org,
>  linux-a...@vger.kernel.org
> Subject: Re: [PATCH 5/8] ACPI, APEI, CPER: Add UEFI 2.4 support for memory
>  error
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On 2013/10/11 02:32AM, Chen Gong wrote:
> > In latest UEFI spec(by now it is 2.4) memory error definition
> > for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
> > adds some new fields. These fields help people to locate
> > memory error on actual DIMM location.
> > 
> > Original-author: Tony Luck 
> > Signed-off-by: Chen, Gong 
> > ---
> >  drivers/acpi/apei/cper.c | 3 ++-
> >  include/linux/cper.h | 7 +++
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
> > index b2e4134..680230c 100644
> > --- a/drivers/acpi/apei/cper.c
> > +++ b/drivers/acpi/apei/cper.c
> > @@ -8,7 +8,7 @@
> >   * various tables, such as ERST, BERT and HEST etc.
> >   *
> >   * For more information about CPER, please refer to Appendix N of UEFI
> > - * Specification version 2.3.
> > + * Specification version 2.4.
> >   *
> >   * This program is free software; you can redistribute it and/or
> >   * modify it under the terms of the GNU General Public License version
> > @@ -191,6 +191,7 @@ static const char *cper_mem_err_type_strs[] = {
> > "memory sparing",
> > "scrub corrected error",
> > "scrub uncorrected error",
> > +   "Physical Memory Map-out event",
> 
> All small letters to match the rest of the items:
> "physical memory map-out event"
> 

sure, of course.

> >  };
> >  
> >  static void cper_print_mem(const char *pfx, const struct cper_sec_mem_err 
> > *mem)
> > diff --git a/include/linux/cper.h b/include/linux/cper.h
> > index c230494..bd01c9a 100644
> > --- a/include/linux/cper.h
> > +++ b/include/linux/cper.h
> > @@ -232,6 +232,9 @@ enum {
> >  #define CPER_MEM_VALID_RESPONDER_ID0x1000
> >  #define CPER_MEM_VALID_TARGET_ID   0x2000
> >  #define CPER_MEM_VALID_ERROR_TYPE  0x4000
> > +#define CPER_MEM_VALID_RANK_NUMBER 0x8000
> > +#define CPER_MEM_VALID_CARD_HANDLE 0x10000
> > +#define CPER_MEM_VALID_MODULE_HANDLE   0x20000
> >  
> >  #define CPER_PCIE_VALID_PORT_TYPE  0x0001
> >  #define CPER_PCIE_VALID_VERSION0x0002
> > @@ -347,6 +350,10 @@ struct cper_sec_mem_err {
> > __u64   responder_id;
> > __u64   target_id;
> > __u8error_type;
> > +   __u8reserved;
> > +   __u16   rank;
> > +   __u16   mem_array_handle;
> > +   __u16   mem_dev_handle;
> 
> Nit: could you name those fields similar to what the spec has:
> card_handle and module_handle, with perhaps a comment to indicate
> relationship to SMBIOS type 16/17 tables?
> 
> 
On the contrary, I would rather keep these names and add comments
noting what they correspond to in the spec. I think a meaningful
name is more useful than following the spec strictly.

> Regards,
> Naveen
> 
> >  };
> >  
> >  struct cper_sec_pcie {
> > -- 
> > 1.8.4.rc3
> > 
> 




[PATCH v2] pwm: add ep93xx PWM support

2013-10-15 Thread H Hartley Sweeten
Remove the non-standard EP93xx PWM driver in drivers/misc and add
a new driver for the PWM controllers on the EP93xx platform based
on the PWM framework.

These PWM controllers each support 1 PWM channel with programmable
duty cycle, frequency, and polarity inversion.

Signed-off-by: H Hartley Sweeten 
Cc: Ryan Mallon 
Cc: Thierry Reding 
Cc: Arnd Bergmann 
Cc: Greg Kroah-Hartman 
---
v2: address issues pointed out by Thierry Reding.

 drivers/misc/Kconfig  |  13 ---
 drivers/misc/Makefile |   1 -
 drivers/misc/ep93xx_pwm.c | 286 --
 drivers/pwm/Kconfig   |   9 ++
 drivers/pwm/Makefile  |   1 +
 drivers/pwm/pwm-ep93xx.c  | 230 +
 6 files changed, 240 insertions(+), 300 deletions(-)
 delete mode 100644 drivers/misc/ep93xx_pwm.c
 create mode 100644 drivers/pwm/pwm-ep93xx.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 8dacd4c..c43c66a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -381,19 +381,6 @@ config HMC6352
  This driver provides support for the Honeywell HMC6352 compass,
  providing configuration and heading data via sysfs.
 
-config EP93XX_PWM
-   tristate "EP93xx PWM support"
-   depends on ARCH_EP93XX
-   help
- This option enables device driver support for the PWM channels
- on the Cirrus EP93xx processors.  The EP9307 chip only has one
- PWM channel all the others have two, the second channel is an
- alternate function of the EGPIO14 pin.  A sysfs interface is
- provided to control the PWM channels.
-
- To compile this driver as a module, choose M here: the module will
- be called ep93xx_pwm.
-
 config DS1682
tristate "Dallas DS1682 Total Elapsed Time Recorder with Alarm"
depends on I2C
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index c235d5b..ecccd00 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -33,7 +33,6 @@ obj-$(CONFIG_APDS9802ALS) += apds9802als.o
 obj-$(CONFIG_ISL29003) += isl29003.o
 obj-$(CONFIG_ISL29020) += isl29020.o
 obj-$(CONFIG_SENSORS_TSL2550)  += tsl2550.o
-obj-$(CONFIG_EP93XX_PWM)   += ep93xx_pwm.o
 obj-$(CONFIG_DS1682)   += ds1682.o
 obj-$(CONFIG_TI_DAC7512)   += ti_dac7512.o
 obj-$(CONFIG_C2PORT)   += c2port/
diff --git a/drivers/misc/ep93xx_pwm.c b/drivers/misc/ep93xx_pwm.c
deleted file mode 100644
index cdb67a9..000
--- a/drivers/misc/ep93xx_pwm.c
+++ /dev/null
@@ -1,286 +0,0 @@
-/*
- *  Simple PWM driver for EP93XX
- *
- * (c) Copyright 2009  Matthieu Crapet 
- * (c) Copyright 2009  H Hartley Sweeten 
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- *  EP9307 has only one channel:
- *- PWMOUT
- *
- *  EP9301/02/12/15 have two channels:
- *- PWMOUT
- *- PWMOUT1 (alternate function for EGPIO14)
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-#define EP93XX_PWMx_TERM_COUNT 0x00
-#define EP93XX_PWMx_DUTY_CYCLE 0x04
-#define EP93XX_PWMx_ENABLE 0x08
-#define EP93XX_PWMx_INVERT 0x0C
-
-#define EP93XX_PWM_MAX_COUNT   0x
-
-struct ep93xx_pwm {
-   void __iomem*mmio_base;
-   struct clk  *clk;
-   u32 duty_percent;
-};
-
-/*
- * /sys/devices/platform/ep93xx-pwm.N
- *   /min_freq  read-only   minimum pwm output frequency
- *   /max_req   read-only   maximum pwm output frequency
- *   /freq  read-write  pwm output frequency (0 = disable output)
- *   /duty_percent  read-write  pwm duty cycle percent (1..99)
- *   /invertread-write  invert pwm output
- */
-
-static ssize_t ep93xx_pwm_get_min_freq(struct device *dev,
-   struct device_attribute *attr, char *buf)
-{
-   struct platform_device *pdev = to_platform_device(dev);
-   struct ep93xx_pwm *pwm = platform_get_drvdata(pdev);
-   unsigned long rate = clk_get_rate(pwm->clk);
-
-   return sprintf(buf, "%ld\n", rate / (EP93XX_PWM_MAX_COUNT + 1));
-}
-
-static ssize_t ep93xx_pwm_get_max_freq(struct device *dev,
-   struct device_attribute *attr, char *buf)
-{
-   struct platform_device *pdev = to_platform_device(dev);
-   struct ep93xx_pwm *pwm = platform_get_drvdata(pdev);
-   unsigned long rate = clk_get_rate(pwm->clk);
-
-   return sprintf(buf, "%ld\n", rate / 2);
-}
-
-static ssize_t ep93xx_pwm_get_freq(struct device *dev,
-   struct device_attribute *attr, char *buf)
-{
-   struct platform_device *pdev = to_platform_device(dev);
-   struct ep93xx_pwm *pwm = platform_get_drvdata(pdev);
-
-   if (readl(pwm->mmio_base + EP93XX_PWMx_ENABLE) & 0x1) {
-   unsigned long rate = 

[Announce] sg3_utils-1.37 available

2013-10-15 Thread Douglas Gilbert

sg3_utils is a package of command line utilities for sending
SCSI and some ATA commands to devices. This package targets
the Linux 3, 2.6 and 2.4 kernel series. It also has ports to
FreeBSD, Tru64, Solaris, and Windows (cygwin and MinGW).

This version contains many fixes, some code cleanup and some
small extensions. This version tracks various changes made
by www.t10.org since May 2013.

For an overview of sg3_utils and downloads see this page:
http://sg.danny.cz/sg/sg3_utils.html
The sg_ses utility (for enclosure devices) is discussed at:
http://sg.danny.cz/sg/sg_ses.html
The SG_IO ioctl is discussed at:
http://sg.danny.cz/sg/sg_io.html
A full changelog can be found at:
http://sg.danny.cz/sg/p/sg3_utils.ChangeLog

A release announcement will be sent to freecode.com .

Changelog for sg3_utils-1.37 [20131014] [svn: r522]
  - sg_compare_and_write: fix wrprotect setting
    - add --quiet option to suppress miscompare report
    - merge features from another implementation
  - sg_inq: fix referrals VPD page
    - dev_id VPD: T10 vendor id designator clean up
  - sg_logs: improve for tape drives, general cleanup
  - sg_persist: fix core dump on -Q option
  - sg_unmap: fix core dump on -g option
  - sg_vpd: dev_id VPD: T10 vendor id designator clean up
    - clean up dev_id NAA-3: locally assigned
  - sg_ses: add --nickname and --nickid options
    - eiioe added to additional element status page (ses3r6)
    - multiple --filter options to prune output
  - sg_verify: improve miscompare handling
    - rename --btychk=ndo option to --ndo=ndo (hide former)
    - add --quiet option
  - sg_xcopy: allow sg and bsg devices
    - fix for bpt going negative
    - limit each XCOPY(LID1) command to 65535 blocks
    - fix for seek in multi-segment copies
  - sg_sanitize: skip 15 second safety delay with --fail
  - sg_libs: extended copy opcode renamed (spc4r34)
    - sg_ll_receive_copy_results(): expand for all sa_s
    - add sg_get_sense_key()
    - add sg_ll_3party_copy_out()
    - add dStrHexErr(): ascii hex to stderr
    - add dStrHexStr(): ascii hex to string
    - add SG_LIB_CAT_MISCOMPARE to categories
    - clean header files
  - sg_pt_freebsd: sanity check on sense_resid; fix leaks
  - scripts/rescan-scsi-bus.sh KG's v1.57 + HR patch
    - improve wlun handling, detect updated and resized
      devices, better multipath support
  - Makefile.am cleanup
  - examples: add sg_tst_excl and sg_tst_excl2

Changelog for sg3_utils-1.36 [20130531] [svn: r497]
...

Doug Gilbert
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT] Apparmor bugfixes for 3.12

2013-10-15 Thread James Morris
A couple more regressions fixed, please pull.

The following changes since commit 34ec4de42be5006abdd8d0c08b306ffaa64d0d5d:

  Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux 
(2013-10-15 17:14:13 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
for-linus

John Johansen (2):
  apparmor: fix memleak of the profile hash
  apparmor: fix bad lock balance when introspecting policy

 security/apparmor/apparmorfs.c |4 +---
 security/apparmor/policy.c |1 +
 2 files changed, 2 insertions(+), 3 deletions(-)

---

commit ed2c7da3a40c58410508fe24e12d03e508d7ec01
Author: John Johansen 
Date:   Mon Oct 14 11:46:27 2013 -0700

apparmor: fix bad lock balance when introspecting policy

BugLink: http://bugs.launchpad.net/bugs/1235977

The profile introspection seq file has a locking bug when policy is viewed
from a virtual root (task in a policy namespace); introspection from the
real root is not affected.

The test for root
while (parent) {
is correct for the real root, but incorrect for tasks in a policy namespace.
This allows the task to walk back up the policy tree past its virtual root,
causing the virtual root to be unlocked before it should be, in the p_stop
fn.

This results in the following lockdep back trace:
[   78.479744] [ BUG: bad unlock balance detected! ]
[   78.479792] 3.11.0-11-generic #17 Not tainted
[   78.479838] -
[   78.479885] grep/2223 is trying to release lock (>lock) at:
[   78.479952] [] mutex_unlock+0xe/0x10
[   78.480002] but there are no more locks to release!
[   78.480037]
[   78.480037] other info that might help us debug this:
[   78.480037] 1 lock held by grep/2223:
[   78.480037]  #0:  (>lock){+.+.+.}, at: [] 
seq_read+0x3d/0x3d0
[   78.480037]
[   78.480037] stack backtrace:
[   78.480037] CPU: 0 PID: 2223 Comm: grep Not tainted 3.11.0-11-generic #17
[   78.480037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   78.480037]  817bf3be 880007763d60 817b97ef 
8800189d2190
[   78.480037]  880007763d88 810e1c6e 88001f044730 
8800189d2190
[   78.480037]  817bf3be 880007763e00 810e5bd6 
000724fe56b7
[   78.480037] Call Trace:
[   78.480037]  [] ? mutex_unlock+0xe/0x10
[   78.480037]  [] dump_stack+0x54/0x74
[   78.480037]  [] print_unlock_imbalance_bug+0xee/0x100
[   78.480037]  [] ? mutex_unlock+0xe/0x10
[   78.480037]  [] lock_release_non_nested+0x226/0x300
[   78.480037]  [] ? __mutex_unlock_slowpath+0xce/0x180
[   78.480037]  [] ? mutex_unlock+0xe/0x10
[   78.480037]  [] lock_release+0xac/0x310
[   78.480037]  [] __mutex_unlock_slowpath+0x83/0x180
[   78.480037]  [] mutex_unlock+0xe/0x10
[   78.480037]  [] p_stop+0x51/0x90
[   78.480037]  [] seq_read+0x288/0x3d0
[   78.480037]  [] vfs_read+0x9e/0x170
[   78.480037]  [] SyS_read+0x4c/0xa0
[   78.480037]  [] system_call_fastpath+0x1a/0x1f

Signed-off-by: John Johansen 
Signed-off-by: James Morris 

diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 95c2b26..7db9954 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -580,15 +580,13 @@ static struct aa_namespace *__next_namespace(struct 
aa_namespace *root,
 
/* check if the next ns is a sibling, parent, gp, .. */
parent = ns->parent;
-   while (parent) {
+   while (ns != root) {
mutex_unlock(&ns->lock);
next = list_entry_next(ns, base.list);
if (!list_entry_is_head(next, &parent->sub_ns, base.list)) {
mutex_lock(&next->lock);
return next;
}
-   if (parent == root)
-   return NULL;
ns = parent;
parent = parent->parent;
}
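
[Editor's illustration, not part of the pull request.] The effect of changing the loop test in the hunk above can be sketched in userspace C. This is a simplified model under stated assumptions: `struct ns_node` and both function names are hypothetical, the sibling check is left out, and "unlocking" is modelled by remembering the last node the walk would have unlocked. It assumes `root` is `ns` itself or one of its ancestors.

```c
#include <stddef.h>

/* Hypothetical, simplified node; not the kernel's struct aa_namespace. */
struct ns_node {
    struct ns_node *parent;
};

/*
 * Model of the old walk: loop while a parent exists, and bail out only
 * when the parent is the root. Returns the last node it would unlock.
 */
static const struct ns_node *last_unlocked_old(const struct ns_node *ns,
                                               const struct ns_node *root)
{
    const struct ns_node *last = NULL;
    const struct ns_node *parent = ns->parent;

    while (parent) {
        last = ns;              /* stands in for the unlock of ns */
        if (parent == root)
            return last;
        ns = parent;
        parent = parent->parent;
    }
    return last;
}

/*
 * Model of the new walk: stop as soon as ns is the root, so the root
 * itself is never unlocked here.
 */
static const struct ns_node *last_unlocked_new(const struct ns_node *ns,
                                               const struct ns_node *root)
{
    const struct ns_node *last = NULL;

    while (ns != root) {
        last = ns;              /* stands in for the unlock of ns */
        ns = ns->parent;
    }
    return last;
}
```

In this model, starting at a virtual root that has a real parent, the old walk unlocks the virtual root itself, while the new walk unlocks nothing and leaves that to the caller. Both behave the same when started from a child of the root or from the real root.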

commit 5cb3e91ebd0405519795f243adbfc4ed2a6fe53f
Author: John Johansen 
Date:   Mon Oct 14 11:44:34 2013 -0700

apparmor: fix memleak of the profile hash

BugLink: http://bugs.launchpad.net/bugs/1235523

This fixes the following kmemleak trace:
unreferenced object 0x8801e8c35680 (size 32):
  comm "apparmor_parser", pid 691, jiffies 4294895667 (age 13230.876s)
  hex dump (first 32 bytes):
e0 d3 4e b5 ac 6d f4 ed 3f cb ee 48 1c fd 40 cf  ..N..m..?..H..@.
5b cc e9 93 00 00 00 00 00 00 00 00 00 00 00 00  [...
  backtrace:
[] kmemleak_alloc+0x4e/0xb0
[] __kmalloc+0x103/0x290
[] aa_calc_profile_hash+0x6c/0x150
[] aa_unpack+0x39d/0xd50
[] aa_replace_profiles+0x3d/0xd80
[] profile_replace+0x37/0x50
[] vfs_write+0xbd/0x1e0
[] SyS_write+0x4c/0xa0
[] 

Re: [PATCH v2 2/2] x86, apic: Disable BSP if boot cpu is AP

2013-10-15 Thread HATAYAMA Daisuke

(2013/10/16 4:30), Vivek Goyal wrote:

On Tue, Oct 15, 2013 at 02:43:27PM +0900, HATAYAMA Daisuke wrote:

Currently, on x86 architecture, if crash happens on AP in the kdump
1st kernel, the 2nd kernel fails to wake up multiple CPUs. The typical
behaviour we actually see is immediate system reset or hang.

This comes from the hardware specification that the processor with the BSP
flag jumps to BIOS init code when receiving INIT; the behaviour we
then see depends on the init code.

This never happens if we use only one cpu in the 2nd kernel. So, we
have avoided the issue by the workaround of specifying maxcpus=1 or
nr_cpus=1 in the kernel parameters of the 2nd kernel.

In order to address the issue, this patch disables BSP if boot cpu is
an AP, and thus we don't try to wake up the BSP by sending INIT.

Before this idea we discussed the following two ideas, but we cannot
adopt either of them, for the reasons given below:

1. Switch CPU from AP to BSP via IPI NMI at crash in the 1st kernel

This is done in the kdump crash path where logic is in inconsistent
state. Any part of memory can be corrupted, including
hardware-related table being accessed for example when paging is
performed or interruption is performed.

2. Unset BSP flag of the boot cpu in the 1st kernel

Unsetting BSP flag can affect some real world firmware badly. For
example, Ma verified that some HP systems fail to reboot under this
configuration. See:

http://lkml.indiana.edu/hypermail/linux/kernel/1308.1/03574.html

Because idea 1 is ruled out, we have to address the issue in the 2nd kernel
on AP. Then, it's impossible to know which CPU is BSP by rdmsr
instruction because the CPU is the one we are now trying to wake
up. For the same reason, it's also impossible to unset the BSP flag of
the BSP by wrmsr instruction.

Next, because idea 2 is ruled out, BSP is halting in the 1st kernel while
keeping the BSP flag set (or possibly could be running somewhere in
catastrophic state). In general, CPUs except for the boot cpu in the
2nd kernel -- the cpu on which the crash happened -- can be thought of
as remaining in some inconsistent state from the 1st kernel. For APs,
it's possible to recover sane state by sending INIT to them; see
3.7.3 Processor-specific INIT in the MultiProcessor
specification. However, there's no such way for BSP. Therefore, there's no
other way than to disable BSP.

My motivation is to generate crash dump quickly on the system with
huge memory. We can assume such a system also has many cpus, say N, of
which N-1 are still available.

To identify which CPU is BSP, we look up the ACPI table or the MP table. One
concern is that the ACPI guideline says BIOS *should* list the BSP in the first
MADT LAPIC entry; not *must*. In this sense, this logic relies on BIOS
following ACPI's guideline. On the other hand, we don't need to worry
about this in the MP table case because it has an explicit BSP flag.

To avoid any undesirable behaviour caused by any broken BIOS that
doesn't conform to the guideline, it's enough to limit the number of
cpus to 1 by specifying maxcpus=1 or nr_cpus=1, as is currently done in
default kdump configuration. (But of course, it's problematic in
the maxcpus=1 case if trying to wake up other cpus later in user space.)

SFI and devicetree don't provide BSP information, so there's no
functionality change in their code, only assigning false for all the
entries, keeping the interface uniform.


Hi Hatayama,

So we rely on ACPI reporting BSP properly. And SFI and device tree do
not provide BSP info. So for those cases where BSP is not
reported, the situation is a little dicey. We might try to bring up those cpus
and bring down the system.



Yes, I intend that. If there's no BIOS facility reporting BSP information
in the system, maxcpus=1 or nr_cpus=1 should be specified, just as has been
done so far.


I am wondering if there is any attribute of cpu which we can pass to
second kernel on command line. And tell second kernel not to bring up
that specific cpu. (Say exclude_cpu=)? If this works, then
if ACPI or other mechanism don't report BSP, we could possibly assume
that cpu 0 is BSP and ask second kernel to not try to boot it.



I've come up with a similar idea. If there's such a kernel option, the rest of
the processing can be implemented in user-land, i.e., get the apicid of
BSP from /proc/cpuinfo and set it in the kernel command line of the 2nd kernel.
What should kexec-tools do on fedora/RHEL? Also, this idea covers SFI
and device tree.

The first reason why I didn't choose that idea was that passing the value
via command-line seems rather ad-hoc. The second reason is that in any
case it's a compromised design. Rigorously, we cannot get a correct mapping
of apicid to {BSP, APIC} at the 1st kernel. That is, there's a class of
bugs that affect the BSP flag of each processor. For example, in a
catastrophic state, all the cpus can have the BSP flag in the 2nd kernel due
to wrmsr instructions executed by the bug causing the crash. In this sense,
the current implementation is less reliable than the maxcpus=1 case.

If addressing this 

Re: [PATCH] perf record: mmap output file - v2

2013-10-15 Thread Namhyung Kim
Hi David,

On Tue, 15 Oct 2013 07:25:04 -0600, David Ahern wrote:
> On 10/15/13 1:09 AM, Namhyung Kim wrote:
>>> The stat() seems superfluous, here in __cmd_record() we've just checked
>>> the output_name and made sure it exists. Can that stat() call ever fail?
>>
>> AFAICS it's needed to check current file size.  But I think it's better
>> to use fstat().
>
> Sure fstat could be used over stat -- if it ends up staying.
>
>
>>>
>>> 3)
>>>
>>> The rec->bytes_at_mmap_start field feels a bit weird. If I read the code
>>> correctly, in every 'perf record' invocation, rec->bytes_written starts at
>>> 0 - i.e. we don't have repeat invocations of cmd_record().
>>
>> rec->bytes_written is updated when it writes to the output file for
>> synthesizing COMM/MMAP events (this mmap output is not used at that time).
>
> Ingo: I went through a number of iterations before using the
> bytes_at_mmap_start. One of those was to use the bytes_written
> counter. All failed. Header + synthesized events are written to the
> file before we start farming the ring buffers.
>
> Perhaps a good code cleanup will help figure out why. I needed the
> functionality ASAP for use with perf-trace -a so I stuck with the new
> variable. Since this change is working out well, I will look at a code
> clean up on the next round.

session->header.data_offset ?

>
> I am traveling to LinuxCon / KVM Forum / Tracing Forum on
> Friday. Perhaps the clean up and followup patch can be done on the
> long plane ride; more likely when I return which means 3.14 material.

See you there :)

>
>> Actually I worried about the mmap offset not being aligned to page
>> size.  But it seems that's not a problem.
>
> This code snippet makes sure the mmap offset is a multiple of 64M
> (rec->mmap_size). offset is the argument to mmap; mmap_offset is
> where we are within the mmap for the next copy:
>
> + offset = rec->bytes_at_mmap_start + rec->bytes_written;
> + if (offset < (ssize_t) rec->mmap_size) {
> + rec->mmap_offset = offset;
> + offset = 0;
> + } else
> + rec->mmap_offset = 0;
>

Oh, I overlooked this code; it does align the offset.

Thanks,
Namhyung
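
[Editor's illustration, not part of the thread.] The branch quoted above can be read as splitting one running byte count into two values: the file offset handed to mmap() and the position inside the mapping. A minimal userspace sketch in C, assuming the three inputs correspond to rec->bytes_at_mmap_start, rec->bytes_written and rec->mmap_size; `struct map_pos` and `first_map_pos()` are hypothetical names used only here.

```c
#include <stdint.h>

/* Hypothetical holder for the two results of the quoted branch. */
struct map_pos {
    uint64_t file_offset; /* value that would be passed to mmap() */
    uint64_t map_offset;  /* where the next copy lands in the mapping */
};

/* Mirrors the quoted if/else and nothing more. */
static struct map_pos first_map_pos(uint64_t bytes_at_mmap_start,
                                    uint64_t bytes_written,
                                    uint64_t mmap_size)
{
    struct map_pos pos;
    uint64_t offset = bytes_at_mmap_start + bytes_written;

    if (offset < mmap_size) {
        /* Still inside the first window: map from 0, write at offset. */
        pos.map_offset = offset;
        pos.file_offset = 0;
    } else {
        /* Past the first window: offset is used as the mmap argument. */
        pos.map_offset = 0;
        pos.file_offset = offset;
    }
    return pos;
}
```

Note that in the else branch the sketch passes the sum through unchanged, exactly as the quoted lines do, so whether that value is a multiple of the map size depends on what the caller has accumulated.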


RE: [PATCH] pwm: add ep93xx PWM support

2013-10-15 Thread Hartley Sweeten
On Tuesday, October 15, 2013 3:40 AM, Thierry Reding wrote:
> On Mon, Oct 14, 2013 at 02:57:48PM -0700, H Hartley Sweeten wrote:
>> Remove the non-standard EP93xx pwm driver in drivers/misc and add
>
> pwm -> PWM

OK

>> a new driver for the PWM chips on the EP93xx platforms based on the
>> PWM framework.
>> 
>> These PWM chips each support 1 PWM channel with programmable duty
>
> Perhaps "chips" -> "controllers"?

OK

>> cycle, frequency, and polarity inversion.
>> 
>> Signed-off-by: H Hartley Sweeten 
>> Cc: Ryan Mallon 
>> Cc: Thierry Reding 
>> Cc: Arnd Bergmann 
>> Cc: Greg Kroah-Hartman 
>> 
>> diff --git a/drivers/misc/ep93xx_pwm.c b/drivers/misc/ep93xx_pwm.c
> [...]
>> - *  (c) Copyright 2009  Matthieu Crapet 
>> - *  (c) Copyright 2009  H Hartley Sweeten 
> [...]
>> -MODULE_AUTHOR("Matthieu Crapet , "
>> -  "H Hartley Sweeten ");
> [...]
>
>> diff --git a/drivers/pwm/pwm-ep93xx.c b/drivers/pwm/pwm-ep93xx.c
> [...]
>> + * Copyright (C) 2013 H Hartley Sweeten 
> [...]
>> +MODULE_AUTHOR("H Hartley Sweeten ");
>
> Why are you removing Matthieu from the list of authors and copyright
> here? From a brief look it seems like this new driver is still based on
> code from the old driver and not a complete rewrite.

My bad. It is based on the misc driver but I forgot to put Matthieu in as
one of the original authors when I wrote it.

I'll fix that.

>> +#include   /* for ep93xx_pwm_{acquire,release}_gpio() */
>
> I'm not sure how well that will play together with multiplatform support
> but perhaps that's not an issue for ep93xx?

For multiplatform it would probably be a problem. But I don't think anyone
would be including ep93xx in a multiplatform kernel. If the problem comes up
I'll figure out some way to deal with it, probably with a pinctrl driver.

>> +static int ep93xx_pwm_request(struct pwm_chip *chip, struct pwm_device *pwm)
>> +{
>> +struct platform_device *pdev = to_platform_device(chip->dev);
>> +
>> +return ep93xx_pwm_acquire_gpio(pdev);
>> +}
>> +
>> +static void ep93xx_pwm_free(struct pwm_chip *chip, struct pwm_device *pwm)
>> +{
>> +struct platform_device *pdev = to_platform_device(chip->dev);
>> +
>> +ep93xx_pwm_release_gpio(pdev);
>> +}
>
> This looks like it would belong in the domain of pinctrl, but I suspect
> that ep93xx doesn't support that.

It should be but I have not worked out how to support EP93xx GPIOs with a
pinctrl driver yet. The GPIOs are pretty limited on this platform compared to
the other pinctrl users.

>> +static int ep93xx_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>> + int duty_ns, int period_ns)
>> +{
>> +struct ep93xx_pwm *ep93xx_pwm = to_ep93xx_pwm(chip);
>> +void __iomem *base = ep93xx_pwm->base;
>> +unsigned long long c;
>> +unsigned long period_cycles;
>> +unsigned long duty_cycles;
>> +unsigned long term;
>> +int ret = 0;
>> +
>> +/*
>> + * The clock needs to be enabled to access the PWM registers.
>> + * Configuration can be changed at any time.
>> + */
>> +if (!test_bit(PWMF_ENABLED, &pwm->flags))
>> +clk_enable(ep93xx_pwm->clk);
>
> clk_enable() can fail, so you should check the return value and
> propagate errors.

I overlooked that. This will be fixed in the next version.
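
[Editor's illustration, not part of the thread.] The review point about checking clk_enable() can be sketched in plain C. Everything below is a userspace stand-in: `struct clk`, `clk_enable()` and `clk_disable()` are mocks that only copy the kernel convention of returning 0 on success and a negative errno on failure, and `pwm_config_sketch()` is a hypothetical name. Disabling the clock again at the end is an assumption about how the config path is balanced, not something shown in the quoted patch.

```c
#include <errno.h>

/* Mock clock: 'fail' forces an error, 'enabled' counts enable calls. */
struct clk {
    int fail;
    int enabled;
};

/* Mock with the kernel's return convention: 0 or a negative errno. */
static int clk_enable(struct clk *clk)
{
    if (clk->fail)
        return -EIO;
    clk->enabled++;
    return 0;
}

static void clk_disable(struct clk *clk)
{
    clk->enabled--;
}

/* Hypothetical config path: enable the clock only if the PWM is off. */
static int pwm_config_sketch(struct clk *clk, int already_enabled)
{
    int ret;

    if (!already_enabled) {
        ret = clk_enable(clk);
        if (ret)
            return ret; /* propagate the error instead of ignoring it */
    }

    /* ... register writes would happen here ... */

    if (!already_enabled)
        clk_disable(clk);
    return 0;
}
```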

>> +static int ep93xx_pwm_polarity(struct pwm_chip *chip, struct pwm_device *pwm,
>> +   enum pwm_polarity polarity)
>> +{
>> +struct ep93xx_pwm *ep93xx_pwm = to_ep93xx_pwm(chip);
>> +
>> +/*
>> + * The clock needs to be enabled to access the PWM registers.
>> + * Polarity can only be changed when the PWM is disabled.
>> +  */
>
> Nit: the closing */ is wrongly aligned.

OK

>> +clk_enable(ep93xx_pwm->clk);
>
> Needs a check of the return value.

OK

>> +writew(polarity, ep93xx_pwm->base + EP93XX_PWMx_INVERT);
>
> I'd prefer if this did some explicit conversion from the PWM framework
> value to the driver-specific value, even if they happen to be the same
> in this case.

OK

>> +static int ep93xx_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm)
>> +{
>> +struct ep93xx_pwm *ep93xx_pwm = to_ep93xx_pwm(chip);
>> +
>> +clk_enable(ep93xx_pwm->clk);
>
> Also needs to check the return value.

OK

>> +static struct pwm_ops ep93xx_pwm_ops = {
>
> static const, please.

OK

>> +static int ep93xx_pwm_remove(struct platform_device *pdev)
>> +{
>> +struct ep93xx_pwm *ep93xx_pwm;
>> +
>> +ep93xx_pwm = platform_get_drvdata(pdev);
>> +if (!ep93xx_pwm)
>> +return -ENODEV;
>
> No need for this check. It will never happen.

OK

>> +
>> +return pwmchip_remove(&ep93xx_pwm->chip);
>> +}
>> +
>> +static struct platform_driver ep93xx_pwm_driver = {
>> +.driver = {
>> +.name   = "ep93xx-pwm",
>> +.owner  = THIS_MODULE,
>
> This is no longer required because the core sets it to the proper value.

OK

>> +},
>> +.probe  = ep93xx_pwm_probe,
>> +.remove 

[PATCH] ext4: fix performance regression in ext4_writepages

2013-10-15 Thread Ming Lei
Commit 4e7ea81db5 (ext4: restructure writeback path) introduces
another performance regression on random write:

- one more page may be added to ext4 extent in mpage_prepare_extent_to_map,
  and will be submitted for I/O so nr_to_write will become -1 before 'done'
  is set

- worse, dirty pages may still be retrieved from the page
  cache after nr_to_write becomes negative, so lots of small chunks can be
  submitted to the block device when page writeback is catching up with the
  write path, and performance suffers.

On one ARM A15 board with a SATA 3.0 SSD (CPU: 1.5GHz dual core, RAM: 2GB,
SATA controller: 3.0Gbps), this patch can improve the result of the test
below from 157MB/sec to 174MB/sec (>10%):

dd if=/dev/zero of=./z.img bs=8K count=512K

The above test is actually a prototype of the block write test in the
bonnie++ utility.

This patch makes sure no more pages than nr_to_write can be added to extent
for mapping, so that nr_to_write won't become negative.

Cc: Ted Tso 
Cc: linux-e...@vger.kernel.org
Cc: "linux-fsde...@vger.kernel.org" 
Acked-by: Jan Kara 
Signed-off-by: Ming Lei 
---
 fs/ext4/inode.c |   26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 32c04ab..32beaa4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2295,6 +2295,7 @@ static int mpage_prepare_extent_to_map(struct 
mpage_da_data *mpd)
struct address_space *mapping = mpd->inode->i_mapping;
struct pagevec pvec;
unsigned int nr_pages;
+   long left = mpd->wbc->nr_to_write;
pgoff_t index = mpd->first_page;
pgoff_t end = mpd->last_page;
int tag;
@@ -2330,6 +2331,17 @@ static int mpage_prepare_extent_to_map(struct 
mpage_da_data *mpd)
if (page->index > end)
goto out;
 
+   /*
+* Accumulated enough dirty pages? This doesn't apply
+* to WB_SYNC_ALL mode. For integrity sync we have to
+* keep going because someone may be concurrently
+* dirtying pages, and we might have synced a lot of
+* newly appeared dirty pages, but have not synced all
+* of the old dirty pages.
+*/
+   if (mpd->wbc->sync_mode == WB_SYNC_NONE && left <= 0)
+   goto out;
+
/* If we can't merge this page, we are done. */
if (mpd->map.m_len > 0 && mpd->next_page != page->index)
goto out;
@@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct 
mpage_da_data *mpd)
if (err <= 0)
goto out;
err = 0;
-
-   /*
-* Accumulated enough dirty pages? This doesn't apply
-* to WB_SYNC_ALL mode. For integrity sync we have to
-* keep going because someone may be concurrently
-* dirtying pages, and we might have synced a lot of
-* newly appeared dirty pages, but have not synced all
-* of the old dirty pages.
-*/
-   if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
-   mpd->next_page - mpd->first_page >=
-   mpd->wbc->nr_to_write)
-   goto out;
+   left--;
}
pagevec_release(&pvec);
cond_resched();
-- 
1.7.9.5
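
[Editor's illustration, not part of the patch.] The budget check that the patch moves to the top of the per-page loop can be modelled in a few lines of C. This is a sketch under assumptions: `pages_added()` is a hypothetical function, the page cache lookup is reduced to a simple count of dirty pages, and `sync_all` stands in for the WB_SYNC_ALL mode that the comment in the patch exempts from the limit.

```c
/*
 * Count how many pages would be added to the extent when the remaining
 * budget is checked before each page, as in the patched loop.
 */
static long pages_added(long nr_to_write, long dirty_pages, int sync_all)
{
    long left = nr_to_write; /* remaining budget, like 'left' in the patch */
    long added = 0;
    long i;

    for (i = 0; i < dirty_pages; i++) {
        /* Budget exhausted: stop, unless this is an integrity sync. */
        if (!sync_all && left <= 0)
            break;
        added++;
        left--;
    }
    return added;
}
```

With the check placed before a page is added, the count never exceeds the budget in the non-integrity case, which is the property the commit message describes.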



Re: [PATCH v2 1/2] x86, apic: Add boot_cpu_is_bsp() to check if boot cpu is BSP

2013-10-15 Thread HATAYAMA Daisuke

(2013/10/16 4:12), Vivek Goyal wrote:

On Tue, Oct 15, 2013 at 02:43:22PM +0900, HATAYAMA Daisuke wrote:

Kexec can enter the kdump 2nd kernel on AP if crash happens on AP. To
check if boot cpu is BSP, introduce a helper function
boot_cpu_is_bsp().

Signed-off-by: HATAYAMA Daisuke 
---
  arch/x86/include/asm/mpspec.h |7 +++
  arch/x86/kernel/apic/apic.c   |   16 
  arch/x86/kernel/setup.c   |2 ++
  3 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 626cf70..54d5f98 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -47,11 +47,18 @@ extern int mp_bus_id_to_type[MAX_MP_BUSSES];
  extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);

  extern unsigned int boot_cpu_physical_apicid;
+extern bool boot_cpu_is_bsp;
  extern unsigned int max_physical_apicid;
  extern int mpc_default_type;
  extern unsigned long mp_lapic_addr;

  #ifdef CONFIG_X86_LOCAL_APIC
+extern void boot_cpu_is_bsp_init(void);
+#else
+static inline void boot_cpu_is_bsp_init(void) { };
+#endif
+
+#ifdef CONFIG_X86_LOCAL_APIC
  extern int smp_found_config;
  #else
  # define smp_found_config 0
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a7eb82d..62ee365 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -64,6 +64,12 @@ unsigned disabled_cpus;
  unsigned int boot_cpu_physical_apicid = -1U;

  /*


[..]

+ * Indicates whether the processor that is doing the boot up, is BSP
+ * processor or not.
+ */
+bool boot_cpu_is_bsp;


Should we set it to true by default? I think in most of the cases boot cpu
is going to be bsp too?



Agreed. The most likely value should be the default.

The reason I wrote it that way, if there was one, is that I wanted it to be
uniform with the other variables around it and wanted to avoid giving it
static storage in the binary file.


+
+/*
   * The highest APIC ID seen during enumeration.
   */
  unsigned int max_physical_apicid;
@@ -2589,3 +2595,13 @@ static int __init lapic_insert_resource(void)
   * that is using request_resource
   */
  late_initcall(lapic_insert_resource);
+
+void __init boot_cpu_is_bsp_init(void)
+{
+   if (cpu_has_apic) {
+   u32 l, h;
+
+   rdmsr_safe(MSR_IA32_APICBASE, &l, &h);
+   boot_cpu_is_bsp = (l & MSR_IA32_APICBASE_BSP) ? true : false;


I came across following thread.

https://lkml.org/lkml/2012/4/18/370

Can we hit above read msr on old P5 class machines? Or is it safe to
call unconditionally.



No, it's dangerous because it can cause #UD, and the current implementation
doesn't check the error value returned by rdmsr_safe, so calling rdmsr_safe
is meaningless.

At least, checking boot_cpu_data.x86 >= 6 guarantees support for the
IA32_APIC_BASE MSR, and at the same time guarantees support for the rdmsr
instruction, since that instruction was introduced with the Pentium
processor. So,

if (boot_cpu_data.x86 >= 6 && cpu_has_apic) {
u32 l, h;

rdmsr(MSR_IA32_APICBASE, &l, &h);
boot_cpu_is_bsp = (l & MSR_IA32_APICBASE_BSP) ? true : false;
}
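
[Editor's illustration, not part of the thread.] The decision discussed above can be written as a pure function so that it runs in userspace. This is a sketch under assumptions: the MSR is not read here, its low 32 bits are passed in by the caller; `boot_cpu_is_bsp_sketch()` and `APICBASE_BSP_BIT` are hypothetical names; and falling back to "is BSP" when the MSR cannot be consulted follows the default suggested earlier in this exchange.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 8 of the low word of IA32_APIC_BASE is the BSP flag. */
#define APICBASE_BSP_BIT (UINT32_C(1) << 8)

/*
 * family:      CPU family number (the role of boot_cpu_data.x86)
 * has_apic:    whether a local APIC is present
 * apicbase_lo: low 32 bits of IA32_APIC_BASE, supplied by the caller
 */
static bool boot_cpu_is_bsp_sketch(unsigned int family, bool has_apic,
                                   uint32_t apicbase_lo)
{
    /* Without a usable MSR, assume the common case: boot cpu is BSP. */
    if (family < 6 || !has_apic)
        return true;

    return (apicbase_lo & APICBASE_BSP_BIT) != 0;
}
```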


--
Thanks.
HATAYAMA, Daisuke



Re: A review of dm-writeboost

2013-10-15 Thread Akira Hayakawa
Mikulas,

> I/Os shouldn't be returned with -ENOMEM. If they are, you can treat it as 
> a hard error.
It seems that blkdev_issue_discard, for example, returns -ENOMEM
when bio_alloc fails.
My idea for handling a returned -ENOMEM is to wait for a second,
after which we may be able to allocate the memory.

> Blocking I/O until the admin turns a specific variable isn't too 
> reliable.
> 
> Think of this case - your driver detects I/O error and blocks all I/Os. 
> The admin tries to log in. The login process needs memory. To fulfill this 
> memory need, the login process writes out some dirty pages. Those writes 
> are blocked by your driver - in the result, the admin is not able to log 
> in and flip the switch to unblock I/Os.
> 
> Blocking I/O indefinitely isn't good because any system activity 
> (including typing commands into shell) may wait on this I/O.
I understand the problem. But what should I do then?
Since writeboost is caching software,
it loses consistency if we simply ignore the cache
when it returns an I/O error.
Panicking in that case is also inappropriate (although inaccessibility of
the storage will eventually halt the whole system; if so, panicking might
be an acceptable solution).

I am afraid my idea is based on your past comment:
> If you can't handle a specific I/O request failure gracefully, you should 
> mark the driver as dead, don't do any more I/Os to the disk or cache 
> device and return -EIO on all incoming requests.
> 
> Always think that I/O failures can happen because of connection problems, 
> not data corruption problems - for example, a disk cable can go loose, a 
> network may lose connectivity, etc. In these cases, it is best to stop 
> doing any I/O at all and let the user resolve the situation.
1) On failure, mark the driver dead - set `blockup` to 1 in my case -
   and return -EIO on all incoming requests. Yes.
2) And wait for the user to resolve the situation - return -EIO until
   the admin turns `blockup` to 0 after a checkup in my case. Yes.

Did you mean we should not provide any way to recover the system
because the admin may not be able to reach the switch?
Should the writeboost module autonomously check whether the troubled
device has recovered?
Retrying I/O submission to the device, and treating an I/O success as a
sign that the device has recovered, is one solution, and I have implemented it.
I/O retry doesn't destroy any consistency in writeboost;
sooner or later it simply becomes unable to accept more writes because it
runs out of RAM buffers, which can be reused only after I/O to the cache
device succeeds.

Akira



Re: [PATCH v4 12/12] intel_mid: Moved board related code to a new file

2013-10-15 Thread H. Peter Anvin
On 10/15/2013 04:53 PM, David Cohen wrote:
> On 10/15/2013 04:44 PM, H. Peter Anvin wrote:
>> On 10/15/2013 04:42 PM, David Cohen wrote:
>>>
>>> +#define intel_mid_sfi_dev(i)   \
>>> +static const struct devs_id *__intel_mid_sfi_##i##_dev __used \
>>> +__attribute__((__section__(".x86_intel_mid_dev.init"))) = 
>>> +
>>
>> Any reason to not just call this "sfi_device()" or something similar?
>> "Intel MID SFI" seems a bit redundant...
> 
> I had the same thought. But struct devs_id is defined by asm/intel-mid.h.
> This function is not meant to be used by any other user besides
> intel-mid.
> But I can change if you prefer.
> 

Hm, I guess it doesn't really matter.  After all, no other devices will
probably ever see SFI (we hope).

-hpa




[PATCH v2] usb: hub: Clear Port Reset Change during init/resume

2013-10-15 Thread Julius Werner
This patch adds the Port Reset Change flag to the set of bits that are
preemptively cleared on init/resume of a hub. In theory this bit should
never be set unexpectedly... in practice it can still happen if BIOS,
SMM or ACPI code plays around with USB devices without cleaning up
correctly. This is especially dangerous for XHCI root hubs, which don't
generate any more Port Status Change Events until all change bits are
cleared, so this is a good precaution to have (similar to how it's
already done for the Warm Port Reset Change flag).

Signed-off-by: Julius Werner 
---
 drivers/usb/core/hub.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index e6b682c..c3dd64c 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1130,6 +1130,11 @@ static void hub_activate(struct usb_hub *hub, enum 
hub_activation_type type)
usb_clear_port_feature(hub->hdev, port1,
USB_PORT_FEAT_C_ENABLE);
}
+   if (portchange & USB_PORT_STAT_C_RESET) {
+   need_debounce_delay = true;
+   usb_clear_port_feature(hub->hdev, port1,
+   USB_PORT_FEAT_C_RESET);
+   }
if ((portchange & USB_PORT_STAT_C_BH_RESET) &&
hub_is_superspeed(hub->hdev)) {
need_debounce_delay = true;
-- 
1.7.12.4



Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

2013-10-15 Thread Mauro Carvalho Chehab
I see a few problems with this patchset:

On Tue, 15 Oct 2013 23:00:53 +0530
"Naveen N. Rao"  wrote:

> On 10/15/2013 10:30 PM, Borislav Petkov wrote:
> > On Tue, Oct 15, 2013 at 10:24:35PM +0530, Naveen N. Rao wrote:
> >> On 2013/10/11 02:32AM, Chen Gong wrote:
> >>> Use trace interface to elaborate all H/W error related
> >>> information.
> >>>
> >>> Signed-off-by: Chen, Gong 
> >>> ---
> >> 
> >>> +TRACE_EVENT(extlog_mem_event,
> >>> + TP_PROTO(u32 etype,
> >>> + char *dimm_loc,
> >>> + const uuid_le *fru_id,

Using a custom typedef here seems problematic, as that can make the userspace
interface more complicated.

> >>> + char *fru_text,
> >>> + u64 error_count,
> >>> + u32 severity,
> >>> + u64 phy_addr,
> >>> + char *mem_loc),

Looking at the rest of the changes, mem_loc can now contain the exact
location of the memory error, including the DIMM on which the error
happened. It can also (optionally) contain the DIMM label.

Mangling all that information into a single string field seems like a very
bad idea to me, as it prevents userspace from implementing generic logic
that applies a per-DIMM threshold policy.

Also, userspace needs to know the granularity of the error information
that an eMCA driver will provide, in order to adjust its policies.
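
To illustrate why separate fields matter, here is a minimal userspace
sketch of a per-DIMM threshold policy. It assumes the location arrives as
discrete numeric fields; struct mem_err_loc, its members, and
record_error() are hypothetical names chosen for this example and are not
part of the proposed tracepoint or of any existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structured location; field names are illustrative only. */
struct mem_err_loc {
	uint16_t node;
	uint16_t card;
	uint16_t module;	/* DIMM index */
};

#define MAX_TRACKED 64

struct dimm_counter {
	struct mem_err_loc loc;
	uint64_t errors;
	bool in_use;
};

static struct dimm_counter counters[MAX_TRACKED];

static bool same_dimm(const struct mem_err_loc *a, const struct mem_err_loc *b)
{
	return a->node == b->node && a->card == b->card &&
	       a->module == b->module;
}

/*
 * Add error_count to the running total of the DIMM at loc. Returns true
 * once that DIMM's total has reached threshold, false otherwise (also
 * false if the tracking table is full).
 */
static bool record_error(const struct mem_err_loc *loc, uint64_t error_count,
			 uint64_t threshold)
{
	struct dimm_counter *free_slot = NULL;

	assert(loc != NULL);
	for (int i = 0; i < MAX_TRACKED; i++) {
		if (counters[i].in_use && same_dimm(&counters[i].loc, loc)) {
			counters[i].errors += error_count;
			return counters[i].errors >= threshold;
		}
		if (!counters[i].in_use && !free_slot)
			free_slot = &counters[i];
	}
	if (!free_slot)
		return false;
	free_slot->loc = *loc;
	free_slot->errors = error_count;
	free_slot->in_use = true;
	return free_slot->errors >= threshold;
}
```

With a single free-form string, the same policy would first have to parse
the text, and the parser would break whenever the string layout changed.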

> >>
> >> [Adding Mauro...]
> >>
> >> This looks very similar to the trace event I wrote a while back,
> >> which was similar to the one provided by ghes_edac:
> >> http://thread.gmane.org/gmane.linux.kernel.pci/24616
> >>
> >> Seems to me this has the same issues we previously discussed w.r.t
> >> EDAC conflicts...

Agreed.

> >
> > Right, I'm inclined to leave this trace_mc_event in ras_event.h to edac
> > use alone because of all those layers which don't mean whit for GHES and
> > eMCA error sources.

If you don't create the EDAC nodes, userspace doesn't have any
clue about what error information will be provided.

The right thing to do is, IMHO, to add some additional EDAC sysfs nodes that
show what kind of error information will be provided by the device, e.g.:

- a complete hardware-based type of information directly obtained from
  the hardware;

- a very poor BIOS-based type of error information, where the provided
  data is not sufficient to pinpoint the DIMM where the error actually
  occurred (what's currently there at ghes_edac);

- an eMCA-based type of error information, where the BIOS and ACPI will
  provide the complete error path, allowing userspace to properly parse
  the errors as if they came from the hardware-based approach.

In any case, this is provided by the EDAC core functions that describe the
memory in detail. So, IMHO, getting rid of EDAC is a big mistake.

> > And maybe define a trace_mem_event which is shared by GHES and eMCA and
> > not use the edac tracepoint there not load ghes_edac on such systems
> > which have sufficient decoding capability in firmware.
> >
> > Thoughts?
> 
> I thought the primary problem was the conflict with edac core itself. 
> So, if I'm not mistaken, we would have to prevent all edac drivers from 
> loading.

Yes, this is another aspect of this approach: whatever mechanism is provided,
the kernel should ensure that one error path won't conflict with the
others. We know from experience that enabling both BIOS-based and
hardware-based mechanisms causes race conditions, which affect both paths.
It is also nice to allow users to choose their preferred mechanism when
more than one is properly supported on a given system.

Regards,
Mauro

(Cc'ing Aristeu, as he might also be working on similar stuff)


Re: [PATCH] usb: hub: Clear Port Reset Change during init/resume

2013-10-15 Thread Julius Werner
>> +   if ((portchange & USB_PORT_STAT_C_RESET)) {
>
>
>Hm, why these double parens?

Oh... good question. I copied the entry below it, removed the &&, and
must have overlooked those. Sorry, v2 incoming...


Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread

2013-10-15 Thread Xiao Guangrong

On Oct 16, 2013, at 6:21 AM, Marcelo Tosatti  wrote:

> On Tue, Oct 15, 2013 at 06:57:05AM +0300, Gleb Natapov wrote:
>>> 
>>> Why is it safe to allow access, by the lockless page write protect
>>> side, to spt pointer for shadow page A that can change to a shadow page 
>>> pointer of shadow page B?
>>> 
>>> Write protect spte of any page at will? Or verify that in fact thats the
>>> shadow you want to write protect?
>>> 
>>> Note that spte value might be the same for different shadow pages, 
>>> so cmpxchg succeeding does not guarantees its the same shadow page that
>>> has been protected.
>>> 
>> Two things can happen: spte that we accidentally write protect is some
>> other last level spte - this is benign, it will be unprotected on next
>> fault.  
> 
> Nothing forbids two identical writable sptes to point to a same pfn. How
> do you know you are write protecting the correct one? (the proper gfn).
> 
> Lockless walk sounds interesting. By the time you get to the lower
> level, that might be a different spte.

That's safe. Since get-dirty-log is serialized by the slot lock, the dirty
bit cannot be lost, even if we write-protect an spte in a different memslot
(the dirty bit is still set). The worst case is that we write-protect an
unnecessary spte and cause an extra #PF, but that is really rare.

And the lockless rmap-walker can detect the new spte so that
write-protection on the memslot is not missed.
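
As a rough userspace analogy of the argument above (this is not the KVM
code), the sketch below clears a "writable" bit with compare-and-swap
using C11 atomics. SPTE_WRITABLE's bit position and write_protect_entry()
are assumptions made for this example. If another thread replaces the
entry between the load and the swap, the swap fails and the loop
re-examines the new value, so a freshly installed entry is not skipped;
protecting an entry that did not strictly need it only costs a later
fault.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit position only; not taken from the KVM sources. */
#define SPTE_WRITABLE (UINT64_C(1) << 1)

/*
 * Clear the writable bit of *entry with compare-and-swap.
 * Returns true if this call cleared the bit, false if the entry was
 * already read-only when it was (re)examined.
 */
static bool write_protect_entry(_Atomic uint64_t *entry)
{
	uint64_t old;

	assert(entry != NULL);
	old = atomic_load(entry);
	while (old & SPTE_WRITABLE) {
		if (atomic_compare_exchange_weak(entry, &old,
						 old & ~SPTE_WRITABLE))
			return true;
		/* On failure 'old' holds the current value; check it again. */
	}
	return false;
}
```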



Re: [PATCH] usb-storage: scsiglue: Changing the command result

2013-10-15 Thread Ming Lei
On Wed, Oct 16, 2013 at 4:22 AM, Alan Stern  wrote:
> On Tue, 15 Oct 2013, Vishal Annapurve wrote:
>
>> Hi Alan,
>>
>> USB storage maybe just has to say that the abort occurred. By setting the
>> US_FLIDX_TIMED_OUT bit USB storage is getting signaled that the reason was
>> time out and the command is being aborted.
>
> No.  By setting the US_FLIDX_TIMED_OUT bit, usb-storage indicates that
> the command was aborted.  This doesn't indicate anything about the
> reason for the abort.  (Maybe this bit's name wasn't chosen very well.)
>
>> Now, it's arguable whether to change the implication of US_FLIDX_TIMED_OUT
>> bit for scsi - USB storage bridge or for entire usb storage
>
> I don't understand this.  What's the difference between "scsi - USB
> storage bridge" and "entire usb storage"?  Aren't they the same thing?
>
>>  Or maybe scsi has
>> decided to abort so it should override the result.
>
> Of course the SCSI midlayer has decided to abort.  That's the only way
> this bit can get set.  But usb-storage doesn't know why SCSI decided to
> abort.

usb-storage can find out whether the abort was caused by a timeout via the
.eh_timed_out callback, if it wants to know.


Thanks,
-- 
Ming Lei


Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

2013-10-15 Thread Eric Dumazet
On Tue, 2013-10-15 at 09:21 -0700, Joe Perches wrote:

> Ingo, Eric _showed_ that the prefetch is good here.
> How about looking at a little optimization to the minimal
> prefetch that gives that level of performance.

Wait a minute, my point was to remind everyone that the main cost is the
memory fetching.

It's nice to optimize cpu cycles if we are short of them,
but in the csum_partial() case, the bottleneck is the memory.

Also, I was wondering about the implications of changing the order of
reads, as it might fool the cpu's predictions.

I do not particularly care about finding the right prefetch stride;
I think the Intel guys know better than I do.




[PATCH] drivers: scsi: lpfc: Fix typo on NULL assignment

2013-10-15 Thread Felipe Pena
In lpfc_ct_free_iocb(), after freeing the memory associated with
ctiocb->context3, ctiocb->context1 is set to NULL instead of context3.

Signed-off-by: Felipe Pena 
---
 drivers/scsi/lpfc/lpfc_ct.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_ct.c b/drivers/scsi/lpfc/lpfc_ct.c
index 02e8cd9..da61d8d 100644
--- a/drivers/scsi/lpfc/lpfc_ct.c
+++ b/drivers/scsi/lpfc/lpfc_ct.c
@@ -280,7 +280,7 @@ lpfc_ct_free_iocb(struct lpfc_hba *phba, struct lpfc_iocbq *ctiocb)
buf_ptr = (struct lpfc_dmabuf *) ctiocb->context3;
lpfc_mbuf_free(phba, buf_ptr->virt, buf_ptr->phys);
kfree(buf_ptr);
-   ctiocb->context1 = NULL;
+   ctiocb->context3 = NULL;
}
lpfc_sli_release_iocbq(phba, ctiocb);
return 0;
--
1.7.10.4
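
For readers outside the driver, the pattern being fixed can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
struct demo_iocb, demo_free_slot() and demo_free_iocb() are hypothetical
names, and free() stands in for the driver's own release functions.
Clearing the slot through the same pointer that was freed makes it
impossible to NULL the wrong member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the iocb; only the context slots are modeled. */
struct demo_iocb {
	void *context1;
	void *context2;
	void *context3;
};

/* Free one slot and clear that same slot, so it cannot be freed twice. */
static void demo_free_slot(void **slot)
{
	assert(slot != NULL);
	free(*slot);	/* free(NULL) is a no-op */
	*slot = NULL;
}

static void demo_free_iocb(struct demo_iocb *io)
{
	demo_free_slot(&io->context1);
	demo_free_slot(&io->context2);
	demo_free_slot(&io->context3);
}
```

Calling demo_free_iocb() twice on the same object is harmless here, which
would not hold if one slot were left dangling as in the original typo.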


