[PATCH] pinctrl: fix a typo in Kconfig

2015-11-29 Thread Masahiro Yamada
Signed-off-by: Masahiro Yamada 
---

 drivers/pinctrl/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/Kconfig b/drivers/pinctrl/Kconfig
index 312c78b..e586794 100644
--- a/drivers/pinctrl/Kconfig
+++ b/drivers/pinctrl/Kconfig
@@ -244,7 +244,7 @@ config PINCTRL_ZYNQ
select PINMUX
select GENERIC_PINCONF
help
- This selectes the pinctrl driver for Xilinx Zynq.
+ This selects the pinctrl driver for Xilinx Zynq.
 
 source "drivers/pinctrl/bcm/Kconfig"
 source "drivers/pinctrl/berlin/Kconfig"
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] base/platform: fix panic when probe function is NULL

2015-11-29 Thread Wilck, Martin
Hello Uwe,

thanks for your review.

> > This may cause a panic later. For example, inserting the tpm_tis
> > driver with parameter "force=1" (i.e. registering tpm_tis as a platform
> > driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:
> > 
> >  chip->cdev.owner = chip->pdev->driver->owner;
> 
> This sounds like a separate issue though. Looking at init_tis there is:
> 
> rc = platform_driver_register(_drv);
> if (rc < 0)
> return rc;
> pdev = platform_device_register_simple("tpm_tis", -1, NULL, 0);
> if (IS_ERR(pdev)) {
> rc = PTR_ERR(pdev);
> goto err_dev;
> }
> rc = tpm_tis_init(>dev, _default_info, NULL);
> 
> tpm_tis_init calls tpmm_chip_alloc which barfs when pdev (i.e. the return
> value of platform_device_register_simple above) isn't bound. It is not allowed
> to assume that the device is bound after the above function calls.

I agree that the TPM platform device code deserves improvement. Jason
wrote that he already has some patches available for that.

I lack the knowledge to judge whether or not tpm_tis_init's assumption
was correct. But, maybe just by luck, this assumption used to be *true*
until patch b8b2c7d845d5. Driver and device were matched by name
("tpm_tis") by the platform driver probing code, and device and driver
were actually bound to each other after this sequence of calls. 

> So I'd say drop the paragraph about tpm_tis and the change is fine.

I didn't mean to blame your patch. But a note about the panic might be
helpful just in case someone else runs into the same problem. The
connection between your patch and tpm_tis loading is far from obvious.
I mentioned the panic in order to clarify that this wasn't just a
theoretical issue.

Anyway, I'll resubmit with your style hints applied and will try to find
a wording for the commit message that we can agree upon.

Best Regards,
Martin

> 
> > This patch fixes this by returning success in platform_drv_probe() if
> > "just" dev_pm_domain_attach() had failed. This restores the semantics
> > of platform_device_register_XXX() if the associated platform driver has
> > no "probe" function.
> > 
> > Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
> > callbacks are called unconditionally")
> > 
> 
> I think line breaks in the Fixes: line are frowned on. Also usually
> there is no empty line between Fixes: and S-o-b:.
> 
> > Signed-off-by: Martin Wilck 
> > ---
> >  drivers/base/platform.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > index 1dd6d3b..c994e76 100644
> > --- a/drivers/base/platform.c
> > +++ b/drivers/base/platform.c
> > @@ -513,10 +513,14 @@ static int platform_drv_probe(struct device *_dev)
> > return ret;
> >  
> > ret = dev_pm_domain_attach(_dev, true);
> > -   if (ret != -EPROBE_DEFER && drv->probe) {
> > -   ret = drv->probe(dev);
> > -   if (ret)
> > -   dev_pm_domain_detach(_dev, true);
> > +   if (ret != -EPROBE_DEFER) {
> > +   if (drv->probe) {
> > +   ret = drv->probe(dev);
> > +   if (ret)
> > +   dev_pm_domain_detach(_dev, true);
> > +   } else
> > +   /* don't fail if just dev_pm_domain_attach failed */
> > +   ret = 0;
> 
> An else that has a } should also have a {, according to 
> checkpatch and Documentation/CodingStyle. You can write it
> alternatively as:
> 
>   if (ret != -EPROBE_DEFER) {
>   if (drv->probe)
>   ret = drv->probe(dev);
>   else
>   ret = 0;
> 
>   if (ret)
>   dev_pm_domain_detach(_dev, true);
>   }
> 
> .
> 
> Best regards
> Uwe
> 
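For reference, the probe semantics under discussion can be modelled in user space. This is a minimal sketch of the control flow in Uwe's suggested rewrite, not kernel code: sketch_driver, sketch_drv_probe() and sketch_pm_domain_detach() are hypothetical stand-ins for struct platform_driver, platform_drv_probe() and dev_pm_domain_detach(), and EPROBE_DEFER is defined locally because it is a kernel-internal errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* EPROBE_DEFER is kernel-internal; user-space <errno.h> lacks it. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for struct platform_driver: probe may be NULL. */
struct sketch_driver {
	int (*probe)(void);
};

/* Counts calls to the stand-in for dev_pm_domain_detach(). */
static int detach_calls;

static void sketch_pm_domain_detach(void)
{
	detach_calls++;
}

/*
 * Control flow of Uwe's suggested platform_drv_probe() body; attach_ret
 * plays the role of dev_pm_domain_attach()'s return value.
 */
static int sketch_drv_probe(const struct sketch_driver *drv, int attach_ret)
{
	int ret = attach_ret;

	assert(drv != NULL);
	if (ret != -EPROBE_DEFER) {
		if (drv->probe)
			ret = drv->probe();
		else
			ret = 0;	/* no probe: a failed attach alone is not fatal */

		if (ret)
			sketch_pm_domain_detach();
	}
	return ret;
}

/* Example probe callbacks for exercising the sketch. */
static int probe_ok(void) { return 0; }
static int probe_fails(void) { return -ENODEV; }
```

With a NULL probe callback, a failed attach is reported as success, which is the behaviour Martin's patch restores; -EPROBE_DEFER is still passed through unchanged.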

[PATCH] clk: fix a typo in comment block of clk_notifier_register()

2015-11-29 Thread Masahiro Yamada
The word "cases" is doubled.  Also re-wrap the following lines to
keep the paragraph tidy.

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index efb0dfd..6d9cd05 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2806,10 +2806,9 @@ void __clk_put(struct clk *clk)
  * re-enter into the clk framework by calling any top-level clk APIs;
  * this will cause a nested prepare_lock mutex.
  *
- * In all notification cases cases (pre, post and abort rate change) the
- * original clock rate is passed to the callback via struct
- * clk_notifier_data.old_rate and the new frequency is passed via struct
- * clk_notifier_data.new_rate.
+ * In all notification cases (pre, post and abort rate change) the original
+ * clock rate is passed to the callback via struct clk_notifier_data.old_rate
+ * and the new frequency is passed via struct clk_notifier_data.new_rate.
  *
  * clk_notifier_register() must be called from non-atomic context.
  * Returns -EINVAL if called with null arguments, -ENOMEM upon
-- 
1.9.1



[PATCH v3 04/13] perf record: Apply config to BPF objects before recording

2015-11-29 Thread Wang Nan
bpf__apply_obj_config() is introduced as the core API to apply object
config options to all BPF objects. This patch also does the real work
of setting values for BPF_MAP_TYPE_ARRAY maps by inserting the value
stored in the map's private field into the BPF map.

This patch is required because we are not always able to set all
BPF config during parsing. A further patch will set events created
by perf to BPF_MAP_TYPE_PERF_EVENT_ARRAY maps, and those events do not
exist until perf_evsel__open().

bpf_map_foreach_key() is introduced to iterate over each key that
needs to be configured. This function will be extended to support
more map types and different key settings.

In perf record, call bpf__apply_obj_config() before recording starts
to apply all BPF config options.
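The key iteration behind bpf_map_foreach_key() can be sketched in plain C. This is only an illustration under assumptions: the sketch_* names are hypothetical, the callback type is simplified from the map_config_func_t typedef in this patch's diff, and a plain int array stands in for the BPF map (the real callback receives the map's file descriptor).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified callback type: the patch's map_config_func_t
 * also receives the map name, map fd, map definition and op, and passes
 * the key as void *pkey.
 */
typedef int (*sketch_key_func_t)(unsigned int key, void *arg);

/* Visit every slot of an array map, i.e. keys 0 .. max_entries - 1,
 * stopping at the first callback error. */
static int sketch_foreach_key_array_all(unsigned int max_entries,
					sketch_key_func_t func, void *arg)
{
	unsigned int i;
	int err;

	assert(func != NULL);
	for (i = 0; i < max_entries; i++) {
		err = func(i, arg);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: a plain int array stands in for the BPF map. */
struct sketch_fill_arg {
	int *slots;
	int value;
};

static int sketch_fill_slot(unsigned int key, void *arg)
{
	struct sketch_fill_arg *fill = arg;

	fill->slots[key] = fill->value;
	return 0;
}
```

Keeping the iteration separate from the per-key action is what lets later patches add other key sets (such as explicit index lists) without touching the callbacks.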

Test result:

 # cat ./test_bpf_map_1.c
 /* BEGIN ***/
 #define SEC(NAME) __attribute__((section(NAME), used))
 enum bpf_map_type {
 BPF_MAP_TYPE_ARRAY = 2,
 };
 struct bpf_map_def {
 unsigned int type;
 unsigned int key_size;
 unsigned int value_size;
 unsigned int max_entries;
 };
 static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
 (void *)1;
 static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
 (void *)6;
 struct bpf_map_def SEC("maps") channel = {
 .type = BPF_MAP_TYPE_ARRAY,
 .key_size = sizeof(int),
 .value_size = sizeof(int),
 .max_entries = 1,
 };
 SEC("func=sys_nanosleep")
 int func(void *ctx)
 {
 int key = 0;
 char fmt[] = "%d\n";
 int *pval = map_lookup_elem(, );
 if (!pval)
 return 0;
 bpf_trace_printk(fmt, sizeof(fmt), *pval);
 return 0;
 }
 char _license[] SEC("license") = "GPL";
 int _version SEC("version") = LINUX_VERSION_CODE;
 /* END ***/


 # echo "" > /sys/kernel/debug/tracing/trace
 # ./perf record -e './test_bpf_map_1.c/maps:channel.value=11/' usleep 10
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data ]
 # cat /sys/kernel/debug/tracing/trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 1/1   #P:8
 [SNIP]
 #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
 #  | |   |      | |
usleep-18593 [007] d... 2394714.395539: : 11
 # ./perf record -e './test_bpf_map.c/maps:channel.value=101/' usleep 10
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data ]
 # cat /sys/kernel/debug/tracing/trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 1/1   #P:8
 [SNIP]
 #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
 #  | |   |      | |
usleep-18593 [007] d... 2394714.395539: : 11
usleep-19000 [006] d... 2394831.057840: : 101

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/builtin-record.c  |  11 +++
 tools/perf/util/bpf-loader.c | 180 +++
 tools/perf/util/bpf-loader.h |  15 
 3 files changed, 206 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 199fc31..8479821 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -32,6 +32,7 @@
 #include "util/parse-branch-options.h"
 #include "util/parse-regs-options.h"
 #include "util/llvm-utils.h"
+#include "util/bpf-loader.h"
 
 #include 
 #include 
@@ -524,6 +525,16 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
goto out_child;
}
 
+   err = bpf__apply_obj_config();
+   if (err) {
+   char errbuf[BUFSIZ];
+
+   bpf__strerror_apply_obj_config(err, errbuf, sizeof(errbuf));
+   pr_err("ERROR: Apply config to BPF failed: %s\n",
+errbuf);
+   goto out_child;
+   }
+
/*
 * Normally perf_session__new would do this, but it doesn't have the
 * evlist.
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 7d361aa..96fd18b 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "perf.h"
@@ -984,6 +985,178 @@ out:
 
 }
 
+typedef int (*map_config_func_t)(const char *name, int map_fd,
+struct bpf_map_def *pdef,
+struct bpf_map_op *op,
+void *pkey, void *arg);
+
+static int
+foreach_key_array_all(map_config_func_t func,
+ void *arg, const char *name,
+ int map_fd, struct bpf_map_def *pdef,
+ struct bpf_map_op *op)
+{
+   unsigned int i;
+   int err;
+
+   for (i = 0; i < pdef->max_entries; i++) {
+   

[PATCH v3 00/13] perf tools: BPF related update

2015-11-29 Thread Wang Nan
This patch set is based on perf/core.

Compared with v2:
 - Check the return value of strdup()
 - Change BPF map setting syntax to 'maps:[mapname].[event:value]=value'
   For example:
 'maps:mapname.value[1,3...5]=10'
   Test cases in each commit message are also changed correspondingly.
   (Thanks to Namhyung Kim)

This patch set improves perf's BPF support:

 - Support filling BPF array with values

   Users can pass values to a BPF program from the command line
   without changing the program itself.

 - Support filling BPF event array with events

   A BPF program can read PMU counters through BPF's perf_event_read()
   helper.

 - Support bpf_output_event() helper

   A BPF program can output perf events into perf.data.

In most of the patches I list commands for testing them, covering
both the normal case and the error case.

He Kuang (2):
  perf tools: Support perf event alias name
  perf record: Support custom vmlinux path

Wang Nan (11):
  tools lib bpf: Check return value of strdup when reading map names
  perf tools: Add API to config maps in bpf object
  perf tools: Enable BPF object configure syntax
  perf record: Apply config to BPF objects before recording
  perf tools: Enable passing event to BPF object
  perf tools: Support setting different slots in a BPF map separately
  perf tools: Enable indices setting syntax for BPF maps
  perf tools: Introduce bpf-output event
  perf data: Add u32_hex data type
  perf data: Support converting data from bpf_perf_event_output()
  perf tools: Always give options even it not compiled

 tools/lib/bpf/libbpf.c   |  13 +-
 tools/perf/Documentation/perf-record.txt |  10 +-
 tools/perf/builtin-probe.c   |  15 +-
 tools/perf/builtin-record.c  |  36 +-
 tools/perf/util/bpf-loader.c | 700 +++
 tools/perf/util/bpf-loader.h |  59 +++
 tools/perf/util/data-convert-bt.c| 117 +-
 tools/perf/util/evlist.c |  16 +
 tools/perf/util/evlist.h |   4 +
 tools/perf/util/evsel.c  |   7 +
 tools/perf/util/evsel.h  |   1 +
 tools/perf/util/parse-events.c   | 125 +-
 tools/perf/util/parse-events.h   |  20 +-
 tools/perf/util/parse-events.l   |  16 +-
 tools/perf/util/parse-events.y   | 123 +-
 tools/perf/util/parse-options.c  | 113 -
 tools/perf/util/parse-options.h  |   5 +
 17 files changed, 1345 insertions(+), 35 deletions(-)

-- 
1.8.3.4



[PATCH v3 08/13] perf tools: Enable indices setting syntax for BPF maps

2015-11-29 Thread Wang Nan
This patch introduces a new syntax to the perf event parser:

 # perf record -e bpf_file.c/maps:mymap.value[0,3...5,7]=1234/ ...

By utilizing the basic facilities in bpf-loader.c which allow setting
different slots in a BPF map separately, the newly introduced syntax
allows perf to control specific elements in a BPF map.
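To illustrate what the index list means, here is a hypothetical hand-written parser for the part between the brackets. The real implementation lives in perf's lexer and grammar (parse-events.l/.y); sketch_parse_indices() and its return codes are assumptions made for this sketch, and the [all] form is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical parser for an index list such as "0,3...5,7".
 * Marks each selected slot in sel[0 .. max_entries - 1].
 * Returns 0 on success, -1 on malformed input, -2 if an index is
 * too large.  sel may be partially updated when an error is returned.
 */
static int sketch_parse_indices(const char *s, bool *sel,
				unsigned long max_entries)
{
	char *end;

	assert(s != NULL && sel != NULL);
	for (;;) {
		unsigned long lo, hi, i;

		lo = strtoul(s, &end, 10);
		if (end == s)
			return -1;		/* expected a number */
		hi = lo;
		if (end[0] == '.' && end[1] == '.' && end[2] == '.') {
			s = end + 3;
			hi = strtoul(s, &end, 10);
			if (end == s || hi < lo)
				return -1;	/* missing or reversed upper bound */
		}
		if (hi >= max_entries)
			return -2;		/* "Index too large" */
		for (i = lo; i <= hi; i++)
			sel[i] = true;
		if (*end == '\0')
			return 0;
		if (*end != ',')
			return -1;
		s = end + 1;		/* skip ',' and parse the next item */
	}
}
```

For example, "0,3...5,7" selects slots 0, 3, 4, 5 and 7, while "10...1000" against a 100-entry map is rejected, matching the "Index too large" error case.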

Test result:

 # cat ./test_bpf_map_3.c
 /* BEGIN ***/
 #define SEC(NAME) __attribute__((section(NAME), used))
 enum bpf_map_type {
 BPF_MAP_TYPE_ARRAY = 2,
 };
 struct bpf_map_def {
 unsigned int type;
 unsigned int key_size;
 unsigned int value_size;
 unsigned int max_entries;
 };
 static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
 (void *)1;
 static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
 (void *)6;
 struct bpf_map_def SEC("maps") channel = {
 .type = BPF_MAP_TYPE_ARRAY,
 .key_size = sizeof(int),
 .value_size = sizeof(unsigned char),
 .max_entries = 100,
 };
 SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
 int func(void *ctx, int err, long nsec)
 {
 char fmt[] = "%ld\n";
 long usec = nsec * 0x10624dd3 >> 38; // nsec / 1000
 int key = (int)usec;
 unsigned char *pval = map_lookup_elem(, );

 if (!pval)
 return 0;
 bpf_trace_printk(fmt, sizeof(fmt), (unsigned char)*pval);
 return 0;
 }
 char _license[] SEC("license") = "GPL";
 int _version SEC("version") = LINUX_VERSION_CODE;
 /* END ***/
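The test program divides by 1000 with a multiply and a shift. The sketch below (plain user-space C; sketch_div1000() is hypothetical and not part of the patch) reproduces that arithmetic. It is exact over tv_nsec's range because 0x10624dd3 * 1000 = 2^38 + 56: the rounding error 56 * nsec / 2^38 stays below 1 as long as nsec <= 4908534052, which covers every valid tv_nsec value (0 to 999999999).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as the test program: 0x10624dd3 is 2^38 / 1000
 * rounded up, so (nsec * 0x10624dd3) >> 38 computes nsec / 1000
 * without a division instruction.
 */
static uint64_t sketch_div1000(uint64_t nsec)
{
	/* Beyond this bound the rounding error can reach 1. */
	assert(nsec <= 4908534052ULL);
	return (nsec * 0x10624dd3ULL) >> 38;
}
```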

Normal case:
 # echo "" > /sys/kernel/debug/tracing/trace
 # ./perf record -e './test_bpf_map_3.c/maps:channel.value[0,1,2,3...5]=101/' usleep 2
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data ]
 # cat /sys/kernel/debug/tracing/trace | grep usleep
   usleep-405   [004] d... 2745423.547822: : 101
 # ./perf record -e './test_bpf_map_3.c/maps:channel.value[0...9,20...29]=102,maps:channel.value[10...19]=103/' usleep 3
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data ]
 # ./perf record -e './test_bpf_map_3.c/maps:channel.value[0...9,20...29]=102,maps:channel.value[10...19]=103/' usleep 15
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data ]
 # cat /sys/kernel/debug/tracing/trace | grep usleep
   usleep-405   [004] d... 2745423.547822: : 101
   usleep-655   [006] d... 2745434.122814: : 102
   usleep-904   [006] d... 2745439.916264: : 103
 # ./perf record -e './test_bpf_map_3.c/maps:channel.value[all]=104/' usleep 99
 # cat /sys/kernel/debug/tracing/trace | grep usleep
   usleep-405   [004] d... 2745423.547822: : 101
   usleep-655   [006] d... 2745434.122814: : 102
   usleep-904   [006] d... 2745439.916264: : 103
   usleep-1537  [003] d... 2745538.053737: : 104

Error case:
 # ./perf record -e './test_bpf_map_3.c/maps:channel.value[10...1000]=104/' usleep 99
 event syntax error: '..annel.value[10...1000]=104/'
   \___ Index too large
 Hint:  Valid config terms:
maps:[].value=[value]
maps:[].event=[event]

where  is something like [0,3...5] or [all]
(add -v to see detail)
 Run 'perf list' for a list of valid events

  Usage: perf record [] []
 or: perf record [] --  []

 -e, --eventevent selector. use 'perf list' to list available events

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/util/parse-events.c |  5 ++-
 tools/perf/util/parse-events.l | 13 ++-
 tools/perf/util/parse-events.y | 85 ++
 3 files changed, 100 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index af3d657..c485b32 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -660,9 +660,10 @@ parse_events_config_bpf(struct parse_events_evlist *data,
 sizeof(errbuf));
data->error->help = strdup(
 "Hint:\tValid config terms:\n"
-" \tmaps:[].value=[value]\n"
-" \tmaps:[].event=[event]\n"
+" \tmaps:[].value=[value]\n"
+" \tmaps:[].event=[event]\n"
 "\n"
+" \twhere  is something like [0,3...5] or [all]\n"
 " \t(add -v to see detail)");
data->error->str = strdup(errbuf);
if (err == -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 4387728..8bb3437 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -9,8 +9,8 @@
 %{
 #include 
 #include "../perf.h"
-#include "parse-events-bison.h"
 #include "parse-events.h"
+#include "parse-events-bison.h"
 
 char 

[PATCH 02/16] clk: change the argument of __clk_init() into pointer to clk_core

2015-11-29 Thread Masahiro Yamada
The argument clk_user is only used to get clk_user->core; the rest
of this function deals only with clk_core.

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 65530e9..8c8ba91 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2285,25 +2285,22 @@ static inline void clk_debug_unregister(struct clk_core *core)
 #endif
 
 /**
- * __clk_init - initialize the data structures in a struct clk
- * @clk:   clk being initialized
+ * __clk_init - initialize the data structures in a struct clk_core
+ * @core:  clk_core being initialized
  *
  * Initializes the lists in struct clk_core, queries the hardware for the
  * parent and rate and sets them both.
  */
-static int __clk_init(struct clk *clk_user)
+static int __clk_init(struct clk_core *core)
 {
int i, ret = 0;
struct clk_core *orphan;
struct hlist_node *tmp2;
-   struct clk_core *core;
unsigned long rate;
 
-   if (!clk_user)
+   if (!core)
return -EINVAL;
 
-   core = clk_user->core;
-
clk_prepare_lock();
 
/* check to see if a clock with this name is already registered */
@@ -2574,7 +2571,7 @@ struct clk *clk_register(struct device *dev, struct clk_hw *hw)
goto fail_parent_names_copy;
}
 
-   ret = __clk_init(hw->clk);
+   ret = __clk_init(core);
if (!ret)
return hw->clk;
 
-- 
1.9.1



[PATCH 16/16] clk: slightly optimize clk_core_set_parent()

2015-11-29 Thread Masahiro Yamada
If clk_fetch_parent_index() fails, p_rate is unused.  Move the
assignment after the error checking.

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 8a6a33b..479a754 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1730,13 +1730,13 @@ static int clk_core_set_parent(struct clk_core *core, struct clk_core *parent)
/* try finding the new parent index */
if (parent) {
p_index = clk_fetch_parent_index(core, parent);
-   p_rate = parent->rate;
if (p_index < 0) {
pr_debug("%s: clk %s can not be parent of clk %s\n",
__func__, parent->name, core->name);
ret = p_index;
goto out;
}
+   p_rate = parent->rate;
}
 
/* propagate PRE_RATE_CHANGE notifications */
-- 
1.9.1



[PATCH 03/16] clk: rename __clk_init() into __clk_core_init()

2015-11-29 Thread Masahiro Yamada
Now that this function takes a clk_core as its argument,
__clk_core_init() is a more suitable name for it.

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 8c8ba91..36373d3 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2285,13 +2285,13 @@ static inline void clk_debug_unregister(struct clk_core *core)
 #endif
 
 /**
- * __clk_init - initialize the data structures in a struct clk_core
+ * __clk_core_init - initialize the data structures in a struct clk_core
  * @core:  clk_core being initialized
  *
  * Initializes the lists in struct clk_core, queries the hardware for the
  * parent and rate and sets them both.
  */
-static int __clk_init(struct clk_core *core)
+static int __clk_core_init(struct clk_core *core)
 {
int i, ret = 0;
struct clk_core *orphan;
@@ -2571,7 +2571,7 @@ struct clk *clk_register(struct device *dev, struct clk_hw *hw)
goto fail_parent_names_copy;
}
 
-   ret = __clk_init(core);
+   ret = __clk_core_init(core);
if (!ret)
return hw->clk;
 
-- 
1.9.1



[PATCH 08/16] clk: drop the initial core->parents look-ups from __clk_core_init()

2015-11-29 Thread Masahiro Yamada
core->parents is a cache that saves expensive clock parent look-ups.
It is filled lazily as needed later, so we do not have to do it here.
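A minimal sketch of the lazy look-up that makes the eager fill unnecessary, assuming a hypothetical sketch_clk type and a linear by-name search standing in for clk_core_lookup(). The cache slot is filled on first use, roughly as clk_core_get_parent_by_index() does with core->parents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified clock with a lazily filled parent cache. */
struct sketch_clk {
	const char *name;
	const char *const *parent_names;
	struct sketch_clk **parents;	/* NULL entry = not looked up yet */
	unsigned int num_parents;
};

static unsigned int sketch_lookups;	/* counts expensive name look-ups */

/* Stand-in for a global by-name search such as clk_core_lookup(). */
static struct sketch_clk *sketch_lookup(struct sketch_clk **all, size_t n,
					const char *name)
{
	size_t i;

	sketch_lookups++;
	for (i = 0; i < n; i++)
		if (!strcmp(all[i]->name, name))
			return all[i];
	return NULL;	/* parent not registered (yet) */
}

/* Look the parent up only on first use, then serve it from the cache. */
static struct sketch_clk *sketch_get_parent(struct sketch_clk *clk,
					    unsigned int index,
					    struct sketch_clk **all, size_t n)
{
	assert(index < clk->num_parents);
	if (!clk->parents[index])
		clk->parents[index] = sketch_lookup(all, n,
						   clk->parent_names[index]);
	return clk->parents[index];
}
```

A parent that is not registered yet simply stays NULL in the cache and is looked up again on the next access.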

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index f2758c4..43fb329 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2329,17 +2329,6 @@ static int __clk_core_init(struct clk_core *core)
"%s: invalid NULL in %s's .parent_names\n",
__func__, core->name);
 
-   /*
-* clk_core_lookup returns NULL for parents that have not been
-* clk_init'd; thus any access to clk->parents[] must check
-* for a NULL pointer.  We can always perform lazy lookups for
-* missing parents later on.
-*/
-   if (core->parents)
-   for (i = 0; i < core->num_parents; i++)
-   core->parents[i] =
-   clk_core_lookup(core->parent_names[i]);
-
core->parent = __clk_init_parent(core);
 
/*
-- 
1.9.1



[PATCH 00/16] clk: a collection of various clean-ups and improvements

2015-11-29 Thread Masahiro Yamada
Various refactorings, plus detection of circular parent loops.


Masahiro Yamada (16):
  clk: remove unused first argument of __clk_init()
  clk: change the argument of __clk_init() into pointer to clk_core
  clk: rename __clk_init() into __clk_core_init()
  clk: remove unnecessary !core->parents conditional
  clk: change sizeof(struct clk *) to sizeof(*core->parents)
  clk: move core->parents allocation to clk_register()
  clk: simplify clk_core_get_parent_by_index()
  clk: drop the initial core->parents look-ups from __clk_core_init()
  clk: replace pr_warn() with pr_err() for fatal cases
  clk: move checking .get_parent to __clk_core_init()
  clk: simplify __clk_init_parent()
  clk: avoid circular clock topology
  clk: walk the orphan clock list more simply
  clk: make sure parent is not NULL in clk_fetch_parent_index()
  clk: simplify clk_fetch_parent_index() function
  clk: slightly optimize clk_core_set_parent()

 drivers/clk/clk.c | 214 ++
 1 file changed, 85 insertions(+), 129 deletions(-)

-- 
1.9.1



[PATCH 09/16] clk: replace pr_warn() with pr_err() for fatal cases

2015-11-29 Thread Masahiro Yamada
These three cases make clk_register() fail, so they should be
reported as errors rather than warnings.
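A user-space sketch of the three checks, assuming a hypothetical sketch_ops structure that only records which callbacks a provider implements (the kernel tests function pointers in struct clk_ops and reports via pr_err()). Each failed check returns -EINVAL, so registration fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct clk_ops: which callbacks exist. */
struct sketch_ops {
	bool set_rate, round_rate, determine_rate, recalc_rate;
	bool set_parent, get_parent, set_rate_and_parent;
};

/* The three fatal checks from __clk_core_init(), in the same order. */
static int sketch_check_ops(const char *name, const struct sketch_ops *ops)
{
	assert(name != NULL && ops != NULL);

	if (ops->set_rate &&
	    !((ops->round_rate || ops->determine_rate) && ops->recalc_rate)) {
		fprintf(stderr, "%s must implement .round_rate or .determine_rate in addition to .recalc_rate\n",
			name);
		return -EINVAL;
	}
	if (ops->set_parent && !ops->get_parent) {
		fprintf(stderr, "%s must implement .get_parent & .set_parent\n",
			name);
		return -EINVAL;
	}
	if (ops->set_rate_and_parent && !(ops->set_parent && ops->set_rate)) {
		fprintf(stderr, "%s must implement .set_parent & .set_rate\n",
			name);
		return -EINVAL;
	}
	return 0;
}
```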

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 43fb329..cd96442 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2302,22 +2302,22 @@ static int __clk_core_init(struct clk_core *core)
if (core->ops->set_rate &&
!((core->ops->round_rate || core->ops->determine_rate) &&
  core->ops->recalc_rate)) {
-   pr_warning("%s: %s must implement .round_rate or .determine_rate in addition to .recalc_rate\n",
-   __func__, core->name);
+   pr_err("%s: %s must implement .round_rate or .determine_rate in addition to .recalc_rate\n",
+  __func__, core->name);
ret = -EINVAL;
goto out;
}
 
if (core->ops->set_parent && !core->ops->get_parent) {
-   pr_warning("%s: %s must implement .get_parent & .set_parent\n",
-   __func__, core->name);
+   pr_err("%s: %s must implement .get_parent & .set_parent\n",
+  __func__, core->name);
ret = -EINVAL;
goto out;
}
 
if (core->ops->set_rate_and_parent &&
!(core->ops->set_parent && core->ops->set_rate)) {
-   pr_warn("%s: %s must implement .set_parent & .set_rate\n",
+   pr_err("%s: %s must implement .set_parent & .set_rate\n",
__func__, core->name);
ret = -EINVAL;
goto out;
-- 
1.9.1



[PATCH 05/16] clk: change sizeof(struct clk *) to sizeof(*core->parents)

2015-11-29 Thread Masahiro Yamada
The clock parent is now "struct clk_core *", not "struct clk *".
Pointers to structures all have the same size, so this is harmless,
but strictly speaking, sizeof(struct clk *) should be
sizeof(struct clk_core *) here.

This mismatch happened when we split the structure into struct clk
and struct clk_core.  To be robust against future renaming,
sizeof(*core->parents) is better.
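The idiom in isolation, as a minimal user-space sketch: sketch_core and sketch_alloc_parents() are hypothetical, and calloc() stands in for kcalloc(). Because the element size is taken from the pointer being assigned, it cannot drift from the declared type.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_core;	/* hypothetical; only pointers to it are stored */

/*
 * Allocate an array of n parent pointers.  sizeof(*parents) follows the
 * declared element type, so the allocation stays correct if that type is
 * renamed or changed.  sizeof does not evaluate its operand, so using the
 * still-uninitialized pointer here is fine.
 */
static struct sketch_core **sketch_alloc_parents(size_t n)
{
	struct sketch_core **parents;

	assert(n > 0);	/* calloc(0, ...) may legally return NULL */
	parents = calloc(n, sizeof(*parents));
	return parents;
}
```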

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 66c6f34..0f80c69 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1069,7 +1069,7 @@ static int clk_fetch_parent_index(struct clk_core *core,
 
if (!core->parents) {
core->parents = kcalloc(core->num_parents,
-   sizeof(struct clk *), GFP_KERNEL);
+   sizeof(*core->parents), GFP_KERNEL);
if (!core->parents)
return -ENOMEM;
}
@@ -1702,7 +1702,7 @@ static struct clk_core *__clk_init_parent(struct clk_core *core)
 
if (!core->parents)
core->parents =
-   kcalloc(core->num_parents, sizeof(struct clk *),
+   kcalloc(core->num_parents, sizeof(*core->parents),
GFP_KERNEL);
 
ret = clk_core_get_parent_by_index(core, index);
@@ -2350,8 +2350,8 @@ static int __clk_core_init(struct clk_core *core)
 * necessary.
 */
if (core->num_parents > 1) {
-   core->parents = kcalloc(core->num_parents, sizeof(struct clk *),
-   GFP_KERNEL);
+   core->parents = kcalloc(core->num_parents,
+   sizeof(*core->parents), GFP_KERNEL);
/*
 * clk_core_lookup returns NULL for parents that have not been
 * clk_init'd; thus any access to clk->parents[] must check
-- 
1.9.1



[PATCH 10/16] clk: move checking .get_parent to __clk_core_init()

2015-11-29 Thread Masahiro Yamada
The .get_parent callback is mandatory for multi-parent clocks.  Move
the check to __clk_core_init(), alongside the other callback checks.

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index cd96442..486f6d4 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1677,13 +1677,6 @@ static struct clk_core *__clk_init_parent(struct clk_core *core)
goto out;
}
 
-   if (!core->ops->get_parent) {
-   WARN(!core->ops->get_parent,
-   "%s: multi-parent clocks must implement .get_parent\n",
-   __func__);
-   goto out;
-   }
-
/*
 * Do our best to cache parent clocks in core->parents.  This prevents
 * unnecessary and expensive lookups.  We don't set core->parent here;
@@ -2315,6 +2308,11 @@ static int __clk_core_init(struct clk_core *core)
goto out;
}
 
+   if (core->num_parents > 1 && !core->ops->get_parent) {
+   pr_err("%s: %s must implement .get_parent as it has multi parents\n",
+  __func__, core->name);
+   }
+
if (core->ops->set_rate_and_parent &&
!(core->ops->set_parent && core->ops->set_rate)) {
pr_err("%s: %s must implement .set_parent & .set_rate\n",
-- 
1.9.1



[PATCH 12/16] clk: avoid circular clock topology

2015-11-29 Thread Masahiro Yamada
Currently, clk_register() never checks for circular parent loops,
but a clock provider could register such a broken clock topology.
For example, "clk_a" could have "clk_b" as a parent, and vice versa.
In this case, clk_core_reparent() creates a circular parent list
and __clk_recalc_accuracies() calls itself recursively forever.

The core infrastructure should be kind enough to bail out, showing
an appropriate error message in such a case.  This makes bugs in
clock providers easy to find.  (I made exactly this silly mistake
when I first implemented my clock providers, and I was upset
because the kernel simply hung without any error message.)

This commit adds a new helper function, __clk_is_ancestor().  It
returns true if the second argument is a possible ancestor of the
first one.  If a clock core is a possible ancestor of itself,
registering it would create a loop, so that case is detected as an
error.
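A user-space sketch of the same check, assuming hypothetical sketch_clk_desc, sketch_find() and sketch_is_ancestor() helpers over a plain array of registered clocks instead of the kernel's clk_{root,orphan}_list. As in the patch, an unresolved parent is compared by name, and the walk assumes the already registered clocks are loop-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical clock descriptor: a name plus possible parent names. */
struct sketch_clk_desc {
	const char *name;
	const char *const *parent_names;
	size_t num_parents;
};

/* Find an already registered clock by name (NULL if none). */
static const struct sketch_clk_desc *
sketch_find(const struct sketch_clk_desc *const *reg, size_t nreg,
	    const char *name)
{
	size_t i;

	for (i = 0; i < nreg; i++)
		if (!strcmp(reg[i]->name, name))
			return reg[i];
	return NULL;
}

/*
 * Same shape as __clk_is_ancestor(): true if 'ancestor' may appear
 * anywhere above 'core'.  An unresolved parent is compared by name
 * because 'ancestor' itself may not be registered yet.
 */
static bool sketch_is_ancestor(const struct sketch_clk_desc *core,
			       const struct sketch_clk_desc *ancestor,
			       const struct sketch_clk_desc *const *reg,
			       size_t nreg)
{
	size_t i;

	assert(core != NULL && ancestor != NULL);
	for (i = 0; i < core->num_parents; i++) {
		const struct sketch_clk_desc *parent =
			sketch_find(reg, nreg, core->parent_names[i]);

		if ((parent && (parent == ancestor ||
				sketch_is_ancestor(parent, ancestor, reg, nreg))) ||
		    (!parent && !strcmp(core->parent_names[i], ancestor->name)))
			return true;
	}
	return false;
}
```

With "clk_a" (parent "clk_b") already registered, checking a new "clk_b" whose parent is "clk_a" against itself returns true, which is the loop the patch rejects with -EINVAL.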

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index ef6fedb..a1d046c 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2235,6 +2235,38 @@ static inline void clk_debug_unregister(struct clk_core *core)
 #endif
 
 /**
+ * __clk_is_ancestor - check if a clk_core is a possible ancestor of another
+ * @core: clock core
+ * @ancestor: ancestor clock core
+ *
+ * Returns true if there is a possibility that @ancestor can be an ancestor
+ * of @core, false otherwise.
+ *
+ * This function can be used against @core or @ancestor that has not been
+ * registered yet.
+ */
+static bool __clk_is_ancestor(struct clk_core *core, struct clk_core *ancestor)
+{
+   struct clk_core *parent;
+   int i;
+
+   for (i = 0; i < core->num_parents; i++) {
+   parent = clk_core_get_parent_by_index(core, i);
+   /*
+* If ancestor has not been added to clk_{root,orphan}_list
+* yet, clk_core_lookup() cannot find it.  If parent is NULL,
+* compare the name strings, too.
+*/
+   if ((parent && (parent == ancestor ||
+   __clk_is_ancestor(parent, ancestor))) ||
+   (!parent && !strcmp(core->parent_names[i], ancestor->name)))
+   return true;
+   }
+
+   return false;
+}
+
+/**
  * __clk_core_init - initialize the data structures in a struct clk_core
  * @core:  clk_core being initialized
  *
@@ -2297,6 +2329,14 @@ static int __clk_core_init(struct clk_core *core)
"%s: invalid NULL in %s's .parent_names\n",
__func__, core->name);
 
+   /* If core is an ancestor of itself, it would make a loop. */
+   if (__clk_is_ancestor(core, core)) {
+   pr_err("%s: %s would create circular parent\n", __func__,
+  core->name);
+   ret = -EINVAL;
+   goto out;
+   }
+
core->parent = __clk_init_parent(core);
 
/*
-- 
1.9.1



[PATCH 06/16] clk: move core->parents allocation to clk_register()

2015-11-29 Thread Masahiro Yamada
Currently, __clk_core_init() allows failure of the kcalloc() for the
core->parents.  So, clk_fetch_parent_index() and __clk_init_parent()
also try to allocate core->parents in case it has not been allocated
yet.  Scattering memory allocation here and there makes things
complicated.

Like other clk_core members, allocate core->parents in clk_register()
and let it fail in case of memory shortage.  If we cannot allocate
such a small piece of memory, the system is already insane.  There is
no point in postponing the memory allocation.

Also, allocate core->parents regardless of core->num_parents.  We want
it even if core->num_parents == 1 because clk_fetch_parent_index()
might be called on a clk_core with a single parent.

If core->num_parents == 0, core->parents is set to ZERO_SIZE_PTR. It
is harmless because no access happens to core->parents in such a case.
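A userspace sketch of the allocation pattern this commit moves into clk_register(): allocate the parent cache up front, report -ENOMEM immediately, and unwind through goto labels in reverse order. struct toy_core, toy_strdup(), toy_core_alloc() and toy_core_free() are hypothetical. One difference from the kernel matters here: kcalloc() returns ZERO_SIZE_PTR for a zero-sized request, whereas userspace calloc(0, ...) may return NULL, so the sketch always requests at least one slot.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the allocation side of struct clk_core. */
struct toy_core {
	char **parent_names;	/* private copies of the parent names */
	void **parents;		/* parent cache; NULL entry = not looked up yet */
	int num_parents;
};

/* Portable replacement for strdup() (plays the role of kstrdup_const()). */
static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

static int toy_core_alloc(struct toy_core *core,
			  const char *const *names, int num_parents)
{
	/* calloc(0, ...) may return NULL, so never ask for zero slots. */
	size_t slots = num_parents > 0 ? (size_t)num_parents : 1;
	int i;

	core->num_parents = num_parents;

	core->parent_names = calloc(slots, sizeof(*core->parent_names));
	if (!core->parent_names)
		return -ENOMEM;

	for (i = 0; i < num_parents; i++) {
		core->parent_names[i] = toy_strdup(names[i]);
		if (!core->parent_names[i])
			goto fail_parent_names_copy;
	}

	/* Allocate the cache eagerly; a failure is reported, not deferred. */
	core->parents = calloc(slots, sizeof(*core->parents));
	if (!core->parents)
		goto fail_parent_names_copy;

	return 0;

fail_parent_names_copy:
	while (--i >= 0)	/* free only the names copied so far */
		free(core->parent_names[i]);
	free(core->parent_names);
	return -ENOMEM;
}

static void toy_core_free(struct toy_core *core)
{
	for (int i = 0; i < core->num_parents; i++)
		free(core->parent_names[i]);
	free(core->parent_names);
	free(core->parents);
}
```

Calling toy_core_alloc() with zero parents still yields a non-NULL cache that nothing ever indexes into, which is the userspace counterpart of the harmless ZERO_SIZE_PTR case.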

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 51 +++
 1 file changed, 19 insertions(+), 32 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 0f80c69..e05084e 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1067,13 +1067,6 @@ static int clk_fetch_parent_index(struct clk_core *core,
 {
int i;
 
-   if (!core->parents) {
-   core->parents = kcalloc(core->num_parents,
-   sizeof(*core->parents), GFP_KERNEL);
-   if (!core->parents)
-   return -ENOMEM;
-   }
-
/*
 * find index of new parent clock using cached parent ptrs,
 * or if not yet cached, use string name comparison and cache
@@ -1700,11 +1693,6 @@ static struct clk_core *__clk_init_parent(struct clk_core *core)
 
index = core->ops->get_parent(core->hw);
 
-   if (!core->parents)
-   core->parents =
-   kcalloc(core->num_parents, sizeof(*core->parents),
-   GFP_KERNEL);
-
ret = clk_core_get_parent_by_index(core, index);
 
 out:
@@ -2343,26 +2331,15 @@ static int __clk_core_init(struct clk_core *core)
__func__, core->name);
 
/*
-* Allocate an array of struct clk *'s to avoid unnecessary string
-* look-ups of clk's possible parents.  This can fail for clocks passed
-* in to clk_init during early boot; thus any access to core->parents[]
-* must always check for a NULL pointer and try to populate it if
-* necessary.
+* clk_core_lookup returns NULL for parents that have not been
+* clk_init'd; thus any access to clk->parents[] must check
+* for a NULL pointer.  We can always perform lazy lookups for
+* missing parents later on.
 */
-   if (core->num_parents > 1) {
-   core->parents = kcalloc(core->num_parents,
-   sizeof(*core->parents), GFP_KERNEL);
-   /*
-* clk_core_lookup returns NULL for parents that have not been
-* clk_init'd; thus any access to clk->parents[] must check
-* for a NULL pointer.  We can always perform lazy lookups for
-* missing parents later on.
-*/
-   if (core->parents)
-   for (i = 0; i < core->num_parents; i++)
-   core->parents[i] =
-   clk_core_lookup(core->parent_names[i]);
-   }
+   if (core->parents)
+   for (i = 0; i < core->num_parents; i++)
+   core->parents[i] =
+   clk_core_lookup(core->parent_names[i]);
 
core->parent = __clk_init_parent(core);
 
@@ -2560,12 +2537,20 @@ struct clk *clk_register(struct device *dev, struct clk_hw *hw)
}
}
 
+   /* avoid unnecessary string look-ups of clk_core's possible parents. */
+   core->parents = kcalloc(core->num_parents, sizeof(*core->parents),
+   GFP_KERNEL);
+   if (!core->parents) {
+   ret = -ENOMEM;
+   goto fail_parents;
+   }
+
 INIT_HLIST_HEAD(&core->clks);
 
hw->clk = __clk_create_clk(hw, NULL, NULL);
if (IS_ERR(hw->clk)) {
ret = PTR_ERR(hw->clk);
-   goto fail_parent_names_copy;
+   goto fail_parents;
}
 
ret = __clk_core_init(core);
@@ -2575,6 +2560,8 @@ struct clk *clk_register(struct device *dev, struct clk_hw *hw)
__clk_free_clk(hw->clk);
hw->clk = NULL;
 
+fail_parents:
+   kfree(core->parents);
 fail_parent_names_copy:
while (--i >= 0)
kfree_const(core->parent_names[i]);
-- 
1.9.1


[PATCH 01/16] clk: remove unused first argument of __clk_init()

2015-11-29 Thread Masahiro Yamada
The "struct device *dev" is not used at all in this function.

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 6d9cd05..65530e9 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2286,13 +2286,12 @@ static inline void clk_debug_unregister(struct clk_core *core)
 
 /**
  * __clk_init - initialize the data structures in a struct clk
- * @dev:   device initializing this clk, placeholder for now
  * @clk:   clk being initialized
  *
  * Initializes the lists in struct clk_core, queries the hardware for the
  * parent and rate and sets them both.
  */
-static int __clk_init(struct device *dev, struct clk *clk_user)
+static int __clk_init(struct clk *clk_user)
 {
int i, ret = 0;
struct clk_core *orphan;
@@ -2575,7 +2574,7 @@ struct clk *clk_register(struct device *dev, struct clk_hw *hw)
goto fail_parent_names_copy;
}
 
-   ret = __clk_init(dev, hw->clk);
+   ret = __clk_init(hw->clk);
if (!ret)
return hw->clk;
 
-- 
1.9.1



[PATCH 15/16] clk: simplify clk_fetch_parent_index() function

2015-11-29 Thread Masahiro Yamada
clk_core_get_parent_by_index() can be used as a helper function
to simplify the implementation of clk_fetch_parent_index().

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 18 ++
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 50b9b3d..8a6a33b 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1069,23 +1069,9 @@ static int clk_fetch_parent_index(struct clk_core *core,
 if (!parent)
return -EINVAL;
 
-   /*
-* find index of new parent clock using cached parent ptrs,
-* or if not yet cached, use string name comparison and cache
-* them now to avoid future calls to clk_core_lookup.
-*/
-   for (i = 0; i < core->num_parents; i++) {
-   if (core->parents[i] == parent)
-   return i;
-
-   if (core->parents[i])
-   continue;
-
-   if (!strcmp(core->parent_names[i], parent->name)) {
-   core->parents[i] = clk_core_lookup(parent->name);
+   for (i = 0; i < core->num_parents; i++)
+   if (clk_core_get_parent_by_index(core, i) == parent)
return i;
-   }
-   }
 
return -EINVAL;
 }
-- 
1.9.1



[PATCH 13/16] clk: walk the orphan clock list more simply

2015-11-29 Thread Masahiro Yamada
This loop can be much simpler.  If a new parent is available for
orphan clocks, __clk_init_parent(orphan) can detect it.
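A hypothetical userspace sketch of the simplified walk: after a new clock is registered, every orphan re-runs its own parent resolution (the role __clk_init_parent() plays) and is reparented if that now succeeds, instead of comparing each orphan's parent names against the new clock. The single-parent struct toy_clk and the toy_* helpers are assumptions for illustration; the real code also moves subtrees between lists and recalculates rates, which is omitted here.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_CLKS 8

/* Hypothetical single-parent clock. */
struct toy_clk {
	const char *name;
	const char *parent_name;	/* NULL for a root clock */
	struct toy_clk *parent;		/* resolved parent, NULL while unresolved */
	bool orphan;
};

static struct toy_clk *toy_clks[TOY_MAX_CLKS];
static int toy_nr_clks;

static struct toy_clk *toy_lookup(const char *name)
{
	for (int i = 0; i < toy_nr_clks; i++)
		if (strcmp(toy_clks[i]->name, name) == 0)
			return toy_clks[i];
	return NULL;
}

/* Plays the role of __clk_init_parent(): resolve the parent from scratch. */
static struct toy_clk *toy_init_parent(const struct toy_clk *clk)
{
	return clk->parent_name ? toy_lookup(clk->parent_name) : NULL;
}

static int toy_register(struct toy_clk *clk)
{
	if (toy_nr_clks == TOY_MAX_CLKS)
		return -1;
	toy_clks[toy_nr_clks++] = clk;
	clk->parent = toy_init_parent(clk);
	clk->orphan = clk->parent_name && !clk->parent;

	/* Reparent any orphan that now finds a parent. */
	for (int i = 0; i < toy_nr_clks; i++) {
		struct toy_clk *orphan = toy_clks[i];
		struct toy_clk *parent;

		if (!orphan->orphan)
			continue;
		parent = toy_init_parent(orphan);
		if (parent) {
			orphan->parent = parent;
			orphan->orphan = false;
		}
	}
	return 0;
}
```

Registering "pll" before its parent "osc" leaves it orphaned; registering "osc" afterwards resolves it without any name comparison in the walk itself.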

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index a1d046c..e3c6559 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2402,24 +2402,15 @@ static int __clk_core_init(struct clk_core *core)
core->rate = core->req_rate = rate;
 
/*
-* walk the list of orphan clocks and reparent any that are children of
-* this clock
+* walk the list of orphan clocks and reparent any that newly finds a
+* parent.
 */
 hlist_for_each_entry_safe(orphan, tmp2, &clk_orphan_list, child_node) {
-   if (orphan->num_parents && orphan->ops->get_parent) {
-   i = orphan->ops->get_parent(orphan->hw);
-   if (i >= 0 && i < orphan->num_parents &&
-   !strcmp(core->name, orphan->parent_names[i]))
-   clk_core_reparent(orphan, core);
-   continue;
-   }
+   struct clk_core *parent = __clk_init_parent(orphan);
 
-   for (i = 0; i < orphan->num_parents; i++)
-   if (!strcmp(core->name, orphan->parent_names[i])) {
-   clk_core_reparent(orphan, core);
-   break;
-   }
-}
+   if (parent)
+   clk_core_reparent(orphan, parent);
+   }
 
/*
 * optional platform-specific magic
-- 
1.9.1



[PATCH 14/16] clk: make sure parent is not NULL in clk_fetch_parent_index()

2015-11-29 Thread Masahiro Yamada
If parent is NULL, clk_fetch_parent_index() could return a valid
index: the slot of any parent that has not been looked up yet.

Currently, parent is checked by the callers of this function, but
it would be safer to do it in this function.
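The pitfall is easy to reproduce with a parent cache in which NULL means "not looked up yet": searching such an array for a NULL parent matches the first unresolved slot. The hypothetical sketch below (toy_fetch_index_unguarded() and toy_fetch_index() are not kernel functions, and -1 stands in for -EINVAL) compares a search without and with a NULL-parent guard.

```c
#include <stddef.h>

/* Search a parent cache in which a NULL entry means "not resolved yet". */
static int toy_fetch_index_unguarded(void *const *parents, int n,
				     const void *parent)
{
	for (int i = 0; i < n; i++)
		if (parents[i] == parent)
			return i;
	return -1;
}

/* Same search, but reject a NULL parent before looking at the cache. */
static int toy_fetch_index(void *const *parents, int n, const void *parent)
{
	if (!parent)
		return -1;	/* -EINVAL in the kernel */
	return toy_fetch_index_unguarded(parents, n, parent);
}
```

With a cache of { &a, NULL, &b }, the unguarded search for NULL returns 1, an index that looks perfectly valid to the caller.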

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index e3c6559..50b9b3d 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1066,6 +1066,9 @@ static int clk_fetch_parent_index(struct clk_core *core,
 {
int i;
 
+   if (!parent)
+   return -EINVAL;
+
/*
 * find index of new parent clock using cached parent ptrs,
 * or if not yet cached, use string name comparison and cache
-- 
1.9.1



[PATCH 11/16] clk: simplify __clk_init_parent()

2015-11-29 Thread Masahiro Yamada
The translation from the index into clk_core is done by
clk_core_get_parent_by_index().  The if-block for num_parents == 1
case is duplicating the code in the clk_core_get_parent_by_index().

Drop the special "if (num_parents == 1)" block.  Instead, set the
index to zero if .get_parent() is missing.
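A hypothetical userspace sketch of the simplified flow: the index defaults to 0 and is only read back from the hardware when the optional .get_parent callback exists, while the bounds check in the by-index helper already covers clocks with no parents at all. struct toy_ops, struct toy_hw, the mux_sel field and the toy_* functions are assumptions for illustration, not the clk API.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_hw;

/* Hypothetical ops table; .get_parent is optional, as for single-parent clocks. */
struct toy_ops {
	uint8_t (*get_parent)(struct toy_hw *hw);
};

struct toy_hw {
	const struct toy_ops *ops;
	void **parents;		/* resolved parents, indexed like parent_names */
	int num_parents;
	uint8_t mux_sel;	/* hypothetical mux register read by get_parent */
};

/* Bounds check first, so num_parents == 0 needs no special case. */
static void *toy_get_parent_by_index(const struct toy_hw *hw, uint8_t index)
{
	if (index >= hw->num_parents)
		return NULL;
	return hw->parents[index];
}

/* Mirrors the simplified __clk_init_parent(): default to index 0. */
static void *toy_init_parent(struct toy_hw *hw)
{
	uint8_t index = 0;

	if (hw->ops->get_parent)
		index = hw->ops->get_parent(hw);

	return toy_get_parent_by_index(hw, index);
}

/* Example .get_parent for a mux: report the currently selected input. */
static uint8_t toy_mux_get_parent(struct toy_hw *hw)
{
	return hw->mux_sel;
}
```

A single-parent gate without .get_parent resolves to parents[0], a mux reports its selected input, and a root clock falls out of the bounds check as NULL.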

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 38 --
 1 file changed, 4 insertions(+), 34 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 486f6d4..ef6fedb 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1651,44 +1651,14 @@ struct clk *clk_get_parent(struct clk *clk)
 }
 EXPORT_SYMBOL_GPL(clk_get_parent);
 
-/*
- * .get_parent is mandatory for clocks with multiple possible parents.  It is
- * optional for single-parent clocks.  Always call .get_parent if it is
- * available and WARN if it is missing for multi-parent clocks.
- *
- * For single-parent clocks without .get_parent, first check to see if the
- * .parents array exists, and if so use it to avoid an expensive tree
- * traversal.  If .parents does not exist then walk the tree.
- */
 static struct clk_core *__clk_init_parent(struct clk_core *core)
 {
-   struct clk_core *ret = NULL;
-   u8 index;
-
-   /* handle the trivial cases */
+   u8 index = 0;
 
-   if (!core->num_parents)
-   goto out;
-
-   if (core->num_parents == 1) {
-   if (IS_ERR_OR_NULL(core->parent))
-   core->parent = clk_core_lookup(core->parent_names[0]);
-   ret = core->parent;
-   goto out;
-   }
+   if (core->ops->get_parent)
+   index = core->ops->get_parent(core->hw);
 
-   /*
-* Do our best to cache parent clocks in core->parents.  This prevents
-* unnecessary and expensive lookups.  We don't set core->parent here;
-* that is done by the calling function.
-*/
-
-   index = core->ops->get_parent(core->hw);
-
-   ret = clk_core_get_parent_by_index(core, index);
-
-out:
-   return ret;
+   return clk_core_get_parent_by_index(core, index);
 }
 
 static void clk_core_reparent(struct clk_core *core,
-- 
1.9.1



[PATCH 04/16] clk: remove unnecessary !core->parents conditional

2015-11-29 Thread Masahiro Yamada
This if-block has been here since the introduction of the common
clock framework.  No clock drivers statically initialize core->parents
any more, so core->parents is always NULL at this point.  Drop the
redundant check and the confusing comment.

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 36373d3..66c6f34 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2348,11 +2348,8 @@ static int __clk_core_init(struct clk_core *core)
 * in to clk_init during early boot; thus any access to core->parents[]
 * must always check for a NULL pointer and try to populate it if
 * necessary.
-*
-* If core->parents is not NULL we skip this entire block.  This allows
-* for clock drivers to statically initialize core->parents.
 */
-   if (core->num_parents > 1 && !core->parents) {
+   if (core->num_parents > 1) {
core->parents = kcalloc(core->num_parents, sizeof(struct clk *),
GFP_KERNEL);
/*
-- 
1.9.1



[PATCH v3 03/13] perf tools: Enable BPF object configure syntax

2015-11-29 Thread Wang Nan
This patch adds the final step for BPF map configuration. A new syntax
is added to the parser so that users can configure BPF objects through
config terms enclosed in a pair of '/'.

After this patch, the following syntax is available:

 # perf record -e bpf_file.c/maps:mymap.value=123/ ...

It takes effect after applying the following commits.
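In the patch, the term is handled by the parse-events lexer and grammar (parse-events.l/.y) and then validated by bpf__config_obj(). Purely to show how a term such as maps:channel.value=10 decomposes into map name, option and value, here is a hypothetical hand-written splitter; toy_split_map_term() and its error codes are illustrations, not perf functions, and the map and option names are not checked against the object here.

```c
#include <string.h>

/* Hypothetical result of splitting one "maps:<map>.<option>=<value>" term. */
struct toy_map_term {
	char map[64];
	char opt[32];
	char value[64];
};

enum toy_term_err {
	TOY_TERM_OK = 0,
	TOY_TERM_BAD_PREFIX,	/* e.g. "xmaps:channel.value=10" */
	TOY_TERM_NO_OPTION,	/* no '.' after the map name */
	TOY_TERM_NO_VALUE,	/* no '=' (cf. "Config value not set") */
	TOY_TERM_BAD_FIELD,	/* empty or over-long field */
};

/* Copy @len bytes into a NUL-terminated field; reject empty or too long. */
static int toy_copy_field(char *dst, size_t dstsz, const char *src, size_t len)
{
	if (len == 0 || len >= dstsz)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

static enum toy_term_err toy_split_map_term(const char *s,
					    struct toy_map_term *t)
{
	static const char prefix[] = "maps:";
	const size_t plen = sizeof(prefix) - 1;
	const char *dot, *eq;

	if (strncmp(s, prefix, plen) != 0)
		return TOY_TERM_BAD_PREFIX;
	s += plen;

	dot = strchr(s, '.');
	if (!dot)
		return TOY_TERM_NO_OPTION;
	eq = strchr(dot, '=');
	if (!eq)
		return TOY_TERM_NO_VALUE;

	if (toy_copy_field(t->map, sizeof(t->map), s, (size_t)(dot - s)) ||
	    toy_copy_field(t->opt, sizeof(t->opt), dot + 1, (size_t)(eq - dot - 1)) ||
	    toy_copy_field(t->value, sizeof(t->value), eq + 1, strlen(eq + 1)))
		return TOY_TERM_BAD_FIELD;
	return TOY_TERM_OK;
}
```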

Test result:

 # cat ./test_bpf_map_1.c
 /* BEGIN */
 #define SEC(NAME) __attribute__((section(NAME), used))
 enum bpf_map_type {
 BPF_MAP_TYPE_ARRAY = 2,
 };
 struct bpf_map_def {
 unsigned int type;
 unsigned int key_size;
 unsigned int value_size;
 unsigned int max_entries;
 };
 static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
 (void *)1;
 static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
 (void *)6;
 struct bpf_map_def SEC("maps") channel = {
 .type = BPF_MAP_TYPE_ARRAY,
 .key_size = sizeof(int),
 .value_size = sizeof(int),
 .max_entries = 1,
 };
 SEC("func=sys_nanosleep")
 int func(void *ctx)
 {
 int key = 0;
 char fmt[] = "%d\n";
 int *pval = map_lookup_elem(&channel, &key);
 if (!pval)
 return 0;
 bpf_trace_printk(fmt, sizeof(fmt), *pval);
 return 0;
 }
 char _license[] SEC("license") = "GPL";
 int _version SEC("version") = LINUX_VERSION_CODE;
 /* END ***/

 - Normal case:
 # ./perf record -e './test_bpf_map_1.c/maps:channel.value=10/' usleep 10
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data ]

 - Error case:

 # ./perf record -e './test_bpf_map_1.c/maps:channel.value/' usleep 10
 event syntax error: '..ps:channel:value/'
   \___ Config value not set (lost '=')
 Hint:  Valid config term:
maps:[]:value=[value]
(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -e, --event <event>   event selector. use 'perf list' to list available events

 # ./perf record -e './test_bpf_map_1.c/xmaps:channel.value=10/' usleep 10
 event syntax error: '..pf_map_1.c/xmaps:channel.value=10/'
   \___ Invalid object config option
 [SNIP]

 # ./perf record -e './test_bpf_map_1.c/maps:xchannel.value=10/' usleep 10
 event syntax error: '..p_1.c/maps:xchannel.value=10/'
   \___ Target map not exist
 [SNIP]

 # ./perf record -e './test_bpf_map_1.c/maps:channel.xvalue=10/' usleep 10
 event syntax error: '..ps:channel.xvalue=10/'
   \___ Invalid object maps config option
 [SNIP]

 # ./perf record -e './test_bpf_map_1.c/maps:channel.value=x10/' usleep 10
 event syntax error: '..nnel.value=x10/'
   \___ Incorrect value type for map
 [SNIP]

 Change BPF_MAP_TYPE_ARRAY = 2 to BPF_MAP_TYPE_ARRAY = 1:

 # ./perf record -e './test_bpf_map_1.c/maps:channel.value=10/' usleep 10
 event syntax error: '..ps:channel.value=10/'
   \___ Can't use this config term to this type of map

 Hint:  Valid config term:
maps:[].value=[value]
(add -v to see detail)

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/util/parse-events.c | 56 +++---
 tools/perf/util/parse-events.h |  3 ++-
 tools/perf/util/parse-events.l |  2 +-
 tools/perf/util/parse-events.y | 23 ++---
 4 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 6fc8cd7..95775fe 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -628,17 +628,64 @@ errout:
return err;
 }
 
+static int
+parse_events_config_bpf(struct parse_events_evlist *data,
+  struct bpf_object *obj,
+  struct list_head *head_config)
+{
+   struct parse_events_term *term;
+   int error_pos;
+
+   if (!head_config || list_empty(head_config))
+   return 0;
+
+   list_for_each_entry(term, head_config, list) {
+   char errbuf[BUFSIZ];
+   int err;
+
+   if (term->type_term != PARSE_EVENTS__TERM_TYPE_USER) {
+   snprintf(errbuf, sizeof(errbuf),
+"Invalid config term for BPF object");
+   errbuf[BUFSIZ - 1] = '\0';
+
+   data->error->idx = term->err_term;
+   data->error->str = strdup(errbuf);
+   return -EINVAL;
+   }
+
+   err = bpf__config_obj(obj, term, NULL, &error_pos);
+   if (err) {
+   bpf__strerror_config_obj(obj, term, NULL,
+

[PATCH v3 07/13] perf tools: Support setting different slots in a BPF map separately

2015-11-29 Thread Wang Nan
This patch introduces basic facilities to support configuring different
slots in a BPF map separately.

array.nr_ranges and array.ranges are introduced into 'struct
parse_events_term', where ranges is an array of index ranges (start,
length) to be configured by this config term, and nr_ranges is the
size of that array. The array is copied into 'struct bpf_map_op'.
To indicate the new type of configuration, BPF_MAP_KEY_RANGES is
added as a new key type. bpf_map_config_foreach_key() is extended to
iterate over those indices instead of all possible keys.

The code in this commit is enabled by the following commit, which
enables the indices syntax for array configuration.
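A hypothetical sketch of the key iteration this enables: with no ranges every key in [0, max_entries) is visited (the BPF_MAP_KEY_ALL case), otherwise only the listed (start, length) ranges are visited (the BPF_MAP_KEY_RANGES case). The field names start and length, toy_foreach_key() and the collecting callback are assumptions; this is not the patch's bpf_map_config_foreach_key(), and in the patch out-of-range indices are meant to be rejected up front by config_map_indices_range_check().

```c
#include <stddef.h>

/* Hypothetical mirror of one (start, length) pair from the config term. */
struct toy_index_range {
	unsigned int start;
	size_t length;
};

struct toy_array {
	size_t nr_ranges;		/* 0 means "all keys" */
	struct toy_index_range *ranges;
};

typedef int (*toy_key_fn)(unsigned int key, void *arg);

static int toy_foreach_key(const struct toy_array *array,
			   unsigned int max_entries, toy_key_fn fn, void *arg)
{
	int err;

	if (!array->nr_ranges) {	/* like BPF_MAP_KEY_ALL */
		for (unsigned int key = 0; key < max_entries; key++)
			if ((err = fn(key, arg)))
				return err;
		return 0;
	}

	for (size_t i = 0; i < array->nr_ranges; i++) {	/* BPF_MAP_KEY_RANGES */
		const struct toy_index_range *r = &array->ranges[i];

		for (size_t j = 0; j < r->length; j++) {
			size_t key = r->start + j;

			if (key >= max_entries)
				return -1;	/* index outside the map */
			if ((err = fn((unsigned int)key, arg)))
				return err;
		}
	}
	return 0;
}

/* Example callback: append each visited key to a small buffer. */
struct toy_key_buf {
	unsigned int keys[32];
	size_t n;
};

static int toy_collect_key(unsigned int key, void *arg)
{
	struct toy_key_buf *buf = arg;

	if (buf->n == sizeof(buf->keys) / sizeof(buf->keys[0]))
		return -1;
	buf->keys[buf->n++] = key;
	return 0;
}
```

The ranges { 0, 1 } and { 3, 3 } visit keys 0, 3, 4 and 5 only, whereas an empty range list visits every slot of the map.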

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/util/bpf-loader.c   | 132 ++---
 tools/perf/util/bpf-loader.h   |   1 +
 tools/perf/util/parse-events.c |  33 ++-
 tools/perf/util/parse-events.h |  12 
 4 files changed, 170 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 97f5efc..8d232b8 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -17,6 +17,7 @@
 #include "llvm-utils.h"
 #include "probe-event.h"
 #include "probe-finder.h" // for MAX_PROBES
+#include "parse-events.h"
 #include "llvm-utils.h"
 
 #define DEFINE_PRINT_FN(name, level) \
@@ -747,6 +748,7 @@ enum bpf_map_op_type {
 
 enum bpf_map_key_type {
BPF_MAP_KEY_ALL,
+   BPF_MAP_KEY_RANGES,
 };
 
 struct bpf_map_op {
@@ -754,6 +756,9 @@ struct bpf_map_op {
enum bpf_map_op_type op_type;
enum bpf_map_key_type key_type;
union {
+   struct parse_events_array array;
+   } k;
+   union {
u64 value;
struct perf_evsel *evsel;
} v;
@@ -779,6 +784,8 @@ bpf_map_op__free(struct bpf_map_op *op)
 */
if ((list->next != LIST_POISON1) && (list->prev != LIST_POISON2))
list_del(list);
+   if (op->key_type == BPF_MAP_KEY_RANGES)
+   parse_events__clear_array(&op->k.array);
free(op);
 }
 
@@ -794,8 +801,30 @@ bpf_map_priv__clear(struct bpf_map *map __maybe_unused,
free(priv);
 }
 
+static int
+bpf_map_op_setkey(struct bpf_map_op *op, struct parse_events_term *term,
+ const char *map_name)
+{
+   op->key_type = BPF_MAP_KEY_ALL;
+
+   if (term->array.nr_ranges) {
+   size_t memsz = term->array.nr_ranges *
+   sizeof(op->k.array.ranges[0]);
+
+   op->k.array.ranges = memdup(term->array.ranges, memsz);
+   if (!op->k.array.ranges) {
+   pr_debug("Not enough memory to alloc indices for %s\n",
+map_name);
+   return -ENOMEM;
+   }
+   op->key_type = BPF_MAP_KEY_RANGES;
+   op->k.array.nr_ranges = term->array.nr_ranges;
+   }
+   return 0;
+}
+
 static struct bpf_map_op *
-bpf_map_op__alloc(struct bpf_map *map)
+bpf_map_op__alloc(struct bpf_map *map, struct parse_events_term *term)
 {
struct bpf_map_op *op;
struct bpf_map_priv *priv;
@@ -829,7 +858,12 @@ bpf_map_op__alloc(struct bpf_map *map)
return ERR_PTR(-ENOMEM);
}
 
-   op->key_type = BPF_MAP_KEY_ALL;
+   err = bpf_map_op_setkey(op, term, map_name);
+   if (err) {
+   free(op);
+   return ERR_PTR(err);
+   }
+
 list_add_tail(&op->list, &priv->ops_list);
return op;
 }
@@ -872,7 +906,7 @@ bpf__obj_config_map_array_value(struct bpf_map *map,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
}
 
-   op = bpf_map_op__alloc(map);
+   op = bpf_map_op__alloc(map, term);
if (IS_ERR(op))
return PTR_ERR(op);
op->op_type = BPF_MAP_OP_SET_VALUE;
@@ -933,7 +967,7 @@ bpf__obj_config_map_array_event(struct bpf_map *map,
return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
}
 
-   op = bpf_map_op__alloc(map);
+   op = bpf_map_op__alloc(map, term);
if (IS_ERR(op))
return PTR_ERR(op);
 
@@ -972,6 +1006,44 @@ struct bpf_obj_config_map_func bpf_obj_config_map_funcs[] = {
 };
 
 static int
+config_map_indices_range_check(struct parse_events_term *term,
+  struct bpf_map *map,
+  const char *map_name)
+{
+   struct parse_events_array *array = &term->array;
+   struct bpf_map_def def;
+   unsigned int i;
+   int err;
+
+   if (!array->nr_ranges)
+   return 0;
+   if (!array->ranges) {
+   pr_debug("ERROR: map %s: array->nr_ranges is %d but range array is NULL\n",
+map_name, (int)array->nr_ranges);
+   return -BPF_LOADER_ERRNO__INTERNAL;
+   }
+
+

[PATCH 07/16] clk: simplify clk_core_get_parent_by_index()

2015-11-29 Thread Masahiro Yamada
Drop the "if (!core->parents)" case and refactor the function a bit
because core->parents is always allocated.  (Strictly speaking, it is
ZERO_SIZE_PTR if core->num_parents == 0, but that case never reaches
the dereference because the index check above returns NULL first.)
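A hypothetical userspace sketch of the simplified lookup with an always-allocated cache: a NULL entry means "not resolved yet", so it is looked up by name and cached on success, while a parent that is still missing stays NULL and is simply retried on the next call. The toy_* names are assumptions; toy_lookups only counts name look-ups to make the caching visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct clk_core. */
struct toy_clk {
	const char *name;
	const char *const *parent_names;
	struct toy_clk **parents;	/* always allocated; NULL = not cached */
	int num_parents;
};

static struct toy_clk *toy_registry[8];
static int toy_nr_registered;
static int toy_lookups;		/* counts name look-ups to show the caching */

static struct toy_clk *toy_lookup(const char *name)
{
	toy_lookups++;
	for (int i = 0; i < toy_nr_registered; i++)
		if (strcmp(toy_registry[i]->name, name) == 0)
			return toy_registry[i];
	return NULL;
}

static struct toy_clk *toy_get_parent_by_index(struct toy_clk *core,
					       unsigned int index)
{
	if (!core || index >= (unsigned int)core->num_parents)
		return NULL;

	/* Look up and cache on first use; a miss stays NULL and is retried. */
	if (!core->parents[index])
		core->parents[index] = toy_lookup(core->parent_names[index]);

	return core->parents[index];
}
```

A resolved parent costs one look-up no matter how often it is requested; an unresolved one costs a look-up per call until it is registered, and an out-of-range index never touches the cache.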

Signed-off-by: Masahiro Yamada 
---

 drivers/clk/clk.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index e05084e..f2758c4 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -350,13 +350,12 @@ static struct clk_core *clk_core_get_parent_by_index(struct clk_core *core,
 {
if (!core || index >= core->num_parents)
return NULL;
-   else if (!core->parents)
-   return clk_core_lookup(core->parent_names[index]);
-   else if (!core->parents[index])
-   return core->parents[index] =
-   clk_core_lookup(core->parent_names[index]);
-   else
-   return core->parents[index];
+
+   if (!core->parents[index])
+   core->parents[index] =
+   clk_core_lookup(core->parent_names[index]);
+
+   return core->parents[index];
 }
 
 struct clk_hw *
-- 
1.9.1



[PATCH v3 13/13] perf record: Support custom vmlinux path

2015-11-29 Thread Wang Nan
From: He Kuang 

Make the perf-record command support the --vmlinux option if
BPF_PROLOGUE is on.

'perf record' needs vmlinux as the source of DWARF info to generate
prologues for BPF programs, so it must be possible to specify the path
of vmlinux.

The short name 'k' has already been taken by 'clockid'. This patch skips
the short option name and uses '--vmlinux' for the vmlinux path.

Documentation is also updated.

Test result:

In a production (or broken) environment:
 (built by
  # rm -rf ~/.debug/
  # mv /lib/modules/`uname -r`/build/vmlinux /tmp/
 )

 # ./perf record -e ./test_bpf_base.c ls
 Failed to find the path for kernel: No such file or directory
 event syntax error: './test_bpf_base.c'
  \___ You need to check probing points in BPF file
 ...

 # ./perf record --vmlinux /tmp/vmlinux -e ./test_bpf_base.c ls
 ...
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.011 MB perf.data ]

Help messages when built with NO_LIBBPF:

 # ./perf record -h
--transaction sample transaction flags (special events only)
--vmlinux   vmlinux pathname
  (not build because NO_LIBBPF=1)
 # ./perf record --vmlinux /tmp/vmlinux ls /
  Warning: option `vmlinux' is not built because NO_LIBBPF=1
 ...
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.011 MB perf.data (11 samples) ]

Help messages when built with NO_DWARF:

 # ./perf record -h
--transaction sample transaction flags (special events only)
--vmlinux   vmlinux pathname
  (not build because NO_DWARF=1)

Signed-off-by: He Kuang 
Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/Documentation/perf-record.txt | 10 --
 tools/perf/builtin-record.c  | 16 
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index e630a7d..8d032f4 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -314,11 +314,17 @@ This option sets the time out limit. The default value is 500 ms.
 Record context switch events i.e. events of type PERF_RECORD_SWITCH or
 PERF_RECORD_SWITCH_CPU_WIDE.
 
---clang-path::
+--clang-path=PATH::
 Path to clang binary to use for compiling BPF scriptlets.
+(enabled when BPF support is on)
 
---clang-opt::
+--clang-opt=OPTIONS::
 Options passed to clang when compiling BPF scriptlets.
+(enabled when BPF support is on)
+
+--vmlinux=PATH::
+Specify the path of a vmlinux file that has debuginfo.
+(enabled when BPF prologue is on)
 
 SEE ALSO
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 11bf32d..2230b85 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1128,6 +1128,8 @@ struct option __record_options[] = {
   "clang binary to use for compiling BPF scriptlets"),
 OPT_STRING(0, "clang-opt", &llvm_param.clang_opt, "clang options",
   "options passed to clang when compiling BPF scriptlets"),
+   OPT_STRING(0, "vmlinux", &symbol_conf.vmlinux_name,
+  "file", "vmlinux pathname"),
OPT_END()
 };
 
@@ -1146,6 +1148,20 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 # undef set_nobuild
 #endif
 
+#ifndef HAVE_BPF_PROLOGUE
+# if !defined (HAVE_DWARF_SUPPORT)
+#  define REASON  "NO_DWARF=1"
+# elif !defined (HAVE_LIBBPF_SUPPORT)
+#  define REASON  "NO_LIBBPF=1"
+# else
+#  define REASON  "this architecture doesn't support BPF prologue"
+# endif
+# define set_nobuild(s, l, c) set_option_nobuild(record_options, s, l, REASON, c)
+   set_nobuild('\0', "vmlinux", true);
+# undef set_nobuild
+# undef REASON
+#endif
+
rec->evlist = perf_evlist__new();
if (rec->evlist == NULL)
return -ENOMEM;
-- 
1.8.3.4



[PATCH v3 05/13] perf tools: Support perf event alias name

2015-11-29 Thread Wang Nan
From: He Kuang 

This patch adds new bison rules for assigning an alias name to a perf
event, which allows the command line to refer to a previously defined
perf event through its name. With this patch, users can give an alias
name to a perf event using the following command line:

 # perf record -e mypmu=cycles ...

If alias is not provided (normal case):

 # perf record -e cycles ...

the alias is set to the event's name automatically ('cycles' in the
above example).

To allow the parser to refer to existing event selectors, the event list
is passed to 'struct parse_events_evlist'. perf_evlist__find_evsel_by_alias()
is introduced to look up an evsel by its alias.
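The lookup and the default-alias rule can be sketched in a few lines. struct toy_evsel, toy_set_alias() and toy_find_by_alias() are hypothetical simplifications (a singly linked list instead of evlist__for_each(), borrowed strings instead of allocated copies); the skip-if-no-alias then strcmp() loop follows perf_evlist__find_evsel_by_alias() as added by this patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified event selector in a singly linked list. */
struct toy_evsel {
	const char *name;
	const char *alias;	/* NULL until set */
	struct toy_evsel *next;
};

/* "evt=cycles" passes "evt"; a bare "cycles" passes NULL and gets its name. */
static void toy_set_alias(struct toy_evsel *evsel, const char *alias)
{
	evsel->alias = alias ? alias : evsel->name;
}

static struct toy_evsel *toy_find_by_alias(struct toy_evsel *head,
					   const char *alias)
{
	for (struct toy_evsel *evsel = head; evsel; evsel = evsel->next) {
		if (!evsel->alias)
			continue;
		if (strcmp(alias, evsel->alias) == 0)
			return evsel;
	}
	return NULL;
}
```

Once "evt" is attached to the bpf-output event, a later term such as maps:channel.event=evt can resolve it through this kind of lookup.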

Test result:

Before this patch:

 # ./perf record -e evt=cycles usleep 10
 event syntax error: 'evt=cycles'
  \___ parser error
 Run 'perf list' for a list of valid events
 [SNIP]

After this patch:

 # ./perf record -e evt=cycles usleep 10
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]

Signed-off-by: He Kuang 
Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/util/evlist.c   | 16 
 tools/perf/util/evlist.h   |  4 
 tools/perf/util/evsel.c|  1 +
 tools/perf/util/evsel.h|  1 +
 tools/perf/util/parse-events.c | 37 -
 tools/perf/util/parse-events.h |  5 +
 tools/perf/util/parse-events.y | 15 ++-
 7 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index d139219..8dd59aa 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1753,3 +1753,19 @@ void perf_evlist__set_tracking_event(struct perf_evlist *evlist,
 
tracking_evsel->tracking = true;
 }
+
+struct perf_evsel *
+perf_evlist__find_evsel_by_alias(struct perf_evlist *evlist,
+const char *alias)
+{
+   struct perf_evsel *evsel;
+
+   evlist__for_each(evlist, evsel) {
+   if (!evsel->alias)
+   continue;
+   if (strcmp(alias, evsel->alias) == 0)
+   return evsel;
+   }
+
+   return NULL;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a459fe7..4e25342 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -292,4 +292,8 @@ void perf_evlist__set_tracking_event(struct perf_evlist *evlist,
 struct perf_evsel *tracking_evsel);
 
 void perf_event_attr__set_max_precise_ip(struct perf_event_attr *attr);
+
+struct perf_evsel *
+perf_evlist__find_evsel_by_alias(struct perf_evlist *evlist, const char *alias);
+
 #endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0a1f4d9..32131fc 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1060,6 +1060,7 @@ void perf_evsel__exit(struct perf_evsel *evsel)
thread_map__put(evsel->threads);
 zfree(&evsel->group_name);
 zfree(&evsel->name);
+   zfree(&evsel->alias);
perf_evsel__object.fini(evsel);
 }
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 0e49bd7..51bab0f 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -89,6 +89,7 @@ struct perf_evsel {
int idx;
u32 ids;
char*name;
+   char*alias;
double  scale;
const char  *unit;
struct event_format *tp_format;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 95775fe..484c8e4 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -653,9 +653,9 @@ parse_events_config_bpf(struct parse_events_evlist *data,
return -EINVAL;
}
 
-   err = bpf__config_obj(obj, term, NULL, _pos);
+   err = bpf__config_obj(obj, term, data->evlist, _pos);
if (err) {
-   bpf__strerror_config_obj(obj, term, NULL,
+   bpf__strerror_config_obj(obj, term, data->evlist,
 _pos, err, errbuf,
 sizeof(errbuf));
data->error->help = strdup(
@@ -1089,6 +1089,30 @@ int parse_events__modifier_group(struct list_head *list,
return parse_events__modifier_event(list, event_mod, true);
 }
 
+int parse_events__set_event_alias(struct parse_events_evlist *data,
+ struct list_head *list,
+ const char *str,
+ void *loc_alias_)
+{
+   struct perf_evsel *evsel;
+   YYLTYPE *loc_alias = loc_alias_;
+
+   if (!str)
+   return 0;
+
+   if 

[PATCH v3 09/13] perf tools: Introduce bpf-output event

2015-11-29 Thread Wang Nan
Commit a43eec304259a6c637f4014a6d4767159b6a3aa3 (bpf: introduce
bpf_perf_event_output() helper) added a helper that lets BPF programs
output data to the perf ring buffer through a new type of perf event,
PERF_COUNT_SW_BPF_OUTPUT. This patch enables perf to create events of
that type. Users can now receive output data from BPF programs with
the following command line:

 # ./perf record -a -e evt=bpf-output/no-inherit/ \
-e ./test_bpf_output.c/maps:channel.event=evt/ ls /
 # ./perf script
perf 12927 [004] 355971.129276:  0 evt=bpf-output/no-inherit/:  811ed5f1 sys_write
perf 12927 [004] 355971.129279:  0 evt=bpf-output/no-inherit/:  811ed5f1 sys_write
...
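The evsel.c hunk in this patch marks every bpf-output event as a software event that records PERF_SAMPLE_RAW with a sample period of 1. As a rough user-space illustration, the attribute could be filled in as below. init_bpf_output_attr() is a hypothetical helper, not part of perf; the sketch assumes UAPI headers recent enough to define PERF_COUNT_SW_BPF_OUTPUT, and it only prepares the attribute without calling perf_event_open().

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper: fill an attribute the way this patch configures
 * bpf-output events (software type, BPF_OUTPUT config, raw samples). */
static void init_bpf_output_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_BPF_OUTPUT;
	/* The BPF program's payload arrives in the raw sample data. */
	attr->sample_type |= PERF_SAMPLE_RAW;
	attr->sample_period = 1;
	/* Corresponds to the "no-inherit" term used on the command line. */
	attr->inherit = 0;
}
```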

Test result:
 # cat ./test_bpf_output.c
 /**************** BEGIN ****************/
 typedef int u32;
 typedef unsigned long long u64;

 enum bpf_map_type {
BPF_MAP_TYPE_PERF_EVENT_ARRAY = 4,
 };

 struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
 };

 #define SEC(NAME) __attribute__((section(NAME), used))
 static u64 (*bpf_ktime_get_ns)(void) =
(void *)5;
 static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)6;
 static int (*bpf_get_smp_processor_id)(void) =
(void *)8;
 static int (*bpf_perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
(void *)23;

 struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = __NR_CPUS__,
 };

 SEC("func_write=sys_write")
 int func_write(void *ctx)
 {
struct {
u64 ktime;
int cpuid;
} __attribute__((packed)) output_data;
char error_data[] = "Error: failed to output\n";

output_data.cpuid = bpf_get_smp_processor_id();
output_data.ktime = bpf_ktime_get_ns();
int err = bpf_perf_event_output(ctx, &channel, bpf_get_smp_processor_id(),
&output_data, sizeof(output_data));
if (err)
bpf_trace_printk(error_data, sizeof(error_data));
return 0;
 }
 char _license[] SEC("license") = "GPL";
 int _version SEC("version") = LINUX_VERSION_CODE;
 /***************** END *****************/

 # ./perf record -a -e evt=bpf-output/no-inherit/ \
-e ./test_bpf_output.c/maps:channel.event=evt/ ls /
 # ./perf script | grep ls
  ls  4085 [000] 2746114.230215: evt=bpf-output/no-inherit/:  811ed5f1 sys_write (/lib/modules/4.3.0-rc4+/build/vmlinux)
  ls  4085 [000] 2746114.230244: evt=bpf-output/no-inherit/:  811ed5f1 sys_write (/lib/modules/4.3.0-rc4+/build/vmlinux)

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Brendan Gregg 
Cc: David S. Miller 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/util/evsel.c| 6 ++
 tools/perf/util/parse-events.l | 1 +
 2 files changed, 7 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 32131fc..4dee8e3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -224,6 +224,12 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
if (evsel != NULL)
perf_evsel__init(evsel, attr, idx);
 
+   if (evsel && (evsel->attr.type == PERF_TYPE_SOFTWARE) &&
+   (evsel->attr.config == PERF_COUNT_SW_BPF_OUTPUT)) {
+   evsel->attr.sample_type |= PERF_SAMPLE_RAW;
+   evsel->attr.sample_period = 1;
+   }
+
return evsel;
 }
 
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 8bb3437..27d567f 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -249,6 +249,7 @@ cpu-migrations|migrations   { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COU
 alignment-faults   { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
 emulation-faults   { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
 dummy  { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
+bpf-output { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
 
/*
 * We have to handle the kernel PMU event cycles-ct/cycles-t/mem-loads/mem-stores separately.
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 02/13] perf tools: Add API to config maps in bpf object

2015-11-29 Thread Wang Nan
bpf__config_obj() is introduced as the core API for configuring a BPF
object after it has been loaded, together with one configuration
option for maps. After this patch, a BPF object can accept
configuration like:

 maps:my_map.value=1234

(maps.my_map.value would look prettier. However, there is a small but
hard-to-fix problem related to flex's greedy matching; please see [1].
':' was chosen to avoid it in a simpler way.)

This patch is more complex than the work it actually does because it is
designed with extension in mind. When designing BPF map configuration,
the following things should be considered:

 1. Array index selection: perf should allow users to set different
    values for different slots in an array, with syntax like:
    maps:my_map.value[0,3...6]=1234;

 2. A map can be configured by different config terms, each for a part
    of it. For example, set each slot to the pid of a thread;

 3. Type of value: integers are not the only valid value type. A perf
    event can also be put into a map after commit 35578d7984003097af2b1e3
    (bpf: Implement function bpf_perf_event_read() that get the selected
    hardware PMU conuter);

 4. For hash tables, it is possible to use a string or another type as
    the key;

 5. It is possible that a map configuration cannot be set up during
    parsing. Perf events are an example.
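The index syntax in item 1 is only proposed here; this patch itself implements just the "all slots" key type. As a hedged illustration of what `[0,3...6]` would denote, the sketch below expands such a list into individual slot indices. parse_index_list() is hypothetical and not part of perf.

```c
#include <errno.h>
#include <stdlib.h>

/* Expand a list such as "0,3...6" into out[]. Returns the number of
 * indices written, or -EINVAL on malformed input or if out[] is full. */
static int parse_index_list(const char *s, unsigned int *out, int max)
{
	int n = 0;

	while (*s) {
		char *end;
		unsigned long lo, hi;

		lo = strtoul(s, &end, 10);
		if (end == s)
			return -EINVAL;
		hi = lo;
		s = end;
		/* "A...B" denotes the inclusive range A..B. */
		if (s[0] == '.' && s[1] == '.' && s[2] == '.') {
			s += 3;
			hi = strtoul(s, &end, 10);
			if (end == s || hi < lo)
				return -EINVAL;
			s = end;
		}
		for (unsigned long i = lo; i <= hi; i++) {
			if (n >= max)
				return -EINVAL;
			out[n++] = (unsigned int)i;
		}
		if (*s == ',') {
			s++;
			if (*s == '\0')	/* reject a trailing comma */
				return -EINVAL;
		} else if (*s != '\0') {
			return -EINVAL;
		}
	}
	return n;
}
```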

Therefore, this patch does the following:

 1. Instead of updating map elements during parsing, this patch stores
    map config options in 'struct bpf_map_priv'. Following patches
    will apply those configs at the proper time;

 2. Map operations are linked into a list, so a map can have multiple
    config terms attached and different parts can be configured
    separately;

 3. 'struct bpf_map_priv' is made extensible so following patches can
    add new types of keys and operations;

 4. A bpf_config_map_funcs array is used to support more map config
    options.
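A reduced version of that design, written as plain user-space C with a singly linked list instead of perf's list.h helpers, might look like the sketch below. All type and function names are hypothetical simplifications of struct bpf_map_op and struct bpf_map_priv, and map_priv_apply() stands in for the later patches that actually push values into the kernel map.

```c
#include <stdlib.h>

enum op_type { OP_SET_VALUE };
enum key_type { KEY_ALL };

/* One queued configuration term, applied later rather than at parse time. */
struct map_op {
	struct map_op *next;
	enum op_type op_type;
	enum key_type key_type;
	unsigned long long value;
};

/* Per-map private data: ops in the order the user wrote them. */
struct map_priv {
	struct map_op *head;
	struct map_op **tail;
};

static void map_priv_init(struct map_priv *priv)
{
	priv->head = NULL;
	priv->tail = &priv->head;
}

/* Append a "value=N" term covering all slots; NULL on allocation failure. */
static struct map_op *map_priv_add_set_value(struct map_priv *priv,
					     unsigned long long value)
{
	struct map_op *op = calloc(1, sizeof(*op));

	if (!op)
		return NULL;
	op->op_type = OP_SET_VALUE;
	op->key_type = KEY_ALL;
	op->value = value;
	*priv->tail = op;
	priv->tail = &op->next;
	return op;
}

/* Replay the queued ops, in order, onto an array-like map. */
static void map_priv_apply(const struct map_priv *priv,
			   unsigned long long *slots, size_t nr_slots)
{
	for (const struct map_op *op = priv->head; op; op = op->next)
		if (op->op_type == OP_SET_VALUE && op->key_type == KEY_ALL)
			for (size_t i = 0; i < nr_slots; i++)
				slots[i] = op->value;
}

static void map_priv_clear(struct map_priv *priv)
{
	struct map_op *op = priv->head;

	while (op) {
		struct map_op *next = op->next;

		free(op);
		op = next;
	}
	map_priv_init(priv);
}
```

Because ops are replayed in order, a later term overrides an earlier one for the slots it covers.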

Since the patch changing the event parser to parse BPF object config is
relatively large, I put it in another commit. The code in this patch
can be tested after applying the next patch.

[1] http://lkml.kernel.org/g/564ed621.4050...@huawei.com

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/util/bpf-loader.c | 266 +++
 tools/perf/util/bpf-loader.h |  38 +++
 2 files changed, 304 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 540a7ef..7d361aa 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -739,6 +739,251 @@ int bpf__foreach_tev(struct bpf_object *obj,
return 0;
 }
 
+enum bpf_map_op_type {
+   BPF_MAP_OP_SET_VALUE,
+};
+
+enum bpf_map_key_type {
+   BPF_MAP_KEY_ALL,
+};
+
+struct bpf_map_op {
+   struct list_head list;
+   enum bpf_map_op_type op_type;
+   enum bpf_map_key_type key_type;
+   union {
+   u64 value;
+   } v;
+};
+
+struct bpf_map_priv {
+   struct list_head ops_list;
+};
+
+static void
+bpf_map_op__free(struct bpf_map_op *op)
+{
+   struct list_head *list = &op->list;
+   /*
+* bpf_map_op__free() needs to consider the following cases:
+*   1. The op is created but not linked to any list:
+*  impossible. This only happens in bpf_map_op__alloc(),
+*  where it would be freed directly;
+*   2. Normal case: the op is linked to a list;
+*   3. The op has already been removed.
+* Thanks to list.h, if it has been removed by list_del(), then
+* list->{next,prev} should have been set to LIST_POISON{1,2}.
+*/
+   if ((list->next != LIST_POISON1) && (list->prev != LIST_POISON2))
+   list_del(list);
+   free(op);
+}
+
+static void
+bpf_map_priv__clear(struct bpf_map *map __maybe_unused,
+   void *_priv)
+{
+   struct bpf_map_priv *priv = _priv;
+   struct bpf_map_op *pos, *n;
+
+   list_for_each_entry_safe(pos, n, &priv->ops_list, list)
+   bpf_map_op__free(pos);
+   free(priv);
+}
+
+static struct bpf_map_op *
+bpf_map_op__alloc(struct bpf_map *map)
+{
+   struct bpf_map_op *op;
+   struct bpf_map_priv *priv;
+   const char *map_name;
+   int err;
+
+   map_name = bpf_map__get_name(map);
+   err = bpf_map__get_private(map, (void **)&priv);
+   if (err) {
+   pr_debug("Failed to get private from map %s\n", map_name);
+   return ERR_PTR(err);
+   }
+
+   if (!priv) {
+   priv = zalloc(sizeof(*priv));
+   if (!priv) {
+   pr_debug("No enough memory to alloc map private\n");
+   return ERR_PTR(-ENOMEM);
+   }
+   INIT_LIST_HEAD(&priv->ops_list);
+
+   if (bpf_map__set_private(map, priv, bpf_map_priv__clear)) {
+   free(priv);
+   return ERR_PTR(-BPF_LOADER_ERRNO__INTERNAL);
+   }

[PATCH v3 10/13] perf data: Add u32_hex data type

2015-11-29 Thread Wang Nan
Add a hexadecimal u32 to the base data types. This is useful for raw
output because raw data is u32 aligned.
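Because raw data is a sequence of 32-bit words, printing it word by word in hexadecimal is usually easier to read than printing bytes. The sketch below is illustrative only: format_raw_u32_hex() is a hypothetical helper (the patch itself just declares a 32-bit unsigned hex CTF integer type via CREATE_INT_TYPE), and it assumes the data were recorded on a host with the same byte order.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format raw sample data as space-separated 32-bit hex words into buf.
 * Returns the number of trailing bytes that did not fill a whole word. */
static size_t format_raw_u32_hex(const void *raw, size_t raw_size,
				 char *buf, size_t buflen)
{
	const unsigned char *p = raw;
	size_t nr = raw_size / sizeof(uint32_t);
	size_t off = 0;

	if (buflen)
		buf[0] = '\0';
	for (size_t i = 0; i < nr && off < buflen; i++) {
		uint32_t w;
		int ret;

		/* memcpy avoids unaligned loads; w keeps host byte order. */
		memcpy(&w, p + i * sizeof(w), sizeof(w));
		ret = snprintf(buf + off, buflen - off, "%s0x%" PRIX32,
			       i ? " " : "", w);
		if (ret < 0)
			break;
		off += (size_t)ret;
	}
	/* raw_size >= nr * 4 always holds, so this cannot underflow. */
	return raw_size - nr * sizeof(uint32_t);
}
```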

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Brendan Gregg 
Cc: David S. Miller 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/util/data-convert-bt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 5bfc119..34cd1e4 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -63,6 +63,7 @@ struct ctf_writer {
struct bt_ctf_field_type*s32;
struct bt_ctf_field_type*u32;
struct bt_ctf_field_type*string;
+   struct bt_ctf_field_type*u32_hex;
struct bt_ctf_field_type*u64_hex;
};
struct bt_ctf_field_type *array[6];
@@ -982,6 +983,7 @@ do {  \
CREATE_INT_TYPE(cw->data.u64, 64, false, false);
CREATE_INT_TYPE(cw->data.s32, 32, true,  false);
CREATE_INT_TYPE(cw->data.u32, 32, false, false);
+   CREATE_INT_TYPE(cw->data.u32_hex, 32, false, true);
CREATE_INT_TYPE(cw->data.u64_hex, 64, false, true);
 
cw->data.string  = bt_ctf_field_type_string_create();
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 11/13] perf data: Support converting data from bpf_perf_event_output()

2015-11-29 Thread Wang Nan
bpf_perf_event_output() outputs data through sample->raw_data. This
patch adds support for converting that data into CTF, so that a Python
script can then be used to process output data from BPF programs.

Test result:

 # cat ./test_bpf_output.c
 /**************** BEGIN ****************/
 typedef int u32;
 typedef unsigned long long u64;

 enum bpf_map_type {
 BPF_MAP_TYPE_PERF_EVENT_ARRAY = 4,
 };

 struct bpf_map_def {
 unsigned int type;
 unsigned int key_size;
 unsigned int value_size;
 unsigned int max_entries;
 };
 #define SEC(NAME) __attribute__((section(NAME), used))
 static u64 (*bpf_ktime_get_ns)(void) =
 (void *)5;
 static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
 (void *)6;
 static int (*bpf_get_smp_processor_id)(void) =
 (void *)8;
 static int (*bpf_perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
 (void *)23;
 struct bpf_map_def SEC("maps") channel = {
 .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
 .key_size = sizeof(int),
 .value_size = sizeof(u32),
 .max_entries = __NR_CPUS__,
 };
 static inline int __attribute__((always_inline))
 func(void *ctx, int type)
 {
 struct {
 u64 ktime;
 int type;
 } __attribute__((packed)) output_data;
 char error_data[] = "Error: failed to output\n";
 int err;
 output_data.type = type;
 output_data.ktime = bpf_ktime_get_ns();
 err = bpf_perf_event_output(ctx, &channel, bpf_get_smp_processor_id(),
 &output_data, sizeof(output_data));
 if (err)
 bpf_trace_printk(error_data, sizeof(error_data));
 return 0;
 }
 SEC("func_begin=sys_nanosleep")
 int func_begin(void *ctx) {return func(ctx, 1);}
 SEC("func_end=sys_nanosleep%return")
 int func_end(void *ctx) { return func(ctx, 2);}
 char _license[] SEC("license") = "GPL";
 int _version SEC("version") = LINUX_VERSION_CODE;
 /***************** END *****************/

 # ./perf record -e evt=bpf-output/no-inherit/ \
 -e ./test_bpf_output_2.c/maps:channel.event=evt/ \
 usleep 10
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.012 MB perf.data (2 samples) ]

 # ./perf script
  usleep 14942 92503.198504: evt=bpf-output/no-inherit/:  810e0ba1 sys_nanosleep (/lib/modules/4.3.0
  usleep 14942 92503.298562: evt=bpf-output/no-inherit/:  810585e9 kretprobe_trampoline_holder (/lib

 # ./perf data convert --to-ctf ./out.ctf
 [ perf data convert: Converted 'perf.data' into CTF data './out.ctf' ]
 [ perf data convert: Converted and wrote 0.000 MB (2 samples) ]

 # babeltrace ./out.ctf
 [01:41:43.198504134] (+?.?) evt=bpf-output/no-inherit/: { cpu_id = 0 }, { perf_ip = 0x810E0BA1, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x32C0C07B, [1] = 0x5421, [2] = 0x1 ] }
 [01:41:43.298562257] (+0.100058123) evt=bpf-output/no-inherit/: { cpu_id = 0 }, { perf_ip = 0x810585E9, perf_tid = 14942, perf_pid = 14942, perf_id = 1044, raw_len = 3, raw_data = [ [0] = 0x38B77FAA, [1] = 0x5421, [2] = 0x2 ] }

 # cat ./test_bpf_output_2.py
 from babeltrace import TraceCollection
 tc = TraceCollection()
 tc.add_trace('./out.ctf', 'ctf')
 d = {1:[], 2:[]}
 for event in tc.events:
     if not event.name.startswith('evt=bpf-output/no-inherit/'):
         continue
     raw_data = event['raw_data']
     (time, type) = ((raw_data[0] + (raw_data[1] << 32)), raw_data[2])
     d[type].append(time)
 print(list(map(lambda i: d[2][i] - d[1][i], range(len(d[1])))))

 # python3 ./test_bpf_output_2.py
 [100056879]
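The Python script rebuilds ktime from the first two raw_data words. The same decoding in C is sketched below, assuming the samples were recorded on a little-endian host so that word 0 holds the low half of the u64; decode_record() and struct bpf_output_record are hypothetical names. Fed the two raw_data arrays printed by babeltrace above, the difference between the two ktime values is 100056879 ns, matching the script's output.

```c
#include <stdint.h>

/* Mirrors the packed { u64 ktime; int type; } written by the BPF program,
 * which perf exposes as three u32 words of raw data. */
struct bpf_output_record {
	uint64_t ktime;
	int32_t type;
};

static struct bpf_output_record decode_record(const uint32_t raw[3])
{
	struct bpf_output_record r;

	/* Little-endian assumption: word 0 is the low 32 bits of ktime. */
	r.ktime = (uint64_t)raw[0] | ((uint64_t)raw[1] << 32);
	r.type = (int32_t)raw[2];
	return r;
}
```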

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Brendan Gregg 
Cc: David S. Miller 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/util/data-convert-bt.c | 115 +-
 1 file changed, 114 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 34cd1e4..1fb472b 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -352,6 +352,84 @@ static int add_tracepoint_values(struct ctf_writer *cw,
return ret;
 }
 
+static int
+add_bpf_output_values(struct bt_ctf_event_class *event_class,
+ struct bt_ctf_event *event,
+ struct perf_sample *sample)
+{
+   struct bt_ctf_field_type *len_type, *seq_type;
+   struct bt_ctf_field *len_field, *seq_field;
+   unsigned int raw_size = sample->raw_size;
+   unsigned int nr_elements = raw_size / sizeof(u32);
+   unsigned int i;
+   int ret;
+
+   if (nr_elements * sizeof(u32) != raw_size)
+   pr_warning("Incorrect raw_size (%u) in bpf output event, skip %zu bytes\n",
+  raw_size, raw_size - nr_elements * sizeof(u32));
+
+   len_type = 

[PATCH v3 06/13] perf tools: Enable passing event to BPF object

2015-11-29 Thread Wang Nan
A new syntax is added to the parser so that users can pass predefined
perf events into BPF objects.

After this patch, BPF programs for perf are finally able to use
bpf_perf_event_read(), introduced in commit 35578d7984003097af2b1e3
(bpf: Implement function bpf_perf_event_read() that get the selected
hardware PMU conuter).

Test result:

 # cat ./test_bpf_map_2.c
 /**************** BEGIN ****************/
 #define SEC(NAME) __attribute__((section(NAME), used))
 enum bpf_map_type {
 BPF_MAP_TYPE_PERF_EVENT_ARRAY = 4,
 };
 struct bpf_map_def {
 unsigned int type;
 unsigned int key_size;
 unsigned int value_size;
 unsigned int max_entries;
 };
 static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
 (void *)1;
 static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
 (void *)6;
 static int (*bpf_get_smp_processor_id)(void) =
 (void *)8;
 static int (*bpf_perf_event_read)(struct bpf_map_def *, int) =
 (void *)22;

 struct bpf_map_def SEC("maps") pmu_map = {
 .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
 .key_size = sizeof(int),
 .value_size = sizeof(int),
 .max_entries = __NR_CPUS__,
 };
 SEC("func_write=sys_write")
 int func_write(void *ctx)
 {
 unsigned long long val;
 char fmt[] = "sys_write:pmu=%llu\n";
 val = bpf_perf_event_read(&pmu_map, bpf_get_smp_processor_id());
 bpf_trace_printk(fmt, sizeof(fmt), val);
 return 0;
 }

 SEC("func_write_return=sys_write%return")
 int func_write_return(void *ctx)
 {
 unsigned long long val = 0;
 char fmt[] = "sys_write_return: pmu=%llu\n";
 val = bpf_perf_event_read(&pmu_map, bpf_get_smp_processor_id());
 bpf_trace_printk(fmt, sizeof(fmt), val);
 return 0;
 }
 char _license[] SEC("license") = "GPL";
 int _version SEC("version") = LINUX_VERSION_CODE;
 /* END ***/

Normal case 1:
 # echo "" > /sys/kernel/debug/tracing/trace
 # ./perf record -e evt=cycles/no-inherit/ -e './test_bpf_map_2.c/maps:pmu_map.event=evt/' ls /
 [SNIP]
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.013 MB perf.data (7 samples) ]
 # cat /sys/kernel/debug/tracing/trace | grep ls
   ls-13865 [006] d... 2722740.933204: : sys_write:pmu=1121685
   ls-13865 [006] dN.. 2722740.933242: : sys_write_return: pmu=1178149
   ls-13865 [006] d... 2722740.933248: : sys_write:pmu=1194986
   ls-13865 [006] dN.. 2722740.933270: : sys_write_return: pmu=1220862

Normal case 2:
 # echo "" > /sys/kernel/debug/tracing/trace
 # ./perf record -e evt=cycles/period=0x7fff,no-inherit/ \
 -e './test_bpf_map_2.c/maps:pmu_map.event=evt/' ls /
 [SNIP]
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.013 MB perf.data ]
 # ./perf report --stdio
 Error:
 The perf.data file has no samples!

 (This is expected: we set the period of the cycles event to a very
 large value because we want to use the event only as a counter and
 do not need sampling.)

 # cat /sys/kernel/debug/tracing/trace | grep ls
   ls-14446 [006] d... 2722976.486458: : sys_write:pmu=1116233
   ls-14446 [006] dN.. 2722976.486486: : sys_write_return: pmu=1162108
   ls-14446 [006] d... 2722976.486491: : sys_write:pmu=1177122
   ls-14446 [006] dN.. 2722976.486511: : sys_write_return: pmu=1202417

Normal case 3:
 # echo "" > /sys/kernel/debug/tracing/trace
 # ./perf record -i -e cycles -e './test_bpf_map_2.c/maps:pmu_map.event=cycles/' ls /

 (When no alias is set explicitly, the event name can be used to look up the event.)

 [SNIP]
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.013 MB perf.data (7 samples) ]
 # cat /sys/kernel/debug/tracing/trace | grep ls
   ls-16480 [005] d... 2724143.955040: : sys_write:pmu=1150794
   ls-16480 [005] dN.. 2724143.955077: : sys_write_return: pmu=1207161
   ls-16480 [005] d... 2724143.955083: : sys_write:pmu=1219145
   ls-16480 [005] dN.. 2724143.955104: : sys_write_return: pmu=1245433
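Case 3 shows that, without an explicit alias, the event name itself is accepted as the key. A simplified model of that lookup follows. struct event_desc and find_event() are hypothetical; the real code walks the evlist using perf_evlist__find_evsel_by_alias() from another patch in this series, and checking aliases before plain names is an assumption of this sketch, not confirmed perf behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced event descriptor; alias is NULL unless the user
 * wrote "alias=event" on the command line. */
struct event_desc {
	const char *name;
	const char *alias;
};

/* Prefer an explicit alias match, then fall back to the event name. */
static const struct event_desc *find_event(const struct event_desc *evs,
					   size_t n, const char *key)
{
	for (size_t i = 0; i < n; i++)
		if (evs[i].alias && strcmp(evs[i].alias, key) == 0)
			return &evs[i];
	for (size_t i = 0; i < n; i++)
		if (evs[i].name && strcmp(evs[i].name, key) == 0)
			return &evs[i];
	return NULL;
}
```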

Normal case 4 (one thread case):
 # ls /proc/11808/task/
 11808
 # echo "" > /sys/kernel/debug/tracing/trace
 # ./perf record -e evt=cycles/no-inherit/ -e './test_bpf_map_2.c/maps:pmu_map.event=evt/' -p 11808
 ^C[ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.019 MB perf.data (2 samples) ]

 # cat /sys/kernel/debug/tracing/trace | grep 11808
 sshd-11808 [000] d... 2740454.781150: : sys_write:pmu=18446744073709551594
 sshd-11808 [000] d... 2740454.781168: : sys_write_return: pmu=18446744073709551594
 sshd-11808 [003] d... 2740467.411799: : sys_write:pmu=131031
 sshd-11808 [003] dN.. 2740467.411806: : sys_write_return: pmu=161549
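In the first two lines of this case the printed value is not a counter reading: 18446744073709551594 is 2^64 - 22, i.e. -22 (-EINVAL on Linux) printed through the program's %llu format. The sketch below only shows that decoding; it says nothing about why the helper failed in this run, and printed_u64_to_errno() is a hypothetical name.

```c
#include <errno.h>

/* Reinterpret a value printed with %llu as a signed return code. The
 * unsigned-to-signed conversion is implementation-defined in ISO C but
 * is a plain two's-complement reinterpretation on GCC and Clang. */
static long long printed_u64_to_errno(unsigned long long printed)
{
	return (long long)printed;
}
```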

[PATCH v3 12/13] perf tools: Always give options even it not compiled

2015-11-29 Thread Wang Nan
This patch keeps the options of perf builtins the same under all build
conditions. If an option is disabled because of compile-time options,
users should be notified.

Masami suggested another implementation in [1]: by adding an
OPTION_NEXT_DEPENDS option before those options in the 'struct option'
array, the option parser would know that an option is disabled.
However, in some cases this array is reordered (options__order()). In
addition, in parse-options.c that array is const, so we can't simply
merge the information in the decorator option into the affected option.

This patch chooses a simpler implementation: it introduces a
set_option_nobuild() function and two option parsing flags. Builtins
with such options should call set_option_nobuild() before option
parsing. The complexity of this patch comes from wanting some options
to be safely skippable; in that case their arguments must also be
consumed.

Options in 'perf record' and 'perf probe' are fixed in this patch.
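The skip-versus-error behaviour shown in the test results can be modelled in a few lines. This is a hedged sketch only: struct nobuild_opt, check_option() and its fields are hypothetical stand-ins for the set_option_nobuild() machinery in parse-options.c, and the messages only approximate perf's. The point it illustrates is that a skippable option must still consume its argument so the argument is not mistaken for a positional one.

```c
#include <stdio.h>

/* Hypothetical option descriptor. */
struct nobuild_opt {
	const char *long_name;
	int takes_arg;
	int nobuild;		/* compiled out, e.g. because NO_DWARF=1 */
	int can_skip;		/* safe to ignore with a warning */
	const char *build_opt;
};

enum nobuild_action { OPT_USE, OPT_SKIP, OPT_REJECT };

/* *argi is the index of the option in argv; on OPT_SKIP it is advanced
 * past the option's argument, if any. */
static enum nobuild_action check_option(const struct nobuild_opt *o,
					int argc, int *argi)
{
	if (!o->nobuild)
		return OPT_USE;
	if (!o->can_skip) {
		fprintf(stderr, "  Error: option `%s' is not built because %s\n",
			o->long_name, o->build_opt);
		return OPT_REJECT;
	}
	fprintf(stderr, "  Warning: option `%s' is not built because %s\n",
		o->long_name, o->build_opt);
	if (o->takes_arg && *argi + 1 < argc)
		(*argi)++;	/* consume the argument as well */
	return OPT_SKIP;
}
```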

[1] http://lkml.kernel.org/g/50399556c9727b4d88a595c8584aab3752627...@gsjptkydcembx32.service.hitachi.net

Test result:

Normal case:

# ./perf probe --vmlinux /tmp/vmlinux sys_write
Added new event:
  probe:sys_write  (on sys_write)

You can now use it in all perf tools, such as:

perf record -e probe:sys_write -aR sleep 1


Build with NO_DWARF=1:

# ./perf probe -L sys_write
  Error: switch `L' is not built because NO_DWARF=1

 Usage: perf probe [] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
or: perf probe [] --del '[GROUP:]EVENT' ...
or: perf probe --list [GROUP:]EVENT ...
or: perf probe [] --funcs

-L, --line 
  Show source code lines.
  (not build because NO_DWARF=1)

# ./perf probe -k /tmp/vmlinux sys_write
  Warning: switch `k' is not built because NO_DWARF=1
Added new event:
  probe:sys_write  (on sys_write)

You can now use it in all perf tools, such as:

perf record -e probe:sys_write -aR sleep 1

# ./perf probe --vmlinux /tmp/vmlinux sys_write
  Warning: option `vmlinux' is not built because NO_DWARF=1
Added new event:
[SNIP]

# ./perf probe -l
 Usage: perf probe [] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
...
-k, --vmlinux   vmlinux pathname
  (not build because NO_DWARF=1)
-L, --line 
  Show source code lines.
  (not build because NO_DWARF=1)
...
-V, --vars 
  Show accessible variables on PROBEDEF
  (not build because NO_DWARF=1)
--externs Show external variables too (with --vars only)
  (not build because NO_DWARF=1)
--no-inlines  Don't search inlined functions
  (not build because NO_DWARF=1)
--range   Show variables location range in scope (with --vars only)
  (not build because NO_DWARF=1)

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
 tools/perf/builtin-probe.c  |  15 +-
 tools/perf/builtin-record.c |   9 +++-
 tools/perf/util/parse-options.c | 113 
 tools/perf/util/parse-options.h |   5 ++
 4 files changed, 129 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 132afc9..dbe2ea5 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -249,6 +249,9 @@ static int opt_show_vars(const struct option *opt,
 
return ret;
 }
+#else
+# define opt_show_lines NULL
+# define opt_show_vars NULL
 #endif
 static int opt_add_probe_event(const struct option *opt,
  const char *str, int unset __maybe_unused)
@@ -473,7 +476,6 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
opt_add_probe_event),
OPT_BOOLEAN('f', "force", _conf.force_add, "forcibly add events"
" with existing name"),
-#ifdef HAVE_DWARF_SUPPORT
OPT_CALLBACK('L', "line", NULL,
 "FUNC[:RLN[+NUM|-RLN2]]|SRC:ALN[+NUM|-ALN2]",
 "Show source code lines.", opt_show_lines),
@@ -490,7 +492,6 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
   "directory", "path to kernel source"),
OPT_BOOLEAN('\0', "no-inlines", _conf.no_inlines,
"Don't search inlined functions"),
-#endif
OPT__DRY_RUN(_event_dry_run),
OPT_INTEGER('\0', "max-probes", _conf.max_probes,
 "Set how many probe points can be found for a probe."),
@@ -521,6 +522,16 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
 #ifdef HAVE_DWARF_SUPPORT
set_option_flag(options, 'L', "line", PARSE_OPT_EXCLUSIVE);

[PATCH v3 01/13] tools lib bpf: Check return value of strdup when reading map names

2015-11-29 Thread Wang Nan
Commit 561bbccac72d08babafaa33fd7fa9100ec4c9fb6 ("tools lib bpf:
Extract and collect map names from BPF object file") forgot to check
the return value of strdup(). This patch fixes it. It also checks the
name pointer before strcmp() for safety.
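The two checks this patch adds follow a common pattern: propagate -ENOMEM when strdup() fails, and treat a NULL name as "no match" during lookup. A self-contained sketch of that pattern, using hypothetical names (struct named_map, named_map__set_name(), named_map__find()) rather than libbpf's own:

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct named_map {
	char *name;	/* NULL if the name was never set or allocation failed */
};

static int named_map__set_name(struct named_map *m, const char *name)
{
	m->name = strdup(name);
	return m->name ? 0 : -ENOMEM;
}

/* Skip unnamed entries instead of passing NULL to strcmp(). */
static struct named_map *named_map__find(struct named_map *maps, size_t n,
					 const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].name && !strcmp(maps[i].name, name))
			return &maps[i];
	return NULL;
}
```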

Reported-by: Namhyung Kim 
Signed-off-by: Wang Nan 
Cc: Namhyung Kim 
Cc: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a298614..16485ab 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -527,14 +527,14 @@ bpf_object__init_maps(struct bpf_object *obj, void *data,
return 0;
 }
 
-static void
+static int
 bpf_object__init_maps_name(struct bpf_object *obj, int maps_shndx)
 {
int i;
Elf_Data *symbols = obj->efile.symbols;
 
if (!symbols || maps_shndx < 0)
-   return;
+   return -EINVAL;
 
for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
GElf_Sym sym;
@@ -556,9 +556,14 @@ bpf_object__init_maps_name(struct bpf_object *obj, int maps_shndx)
continue;
}
obj->maps[map_idx].name = strdup(map_name);
+   if (!obj->maps[map_idx].name) {
+   pr_warning("failed to alloc map name\n");
+   return -ENOMEM;
+   }
pr_debug("map %zu is \"%s\"\n", map_idx,
 obj->maps[map_idx].name);
}
+   return 0;
 }
 
 static int bpf_object__elf_collect(struct bpf_object *obj)
@@ -663,7 +668,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
}
 
if (maps_shndx >= 0)
-   bpf_object__init_maps_name(obj, maps_shndx);
+   err = bpf_object__init_maps_name(obj, maps_shndx);
 out:
return err;
 }
@@ -1372,7 +1377,7 @@ bpf_object__get_map_by_name(struct bpf_object *obj, const char *name)
struct bpf_map *pos;
 
bpf_map__for_each(pos, obj) {
-   if (strcmp(pos->name, name) == 0)
+   if (pos->name && !strcmp(pos->name, name))
return pos;
}
return NULL;
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] net: fec: fix enet_out clock handling

2015-11-29 Thread Duan Andy
From: Lothar Waßmann  Sent: Monday, November 30, 2015 2:56 PM
> To: Duan Fugang-B38611
> Cc: Andrew Lunn; David S. Miller; Estevam Fabio-R49496; Kevin Hao; Lucas
> Stach; Philippe Reynes; Russell King; Uwe Kleine-König;
> linux-ker...@vger.kernel.org; net...@vger.kernel.org; Stefan Agner
> Subject: Re: [PATCH] net: fec: fix enet_out clock handling
> 
> Hi,
> 
> > From: Lothar Waßmann  Sent: Friday, November
> > 27, 2015 9:39 PM
> > > To: Andrew Lunn; David S. Miller; Estevam Fabio-R49496; Kevin Hao;
> > > Lothar Waßmann; Lucas Stach; Duan Fugang-B38611; Philippe Reynes;
> > > Russell King; Uwe Kleine-König; linux-kernel@vger.kernel.org;
> > > net...@vger.kernel.org; Stefan Agner
> > > Subject: [PATCH] net: fec: fix enet_out clock handling
> > >
> > > When ENET_OUT is being used as reference clock for an external PHY,
> > > the clock must not be disabled while the PHY is active. Otherwise
> > > the PHY may lose its internal state and require a reset to become
> > > functional again.
> > >
> > > A symptom for this bug is a network interface that constantly
> > > toggles between UP and DOWN state:
> > > fec 800f.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > > fec 800f.ethernet eth0: Link is Down
> > > fec 800f.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > > fec 800f.ethernet eth0: Link is Down
> > > [...]
> > >
> > > Signed-off-by: Lothar Waßmann 
> > > ---
> > >  drivers/net/ethernet/freescale/fec_main.c | 34
> > > +
> > > --
> > >  1 file changed, 14 insertions(+), 20 deletions(-)
> > >
> >
> > When MAC is not ready with clocks disabled,  it is not necessary to supply
> > clock for PHY. In fact, PHY also is not ready, why does it need clock ?
> > For your problem, you must add PHY reset in your dts file to resolve your
> > problem.
> The phy-reset-gpio property is set in the DTB. But fec_reset_phy() which
> asserts the RESET is only called from within the probe() function.
> It should probably be called from fec_restart() instead?
> 
After enabling the enet_out clock, you can call fec_reset_phy() to reset
the PHY. Don't put it in fec_restart(), because then a cable hotplug
would reset the PHY registers to their hardware default state.

Regards,
Andy

Re: Re: [RFC PATCH 02/15] PM / devfreq: exynos: Add documentation for generic exynos bus frequency driver

2015-11-29 Thread MyungJoo Ham
> Hi Rob,
> 
> On Sat, Nov 28, 2015 at 5:30 AM, Rob Herring  wrote:
> > On Thu, Nov 26, 2015 at 10:47:26PM +0900, Chanwoo Choi wrote:
> >> This patch adds the documentation for generic exynos bus frequency
> >> driver.
> >>
> >> Signed-off-by: Chanwoo Choi 
> >> ---
> >>  .../devicetree/bindings/devfreq/exynos-bus.txt | 92 ++
> >>  1 file changed, 92 insertions(+)
> >>  create mode 100644 Documentation/devicetree/bindings/devfreq/exynos-bus.txt
> >> +Example2 :
> >> + The bus of DMC block in exynos3250.dtsi are listed below:
> >
> > What is DMC?
> 
> DMC (DRAM Memory Controller)

It's Dynamic Memory Controller. (DRAM =~ Dynamic Memory)

You may want to spell out the full name at the first mention of DMC
in the documentation, as "DMC" may confuse a lot of people.



Cheers,
MyungJoo



[PATCH] mac80211_hwsim: missing NULL check

2015-11-29 Thread Rahul Jain
From: Amit Khatri 

The txrate variable might be NULL, yet it is passed into a function
without a NULL check.

Signed-off-by: Amit Khatri 
Signed-off-by: Rahul Jain 
---
 drivers/net/wireless/mac80211_hwsim.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/mac80211_hwsim.c 
b/drivers/net/wireless/mac80211_hwsim.c
index 194264c..72e4931 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -814,6 +814,9 @@ static void mac80211_hwsim_monitor_rx(struct ieee80211_hw 
*hw,
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx_skb);
struct ieee80211_rate *txrate = ieee80211_get_tx_rate(hw, info);
 
+   if (!txrate)
+   return;
+
if (!netif_running(hwsim_mon))
return;
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFD] Functional dependencies between devices

2015-11-29 Thread Andrzej Hajda
Hi,

Sorry for the late response.

On 11/24/2015 05:28 PM, Rafael J. Wysocki wrote:
> On Tuesday, November 24, 2015 03:57:09 PM Andrzej Hajda wrote:
>> On 11/19/2015 11:04 PM, Rafael J. Wysocki wrote:
>>> On Thursday, November 19, 2015 10:08:43 AM Andrzej Hajda wrote:
>>>> On 11/18/2015 03:17 AM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, November 17, 2015 01:44:59 PM Andrzej Hajda wrote:
>>>>>> Hi Rafael,
>>>>>>
>>>>>>> [cut]
>>>>>>>
>>>>> So the operations that need to be taken care of are:
>>>>> - Probe (suppliers need to be probed before consumers if the dependencies are
>>>>>   known beforehand).
>>>>> - System suspend/resume (suppliers need to be suspended after consumers and
>>>>>   resumed before them) which may be asynchronous (so simple re-ordering doesn't
>>>>>   help).
>>>>> - Runtime PM (suppliers should not be suspended if the consumers are not
>>>>>   suspended).
>>>> I thought the providers' frameworks were already taking care of it. For example,
>>>> a clock provider cannot suspend while there are prepared/enabled clocks.
>>>> Similarly, enabled regulators and PHYs should block the provider from runtime PM
>>>> suspending.
>>>>
>>>> Are there situations/frameworks which require additional care?
>>> Yes, there are, AFAICS.
>>>
>>> A somewhat extreme example of this is when an AML routine needed for power
>>> management of one device uses something like a GPIO line or an I2C link
>>> provided by another one.  We don't even have a way to track that kind of
>>> thing at the provider framework level and the only information we can get
>>> from the platform firmware is "this device depends on that one".
>>>
>>> Plus, even if the frameworks track those things, when a device suspend is
>>> requested, the question really is "Are there any devices that have to be
>>> suspended before this one?" rather than "Are other devices using resources
>>> provided by this one?".  Of course, you may argue that answering the second
>>> one will allow you to answer the first one too (that is based on the assumption
>>> that you can always track all cases of resource utilization which may not be
>>> entirely realistic), but getting that answer in a non-racy way may be rather
>>> expensive.
>> In such an extreme case the device itself can play the role of a resource.
>> But in my proposal I do not try to answer which devices/resources depend
>> on which ones; we do not need such info.
>> It is just a matter of notifying direct consumers about a change in the
>> availability of a given resource, and this notification is necessary anyway
>> if we want to support hot (un)plugging of resources/drivers.
> Well, we've been supporting hotplug for quite a while without that ...
>
> You seem to be referring to situations in which individual resources may go
> away and drivers are supposed to reconfigure themselves on the fly.
>
> This is not what the $subject proposal is about.

Currently, if you unbind a driver from its device and that driver is
a provider with active consumers, the result is usually a crash/oops.
So I wouldn't say that hot unplugging of resources/drivers is supported.

>
>>>>> - System shutdown (shutdown callbacks should be executed for consumers first).
>>>>> - Driver unbind (a supplier driver cannot be unbound before any of its consumer
>>>>>   drivers).
>>>>>
>>>>> In principle you can use resource tracking to figure out all of the involved
>>>>> dependencies, but that would require walking complicated data structures unless
>>>>> you add an intermediate "device dependency" layer which is going to be analogous
>>>>> to the one discussed here.
>>>> It should be enough if the provider notifies consumers that the resource
>>>> will be unavailable.
>>> To me, this isn't going in the right direction.  You should be asking "Am I
>>> allowed to suspend now?" instead of saying "I'm suspending and now you deal
>>> with it" to somebody.  Why is that so?  Because the other end may simply be
>>> unable to deal with the situation in the first place.
>> No. It is just saying "I want to suspend now, please do not use my resources".
>> In such a case the consumer should unprepare clocks, disable regulators, etc.
>> But if it is not able to do so, it just ignores the request. The provider
>> will know anyway that its resources are in use and will not suspend.
> This goes beyond the runtime PM framework which is based on device reference
> counting and for system suspend it's not practical at all, because one driver
> refusing to suspend aborts the entire operation system-wide.

Aren't current suspend callbacks designed that way?
Documentation/power/devices.txt says clearly:
"If any of these callbacks returns an error, the system won't enter the
 desired low-power state.  Instead the PM core will unwind its actions by
 resuming all the devices that were suspended."
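The error semantics quoted from devices.txt (one failing callback aborts the transition, and the already-suspended devices are resumed) can be sketched in userspace as follows. The `dev_model` type and its helpers are hypothetical, not driver-core code; the sketch assumes the array is already ordered consumers-first, which is the order the quoted list requires for system suspend, and it unwinds in reverse order on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; not the driver core's struct device. */
struct dev_model {
	const char *name;
	bool fail_suspend;	/* make this device's suspend callback fail */
	bool suspended;
};

static int dev_suspend(struct dev_model *d)
{
	if (d->fail_suspend)
		return -1;	/* real callbacks would return e.g. -EBUSY */
	d->suspended = true;
	return 0;
}

static void dev_resume(struct dev_model *d)
{
	d->suspended = false;
}

/*
 * Suspend devices in array order (consumers before their suppliers).
 * On the first error, resume the already-suspended devices in reverse
 * order and report failure, so the low-power state is not entered.
 */
static int suspend_all(struct dev_model *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (dev_suspend(&devs[i]) != 0) {
			while (i-- > 0)
				dev_resume(&devs[i]);
			return -1;
		}
	}
	return 0;
}
```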


>
>>> My idea is to represent a supplier-consumer dependency between devices (or
>>> more precisely between device+driver combos) as a "link"

[PATCH] staging: fwserial: Fix coding style problems

2015-11-29 Thread Rajan Vaja
Fix the following coding style problems reported by checkpatch:
  - Check for pointer comparisons to NULL
  - No space after a cast
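For reference, here are the two checkpatch rules this patch applies, shown in isolation. This is a small, self-contained sketch with hypothetical names, not code taken from fwserial.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration only. */
struct buf_model {
	char *data;
};

static int buf_ready(const struct buf_model *b)
{
	if (!b->data)	/* preferred over: b->data == NULL */
		return 0;
	return 1;
}

static unsigned long long widen(unsigned int v)
{
	return (unsigned long long)v;	/* no space after the cast */
}
```

Both forms compile to the same code; the rules only change how the source reads.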

Signed-off-by: Rajan Vaja 
---
 drivers/staging/fwserial/dma_fifo.c |   10 +-
 drivers/staging/fwserial/fwserial.c |   18 +-
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/fwserial/dma_fifo.c b/drivers/staging/fwserial/dma_fifo.c
index 7a3347c..4cd3ed3 100644
--- a/drivers/staging/fwserial/dma_fifo.c
+++ b/drivers/staging/fwserial/dma_fifo.c
@@ -106,7 +106,7 @@ void dma_fifo_free(struct dma_fifo *fifo)
 {
struct dma_pending *pending, *next;
 
-   if (fifo->data == NULL)
+   if (!fifo->data)
return;
 
list_for_each_entry_safe(pending, next, &fifo->pending, link)
@@ -123,7 +123,7 @@ void dma_fifo_reset(struct dma_fifo *fifo)
 {
struct dma_pending *pending, *next;
 
-   if (fifo->data == NULL)
+   if (!fifo->data)
return;
 
list_for_each_entry_safe(pending, next, &fifo->pending, link)
@@ -149,7 +149,7 @@ int dma_fifo_in(struct dma_fifo *fifo, const void *src, int n)
 {
int ofs, l;
 
-   if (fifo->data == NULL)
+   if (!fifo->data)
return -ENOENT;
if (fifo->corrupt)
return -ENXIO;
@@ -192,7 +192,7 @@ int dma_fifo_out_pend(struct dma_fifo *fifo, struct dma_pending *pended)
 {
unsigned len, n, ofs, l, limit;
 
-   if (fifo->data == NULL)
+   if (!fifo->data)
return -ENOENT;
if (fifo->corrupt)
return -ENXIO;
@@ -252,7 +252,7 @@ int dma_fifo_out_complete(struct dma_fifo *fifo, struct dma_pending *complete)
 {
struct dma_pending *pending, *next, *tmp;
 
-   if (fifo->data == NULL)
+   if (!fifo->data)
return -ENOENT;
if (fifo->corrupt)
return -ENXIO;
diff --git a/drivers/staging/fwserial/fwserial.c b/drivers/staging/fwserial/fwserial.c
index b3ea4bb..06c23d3 100644
--- a/drivers/staging/fwserial/fwserial.c
+++ b/drivers/staging/fwserial/fwserial.c
@@ -1466,9 +1466,9 @@ static void fwtty_debugfs_show_peer(struct seq_file *m, struct fwtty_peer *peer)
seq_printf(m, " %s:", dev_name(&peer->unit->device));
seq_printf(m, " node:%04x gen:%d", peer->node_id, generation);
seq_printf(m, " sp:%d max:%d guid:%016llx", peer->speed,
-  peer->max_payload, (unsigned long long) peer->guid);
-   seq_printf(m, " mgmt:%012llx", (unsigned long long) peer->mgmt_addr);
-   seq_printf(m, " addr:%012llx", (unsigned long long) peer->status_addr);
+  peer->max_payload, (unsigned long long)peer->guid);
+   seq_printf(m, " mgmt:%012llx", (unsigned long long)peer->mgmt_addr);
+   seq_printf(m, " addr:%012llx", (unsigned long long)peer->status_addr);
seq_putc(m, '\n');
 }
 
@@ -1515,7 +1515,7 @@ static int fwtty_debugfs_peers_show(struct seq_file *m, void *v)
rcu_read_lock();
seq_printf(m, "card: %s  guid: %016llx\n",
   dev_name(serial->card->device),
-  (unsigned long long) serial->card->guid);
+  (unsigned long long)serial->card->guid);
list_for_each_entry_rcu(peer, &serial->peer_list, list)
fwtty_debugfs_show_peer(m, peer);
rcu_read_unlock();
@@ -1701,7 +1701,7 @@ static void fwserial_virt_plug_complete(struct fwtty_peer *peer,
dma_fifo_change_tx_limit(&port->tx_fifo, port->max_payload);
spin_unlock_bh(&peer->port->lock);
 
-   if (port->port.console && port->fwcon_ops->notify != NULL)
+   if (port->port.console && port->fwcon_ops->notify)
(*port->fwcon_ops->notify)(FWCON_NOTIFY_ATTACH, port->con_data);
 
fwtty_info(&peer->unit, "peer (guid:%016llx) connected on %s\n",
@@ -1808,7 +1808,7 @@ static void fwserial_release_port(struct fwtty_port *port, bool reset)
RCU_INIT_POINTER(port->peer, NULL);
spin_unlock_bh(&port->lock);
 
-   if (port->port.console && port->fwcon_ops->notify != NULL)
+   if (port->port.console && port->fwcon_ops->notify)
(*port->fwcon_ops->notify)(FWCON_NOTIFY_DETACH, port->con_data);
 }
 
@@ -1987,7 +1987,7 @@ static struct fwtty_peer *__fwserial_peer_by_node_id(struct fw_card *card,
 * been probed for any unit devices...
 */
fwtty_err(card, "unknown card (guid %016llx)\n",
- (unsigned long long) card->guid);
+ (unsigned long long)card->guid);
return NULL;
}
 
@@ -2017,7 +2017,7 @@ static void __dump_peer_list(struct fw_card *card)
 
smp_rmb();
fwtty_dbg(card, "peer(%d:%x) guid: %016llx\n",
- g, peer->node_id, (unsigned long long) peer->guid);
+ g, peer->node_id, (unsigned long long)peer->guid);
}
 }
 #else
@@ -2314,7 +2314,7 @@ static int fwserial_create(struct fw_unit 

Re: [PATCH 2/2] zram/zcomp: do not zero out zcomp private pages

2015-11-29 Thread Minchan Kim
On Fri, Nov 27, 2015 at 01:23:14PM +0900, Sergey Senozhatsky wrote:
> Do not __GFP_ZERO allocated zcomp ->private pages. We keep
> allocated streams around and use them for read/write requests,
> so we supply a zeroed-out ->private to the compression algorithm
> as a scratch buffer only once -- the first time we use that
> stream. For the rest of the IO requests served by this stream,
> ->private usually contains some temporary data from the
> previous requests.
> 
> Signed-off-by: Sergey Senozhatsky 
Acked-by: Minchan Kim 

Thanks.
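A minimal userspace sketch of why zeroing a reused scratch buffer only on first use buys little. If the algorithm initializes every scratch slot it reads on each call, leftover data from earlier requests cannot change its output. The toy compressor and `toy_stream` type are hypothetical, and real LZO/LZ4 working memory is laid out differently. The sketch only assumes, as the commit message implies, that correctness does not depend on a zeroed buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_ENTRIES 256	/* one slot per possible byte value */

/* Hypothetical reusable stream: scratch is allocated once, not zeroed. */
struct toy_stream {
	size_t *scratch;
};

static int toy_stream_init(struct toy_stream *s)
{
	s->scratch = malloc(SCRATCH_ENTRIES * sizeof(*s->scratch)); /* not calloc */
	return s->scratch ? 0 : -1;
}

/*
 * Toy "compressor": for each input byte, emit the distance back to the
 * previous occurrence of the same byte (0 if none, capped at 255).
 * It initializes every scratch slot before reading it, so stale
 * contents from earlier calls cannot affect the output.
 */
static void toy_compress(struct toy_stream *s, const unsigned char *in,
			 size_t n, unsigned char *out)
{
	size_t i;

	for (i = 0; i < SCRATCH_ENTRIES; i++)
		s->scratch[i] = (size_t)-1;	/* "not seen yet" */

	for (i = 0; i < n; i++) {
		size_t last = s->scratch[in[i]];
		size_t dist = (last == (size_t)-1) ? 0 : i - last;

		out[i] = (unsigned char)(dist > 255 ? 255 : dist);
		s->scratch[in[i]] = i;
	}
}
```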


Re: BUG: Unable to handle kernel paging request for data at address __percpu_counter_add

2015-11-29 Thread Raghavendra K T

On 11/24/2015 02:43 AM, Tejun Heo wrote:

> Hello,
>
> On Thu, Nov 19, 2015 at 03:54:35PM +0530, Raghavendra K T wrote:
>>
>> While creating thousands of docker containers on a POWER8 bare-metal
>> system (config: 4.3.0 kernel, 1TB RAM, 20 cores (=160 CPUs)), I hit
>> the problem below after creating around 5600 containers.
>> [This looks similar to
>> https://bugzilla.kernel.org/show_bug.cgi?id=101011, but the
>> kernel had Revert "ext4: remove block_device_ejected" (bdfe0cbd746aa9),
>> since it is the 4.3.0 tagged kernel]
>>
>> Any hints on how to go about the fix? Please let me know if you
>> need any more information.
>>
>> The docker daemon is device mapper based (and it took a day to recreate
>> the problem).
>>
>> [By disabling CONFIG_BLK_CGROUP and CONFIG_CGROUP_WRITEBACK I am able to
>> create 10k containers without any problem.]
>>
>
> Could be the same problem that Ilya is trying to fix, i.e. blkdev i_wb
> pointing to a stale wb.  Can you please see whether the following
> patch resolves the issue?
>
>   http://lkml.kernel.org/g/1448054554-24138-1-git-send-email-idryo...@gmail.com



Hi Tejun,

Thanks again for the pointer. With that patch, I was able to create more
than 10k containers without any problem with CGROUP_WRITEBACK on, whereas
earlier I had hit this problem a few times at around 5k+ containers.
(Also replying to Ilya's thread.)



Re: block: Always check queue limits for cloned requests

2015-11-29 Thread Markus Trippelsdorf
On 2015.11.30 at 14:11 +0800, Ming Lei wrote:
> On Sun, 29 Nov 2015 18:05:06 +0100
> Markus Trippelsdorf  wrote:
> > 
> > No, I'm not using DM multipath. 
> 
> 
> OK, I guess it is still a block merge issue; care to test the
> following patch?
>
> The patch addresses an issue where bio->bi_seg_front_size is
> mistakenly set too small, so that fewer physical segments may
> be counted.

Many thanks. Your patch fixes the issue for me.

-- 
Markus


Re: [PATCH v3 2/2] zram: try vmalloc() after kmalloc()

2015-11-29 Thread Minchan Kim
On Fri, Nov 27, 2015 at 01:10:49PM +0900, Sergey Senozhatsky wrote:
> From: Kyeongdon Kim 
> 
> When we're using LZ4 multi compression streams for zram swap,
> we found out page allocation failure message in system running test.
> That was not only once, but a few(2 - 5 times per test).
> Also, some failure cases were continually occurring to try allocation
> order 3.
> 
> In order to make parallel compression private data, we should call
> kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
>  2/3 size memory to allocate in that time, page allocation fails.
> This patch makes to use vmalloc() as fallback of kmalloc(), this
> prevents page alloc failure warning.
> 
> After using this, we never found warning message in running test, also
> It could reduce process startup latency about 60-120ms in each case.
> 
> For reference a call trace :
> 
> Binder_1: page allocation failure: order:3, mode:0x10c0d0
> CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
> Call trace:
> [] dump_backtrace+0x0/0x270
> [] show_stack+0x10/0x1c
> [] dump_stack+0x1c/0x28
> [] warn_alloc_failed+0xfc/0x11c
> [] __alloc_pages_nodemask+0x724/0x7f0
> [] __get_free_pages+0x14/0x5c
> [] kmalloc_order_trace+0x38/0xd8
> [] zcomp_lz4_create+0x2c/0x38
> [] zcomp_strm_alloc+0x34/0x78
> [] zcomp_strm_multi_find+0x124/0x1ec
> [] zcomp_strm_find+0xc/0x18
> [] zram_bvec_rw+0x2fc/0x780
> [] zram_make_request+0x25c/0x2d4
> [] generic_make_request+0x80/0xbc
> [] submit_bio+0xa4/0x15c
> [] __swap_writepage+0x218/0x230
> [] swap_writepage+0x3c/0x4c
> [] shrink_page_list+0x51c/0x8d0
> [] shrink_inactive_list+0x3f8/0x60c
> [] shrink_lruvec+0x33c/0x4cc
> [] shrink_zone+0x3c/0x100
> [] try_to_free_pages+0x2b8/0x54c
> [] __alloc_pages_nodemask+0x514/0x7f0
> [] __get_free_pages+0x14/0x5c
> [] proc_info_read+0x50/0xe4
> [] vfs_read+0xa0/0x12c
> [] SyS_read+0x44/0x74
> DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
> 
> [minchan: change vmalloc gfp and adding comment about gfp]
> [sergey: tweak comments and styles]
> Signed-off-by: Kyeongdon Kim 
> Signed-off-by: Minchan Kim 

Kyeongdon, Could you test this patch on your device?

Thanks.
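The fallback this patch describes (try a physically contiguous allocation first, then fall back to vmalloc()) can be modeled in userspace as below. `contig_alloc()` and `fallback_alloc()` are hypothetical stand-ins for kmalloc() and vmalloc(), and the `fragmented` flag simulates a failed high-order allocation. The key design point is that the free path must match whichever allocator succeeded. In kernel code, kvfree() covers this by checking is_vmalloc_addr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum alloc_kind { ALLOC_CONTIG, ALLOC_FALLBACK };

struct buf {
	void *ptr;
	enum alloc_kind kind;	/* remembers which allocator succeeded */
};

/* Stand-in for a high-order kmalloc(); 'fragmented' simulates failure. */
static void *contig_alloc(size_t size, bool fragmented)
{
	return fragmented ? NULL : malloc(size);
}

static void contig_free(void *p)	/* stand-in for kfree() */
{
	free(p);
}

static void *fallback_alloc(size_t size)	/* stand-in for vmalloc() */
{
	return malloc(size);
}

static void fallback_free(void *p)	/* stand-in for vfree() */
{
	free(p);
}

static int buf_alloc(struct buf *b, size_t size, bool fragmented)
{
	b->ptr = contig_alloc(size, fragmented);
	if (b->ptr) {
		b->kind = ALLOC_CONTIG;
		return 0;
	}
	b->ptr = fallback_alloc(size);	/* only tried after the first attempt fails */
	if (!b->ptr)
		return -1;
	b->kind = ALLOC_FALLBACK;
	return 0;
}

static void buf_free(struct buf *b)
{
	if (b->kind == ALLOC_CONTIG)
		contig_free(b->ptr);
	else
		fallback_free(b->ptr);
	b->ptr = NULL;
}
```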



Re: [PATCH v3 1/2] zram/zcomp: use GFP_NOIO to allocate streams

2015-11-29 Thread Minchan Kim
On Fri, Nov 27, 2015 at 01:10:48PM +0900, Sergey Senozhatsky wrote:
> From: Sergey Senozhatsky 
> 
> We can end up allocating a new compression stream with GFP_KERNEL
> from within the IO path, which may result in nested (recursive) IO
> operations. That can introduce problems if the IO path in question
> is a reclaimer, holding some locks that will deadlock nested IOs.
> 
> Allocate streams and working memory using GFP_NOIO flag, forbidding
> recursive IO and FS operations.
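A toy model of the recursion described in the commit message. An IO path holds a lock, allocates under memory pressure, and reclaim then tries to start IO that needs the same lock. Everything here (`MODEL_ALLOW_IO`, `reclaim()`, the boolean lock) is hypothetical. It only illustrates why a GFP_NOIO-style flag, which forbids reclaim from issuing IO, avoids the self-deadlock that lockdep reports in the example trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags loosely modeled on the GFP idea. */
#define MODEL_ALLOW_IO	0x1u	/* reclaim may start IO (GFP_KERNEL-like) */
#define MODEL_NOIO	0x0u	/* reclaim must not start IO (GFP_NOIO-like) */

static bool io_lock_held;	/* a non-recursive lock taken by the IO path */

/*
 * Reclaim that writes dirty pages back needs the IO lock.
 * Returns false to model the deadlock: the lock is already held.
 */
static bool reclaim(unsigned int flags)
{
	if (!(flags & MODEL_ALLOW_IO))
		return true;	/* no writeback, so no recursion */
	return !io_lock_held;
}

/* In this model, every allocation happens under memory pressure. */
static bool alloc_stream(unsigned int flags)
{
	return reclaim(flags);
}

/* The IO path: take the lock, then allocate a compression stream. */
static bool io_path(unsigned int flags)
{
	bool ok;

	io_lock_held = true;
	ok = alloc_stream(flags);
	io_lock_held = false;
	return ok;
}
```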
> 
> An example:
> 
> [  747.233722] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
> [  747.233724] git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [  747.233725]  (jbd2_handle){+.+.?.}, at: [] start_this_handle+0x4ca/0x555
> [  747.233733] {IN-RECLAIM_FS-W} state was registered at:
> [  747.233735]   [] __lock_acquire+0x8da/0x117b
> [  747.233738]   [] lock_acquire+0x10c/0x1a7
> [  747.233740]   [] start_this_handle+0x52d/0x555
> [  747.233742]   [] jbd2__journal_start+0xb4/0x237
> [  747.233744]   [] __ext4_journal_start_sb+0x108/0x17e
> [  747.233748]   [] ext4_dirty_inode+0x32/0x61
> [  747.233750]   [] __mark_inode_dirty+0x16b/0x60c
> [  747.233754]   [] iput+0x11e/0x274
> [  747.233757]   [] __dentry_kill+0x148/0x1b8
> [  747.233759]   [] shrink_dentry_list+0x274/0x44a
> [  747.233761]   [] prune_dcache_sb+0x4a/0x55
> [  747.233763]   [] super_cache_scan+0xfc/0x176
> [  747.233767]   [] shrink_slab.part.14.constprop.25+0x2a2/0x4d3
> [  747.233770]   [] shrink_zone+0x74/0x140
> [  747.233772]   [] kswapd+0x6b7/0x930
> [  747.233774]   [] kthread+0x107/0x10f
> [  747.233778]   [] ret_from_fork+0x3f/0x70
> [  747.233783] irq event stamp: 138297
> [  747.233784] hardirqs last  enabled at (138297): [] debug_check_no_locks_freed+0x113/0x12f
> [  747.233786] hardirqs last disabled at (138296): [] debug_check_no_locks_freed+0x33/0x12f
> [  747.233788] softirqs last  enabled at (137818): [] __do_softirq+0x2d3/0x3e9
> [  747.233792] softirqs last disabled at (137813): [] irq_exit+0x41/0x95
> [  747.233794]
>other info that might help us debug this:
> [  747.233796]  Possible unsafe locking scenario:
> [  747.233797]CPU0
> [  747.233798]
> [  747.233799]   lock(jbd2_handle);
> [  747.233801]   
> [  747.233801] lock(jbd2_handle);
> [  747.233803]
> *** DEADLOCK ***
> [  747.233805] 5 locks held by git/20158:
> [  747.233806]  #0:  (sb_writers#7){.+.+.+}, at: [] mnt_want_write+0x24/0x4b
> [  747.233811]  #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [] lock_rename+0xd9/0xe3
> [  747.233817]  #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [] lock_two_nondirectories+0x3f/0x6b
> [  747.233822]  #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [] lock_two_nondirectories+0x66/0x6b
> [  747.233827]  #4:  (jbd2_handle){+.+.?.}, at: [] start_this_handle+0x4ca/0x555
> [  747.233831]
>stack backtrace:
> [  747.233834] CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
> [  747.233837]  8800a56cea40 88010d0a75f8 814f446d 81077036
> [  747.233840]  823a84b0 88010d0a7638 814f3849 0001
> [  747.233843]  000a 8800a56cf6f8 8800a56cea40 810795dd
> [  747.233846] Call Trace:
> [  747.233849]  [] dump_stack+0x4c/0x6e
> [  747.233852]  [] ? up+0x39/0x3e
> [  747.233854]  [] print_usage_bug.part.23+0x25b/0x26a
> [  747.233857]  [] ? print_shortest_lock_dependencies+0x182/0x182
> [  747.233859]  [] mark_lock+0x384/0x56d
> [  747.233862]  [] mark_held_locks+0x5f/0x76
> [  747.233865]  [] ? zcomp_strm_alloc+0x25/0x73 [zram]
> [  747.233867]  [] lockdep_trace_alloc+0xb2/0xb5
> [  747.233870]  [] kmem_cache_alloc_trace+0x32/0x1e2
> [  747.233873]  [] zcomp_strm_alloc+0x25/0x73 [zram]
> [  747.233876]  [] zcomp_strm_multi_find+0xe7/0x173 [zram]
> [  747.233879]  [] zcomp_strm_find+0xc/0xe [zram]
> [  747.233881]  [] zram_bvec_rw+0x2ca/0x7e0 [zram]
> [  747.233885]  [] zram_make_request+0x1fa/0x301 [zram]
> [  747.233889]  [] generic_make_request+0x9c/0xdb
> [  747.233891]  [] submit_bio+0xf7/0x120
> [  747.233895]  [] ? __test_set_page_writeback+0x1a0/0x1b8
> [  747.233897]  [] ext4_io_submit+0x2e/0x43
> [  747.233899]  [] ext4_bio_write_page+0x1b7/0x300
> [  747.233902]  [] mpage_submit_page+0x60/0x77
> [  747.233905]  [] mpage_map_and_submit_buffers+0x10f/0x21d
> [  747.233907]  [] ext4_writepages+0xc8c/0xe1b
> [  747.233910]  [] do_writepages+0x23/0x2c
> [  747.233913]  [] __filemap_fdatawrite_range+0x84/0x8b
> [  747.233915]  [] filemap_flush+0x1c/0x1e
> [  747.233917]  [] ext4_alloc_da_blocks+0xb8/0x117
> [  747.233919]  [] ext4_rename+0x132/0x6dc
> [  747.233921]  [] ? mark_held_locks+0x5f/0x76
> [  747.233924]  [] ext4_rename2+0x29/0x2b
> [  747.233926]  [] vfs_rename+0x540/0x636
> [  747.233928]  [] SyS_renameat2+0x359/0x44d
> [  747.233931]  [] SyS_rename+0x1e/0x20
> [  747.233933]  [] 

Re: [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users

2015-11-29 Thread Willy Tarreau
On Mon, Nov 30, 2015 at 01:54:22AM +, Ben Hutchings wrote:
> On Sun, 2015-11-29 at 22:47 +0100, Willy Tarreau wrote:
> This is wrong; see
> .

Damn, and now I remember this discussion. The worst thing is that
I purposely booted a machine to test the fix and was happy with it;
I had forgotten this point :-(

> For 2.6.32 perhaps you could retain the capability check at open time
> but store the result in private state for use at read time.

I'll see if it is possible to open-code security_capable() with 2.6.32's
infrastructure, and how far that gets us. Or maybe we should even drop
this one completely and leave pagemap readable only by the superuser on
2.6.32; it doesn't seem to be that big of a deal either.
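Ben's suggestion (decide at open time, store the result in per-file private state, and consult it at read time) can be sketched in userspace. `struct pm_file` and `struct cred_model` are hypothetical stand-ins for the file's private data and the opener's credentials. The mask assumes the documented pagemap layout, in which bits 0-54 of an entry hold the PFN. Capturing the decision at open time makes the result depend on who opened the file, not on whoever later reads from the descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a file's private state. */
struct pm_file {
	bool show_pfn;	/* privilege decision captured at open time */
};

/* Hypothetical caller credentials; not a kernel type. */
struct cred_model {
	bool has_sys_admin;
};

#define PFN_MASK ((UINT64_C(1) << 55) - 1)	/* bits 0-54 hold the PFN */

static void pm_open(struct pm_file *f, const struct cred_model *opener)
{
	/* Decide once, based on whoever opened the file. */
	f->show_pfn = opener->has_sys_admin;
}

/* Reads use the stored decision, not the reader's current credentials. */
static uint64_t pm_read_entry(const struct pm_file *f, uint64_t entry)
{
	return f->show_pfn ? entry : (entry & ~PFN_MASK);
}
```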

> The ptrace check presumably should also be done at open time, as was
> implemented upstream in:
> 
> commit a06db751c321546e5563041956a57613259c6720
> Author: Konstantin Khlebnikov 
> Date:   Tue Sep 8 14:59:59 2015 -0700
> 
> pagemap: check permissions and capabilities at open time
> 
> But that wasn't cc'd to stable and hasn't been applied to any stable
> branch (yet).

I really prefer to avoid fixing things that are not in more recent
branches (especially upgrade candidates for 2.6.32 like yours).

Thanks!
Willy



Re: [PATCH v2 1/4] ARM: dts: exynos4210: MDMA1 device belongs to LCD0 power domain

2015-11-29 Thread Krzysztof Kozlowski
On 26.11.2015 21:49, Marek Szyprowski wrote:
> On Exynos 4210 MDMA1 device belongs to LCD0 power domain, so add proper
> power-domains property. On Exynos 4x12, it belongs to TOP power domain,
> which is always enabled and thus requires no assignment in exynos4x12.dtsi.
> 
> Signed-off-by: Marek Szyprowski 
> ---
>  arch/arm/boot/dts/exynos4210.dtsi | 4 
>  1 file changed, 4 insertions(+)

Makes sense. I suppose the rest of the patchset does not depend on it
directly, so it can go through the samsung-soc tree (otherwise please let us
know).

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof




Re: [PATCH] net: fec: fix enet_out clock handling

2015-11-29 Thread Lothar Waßmann
Hi,

> From: Lothar Waßmann  Sent: Friday, November 27, 2015 9:39 PM
> > To: Andrew Lunn; David S. Miller; Estevam Fabio-R49496; Kevin Hao; Lothar
> > Waßmann; Lucas Stach; Duan Fugang-B38611; Philippe Reynes; Russell King;
> > Uwe Kleine-König; linux-kernel@vger.kernel.org; net...@vger.kernel.org;
> > Stefan Agner
> > Subject: [PATCH] net: fec: fix enet_out clock handling
> > 
> > When ENET_OUT is being used as reference clock for an external PHY, the
> > clock must not be disabled while the PHY is active. Otherwise the PHY may
> > lose its internal state and require a reset to become functional again.
> > 
> > A symptom for this bug is a network interface that constantly toggles
> > between UP and DOWN state:
> > fec 800f.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > fec 800f.ethernet eth0: Link is Down
> > fec 800f.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > fec 800f.ethernet eth0: Link is Down
> > [...]
> > 
> > Signed-off-by: Lothar Waßmann 
> > ---
> >  drivers/net/ethernet/freescale/fec_main.c | 34 +
> > --
> >  1 file changed, 14 insertions(+), 20 deletions(-)
> > 
> 
> When the MAC is not ready and its clocks are disabled, it is not necessary
> to supply a clock to the PHY. In fact, the PHY is not ready either, so why
> does it need a clock?
> To resolve your problem, you must add a PHY reset to your dts file.
> 
The phy-reset-gpio property is set in the DTB. But fec_reset_phy(),
which asserts the RESET, is only called from within the probe() function.
Should it perhaps be called from fec_restart() instead?
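A hypothetical model of the sequence discussed here. Gating the reference clock makes the PHY lose its state, and only a reset performed after the clock is back restores it. `struct phy_model` and its helpers are illustrative stand-ins, not fec_main.c functions. The sketch only shows why resetting the PHY in the restart path, as suggested, would recover the link.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PHY clocked by the MAC's ENET_OUT. */
struct phy_model {
	bool clocked;		/* reference clock running */
	bool state_valid;	/* PHY internal state intact */
};

static void ref_clk_disable(struct phy_model *p)
{
	p->clocked = false;
	p->state_valid = false;	/* state is lost while the clock is gated */
}

static void ref_clk_enable(struct phy_model *p)
{
	p->clocked = true;	/* clock is back, but state is not restored */
}

static void phy_reset(struct phy_model *p)
{
	if (p->clocked)
		p->state_valid = true;	/* a reset brings the PHY back */
}

static bool link_usable(const struct phy_model *p)
{
	return p->clocked && p->state_valid;
}

/* Restart path: re-enable the clock and optionally reset the PHY. */
static void restart(struct phy_model *p, bool reset_phy)
{
	ref_clk_enable(p);
	if (reset_phy)
		phy_reset(p);
}
```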


Lothar Waßmann


Re: [PATCH V5 04/26] coresight: etm3x: splitting struct etm_drvdata

2015-11-29 Thread kbuild test robot
Hi Mathieu,

[auto build test ERROR on: tip/perf/core]
[also build test ERROR on: v4.4-rc3 next-20151127]

url:
https://github.com/0day-ci/linux/commits/Mathieu-Poirier/Coresight-integration-with-perf/20151130-102706
config: arm-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

Note: the linux-review/Mathieu-Poirier/Coresight-integration-with-perf/20151130-102706 HEAD c898ed8b47ed7662cd702f15ed57c520a826fbfb builds fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

   In file included from arch/arm/include/asm/bug.h:62:0,
from arch/arm/include/asm/div64.h:63,
from include/linux/kernel.h:136,
from drivers/hwtracing/coresight/coresight-etm3x.c:13:
   drivers/hwtracing/coresight/coresight-etm3x.c: In function 'etm_disable_hw':
>> drivers/hwtracing/coresight/coresight-etm3x.c:387:20: error: 'config' undeclared (first use in this function)
 if (WARN_ON_ONCE(!config))
   ^
   include/asm-generic/bug.h:111:27: note: in definition of macro 'WARN_ON_ONCE'
 int __ret_warn_once = !!(condition);   \
  ^
   drivers/hwtracing/coresight/coresight-etm3x.c:387:20: note: each undeclared identifier is reported only once for each function it appears in
 if (WARN_ON_ONCE(!config))
   ^
   include/asm-generic/bug.h:111:27: note: in definition of macro 'WARN_ON_ONCE'
 int __ret_warn_once = !!(condition);   \
  ^

vim +/config +387 drivers/hwtracing/coresight/coresight-etm3x.c

   381  
   382  static void etm_disable_hw(void *info)
   383  {
   384  int i;
   385  struct etm_drvdata *drvdata = info;
   386  
 > 387  if (WARN_ON_ONCE(!config))
   388  return;
   389  
   390  CS_UNLOCK(drvdata->base);
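The error means the check at line 387 refers to a local that this version of the function never declares. Below is a minimal userspace sketch of the usual shape of such a fix. It assumes, hypothetically, that the per-device data holds a pointer to its config; the type names are illustrative and are not the coresight structures.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, not the coresight structures. */
struct cfg_model {
	int seq_curr_state;
};

struct drvdata_model {
	struct cfg_model *config;
};

static int disable_hw_model(void *info)
{
	struct drvdata_model *drvdata = info;
	/* Declare the local that the NULL check refers to. */
	struct cfg_model *config = drvdata->config;

	if (!config)	/* the kernel code wraps this in WARN_ON_ONCE() */
		return -1;

	config->seq_curr_state = 0;
	return 0;
}
```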

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-11-29 Thread Lan, Tianyu

On 11/26/2015 11:56 AM, Alexander Duyck wrote:

> I am not saying you cannot modify the drivers, however what you are
> doing is far too invasive.  Do you seriously plan on modifying all of
> the PCI device drivers out there in order to allow any device that
> might be direct assigned to a port to support migration?  I certainly
> hope not.  That is why I have said that this solution will not scale.


Current drivers are not migration friendly. If a driver wants to
support migration, it needs to be changed.

RFC PATCH V1 presented our ideas about how to deal with MMIO, ring and
DMA tracking during migration. These are common to most drivers; they
may have been problematic in the previous version, but they can be
corrected later.

Using suspend() and resume() may make migration easier, but some
devices require low service downtime, especially network devices. I have
learned that some cloud companies promise less than 500ms of network
service downtime.

So I think the performance impact should also be taken into account when
we design the framework.





> What I am counter proposing seems like a very simple proposition.  It
> can be implemented in two steps.
>
> 1.  Look at modifying dma_mark_clean().  It is a function called in
> the sync and unmap paths of the lib/swiotlb.c.  If you could somehow
> modify it to take care of marking the pages you unmap for Rx as being
> dirty it will get you a good way towards your goal as it will allow
> you to continue to do DMA while you are migrating the VM.
>
> 2.  Look at making use of the existing PCI suspend/resume calls that
> are there to support PCI power management.  They have everything
> needed to allow you to pause and resume DMA for the device before and
> after the migration while retaining the driver state.  If you can
> implement something that allows you to trigger these calls from the
> PCI subsystem such as hot-plug then you would have a generic solution
> that can be easily reproduced for multiple drivers beyond those
> supported by ixgbevf.
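Step 1 amounts to dirty-page tracking hooked into the DMA unmap/sync path. Below is a minimal userspace sketch assuming a hypothetical 64-page guest, 4 KiB pages, and a single bitmap word. `mark_dma_dirty()` is illustrative only and does not reproduce the dma_mark_clean() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* 4 KiB pages */
#define MODEL_NR_PAGES 64	/* one bit per page in a single word */

static uint64_t dirty_bitmap;

/*
 * Mark every page touched by [addr, addr + len) as dirty, as a hook in
 * the DMA unmap/sync path might do so the migration code re-copies it.
 */
static void mark_dma_dirty(uint64_t addr, size_t len)
{
	uint64_t first, last, pfn;

	if (len == 0)
		return;
	first = addr >> MODEL_PAGE_SHIFT;
	last = (addr + len - 1) >> MODEL_PAGE_SHIFT;
	for (pfn = first; pfn <= last && pfn < MODEL_NR_PAGES; pfn++)
		dirty_bitmap |= UINT64_C(1) << pfn;
}
```

A 4 KiB buffer that starts mid-page spans two pages, so both must be marked.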


I glanced at the PCI hotplug code. Hotplug events are triggered by the PCI
hotplug controller, and these events are defined in the controller spec,
so it's hard to add more events. Besides, we would also need to add some
specific code to the PCI hotplug core, since it only adds and removes
PCI devices when it gets events. Modifying the Windows hotplug code is
also a challenge. So we may need to find another way.






> Thanks.
>
> - Alex



Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review

2015-11-29 Thread Willy Tarreau
Hi Ben,

On Mon, Nov 30, 2015 at 02:42:13AM +, Ben Hutchings wrote:
> Patches 9 and 30 didn't hit the lists, but I've bounced the versions I
> received.
 
Thanks. Strangely, 9 arrived late, I don't know why.

> Patch 2 didn't arrive here or on the list, but appears to be commit
> a41cbe86df3a ("Failing to send a CLOSE if file is opened WRONLY and
> server reboots on a 4.x mount").

Yes that's it, I've resent it now.

> These subjects in the shortlog don't appear in the patch series:
> 
> > Filipe Manana (1):
> >   Btrfs: fix read corruption of compressed and shared extents
> [...]
> > Herbert Xu (4):
> [...]
> >   crypto: api - Only abort operations on fatal signal
> [...]
> > Jeff Mahoney (1):
> >   btrfs: skip waiting on ordered range for special files
> [...]
> > Michal Kubeček (1):
> >   ipv6: fix tunnel error handling
> [...]
> > Pravin B Shelar (2):
> >   skbuff: Fix skb checksum flag on skb pull
> >   skbuff: Fix skb checksum partial check.
> 

Indeed, I removed them during the build attempt, long before building
the changelog. I'm worried that there's a bug in my script, which seems
to take a specific branch to emit the log instead of the current one :-/
Thanks for letting me know, and sorry for the confusion.

> Commit 397d425dc26d ("vfs: Test for and handle paths that are
> unreachable from their mnt_root") is missing.

OK I'm seeing it in your 3.2 branch, I'll try to backport it.

Thanks Ben!
Willy



Re: [PATCH v2] arm, am335x: add support for the bosch shc board

2015-11-29 Thread Heiko Schocher

Hello all,

Am 18.11.2015 um 09:24 schrieb Heiko Schocher:

Hello Dave,

Am 17.11.2015 um 22:29 schrieb Dave Gerlach:

Hi,
On 11/17/2015 02:24 AM, Heiko Schocher wrote:

add support for the am335x based shc board.

UART: 0-2 and 4
DRAM: 512 MiB
MMC:  OMAP SD/MMC: 0 @ 26 MHz
   OMAP SD/MMC: 1 @ 26 MHz
I2C:  at24 eeprom, pcf8563
USB:  USB1 (host)

Signed-off-by: Heiko Schocher 
---
The following patches are needed to get all working
for the shc board:
- disable clkout on pcf8563
   accepted.
   http://www.spinics.net/lists/devicetree/msg98542.html

- leds: leds-gpio: add shutdown function
   accepted.
   https://lkml.org/lkml/2015/10/13/169

- net: phy: smsc: disable energy detect mode
   accepted
   [PATCH v2 2/2] net: phy: smsc: disable energy detect mode
   https://lkml.org/lkml/2015/10/17/2
   [PATCH v2 1/2] drivers: net: cpsw: add phy-handle parsing
   https://lkml.org/lkml/2015/10/17/4

- ARM: OMAP2+: omap_hwmod: Introduce ti,no-init dt property
   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/328204.html
   @Dave: What is the current state of this patch?
   I have the same problem here on this am335x based board



A different approach is being taken for resolving the issue of rtc hwmod on
am43x epos evm [1], which is what I was attempting to solve with the patch
you have linked. We decided to avoid changing omap_hwmod code and I haven't
been pursuing the ti,no-init flag anymore.


Maybe I am overlooking something, but I cannot see how [1] solves the RTC
hwmod problem on am335x SoC based boards. Not all boards have this problem,
so the RTC hwmod cannot be disabled for all am335x boards ...

It must somehow be configurable per board ... I have am335x boards
which use the RTC from the SoC.


gentle ping ...

No more comments on this patch? Is it Ok for mainline or are
there more issues?

bye,
Heiko



Regards,
Dave

[1] http://www.spinics.net/lists/linux-omap/msg121987.html


bye,
Heiko


--
DENX Software Engineering GmbH,  Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Re: block: Always check queue limits for cloned requests

2015-11-29 Thread Hannes Reinecke
On 11/29/2015 06:05 PM, Markus Trippelsdorf wrote:
> On 2015.11.29 at 11:49 -0500, Mike Snitzer wrote:
>> On Sun, Nov 29 2015 at 11:15am -0500,
>> Markus Trippelsdorf  wrote:
>>
>>> On 2015.11.29 at 16:43 +0100, Hannes Reinecke wrote:
>>>> On 11/29/2015 12:49 PM, Markus Trippelsdorf wrote:
>>>>>
>>>>> I'm still seeing the issue (BUG at drivers/scsi/scsi_lib.c:1096!) even
>>>>> with this patch applied.
>>>>>
>>>>> markus@x4 linux % git describe
>>>>> v4.4-rc2-215-g081f3698e606
>>>>>
>>>> Can you generate a crashdump?
>>>> I would need to cross-check with the other dumps I'm having to figure
>>>> out if this really is the same issue.
>>>> There have been other reports (and fixes) which show we're fighting
>>>> several distinct issues here.
>>>
>>> Unfortunately no. The crash happens on the disk where I store my log
>>> files. And after it happened the magic SysRq keys don't work anymore.
>>>
>>> The crash only happens on my spinning rust drive that uses the cfq
>>> scheduler. The SSDs (deadline) are fine.
>>>
>>> The BUG happens reproducibly when building http://www.sagemath.org/ on
>>> that drive.
>>
>> Are you using DM multipath?  If unsure, please let us know which
>> device(s) map to the "spinning rust drive", and provide output from:
>> lsblk
> 
> No, I'm not using DM multipath. 
> 
> /dev/sdb2 on /var type btrfs (rw,relatime,compress=lzo,noacl,space_cache)
> /dev/sdb2  btrfs 1.9T  904G  944G  49% /var
> 
> scsi 1:0:0:0: Direct-Access ATA  ST2000DM001-1CH1 CC29 PQ: 0 ANSI: 5
> sd 1:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
> sd 1:0:0:0: [sdb] 4096-byte physical blocks
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: Attached scsi generic sg1 type 0
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> 
> Model Family: Seagate Barracuda 7200.14 (AF)
> Device Model: ST2000DM001-1CH164
> 
As Ming Lei indicated, this is probably a different issue. My patch
only fixes I/O errors induced by multipath failover.
So if you're not using multipath, you won't be affected, either by
the original issue triggering the BUG_ON or by my patch attempting to
fix it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH] scsi: ufs: fix spelling mistake in error message

2015-11-29 Thread Johannes Thumshirn

Quoting Colin King:


From: Colin Ian King 

Minor issue, fix spelling mistake, Intialization -> Initialization

Signed-off-by: Colin Ian King 
---
 drivers/scsi/ufs/ufshcd-pltfrm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufshcd-pltfrm.c b/drivers/scsi/ufs/ufshcd-pltfrm.c
index 9714f2a..d2a7b12 100644
--- a/drivers/scsi/ufs/ufshcd-pltfrm.c
+++ b/drivers/scsi/ufs/ufshcd-pltfrm.c
@@ -333,7 +333,7 @@ int ufshcd_pltfrm_init(struct platform_device *pdev,

err = ufshcd_init(hba, mmio_base, irq);
if (err) {
-   dev_err(dev, "Intialization failed\n");
+   dev_err(dev, "Initialization failed\n");
goto out_disable_rpm;
}

--
2.6.2




Reviewed-by: Johannes Thumshirn 




Re: [PATCH 2.6.32 02/38] [PATCH 02/38] Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

2015-11-29 Thread Willy Tarreau
resending.

On Sun, Nov 29, 2015 at 10:47:04PM +0100, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> commit a41cbe86df3afbc82311a1640e20858c0cd7e065 upstream.
> 
> A test case is as the description says:
> open(foobar, O_WRONLY);
> sleep()  --> reboot the server
> close(foobar)
> 
> The bug is that in nfs4state.c, in nfs4_reclaim_open_state(), a few
> lines before going to restart, there is
> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, >flags).
> 
> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client state, not for open
> owner states. Its value is 4, which is also the value of
> NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes out that
> state, and when we go to close the file, “call_close” doesn’t get set
> because the state flag is not set, and the CLOSE doesn’t go on the wire.
> 
> Signed-off-by: Olga Kornievskaia 
> Signed-off-by: Trond Myklebust 
> Signed-off-by: Willy Tarreau 
> ---
>  fs/nfs/nfs4state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 71ee6f6..614446b 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -929,7 +929,7 @@ restart:
>   __func__);
>   }
>   nfs4_put_open_state(state);
> - clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
> + clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>   >flags);
>   goto restart;
>   }
> -- 
> 1.7.12.2.21.g234cd45.dirty
> 
> 
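The flag collision described in the quoted changelog can be reproduced with a few lines of userspace C. This is a sketch under stated assumptions: the value 4 for NFS4CLNT_RECLAIM_NOGRACE and NFS_O_WRONLY_STATE comes from the changelog, the value given to NFS_STATE_RECLAIM_NOGRACE is hypothetical, and `model_clear_bit()` / `model_reclaim()` are illustrative stand-ins for clear_bit() and the reclaim path, not NFS code.

```c
#include <assert.h>
#include <limits.h>

/* From the changelog: both constants are bit 4, but they belong to
 * different flag words (client state vs. per-open state). */
enum { NFS4CLNT_RECLAIM_NOGRACE = 4 };
enum { NFS_O_WRONLY_STATE = 4,
       NFS_STATE_RECLAIM_NOGRACE = 6 };  /* hypothetical value */

/* Userspace stand-in for the kernel's clear_bit(). */
static void model_clear_bit(int nr, unsigned long *addr)
{
    assert(nr >= 0 && nr < (int)(sizeof(*addr) * CHAR_BIT));
    *addr &= ~(1UL << nr);
}

/* Returns the open-state flags after the reclaim path clears its
 * "no grace" bit; `buggy` selects the wrong (client-side) constant. */
unsigned long model_reclaim(unsigned long state_flags, int buggy)
{
    model_clear_bit(buggy ? NFS4CLNT_RECLAIM_NOGRACE
                          : NFS_STATE_RECLAIM_NOGRACE,
                    &state_flags);
    return state_flags;
}
```

With the client-side constant, clearing bit 4 silently drops the O_WRONLY bit from the open state, which is why no CLOSE is sent; the fix uses the constant that belongs to the same flag word.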


[PATCH v5 00/12] MADV_FREE support

2015-11-29 Thread Minchan Kim
In v4, Andrew wanted to settle the old, basic MADV_FREE first and introduce
the new stuff (i.e., lazyfree LRU, swapless support and lazyfreeness) later,
so this version doesn't include them.

I have tested it on mmotm-2015-11-25-17-08 with an additional
patch[1] from Kirill to prevent a BUG_ON; he hasn't sent it to
linux-mm as a formal patch yet. With it, I haven't found any
problems so far.

Note that this version is based on the THP refcount redesign, so
I needed to modify MADV_FREE: split_huge_pmd no longer splits a
THP page, and pmd_trans_huge(pmd) is not enough to guarantee that
the page is not a THP page.
Also, for MADV_FREE lazy-split, the THP split should respect the
pmd's dirtiness rather than unconditionally marking the ptes of all
subpages dirty. Please review the last patch in this patchset:

mm: don't split THP page when syscall is called
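The difference between the two split behaviours can be illustrated with a small userspace model. This is a sketch, not kernel code: `model_split_pmd()` and its `respect_pmd` switch are hypothetical, and SUBPAGES assumes a 2 MiB THP split into 4 KiB ptes as on x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUBPAGES 512   /* assumption: 2 MiB THP / 4 KiB pages (x86-64) */

/* Hypothetical model of splitting one huge PMD mapping into SUBPAGES ptes.
 * respect_pmd == false mimics marking every subpage dirty unconditionally;
 * respect_pmd == true copies the PMD's own dirty bit, so a THP that
 * MADV_FREE has cleaned stays clean (and discardable) after the split. */
void model_split_pmd(bool pmd_dirty, bool pte_dirty[SUBPAGES], bool respect_pmd)
{
    assert(pte_dirty != NULL);
    for (size_t i = 0; i < SUBPAGES; i++) {
        pte_dirty[i] = respect_pmd ? pmd_dirty : true;
    }
}
```

Marking every subpage dirty would turn each 4 KiB piece of a cleaned THP back into a page that must be swapped out, undoing the MADV_FREE hint.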

[1] https://lkml.org/lkml/2015/11/17/134

git: git://git.kernel.org/pub/scm/linux/kernel/git/minchan/linux.git
branch: mm/madv_free-v4.4-rc2-mmotm-2015-11-25-17-08-v5r2

At this stage, I don't think we need to write a man page.
That can be done once the policy and implementation are solid.

 * Change from v4
   * drop lazyfree LRU
   * drop swapless support
   * drop lazyfreeness
   * rebase on recent mmotm with THP refcount redesign

 * Change from v3
   * some bug fix
   * code refactoring
   * lazyfree reclaim logic change
   * reordering patch

 * Change from v2
   * vm_lazyfreeness tuning knob
   * add new LRU list - Johannes, Shaohua
   * support swapless - Johannes

 * Change from v1
   * Don't do unnecessary TLB flush - Shaohua
   * Added Acked-by - Hugh, Michal
   * Merge deactivate_page and deactivate_file_page
   * Add pmd_dirty/pmd_mkclean patches for several arches
   * Add lazy THP split patch
   * Drop zhangyan...@cn.fujitsu.com - Delivery Failure

Chen Gang (1):
  arch: uapi: asm: mman.h: Let MADV_FREE have same value for all
architectures

Minchan Kim (11):
  mm: support madvise(MADV_FREE)
  mm: define MADV_FREE for some arches
  mm: free swp_entry in madvise_free
  mm: move lazily freed pages to inactive list
  mm: mark stable page dirty in KSM
  x86: add pmd_[dirty|mkclean] for THP
  sparc: add pmd_[dirty|mkclean] for THP
  powerpc: add pmd_[dirty|mkclean] for THP
  arm: add pmd_mkclean for THP
  arm64: add pmd_mkclean for THP
  mm: don't split THP page when syscall is called

 arch/alpha/include/uapi/asm/mman.h       |   2 +
 arch/arm/include/asm/pgtable-3level.h    |   1 +
 arch/arm64/include/asm/pgtable.h         |   1 +
 arch/mips/include/uapi/asm/mman.h        |   2 +
 arch/parisc/include/uapi/asm/mman.h      |   2 +
 arch/powerpc/include/asm/pgtable-ppc64.h |   2 +
 arch/sparc/include/asm/pgtable_64.h      |   9 ++
 arch/x86/include/asm/pgtable.h           |   5 +
 arch/xtensa/include/uapi/asm/mman.h      |   2 +
 include/linux/huge_mm.h                  |   3 +
 include/linux/rmap.h                     |   1 +
 include/linux/swap.h                     |   1 +
 include/linux/vm_event_item.h            |   1 +
 include/uapi/asm-generic/mman-common.h   |   1 +
 mm/huge_memory.c                         |  87 +-
 mm/ksm.c                                 |   6 +
 mm/madvise.c                             | 199 +++
 mm/rmap.c                                |   8 ++
 mm/swap.c                                |  44 +++
 mm/swap_state.c                          |   5 +-
 mm/vmscan.c                              |  10 +-
 mm/vmstat.c                              |   1 +
 22 files changed, 383 insertions(+), 10 deletions(-)

-- 
1.9.1



[PATCH v5 04/12] mm: free swp_entry in madvise_free

2015-11-29 Thread Minchan Kim
When I tested the piece of code below with 12 processes (i.e., 512M * 12 = 6G
consumed) on my machine (3G RAM + 12 CPUs + 8G swap), madvise_free was
significantly slower (i.e., 2x) than madvise_dontneed.

loop = 5;
mmap(512M);
while (loop--) {
memset(512M);
madvise(MADV_FREE or MADV_DONTNEED);
}
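The loop above is pseudocode. A minimal userspace C sketch of the same pattern follows; the function name `touch_and_advise()` is hypothetical, and the fallback value 8 for MADV_FREE is an assumption taken from the common value proposed in this series. Kernels without MADV_FREE reject it with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8  /* assumption: the common value proposed in this series */
#endif

/* Map `len` bytes of private anonymous memory, then `loops` times dirty
 * every byte and apply `advice` (MADV_FREE or MADV_DONTNEED).
 * Returns 0 on success or -errno from the first failing call. */
int touch_and_advise(size_t len, int advice, int loops)
{
    assert(len > 0 && loops >= 0);

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -errno;

    int ret = 0;
    while (loops--) {
        memset(buf, 0xaa, len);            /* touch (and dirty) every page */
        if (madvise(buf, len, advice) != 0) {
            ret = -errno;                  /* e.g. -EINVAL if advice is unknown */
            break;
        }
    }
    munmap(buf, len);
    return ret;
}
```

The sketch does no timing; reproducing the measurement would mean running it in 12 processes with len = 512M and loops = 5 on a similarly memory-constrained machine.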

The reason is the large number of swap-ins.

1) dontneed: 1,612 swapin
2) madvfree: 879,585 swapin

If the hinted pages were already swapped out when the syscall is called,
it's pointless to keep the swapped-out entries in the ptes.
Instead, let's free the cold pages, because swap-in is more expensive
than (page allocation + zeroing).
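The per-pte decision this patch adds can be modelled in userspace C. This is a sketch, not kernel code: `enum pte_kind`, `struct model_pte` and `model_free_range()` are hypothetical stand-ins for the pte_none(), pte_present() and non_swap_entry() checks and the MM_SWAPENTS counter update in the diff below. Present pages are simply marked clean, ignoring the other checks the real function makes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one page table entry. */
enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAP, PTE_NON_SWAP };

struct model_pte {
    enum pte_kind kind;
    int dirty;                 /* meaningful only for PTE_PRESENT */
};

/* Walk a range the way the patched madvise_free_pte_range() does and
 * return the (non-positive) change applied to *swapents. */
long model_free_range(struct model_pte *ptes, size_t n, long *swapents)
{
    long nr_swap = 0;

    assert(swapents != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (ptes[i].kind) {
        case PTE_NONE:         /* nothing mapped */
        case PTE_NON_SWAP:     /* e.g. migration entry: leave it alone */
            break;
        case PTE_SWAP:         /* free the swap slot and clear the pte */
            nr_swap--;
            ptes[i].kind = PTE_NONE;
            break;
        case PTE_PRESENT:      /* simplified: just mark it clean */
            ptes[i].dirty = 0;
            break;
        }
    }
    if (nr_swap)
        *swapents += nr_swap;  /* like add_mm_counter(mm, MM_SWAPENTS, nr_swap) */
    return nr_swap;
}
```

Batching the counter update into one adjustment after the loop mirrors the `out:` block of the patch, which calls add_mm_counter() once per range.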

With this patch, swap-ins dropped from 879,585 to 1,878, and the elapsed
time improved:

1) dontneed: 6.10user 233.50system 0:50.44elapsed
2) madvfree: 6.03user 401.17system 1:30.67elapsed
3) madvfree + this patch: 6.70user 339.14system 1:04.45elapsed

Acked-by: Michal Hocko 
Acked-by: Hugh Dickins 
Signed-off-by: Minchan Kim 
---
 mm/madvise.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index e2fe2e26f449..8de3d9a636c9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -270,6 +270,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
spinlock_t *ptl;
pte_t *orig_pte, *pte, ptent;
struct page *page;
+   int nr_swap = 0;
 
split_huge_pmd(vma, pmd, addr);
if (pmd_trans_unstable(pmd))
@@ -280,8 +281,24 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
for (; addr != end; pte++, addr += PAGE_SIZE) {
ptent = *pte;
 
-   if (!pte_present(ptent))
+   if (pte_none(ptent))
continue;
+   /*
+* If the pte has a swp_entry, just clear the page table entry
+* to prevent swap-in, which is more expensive than
+* (page allocation + zeroing).
+*/
+   if (!pte_present(ptent)) {
+   swp_entry_t entry;
+
+   entry = pte_to_swp_entry(ptent);
+   if (non_swap_entry(entry))
+   continue;
+   nr_swap--;
+   free_swap_and_cache(entry);
+   pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+   continue;
+   }
 
page = vm_normal_page(vma, addr, ptent);
if (!page)
@@ -353,6 +370,12 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
}
}
 out:
+   if (nr_swap) {
+   if (current->mm == mm)
+   sync_mm_rss(mm);
+
+   add_mm_counter(mm, MM_SWAPENTS, nr_swap);
+   }
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(orig_pte, ptl);
cond_resched();
-- 
1.9.1



[PATCH v5 03/12] arch: uapi: asm: mman.h: Let MADV_FREE have same value for all architectures

2015-11-29 Thread Minchan Kim
From: Chen Gang 

For uapi, we should try to give each macro the same value on all
architectures. MADV_FREE was added to the main branch only recently, so
it needs to be redefined accordingly.

At present, '8' can be used on all architectures, so redefine it to
'8'.
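A uniform value lets userspace carry a single fallback definition. A minimal sketch, assuming headers that may predate MADV_FREE: `madv_free_supported()` is a hypothetical helper that asks the running kernel whether it accepts the advice, relying on the fact that kernels which do not know an advice value fail madvise() with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8   /* assumption: the common value proposed by this patch */
#endif

/* Returns true if the running kernel accepts MADV_FREE on a private
 * anonymous page, false otherwise. */
bool madv_free_supported(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;

    bool ok = madvise(p, (size_t)page, MADV_FREE) == 0;
    munmap(p, (size_t)page);
    return ok;
}
```

Probing once at startup and caching the result avoids issuing a failing madvise() on every free.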

Cc: r...@twiddle.net ,
Cc: i...@jurassic.park.msu.ru 
Cc: matts...@gmail.com 
Cc: Ralf Baechle 
Cc: j...@parisc-linux.org 
Cc: del...@gmx.de 
Cc: ch...@zankel.net 
Cc: jcmvb...@gmail.com 
Cc: Arnd Bergmann 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: rol...@kernel.org
Cc: darrick.w...@oracle.com
Cc: da...@davemloft.net
Acked-by: Hugh Dickins 
Acked-by: Minchan Kim 
Signed-off-by: Chen Gang 
---
 arch/alpha/include/uapi/asm/mman.h     | 1 +
 arch/mips/include/uapi/asm/mman.h      | 1 +
 arch/parisc/include/uapi/asm/mman.h    | 1 +
 arch/xtensa/include/uapi/asm/mman.h    | 1 +
 include/uapi/asm-generic/mman-common.h | 2 +-
 5 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index d828beb5e69b..ab336c06153e 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -50,6 +50,7 @@
 #define MADV_FREE  7   /* free pages only if memory pressure */
 
 /* common/generic parameters */
+#define MADV_FREE  8   /* free pages only if memory pressure */
 #define MADV_REMOVE    9   /* remove these pages & resources */
 #define MADV_DONTFORK  10  /* don't inherit across fork */
 #define MADV_DOFORK    11  /* do inherit across fork */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index a6f8daff8e3b..b0ebe59f73fd 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -76,6 +76,7 @@
 #define MADV_FREE  5   /* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE  8   /* free pages only if memory pressure */
 #define MADV_REMOVE    9   /* remove these pages & resources */
 #define MADV_DONTFORK  10  /* don't inherit across fork */
 #define MADV_DOFORK    11  /* do inherit across fork */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index bda94f0d0b94..cf830d465f75 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -46,6 +46,7 @@
 #define MADV_FREE  8   /* free pages only if memory pressure */
 
 /* common/generic parameters */
+#define MADV_FREE  8   /* free pages only if memory pressure */
 #define MADV_REMOVE    9   /* remove these pages & resources */
 #define MADV_DONTFORK  10  /* don't inherit across fork */
 #define MADV_DOFORK    11  /* do inherit across fork */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 83c5150b06f9..d030594ed22b 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -89,6 +89,7 @@
 #define MADV_FREE  5   /* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE  8   /* free pages only if memory pressure */
 #define MADV_REMOVE    9   /* remove these pages & resources */
 #define MADV_DONTFORK  10  /* don't inherit across fork */
 #define MADV_DOFORK    11  /* do inherit across fork */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 0e821e3c3d45..58274382a616 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -39,9 +39,9 @@
 #define MADV_SEQUENTIAL 2   /* expect sequential page references */
 #define MADV_WILLNEED  3   /* will need these pages */
 #define MADV_DONTNEED  4   /* don't need these pages */
-#define MADV_FREE  5   /* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
+#define MADV_FREE  8   /* free pages only if memory pressure */
 #define MADV_REMOVE    9   /* remove these pages & resources */
 #define MADV_DONTFORK  10  /* don't inherit across fork */
 #define MADV_DOFORK    11  /* do inherit across fork */
-- 
1.9.1



[PATCH v5 08/12] sparc: add pmd_[dirty|mkclean] for THP

2015-11-29 Thread Minchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was called.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.
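The sparc helper converts the PMD to a PTE with __pte(pmd_val(pmd)), reuses the existing pte_mkclean(), and converts back, which only works because the two share a bit layout. The pattern can be sketched in plain C; the types, the `MODEL_DIRTY` bit and all helper names below are hypothetical stand-ins, not sparc's real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for pte_t and pmd_t with a shared layout. */
typedef struct { uint64_t val; } model_pte_t;
typedef struct { uint64_t val; } model_pmd_t;

#define MODEL_DIRTY (UINT64_C(1) << 7)   /* hypothetical dirty bit */

static_assert(sizeof(model_pte_t) == sizeof(model_pmd_t),
              "PMD and PTE must share a layout for the conversion");

static model_pte_t pte_from_pmd(model_pmd_t pmd) { return (model_pte_t){ pmd.val }; }
static model_pmd_t pmd_from_pte(model_pte_t pte) { return (model_pmd_t){ pte.val }; }

static model_pte_t model_pte_mkclean(model_pte_t pte)
{
    pte.val &= ~MODEL_DIRTY;
    return pte;
}

/* PMD helper built on the existing PTE helper, as in the sparc hunk. */
model_pmd_t model_pmd_mkclean(model_pmd_t pmd)
{
    model_pte_t pte = pte_from_pmd(pmd);

    pte = model_pte_mkclean(pte);
    return pmd_from_pte(pte);
}

int model_pmd_dirty(model_pmd_t pmd)
{
    return (pmd.val & MODEL_DIRTY) != 0;
}
```

Reusing the PTE helper keeps the dirty-bit knowledge in one place, at the cost of depending on the shared layout.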

Signed-off-by: Minchan Kim 
Signed-off-by: Andrew Morton 
---
 arch/sparc/include/asm/pgtable_64.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index f5bfcd66aeb5..7a38d6a576c5 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -710,6 +710,15 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
return __pmd(pte_val(pte));
 }
 
+static inline pmd_t pmd_mkclean(pmd_t pmd)
+{
+   pte_t pte = __pte(pmd_val(pmd));
+
+   pte = pte_mkclean(pte);
+
+   return __pmd(pte_val(pte));
+}
+
 static inline pmd_t pmd_mkyoung(pmd_t pmd)
 {
pte_t pte = __pte(pmd_val(pmd));
-- 
1.9.1



[PATCH v5 02/12] mm: define MADV_FREE for some arches

2015-11-29 Thread Minchan Kim
Most architectures use asm-generic, but alpha, mips, parisc, xtensa
need their own definitions.

This patch defines MADV_FREE for them, so it should fix the build
breakage on those architectures.

Maybe I should split this up and feed the pieces to the arch maintainers,
but it is included here for mmotm convenience.

Cc: Michael Kerrisk 
Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Ralf Baechle 
Cc: Chris Zankel 
Acked-by: Max Filippov 
Reported-by: kbuild test robot 
Signed-off-by: Minchan Kim 
---
 arch/alpha/include/uapi/asm/mman.h  | 1 +
 arch/mips/include/uapi/asm/mman.h   | 1 +
 arch/parisc/include/uapi/asm/mman.h | 1 +
 arch/xtensa/include/uapi/asm/mman.h | 1 +
 4 files changed, 4 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index f2f949671798..d828beb5e69b 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -47,6 +47,7 @@
 #define MADV_WILLNEED  3   /* will need these pages */
 #define MADV_SPACEAVAIL 5   /* ensure resources are available */
 #define MADV_DONTNEED  6   /* don't need these pages */
+#define MADV_FREE  7   /* free pages only if memory pressure */
 
 /* common/generic parameters */
 #define MADV_REMOVE    9   /* remove these pages & resources */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 97c03f468924..a6f8daff8e3b 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -73,6 +73,7 @@
 #define MADV_SEQUENTIAL 2  /* expect sequential page references */
 #define MADV_WILLNEED  3   /* will need these pages */
 #define MADV_DONTNEED  4   /* don't need these pages */
+#define MADV_FREE  5   /* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_REMOVE    9   /* remove these pages & resources */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index dd4d1876a020..bda94f0d0b94 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -43,6 +43,7 @@
 #define MADV_SPACEAVAIL 5   /* insure that resources are reserved */
 #define MADV_VPS_PURGE  6   /* Purge pages from VM page cache */
 #define MADV_VPS_INHERIT 7  /* Inherit parents page size */
+#define MADV_FREE  8   /* free pages only if memory pressure */
 
 /* common/generic parameters */
 #define MADV_REMOVE    9   /* remove these pages & resources */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 360944e1da52..83c5150b06f9 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -86,6 +86,7 @@
 #define MADV_SEQUENTIAL 2   /* expect sequential page references */
 #define MADV_WILLNEED  3   /* will need these pages */
 #define MADV_DONTNEED  4   /* don't need these pages */
+#define MADV_FREE  5   /* free pages only if memory pressure */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_REMOVE    9   /* remove these pages & resources */
-- 
1.9.1



[PATCH v5 01/12] mm: support madvise(MADV_FREE)

2015-11-29 Thread Minchan Kim
Linux doesn't have the ability to free pages lazily, while other OSes
already support it via madvise(MADV_FREE).

The gain is clear: the kernel can discard freed pages rather than swapping
them out or going OOM when memory pressure happens.

Without memory pressure, freed pages can be reused by userspace without
any additional overhead (e.g., page fault + allocation + zeroing).
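For an allocator, a common pattern is to try MADV_FREE and fall back to MADV_DONTNEED where the kernel does not know it. The sketch below assumes a page-aligned range; `purge_range()` is a hypothetical helper, not jemalloc's actual code, and the fallback value 8 for MADV_FREE is an assumption.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8   /* assumption: value used once MADV_FREE is merged */
#endif

/* Tell the kernel the contents of [addr, addr + len) are no longer needed.
 * MADV_FREE lets the kernel discard the pages lazily under memory pressure;
 * if the kernel rejects it (EINVAL), fall back to MADV_DONTNEED, which
 * drops the pages immediately. Returns the advice that succeeded, or -1. */
int purge_range(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

    if (madvise(addr, len, MADV_FREE) == 0)
        return MADV_FREE;
    if (errno != EINVAL)
        return -1;
    return madvise(addr, len, MADV_DONTNEED) == 0 ? MADV_DONTNEED : -1;
}
```

After MADV_FREE the old contents may or may not survive, so an allocator must treat the range as uninitialized either way; the benefit is that, without memory pressure, reusing it costs no page fault.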

Jason Evans said:

: Facebook has been using MAP_UNINITIALIZED
: (https://lkml.org/lkml/2012/1/18/308) in some of its applications for
: several years, but there are operational costs to maintaining this
: out-of-tree in our kernel and in jemalloc, and we are anxious to retire it
: in favor of MADV_FREE.  When we first enabled MAP_UNINITIALIZED it
: increased throughput for much of our workload by ~5%, and although the
: benefit has decreased using newer hardware and kernels, there is still
: enough benefit that we cannot reasonably retire it without a replacement.
:
: Aside from Facebook operations, there are numerous broadly used
: applications that would benefit from MADV_FREE.  The ones that immediately
: come to mind are redis, varnish, and MariaDB.  I don't have much insight
: into Android internals and development process, but I would hope to see
: MADV_FREE support eventually end up there as well to benefit applications
: linked with the integrated jemalloc.
:
: jemalloc will use MADV_FREE once it becomes available in the Linux kernel.
: In fact, jemalloc already uses MADV_FREE or equivalent everywhere it's
: available: *BSD, OS X, Windows, and Solaris -- every platform except Linux
: (and AIX, but I'm not sure it even compiles on AIX).  The lack of
: MADV_FREE on Linux forced me down a long series of increasingly
: sophisticated heuristics for madvise() volume reduction, and even so this
: remains a common performance issue for people using jemalloc on Linux.
: Please integrate MADV_FREE; many people will benefit substantially.

How it works:

When the madvise syscall is called, the VM clears the dirty bit of the ptes in
the range. If memory pressure happens, the VM checks the dirty bit in the page
table; if it is still "clean", the page is a "lazyfree page", so the VM can
discard it instead of swapping it out. If there was a store to the page
before the VM picked it for reclaim, the dirty bit is set, so the VM swaps
the page out instead of discarding it.

One thing to note is that MADV_FREE basically relies on the dirty bit
in the page table entry to decide whether the VM may discard the page.
IOW, if the page table entry has the dirty bit set, the VM shouldn't
discard the page.

However, for example, if a swap-in by a read fault happens, the page table
entry doesn't have the dirty bit set, so MADV_FREE could wrongly discard the page.

To avoid the problem, MADV_FREE did more checks with PageDirty
and PageSwapCache. This worked because a swapped-in page lives in the
swap cache, and once it is evicted from the swap cache, the page has the
PG_dirty flag. So checking both page flags effectively prevents
wrong discarding by MADV_FREE.

However, a problem with the above logic is that a swapped-in page still
has PG_dirty after it is removed from the swap cache, so the VM can no
longer consider the page freeable, even if madvise_free is called on it
in the future.

See the example below for details.

ptr = malloc();
memset(ptr);
..
..
.. heavy memory pressure so all of pages are swapped out
..
..
var = *ptr; -> a page swapped-in and could be removed from
   swapcache. Then, page table doesn't mark
   dirty bit and page descriptor includes PG_dirty
..
..
madvise_free(ptr); -> It doesn't clear PG_dirty of the page.
..
..
..
.. heavy memory pressure again.
.. In this time, VM cannot discard the page because the page
.. has *PG_dirty*

To solve the problem, this patch clears PG_dirty only if the page is owned
exclusively by the current process when madvise is called: PG_dirty
represents the dirtiness of the ptes in every process mapping the page, so
we can clear it only if we own the page exclusively.
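The ownership rule can be sketched as a userspace model. This is an illustration under stated assumptions, not the patch's code: `struct anon_page` and its fields are hypothetical stand-ins for page_mapcount(), PageDirty() and PageSwapCache(), and dropping the swap-cache copy is assumed to always succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the page state MADV_FREE looks at. */
struct anon_page {
    int  mapcount;    /* number of processes mapping the page */
    bool pg_dirty;    /* PG_dirty in the page descriptor */
    bool swapcache;   /* page is in the swap cache */
    bool pte_dirty;   /* dirty bit in this process's pte */
};

/* Returns true if the page was made lazily freeable. PG_dirty summarises
 * all mappings, so it is only cleared when this process is the sole owner. */
bool model_madvise_free_page(struct anon_page *p)
{
    assert(p->mapcount >= 1);
    if (p->pg_dirty || p->swapcache) {
        if (p->mapcount != 1)
            return false;          /* shared: leave the page alone */
        p->swapcache = false;      /* assumes dropping the swap copy succeeds */
        p->pg_dirty = false;
    }
    p->pte_dirty = false;
    return true;
}

bool model_can_discard(const struct anon_page *p)
{
    return !p->pte_dirty && !p->pg_dirty && !p->swapcache;
}
```

The swapped-in page from the example (one mapping, PG_dirty set, clean pte) becomes discardable, while the same page shared by two processes keeps PG_dirty.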

The first heavy users would be general-purpose allocators (e.g., jemalloc,
tcmalloc, and hopefully glibc); jemalloc and tcmalloc already support
the feature on other OSes (e.g., FreeBSD).

barrios@blaptop:~/benchmark/ebizzy$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             12
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 2
Stepping:              3
CPU MHz:               3200.185
BogoMIPS:              6400.53
Virtualization:        VT-x
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-11
ebizzy benchmark (./ebizzy -S 10 -n 512)

Higher avg is better.

 

[PATCH v5 06/12] mm: mark stable page dirty in KSM

2015-11-29 Thread Minchan Kim
The MADV_FREE patchset changes page reclaim to simply free a clean
anonymous page with no dirty ptes, instead of swapping it out; but
KSM uses clean write-protected ptes to reference the stable ksm page.
So be sure to mark that page dirty, so it's never mistakenly discarded.
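Why marking the page dirty is enough can be shown with a small model. This is a hypothetical sketch, not KSM code: `struct model_ksm_page` and both functions are illustrative, and reclaim is reduced to the MADV_FREE rule that a page clean in its ptes and in PG_dirty may be discarded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a KSM stable page is mapped only through clean,
 * write-protected ptes, so PG_dirty is what keeps reclaim from treating
 * it as a lazily freed page. */
struct model_ksm_page {
    bool pg_dirty;
    bool any_pte_dirty;
};

/* Reclaim under MADV_FREE rules: clean everywhere means discardable. */
bool model_reclaim_discards(const struct model_ksm_page *p)
{
    return !p->pg_dirty && !p->any_pte_dirty;
}

/* Mirror of the hunk: after merging, make sure the page is dirty. */
void model_ksm_merge(struct model_ksm_page *p)
{
    p->any_pte_dirty = false;    /* KSM maps it read-only and clean */
    if (!p->pg_dirty)
        p->pg_dirty = true;
    assert(!model_reclaim_discards(p));
}
```

Testing PageDirty() before SetPageDirty() in the hunk avoids a redundant atomic bit operation when the flag is already set.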

[hughd: adjusted comments]
Acked-by: Hugh Dickins 
Signed-off-by: Minchan Kim 
---
 mm/ksm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/ksm.c b/mm/ksm.c
index 30cb0f753e19..5e967536c38e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1015,6 +1015,12 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
 */
set_page_stable_node(page, NULL);
mark_page_accessed(page);
+   /*
+* Page reclaim just frees a clean page with no dirty
+* ptes: make sure that the ksm page would be swapped.
+*/
+   if (!PageDirty(page))
+   SetPageDirty(page);
err = 0;
} else if (pages_identical(page, kpage))
err = replace_page(vma, page, kpage, orig_pte);
-- 
1.9.1



[PATCH v5 07/12] x86: add pmd_[dirty|mkclean] for THP

2015-11-29 Thread Minchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was called.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.
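On x86 the new helper is just another flag-clearing wrapper next to pmd_mkold() and pmd_wrprotect(). A userspace sketch of the same shape follows; the `m_` helpers are hypothetical, and the bit positions are assumed to match x86's _PAGE_BIT_RW (1), _PAGE_BIT_ACCESSED (5) and _PAGE_BIT_DIRTY (6).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t model_pmdval_t;

/* Assumed to match x86's RW, ACCESSED and DIRTY bits; illustrative only. */
#define M_PAGE_RW       (UINT64_C(1) << 1)
#define M_PAGE_ACCESSED (UINT64_C(1) << 5)
#define M_PAGE_DIRTY    (UINT64_C(1) << 6)

static_assert(M_PAGE_DIRTY == 0x40, "dirty bit expected at bit 6");

static inline model_pmdval_t m_pmd_set_flags(model_pmdval_t pmd, model_pmdval_t set)
{
    return pmd | set;
}

static inline model_pmdval_t m_pmd_clear_flags(model_pmdval_t pmd, model_pmdval_t clear)
{
    return pmd & ~clear;
}

static inline int m_pmd_dirty(model_pmdval_t pmd)
{
    return (pmd & M_PAGE_DIRTY) != 0;
}

static inline model_pmdval_t m_pmd_mkdirty(model_pmdval_t pmd)
{
    return m_pmd_set_flags(pmd, M_PAGE_DIRTY);
}

/* The new helper: same shape as pmd_mkold()/pmd_wrprotect() in the hunk. */
static inline model_pmdval_t m_pmd_mkclean(model_pmdval_t pmd)
{
    return m_pmd_clear_flags(pmd, M_PAGE_DIRTY);
}
```

Only the dirty bit changes; RW and ACCESSED are left as they were, so a cleaned THP mapping stays writable and can be re-dirtied by the next store.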

Signed-off-by: Minchan Kim 
Signed-off-by: Andrew Morton 
---
 arch/x86/include/asm/pgtable.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a8d1aa3a43b0..9ff592003afd 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -269,6 +269,11 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
return pmd_clear_flags(pmd, _PAGE_ACCESSED);
 }
 
+static inline pmd_t pmd_mkclean(pmd_t pmd)
+{
+   return pmd_clear_flags(pmd, _PAGE_DIRTY);
+}
+
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
return pmd_clear_flags(pmd, _PAGE_RW);
-- 
1.9.1



[PATCH v5 10/12] arm: add pmd_mkclean for THP

2015-11-29 Thread Minchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was called.

This patch adds pmd_mkclean for THP page MADV_FREE support.
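ARM generates its one-line PMD helpers with the PMD_BIT_FUNC() macro, so adding pmd_mkclean() takes a single line. The sketch below imitates that macro in userspace; `M_PMD_BIT_FUNC`, `m_pmd_t` and both bit positions are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t val; } m_pmd_t;

/* Hypothetical bit positions for illustration only. */
#define M_SECT_AF     (UINT64_C(1) << 10)
#define M_SECT_DIRTY  (UINT64_C(1) << 55)

static_assert((M_SECT_AF & M_SECT_DIRTY) == 0, "bits must not overlap");

/* Generate a helper m_pmd_<fn>() that applies `op` to the raw value. */
#define M_PMD_BIT_FUNC(fn, op) \
    static inline m_pmd_t m_pmd_##fn(m_pmd_t pmd) { pmd.val op; return pmd; }

M_PMD_BIT_FUNC(mkold,   &= ~M_SECT_AF)
M_PMD_BIT_FUNC(mkdirty, |= M_SECT_DIRTY)
M_PMD_BIT_FUNC(mkclean, &= ~M_SECT_DIRTY)   /* the helper this patch adds */
M_PMD_BIT_FUNC(mkyoung, |= M_SECT_AF)
```

Generating the helpers from one macro keeps the set and clear operations for each bit visibly paired, so mkclean sits right next to mkdirty.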

Signed-off-by: Minchan Kim 
---
 arch/arm/include/asm/pgtable-3level.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 59d1457ca551..dc46398bc3a5 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -240,6 +240,7 @@ PMD_BIT_FUNC(wrprotect, |= L_PMD_SECT_RDONLY);
 PMD_BIT_FUNC(mkold,     &= ~PMD_SECT_AF);
 PMD_BIT_FUNC(mkwrite,   &= ~L_PMD_SECT_RDONLY);
 PMD_BIT_FUNC(mkdirty,   |= L_PMD_SECT_DIRTY);
+PMD_BIT_FUNC(mkclean,   &= ~L_PMD_SECT_DIRTY);
 PMD_BIT_FUNC(mkyoung,   |= PMD_SECT_AF);
 
 #define pmd_mkhuge(pmd)  (__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
-- 
1.9.1



[PATCH v5 05/12] mm: move lazily freed pages to inactive list

2015-11-29 Thread Minchan Kim
MADV_FREE is a hint that it's okay to discard pages if there is memory
pressure, and we use reclaimers (i.e., kswapd and direct reclaim) to free
them, so there is no value in keeping them on the active anonymous LRU.
This patch therefore moves them to the head of the inactive LRU list.

This means that MADV_FREE-ed pages which were living on the inactive list
are reclaimed first, because they are more likely to be cold than
recently active pages.

An arguable issue with this approach is whether we should put the page
at the head or the tail of the inactive list.  I chose the head because the
kernel cannot be sure whether the page is really cold or warm for every
MADV_FREE use case, but at least we know it's not *hot*, so landing at the
inactive head is a compromise for various use cases.

This fixes suboptimal behavior of MADV_FREE where pages living on the
active list sit there for a long time even under memory pressure,
while the inactive list is reclaimed heavily.  That basically defeats the
whole purpose of using MADV_FREE to help the system free memory that
might not be used.
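The head-versus-tail choice can be illustrated with a small array-based model of two LRU lists. This is a hypothetical sketch: `struct model_lru` and the `model_*` functions stand in for the lruvec lists and lru_deactivate_fn(), and omit the per-CPU pagevec batching and the PageReferenced handling of the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LRU_CAP 16

/* Hypothetical model of one LRU list: entries[0] is the head (newest),
 * entries[n - 1] is the tail, which reclaim scans first. */
struct model_lru {
    int entries[LRU_CAP];
    int n;
};

static void lru_add_head(struct model_lru *l, int page)
{
    assert(l->n < LRU_CAP);
    memmove(&l->entries[1], &l->entries[0], (size_t)l->n * sizeof(int));
    l->entries[0] = page;
    l->n++;
}

static bool lru_del(struct model_lru *l, int page)
{
    for (int i = 0; i < l->n; i++) {
        if (l->entries[i] == page) {
            memmove(&l->entries[i], &l->entries[i + 1],
                    (size_t)(l->n - i - 1) * sizeof(int));
            l->n--;
            return true;
        }
    }
    return false;
}

/* Move an active page to the head of the inactive list. */
bool model_deactivate(struct model_lru *active, struct model_lru *inactive, int page)
{
    if (!lru_del(active, page))
        return false;              /* not on the active list: nothing to do */
    lru_add_head(inactive, page);
    return true;
}

/* Reclaim takes the coldest page from the inactive tail. */
int model_reclaim_one(struct model_lru *inactive)
{
    assert(inactive->n > 0);
    return inactive->entries[--inactive->n];
}
```

Landing at the head means a deactivated page is reclaimed only after the pages already on the inactive list, which matches the "not hot, but not necessarily cold" compromise described above.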

Cc: Johannes Weiner 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Shaohua Li 
Acked-by: Hugh Dickins 
Acked-by: Michal Hocko 
Signed-off-by: Minchan Kim 
---
 include/linux/swap.h |  1 +
 mm/madvise.c |  2 ++
 mm/swap.c| 44 
 3 files changed, 47 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 457181844b6e..d08feef3d047 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -308,6 +308,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void swap_setup(void);
 
 extern void add_page_to_unevictable_list(struct page *page);
diff --git a/mm/madvise.c b/mm/madvise.c
index 8de3d9a636c9..975e24e4c134 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -366,6 +366,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
ptent = pte_mkold(ptent);
ptent = pte_mkclean(ptent);
set_pte_at(mm, addr, pte, ptent);
+   if (PageActive(page))
+   deactivate_page(page);
tlb_remove_tlb_entry(tlb, pte, addr);
}
}
diff --git a/mm/swap.c b/mm/swap.c
index abffc33bb975..674e2c93da4e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -45,6 +45,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 
 /*
  * This path almost never happens for VM activity - pages are normally
@@ -554,6 +555,24 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
update_page_reclaim_stat(lruvec, file, 0);
 }
 
+
+static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+   void *arg)
+{
+   if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+   int file = page_is_file_cache(page);
+   int lru = page_lru_base_type(page);
+
+   del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+   ClearPageActive(page);
+   ClearPageReferenced(page);
+   add_page_to_lru_list(page, lruvec, lru);
+
+   __count_vm_event(PGDEACTIVATE);
+   update_page_reclaim_stat(lruvec, file, 0);
+   }
+}
+
 /*
  * Drain pages out of the cpu's pagevecs.
  * Either "cpu" is the current CPU, and preemption has already been
@@ -580,6 +599,10 @@ void lru_add_drain_cpu(int cpu)
if (pagevec_count(pvec))
pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
 
+   pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+   if (pagevec_count(pvec))
+   pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
activate_page_drain(cpu);
 }
 
@@ -609,6 +632,26 @@ void deactivate_file_page(struct page *page)
}
 }
 
+/**
+ * deactivate_page - deactivate a page
+ * @page: page to deactivate
+ *
+ * deactivate_page() moves @page to the inactive list if @page was on the active
+ * list and was not an unevictable page.  This is done to accelerate the reclaim
+ * of @page.
+ */
+void deactivate_page(struct page *page)
+{
+   if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+   struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+   page_cache_get(page);
+   if (!pagevec_add(pvec, page))
+   pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+   put_cpu_var(lru_deactivate_pvecs);
+   }
+}
+
 void lru_add_drain(void)
 {
lru_add_drain_cpu(get_cpu());
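The deactivation path above batches pages in a per-CPU `pagevec` and only moves them under the LRU lock when the vector fills up or is drained. The sketch below models that idea in plain C under several simplifying assumptions:

- there is a single "CPU";
- arrays stand in for the LRU lists, with index 0 as the head;
- there is no locking and no page reference counting;
- the batch size is a hypothetical 4.

All names (`deactivate_page_sketch`, `drain_batch`, ...) are illustrative, not kernel APIs. Reclaim scans the inactive list from its tail, so a page added at the head has the whole list ahead of it. This is the head-vs-tail compromise the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BATCH_SIZE 4   /* hypothetical; the kernel's PAGEVEC_SIZE differs */
#define MAX_PAGES 16

struct page { int id; bool active; };

/* An LRU list as an array; index 0 is the head. */
struct lru_list { struct page *pages[MAX_PAGES]; int n; };
struct batch { struct page *pages[BATCH_SIZE]; int n; };

static struct lru_list active_list, inactive_list;
static struct batch deactivate_batch;

static void list_del(struct lru_list *l, struct page *p)
{
	for (int i = 0; i < l->n; i++) {
		if (l->pages[i] == p) {
			memmove(&l->pages[i], &l->pages[i + 1],
				(l->n - i - 1) * sizeof(l->pages[0]));
			l->n--;
			return;
		}
	}
}

static void list_add_head(struct lru_list *l, struct page *p)
{
	memmove(&l->pages[1], &l->pages[0], l->n * sizeof(l->pages[0]));
	l->pages[0] = p;
	l->n++;
}

/* Counterpart of lru_deactivate_fn(): re-check the state (it may have
 * changed while the page sat in the batch), then move it to the
 * inactive head. */
static void deactivate_one(struct page *p)
{
	if (!p->active)
		return;
	list_del(&active_list, p);
	p->active = false;
	list_add_head(&inactive_list, p);
}

/* Counterpart of draining the per-CPU vector in lru_add_drain_cpu(). */
static void drain_batch(void)
{
	for (int i = 0; i < deactivate_batch.n; i++)
		deactivate_one(deactivate_batch.pages[i]);
	deactivate_batch.n = 0;
}

/* Counterpart of deactivate_page(): queue now, move in bulk when full. */
static void deactivate_page_sketch(struct page *p)
{
	if (!p->active)
		return;
	deactivate_batch.pages[deactivate_batch.n++] = p;
	if (deactivate_batch.n == BATCH_SIZE)
		drain_batch();
}
```

Batching amortizes the cost of taking the LRU lock over several pages. The price is that a queued page stays on the active list until the batch is flushed.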

[PATCH v5 09/12] powerpc: add pmd_[dirty|mkclean] for THP

2015-11-29 Thread Minchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was called.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Signed-off-by: Minchan Kim 
Signed-off-by: Andrew Morton 
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 0db2a3f8e554..21d961bbac0e 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -502,9 +502,11 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_pfn(pmd)   pte_pfn(pmd_pte(pmd))
 #define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd) pte_young(pmd_pte(pmd))
+#define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
 #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)   pte_pmd(pte_mkdirty(pmd_pte(pmd)))
+#define pmd_mkclean(pmd)   pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)   pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)   pte_pmd(pte_mkwrite(pmd_pte(pmd)))
 
-- 
1.9.1
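A small model shows why MADV_FREE needs `pmd_dirty()` and `pmd_mkclean()`:

1. The syscall clears the dirty (and young) state.
2. Any later write sets the dirty state again.
3. Reclaim may discard the huge page only if it is still clean.

This is a hypothetical sketch that uses a plain struct instead of a real PMD. How the dirty bit gets set again (by hardware or by a fault-time software update) depends on the architecture and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a huge mapping's state. */
struct huge_mapping {
	bool dirty;      /* what pmd_dirty() would report */
	bool young;      /* what pmd_young() would report */
	bool discarded;
};

/* madvise(MADV_FREE): clear dirty/young, like pmd_mkclean()/pmd_mkold(). */
static void madv_free(struct huge_mapping *m)
{
	m->dirty = false;
	m->young = false;
}

/* A user write after MADV_FREE makes the entry dirty (and young) again. */
static void user_write(struct huge_mapping *m)
{
	m->dirty = true;
	m->young = true;
}

/* Reclaim: only a mapping still clean since MADV_FREE may be discarded. */
static bool try_discard(struct huge_mapping *m)
{
	if (m->dirty)
		return false;   /* contents were overwritten; must be kept */
	m->discarded = true;
	return true;
}
```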



[PATCH v5 11/12] arm64: add pmd_mkclean for THP

2015-11-29 Thread Minchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was called.

This patch adds pmd_mkclean for THP page MADV_FREE support.

Signed-off-by: Minchan Kim 
---
 arch/arm64/include/asm/pgtable.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d2a1879b466b..fab3ddb30df7 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -340,6 +340,7 @@ static inline pgprot_t mk_sect_prot(pgprot_t prot)
 #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)   pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+#define pmd_mkclean(pmd)   pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)   pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)   pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mknotpresent(pmd)  (__pmd(pmd_val(pmd) & ~PMD_TYPE_MASK))
-- 
1.9.1
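The arm64 and powerpc definitions avoid duplicating bit logic. Each PMD helper converts the PMD to a PTE, reuses the PTE helper, and converts back. A minimal sketch of that wrapper technique follows. The separate `pte_t`/`pmd_t` struct types and the single software dirty bit at position 55 are simplifying assumptions for illustration. Real arm64 dirty tracking, which can also involve hardware DBM and the read-only bit, is more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct wrapper types so the compiler catches PTE/PMD mix-ups. */
typedef struct { uint64_t pte; } pte_t;
typedef struct { uint64_t pmd; } pmd_t;

#define PTE_DIRTY ((uint64_t)1 << 55)   /* assumed bit, illustration only */

static inline pte_t pmd_pte(pmd_t pmd) { return (pte_t){ pmd.pmd }; }
static inline pmd_t pte_pmd(pte_t pte) { return (pmd_t){ pte.pte }; }

static inline pte_t pte_mkclean(pte_t pte) { pte.pte &= ~PTE_DIRTY; return pte; }
static inline pte_t pte_mkdirty(pte_t pte) { pte.pte |= PTE_DIRTY; return pte; }
static inline int pte_dirty(pte_t pte) { return (pte.pte & PTE_DIRTY) != 0; }

/* PMD helpers reuse the PTE helpers by converting back and forth. */
#define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd)))
#define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_dirty(pmd)   pte_dirty(pmd_pte(pmd))
```

This only works because the architecture uses the same bit layout for the relevant flags in block (PMD) and page (PTE) entries. Where the layouts differ, ARM 3-level uses separate PMD bit functions instead.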



[PATCH v5 12/12] mm: don't split THP page when syscall is called

2015-11-29 Thread Minchan Kim
We don't need to split a THP page when the MADV_FREE syscall is called
if [start, len] is aligned with the THP size. The split can be done
later, in the reclaim path, when the VM decides to free the page under
heavy memory pressure. This avoids unnecessary THP splits.

To support this, the patch changes how THP splitting marks pte dirtiness.
Currently, splitting marks the ptes of every subpage dirty unconditionally,
which voids MADV_FREE. Instead, this patch propagates pmd dirtiness to
all subpages via PG_dirty and restores pte dirtiness from PG_dirty. With
this, if the pmd is clean (i.e., MADV_FREEed) when the split happens
(e.g., in shrink_page_list), all of the subpages are clean too, so they
can be discarded.

Cc: Kirill A. Shutemov 
Cc: Hugh Dickins 
Cc: Andrea Arcangeli 
Signed-off-by: Minchan Kim 
---
 include/linux/huge_mm.h |  3 ++
 mm/huge_memory.c| 87 ++---
 mm/madvise.c|  8 -
 3 files changed, 92 insertions(+), 6 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 72cd942edb22..0160201993d4 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -19,6 +19,9 @@ extern struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
  unsigned long addr,
  pmd_t *pmd,
  unsigned int flags);
+extern int madvise_free_huge_pmd(struct mmu_gather *tlb,
+   struct vm_area_struct *vma,
+   pmd_t *pmd, unsigned long addr, unsigned long next);
 extern int zap_huge_pmd(struct mmu_gather *tlb,
struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b41793b12a2d..2aa28cbe7263 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1530,6 +1530,77 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;
 }
 
+int madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
+   pmd_t *pmd, unsigned long addr, unsigned long next)
+
+{
+   spinlock_t *ptl;
+   pmd_t orig_pmd;
+   struct page *page;
+   struct mm_struct *mm = tlb->mm;
+   int ret = 0;
+
+   if (!pmd_trans_huge_lock(pmd, vma, &ptl))
+   goto out;
+
+   orig_pmd = *pmd;
+   if (is_huge_zero_pmd(orig_pmd)) {
+   ret = 1;
+   goto out;
+   }
+
+   page = pmd_page(orig_pmd);
+   /*
+* If other processes are mapping this page, we couldn't discard
+* the page unless they all do MADV_FREE so let's skip the page.
+*/
+   if (page_mapcount(page) != 1)
+   goto out;
+
+   if (!trylock_page(page))
+   goto out;
+
+   /*
+* If user want to discard part-pages of THP, split it so MADV_FREE
+* will deactivate only them.
+*/
+   if (next - addr != HPAGE_PMD_SIZE) {
+   get_page(page);
+   spin_unlock(ptl);
+   if (split_huge_page(page)) {
+   put_page(page);
+   unlock_page(page);
+   goto out_unlocked;
+   }
+   put_page(page);
+   unlock_page(page);
+   ret = 1;
+   goto out_unlocked;
+   }
+
+   if (PageDirty(page))
+   ClearPageDirty(page);
+   unlock_page(page);
+
+   if (PageActive(page))
+   deactivate_page(page);
+
+   if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
+   orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
+   tlb->fullmm);
+   orig_pmd = pmd_mkold(orig_pmd);
+   orig_pmd = pmd_mkclean(orig_pmd);
+
+   set_pmd_at(mm, addr, pmd, orig_pmd);
+   tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+   }
+   ret = 1;
+out:
+   spin_unlock(ptl);
+out_unlocked:
+   return ret;
+}
+
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 pmd_t *pmd, unsigned long addr)
 {
@@ -2784,7 +2855,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
struct page *page;
pgtable_t pgtable;
pmd_t _pmd;
-   bool young, write;
+   bool young, write, dirty;
int i;
 
VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
@@ -2808,6 +2879,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
atomic_add(HPAGE_PMD_NR - 1, &page->_count);
write = pmd_write(*pmd);
young = pmd_young(*pmd);
+   dirty = pmd_dirty(*pmd);
 
pgtable = pgtable_trans_huge_withdraw(mm, pmd);
pmd_populate(mm, &_pmd, pgtable);
@@ -2825,12 +2897,14 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = swp_entry_to_pte(swp_entry);
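The size check in `madvise_free_huge_pmd()` and the new dirty propagation on split can be modelled together. The sketch uses hypothetical names and makes two simplifying assumptions:

- huge pages are 2 MiB, made of 512 base pages;
- the PMD dirty bit is copied straight into per-subpage flags, rather than going through `PG_dirty` as the patch does.

A range covering the whole huge page is just marked clean, while a partial range is split first. A clean huge page that is split later yields subpages that are all discardable.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_SIZE (2UL * 1024 * 1024)  /* assumed 2 MiB huge pages */
#define HPAGE_PMD_NR   512                  /* assumed 4 KiB base pages */

/* Hypothetical THP model: one PMD-level dirty bit plus per-subpage state. */
struct thp {
	bool pmd_dirty;
	bool split;
	bool pte_dirty[HPAGE_PMD_NR];
};

enum madv_action { MARK_CLEAN, SPLIT_FIRST };

/* Mirrors the size check in madvise_free_huge_pmd(): only a range that
 * covers the whole huge page can be handled without splitting. */
static enum madv_action madv_free_action(unsigned long addr, unsigned long next)
{
	return (next - addr == HPAGE_PMD_SIZE) ? MARK_CLEAN : SPLIT_FIRST;
}

/* Split that propagates the PMD dirty bit to every subpage instead of
 * marking all PTEs dirty unconditionally. */
static void split_thp(struct thp *t)
{
	for (int i = 0; i < HPAGE_PMD_NR; i++)
		t->pte_dirty[i] = t->pmd_dirty;
	t->split = true;
}

/* Clean subpages are the ones reclaim could drop without writeback. */
static int count_discardable(const struct thp *t)
{
	int n = 0;
	for (int i = 0; i < HPAGE_PMD_NR; i++)
		if (!t->pte_dirty[i])
			n++;
	return n;
}
```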
  

[PATCH] ARM: exynos_defconfig: Set recommended options for systemd

2015-11-29 Thread Krzysztof Kozlowski
Set the following options to the values recommended by systemd (which
also match the multi_v7 defconfig):
1. Enable AUTOFS4_FS - for systemd.automount [0];
2. Enable BLK_DEV_BSG - SG v4 for recent udev [0][1];
3. Disable UEVENT_HELPER_PATH - legacy hook for hotplug, forked for each
   uevent, slows down booting [0];

[0] http://cgit.freedesktop.org/systemd/systemd/tree/README
[1] http://patchwork.ozlabs.org/patch/47921/

Signed-off-by: Krzysztof Kozlowski 
---
 arch/arm/configs/exynos_defconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig
index 409adc1eaf33..24dcd2bb1215 100644
--- a/arch/arm/configs/exynos_defconfig
+++ b/arch/arm/configs/exynos_defconfig
@@ -7,7 +7,6 @@ CONFIG_BLK_DEV_INITRD=y
 CONFIG_KALLSYMS_ALL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
-# CONFIG_BLK_DEV_BSG is not set
 CONFIG_PARTITION_ADVANCED=y
 CONFIG_ARCH_EXYNOS=y
 CONFIG_ARCH_EXYNOS3=y
@@ -44,7 +43,6 @@ CONFIG_IP_PNP_BOOTP=y
 CONFIG_IP_PNP_RARP=y
 CONFIG_CFG80211=y
 CONFIG_RFKILL_REGULATOR=y
-CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
 CONFIG_DEVTMPFS=y
 CONFIG_DEVTMPFS_MOUNT=y
 CONFIG_DMA_CMA=y
@@ -217,6 +215,7 @@ CONFIG_PHY_EXYNOS5250_SATA=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT3_FS=y
 CONFIG_EXT4_FS=y
+CONFIG_AUTOFS4_FS=y
 CONFIG_MSDOS_FS=y
 CONFIG_VFAT_FS=y
 CONFIG_TMPFS=y
-- 
1.9.1



Re: [PATCH] usb: host: pci_quirks: fix memory leak, by adding iounmap

2015-11-29 Thread Saurabh Sengar
Pinging in case this patch got lost.


On 6 November 2015 at 17:46, Saurabh Sengar  wrote:
> added iounmap in order to free memory mapped to base before returning
>
> Signed-off-by: Saurabh Sengar 
> ---
>  drivers/usb/host/pci-quirks.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
> index f940056..332f687 100644
> --- a/drivers/usb/host/pci-quirks.c
> +++ b/drivers/usb/host/pci-quirks.c
> @@ -990,7 +990,7 @@ static void quirk_usb_handoff_xhci(struct pci_dev *pdev)
> /* We're reading garbage from the controller */
> dev_warn(&pdev->dev,
>  "xHCI controller failing to respond");
> -   return;
> +   goto iounmap;
> }
>
> if (!ext_cap_offset)
> @@ -1061,7 +1061,7 @@ hc_init:
>  "xHCI HW did not halt within %d usec status = 0x%x\n",
>  XHCI_MAX_HALT_USEC, val);
> }
> -
> +iounmap:
> iounmap(base);
>  }
>
> --
> 1.9.1
>
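The fix follows the usual kernel idiom of funnelling every exit through a cleanup label, so an early error path cannot skip `iounmap()`. Below is a user-space sketch of the same pattern. `malloc()`/`free()` stand in for `ioremap()`/`iounmap()`, and `handoff()` is a hypothetical function that returns an error code instead of `void`.

```c
#include <assert.h>
#include <stdlib.h>

static int release_count;   /* counts cleanups, so callers can check them */

static void release(void *p)
{
	free(p);
	release_count++;
}

/* Returns 0 on success, -1 if the "controller" returns garbage.
 * Every path after the mapping succeeds leaves through the label, so the
 * buffer (standing in for the ioremap()ed region) is always released. */
static int handoff(int controller_ok)
{
	int ret = 0;
	void *base = malloc(64);   /* stands in for ioremap() */

	if (!base)
		return -1;             /* nothing acquired yet, plain return is fine */

	if (!controller_ok) {
		ret = -1;
		goto unmap;            /* a bare 'return' here would leak base */
	}

	/* ... normal handoff work would go here ... */

unmap:
	release(base);             /* stands in for iounmap() */
	return ret;
}
```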


Cleaning up e820_pmem?

2015-11-29 Thread Andy Lutomirski
My laptop has /sys/devices/platform/e820_pmem and autoloads all the
nvdimm infrastructure.  While it would be really cool if my laptop had
pmem, that's a bit of a pipe dream right now.  (Even if it did have
it, this laptop is brand new -- it should use NFIT, not e820_pmem.)

Could we move the iomem_resource loop from drivers/nvdimm/e820.c to
arch/x86/kernel/pmem.c and actually list the iomem resources the
standard way as resources belonging to the platform device?  That
would match accepted practice, and it would keep the grossly
x86-specific part of the driver in arch/x86.  Then we could further
tweak it to skip creating the platform device at all if there are no
resources, and we'd avoid needlessly loading the module.

I'd do this myself, except that my lovely machine that *does* support
e820 pmem has been repurposed, so testing on a machine that actually
supports this turd is awkward for me.

--Andy


Re: [PATCH V1 resend] err.h: add (missing) unlikely() to IS_ERR_OR_NULL()

2015-11-29 Thread Viresh Kumar
On 13-11-15, 13:53, Viresh Kumar wrote:
> On 29-10-15, 07:57, Viresh Kumar wrote:
> > On 13-10-15, 13:57, Viresh Kumar wrote:
> > > IS_ERR_VALUE() already contains it and so we need to add this only to
> > > the !ptr check. That will allow users of IS_ERR_OR_NULL() to not add
> > > this compiler flag.
> > > 
> > > Signed-off-by: Viresh Kumar 
> > > ---
> > > @Jiri: You have applied all other patches, but this one. Can you please
> > > apply this one as well, as all others were applied based on the
> > > assumption that this one is applied. :)
> > > 
> > >  include/linux/err.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/err.h b/include/linux/err.h
> > > index a729120644d5..56762ab41713 100644
> > > --- a/include/linux/err.h
> > > +++ b/include/linux/err.h
> > > @@ -37,7 +37,7 @@ static inline bool __must_check IS_ERR(__force const void *ptr)
> > >  
> > >  static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
> > >  {
> > > - return !ptr || IS_ERR_VALUE((unsigned long)ptr);
> > > + return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
> > >  }
> > 
> > Ping !!
> 
> Another Ping !!

@Andrew: Would it be possible for you to apply this patch? It's been on
the lists for over 2 months now.

-- 
viresh
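For context on the hint being added: `unlikely()` wraps GCC/Clang's `__builtin_expect()`. It tells the compiler which way a branch usually goes, so the common path can be laid out first. The sketch below re-creates the relevant helpers in user space with the patch applied. It assumes a GCC-compatible compiler and mirrors (but is not) `include/linux/err.h`, where error pointers occupy the top `MAX_ERRNO` (4095) addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Branch hints in the style of the kernel's definitions (GCC/Clang only). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define MAX_ERRNO 4095
/* The last MAX_ERRNO addresses are reserved for encoded -errno values. */
#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error) { return (void *)error; }

static inline bool IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}

/* With the patch: the NULL test carries its own hint as well. */
static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
}
```

The hints do not change the results, only the code layout. The assertions behave the same with or without them.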


[PATCH] of/address: fix typo in comment block of of_translate_one()

2015-11-29 Thread Masahiro Yamada
Remove the "not" before "cannot".

I am fixing the comment block style while I am here.

Signed-off-by: Masahiro Yamada 
---

 drivers/of/address.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 5289c80..91a469d 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -485,9 +485,10 @@ static int of_translate_one(struct device_node *parent, struct of_bus *bus,
int rone;
u64 offset = OF_BAD_ADDR;
 
-   /* Normally, an absence of a "ranges" property means we are
+   /*
+* Normally, an absence of a "ranges" property means we are
 * crossing a non-translatable boundary, and thus the addresses
-* below the current not cannot be converted to CPU physical ones.
+* below the current cannot be converted to CPU physical ones.
 * Unfortunately, while this is very clear in the spec, it's not
 * what Apple understood, and they do have things like /uni-n or
 * /ht nodes with no "ranges" property and a lot of perfectly
-- 
1.9.1



Re: [PATCH 2/2] PCI support added to ARC

2015-11-29 Thread Vineet Gupta
On Monday 30 November 2015 06:30 AM, Bjorn Helgaas wrote:
> + *
> + *  Copyright (C) 2004-2014 Synopsys, Inc. (www.synopsys.com)
>> Perhaps extend this to 2016 (and other copyrights in the patch too if needed)
> What is the reasoning behind claiming a copyright date in the future?
> That doesn't sound right to me.

TBH I could be completely wrong here. That's just a convention I tend to follow
whenever introducing a new file, especially when we are close to a new year.

-Vineet


Re: Improve spinlock performance by moving work to one core

2015-11-29 Thread Ling Ma
Any comments? Is the patch acceptable?

Thanks
Ling

2015-11-26 17:00 GMT+08:00 Ling Ma :
> Running thread.c on a clean 4.3.0-rc4 kernel, perf top -G also indicates
> that the cache_flusharray and cache_alloc_refill functions spend 25.6% of
> their time in queued_spin_lock_slowpath in total. This means the compared
> data from our spinlock-test.patch is reliable.
>
> Thanks
> Ling
>
> 2015-11-26 11:49 GMT+08:00 Ling Ma :
>> Hi Longman,
>>
>> All compared data is from the below operation in spinlock-test.patch:
>>
>> +#if ORG_QUEUED_SPINLOCK
>> +   org_queued_spin_lock((struct qspinlock *)>list_lock);
>> +   refill_fn();
>> +   org_queued_spin_unlock((struct qspinlock *)>list_lock);
>> +#else
>> +   new_spin_lock((struct nspinlock *)>list_lock, refill_fn, );
>> +#endif
>>
>> and
>>
>> +#if ORG_QUEUED_SPINLOCK
>> +   org_queued_spin_lock((struct qspinlock *)>list_lock);
>> +   flusharray_fn();
>> +   org_queued_spin_unlock((struct qspinlock *)>list_lock);
>> +#else
>> +   new_spin_lock((struct nspinlock *)>list_lock, flusharray_fn, 
>> );
>> +#endif
>>
>> So the result is correct and fair.
>>
>> Yes, we updated the code in include/asm-generic/qspinlock.h to
>> simplify the modification and avoid a kernel crash. For example, if there
>> are 10 lock call sites that could use the new spinlock but the bottleneck
>> comes from only one or two of them, we modify only those; the other call
>> sites continue to use the lock in qspinlock.h. We must modify that code,
>> otherwise the operation would be stuck in the queue and never be woken up.
>>
>> Thanks
>> Ling
>>
>>
>>
>> 2015-11-26 3:05 GMT+08:00 Waiman Long :
>>> On 11/23/2015 04:41 AM, Ling Ma wrote:
 Hi Longman,

 Attachments include user space application thread.c and kernel patch
 spinlock-test.patch based on kernel 4.3.0-rc4

 We ran thread.c with the kernel patch, testing the original and the new
 spinlock respectively. perf top -G indicates that thread.c causes the
 cache_alloc_refill and cache_flusharray functions to spend ~25% of their
 time on the original spinlock; after introducing the new spinlock in those
 two functions, the time drops to ~22%.

 The printed data also tell us the new spinlock improves performance
 by about 15% (93841765576 / 81036259588) on E5-2699V3.

 Appreciate your comments.


>>>
>>> I saw that you make the following changes in the code:
>>>
>>> static __always_inline void queued_spin_lock(struct qspinlock *lock)
>>> {
>>> u32 val;
>>> -
>>> +repeat:
>>> val = atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL);
>>> if (likely(val == 0))
>>> return;
>>> - queued_spin_lock_slowpath(lock, val);
>>> + goto repeat;
>>> + //queued_spin_lock_slowpath(lock, val);
>>> }
>>>
>>>
>>> This effectively changes the queued spinlock into an unfair byte lock.
>>> Without a pause to moderate the cmpxchg() call, that is especially bad
>>> for performance. Does the performance data above refer to the unfair byte
>>> lock versus your new spinlock?
>>>
>>> Cheers,
>>> Longman
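The concern with the cmpxchg loop is that every waiter keeps hammering the lock's cache line with atomic read-modify-write operations. A common mitigation for an unfair lock is test-and-test-and-set: spin on a plain load, retry the cmpxchg only once the lock looks free, and pause between polls.

This is a minimal C11 sketch of that technique with hypothetical names. The pause uses the x86 `__builtin_ia32_pause()` builtin and is a no-op on other architectures in this sketch. It reduces contention traffic but does not make the lock fair, which is what the queued spinlock's slow path provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical unfair byte-style lock: 0 = free, 1 = held. */
typedef struct { atomic_int val; } tas_lock_t;

static inline void cpu_relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();   /* x86 PAUSE instruction */
#endif
}

static bool tas_trylock(tas_lock_t *l)
{
	int expected = 0;
	return atomic_compare_exchange_strong_explicit(&l->val, &expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

/* Test-and-test-and-set: while the lock is held, poll with a relaxed load
 * (the cache line can stay shared) and only retry the cmpxchg once it looks
 * free. The loop in the quoted diff retries cmpxchg back to back instead. */
static void tas_lock(tas_lock_t *l)
{
	while (!tas_trylock(l)) {
		while (atomic_load_explicit(&l->val, memory_order_relaxed) != 0)
			cpu_relax_hint();
	}
}

static void tas_unlock(tas_lock_t *l)
{
	atomic_store_explicit(&l->val, 0, memory_order_release);
}
```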


[PATCH] of/address: replace printk(KERN_ERR ...) with pr_err(...)

2015-11-29 Thread Masahiro Yamada
A trivial change suggested by checkpatch.pl.

Signed-off-by: Masahiro Yamada 
---

 drivers/of/address.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index cd53fe4..5289c80 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -596,7 +596,7 @@ static u64 __of_translate_address(struct device_node *dev,
pbus = of_match_bus(parent);
pbus->count_cells(dev, &pna, &pns);
if (!OF_CHECK_COUNTS(pna, pns)) {
-   printk(KERN_ERR "prom_parse: Bad cell count for %s\n",
+   pr_err("prom_parse: Bad cell count for %s\n",
   of_node_full_name(dev));
break;
}
-- 
1.9.1
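The two forms are equivalent at the call site. `pr_err()` is a thin wrapper that prepends `KERN_ERR` and also applies the file's `pr_fmt()` prefix, if one is defined. The sketch below reproduces that expansion in user space. Here `printk()` is a hypothetical stand-in that just formats into a buffer, and the macro relies on the GNU `##__VA_ARGS__` extension, as the kernel's does.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* KERN_ERR is the "start of header" byte followed by the level digit. */
#define KERN_SOH "\001"
#define KERN_ERR KERN_SOH "3"

#ifndef pr_fmt
#define pr_fmt(fmt) fmt      /* a file may define its own prefix first */
#endif

static char last_msg[256];

/* Hypothetical stand-in for printk(): format into last_msg. */
static int printk(const char *fmt, ...)
{
	va_list ap;
	va_start(ap, fmt);
	int n = vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
	va_end(ap);
	return n;
}

/* pr_err() is printk() with KERN_ERR and the pr_fmt() prefix applied. */
#define pr_err(fmt, ...) printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
```

A call such as `pr_err("prom_parse: Bad cell count for %s\n", name)` therefore expands to the same `printk(KERN_ERR "...")` the patch removes.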



Re: block: Always check queue limits for cloned requests

2015-11-29 Thread Ming Lei
On Sun, 29 Nov 2015 18:05:06 +0100
Markus Trippelsdorf  wrote:

> On 2015.11.29 at 11:49 -0500, Mike Snitzer wrote:
> > On Sun, Nov 29 2015 at 11:15am -0500,
> > Markus Trippelsdorf  wrote:
> > 
> > > On 2015.11.29 at 16:43 +0100, Hannes Reinecke wrote:
> > > > On 11/29/2015 12:49 PM, Markus Trippelsdorf wrote:
> > > > > 
> > > > > I'm still seeing the issue (BUG at drivers/scsi/scsi_lib.c:1096!) even
> > > > > with this patch applied.
> > > > > 
> > > > > markus@x4 linux % git describe
> > > > > v4.4-rc2-215-g081f3698e606
> > > > > 
> > > > Can you generate a crashdump?
> > > > I would need to cross-check with the other dumps I'm having to figure
> > > > out if this really is the same issue.
> > > > There have been other reports (and fixes) which show we're fighting
> > > > several distinct issues here.
> > > 
> > > Unfortunately no. The crash happens on the disk where I store my log
> > > files. And after it happened the magic SysRq keys don't work anymore.
> > > 
> > > The crash only happens on my spinning rust drive that uses the cfq
> > > scheduler. The SSDs (deadline) are fine.
> > > 
> > > The BUG happens reproducibly when building http://www.sagemath.org/ on
> > > that drive.
> > 
> > Are you using DM multipath?  If unsure, please let us know which
> > device(s) map to the "spinning rust drive", and provide output from:
> > lsblk
> 
> No, I'm not using DM multipath. 


OK, I guess it is still a block merge issue; care to test the
following patch?

This patch addresses an issue where bio->bi_seg_front_size
is mistakenly set too small, which may cause too few physical
segments to be computed.

---
>From 7aa725205f400ee6823a0d19bf9f41a2464725ce Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Mon, 30 Nov 2015 13:10:12 +0800
Subject: [PATCH] blk-merge: fix computing bio->bi_seg_front_size in case of
 single segment

When a bio has only one physical segment, its bi_seg_front_size
should be set to the real (final) size of that single segment.

Fixes: 02e707424c2ea ("blk-merge: fix blk_bio_segment_split")
Reported-by: Markus Trippelsdorf 
Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 41a55ba..e01405a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -103,6 +103,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
bvprv = bv;
bvprvp = &bvprv;
sectors += bv.bv_len >> 9;
+
+   if (nsegs == 1 && seg_size > front_seg_size)
+   front_seg_size = seg_size;
continue;
}
 new_segment:
-- 
1.9.1
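The one-line fix matters when the first segment grows by merging. The sketch below is a simplified, hypothetical model of the segment walk in `blk_bio_segment_split()`:

- clustering is always on;
- mergeability is given per vector;
- only the maximum segment size is checked.

The `apply_fix` flag toggles the update the patch adds on the merge path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified bio_vec: a length plus whether it is physically
 * contiguous with (mergeable into) the previous vector. */
struct bvec { unsigned int len; bool mergeable; };

/* Counts physical segments and returns the size of the front segment. */
static unsigned int front_seg_size(const struct bvec *bv, int n,
				   unsigned int max_seg_size, bool apply_fix,
				   int *nsegs_out)
{
	unsigned int seg_size = 0, front = 0;
	int nsegs = 0;

	for (int i = 0; i < n; i++) {
		if (nsegs > 0 && bv[i].mergeable &&
		    seg_size + bv[i].len <= max_seg_size) {
			seg_size += bv[i].len;          /* merge into current segment */
			if (apply_fix && nsegs == 1 && seg_size > front)
				front = seg_size;       /* the update the patch adds */
			continue;
		}
		nsegs++;                                /* start a new segment */
		seg_size = bv[i].len;
		if (nsegs == 1 && seg_size > front)
			front = seg_size;
	}
	*nsegs_out = nsegs;
	return front;
}
```

Consider two contiguous 4 KiB vectors followed by a non-contiguous one. Without the fix, the front segment is reported as 4 KiB even though it really spans 8 KiB.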











Re: [PATCH 3/3] IB/core: constify mmu_notifier_ops structures

2015-11-29 Thread Haggai Eran
On 30/11/2015 00:02, Julia Lawall wrote:
> This mmu_notifier_ops structure is never modified, so declare it as
> const, like the other mmu_notifier_ops structures.
> 
> Done with the help of Coccinelle.
> 
> Signed-off-by: Julia Lawall 

Reviewed-by: Haggai Eran 

Thanks,
Haggai
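Constifying an ops table is a small hardening step: the compiler can place the table in read-only data, so its function pointers cannot be changed at run time. Below is a user-space sketch of the pattern. A hypothetical `struct notifier_ops` stands in for `struct mmu_notifier_ops`. Consumers hold a pointer-to-const, as `struct mmu_notifier` does with its `ops` field.

```c
#include <assert.h>

/* Hypothetical ops table in the style of mmu_notifier_ops. */
struct notifier_ops {
	void (*release)(int *counter);
	int  (*clear_young)(int addr);
};

static void my_release(int *counter) { (*counter)++; }
static int my_clear_young(int addr) { return addr != 0; }

/* 'const' lets the compiler put the table in read-only data, so a stray
 * write cannot redirect these function pointers. */
static const struct notifier_ops my_ops = {
	.release     = my_release,
	.clear_young = my_clear_young,
};

/* Consumers only ever need a pointer-to-const to the table. */
struct notifier {
	const struct notifier_ops *ops;
};
```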



Re: The console log is doubled if earlycon is enabled

2015-11-29 Thread Masahiro Yamada
2015-11-28 9:04 GMT+09:00 Greg Kroah-Hartman :
> On Fri, Nov 27, 2015 at 07:21:06PM +0900, Masahiro Yamada wrote:
>> Hi,
>>
>>
>> If I add "earlycon" to the kernel parameters, the log messages
>> on the earlycon are also displayed on the regular console.
>> In other words, the same log messages are displayed twice.
>>
>> I noticed this problem on v4.4-rc1.
>> It has not been fixed in the mainline yet, I think.
>>
>> Anybody who has a clue?
>> (I have not done git-bisect yet.)
>
> Can you do 'git bisect'?


The same problem happened on v4.3.
(I think I just did not notice the problem before.)

So, the bad commit is not in the last merge window.



I also noticed that whether the log is doubled depends on
how the console is specified.


[1]  Good case: both regular console and earlycon are specified via bootargs.

chosen {
 bootargs = "console=ttyS0,115200 earlycon=uniphier,mmio32,0x54006800";
};


[2] Bad case: regular console is given by stdout-path and earlycon is specified
   with parameters in bootargs.

chosen {
bootargs = "earlycon=uniphier,mmio32,0x54006800";
stdout-path = "serial0:115200n8";
};

The early boot log is doubled.


[3] Bad case: regular console is given by stdout-path and
  earlycon is given without parameters in bootargs

chosen {
bootargs = "earlycon";
stdout-path = "serial0:115200n8";
};

The early boot log is doubled.




According to the experiment results,
it looks like earlycon does not get along with stdout-path.



-- 
Best Regards
Masahiro Yamada


Re: [PATCHv2] ARM: dts: use vmmc-supply of emmc/sd for exynos5422-odroidxu3

2015-11-29 Thread Krzysztof Kozlowski
On 27.11.2015 15:42, Anand Moon wrote:
> hi Krzysztof,
> 
> On 22 October 2015 at 18:34, Anand Moon  wrote:
>> hi Krzysztof,
>>
>> On 22 October 2015 at 06:31, Krzysztof Kozlowski
>>  wrote:
>>> On 20.10.2015 21:56, Anand Moon wrote:
 Changes need for host controller to detect UHS-I highspeed cards.
 Changes in VDDQ_MMC2 voltage range help scale
 the required voltage to detect and load the microSD cards.
>>>
>>> Thanks for updating description of commit.
>>>

 Signed-off-by: Anand Moon 
 ---
 Changes based on git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git v4.4-next/dt-samsung branch

 Changes:
 Drop the ramp_delay for LDO9.

 Thanks to : Krzysztof, Doug Anderson, Jaehoon Chung for helping
 me out figure out the mmc core requirement.

 Also drop the previous changes:
 use cd-gpio method to detect sd-card.
 Added UHS-I bus speed support.

 [4.713553] random: nonblocking pool is initialized
 [4.718423] 14530000.hdmi supply hdmi-en not found, using dummy regulator
 [4.726206] exynos-drm exynos-drm: bound 14400000.fimd (ops fimd_component_ops)
 [4.732555] exynos-drm exynos-drm: bound 14450000.mixer (ops mixer_component_ops)
 [4.740180] exynos-drm exynos-drm: bound 14530000.hdmi (ops hdmi_component_ops)
 [4.746936] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
 [4.753428] [drm] No driver support for vblank timestamp query.
 [4.940794] Console: switching to colour frame buffer device 274x77
 [4.995344] exynos-drm exynos-drm: fb0:  frame buffer device
 [5.024573] [drm] Initialized exynos 1.0.0 20110530 on minor 0
 [5.031164] exynos-dwc3 usb@12000000: no suspend clk specified
 [5.054571] usb 2-1: new full-speed USB device number 2 using exynos-ohci
 [5.159527] dwmmc_exynos 12220000.mmc: Busy; trying anyway
 [5.163705] mmc_host mmc1: Timeout sending command (cmd 0x202000 arg 0x0 status 0x0)
 ---
  arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

 diff --git a/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi b/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
 index 1af5bdc..a4be3e0 100644
 --- a/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
 +++ b/arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi
 @@ -182,9 +182,10 @@

   ldo13_reg: LDO13 {
   regulator-name = "vdd_ldo13";
  - regulator-min-microvolt = <2800000>;
  + regulator-min-microvolt = <1800000>;
>>>
>>> You did not convince me in the previous discussion about the change to
>>> 1.8V. I said that:
 On the same diagram few lines below:
 VDDQ_MMC2: 2.8V 250mA
>>>
>>> You responded:
 You are correct.
>>>
>>> So I am confused. Are you sure that this SD card block can/should
>>> operate on 1.8V? Have you actually tried this?
>>>
>>
>> Look like I missed this point. Here is the link I would like to share.
>>
>> http://www.hjreggel.net/cardspeed/cs_sdxc.html
>> Section: Summary of SD modes
>>
>> https://en.wikipedia.org/wiki/Secure_Digital
>> Section: Power consumption
>>
>> There are different voltage requirements for UHS-I; the max
>> value is around 3.3V
> 
> Do you have any comment on this voltage selection for UHS-I card (1.8V).

I asked whether you had tried this, i.e. whether actually setting 1.8V works
fine. You did not respond. As you can see on the Odroid schematics, the VDDQ
for MMC[01] operates at 1.8V.

The VDDQ for MMC2 operates at 2.8V.

In the commit description you mentioned that this voltage "helps scale the
required voltage to detect and load the microSD cards". What does "help"
mean? I would expect that detecting and loading of microSD cards either
works or does not work. I am not sure how it helps.

Best regards,
Krzysztof


Re: [PATCH] net: fec: fix enet_out clock handling

2015-11-29 Thread Lothar Waßmann
Hi,

> On Fri, Nov 27, 2015 at 02:39:10PM +0100, Lothar Waßmann wrote:
> > When ENET_OUT is being used as reference clock for an external PHY,
> > the clock must not be disabled while the PHY is active. Otherwise the
> > PHY may lose its internal state and require a reset to become
> > functional again.
> > 
> > A symptom for this bug is a network interface that constantly toggles
> > between UP and DOWN state:
> > fec 800f.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > fec 800f.ethernet eth0: Link is Down
> > fec 800f.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > fec 800f.ethernet eth0: Link is Down
> 
> Hi Lothar
> 
> When does this up/down happen? During normal operation when the link
> is administrative up?
> 
When booting with NFSROOT, the rootfs cannot be mounted, because the PHY
starts toggling the link state as soon as the interface is brought up.

> When did this start happening? Did it happen before 
> 8fff755e9f8d net: fec: Ensure clocks are enabled while using mdio bus
> 
No. The behaviour started exactly with this commit.

Lothar Waßmann


Re: [PATCH v4 3/3] thermal: mediatek: Add cpu dynamic power cooling model.

2015-11-29 Thread Viresh Kumar
On 27-11-15, 17:32, Dawei Chien wrote:
> The MT8173 cpufreq driver uses of_cpufreq_power_cooling_register to register
> cooling devices with a dynamic power coefficient.
> 
> Signed-off-by: Dawei Chien 
> ---
> This patch is base on patchset:
> https://lkml.org/lkml/2015/11/17/251
> ---
>  drivers/cpufreq/mt8173-cpufreq.c |   28 
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/cpufreq/mt8173-cpufreq.c 
> b/drivers/cpufreq/mt8173-cpufreq.c
> index 83001dc..4d39468 100644
> --- a/drivers/cpufreq/mt8173-cpufreq.c
> +++ b/drivers/cpufreq/mt8173-cpufreq.c
> @@ -263,24 +263,34 @@ static int mtk_cpufreq_set_target(struct cpufreq_policy 
> *policy,
>   return 0;
>  }
>  
> +#define DYNAMIC_POWER "dynamic-power-coefficient"
> +
>  static void mtk_cpufreq_ready(struct cpufreq_policy *policy)
>  {
>   struct mtk_cpu_dvfs_info *info = policy->driver_data;
>   struct device_node *np = of_node_get(info->cpu_dev->of_node);
> + u32 capacitance;
>  
>   if (WARN_ON(!np))
>   return;
>  
>   if (of_find_property(np, "#cooling-cells", NULL)) {
> - info->cdev = of_cpufreq_cooling_register(np,
> -  policy->related_cpus);
> + if (!info->cdev) {

Why would info->cdev be non-NULL here?

> + of_property_read_u32(np, DYNAMIC_POWER, );

This can fail, in which case capacitance will be used uninitialized.
Fix that by initializing it with 0 at the beginning of this routine.
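A minimal user-space sketch of the pattern suggested above, assuming (as the comment implies) that the property read leaves its output untouched when it fails. read_u32_prop() and struct fake_prop are hypothetical stand-ins for of_property_read_u32() and a DT node, not kernel APIs; the only point is that the zero initializer makes the result defined whether or not "dynamic-power-coefficient" is present.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one property of a DT node. */
struct fake_prop {
	const char *name;
	uint32_t value;
};

/*
 * Hypothetical lookup modelled on of_property_read_u32(): on failure it
 * returns a negative errno and does not write *out.
 */
static int read_u32_prop(const struct fake_prop *props, size_t n,
			 const char *name, uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) == 0) {
			*out = props[i].value;
			return 0;
		}
	}
	return -EINVAL;
}

/* Returns the coefficient to hand to the cooling code, 0 if absent. */
static uint32_t get_capacitance(const struct fake_prop *props, size_t n)
{
	uint32_t capacitance = 0;	/* well defined even if the lookup fails */

	/* Ignoring the return value is safe only because of the initializer. */
	read_u32_prop(props, n, "dynamic-power-coefficient", &capacitance);
	return capacitance;
}
```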

> + info->cdev = of_cpufreq_power_cooling_register(np,
> + policy->related_cpus,
> + capacitance,
> + NULL);
>  
> - if (IS_ERR(info->cdev)) {
> - dev_err(info->cpu_dev,
> - "running cpufreq without cooling device: %ld\n",
> - PTR_ERR(info->cdev));
> + if (IS_ERR(info->cdev)) {
> + dev_err(info->cpu_dev,
> + "running cpufreq without cooling 
> device: %ld\n",
> + PTR_ERR(info->cdev));
>  
> - info->cdev = NULL;
> + info->cdev = NULL;
> + }
>   }
>   }
>  
> @@ -460,7 +470,9 @@ static int mtk_cpufreq_exit(struct cpufreq_policy *policy)
>  {
>   struct mtk_cpu_dvfs_info *info = policy->driver_data;
>  
> - cpufreq_cooling_unregister(info->cdev);
> + if (info->cdev)
> + cpufreq_cooling_unregister(info->cdev);
> +

Why do you need to update this?

>   dev_pm_opp_free_cpufreq_table(info->cpu_dev, >freq_table);
>   mtk_cpu_dvfs_info_release(info);
>   kfree(info);
> -- 
> 1.7.9.5

-- 
viresh


[PATCH 1/2] usb: dwc2: add ep enabled flag to avoid double enable/disable

2015-11-29 Thread changbin . du
From: "Du, Changbin" 

Enabling an already enabled ep is illegal, because the ep may have TRBs
running, and reprogramming it may break a running transfer. The UDC
driver must therefore prevent this by returning -EBUSY. Gadget function
drivers should also avoid doing this, but that is outside the scope of
the UDC driver.

Similarly, disabling a disabled ep makes no sense, but there is no need
to return an error in that case.

Signed-off-by: Du, Changbin 
---
 drivers/usb/dwc2/core.h   |  1 +
 drivers/usb/dwc2/gadget.c | 20 +++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index a66d3cb..cf7eccd 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -162,6 +162,7 @@ struct dwc2_hsotg_ep {
unsigned char   mc;
unsigned char   interval;
 
+   unsigned int    enabled:1;
unsigned int    halted:1;
unsigned int    periodic:1;
unsigned int    isochronous:1;
diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
index 0abf73c..586bbcd 100644
--- a/drivers/usb/dwc2/gadget.c
+++ b/drivers/usb/dwc2/gadget.c
@@ -2423,6 +2423,7 @@ void dwc2_hsotg_core_init_disconnected(struct dwc2_hsotg 
*hsotg,
/* enable, but don't activate EP0in */
dwc2_writel(dwc2_hsotg_ep0_mps(hsotg->eps_out[0]->ep.maxpacket) |
   DXEPCTL_USBACTEP, hsotg->regs + DIEPCTL0);
+   hsotg->eps_out[0]->enabled = 1;
 
dwc2_hsotg_enqueue_setup(hsotg);
 
@@ -2680,6 +2681,14 @@ static int dwc2_hsotg_ep_enable(struct usb_ep *ep,
return -EINVAL;
}
 
+   spin_lock_irqsave(>lock, flags);
+   if (hs_ep->enabled) {
+   dev_warn(hsotg->dev, "%s: ep %s already enabled\n",
+   __func__, hs_ep->name);
+   ret = -EBUSY;
+   goto error;
+   }
+
mps = usb_endpoint_maxp(desc);
 
/* note, we handle this here instead of dwc2_hsotg_set_ep_maxpacket */
@@ -2690,7 +2699,6 @@ static int dwc2_hsotg_ep_enable(struct usb_ep *ep,
dev_dbg(hsotg->dev, "%s: read DxEPCTL=0x%08x from 0x%08x\n",
__func__, epctrl, epctrl_reg);
 
-   spin_lock_irqsave(>lock, flags);
 
epctrl &= ~(DXEPCTL_EPTYPE_MASK | DXEPCTL_MPS_MASK);
epctrl |= DXEPCTL_MPS(mps);
@@ -2806,6 +2814,8 @@ static int dwc2_hsotg_ep_enable(struct usb_ep *ep,
/* enable the endpoint interrupt */
dwc2_hsotg_ctrl_epint(hsotg, index, dir_in, 1);
 
+   hs_ep->enabled = 1;
+
 error:
spin_unlock_irqrestore(>lock, flags);
return ret;
@@ -2835,6 +2845,11 @@ static int dwc2_hsotg_ep_disable(struct usb_ep *ep)
epctrl_reg = dir_in ? DIEPCTL(index) : DOEPCTL(index);
 
spin_lock_irqsave(>lock, flags);
+   if (!hs_ep->enabled) {
+   dev_warn(hsotg->dev, "%s: ep %s already disabled\n",
+   __func__, hs_ep->name);
+   goto out;
+   }
 
hsotg->fifo_map &= ~(1fifo_index = 0;
@@ -2854,6 +2869,9 @@ static int dwc2_hsotg_ep_disable(struct usb_ep *ep)
/* terminate all requests with shutdown */
kill_all_requests(hsotg, hs_ep, -ESHUTDOWN);
 
+   hs_ep->enabled = 0;
+
+out:
spin_unlock_irqrestore(>lock, flags);
return 0;
 }
-- 
2.5.0
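The enable/disable bookkeeping that patch 1/2 adds can be sketched in isolation as below. This is a simplified user-space model, not dwc2 code: struct ep_state, ep_enable() and ep_disable() are hypothetical names, register programming is reduced to comments, and the spinlock the driver holds around the check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified endpoint: only the state the patch adds. */
struct ep_state {
	const char *name;
	unsigned int enabled:1;
};

/*
 * Enabling an already-enabled endpoint is refused with -EBUSY so that
 * transfers in flight are not reprogrammed. (The driver does this check
 * under its spinlock; the sketch is single-threaded.)
 */
static int ep_enable(struct ep_state *ep)
{
	if (ep->enabled) {
		fprintf(stderr, "%s: ep %s already enabled\n", __func__, ep->name);
		return -EBUSY;
	}
	/* ... program the endpoint control register here ... */
	ep->enabled = 1;
	return 0;
}

/* Disabling twice is harmless, so it only warns and still returns 0. */
static int ep_disable(struct ep_state *ep)
{
	if (!ep->enabled) {
		fprintf(stderr, "%s: ep %s already disabled\n", __func__, ep->name);
		return 0;
	}
	/* ... terminate outstanding requests with -ESHUTDOWN here ... */
	ep->enabled = 0;
	return 0;
}
```

A second ep_enable() on the same endpoint returns -EBUSY, while a second ep_disable() only warns and returns 0, matching the asymmetry described in the commit message.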



[PATCH 2/2] usb: dwc2: forbid queuing request to a disabled ep

2015-11-29 Thread changbin . du
From: "Du, Changbin" 

Queuing a request to a disabled ep doesn't make sense, and it can lead
callers into mistakes.

Here is an example with the Android MTP gadget function driver, where
memory corruption can happen in the following scenario:
1) On disconnect, the mtp driver disables its EPs.
2) During send_file_work and receive_file_work, mtp queues a request
   to an ep. (The mtp driver needs to improve its synchronization logic!)
3) mtp_function_unbind is invoked and all mtp requests are freed.
4) When dwc2 processes the request queued in step 2, it causes a kernel
   NULL pointer dereference exception.

Signed-off-by: Du, Changbin 
---
 drivers/usb/dwc2/gadget.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
index 586bbcd..4d637ab 100644
--- a/drivers/usb/dwc2/gadget.c
+++ b/drivers/usb/dwc2/gadget.c
@@ -786,6 +786,12 @@ static int dwc2_hsotg_ep_queue(struct usb_ep *ep, struct 
usb_request *req,
ep->name, req, req->length, req->buf, req->no_interrupt,
req->zero, req->short_not_ok);
 
+   if (!hs_ep->enabled) {
+   dev_warn(hs->dev, "%s: cannot queue to disabled ep\n",
+   __func__);
+   return -ESHUTDOWN;
+   }
+
/* Prevent new request submission when controller is suspended */
if (hs->lx_state == DWC2_L2) {
dev_dbg(hs->dev, "%s: don't submit request while suspended\n",
-- 
2.5.0
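The queue-side guard from patch 2/2, reduced to a user-space sketch. struct endpoint, struct request and ep_queue() are hypothetical simplifications; in the driver the check lives in dwc2_hsotg_ep_queue() and reads hs_ep->enabled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the dwc2 structures. */
struct request {
	int queued;
};

struct endpoint {
	unsigned int enabled:1;
	size_t nqueued;
};

/*
 * Refuse new work on a disabled endpoint so a caller racing with
 * disconnect gets -ESHUTDOWN instead of leaving behind a request that
 * may later be freed while the controller still references it.
 */
static int ep_queue(struct endpoint *ep, struct request *req)
{
	if (!ep->enabled)
		return -ESHUTDOWN;
	req->queued = 1;
	ep->nqueued++;
	return 0;
}
```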



[PATCH 0/2] Two fix for dwc2 gadget driver

2015-11-29 Thread changbin . du
From: "Du, Changbin" 

With the first patch, enabling an already enabled ep returns -EBUSY.
The second patch forbids queuing on a disabled ep to avoid a panic.

Du, Changbin (2):
  usb: dwc2: add ep enabled flag to avoid double enable/disable
  usb: dwc2: forbid queuing request to a disabled ep

 drivers/usb/dwc2/core.h   |  1 +
 drivers/usb/dwc2/gadget.c | 26 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

-- 
2.5.0



[PATCH v4 3/5] mtd: devices: m25p80: add support for mmap read request

2015-11-29 Thread Vignesh R
Certain SPI controllers may provide an accelerated interface to read from
m25p80-type flash devices. This interface provides better read
performance than the regular SPI interface.
Call spi_flash_read(), if supported, to make use of such an interface.

Signed-off-by: Vignesh R 
---

v4: 
 * Use spi_flash_read_message struct to pass args.
 * support passing of opcode/addr/data nbits.

 drivers/mtd/devices/m25p80.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
index fe9ceb7b5405..00094a668c62 100644
--- a/drivers/mtd/devices/m25p80.c
+++ b/drivers/mtd/devices/m25p80.c
@@ -131,6 +131,26 @@ static int m25p80_read(struct spi_nor *nor, loff_t from, 
size_t len,
/* convert the dummy cycles to the number of bytes */
dummy /= 8;
 
+   if (spi_flash_read_supported(spi)) {
+   struct spi_flash_read_message msg;
+   int ret;
+
+   msg.buf = buf;
+   msg.from = from;
+   msg.len = len;
+   msg.read_opcode = nor->read_opcode;
+   msg.addr_width = nor->addr_width;
+   msg.dummy_bytes = dummy;
+   /* TODO: Support other combinations */
+   msg.opcode_nbits = SPI_NBITS_SINGLE;
+   msg.addr_nbits = SPI_NBITS_SINGLE;
+   msg.data_nbits = m25p80_rx_nbits(nor);
+
+   ret = spi_flash_read(spi, );
+   *retlen = msg.retlen;
+   return ret;
+   }
+
spi_message_init();
memset(t, 0, (sizeof t));
 
-- 
2.6.3
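The control flow of this patch (prefer the accelerated read when the controller advertises one, otherwise fall back to the regular transfer path) can be modelled as below. Everything here is hypothetical: struct flash_ctrl, its fast_read hook and the two read functions only simulate spi_flash_read_supported()/spi_flash_read() and the spi_message path, using memcpy() on an in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical controller model; not the kernel's struct spi_master. */
struct flash_ctrl {
	/* Optional accelerated read hook; NULL when unsupported. */
	int (*fast_read)(struct flash_ctrl *c, size_t from, size_t len,
			 size_t *retlen, unsigned char *buf);
	const unsigned char *backing;	/* simulated flash contents */
	int fast_calls;
	int slow_calls;
};

/* Simulated memory-mapped read: a plain copy from the mapped region. */
static int mmap_read(struct flash_ctrl *c, size_t from, size_t len,
		     size_t *retlen, unsigned char *buf)
{
	c->fast_calls++;
	memcpy(buf, c->backing + from, len);
	*retlen = len;
	return 0;
}

/* Simulated regular path; a driver would build and sync an spi_message. */
static int slow_read(struct flash_ctrl *c, size_t from, size_t len,
		     size_t *retlen, unsigned char *buf)
{
	c->slow_calls++;
	memcpy(buf, c->backing + from, len);
	*retlen = len;
	return 0;
}

/* Prefer the accelerated path when the controller offers one. */
static int flash_read(struct flash_ctrl *c, size_t from, size_t len,
		      size_t *retlen, unsigned char *buf)
{
	if (c->fast_read)
		return c->fast_read(c, from, len, retlen, buf);
	return slow_read(c, from, len, retlen, buf);
}
```

Setting c->fast_read = mmap_read switches later reads to the simulated fast path without changing the caller, which is the property the patch relies on in m25p80_read().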



[PATCH v4 1/5] spi: introduce accelerated read support for spi flash devices

2015-11-29 Thread Vignesh R
In addition to providing direct access to the SPI bus, some SPI
controllers (like ti-qspi) provide special ports (like a memory mapped
port) that are optimized to improve SPI flash read performance.
This means the controller can automatically send the SPI signals
required to read data from the SPI flash device.
For this, the SPI controller needs to know flash-specific information
such as the read command to use, the number of dummy bytes and the
address width.

Introduce the spi_flash_read() interface to support accelerated reads
from SPI flash devices. SPI master drivers can implement this callback to
support interfaces such as memory mapped read. The m25p80 flash driver
and other flash drivers can call it to make use of such interfaces. The
interface should only be used with SPI flashes and cannot be used with
other SPI devices.

Signed-off-by: Vignesh R 
---
 drivers/spi/spi.c   | 45 +
 include/linux/spi/spi.h | 41 +
 2 files changed, 86 insertions(+)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index e2415be209d5..cc2b54139352 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1134,6 +1134,7 @@ static void __spi_pump_messages(struct spi_master 
*master, bool in_kthread)
}
}
 
+   mutex_lock(>bus_lock_mutex);
trace_spi_message_start(master->cur_msg);
 
if (master->prepare_message) {
@@ -1143,6 +1144,7 @@ static void __spi_pump_messages(struct spi_master 
*master, bool in_kthread)
"failed to prepare message: %d\n", ret);
master->cur_msg->status = ret;
spi_finalize_current_message(master);
+   mutex_unlock(>bus_lock_mutex);
return;
}
master->cur_msg_prepared = true;
@@ -1152,6 +1154,7 @@ static void __spi_pump_messages(struct spi_master 
*master, bool in_kthread)
if (ret) {
master->cur_msg->status = ret;
spi_finalize_current_message(master);
+   mutex_unlock(>bus_lock_mutex);
return;
}
 
@@ -1159,8 +1162,10 @@ static void __spi_pump_messages(struct spi_master 
*master, bool in_kthread)
if (ret) {
dev_err(>dev,
"failed to transfer one message from queue\n");
+   mutex_unlock(>bus_lock_mutex);
return;
}
+   mutex_unlock(>bus_lock_mutex);
 }
 
 /**
@@ -2327,6 +2332,46 @@ int spi_async_locked(struct spi_device *spi, struct 
spi_message *message)
 EXPORT_SYMBOL_GPL(spi_async_locked);
 
 
+int spi_flash_read(struct spi_device *spi,
+  struct spi_flash_read_message *msg)
+
+{
+   struct spi_master *master = spi->master;
+   int ret;
+
+   if ((msg->opcode_nbits == SPI_NBITS_DUAL ||
+msg->addr_nbits == SPI_NBITS_DUAL) &&
+   !(spi->mode & (SPI_TX_DUAL | SPI_TX_QUAD)))
+   return -EINVAL;
+   if ((msg->opcode_nbits == SPI_NBITS_QUAD ||
+msg->addr_nbits == SPI_NBITS_QUAD) &&
+   !(spi->mode & SPI_TX_QUAD))
+   return -EINVAL;
+   if (msg->data_nbits == SPI_NBITS_DUAL &&
+   !(spi->mode & (SPI_RX_DUAL | SPI_RX_QUAD)))
+   return -EINVAL;
+   if (msg->data_nbits == SPI_NBITS_QUAD &&
+   !(spi->mode &  SPI_RX_QUAD))
+   return -EINVAL;
+
+   if (master->auto_runtime_pm) {
+   ret = pm_runtime_get_sync(master->dev.parent);
+   if (ret < 0) {
+   dev_err(>dev, "Failed to power device: %d\n",
+   ret);
+   return ret;
+   }
+   }
+   mutex_lock(>bus_lock_mutex);
+   ret = master->spi_flash_read(spi, msg);
+   mutex_unlock(>bus_lock_mutex);
+   if (master->auto_runtime_pm)
+   pm_runtime_put(master->dev.parent);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(spi_flash_read);
+
 /*-*/
 
 /* Utility methods for SPI master protocol drivers, layered on
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index cce80e6dc7d1..246d7d519f3f 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -25,6 +25,7 @@
 struct dma_chan;
 struct spi_master;
 struct spi_transfer;
+struct spi_flash_read_message;
 
 /*
  * INTERFACES between SPI master-side drivers and SPI infrastructure.
@@ -361,6 +362,8 @@ static inline void spi_unregister_driver(struct spi_driver 
*sdrv)
  * @handle_err: the subsystem calls the driver to handle an error that occurs
  * in the generic implementation of transfer_one_message().
  * @unprepare_message: undo any work done by prepare_message().
+ * @spi_flash_read: to support spi-controller hardwares that provide
+ *  accelerated interface to read from flash devices.
  * @cs_gpios: Array 

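The bus-width checks at the top of spi_flash_read() in patch 1/5 reduce to a few rules: a dual request is accepted by a dual- or quad-capable line, a quad request only by a quad-capable one. The sketch below restates them standalone; the MODE_* and NBITS_* constants and struct read_req are local illustrative definitions, not the kernel's SPI_TX_*/SPI_RX_*/SPI_NBITS_* values or struct spi_flash_read_message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative constants; names echo the patch but values are local. */
enum { NBITS_SINGLE = 1, NBITS_DUAL = 2, NBITS_QUAD = 4 };
#define MODE_TX_DUAL 0x1u
#define MODE_TX_QUAD 0x2u
#define MODE_RX_DUAL 0x4u
#define MODE_RX_QUAD 0x8u

/* Hypothetical subset of the read request: just the three widths. */
struct read_req {
	int opcode_nbits;
	int addr_nbits;
	int data_nbits;
};

/* Same rules as the patch: quad capability also satisfies dual. */
static int check_read_widths(uint32_t mode, const struct read_req *r)
{
	if ((r->opcode_nbits == NBITS_DUAL || r->addr_nbits == NBITS_DUAL) &&
	    !(mode & (MODE_TX_DUAL | MODE_TX_QUAD)))
		return -EINVAL;
	if ((r->opcode_nbits == NBITS_QUAD || r->addr_nbits == NBITS_QUAD) &&
	    !(mode & MODE_TX_QUAD))
		return -EINVAL;
	if (r->data_nbits == NBITS_DUAL &&
	    !(mode & (MODE_RX_DUAL | MODE_RX_QUAD)))
		return -EINVAL;
	if (r->data_nbits == NBITS_QUAD && !(mode & MODE_RX_QUAD))
		return -EINVAL;
	return 0;
}
```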
[PATCH v4 2/5] spi: spi-ti-qspi: add mmap mode read support

2015-11-29 Thread Vignesh R
The ti-qspi controller provides an mmap port to read data from SPI flashes.
The mmap port is enabled in QSPI_SPI_SWITCH_REG. The ctrl module register
may also need to be accessed on some SoCs. QSPI_SPI_SETUP_REGx needs to
be populated with flash-specific information such as the read opcode, read
mode (quad, dual, normal), address width and dummy bytes. Once the
controller is in mmap mode, the whole flash memory is available as a
memory region at an SoC-specific address. This region can be accessed using
normal memcpy() (or a mem-to-mem DMA copy). The ti-qspi controller hardware
internally communicates with the SPI flash over the SPI bus and fetches the
requested data.

Implement the spi_flash_read() callback to support mmap reads from SPI
flash devices. With this, the read throughput increases from ~100kB/s to
~2.5MB/s.

Signed-off-by: Vignesh R 
---

 drivers/spi/spi-ti-qspi.c | 101 ++
 1 file changed, 94 insertions(+), 7 deletions(-)

diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
index 64318fcfacf2..cd4e63f45e65 100644
--- a/drivers/spi/spi-ti-qspi.c
+++ b/drivers/spi/spi-ti-qspi.c
@@ -56,6 +56,7 @@ struct ti_qspi {
u32 dc;
 
bool ctrl_mod;
+   bool mmap_enabled;
 };
 
 #define QSPI_PID   (0x0)
@@ -65,11 +66,8 @@ struct ti_qspi {
 #define QSPI_SPI_CMD_REG   (0x48)
 #define QSPI_SPI_STATUS_REG(0x4c)
 #define QSPI_SPI_DATA_REG  (0x50)
-#define QSPI_SPI_SETUP0_REG(0x54)
+#define QSPI_SPI_SETUP_REG(n)  ((0x54 + 4 * n))
 #define QSPI_SPI_SWITCH_REG(0x64)
-#define QSPI_SPI_SETUP1_REG(0x58)
-#define QSPI_SPI_SETUP2_REG(0x5c)
-#define QSPI_SPI_SETUP3_REG(0x60)
 #define QSPI_SPI_DATA_REG_1(0x68)
 #define QSPI_SPI_DATA_REG_2(0x6c)
 #define QSPI_SPI_DATA_REG_3(0x70)
@@ -109,6 +107,16 @@ struct ti_qspi {
 
 #define QSPI_AUTOSUSPEND_TIMEOUT 2000
 
+#define MEM_CS_EN(n)   ((n + 1) << 8)
+
+#define MM_SWITCH  0x1
+
+#define QSPI_SETUP_RD_NORMAL   (0x0 << 12)
+#define QSPI_SETUP_RD_DUAL (0x1 << 12)
+#define QSPI_SETUP_RD_QUAD (0x3 << 12)
+#define QSPI_SETUP_ADDR_SHIFT  8
+#define QSPI_SETUP_DUMMY_SHIFT 10
+
 static inline unsigned long ti_qspi_read(struct ti_qspi *qspi,
unsigned long reg)
 {
@@ -366,6 +374,78 @@ static int qspi_transfer_msg(struct ti_qspi *qspi, struct 
spi_transfer *t)
return 0;
 }
 
+static void ti_qspi_enable_memory_map(struct spi_device *spi)
+{
+   struct ti_qspi  *qspi = spi_master_get_devdata(spi->master);
+   u32 val;
+
+   ti_qspi_write(qspi, MM_SWITCH, QSPI_SPI_SWITCH_REG);
+   if (qspi->ctrl_mod) {
+   val = readl(qspi->ctrl_base);
+   val |= MEM_CS_EN(spi->chip_select);
+   writel(val, qspi->ctrl_base);
+   /* dummy readl to ensure bus sync */
+   readl(qspi->ctrl_base);
+   }
+   qspi->mmap_enabled = true;
+}
+
+static void ti_qspi_disable_memory_map(struct spi_device *spi)
+{
+   struct ti_qspi  *qspi = spi_master_get_devdata(spi->master);
+   u32 val;
+
+   ti_qspi_write(qspi, 0, QSPI_SPI_SWITCH_REG);
+   if (qspi->ctrl_mod) {
+   val = readl(qspi->ctrl_base);
+   val &= ~MEM_CS_EN(spi->chip_select);
+   writel(val, qspi->ctrl_base);
+   }
+   qspi->mmap_enabled = false;
+}
+
+static void ti_qspi_setup_mmap_read(struct spi_device *spi,
+   struct spi_flash_read_message *msg)
+{
+   struct ti_qspi  *qspi = spi_master_get_devdata(spi->master);
+   u32 memval = msg->read_opcode;
+
+   switch (msg->data_nbits) {
+   case SPI_NBITS_QUAD:
+   memval |= QSPI_SETUP_RD_QUAD;
+   break;
+   case SPI_NBITS_DUAL:
+   memval |= QSPI_SETUP_RD_DUAL;
+   break;
+   default:
+   memval |= QSPI_SETUP_RD_NORMAL;
+   break;
+   }
+   memval |= ((msg->addr_width - 1) << QSPI_SETUP_ADDR_SHIFT |
+  msg->dummy_bytes << QSPI_SETUP_DUMMY_SHIFT);
+   ti_qspi_write(qspi, memval,
+ QSPI_SPI_SETUP_REG(spi->chip_select));
+}
+
+static int ti_qspi_spi_flash_read(struct  spi_device *spi,
+ struct spi_flash_read_message *msg)
+{
+   struct ti_qspi *qspi = spi_master_get_devdata(spi->master);
+   int ret = 0;
+
+   mutex_lock(>list_lock);
+
+   if (!qspi->mmap_enabled)
+   ti_qspi_enable_memory_map(spi);
+   ti_qspi_setup_mmap_read(spi, msg);
+   memcpy_fromio(msg->buf, qspi->mmap_base + msg->from, msg->len);
+   msg->retlen = msg->len;
+
+   mutex_unlock(>list_lock);
+
+   return ret;
+}
+
 static int ti_qspi_start_transfer_one(struct spi_master *master,
struct spi_message *m)
 {
@@ -398,6 +478,9 

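How ti_qspi_setup_mmap_read() packs the SETUP register, restated as a pure function for illustration. The field macros are copied from the patch; qspi_setup_value() is a hypothetical helper, and passing 2 and 4 for dual and quad data widths is an assumption of this sketch. Unlike the patch's QSPI_SPI_SETUP_REG(n), the macro here parenthesizes its argument, so an expression such as cs + 1 expands as intended.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the patch's defines. */
#define QSPI_SPI_SETUP_REG(n)	(0x54 + 4 * (n))	/* argument parenthesized */
#define QSPI_SETUP_RD_NORMAL	(0x0 << 12)
#define QSPI_SETUP_RD_DUAL	(0x1 << 12)
#define QSPI_SETUP_RD_QUAD	(0x3 << 12)
#define QSPI_SETUP_ADDR_SHIFT	8
#define QSPI_SETUP_DUMMY_SHIFT	10

/* Compute the SETUP value the way ti_qspi_setup_mmap_read() does. */
static uint32_t qspi_setup_value(uint8_t opcode, int data_nbits,
				 unsigned int addr_width,
				 unsigned int dummy_bytes)
{
	uint32_t memval = opcode;

	switch (data_nbits) {
	case 4:		/* quad data lines (assumed encoding) */
		memval |= QSPI_SETUP_RD_QUAD;
		break;
	case 2:		/* dual data lines (assumed encoding) */
		memval |= QSPI_SETUP_RD_DUAL;
		break;
	default:
		memval |= QSPI_SETUP_RD_NORMAL;
		break;
	}
	memval |= ((addr_width - 1) << QSPI_SETUP_ADDR_SHIFT) |
		  (dummy_bytes << QSPI_SETUP_DUMMY_SHIFT);
	return memval;
}
```

With chip selects 0 to 3 the offset macro yields 0x54, 0x58, 0x5c and 0x60, the same offsets as the SETUP0..SETUP3 defines the patch removes.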
[PATCH v4 5/5] ARM: dts: AM4372: add entry for qspi mmap region

2015-11-29 Thread Vignesh R
Add qspi memory mapped region entries for AM43xx based SoCs. Also,
update the binding documents for the controller to document this change.

Acked-by: Rob Herring 
Signed-off-by: Vignesh R 
---

v4: No changes.

 Documentation/devicetree/bindings/spi/ti_qspi.txt | 5 +++--
 arch/arm/boot/dts/am4372.dtsi | 4 +++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/spi/ti_qspi.txt 
b/Documentation/devicetree/bindings/spi/ti_qspi.txt
index 334aa3f32cbc..5a1542eda387 100644
--- a/Documentation/devicetree/bindings/spi/ti_qspi.txt
+++ b/Documentation/devicetree/bindings/spi/ti_qspi.txt
@@ -17,9 +17,10 @@ Recommended properties:
 
 Example:
 
+For am4372:
 qspi: qspi@4b30 {
-   compatible = "ti,dra7xxx-qspi";
-   reg = <0x4790 0x100>, <0x3000 0x3ff>;
+   compatible = "ti,am4372-qspi";
+   reg = <0x4790 0x100>, <0x3000 0x400>;
reg-names = "qspi_base", "qspi_mmap";
#address-cells = <1>;
#size-cells = <0>;
diff --git a/arch/arm/boot/dts/am4372.dtsi b/arch/arm/boot/dts/am4372.dtsi
index d83ff9c9701e..e32d164102d1 100644
--- a/arch/arm/boot/dts/am4372.dtsi
+++ b/arch/arm/boot/dts/am4372.dtsi
@@ -963,7 +963,9 @@
 
qspi: qspi@4790 {
compatible = "ti,am4372-qspi";
-   reg = <0x4790 0x100>;
+   reg = <0x4790 0x100>,
+ <0x3000 0x400>;
+   reg-names = "qspi_base", "qspi_mmap";
#address-cells = <1>;
#size-cells = <0>;
ti,hwmods = "qspi";
-- 
2.6.3



[PATCH v4 4/5] ARM: dts: DRA7: add entry for qspi mmap region

2015-11-29 Thread Vignesh R
Add qspi memory mapped region entries for DRA7xx based SoCs. Also,
update the binding documents for the controller to document this change.

Acked-by: Rob Herring 
Signed-off-by: Vignesh R 
---

v4: No changes.

 Documentation/devicetree/bindings/spi/ti_qspi.txt | 14 ++
 arch/arm/boot/dts/dra7.dtsi   |  7 +--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/spi/ti_qspi.txt 
b/Documentation/devicetree/bindings/spi/ti_qspi.txt
index 601a360531a5..334aa3f32cbc 100644
--- a/Documentation/devicetree/bindings/spi/ti_qspi.txt
+++ b/Documentation/devicetree/bindings/spi/ti_qspi.txt
@@ -26,3 +26,17 @@ qspi: qspi@4b30 {
spi-max-frequency = <2500>;
ti,hwmods = "qspi";
 };
+
+For dra7xx:
+qspi: qspi@4b30 {
+   compatible = "ti,dra7xxx-qspi";
+   reg = <0x4b30 0x100>,
+ <0x5c00 0x400>,
+ <0x4a002558 0x4>;
+   reg-names = "qspi_base", "qspi_mmap",
+   "qspi_ctrlmod";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   spi-max-frequency = <4800>;
+   ti,hwmods = "qspi";
+};
diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index fe99231cbde5..debe7523643d 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -1153,8 +1153,11 @@
 
qspi: qspi@4b30 {
compatible = "ti,dra7xxx-qspi";
-   reg = <0x4b30 0x100>;
-   reg-names = "qspi_base";
+   reg = <0x4b30 0x100>,
+ <0x5c00 0x400>,
+ <0x4a002558 0x4>;
+   reg-names = "qspi_base", "qspi_mmap",
+   "qspi_ctrlmod";
#address-cells = <1>;
#size-cells = <0>;
ti,hwmods = "qspi";
-- 
2.6.3



[PATCH v4 0/5] Add memory mapped read support for ti-qspi

2015-11-29 Thread Vignesh R
Changes since v3:
Rework to introduce spi_flash_read_message struct.
Support different opcode/addr/data formats as per Brian's suggestion
here: https://lkml.org/lkml/2015/11/11/454

Changes since v2:
Remove mmap_lock_mutex.
Optimize enable/disable of mmap mode.

Changes since v1:
Introduce an API in the SPI core that the MTD flash driver can call for
mmap reads instead of directly calling the spi-master driver callback.
This API makes sure that the SPI core msg queue is locked during mmap
transfers.
v1: https://lkml.org/lkml/2015/9/4/103


Cover letter:

This patch series adds support for the memory mapped read port of ti-qspi.
ti-qspi has a special memory mapped port through which SPI flash
memories can be accessed directly via an SoC-specific memory region.

The first patch adds a method to pass flash-specific information (read
opcode, dummy bytes, etc.) and to request an mmap read. The second patch
implements the mmap read method in the ti-qspi driver. Patch 3 adapts
m25p80 to use the mmap read method before trying a normal SPI transfer.
Patches 4 and 5 add memory map region DT entries for DRA7xx and AM43xx SoCs.

This patch series is based on the discussions here:
http://www.spinics.net/lists/linux-spi/msg04796.html

Tested on DRA74 EVM and AM437x-SK.
Read performance increases from ~100kB/s to ~2.5MB/s.


Vignesh R (5):
  spi: introduce accelerated read support for spi flash devices
  spi: spi-ti-qspi: add mmap mode read support
  mtd: devices: m25p80: add support for mmap read request
  ARM: dts: DRA7: add entry for qspi mmap region
  ARM: dts: AM4372: add entry for qspi mmap region

 Documentation/devicetree/bindings/spi/ti_qspi.txt |  19 +++-
 arch/arm/boot/dts/am4372.dtsi |   4 +-
 arch/arm/boot/dts/dra7.dtsi   |   7 +-
 drivers/mtd/devices/m25p80.c  |  20 +
 drivers/spi/spi-ti-qspi.c | 101 --
 drivers/spi/spi.c |  45 ++
 include/linux/spi/spi.h   |  41 +
 7 files changed, 225 insertions(+), 12 deletions(-)

-- 
2.6.3



Re: [PATCH v11 10/24] perf config: Document variables for 'call-graph' section in man page

2015-11-29 Thread Namhyung Kim
On Mon, Nov 30, 2015 at 10:42:00AM +0900, Taeung Song wrote:
> Hi, Namhyung

Hi Taeung,

> > On Nov 18, 2015, at 11:51 AM, Namhyung Kim  wrote:
> > 
> > On Tue, Nov 17, 2015 at 10:53:30PM +0900, Taeung Song wrote:
> >> Explain 'call-graph' section and its variables.
> >> 
> >> 'record-mode', 'dump-size', 'print-type', 'order',
> >> 'sort-key', 'threshold' and 'print-limit'.
> >> 
> >> Cc: Namhyung Kim 
> >> Cc: Jiri Olsa 
> >> Signed-off-by: Taeung Song 
> >> ---
> >> tools/perf/Documentation/perf-config.txt | 65 
> >> 
> >> 1 file changed, 65 insertions(+)
> >> 
> >> diff --git a/tools/perf/Documentation/perf-config.txt 
> >> b/tools/perf/Documentation/perf-config.txt
> >> index 7d386d4..dc659d6 100644
> >> --- a/tools/perf/Documentation/perf-config.txt
> >> +++ b/tools/perf/Documentation/perf-config.txt
> >> @@ -285,6 +285,71 @@ ui.*::
> >>There're columns as header 'Overhead', 'Children', 'Shared 
> >> Object', 'Symbol', 'self'.
> >>If this option is false, they are hiden. This option is only 
> >> applied to TUI.
> >> 
> >> +call-graph.*::
> >> +  When sub-commands 'top' and 'report' work with -g/--children
> >> +  there're options in control of call-graph.
> >> +
> >> +  call-graph.record-mode::
> >> +  The record-mode can be 'fp' (frame pointer) and 'dwarf'.
> > 
> > Also 'lbr' can be used, but it only works for recent Intel CPUs.
> > 
> > 
> >> +  The value of 'dwarf' is effective only if perf detect needed 
> >> library
> >> +  (libunwind or a recent version of libdw).  Also it doesn't 
> >> *require*
> >> +  the dump-size option since it can use the default value of 8192 
> >> if
> >> +  missing.
> > 
> > I think the last sentence can be omitted.
> > 
> > 
> >> +
> >> +  call-graph.dump-size::
> >> +  The size of stack to dump in order to do post-unwinding. 
> >> Default is 8192 (byte).
> >> +  When using dwarf into record-mode this option should have a 
> >> value.
> > 
> > This contradicts the above, it'll use the default size if omitted.
> > 
> > 
> >> +
> >> +  call-graph.print-type::
> >> +  The print-types can be graph (graph absolute), fractal (graph 
> >> relative), flat.

The 'folded' print type was added recently.  Please update it too.


> >> +  This option controls a way to show overhead for each callchain 
> >> entry.
> >> +  Suppose a following example.
> >> +
> >> +  Overhead  Symbols
> >> +    ...
> >> +40.00%  foo
> >> +|
> >> +--- foo
> >> +|
> >> +|--50.00%-- bar
> >> +|   main
> >> +|
> >> +--50.00%-- baz
> >> +   main
> > 
> >  ^
> >   it needs one more whitespace
> > 
> 
> I checked the whitespace and tab characters in this patch file.
> I think the missing whitespace is caused by the mail client.
> After 'make install' I checked it, and no whitespace is missing.
> Is there a different problem that I missed?

I meant whitespace in the callchain graph (i.e. 'bar' and 'baz' should
be aligned). But I think I was wrong; it should look like this:


  Overhead  Symbols
    ...
40.00%  foo
|
---foo
   |
   |--50.00%--bar
   |  main
   |
--50.00%--baz
  main


Maybe it's because you used TAB characters for indent?

Thanks,
Namhyung


> 
> > 
> >> +
> >> +  This output is a 'fractal' format. The 'foo' came from 'bar' 
> >> and 'baz' exactly
> >> +  half and half so 'fractal' shows 50.00% for each
> >> +  (meaning that it assumes 100% total overhead of 'foo').
> >> +
> >> +  The 'graph' uses absolute overhead value of 'foo' as total so 
> >> each of
> >> +  'bar' and 'baz' callchain will have 20.00% of overhead.
> >> +
> >> +  call-graph.order::
> >> +  This option controls print order of callchains. The default is
> >> +  'callee' which means callee is printed at top and then followed 
> >> by its
> >> +  caller and so on. The 'caller' prints it in reverse order.
> >> +
> >> +  If this option is not set and report.children or top.children is
> >> +  set to true (or the equivalent command line option is given),
> >> +  the default value of this option is changed to 'caller' for the
> >> +  execution of 'perf report' or 'perf top'. Other commands will
> >> +  still default to 'callee'.
> >> +
> >> +  call-graph.sort-key::
> >> +  The callchains are merged if they contain same information.
> >> +  The sort-key option determines a way to compare the callchains.
> >> +  A value of 'sort-key' can be 'function' or 'address'.
> >> +  The default is 'function'.
> >> 
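The relation between the 'fractal' and 'graph' print types described in this hunk is a single multiplication: a child's graph (absolute) percentage is its parent's absolute percentage times the child's fractal (relative) percentage. A minimal sketch; fractal_to_graph() is a hypothetical helper, not part of perf.

```c
#include <assert.h>

/*
 * 'fractal' percentages are relative to the parent entry, 'graph'
 * percentages are relative to the total. Convert one child entry.
 */
static double fractal_to_graph(double parent_graph_pct,
			       double child_fractal_pct)
{
	return parent_graph_pct * child_fractal_pct / 100.0;
}
```

For the example in the patch, foo has 40.00% overhead and bar and baz each show 50.00% in fractal mode, so fractal_to_graph(40.0, 50.0) gives the 20.00% that graph mode shows for each.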

[PATCH] sched: Move sched_to_prio arrays out of line

2015-11-29 Thread Andi Kleen
From: Andi Kleen 

When building a kernel with a gcc 6 snapshot, the compiler complains
about unused const static variables for prio_to_weight and prio_to_wmult
in multiple scheduler files (all but core.c and auto_group.c).

The way the arrays are currently declared, they are duplicated in
every scheduler file that includes sched.h, which seems rather wasteful.

Move the arrays out of line into core.c. I also added a sched_ prefix
to avoid any potential namespace collisions.

v2: This version actually compiles. Sorry, I forgot to commit a fixup
earlier.
Signed-off-by: Andi Kleen 
---
 kernel/sched/auto_group.c |  2 +-
 kernel/sched/core.c   | 45 +++--
 kernel/sched/sched.h  | 42 ++
 3 files changed, 46 insertions(+), 43 deletions(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index 750ed60..a5d966c 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -212,7 +212,7 @@ int proc_sched_autogroup_set_nice(struct task_struct *p, 
int nice)
ag = autogroup_task_get(p);
 
down_write(>lock);
-   err = sched_group_set_shares(ag->tg, prio_to_weight[nice + 20]);
+   err = sched_group_set_shares(ag->tg, sched_prio_to_weight[nice + 20]);
if (!err)
ag->nice = nice;
up_write(>lock);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10..a685b49 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -822,8 +822,8 @@ static void set_load_weight(struct task_struct *p)
return;
}
 
-   load->weight = scale_load(prio_to_weight[prio]);
-   load->inv_weight = prio_to_wmult[prio];
+   load->weight = scale_load(sched_prio_to_weight[prio]);
+   load->inv_weight = sched_prio_to_wmult[prio];
 }
 
 static void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
@@ -8458,3 +8458,44 @@ void dump_cpu_task(int cpu)
pr_info("Task dump for CPU %d:\n", cpu);
sched_show_task(cpu_curr(cpu));
 }
+
+/*
+ * Nice levels are multiplicative, with a gentle 10% change for every
+ * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
+ * nice 1, it will get ~10% less CPU time than another CPU-bound task
+ * that remained on nice 0.
+ *
+ * The "10% effect" is relative and cumulative: from _any_ nice level,
+ * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
+ * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
+ * If a task goes up by ~10% and another task goes down by ~10% then
+ * the relative distance between them is ~25%.)
+ */
+const int sched_prio_to_weight[40] = {
+ /* -20 */ 88761, 71755, 56483, 46273, 36291,
+ /* -15 */ 29154, 23254, 18705, 14949, 11916,
+ /* -10 */  9548,  7620,  6100,  4904,  3906,
+ /*  -5 */  3121,  2501,  1991,  1586,  1277,
+ /*   0 */  1024,   820,   655,   526,   423,
+ /*   5 */   335,   272,   215,   172,   137,
+ /*  10 */   110,    87,    70,    56,    45,
+ /*  15 */    36,    29,    23,    18,    15,
+};
+
+/*
+ * Inverse (2^32/x) values of the sched_prio_to_weight[] array, precalculated.
+ *
+ * In cases where the weight does not change often, we can use the
+ * precalculated inverse to speed up arithmetics by turning divisions
+ * into multiplications:
+ */
+const u32 sched_prio_to_wmult[40] = {
+ /* -20 */     48388,     59856,     76040,     92818,    118348,
+ /* -15 */    147320,    184698,    229616,    287308,    360437,
+ /* -10 */    449829,    563644,    704093,    875809,   1099582,
+ /*  -5 */   1376151,   1717300,   2157191,   2708050,   3363326,
+ /*   0 */   4194304,   5237765,   6557202,   8165337,  10153587,
+ /*   5 */  12820798,  15790321,  19976592,  24970740,  31350126,
+ /*  10 */  39045157,  49367440,  61356676,  76695844,  95443717,
+ /*  15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
+};
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 84d4879..fbe9377 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1128,46 +1128,8 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 #define WEIGHT_IDLEPRIO 3
 #define WMULT_IDLEPRIO 1431655765
 
-/*
- * Nice levels are multiplicative, with a gentle 10% change for every
- * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
- * nice 1, it will get ~10% less CPU time than another CPU-bound task
- * that remained on nice 0.
- *
- * The "10% effect" is relative and cumulative: from _any_ nice level,
- * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
- * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
- * If a task goes up by ~10% and another task goes down by ~10% then
- * the relative distance between them is ~25%.)
- */
-static const 

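As a rough illustration of the precalculated-inverse comment in the patch above, the sketch below assumes each sched_prio_to_wmult[] entry is 2^32 divided by the matching weight, rounded to the nearest integer. That assumption reproduces the entries it was checked against (1024 -> 4194304, 820 -> 5237765, 88761 -> 48388, 15 -> 286331153) as well as WMULT_IDLEPRIO (1431655765) for WEIGHT_IDLEPRIO 3. The demo_* helpers are hypothetical, and the multiply-and-shift trick here ignores the 64-bit overflow handling that real scheduler code would need for large deltas.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: inverse weight 2^32 / w, rounded to the nearest
 * integer. Assumption: this is how the sched_prio_to_wmult[] entries
 * were derived. The result only fits in 32 bits for w >= 2.
 */
static uint32_t demo_inv_weight(uint32_t w)
{
	assert(w >= 2);
	return (uint32_t)(((1ULL << 32) + w / 2) / w);
}

/*
 * Hypothetical helper: approximate delta / w as (delta * wmult) >> 32,
 * i.e. one multiplication and one shift instead of a division.
 * No overflow handling: the caller must keep delta * wmult below 2^64.
 */
static uint64_t demo_div_by_weight(uint64_t delta, uint32_t wmult)
{
	return (delta * wmult) >> 32;
}
```

For example, demo_inv_weight(820) returns 5237765, the nice 1 entry of sched_prio_to_wmult[]. The multiply-and-shift result is exact for a power-of-two weight such as 1024; for other weights the product is truncated and only approximates true division.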
Re: [PATCH v2 02/13] bpf tools: Extract and collect map names from BPF object file

2015-11-29 Thread Wangnan (F)



On 2015/11/30 0:14, Namhyung Kim wrote:

Hi Wang,

On Fri, Nov 27, 2015 at 08:47:36AM +, Wang Nan wrote:

This patch collects the names of maps in BPF object files and saves them
into the 'maps' field in 'struct bpf_object'. 'bpf_object__get_map_by_name'
is introduced to retrieve the fd and definition of a map by its name.

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
  tools/lib/bpf/libbpf.c | 65 +++---
  tools/lib/bpf/libbpf.h |  3 +++
  2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f509825..a298614 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -165,6 +165,7 @@ struct bpf_program {
  
  struct bpf_map {

int fd;
+   char *name;
struct bpf_map_def def;
void *priv;
bpf_map_clear_priv_t clear_priv;
@@ -526,12 +527,46 @@ bpf_object__init_maps(struct bpf_object *obj, void *data,
return 0;
  }
  
+static void
+bpf_object__init_maps_name(struct bpf_object *obj, int maps_shndx)
+{
+   int i;
+   Elf_Data *symbols = obj->efile.symbols;
+
+   if (!symbols || maps_shndx < 0)
+   return;
+
+   for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
+   GElf_Sym sym;
+   size_t map_idx;
+   const char *map_name;
+
+   if (!gelf_getsym(symbols, i, &sym))
+   continue;
+   if (sym.st_shndx != maps_shndx)
+   continue;
+
+   map_name = elf_strptr(obj->efile.elf,
+ obj->efile.ehdr.e_shstrndx,
+ sym.st_name);

Does this mean that each map name is saved in the section header string table?


According to the ELF format specification:

For a symbol table entry, the st_name field "holds an index
into the object file’s symbol string table, which holds the
character representations of the symbol names. If the value
is non-zero, it represents a string table index that gives
the symbol name. Otherwise, the symbol table entry has no
name."

The so-called "object file’s symbol string table" is a
section in the object file whose index is stored in the
ELF header; it is loaded by gelf_getehdr() and its index
is set in ehdr->e_shstrndx. So I think each map's name
should be saved in that string table.
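The spec text quoted above can be made concrete with a small sketch that resolves an st_name offset against an in-memory string table: names are NUL-terminated strings stored back to back, st_name is a byte offset into that block, and 0 means the symbol has no name. demo_strtab_name() is a hypothetical helper, not a libelf call. In the gABI, which string table a symbol table's st_name values refer to is given by that symbol table section's sh_link field; the sketch simply takes the table as a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: resolve an st_name offset against a string table
 * held in memory. Returns NULL for "no name" (offset 0), for offsets
 * outside the table, and when the name is not NUL-terminated inside it.
 */
static const char *demo_strtab_name(const char *strtab, size_t size,
				    uint32_t st_name)
{
	assert(strtab != NULL);
	if (st_name == 0 || st_name >= size)
		return NULL;
	/* The name must end with a NUL byte inside the table. */
	if (memchr(strtab + st_name, '\0', size - st_name) == NULL)
		return NULL;
	return strtab + st_name;
}
```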




+   map_idx = sym.st_value / sizeof(struct bpf_map_def);
+   if (map_idx >= obj->nr_maps) {
+   pr_warning("index of map \"%s\" is buggy: %zu > %zu\n",
+  map_name, map_idx, obj->nr_maps);
+   continue;
+   }
+   obj->maps[map_idx].name = strdup(map_name);

You need to check the return value.


Will send a patch for it.

Thank you.
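The two review points on this hunk, the map index computed from st_value and the unchecked strdup(), can be combined into a sketch of the loop body. struct demo_map_def, struct demo_map and demo_set_map_name() are hypothetical stand-ins for libbpf's internal types (the four-field layout of the map definition is an assumption), and returning -ENOMEM is one plausible way to handle the failure, not necessarily what the follow-up patch does.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct bpf_map_def (layout assumed). */
struct demo_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Hypothetical stand-in for struct bpf_map: only the fields used here. */
struct demo_map {
	char *name;
	struct demo_map_def def;
};

/*
 * Attach a symbol's name to the map its st_value points at.
 * Returns 0 on success or when the symbol is skipped, -ENOMEM if
 * strdup() fails.
 */
static int demo_set_map_name(struct demo_map *maps, size_t nr_maps,
			     uint64_t st_value, const char *map_name)
{
	size_t map_idx;

	assert(maps != NULL && map_name != NULL);
	map_idx = st_value / sizeof(struct demo_map_def);
	if (map_idx >= nr_maps) {
		/* ">=" in the message matches the check above. */
		fprintf(stderr, "index of map \"%s\" is buggy: %zu >= %zu\n",
			map_name, map_idx, nr_maps);
		return 0;	/* skip, like the 'continue' in the patch */
	}
	maps[map_idx].name = strdup(map_name);
	if (!maps[map_idx].name)
		return -ENOMEM;	/* the return value check asked for */
	return 0;
}
```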



Re: [RFC PATCH 75/71] ncr5380: Remove FLAG_DTC3181E

2015-11-29 Thread Finn Thain

On Sun, 29 Nov 2015, Ondrej Zary wrote:

> The FLAG_DTC3181E is used to activate a work-around for arbitration lost
> condition that these chips see when ICR is written during arbitration.
> 
> Move the ICR write (to set SEL and BSY) after the arbitration loss check
> and remove FLAG_DTC3181E.

The first test for ICR_ARBITRATION_LOST happens after the required 
arbitration delay, 2.4 us. The second test for ICR_ARBITRATION_LOST 
happens after ICR_ASSERT_SEL.

This second test seems to be pointless. It comes from the flow chart in 
the NCR datasheet (see download link in patch 17). The spec does not 
require this test but some 5380 devices may do. Who knows? It's almost 
impossible to be sure, because it would mean losing a race with another 
bus device right at the end of the arbitration delay (and we extend that 
delay to 3 us anyway).

Certainly one can find other datasheets with sample code and flow charts 
that don't do this second check. The reason is that ICR_ARBITRATION_LOST 
can be triggered when SEL is asserted by any device, so it may be 
triggered after we've won arbitration (because we then set ICR_ASSERT_SEL 
ourselves in order to enter selection phase).

> 
> ... Weird, we now have two consecutive checks for ICR_ARBITRATION_LOST 
> and do different things when they fail...

They do different things because the second exit has to clean up after the
ICR write.

I agree that it would be nice to remove the DTC3181E special case. It 
would mean replacing patch 49.

The patch below is another version of your patch 75. It really needs to be
tested on all kinds of 5380 devices, and if possible with a contested bus
(which would imply disconnection privileges, for which the driver still
requires that the chip has a working irq).

Index: linux/drivers/scsi/NCR5380.c
===
--- linux.orig/drivers/scsi/NCR5380.c   2015-11-30 15:34:39.0 +1100
+++ linux/drivers/scsi/NCR5380.c   2015-11-30 15:34:39.0 +1100
@@ -482,14 +482,13 @@ static void prepare_info(struct Scsi_Hos
 "base 0x%lx, irq %d, "
 "can_queue %d, cmd_per_lun %d, "
 "sg_tablesize %d, this_id %d, "
-"flags { %s%s%s%s}, "
+"flags { %s%s%s}, "
 "options { %s} ",
 instance->hostt->name, instance->io_port, instance->n_io_port,
 instance->base, instance->irq,
 instance->can_queue, instance->cmd_per_lun,
 instance->sg_tablesize, instance->this_id,
 hostdata->flags & FLAG_NO_DMA_FIXUP  ? "NO_DMA_FIXUP "  : "",
-hostdata->flags & FLAG_DTC3181E  ? "DTC3181E "  : "",
 hostdata->flags & FLAG_NO_PSEUDO_DMA ? "NO_PSEUDO_DMA " : "",
 hostdata->flags & FLAG_TOSHIBA_DELAY ? "TOSHIBA_DELAY "  : "",
 #ifdef AUTOPROBE_IRQ
@@ -1085,18 +1084,6 @@ static struct scsi_cmnd *NCR5380_select(
NCR5380_write(INITIATOR_COMMAND_REG,
  ICR_BASE | ICR_ASSERT_SEL | ICR_ASSERT_BSY);
 
-   /* RvC: DTC3181E has some trouble with this so we simply removed it.
-* Seems to work with only Mustek scanner attached.
-*/
-   if (!(hostdata->flags & FLAG_DTC3181E) &&
-   (NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_LOST)) {
-   NCR5380_write(MODE_REG, MR_BASE);
-   NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
-   dsprintk(NDEBUG_ARBITRATION, instance, "arbitration lost, negating SEL\n");
-   spin_lock_irq(>lock);
-   goto out;
-   }
-
/*
 * Again, bus clear + bus settle time is 1.2us, however, this is
 * a minimum so we'll udelay ceil(1.2)
Index: linux/drivers/scsi/NCR5380.h
===
--- linux.orig/drivers/scsi/NCR5380.h   2015-11-30 15:34:36.0 +1100
+++ linux/drivers/scsi/NCR5380.h   2015-11-30 15:34:39.0 +1100
@@ -233,7 +233,6 @@
 
 #define FLAG_NO_DMA_FIXUP  1   /* No DMA errata workarounds */
 #define FLAG_NO_PSEUDO_DMA 8   /* Inhibit DMA */
-#define FLAG_DTC3181E  16  /* DTC3181E */
 #define FLAG_LATE_DMA_SETUP 32  /* Setup NCR before DMA H/W */
 #define FLAG_TAGGED_QUEUING 64  /* as X3T9.2 spelled it */
 #define FLAG_TOSHIBA_DELAY 128 /* Allow for borken CD-ROMs */
Index: linux/drivers/scsi/atari_NCR5380.c
===
--- linux.orig/drivers/scsi/atari_NCR5380.c 2015-11-30 15:34:39.0 +1100
+++ linux/drivers/scsi/atari_NCR5380.c  2015-11-30 15:34:39.0 +1100
@@ -586,13 +586,12 @@ static void prepare_info(struct Scsi_Hos
 "base 0x%lx, irq %d, "
 "can_queue %d, cmd_per_lun %d, "
 "sg_tablesize %d, this_id %d, "
-   

[PATCH] sched: Move sched_to_prio arrays out of line

2015-11-29 Thread Andi Kleen
From: Andi Kleen 

When building a kernel with a gcc 6 snapshot the compiler complains
about unused const static variables for prio_to_weight and prio_to_wmult
for multiple scheduler files (all but core.c and auto_group.c).

The way the arrays are currently declared, they will be duplicated in
every scheduler file that includes sched.h, which seems rather wasteful.

Move the arrays out of line into core.c. I also added a sched_ prefix
to avoid any potential namespace collisions.

Signed-off-by: Andi Kleen 
---
 kernel/sched/auto_group.c |  2 +-
 kernel/sched/core.c   | 43 ++-
 kernel/sched/sched.h  | 42 ++
 3 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index 750ed60..a5d966c 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -212,7 +212,7 @@ int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)
ag = autogroup_task_get(p);
 
down_write(&ag->lock);
-   err = sched_group_set_shares(ag->tg, prio_to_weight[nice + 20]);
+   err = sched_group_set_shares(ag->tg, sched_prio_to_weight[nice + 20]);
if (!err)
ag->nice = nice;
up_write(&ag->lock);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10..ce8fe56 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -822,7 +822,7 @@ static void set_load_weight(struct task_struct *p)
return;
}
 
-   load->weight = scale_load(prio_to_weight[prio]);
+   load->weight = scale_load(sched_prio_to_weight[prio]);
load->inv_weight = prio_to_wmult[prio];
 }
 
@@ -8458,3 +8458,44 @@ void dump_cpu_task(int cpu)
pr_info("Task dump for CPU %d:\n", cpu);
sched_show_task(cpu_curr(cpu));
 }
+
+/*
+ * Nice levels are multiplicative, with a gentle 10% change for every
+ * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
+ * nice 1, it will get ~10% less CPU time than another CPU-bound task
+ * that remained on nice 0.
+ *
+ * The "10% effect" is relative and cumulative: from _any_ nice level,
+ * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
+ * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
+ * If a task goes up by ~10% and another task goes down by ~10% then
+ * the relative distance between them is ~25%.)
+ */
+const int sched_prio_to_weight[40] = {
+ /* -20 */ 88761, 71755, 56483, 46273, 36291,
+ /* -15 */ 29154, 23254, 18705, 14949, 11916,
+ /* -10 */  9548,  7620,  6100,  4904,  3906,
+ /*  -5 */  3121,  2501,  1991,  1586,  1277,
+ /*   0 */  1024,   820,   655,   526,   423,
+ /*   5 */   335,   272,   215,   172,   137,
+ /*  10 */   110,    87,    70,    56,    45,
+ /*  15 */    36,    29,    23,    18,    15,
+};
+
+/*
+ * Inverse (2^32/x) values of the sched_prio_to_weight[] array, precalculated.
+ *
+ * In cases where the weight does not change often, we can use the
+ * precalculated inverse to speed up arithmetics by turning divisions
+ * into multiplications:
+ */
+const u32 sched_prio_to_wmult[40] = {
+ /* -20 */     48388,     59856,     76040,     92818,    118348,
+ /* -15 */    147320,    184698,    229616,    287308,    360437,
+ /* -10 */    449829,    563644,    704093,    875809,   1099582,
+ /*  -5 */   1376151,   1717300,   2157191,   2708050,   3363326,
+ /*   0 */   4194304,   5237765,   6557202,   8165337,  10153587,
+ /*   5 */  12820798,  15790321,  19976592,  24970740,  31350126,
+ /*  10 */  39045157,  49367440,  61356676,  76695844,  95443717,
+ /*  15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
+};
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 84d4879..fbe9377 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1128,46 +1128,8 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 #define WEIGHT_IDLEPRIO 3
 #define WMULT_IDLEPRIO 1431655765
 
-/*
- * Nice levels are multiplicative, with a gentle 10% change for every
- * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
- * nice 1, it will get ~10% less CPU time than another CPU-bound task
- * that remained on nice 0.
- *
- * The "10% effect" is relative and cumulative: from _any_ nice level,
- * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
- * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
- * If a task goes up by ~10% and another task goes down by ~10% then
- * the relative distance between them is ~25%.)
- */
-static const int prio_to_weight[40] = {
- /* -20 */ 88761, 71755, 56483, 46273, 36291,
- /* -15 */ 29154, 23254, 18705, 14949, 11916,
- /* -10 */  9548,  7620,  6100,   

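The nice-level comment in this patch can be checked numerically. The sketch below copies sched_prio_to_weight[] from the patch, maps a nice value to an index the same way the auto_group.c hunk does (nice + 20), and computes the CPU share of two always-runnable tasks if CPU time were split in proportion to weight. That proportional-split model and the demo_* names are illustrative assumptions, not scheduler code.

```c
#include <assert.h>

/* Copied from the sched_prio_to_weight[] table in the patch. */
static const int demo_prio_to_weight[40] = {
 /* -20 */ 88761, 71755, 56483, 46273, 36291,
 /* -15 */ 29154, 23254, 18705, 14949, 11916,
 /* -10 */  9548,  7620,  6100,  4904,  3906,
 /*  -5 */  3121,  2501,  1991,  1586,  1277,
 /*   0 */  1024,   820,   655,   526,   423,
 /*   5 */   335,   272,   215,   172,   137,
 /*  10 */   110,    87,    70,    56,    45,
 /*  15 */    36,    29,    23,    18,    15,
};

/* Nice values run from -20 to 19; the index is nice + 20. */
static int demo_weight_of_nice(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return demo_prio_to_weight[nice + 20];
}

/*
 * Percentage of CPU time task A gets while competing with task B,
 * assuming time is split in proportion to weight (simplified model).
 */
static double demo_share_percent(int nice_a, int nice_b)
{
	double wa = demo_weight_of_nice(nice_a);
	double wb = demo_weight_of_nice(nice_b);

	return 100.0 * wa / (wa + wb);
}

/* Weight ratio between neighbouring nice levels (about 1.25). */
static double demo_step_ratio(int nice)
{
	assert(nice <= 18);
	return (double)demo_weight_of_nice(nice) / demo_weight_of_nice(nice + 1);
}
```

Under this model a nice-0 task running against a nice-1 task gets about 55.5% of the CPU and the nice-1 task about 44.5%, the roughly 10% gap the comment describes; demo_step_ratio(0) is 1024/820, close to the 1.25 multiplier.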
Re: [PATCH] ALSA: pcm: constify action_ops structures

2015-11-29 Thread Takashi Sakamoto

Hi,

On Nov 30 2015 00:36, Julia Lawall wrote:

The action_ops structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall 


I think this approach better describes part of the design of the
actions for PCM substreams. It may help readers.


Reviewed-by: Takashi Sakamoto 
Tested-by: Takashi Sakamoto 


---
  sound/core/pcm_native.c |   26 +-
  1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index a8b27cd..fadd3eb 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -875,7 +875,7 @@ struct action_ops {
   *  Note: the stream state might be changed also on failure
   *  Note2: call with calling stream lock + link lock
   */
-static int snd_pcm_action_group(struct action_ops *ops,
+static int snd_pcm_action_group(const struct action_ops *ops,
struct snd_pcm_substream *substream,
int state, int do_lock)
  {
@@ -932,7 +932,7 @@ static int snd_pcm_action_group(struct action_ops *ops,
  /*
   *  Note: call with stream lock
   */
-static int snd_pcm_action_single(struct action_ops *ops,
+static int snd_pcm_action_single(const struct action_ops *ops,
 struct snd_pcm_substream *substream,
 int state)
  {
@@ -952,7 +952,7 @@ static int snd_pcm_action_single(struct action_ops *ops,
  /*
   *  Note: call with stream lock
   */
-static int snd_pcm_action(struct action_ops *ops,
+static int snd_pcm_action(const struct action_ops *ops,
  struct snd_pcm_substream *substream,
  int state)
  {
@@ -984,7 +984,7 @@ static int snd_pcm_action(struct action_ops *ops,
  /*
   *  Note: don't use any locks before
   */
-static int snd_pcm_action_lock_irq(struct action_ops *ops,
+static int snd_pcm_action_lock_irq(const struct action_ops *ops,
   struct snd_pcm_substream *substream,
   int state)
  {
@@ -998,7 +998,7 @@ static int snd_pcm_action_lock_irq(struct action_ops *ops,

  /*
   */
-static int snd_pcm_action_nonatomic(struct action_ops *ops,
+static int snd_pcm_action_nonatomic(const struct action_ops *ops,
struct snd_pcm_substream *substream,
int state)
  {
@@ -1056,7 +1056,7 @@ static void snd_pcm_post_start(struct snd_pcm_substream *substream, int state)
snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MSTART);
  }

-static struct action_ops snd_pcm_action_start = {
+static const struct action_ops snd_pcm_action_start = {
.pre_action = snd_pcm_pre_start,
.do_action = snd_pcm_do_start,
.undo_action = snd_pcm_undo_start,
@@ -1107,7 +1107,7 @@ static void snd_pcm_post_stop(struct snd_pcm_substream *substream, int state)
wake_up(&runtime->tsleep);
  }

-static struct action_ops snd_pcm_action_stop = {
+static const struct action_ops snd_pcm_action_stop = {
.pre_action = snd_pcm_pre_stop,
.do_action = snd_pcm_do_stop,
.post_action = snd_pcm_post_stop
@@ -1224,7 +1224,7 @@ static void snd_pcm_post_pause(struct snd_pcm_substream *substream, int push)
}
  }

-static struct action_ops snd_pcm_action_pause = {
+static const struct action_ops snd_pcm_action_pause = {
.pre_action = snd_pcm_pre_pause,
.do_action = snd_pcm_do_pause,
.undo_action = snd_pcm_undo_pause,
@@ -1273,7 +1273,7 @@ static void snd_pcm_post_suspend(struct snd_pcm_substream *substream, int state)
wake_up(&runtime->tsleep);
  }

-static struct action_ops snd_pcm_action_suspend = {
+static const struct action_ops snd_pcm_action_suspend = {
.pre_action = snd_pcm_pre_suspend,
.do_action = snd_pcm_do_suspend,
.post_action = snd_pcm_post_suspend
@@ -1375,7 +1375,7 @@ static void snd_pcm_post_resume(struct snd_pcm_substream *substream, int state)
snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MRESUME);
  }

-static struct action_ops snd_pcm_action_resume = {
+static const struct action_ops snd_pcm_action_resume = {
.pre_action = snd_pcm_pre_resume,
.do_action = snd_pcm_do_resume,
.undo_action = snd_pcm_undo_resume,
@@ -1478,7 +1478,7 @@ static void snd_pcm_post_reset(struct snd_pcm_substream *substream, int state)
snd_pcm_playback_silence(substream, ULONG_MAX);
  }

-static struct action_ops snd_pcm_action_reset = {
+static const struct action_ops snd_pcm_action_reset = {
.pre_action = snd_pcm_pre_reset,
.do_action = snd_pcm_do_reset,
.post_action = snd_pcm_post_reset
@@ -1522,7 +1522,7 @@ static void snd_pcm_post_prepare(struct snd_pcm_substream *substream, int state)
snd_pcm_set_state(substream, SNDRV_PCM_STATE_PREPARED);
  }

-static struct action_ops snd_pcm_action_prepare = {
+static const 

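The pattern this patch relies on, a table of function pointers that is never written and is only reached through a const pointer, can be sketched in plain C. The demo_* names are hypothetical. The pre/do/post/undo calling order is assumed to follow how snd_pcm_action_single() drives an action_ops table, with undo_action optional (snd_pcm_action_stop above defines none).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct action_ops. */
struct demo_action_ops {
	int (*pre_action)(int *st, int state);
	int (*do_action)(int *st, int state);
	void (*undo_action)(int *st, int state);	/* may be NULL */
	void (*post_action)(int *st, int state);
};

/* Runs one action; the ops table is only read, so it can be const. */
static int demo_action_single(const struct demo_action_ops *ops, int *st,
			      int state)
{
	int res;

	assert(ops->pre_action && ops->do_action && ops->post_action);
	res = ops->pre_action(st, state);
	if (res < 0)
		return res;
	res = ops->do_action(st, state);
	if (res == 0)
		ops->post_action(st, state);
	else if (ops->undo_action)
		ops->undo_action(st, state);
	return res;
}

static int demo_pre_ok(int *st, int state) { (void)st; (void)state; return 0; }
static int demo_do_set(int *st, int state) { *st = state; return 0; }
static int demo_do_fail(int *st, int state) { (void)st; (void)state; return -5; }
static void demo_undo(int *st, int state) { (void)state; *st = -1; }
static void demo_post(int *st, int state) { (void)state; *st += 100; }

/*
 * const tables are typically placed in read-only data, and assigning
 * to one of their members is rejected at compile time.
 */
static const struct demo_action_ops demo_ok_ops = {
	.pre_action = demo_pre_ok,
	.do_action = demo_do_set,
	.post_action = demo_post,
};

static const struct demo_action_ops demo_fail_ops = {
	.pre_action = demo_pre_ok,
	.do_action = demo_do_fail,	/* fails with an arbitrary -5 */
	.undo_action = demo_undo,
	.post_action = demo_post,
};
```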
Re: [PATCH 2/2] Minor improvement for smsc95xx netusb driver performance.

2015-11-29 Thread David Miller
From: Ameen 
Date: Wed, 25 Nov 2015 23:55:26 +0200

>   if (csum)
> - tx_cmd_b |= TX_CMD_B_CSUM_ENABLE;
> - cpu_to_le32s(&tx_cmd_b);
> - memcpy(skb->data, &tx_cmd_b, 4);
> +   tx_cmds.cmd_b |= TX_CMD_B_CSUM_ENABLE;

You've corrupted the indentation here.


[PATCH] livepatch: fix race between enabled_store() and klp_unregister_patch()

2015-11-29 Thread Li Bin
There is a potential race as follows:

CPU0                         |  CPU1
-----------------------------|-----------------------------
enabled_store()              |  klp_unregister_patch()
                             |  |-mutex_lock(&klp_mutex);
|-mutex_lock(&klp_mutex);    |  |-klp_free_patch();
                             |  |-mutex_unlock(&klp_mutex);
|-[process the patch's state]|
|-mutex_unlock(&klp_mutex)   |

Fix this race by adding a klp_is_patch_registered() check in
enabled_store() after taking klp_mutex.

Signed-off-by: Li Bin 
---
 kernel/livepatch/core.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index db545cb..50af971 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -614,6 +614,11 @@ static ssize_t enabled_store(struct kobject *kobj, struct kobj_attribute *attr,
 
mutex_lock(&klp_mutex);
 
+   if (!klp_is_patch_registered(patch)) {
+   ret = -EINVAL;
+   goto err;
+   }
+
if (val == patch->state) {
/* already in requested state */
ret = -EINVAL;
-- 
1.7.1

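The fix follows a common pattern: the "is it still registered?" check has to happen under the same lock that the unregister path holds while freeing, otherwise the patch can go away between the check and its use. Below is a userspace sketch with a pthread mutex. struct demo_patch, demo_unregister() and demo_enabled_store() are hypothetical, and registration is modelled as a flag rather than the kernel's lookup of the patch in its list of registered patches.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, much simplified patch object. */
struct demo_patch {
	bool registered;
	int state;		/* 0 = disabled, 1 = enabled */
};

static pthread_mutex_t demo_klp_mutex = PTHREAD_MUTEX_INITIALIZER;

static void demo_unregister(struct demo_patch *patch)
{
	pthread_mutex_lock(&demo_klp_mutex);
	patch->registered = false;	/* stands in for klp_free_patch() */
	pthread_mutex_unlock(&demo_klp_mutex);
}

static int demo_enabled_store(struct demo_patch *patch, int val)
{
	int ret = 0;

	assert(patch != NULL);
	if (val != 0 && val != 1)
		return -EINVAL;

	pthread_mutex_lock(&demo_klp_mutex);
	/* Checked after taking the lock, so it cannot go stale before use. */
	if (!patch->registered) {
		ret = -EINVAL;
		goto out;
	}
	if (val == patch->state) {
		ret = -EINVAL;	/* already in requested state */
		goto out;
	}
	patch->state = val;
out:
	pthread_mutex_unlock(&demo_klp_mutex);
	return ret;
}
```

Because the registration check and the state change sit in one critical section, a concurrent demo_unregister() either finishes before the check (and the store fails with -EINVAL) or waits until the store is done.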


Re: [PATCH net] drivers: net: xgene: fix possible use after free

2015-11-29 Thread David Miller
From: Eric Dumazet 
Date: Wed, 25 Nov 2015 09:02:10 -0800

> From: Eric Dumazet 
> 
> Once TX has been enabled on a NIC, it is illegal to access the skb,
> as it might already have been freed by the TX completion handler
> running on another CPU.
> 
> Signed-off-by: Eric Dumazet 
> Reported-by: Mark Rutland 
> Tested-by: Mark Rutland 

Applied, thanks Eric.


Re: [PATCH] atm: solos-pci: Replace simple_strtol by kstrtoint

2015-11-29 Thread David Miller
From: LABBE Corentin 
Date: Wed, 25 Nov 2015 14:44:41 +0100

> The simple_strtol function is obsolete.
> This patch replaces it with kstrtoint.
> This will simplify the code, since some error cases not handled by
> simple_strtol are handled by kstrtoint.
> 
> Signed-off-by: LABBE Corentin 

kstrtoint() actually returns an accurate error code, so please
use it.
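The point here is that kstrtoint() already says why parsing failed, so the caller should pass that code on. A userspace approximation of the contract is sketched below; demo_kstrtoint() is a hypothetical name built on strtol(), and it only approximates the kernel function's rules: no leading whitespace, one optional trailing newline, -EINVAL for malformed input and -ERANGE for values that do not fit in an int.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical userspace approximation of kstrtoint(): returns 0 and
 * stores the value on success, -EINVAL for malformed input, -ERANGE
 * when the value does not fit in an int.
 */
static int demo_kstrtoint(const char *s, int base, int *res)
{
	char *end;
	long val;

	assert(s != NULL && res != NULL);
	/* strtol() would skip leading whitespace; reject it instead. */
	if (isspace((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;			/* allow a single trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -ERANGE;
	*res = (int)val;
	return 0;
}
```

A caller following this advice forwards the code rather than substituting its own, e.g. ret = demo_kstrtoint(buf, 10, &val); if (ret) return ret;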

