[PATCHSET 00/24] perf tools: Add support to accumulate hist periods (v7)

2014-01-22 Thread Namhyung Kim
Hello,

This is a new attempt to implement cumulative hist period report.
This work begins from Arun's SORT_INCLUSIVE patch [1] but I completely
rewrote it from scratch.

This patchset is based on my previous patchset [2] but I think it's
almost independent so that it can be applied separately.

Please see patch 03/24.  I refactored the functions that add hist
entries to use struct hist_entry_iter.  While I converted all functions
carefully, it'd be better if anyone could test and confirm that I didn't
mess something up - especially for branch stack and mem stuff.

This patchset basically adds the period of a sample to every node in the
callchain.  A hist_entry now has additional fields to keep the
cumulative period if the --children option is given to perf report.

I changed the option as a separate --children and added a new
"Children" column (and renamed the default "Overhead" column into
"Self").  The output will be sorted by children (cumulative) overhead
for now.  The reason I changed to the --children is that I still think
it's much different from other --call-graph options.  The --call-graph
option will take care of it even with --children option.

I know that the UI should be changed also to be more flexible as Ingo
requested, but I'd like to do this first and then move to work on the
next.  I also added a new config option to enable it by default.

 * changes in v7:
  - add Tested-by tags from Arun
  - rebase onto current acme/perf/core

 * changes in v6:
  - separate struct hist_iter_ops (Jiri)
  - check iter->he before calling ->add_entry_cb (Jiri)
  - fix locking issue on perf top (Jiri)

 * changes in v5:
  - support both of --children and --call-graph (Arun)
  - refactor hist_entry_iter to share with perf top (Jiri)
  - various cleanups and fixes (Jiri)
  - add ack's from Jiri

 * changes in v4:
  - change to --children option (Ingo)
  - rebased on new annotation change (Arnaldo)
  - support perf top also
  - enable --children option by default (Ingo)

 * changes in v3:
  - change to --cumulate option
  - fix a couple of bugs (Jiri, Rodrigo)
  - rename some help functions (Arnaldo)
  - cache previous hist entries rather than just symbol and dso
  - add some preparatory cleanups
  - add report.cumulate config option


Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")

  void a(void)
  {
          int i;
          for (i = 0; i < 100; i++)
                  barrier();
  }
  void b(void)
  {
          a();
  }
  void c(void)
  {
          b();
  }
  int main(void)
  {
          c();
          return 0;
  }

With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc


Case 1.

  $ perf report --stdio --no-call-graph --no-children

  # Overhead  Command      Shared Object      Symbol
  # ........  .......  .................  ..........
  #
      91.50%      abc  abc                [.] a
       8.18%      abc  ld-2.17.so         [.] strlen
       0.31%      abc  [kernel.kallsyms]  [k] page_fault
       0.01%      abc  ld-2.17.so         [.] _start


Case 2. (current default behavior)

  $ perf report --stdio --call-graph --no-children

  # Overhead  Command      Shared Object      Symbol
  # ........  .......  .................  ..........
  #
      91.50%      abc  abc                [.] a
                  |
                  --- a
                      b
                      c
                      main
                      __libc_start_main

       8.18%      abc  ld-2.17.so         [.] strlen
                  |
                  --- strlen
                      _dl_sysdep_start

       0.31%      abc  [kernel.kallsyms]  [k] page_fault
                  |
                  --- page_fault
                      _start

       0.01%      abc  ld-2.17.so         [.] _start
                  |
                  --- _start


Case 3.

  $ perf report --no-call-graph --children --stdio

  #     Self  Children  Command      Shared Object                 Symbol
  # ........  ........  .......  .................  .....................
  #
       0.00%    91.50%      abc  libc-2.17.so       [.] __libc_start_main
       0.00%    91.50%      abc  abc                [.] main
       0.00%    91.50%      abc  abc                [.] c
       0.00%    91.50%      abc  abc                [.] b
      91.50%    91.50%      abc  abc                [.] a
       0.00%     8.18%      abc  ld-2.17.so         [.] _dl_sysdep_start
       8.18%     8.18%      abc  ld-2.17.so         [.] strlen
       0.01%     0.33%      abc  ld-2.17.so         [.] _start
       0.31%     0.31%      abc  [kernel.kallsyms]  [k] page_fault

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.
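
To make the Self/Children split concrete, here is a minimal userspace
sketch of the accounting.  It is not the perf implementation: the
entry_sketch and hist_sketch types and both helpers are hypothetical
names made up for illustration, and recursive callchains are not
handled.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical, simplified stand-in for a hist entry: one per symbol. */
struct entry_sketch {
	const char *sym;
	unsigned long self;      /* period of samples that hit this symbol */
	unsigned long children;  /* self plus period of samples in callees */
};

struct hist_sketch {
	struct entry_sketch entries[MAX_SYMS];
	int nr;
};

/* Find the entry for @sym, creating a zeroed one if it does not exist. */
static struct entry_sketch *hist_find(struct hist_sketch *h, const char *sym)
{
	for (int i = 0; i < h->nr; i++)
		if (strcmp(h->entries[i].sym, sym) == 0)
			return &h->entries[i];

	assert(h->nr < MAX_SYMS);
	h->entries[h->nr] = (struct entry_sketch){ .sym = sym };
	return &h->entries[h->nr++];
}

/*
 * Account one sample.  chain[0] is the sampled symbol and the remaining
 * elements are its callers.  Only chain[0] gets "self"; every node in
 * the chain gets "children".  Recursive chains (the same symbol twice)
 * are not deduplicated here.
 */
static void hist_add_sample(struct hist_sketch *h, const char *const *chain,
			    int depth, unsigned long period)
{
	for (int i = 0; i < depth; i++) {
		struct entry_sketch *e = hist_find(h, chain[i]);

		if (i == 0)
			e->self += period;
		e->children += period;
	}
}
```

Feeding it the chain { "a", "b", "c", "main" } leaves "main" with a self
period of zero and a children period equal to that of "a", which is the
shape of the Case 3 output.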

Finally, it looks like below with both options enabled:

Case 4. (default behavior?)

  $ perf report --call-graph 

[PATCH 04/21] perf hists: Accumulate hist entry stat based on the callchain

2014-01-22 Thread Namhyung Kim
Call __hists__add_entry() for each callchain node to get an
accumulated stat for an entry.  Introduce new cumulative_iter ops to
process them properly.
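
As a rough illustration of how such an ops table can be driven, here is
a userspace sketch.  It assumes the callbacks run in the order their
names suggest (prepare, add single, then next/add next in a loop, then
finish); the walk_sketch types and the cum_* helpers are hypothetical
and only count periods instead of touching real hist entries.

```c
#include <stddef.h>

struct walk_sketch;

/* Same callback names as hist_iter_ops; the signatures are simplified. */
struct walk_ops_sketch {
	int (*prepare_entry)(struct walk_sketch *w);
	int (*add_single_entry)(struct walk_sketch *w);
	int (*next_entry)(struct walk_sketch *w);   /* nonzero while nodes remain */
	int (*add_next_entry)(struct walk_sketch *w);
	int (*finish_entry)(struct walk_sketch *w);
};

struct walk_sketch {
	const struct walk_ops_sketch *ops;
	size_t depth;                /* callchain nodes to visit */
	size_t pos;                  /* next node to visit */
	unsigned long period;        /* period of the current sample */
	unsigned long self_sum;      /* added by add_single_entry only */
	unsigned long acc_sum;       /* added for every entry */
	unsigned long total_period;  /* updated once per sample */
};

static int cum_prepare(struct walk_sketch *w)
{
	w->pos = 0;                  /* rewind to the first callchain node */
	return 0;
}

static int cum_add_single(struct walk_sketch *w)
{
	w->self_sum += w->period;    /* the sampled entry counts as "self" */
	w->acc_sum += w->period;
	return 0;
}

static int cum_next(struct walk_sketch *w)
{
	if (w->pos >= w->depth)
		return 0;
	w->pos++;
	return 1;
}

static int cum_add_next(struct walk_sketch *w)
{
	w->acc_sum += w->period;     /* callchain nodes only accumulate */
	return 0;
}

static int cum_finish(struct walk_sketch *w)
{
	w->total_period += w->period;
	return 0;
}

static const struct walk_ops_sketch cumulative_ops_sketch = {
	.prepare_entry    = cum_prepare,
	.add_single_entry = cum_add_single,
	.next_entry       = cum_next,
	.add_next_entry   = cum_add_next,
	.finish_entry     = cum_finish,
};

/* Drive one sample through the ops table. */
static int walk_sample(struct walk_sketch *w)
{
	int err, err2;

	err = w->ops->prepare_entry(w);
	if (err)
		return err;

	err = w->ops->add_single_entry(w);
	while (!err && w->ops->next_entry(w))
		err = w->ops->add_next_entry(w);

	err2 = w->ops->finish_entry(w);
	return err ? err : err2;
}
```

The point of the table is that the driver loop stays the same for the
normal, branch, mem and cumulative cases; only the callbacks differ.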

Tested-by: Arun Sharma 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-report.c |  2 ++
 tools/perf/util/hist.c  | 87 +
 tools/perf/util/hist.h  |  1 +
 3 files changed, 90 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index b6618ecb474a..3ed0669d7620 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -114,6 +114,8 @@ static int process_sample_event(struct perf_tool *tool,
iter.ops = &hist_iter_branch;
else if (rep->mem_mode == 1)
iter.ops = &hist_iter_mem;
+   else if (symbol_conf.cumulate_callchain)
+   iter.ops = &hist_iter_cumulative;
else
iter.ops = &hist_iter_normal;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 2e9dd5d4ca1d..46402fbf4c0e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -704,6 +704,85 @@ iter_finish_normal_entry(struct hist_entry_iter *iter, 
struct addr_location *al)
 
return hist_entry__append_callchain(he, sample);
 }
+static int
+iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
+ struct addr_location *al __maybe_unused)
+{
+   callchain_cursor_commit(&callchain_cursor);
+   return 0;
+}
+
+static int
+iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
+struct addr_location *al)
+{
+   struct perf_evsel *evsel = iter->evsel;
+   struct perf_sample *sample = iter->sample;
+   struct hist_entry *he;
+
+   he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+   sample->period, sample->weight,
+   sample->transaction, true);
+   if (he == NULL)
+   return -ENOMEM;
+
+   return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_next_cumulative_entry(struct hist_entry_iter *iter,
+  struct addr_location *al)
+{
+   struct callchain_cursor_node *node;
+
+   node = callchain_cursor_current(&callchain_cursor);
+   if (node == NULL)
+   return 0;
+
+   al->map = node->map;
+   al->sym = node->sym;
+   if (node->map)
+   al->addr = node->map->map_ip(node->map, node->ip);
+   else
+   al->addr = node->ip;
+
+   if (iter->hide_unresolved && al->sym == NULL)
+   return 0;
+
+   callchain_cursor_advance(&callchain_cursor);
+   return 1;
+}
+
+static int
+iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
+  struct addr_location *al)
+{
+   struct perf_evsel *evsel = iter->evsel;
+   struct perf_sample *sample = iter->sample;
+   struct hist_entry *he;
+
+   he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+   sample->period, sample->weight,
+   sample->transaction, false);
+   if (he == NULL)
+   return -ENOMEM;
+
+   return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_finish_cumulative_entry(struct hist_entry_iter *iter,
+struct addr_location *al __maybe_unused)
+{
+   struct perf_evsel *evsel = iter->evsel;
+   struct perf_sample *sample = iter->sample;
+
+   evsel->hists.stats.total_period += sample->period;
+   hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+   return 0;
+}
+
 const struct hist_iter_ops hist_iter_mem = {
.prepare_entry  = iter_prepare_mem_entry,
.add_single_entry   = iter_add_single_mem_entry,
@@ -728,6 +807,14 @@ const struct hist_iter_ops hist_iter_normal = {
.finish_entry   = iter_finish_normal_entry,
 };
 
+const struct hist_iter_ops hist_iter_cumulative = {
+   .prepare_entry  = iter_prepare_cumulative_entry,
+   .add_single_entry   = iter_add_single_cumulative_entry,
+   .next_entry = iter_next_cumulative_entry,
+   .add_next_entry = iter_add_next_cumulative_entry,
+   .finish_entry   = iter_finish_cumulative_entry,
+};
+
 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location 
*al,
 struct perf_evsel *evsel, const union perf_event 
*event,
 struct perf_sample *sample, int max_stack_depth)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index d482e673ecf5..091bf81df8c3 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -120,6 +120,7 @@ struct hist_entry_iter {
 extern const struct hist_iter_ops hist_iter_normal;
 extern const struct hist_iter_ops hist_iter_branch;
 extern const struct hist_iter_ops hist_iter_mem;
+extern const 

[PATCH 03/21] perf hists: Check if accumulated when adding a hist entry

2014-01-22 Thread Namhyung Kim
To support callchain accumulation, add_hist_entry() needs to know
whether @entry is accumulated or not when it is called.  The period of
an accumulated entry should be added to ->stat_acc but not to ->stat.
Add a @sample_self arg for that.
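
A small userspace sketch of that rule follows.  The *_sketch types and
the entry_* helpers are hypothetical; only the ->stat / ->stat_acc
split and the sample_self flag come from the patch, and the update path
is an assumption about how existing entries would be handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct he_stat_sketch {
	unsigned long period;
	unsigned long nr_events;
};

struct hist_entry_sketch {
	struct he_stat_sketch stat;       /* self period */
	struct he_stat_sketch *stat_acc;  /* accumulated (children) period */
};

/* Create an entry for a first sample with the given period. */
static struct hist_entry_sketch *entry_new(unsigned long period,
					   bool sample_self)
{
	struct hist_entry_sketch *he = calloc(1, sizeof(*he));

	if (he == NULL)
		return NULL;

	he->stat.period = period;
	he->stat.nr_events = 1;

	he->stat_acc = malloc(sizeof(*he->stat_acc));
	if (he->stat_acc == NULL) {
		free(he);
		return NULL;
	}
	/* The accumulated stat always starts as a copy of the sample... */
	memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
	/* ...but the self stat is cleared for a purely accumulated entry. */
	if (!sample_self)
		memset(&he->stat, 0, sizeof(he->stat));

	return he;
}

/* Add another sample to an existing entry. */
static void entry_add_period(struct hist_entry_sketch *he,
			     unsigned long period, bool sample_self)
{
	assert(he != NULL && he->stat_acc != NULL);

	if (sample_self) {
		he->stat.period += period;
		he->stat.nr_events++;
	}
	he->stat_acc->period += period;
	he->stat_acc->nr_events++;
}

static void entry_free(struct hist_entry_sketch *he)
{
	if (he != NULL) {
		free(he->stat_acc);
		free(he);
	}
}
```

All existing callers pass true, so their behaviour is unchanged; only
the cumulative iterator is expected to pass false for callchain nodes.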

Tested-by: Arun Sharma 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-annotate.c |  3 ++-
 tools/perf/builtin-diff.c |  2 +-
 tools/perf/builtin-top.c  |  2 +-
 tools/perf/tests/hists_link.c |  4 ++--
 tools/perf/util/hist.c| 29 ++---
 tools/perf/util/hist.h|  3 ++-
 6 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 0da603b79b61..70b2d52c3b2e 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -65,7 +65,8 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}
 
-   he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0);
+   he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL, 1, 1, 0,
+   true);
if (he == NULL)
return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index a77e31246c00..93912add75b5 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -308,7 +308,7 @@ static int hists__add_entry(struct hists *hists,
u64 weight, u64 transaction)
 {
if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
-  transaction) != NULL)
+  transaction, true) != NULL)
return 0;
return -ENOMEM;
 }
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 76cd510d34d0..c574c291383c 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -245,7 +245,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct 
perf_evsel *evsel,
pthread_mutex_lock(&evsel->hists.lock);
he = __hists__add_entry(&evsel->hists, al, NULL, NULL, NULL,
sample->period, sample->weight,
-   sample->transaction);
+   sample->transaction, true);
pthread_mutex_unlock(&evsel->hists.lock);
if (he == NULL)
return NULL;
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 2b6519e0e36f..e4e931ec1dbb 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, 
struct machine *machine)
goto out;
 
he = __hists__add_entry(>hists, , NULL,
-   NULL, NULL, 1, 1, 0);
+   NULL, NULL, 1, 1, 0, true);
if (he == NULL)
goto out;
 
@@ -246,7 +246,7 @@ static int add_hist_entries(struct perf_evlist *evlist, 
struct machine *machine)
goto out;
 
he = __hists__add_entry(>hists, , NULL,
-   NULL, NULL, 1, 1, 0);
+   NULL, NULL, 1, 1, 0, true);
if (he == NULL)
goto out;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 45a962f40cea..2e9dd5d4ca1d 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -272,7 +272,8 @@ void hists__decay_entries(struct hists *hists, bool 
zap_user, bool zap_kernel)
  * histogram, sorted on item, collects periods
  */
 
-static struct hist_entry *hist_entry__new(struct hist_entry *template)
+static struct hist_entry *hist_entry__new(struct hist_entry *template,
+ bool sample_self)
 {
size_t callchain_size = 0;
struct hist_entry *he;
@@ -292,6 +293,8 @@ static struct hist_entry *hist_entry__new(struct hist_entry 
*template)
return NULL;
}
memcpy(he->stat_acc, &he->stat, sizeof(he->stat));
+   if (!sample_self)
+   memset(&he->stat, 0, sizeof(he->stat));
}
 
if (he->ms.map)
@@ -354,7 +357,8 @@ static u8 symbol__parent_filter(const struct symbol *parent)
 
 static struct hist_entry *add_hist_entry(struct hists *hists,
 struct hist_entry *entry,
-struct addr_location *al)
+struct addr_location *al,
+bool sample_self)
 {
struct rb_node **p;
struct rb_node *parent = NULL;
@@ -378,7 +382,8 @@ static struct hist_entry *add_hist_entry(struct hists 
*hists,
cmp = 

[PATCH 05/21] perf tools: Update cpumode for each cumulative entry

2014-01-22 Thread Namhyung Kim
The cpumode and level in struct addr_location were set for a sample
but not updated as cumulative callchain entries were added.  This led
to a non-matching symbol and cpumode in the output.

Update them for each entry based on whether the map is part of the
kernel or not.  This is the reverse of what thread__find_addr_map()
does.
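
The mapping can be summarized with a small standalone sketch.  The enum
and the classify_map() helper are hypothetical stand-ins (the patch
itself uses the PERF_RECORD_MISC_* constants and checks the map's
groups against the machine's kmaps); only the decision table is taken
from the patch.

```c
#include <stdbool.h>

/* Hypothetical mirror of the PERF_RECORD_MISC_* cpumode values. */
enum cpumode_sketch {
	MODE_KERNEL,
	MODE_USER,
	MODE_GUEST_KERNEL,
	MODE_GUEST_USER,
	MODE_HYPERVISOR,
};

struct origin_sketch {
	enum cpumode_sketch cpumode;
	char level;
};

/*
 * Derive cpumode and level from where the map lives:
 *   kernel map, host machine  -> 'k'
 *   kernel map, guest machine -> 'g'
 *   user map, host machine    -> '.'
 *   user map, guest machine   -> 'u' if guest profiling is on, else 'H'
 */
static struct origin_sketch classify_map(bool map_in_kmaps,
					 bool machine_is_host,
					 bool perf_guest)
{
	struct origin_sketch o;

	if (map_in_kmaps) {
		if (machine_is_host) {
			o.cpumode = MODE_KERNEL;
			o.level = 'k';
		} else {
			o.cpumode = MODE_GUEST_KERNEL;
			o.level = 'g';
		}
	} else if (machine_is_host) {
		o.cpumode = MODE_USER;
		o.level = '.';
	} else if (perf_guest) {
		o.cpumode = MODE_GUEST_USER;
		o.level = 'u';
	} else {
		o.cpumode = MODE_HYPERVISOR;
		o.level = 'H';
	}

	return o;
}
```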

Tested-by: Arun Sharma 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/callchain.c | 42 ++
 tools/perf/util/callchain.h |  2 ++
 tools/perf/util/hist.c  | 13 ++---
 3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 8d9db454f1a9..ac658135079f 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -551,3 +551,45 @@ int hist_entry__append_callchain(struct hist_entry *he, 
struct perf_sample *samp
return 0;
return callchain_append(he->callchain, &callchain_cursor, sample->period);
 }
+
+int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node 
*node,
+   bool hide_unresolved)
+{
+   al->map = node->map;
+   al->sym = node->sym;
+   if (node->map)
+   al->addr = node->map->map_ip(node->map, node->ip);
+   else
+   al->addr = node->ip;
+
+   if (al->sym == NULL) {
+   if (hide_unresolved)
+   return 0;
+   if (al->map == NULL)
+   goto out;
+   }
+
+   if (al->map->groups == &al->machine->kmaps) {
+   if (machine__is_host(al->machine)) {
+   al->cpumode = PERF_RECORD_MISC_KERNEL;
+   al->level = 'k';
+   } else {
+   al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+   al->level = 'g';
+   }
+   } else {
+   if (machine__is_host(al->machine)) {
+   al->cpumode = PERF_RECORD_MISC_USER;
+   al->level = '.';
+   } else if (perf_guest) {
+   al->cpumode = PERF_RECORD_MISC_GUEST_USER;
+   al->level = 'u';
+   } else {
+   al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
+   al->level = 'H';
+   }
+   }
+
+out:
+   return 1;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 8ad97e9b119f..66faae21370d 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -155,6 +155,8 @@ int sample__resolve_callchain(struct perf_sample *sample, 
struct symbol **parent
  struct perf_evsel *evsel, struct addr_location 
*al,
  int max_stack);
 int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample 
*sample);
+int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node 
*node,
+   bool hide_unresolved);
 
 extern const char record_callchain_help[];
 #endif /* __PERF_CALLCHAIN_H */
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 46402fbf4c0e..beb9f96e4e4f 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -739,18 +739,9 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
if (node == NULL)
return 0;
 
-   al->map = node->map;
-   al->sym = node->sym;
-   if (node->map)
-   al->addr = node->map->map_ip(node->map, node->ip);
-   else
-   al->addr = node->ip;
-
-   if (iter->hide_unresolved && al->sym == NULL)
-   return 0;
-
callchain_cursor_advance(&callchain_cursor);
-   return 1;
+
+   return fill_callchain_info(al, node, iter->hide_unresolved);
 }
 
 static int
-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 07/21] perf callchain: Add callchain_cursor_snapshot()

2014-01-22 Thread Namhyung Kim
callchain_cursor_snapshot() saves the current state of the callchain
cursor.  It'll be used to accumulate callchain information for each node.
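
To show what the snapshot captures, here is a userspace sketch on a
simplified cursor.  The node_sketch and cursor_sketch types and the
cursor_advance() and cursor_commit() helpers are hypothetical
reductions of the real structures; cursor_snapshot() follows the body
of the helper added by this patch.

```c
#include <assert.h>
#include <stddef.h>

struct node_sketch {
	unsigned long ip;
	struct node_sketch *next;
};

struct cursor_sketch {
	unsigned long nr;            /* nodes reachable from first */
	struct node_sketch *first;
	unsigned long pos;           /* nodes already consumed */
	struct node_sketch *curr;    /* next node to consume */
};

/* Consume one node. */
static void cursor_advance(struct cursor_sketch *c)
{
	assert(c->curr != NULL);
	c->curr = c->curr->next;
	c->pos++;
}

/* Rewind to the start of the (possibly shortened) chain. */
static void cursor_commit(struct cursor_sketch *c)
{
	c->curr = c->first;
	c->pos = 0;
}

/*
 * Copy @src, but make the copy start at the node @src is currently
 * pointing at, so that it only covers the not-yet-consumed part of the
 * chain.  The nodes themselves are shared, not copied.
 */
static void cursor_snapshot(struct cursor_sketch *dest,
			    const struct cursor_sketch *src)
{
	*dest = *src;

	dest->first = src->curr;
	dest->nr -= src->pos;
}
```

Note that the copy still carries the source's pos, so in this sketch a
snapshot has to be committed (rewound) before it is walked.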

Tested-by: Arun Sharma 
Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/callchain.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 66faae21370d..bbd63dfbe112 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -159,4 +159,13 @@ int fill_callchain_info(struct addr_location *al, struct 
callchain_cursor_node *
bool hide_unresolved);
 
 extern const char record_callchain_help[];
+
+static inline void callchain_cursor_snapshot(struct callchain_cursor *dest,
+struct callchain_cursor *src)
+{
+   *dest = *src;
+
+   dest->first = src->curr;
+   dest->nr -= src->pos;
+}
 #endif /* __PERF_CALLCHAIN_H */
-- 
1.7.11.7



linux-next: manual merge of the arm-soc tree with Linus' tree

2014-01-22 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
arch/arm/boot/dts/bcm11351.dtsi between commit 67a57be85e68 ("ARM:
bcm11351: Enable pinctrl for Broadcom Capri SoCs") from Linus' tree and
commit 0bd898b872ac ("ARM: dts: Declare clocks as fixed on bcm11351") and
several following commits from the arm-soc tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/arm/boot/dts/bcm11351.dtsi
index dd8e878741c0,375a2f8eb878..
--- a/arch/arm/boot/dts/bcm11351.dtsi
+++ b/arch/arm/boot/dts/bcm11351.dtsi
@@@ -142,8 -146,159 +146,164 @@@
status = "disabled";
};
  
 +  pinctrl@35004800 {
 +  compatible = "brcm,capri-pinctrl";
 +  reg = <0x35004800 0x430>;
 +  };
++
+   i2c@3e016000 {
+   compatible = "brcm,bcm11351-i2c", "brcm,kona-i2c";
+   reg = <0x3e016000 0x80>;
+   interrupts = ;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   clocks = <_clk>;
+   status = "disabled";
+   };
+ 
+   i2c@3e017000 {
+   compatible = "brcm,bcm11351-i2c", "brcm,kona-i2c";
+   reg = <0x3e017000 0x80>;
+   interrupts = ;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   clocks = <_clk>;
+   status = "disabled";
+   };
+ 
+   i2c@3e018000 {
+   compatible = "brcm,bcm11351-i2c", "brcm,kona-i2c";
+   reg = <0x3e018000 0x80>;
+   interrupts = ;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   clocks = <_clk>;
+   status = "disabled";
+   };
+ 
+   i2c@3500d000 {
+   compatible = "brcm,bcm11351-i2c", "brcm,kona-i2c";
+   reg = <0x3500d000 0x80>;
+   interrupts = ;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   clocks = <_bsc_clk>;
+   status = "disabled";
+   };
+ 
+   clocks {
+   bsc1_clk: bsc1 {
+   compatible = "fixed-clock";
+   clock-frequency = <1300>;
+   #clock-cells = <0>;
+   };
+ 
+   bsc2_clk: bsc2 {
+   compatible = "fixed-clock";
+   clock-frequency = <1300>;
+   #clock-cells = <0>;
+   };
+ 
+   bsc3_clk: bsc3 {
+   compatible = "fixed-clock";
+   clock-frequency = <1300>;
+   #clock-cells = <0>;
+   };
+ 
+   pmu_bsc_clk: pmu_bsc {
+   compatible = "fixed-clock";
+   clock-frequency = <1300>;
+   #clock-cells = <0>;
+   };
+ 
+   hub_timer_clk: hub_timer {
+   compatible = "fixed-clock";
+   clock-frequency = <32768>;
+   #clock-cells = <0>;
+   };
+ 
+   pwm_clk: pwm {
+   compatible = "fixed-clock";
+   clock-frequency = <2600>;
+   #clock-cells = <0>;
+   };
+ 
+   sdio1_clk: sdio1 {
+   compatible = "fixed-clock";
+   clock-frequency = <4800>;
+   #clock-cells = <0>;
+   };
+ 
+   sdio2_clk: sdio2 {
+   compatible = "fixed-clock";
+   clock-frequency = <4800>;
+   #clock-cells = <0>;
+   };
+ 
+   sdio3_clk: sdio3 {
+   compatible = "fixed-clock";
+   clock-frequency = <4800>;
+   #clock-cells = <0>;
+   };
+ 
+   sdio4_clk: sdio4 {
+   compatible = "fixed-clock";
+   clock-frequency = <4800>;
+   #clock-cells = <0>;
+   };
+ 
+   tmon_1m_clk: tmon_1m {
+   compatible = "fixed-clock";
+   clock-frequency = <100>;
+   #clock-cells = <0>;
+   };
+ 
+   uartb_clk: uartb {
+   compatible = "fixed-clock";
+   clock-frequency = <1300>;
+   #clock-cells = <0>;
+   };
+ 
+   uartb2_clk: uartb2 {
+   compatible = "fixed-clock";
+   clock-frequency = <1300>;
+   #clock-cells = <0>;
+   };
+ 
+   uartb3_clk: uartb3 {
+   compatible = "fixed-clock";
+   clock-frequency = 

Re: randconfig build error with next-20140122, in arch/x86/kernel/devicetree.c

2014-01-22 Thread Paul Gortmaker
On Wed, Jan 22, 2014 at 12:06 PM, Randy Dunlap  wrote:
> On 01/22/2014 08:34 AM, Jim Davis wrote:
>> Building with the attached random configuration file,
>>
>> warning: (X86_INTEL_MID) selects INTEL_SCU_IPC which has unmet direct
>> dependencies (X86 && X86_PLATFORM_DEVICES && X86_INTEL_MID)
>> warning: (USB_OTG_FSM && FSL_USB2_OTG && USB_MV_OTG) selects USB_OTG
>> which has unmet direct dependencies (USB_SUPPORT && USB && PM_RUNTIME)
>> warning: (X86_INTEL_MID) selects INTEL_SCU_IPC which has unmet direct
>> dependencies (X86 && X86_PLATFORM_DEVICES && X86_INTEL_MID)
>> warning: (USB_OTG_FSM && FSL_USB2_OTG && USB_MV_OTG) selects USB_OTG
>> which has unmet direct dependencies (USB_SUPPORT && USB && PM_RUNTIME)
>>
>> arch/x86/kernel/devicetree.c:67:1: warning: data definition has no
>> type or storage class [enabled by default]
>>  module_init(add_bus_probe);
>>  ^
>
> For linux-next, devicetree.c needs to #include .
> For mainline, it would have needed to #include .
> However, it does neither of those.

Thanks guys; I've already queued a fix for this.

http://git.kernel.org/cgit/linux/kernel/git/paulg/init.git/commit/?id=3d83b6b84210066f0886b0916136fa49ca61704d

Paul.
--

>
> See Documentation/SubmitChecklist #1:
>
> 1: If you use a facility then #include the file that defines/declares
>    that facility.  Don't depend on other header files pulling in ones
>    that you use.
>
>
>> arch/x86/kernel/devicetree.c:67:1: error: type defaults to ‘int’ in
>> declaration of ‘module_init’ [-Werror=implicit-int]
>> arch/x86/kernel/devicetree.c:67:1: warning: parameter names (without
>> types) in function declaration [enabled by default]
>> arch/x86/kernel/devicetree.c:60:19: warning: ‘add_bus_probe’ defined
>> but not used [-Wunused-function]
>>  static int __init add_bus_probe(void)
>>^
>> cc1: some warnings being treated as errors
>> make[2]: *** [arch/x86/kernel/devicetree.o] Error 1
>>
>
>
> --
> ~Randy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-next" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 04/10] base: power: Add generic OF-based power domain look-up

2014-01-22 Thread Tomasz Figa

Hi Stephen,

On 23.01.2014 01:18, Stephen Boyd wrote:

On 01/11, Tomasz Figa wrote:

+
+/**
+ * of_genpd_lock() - Lock access to of_genpd_providers list
+ */
+static void of_genpd_lock(void)
+{
+   mutex_lock(_genpd_mutex);
+}
+
+/**
+ * of_genpd_unlock() - Unlock access to of_genpd_providers list
+ */
+static void of_genpd_unlock(void)
+{
+   mutex_unlock(_genpd_mutex);
+}


Why do we need these functions? Can't we just call
mutex_lock/unlock directly?


That would be fine as well, I guess. I just duplicated the pattern used in
CCF, but I can remove them in the next version if that's found to be better.





+
+/**
+ * of_genpd_add_provider() - Register a domain provider for a node
+ * @np: Device node pointer associated with domain provider
+ * @genpd_src_get: callback for decoding domain
+ * @data: context pointer for @genpd_src_get callback.


These look a little outdated.


Oops, missed this.




+ */
+int of_genpd_add_provider(struct device_node *np, genpd_xlate_t xlate,
+ void *data)
+{
+   struct of_genpd_provider *cp;
+
+   cp = kzalloc(sizeof(struct of_genpd_provider), GFP_KERNEL);


Please use sizeof(*cp) instead.


Right.




+   if (!cp)
+   return -ENOMEM;
+
+   cp->node = of_node_get(np);
+   cp->data = data;
+   cp->xlate = xlate;
+
+   of_genpd_lock();
+   list_add(&cp->link, &of_genpd_providers);
+   of_genpd_unlock();
+   pr_debug("Added domain provider from %s\n", np->full_name);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(of_genpd_add_provider);
+

[...]

+
+/* See of_genpd_get_from_provider(). */
+static struct generic_pm_domain *__of_genpd_get_from_provider(
+   struct of_phandle_args *genpdspec)
+{
+   struct of_genpd_provider *provider;
+   struct generic_pm_domain *genpd = ERR_PTR(-ENOENT);


Can this be -EPROBE_DEFER so that we can defer probe until a
later time if the power domain provider hasn't registered yet?


Yes, this could be useful. Makes me wonder why clock code (on which I 
based this code) doesn't have it done this way.
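
For illustration only, a userspace sketch of a look-up that reports
"provider not registered yet" as a deferral rather than a hard error.
The *_sketch names are hypothetical, the kernel's ERR_PTR() convention
is replaced by an int return plus an out parameter, and 517 is the
value the kernel's errno.h gives EPROBE_DEFER.

```c
#include <stddef.h>

/* EPROBE_DEFER is not visible to userspace, so mirror its value here. */
#define EPROBE_DEFER_SKETCH 517

struct domain_sketch {
	const char *name;
};

struct provider_sketch {
	const void *node;              /* stands in for the device node */
	struct domain_sketch *domain;  /* what xlate() would return */
	struct provider_sketch *next;
};

/*
 * Look up the domain registered for @node.  Returns 0 and fills @out on
 * success, or -EPROBE_DEFER_SKETCH if no provider has registered for
 * that node yet, so that the caller can retry later.
 */
static int domain_lookup(const struct provider_sketch *providers,
			 const void *node, struct domain_sketch **out)
{
	for (const struct provider_sketch *p = providers; p != NULL; p = p->next) {
		if (p->node == node) {
			*out = p->domain;
			return 0;
		}
	}

	return -EPROBE_DEFER_SKETCH;
}
```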





+
+   /* Check if we have such a provider in our array */
+   list_for_each_entry(provider, &of_genpd_providers, link) {
+   if (provider->node == genpdspec->np)
+   genpd = provider->xlate(genpdspec, provider->data);
+   if (!IS_ERR(genpd))
+   break;
+   }
+
+   return genpd;
+}
+

[...]

+static int of_genpd_notifier_call(struct notifier_block *nb,
+ unsigned long event, void *data)
+{
+   struct device *dev = data;
+   int ret;
+
+   if (!dev->of_node)
+   return NOTIFY_DONE;
+
+   switch (event) {
+   case BUS_NOTIFY_BIND_DRIVER:
+   ret = of_genpd_add_to_domain(dev);
+   break;
+
+   case BUS_NOTIFY_UNBOUND_DRIVER:
+   ret = of_genpd_del_from_domain(dev);
+   break;
+
+   default:
+   return NOTIFY_DONE;
+   }
+
+   return notifier_from_errno(ret);
+}
+
+static struct notifier_block of_genpd_notifier_block = {
+   .notifier_call = of_genpd_notifier_call,
+};
+
+static int of_genpd_init(void)
+{
+   return bus_register_notifier(_bus_type,
+   _genpd_notifier_block);
+}
+core_initcall(of_genpd_init);


Would it be possible to call the of_genpd_add_to_domain() and
of_genpd_del_from_domain() functions directly in the driver core,
similar to how the pinctrl framework has a hook in there? That
way we're not relying on any initcall ordering for this.


Hmm, the initcall here just registers a notifier, which needs to be done 
just before any driver registers. So, IMHO, current variant is safe, 
given an early enough initcall level is used.


However, doing it the pinctrl way might still have an advantage of not 
relying on specific bus type, so this is worth consideration indeed. I'd 
like to hear Rafael's and Kevin's opinions on this (and other comments 
above too).


Best regards,
Tomasz


Re: [PATCH RFC 04/10] base: power: Add generic OF-based power domain look-up

2014-01-22 Thread Stephen Boyd
On 01/20, Tomasz Figa wrote:
> Hi Kevin,
> 
> On 14.01.2014 16:42, Kevin Hilman wrote:
> >Tomasz Figa  writes:
> >
> >>This patch introduces generic code to perform power domain look-up using
> >>device tree and automatically bind devices to their power domains.
> >>Generic device tree binding is introduced to specify power domains of
> >>devices in their device tree nodes.
> >>
> >>Backwards compatibility with legacy Samsung-specific power domain
> >>bindings is provided, but for now the new code is not compiled when
> >>CONFIG_ARCH_EXYNOS is selected to avoid collision with legacy code. This
> >>will change as soon as Exynos power domain code gets converted to use
> >>the generic framework in further patch.
> >>
> >>Signed-off-by: Tomasz Figa 
> >
> >I haven't read through this in detail yet, but wanted to make sure that
> >the DT representation can handle nested power domains.  At least
> >SH-mobile has a hierarchy of power domains and the genpd code can handle
> >that, so wanted to make sure that the DT representation can handle it as
> >well.
> 
> The representation of power domains themselves as implied by this
> patch is fully platform-specific. The only generic part is the
> #power-domain-cells property, which defines the number of cells
> needed to identify the power domain of given provider. You are free
> to have any platform-specific properties (or even generic ones,
> added on top of this patch) to let you specify the hierarchy in DT.
> 

(Semi-related to this thread, but not really the patchset)

I'd like to have a way to say that this power domain is a
subdomain of another domain provided by a different power domain
provider driver. From what I can tell, the only way to reparent
domains as of today is by name or reference and you have to make
a function call to do it (pm_genpd_add_subdomain_names() or
pm_genpd_add_subdomain()). This is annoying in the case where all
the power domains are not registered within the same driver
because we don't know which driver comes first.

It would be great if there was a way to specify this relationship
explicitly when initializing a power domain so that the
reparenting is done automatically without requiring any explicit
function call. Perhaps DT could specify this? Or we could add
another field to the generic_power_domain struct like parent_name?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


Re: fanotify use after free.

2014-01-22 Thread Dave Jones
On Wed, Jan 22, 2014 at 04:08:52PM -0800, Linus Torvalds wrote:
 > On Wed, Jan 22, 2014 at 3:36 PM, Jan Kara  wrote:
 > >
 > > But refcounting seems like an overkill for this - there is exactly one
 > > fanotify_response_event structure iff it is a permission event. So
 > > something like the (completely untested) attached patch should fix the
 > > problem. But I agree it's a bit ugly so we might want something different.
 > > I'll try to think about something better tomorrow.
 > 
 > Ok, In the meantime, Dave, can you verify whether this hacky patch
 > fixes your problem?

It actually seems worse. I see the tail end of what looks like a slab corruption
trace, and then a total lockup.  And of course none of this makes it over ttyUSB0
because it happens so early. Grr.

Dave



Re: [Xen-devel] [PATCH v3] xen/grant-table: Avoid m2p_override during mapping

2014-01-22 Thread Konrad Rzeszutek Wilk
Zoltan Kiss  wrote:
>The grant mapping API does m2p_override unnecessarily: only gntdev needs it;
>for blkback and future netback patches it just causes lock contention, as
>those pages never go to userspace. Therefore this series does the following:
>- the original functions were renamed to __gnttab_[un]map_refs, with a new
>  parameter m2p_override
>- based on m2p_override either they follow the original behaviour, or just set
>  the private flag and call set_phys_to_machine
>- gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
>  m2p_override false
>- a new function gnttab_[un]map_refs_userspace provides the old behaviour

You don't say anything about the 'return ret' changed to 'return 0'.

Any particular reason for that?

Thanks
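
A minimal sketch of the wrapper pattern described in the quoted list above,
written as self-contained C rather than kernel code. The struct and every
function name here (map_op, override_one, do_map_refs, map_refs,
map_refs_userspace) are hypothetical stand-ins, not the grant-table API. The
worker propagates the per-entry result instead of returning 0 unconditionally,
which is the distinction the question about 'return ret' is concerned with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one map operation; not the kernel's struct. */
struct map_op {
    int status;        /* 0 means the hypervisor-side mapping succeeded */
    bool overridden;   /* set when the full override path ran */
    bool private_flag; /* set when only the lightweight path ran */
};

/* Hypothetical per-entry override step that can fail. */
static int override_one(struct map_op *op)
{
    if (op->status != 0)
        return -1;     /* entry was never mapped, nothing to override */
    op->overridden = true;
    return 0;
}

/* Common worker: the bool selects the old path or the lightweight one. */
static int do_map_refs(struct map_op *ops, size_t count, bool m2p_override)
{
    assert(ops != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (m2p_override) {
            int ret = override_one(&ops[i]);
            if (ret)
                return ret;  /* propagate the error to the caller */
        } else {
            ops[i].private_flag = true;
        }
    }
    return 0;
}

/* Kernel-only callers take the lightweight path. */
static int map_refs(struct map_op *ops, size_t count)
{
    return do_map_refs(ops, count, false);
}

/* Callers that hand pages to userspace keep the old behaviour. */
static int map_refs_userspace(struct map_op *ops, size_t count)
{
    return do_map_refs(ops, count, true);
}
```

With this shape, existing callers only change which wrapper they call, and a
failure in the override path still reaches them through the return value.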
>
>v2:
>- move the storing of the old mfn in page->index to gnttab_map_refs
>- move the function header update to a separate patch
>
>v3:
>- a new approach to retain old behaviour where it needed
>- squash the patches into one
>
>Signed-off-by: Zoltan Kiss 
>Suggested-by: David Vrabel 
>---
> drivers/block/xen-blkback/blkback.c |   15 +++
> drivers/xen/gntdev.c|   13 +++---
>drivers/xen/grant-table.c   |   81
>+--
> include/xen/grant_table.h   |8 +++-
> 4 files changed, 87 insertions(+), 30 deletions(-)
>
>diff --git a/drivers/block/xen-blkback/blkback.c
>b/drivers/block/xen-blkback/blkback.c
>index 6620b73..875025f 100644
>--- a/drivers/block/xen-blkback/blkback.c
>+++ b/drivers/block/xen-blkback/blkback.c
>@@ -285,8 +285,7 @@ static void free_persistent_gnts(struct xen_blkif
>*blkif, struct rb_root *root,
> 
>   if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
>   !rb_next(_gnt->node)) {
>-  ret = gnttab_unmap_refs(unmap, NULL, pages,
>-  segs_to_unmap);
>+  ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
>   BUG_ON(ret);
>   put_free_pages(blkif, pages, segs_to_unmap);
>   segs_to_unmap = 0;
>@@ -321,8 +320,7 @@ static void unmap_purged_grants(struct work_struct
>*work)
>   pages[segs_to_unmap] = persistent_gnt->page;
> 
>   if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
>-  ret = gnttab_unmap_refs(unmap, NULL, pages,
>-  segs_to_unmap);
>+  ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
>   BUG_ON(ret);
>   put_free_pages(blkif, pages, segs_to_unmap);
>   segs_to_unmap = 0;
>@@ -330,7 +328,7 @@ static void unmap_purged_grants(struct work_struct
>*work)
>   kfree(persistent_gnt);
>   }
>   if (segs_to_unmap > 0) {
>-  ret = gnttab_unmap_refs(unmap, NULL, pages, segs_to_unmap);
>+  ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
>   BUG_ON(ret);
>   put_free_pages(blkif, pages, segs_to_unmap);
>   }
>@@ -670,15 +668,14 @@ static void xen_blkbk_unmap(struct xen_blkif
>*blkif,
>   GNTMAP_host_map, pages[i]->handle);
>   pages[i]->handle = BLKBACK_INVALID_HANDLE;
>   if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
>-  ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
>-  invcount);
>+  ret = gnttab_unmap_refs(unmap, unmap_pages, invcount);
>   BUG_ON(ret);
>   put_free_pages(blkif, unmap_pages, invcount);
>   invcount = 0;
>   }
>   }
>   if (invcount) {
>-  ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
>+  ret = gnttab_unmap_refs(unmap, unmap_pages, invcount);
>   BUG_ON(ret);
>   put_free_pages(blkif, unmap_pages, invcount);
>   }
>@@ -740,7 +737,7 @@ again:
>   }
> 
>   if (segs_to_map) {
>-  ret = gnttab_map_refs(map, NULL, pages_to_gnt, segs_to_map);
>+  ret = gnttab_map_refs(map, pages_to_gnt, segs_to_map);
>   BUG_ON(ret);
>   }
> 
>diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>index e41c79c..e652c0e 100644
>--- a/drivers/xen/gntdev.c
>+++ b/drivers/xen/gntdev.c
>@@ -284,8 +284,10 @@ static int map_grant_pages(struct grant_map *map)
>   }
> 
>   pr_debug("map %d+%d\n", map->index, map->count);
>-  err = gnttab_map_refs(map->map_ops, use_ptemod ? map->kmap_ops :
>NULL,
>-  map->pages, map->count);
>+  err = gnttab_map_refs_userspace(map->map_ops,
>+  use_ptemod ? map->kmap_ops : NULL,
>+  map->pages,
>+  map->count);
>   if (err)
>   return err;
> 
>@@ 

Re: MAINTAINERS tree branches [xen tip as an example]

2014-01-22 Thread Konrad Rzeszutek Wilk
"Luis R. Rodriguez"  wrote:
>On Mon, Jan 20, 2014 at 2:38 AM, David Vrabel 
>wrote:
>> On 17/01/14 23:02, Luis R. Rodriguez wrote:
>>> As per linux-next Next/Trees [0], and a recent January MAINTAINERS
>patch [1]
>>> from David one of the xen development kernel git trees to track is
>>> xen/git.git [2], this tree however gives has undefined references
>when doing a
>>> fresh clone [shown below], but as expected does work well when only
>cloning
>>> the linux-next branch [also below]. While I'm sure this is fine for
>>> folks who can do the guess work do we really want to live with trees
>like
>>> these on MAINTAINERS ? The MAINTAINERS file doesn't let us specify
>branches
>>> required, so perhaps it should -- if we want to live with these ?
>Curious, how
>>> many other git are there with a similar situation ?
>>
>> We don't recommend doing development work for the Xen subsystem based
>on
>> xen/tip.git so I think it's fine to have to checkout the specific
>branch
>> you are interested in.
>
>OK thanks.
>
>>> The xen project web site actually lists [3] Konrad's xen git tree
>[4] for
>>> development as the primary development tree, that probably should be
>>> updated now, and likely with instructions to clone only the
>linux-next
>>> branch ?
>>
>> I've updated the wiki to read:
>>
>> For development the recommended branch is:
>>
>> The mainline Linus linux.git tree.
>
>Is the delta of what is queued for the next release typically small?

Depends
>Otherwise someone doing development based on linux.git alone should
>have conflicts with anything on the queue, no?

Potentially. Usually the maintainer will spot where there are potential 
conflicts and give you a branch to base on.

>
>> To see what's queued for the next release, the next merge window,
>> and other work in progress:
>>
>> The Xen subsystem maintainers' tip.git tree.
>
>That's the thing, you can't clone the tip.git tree today well, there
>are undefined references and git gives up, asking for the linux-next
>branch however did work.

It should work now. I made master point to 3.13.
>
>  Luis




[git pull] device mapper changes for 3.14

2014-01-22 Thread Mike Snitzer
The following changes since commit 319e2e3f63c348a9b66db4667efa73178e18b17d:

  Linux 3.13-rc4 (2013-12-15 12:31:33 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/dm-3.14-changes

for you to fetch changes up to 5066a4df1f427faac8372d20494483bb09a4a1cd:

  dm log userspace: allow mark requests to piggyback on flush requests (2014-01-21 23:46:27 -0500)


A set of device-mapper changes for 3.14.

A lot of attention was paid to improving the thin-provisioning target's
handling of metadata operation failures and running out of space.  A new
'error_if_no_space' feature was added to allow users to error IOs rather
than queue them when either the data or metadata space is exhausted.

Additional fixes/features include:
- a few fixes to properly support thin metadata device resizing
- a solution for reliably waiting for a DM device's embedded kobject to
  be released before destroying the device
- old dm-snapshot is updated to use the dm-bufio interface to take
  advantage of readahead capabilities that improve snapshot activation
- new dm-cache target tunables to control how quickly data is promoted
  to the cache (fast) device
- improved write efficiency of cluster mirror target by combining
  userspace flush and mark requests


Chuansheng Liu (1):
  dm snapshot: call destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()

Dongmao Zhang (1):
  dm log userspace: allow mark requests to piggyback on flush requests

Joe Thornber (9):
  dm thin: fix discard support to a previously shared block
  dm thin: return error from alloc_data_block if pool is not in write mode
  dm thin: factor out check_low_water_mark and use bools
  dm thin: handle metadata failures more consistently
  dm cache policy mq: introduce three promotion threshold tunables
  dm space map common: make sure new space is used during extend
  dm space map metadata: fix extending the space map
  dm btree: add dm_btree_find_lowest_key
  dm space map metadata: fix bug in resizing of thin metadata

Mike Snitzer (14):
  dm thin: initialize dm_thin_new_mapping returned by get_next_mapping
  dm space map metadata: limit errors in sm_metadata_new_block
  dm persistent data: cleanup dm-thin specific references in text
  dm thin: use bool rather than unsigned for flags in structures
  dm thin: add mappings to end of prepared_* lists
  dm thin: log info when growing the data or metadata device
  dm thin: cleanup and improve no space handling
  dm thin: requeue bios to DM core if no_free_space and in read-only mode
  dm thin: add error_if_no_space feature
  dm thin: eliminate the no_free_space flag
  dm thin: fix set_pool_mode exposed pool operation races
  dm cache: add block sizes and total cache blocks to status output
  dm thin: fix pool feature parsing
  dm cache: add policy name to status output

Mikulas Patocka (9):
  dm table: remove unused buggy code that extends the targets array
  dm delay: use per-bio data instead of a mempool and slab cache
  dm: remove pointless kobject comparison in dm_get_from_kobject
  dm: wait until embedded kobject is released before destroying a device
  dm snapshot: use GFP_KERNEL when initializing exceptions
  dm snapshot: prepare for switch to using dm-bufio
  dm snapshot: use dm-bufio
  dm snapshot: use dm-bufio prefetch
  dm sysfs: fix a module unload race

Wei Yongjun (1):
  dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD

 Documentation/device-mapper/cache-policies.txt |  16 +-
 Documentation/device-mapper/cache.txt  |  51 ++--
 Documentation/device-mapper/thin-provisioning.txt  |   7 +
 drivers/md/Kconfig |  11 +-
 drivers/md/Makefile|   1 +
 drivers/md/dm-bufio.c  |  36 ++-
 drivers/md/dm-bufio.h  |  12 +
 drivers/md/dm-builtin.c|  48 
 drivers/md/dm-cache-policy-mq.c|  70 +++--
 drivers/md/dm-cache-policy.c   |   4 +
 drivers/md/dm-cache-policy.h   |   6 +
 drivers/md/dm-cache-target.c   |  20 +-
 drivers/md/dm-delay.c  |  35 +--
 drivers/md/dm-log-userspace-base.c | 206 +++
 drivers/md/dm-snap-persistent.c|  87 +--
 drivers/md/dm-snap.c   |  10 +-
 drivers/md/dm-sysfs.c  |   5 +-
 drivers/md/dm-table.c  |  22 +-
 drivers/md/dm-thin-metadata.c  |  20 ++
 drivers/md/dm-thin-metadata.h  |   4 +-
 

Re: [PATCH v2] ACPI: Fix acpi_evaluate_object() return value check

2014-01-22 Thread Konrad Rzeszutek Wilk
Yijing Wang  wrote:
>Fix acpi_evaluate_object() return value check,
>shoud acpi_status not int.

Should be?
Your mailer also ate the word 'to' .
>
>Signed-off-by: Yijing Wang 
>---
>
>v1->v2: Add CC to the related subsystem MAINTAINERS.
>
>---
> drivers/gpu/drm/i915/intel_acpi.c  |   13 +++--
> drivers/gpu/drm/nouveau/core/subdev/mxm/base.c |6 +++---
> drivers/gpu/drm/nouveau/nouveau_acpi.c |   13 +++--
> drivers/pci/pci-label.c|6 +++---
> 4 files changed, 20 insertions(+), 18 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/intel_acpi.c
>b/drivers/gpu/drm/i915/intel_acpi.c
>index dfff090..7ea00e5 100644
>--- a/drivers/gpu/drm/i915/intel_acpi.c
>+++ b/drivers/gpu/drm/i915/intel_acpi.c
>@@ -35,7 +35,7 @@ static int intel_dsm(acpi_handle handle, int func)
>   union acpi_object params[4];
>   union acpi_object *obj;
>   u32 result;
>-  int ret = 0;
>+  acpi_status status;
> 
>   input.count = 4;
>   input.pointer = params;
>@@ -50,8 +50,8 @@ static int intel_dsm(acpi_handle handle, int func)
>   params[3].package.count = 0;
>   params[3].package.elements = NULL;
> 
>-  ret = acpi_evaluate_object(handle, "_DSM", , );
>-  if (ret) {
>+  status = acpi_evaluate_object(handle, "_DSM", , );
>+  if (ACPI_FAILURE(status)) {
>   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret);
>   return ret;
>   }
>@@ -141,7 +141,8 @@ static void intel_dsm_platform_mux_info(void)
>   struct acpi_object_list input;
>   union acpi_object params[4];
>   union acpi_object *pkg;
>-  int i, ret;
>+  acpi_status status;
>+  int i;
> 
>   input.count = 4;
>   input.pointer = params;
>@@ -156,9 +157,9 @@ static void intel_dsm_platform_mux_info(void)
>   params[3].package.count = 0;
>   params[3].package.elements = NULL;
> 
>-  ret = acpi_evaluate_object(intel_dsm_priv.dhandle, "_DSM", ,
>+  acpi_status = acpi_evaluate_object(intel_dsm_priv.dhandle, "_DSM",
>,
>  );
>-  if (ret) {
>+  if (ACPI_FAILURE(status)) {
>   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret);
>   goto out;
>   }
>diff --git a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
>b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
>index 1291204..3920943 100644
>--- a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
>+++ b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
>@@ -114,14 +114,14 @@ mxm_shadow_dsm(struct nouveau_mxm *mxm, u8
>version)
>   struct acpi_buffer retn = { ACPI_ALLOCATE_BUFFER, NULL };
>   union acpi_object *obj;
>   acpi_handle handle;
>-  int ret;
>+  acpi_status status;
> 
>   handle = ACPI_HANDLE(>pdev->dev);
>   if (!handle)
>   return false;
> 
>-  ret = acpi_evaluate_object(handle, "_DSM", , );
>-  if (ret) {
>+  status = acpi_evaluate_object(handle, "_DSM", , );
>+  if (ACPI_FAILURE(status)) {
>   nv_debug(mxm, "DSM MXMS failed: %d\n", ret);
>   return false;
>   }
>diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c
>b/drivers/gpu/drm/nouveau/nouveau_acpi.c
>index ba0183f..6f810f2 100644
>--- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
>+++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
>@@ -82,7 +82,8 @@ static int nouveau_optimus_dsm(acpi_handle handle,
>int func, int arg, uint32_t *
>   struct acpi_object_list input;
>   union acpi_object params[4];
>   union acpi_object *obj;
>-  int i, err;
>+  acpi_status status;
>+  int i;
>   char args_buff[4];
> 
>   input.count = 4;
>@@ -101,8 +102,8 @@ static int nouveau_optimus_dsm(acpi_handle handle,
>int func, int arg, uint32_t *
>   args_buff[i] = (arg >> i * 8) & 0xFF;
>   params[3].buffer.pointer = args_buff;
> 
>-  err = acpi_evaluate_object(handle, "_DSM", , );
>-  if (err) {
>+  status = acpi_evaluate_object(handle, "_DSM", , );
>+  if (ACPI_FAILURE(status)) {
>   printk(KERN_INFO "failed to evaluate _DSM: %d\n", err);
>   return err;
>   }
>@@ -134,7 +135,7 @@ static int nouveau_dsm(acpi_handle handle, int
>func, int arg, uint32_t *result)
>   struct acpi_object_list input;
>   union acpi_object params[4];
>   union acpi_object *obj;
>-  int err;
>+  acpi_status status;
> 
>   input.count = 4;
>   input.pointer = params;
>@@ -148,8 +149,8 @@ static int nouveau_dsm(acpi_handle handle, int
>func, int arg, uint32_t *result)
>   params[3].type = ACPI_TYPE_INTEGER;
>   params[3].integer.value = arg;
> 
>-  err = acpi_evaluate_object(handle, "_DSM", , );
>-  if (err) {
>+  status = acpi_evaluate_object(handle, "_DSM", , );
>+  if (ACPI_FAILURE(status)) {
>   printk(KERN_INFO "failed to evaluate _DSM: %d\n", err);
>   return err;
>   }
>diff --git a/drivers/pci/pci-label.c 
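
A minimal sketch of the check the patch is after: keep the evaluator's status
in its own type, test it with a failure macro, and hand the caller an errno
instead of the raw status. Everything named sketch_* or SKETCH_* below is a
stand-in invented for illustration; the real type and macros come from the
ACPICA headers and are not reproduced here. Note that the hunks as quoted
still pass 'ret' or 'err' to the debug print and the return statement after
those variables are removed, so the sketch converts the status explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for illustration only, not the ACPICA definitions. */
typedef uint32_t sketch_status;
#define SKETCH_OK          ((sketch_status)0x0000)
#define SKETCH_NOT_FOUND   ((sketch_status)0x0005)
#define SKETCH_FAILURE(s)  ((s) != SKETCH_OK)

/* Hypothetical evaluator: fails when no handle is given. */
static sketch_status sketch_evaluate(const void *handle, uint32_t *result)
{
    if (handle == NULL)
        return SKETCH_NOT_FOUND;
    *result = 1;
    return SKETCH_OK;
}

/* The status stays in its own type; the caller only ever sees an errno. */
static int sketch_dsm(const void *handle, uint32_t *result)
{
    sketch_status status;

    assert(result != NULL);
    status = sketch_evaluate(handle, result);
    if (SKETCH_FAILURE(status))
        return -ENODEV;
    return 0;
}
```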

Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-22 Thread Konrad Rzeszutek Wilk
Mukesh Rathor  wrote:
>pvh was designed to start with pv flags, but a commit in xen tree
>51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags as
>they are not necessary. As a result, these CR flags must be set in the
>guest.
>
>Signed-off-by: Roger Pau Monne 
>Signed-off-by: Mukesh Rathor 
>---
>arch/x86/xen/enlighten.c |   43
>+--
> arch/x86/xen/smp.c   |2 +-
> arch/x86/xen/xen-ops.h   |2 +-
> 3 files changed, 39 insertions(+), 8 deletions(-)
>
>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>index 628099a..4a2aaa6 100644
>--- a/arch/x86/xen/enlighten.c
>+++ b/arch/x86/xen/enlighten.c
>@@ -1410,12 +1410,8 @@ static void __init
>xen_boot_params_init_edd(void)
>  * Set up the GDT and segment registers for -fstack-protector.  Until
>  * we do this, we have to be careful not to call any stack-protected
>  * function, which is most of the kernel.
>- *
>- * Note, that it is refok - because the only caller of this after init
>- * is PVH which is not going to use xen_load_gdt_boot or other
>- * __init functions.
>  */
>-void __ref xen_setup_gdt(int cpu)
>+static void xen_setup_gdt(int cpu)
> {
>   if (xen_feature(XENFEAT_auto_translated_physmap)) {
> #ifdef CONFIG_X86_64
>@@ -1463,13 +1459,48 @@ void __ref xen_setup_gdt(int cpu)
>   pv_cpu_ops.load_gdt = xen_load_gdt;
> }
> 
>+/*
>+ * A pv guest starts with default flags that are not set for pvh, set
>them
>+ * here asap.
>+ */
>+static void xen_pvh_set_cr_flags(int cpu)
>+{
>+  write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_WP | X86_CR0_AM);

I think it would be good to mention that Xen unconditionally sets PE and ET for
HVM guests, and that for PVH it additionally sets PG.

What about NE? It looks to be missing from the list above. Should we set it?

>+
>+  if (!cpu)
>+  return;
>+  /*
>+   * Unlike PV, for pvh xen does not set: PSE PGE OSFXSR OSXMMEXCPT
>+   * For BSP, PSE PGE will be set in probe_page_size_mask(), for AP
>+   * set them here. For all, OSFXSR OSXMMEXCPT will be set in fpu_init
>+   */
>+  if (cpu_has_pse)
>+  set_in_cr4(X86_CR4_PSE);
>+
>+  if (cpu_has_pge)
>+  set_in_cr4(X86_CR4_PGE);
>+}
>+
>+/*
>+ * Note, that it is refok - because the only caller of this after init
>+ * is PVH which is not going to use xen_load_gdt_boot or other
>+ * __init functions.
>+ */
>+void __ref xen_pvh_secondary_vcpu_init(int cpu)
>+{
>+  xen_setup_gdt(cpu);
>+  xen_pvh_set_cr_flags(cpu);
>+}
>+
> static void __init xen_pvh_early_guest_init(void)
> {
>   if (!xen_feature(XENFEAT_auto_translated_physmap))
>   return;
> 
>-  if (xen_feature(XENFEAT_hvm_callback_vector))
>+  if (xen_feature(XENFEAT_hvm_callback_vector)) {
>   xen_have_vector_callback = 1;
>+  xen_pvh_set_cr_flags(0);
>+  }
> 
> #ifdef CONFIG_X86_32
>   BUG(); /* PVH: Implement proper support. */
>diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>index 5e46190..a18eadd 100644
>--- a/arch/x86/xen/smp.c
>+++ b/arch/x86/xen/smp.c
>@@ -105,7 +105,7 @@ static void cpu_bringup_and_idle(int cpu)
> #ifdef CONFIG_X86_64
>   if (xen_feature(XENFEAT_auto_translated_physmap) &&
>   xen_feature(XENFEAT_supervisor_mode_kernel))
>-  xen_setup_gdt(cpu);
>+  xen_pvh_secondary_vcpu_init(cpu);
> #endif
>   cpu_bringup();
>   cpu_startup_entry(CPUHP_ONLINE);
>diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>index 9059c24..1cb6f4c 100644
>--- a/arch/x86/xen/xen-ops.h
>+++ b/arch/x86/xen/xen-ops.h
>@@ -123,5 +123,5 @@ __visible void xen_adjust_exception_frame(void);
> 
> extern int xen_panic_handler_init(void);
> 
>-void xen_setup_gdt(int cpu);
>+void xen_pvh_secondary_vcpu_init(int cpu);
> #endif /* XEN_OPS_H */
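
The CR0 question raised in the review can be made concrete with the
architectural bit positions. The sketch below is plain user-space C; the
function names are hypothetical, and the starting value (PE, ET and PG) is
taken from the review comment, not from inspecting Xen. It shows that the OR
in the quoted hunk leaves NE clear.

```c
#include <stdint.h>

/* Architectural CR0 bit positions on x86. */
#define CR0_PE (UINT32_C(1) << 0)   /* protection enable */
#define CR0_MP (UINT32_C(1) << 1)   /* monitor coprocessor */
#define CR0_ET (UINT32_C(1) << 4)   /* extension type */
#define CR0_NE (UINT32_C(1) << 5)   /* numeric error */
#define CR0_WP (UINT32_C(1) << 16)  /* write protect */
#define CR0_AM (UINT32_C(1) << 18)  /* alignment mask */
#define CR0_PG (UINT32_C(1) << 31)  /* paging */

/* Assumed starting value, per the review comment: PE and ET, plus PG. */
static uint32_t cr0_initial(void)
{
    return CR0_PE | CR0_ET | CR0_PG;
}

/* Mirrors the OR in the quoted hunk: MP, WP and AM are added, NE is not. */
static uint32_t cr0_after_patch(uint32_t cr0)
{
    return cr0 | CR0_MP | CR0_WP | CR0_AM;
}
```

Under these assumptions the result is 0x80050013; adding NE as well would
give 0x80050033.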




Re: MAINTAINERS tree branches [xen tip as an example]

2014-01-22 Thread Konrad Rzeszutek Wilk
"Luis R. Rodriguez"  wrote:
>As per linux-next Next/Trees [0], and a recent January MAINTAINERS
>patch [1]
>from David one of the xen development kernel git trees to track is
>xen/git.git [2], this tree however gives has undefined references when
>doing a
>fresh clone [shown below], but as expected does work well when only
>cloning
>the linux-next branch [also below]. While I'm sure this is fine for
>folks who can do the guess work do we really want to live with trees
>like
>these on MAINTAINERS ? The MAINTAINERS file doesn't let us specify
>branches
>required, so perhaps it should -- if we want to live with these ?

The master branch can be linked to the #linux-next or stable/for-linus.

That would solve the problem I think.
>Curious, how
>many other git are there with a similar situation ?
>
>The xen project web site actually lists [3] Konrad's xen git tree [4]
>for
>development as the primary development tree, that probably should be
>updated now, and likely with instructions to clone only the linux-next
>branch ?

Thank you for reporting. Will fix it next week if nobody else beats me to it.
>
>[0]
>https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Next/Trees#n176
>[1] http://lists.xen.org/archives/html/xen-devel/2014-01/msg01504.html
>[2] git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git
>[3]
>http://wiki.xenproject.org/wiki/Xen_Repositories#Primary_Xen_Repository
>[4] git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
>
>mcgrof@bubbles ~ $ git clone
>git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git --reference
>linux/.git
>Cloning into 'tip'...
>remote: Counting objects: 2806, done.
>remote: Compressing objects: 100% (334/334), done.
>remote: Total 1797 (delta 1511), reused 1646 (delta 1462)
>Receiving objects: 100% (1797/1797), 711.01 KiB | 640.00 KiB/s, done.
>Resolving deltas: 100% (1511/1511), completed with 306 local objects.
>Checking connectivity... done.
>warning: remote HEAD refers to nonexistent ref, unable to checkout.
>
>mcgrof@work ~ $ git clone
>git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git -b linux-next
>--reference linux/.git
>Cloning into 'tip'...
>remote: Counting objects: 2806, done.
>remote: Compressing objects: 100% (377/377), done.
>remote: Total 1797 (delta 1545), reused 1607 (delta 1419)
>Receiving objects: 100% (1797/1797), 485.23 KiB | 0 bytes/s, done.
>Resolving deltas: 100% (1545/1545), completed with 327 local objects.
>Checking connectivity... done.
>Checking out files: 100% (44979/44979), done.
>
>  Luis




Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-22 Thread Konrad Rzeszutek Wilk
Mukesh Rathor  wrote:
>pvh was designed to start with pv flags, but a commit in xen tree
>51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags as

"Name of the patch in the Xen tree"

>they are not necessary. As a result, these CR flags must be set in the
>guest.

>
>Signed-off-by: Roger Pau Monne 

You missed modifying the patch to reflect the authorship to be Roger's.

Please use git commit --amend --author "somebody s name " 

Also Roger should be credited with Reported-by. I can add that.

>Signed-off-by: Mukesh Rathor 
>---
>arch/x86/xen/enlighten.c |   43
>+--
> arch/x86/xen/smp.c   |2 +-
> arch/x86/xen/xen-ops.h   |2 +-
> 3 files changed, 39 insertions(+), 8 deletions(-)
>
>diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>index 628099a..4a2aaa6 100644
>--- a/arch/x86/xen/enlighten.c
>+++ b/arch/x86/xen/enlighten.c
>@@ -1410,12 +1410,8 @@ static void __init
>xen_boot_params_init_edd(void)
>  * Set up the GDT and segment registers for -fstack-protector.  Until
>  * we do this, we have to be careful not to call any stack-protected
>  * function, which is most of the kernel.
>- *
>- * Note, that it is refok - because the only caller of this after init
>- * is PVH which is not going to use xen_load_gdt_boot or other
>- * __init functions.
>  */
>-void __ref xen_setup_gdt(int cpu)
>+static void xen_setup_gdt(int cpu)
>   if (xen_feature(XENFEAT_auto_translated_physmap)) {
> #ifdef CONFIG_X86_64
>@@ -1463,13 +1459,48 @@ void __ref xen_setup_gdt(int cpu)
>   pv_cpu_ops.load_gdt = xen_load_gdt;
> }
> 
>+/*
>+ * A pv guest starts with default flags that are not set for pvh, set
>them
>+ * here asap.
>+ */
>+static void xen_pvh_set_cr_flags(int cpu)

>+{

Pls add:

/* See 'secondary_startup_64' for how bare metal does it. */

>+  write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_WP | X86_CR0_AM);
>+
>+  if (!cpu)
>+  return;
>+  /*
>+   * Unlike PV, for pvh xen does not set: PSE PGE OSFXSR OSXMMEXCPT
>+   * For BSP, PSE PGE will be set in probe_page_size_mask(), for AP
>+   * set them here. For all, OSFXSR

Might want to mention that for AP on bare metal they are set in
'secondary_startup_64'

... 
>+   */

Is it OK to set this twice? 

Meaning remove the 'if (!cpu)..' check so that this code path is run for BSP 
and AP?

>+  if (cpu_has_pse)
>+  set_in_cr4(X86_CR4_PSE);
>+
>+  if (cpu_has_pge)
>+  set_in_cr4(X86_CR4_PGE);
>+}
>+
>+/*
>+ * Note, that it is refok - because the only caller of this after init
>+ * is PVH which is not going to use xen_load_gdt_boot or other
>+ * __init functions.

Hmm. You must be using an older tree. The new one has the __ref comment.

>+ */
>+void __ref xen_pvh_secondary_vcpu_init(int cpu)
>+{
>+  xen_setup_gdt(cpu);
>+  xen_pvh_set_cr_flags(cpu);
>+}
>+
> static void __init xen_pvh_early_guest_init(void)
> {
>   if (!xen_feature(XENFEAT_auto_translated_physmap))
>   return;
> 
>-  if (xen_feature(XENFEAT_hvm_callback_vector))
>+  if (xen_feature(XENFEAT_hvm_callback_vector)) 

>   xen_have_vector_callback = 1;
>+  xen_pvh_set_cr_flags(0);
>+  }
> 
> #ifdef CONFIG_X86_32
>   BUG(); /* PVH: Implement proper support. */
>diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>index 5e46190..a18eadd 100644
>--- a/arch/x86/xen/smp.c
>+++ b/arch/x86/xen/smp.c
>@@ -105,7 +105,7 @@ static void cpu_bringup_and_idle(int cpu)
> #ifdef CONFIG_X86_64
>   if (xen_feature(XENFEAT_auto_translated_physmap) &&
>   xen_feature(XENFEAT_supervisor_mode_kernel))
>-  xen_setup_gdt(cpu);
>+  xen_pvh_secondary_vcpu_init(cpu);
> #endif
>   cpu_bringup();
>   cpu_startup_entry(CPUHP_ONLINE);
>diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>index 9059c24..1cb6f4c 100644
>--- a/arch/x86/xen/xen-ops.h
>+++ b/arch/x86/xen/xen-ops.h
>@@ -123,5 +123,5 @@ __visible void xen_adjust_exception_frame(void);
> 
> extern int xen_panic_handler_init(void);
> 
>-void xen_setup_gdt(int cpu);
>+void xen_pvh_secondary_vcpu_init(int cpu);
> #endif /* XEN_OPS_H */

Thanks!


Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include

2014-01-22 Thread Paul Gortmaker
[Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include] On 22/01/2014 (Wed 18:00) Stephen Rothwell wrote:

> Hi Paul,
> 
> On Tue, 21 Jan 2014 16:22:03 -0500 Paul Gortmaker 
>  wrote:
> >
> > Where: This work exists as a queue of patches that I apply to
> > linux-next; since the changes are fixing some things that currently
> > can only be found there.  The patch series can be found at:
> > 
> >http://git.kernel.org/cgit/linux/kernel/git/paulg/init.git
> >git://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
> > 
> > I've avoided annoying Stephen with another queue of patches for
> > linux-next while the development content was in flux, but now that
> > the merge window has opened, and new additions are fewer, perhaps he
> > wouldn't mind tacking it on the end...  Stephen?
> 
> OK, I have added this to the end of linux-next today - we will see how we
> go.  It is called "init".

Thanks, it was a great help as it uncovered a few issues in fringe arch
that I didn't have toolchains for, and I've fixed all of those up.

I've noticed that powerpc has been un-buildable for a while now; I have
used this hack patch locally so I could run the ppc defconfigs to check
that I didn't break anything.  Maybe useful for linux-next in the
interim?  It is a hack patch -- Not-Signed-off-by: Paul Gortmaker.  :)

Paul.
--

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c89a71..d0f070a2b395 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, 
unsigned long address,
pmd_t *pmdp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
-spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+struct spinlock *old_pmd_ptl)
 {
/*
 * Archs like ppc64 use pgtable to store per pmd



Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-22 Thread Konrad Rzeszutek Wilk
Mukesh Rathor  wrote:
>Konrad,
>
>The following patch sets the bits in CR0 and CR4. Please note, I'm
>working
>on patch for the xen side. The CR4 features are not currently exported
>to a PVH guest. 

The patch should really have been split in two - one for CR0 and one for CR4.

Especially as the ramifications of enabling PGE are much more complex. For
example, there is a need to fix up __supported_pte_mask to allow one to
use PAGE_GLOBAL. There might be other things too that need tweaking.

>
>Roger, I added your SOB line, please lmk if I need to add anything
>else.
>
>This patch was build on top of a71accb67e7645c68061cec2bee6067205e439fc
>in
>konrad devel/pvh.v13 branch.

Pls use #linux-next at this stage.

Thank you!
>
>thanks
>Mukesh




Re: [PATCH] xen-blkfront: remove type check from blkfront_setup_discard

2014-01-22 Thread Konrad Rzeszutek Wilk
Boris Ostrovsky  wrote:
>On 01/13/2014 04:30 AM, Olaf Hering wrote:
>> On Fri, Jan 10, Boris Ostrovsky wrote:
>>
>>> I don't know discard code works but it seems to me that if you pass,
>for
>>> example,  zero as discard_granularity (which may happen if
>xenbus_gather()
>>> fails) then blkdev_issue_discard() in the backend will set
>granularity to 1
>>> and continue with discard. This may not be what the the guest admin
>>> requested. And he won't know about this since no error message is
>printed
>>> anywhere.
>> If I understand the code using granularity/alignment correctly, both
>are
>> optional properties. So if the granularity is just 1 it means byte
>> ranges, which is fine if the backend uses FALLOC_FL_PUNCH_HOLE. Also
>> both properties are not admin controlled, for phy the blkbk drivers
>just
>> passes on what it gets from the underlying hardware.
>>
>>> Similarly, if xenbus_gather("discard-secure") fails, I think the
>code will
>>> assume that secure discard has not been requested. I don't know what
>>> security implications this will have but it sounds bad to me.
>> There are no security implications, if the backend does not advertise
>it
>> then its not present.
>
>Right. But my question was what if the backend does advertise it and 
>wants the frontend to use it but xenbus_gather() in the frontend fails.
>
>Do we want to silently continue without discard-secure? Is this safe?
>

Yes
>
>-boris
>
>>
>> After poking around some more it seems that blkif.h is the spec, it
>does
>> not say anything that the three properties are optional. Also the
>> backend drivers in sles11sp2 and mainline create all three properties
>> unconditionally. So I think a better change is to expect all three
>> properties in the frontend. I will send another version of the patch.
>>
>>
>> Olaf




Re: [Xen-devel] [PATCH v2] xen-blkfront: remove type check from blkfront_setup_discard

2014-01-22 Thread Konrad Rzeszutek Wilk
Jan Beulich  wrote:
 On 13.01.14 at 14:45, David Vrabel  wrote:
>> On 13/01/14 13:16, Jan Beulich wrote:
>> On 13.01.14 at 14:00, Ian Campbell 
>wrote:
 On Mon, 2014-01-13 at 12:34 +, Jan Beulich wrote:
 On 13.01.14 at 13:01, Olaf Hering  wrote:
>> On Mon, Jan 13, Jan Beulich wrote:
>>
>>> You can't do this in one go - the first two and the last one may be
>>> set independently (and are independent in their meaning), and
>>> hence need to be queried independently (xenbus_gather() fails
>>> on the first absent value).
>>
>> Yes, that's the purpose. Since the properties are required it's an all or
>> nothing thing. If they are truly optional then blkif.h should be updated
>> to say that.
>
> They _are_ optional.

 But is it true that either they are all present or they are all absent?
>>> 
>>> No, it's not. discard-secure is independent of the other two (but
>>> those other two are tied together).
>> 
>> Can we have a patch to blkif.h that clarifies this?
>> 
>> e.g.,
>> 
>> feature-discard
>> 
>>...
>> 
>>discard-granularity and discard-offset must also be present if
>>feature-discard is enabled
>
>It would be "may" here too afaict. But I'll defer to Konrad, who
>has done more work in this area...
>
>Jan
>
>>discard-secure may also be present if feature-discard is enabled.
>> 
>> David
>
>
>
>
>___
>Xen-devel mailing list
>xen-de...@lists.xen.org
>http://lists.xen.org/xen-devel

It is all 'may'. If there is just 'feature-discard' without any other options 
that is OK.
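
The rule settled on in this thread (every discard property is optional, and discard-secure is independent of the granularity/alignment pair) can be sketched roughly as below. This is a user-space illustration only: struct kv, lookup_uint() and parse_discard() are hypothetical stand-ins and not the xenbus API, the key names are assumed to follow blkif.h, and treating an absent key as 0 is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one backend xenstore entry. */
struct kv {
    const char *key;
    const char *val;
};

struct discard_cfg {
    int enabled;              /* feature-discard present and non-zero */
    unsigned int granularity; /* 0 when the backend did not advertise it */
    unsigned int alignment;   /* 0 when the backend did not advertise it */
    int secure;               /* 0 when discard-secure is absent */
};

/* Look up one key on its own; returns 0 on success, -1 if it is absent. */
static int lookup_uint(const struct kv *kvs, size_t n, const char *key,
                       unsigned int *out)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(kvs[i].key, key) == 0) {
            *out = (unsigned int)strtoul(kvs[i].val, NULL, 10);
            return 0;
        }
    }
    return -1;
}

/*
 * Query every property independently, so that one missing key does not
 * throw away the values of the others.
 */
static struct discard_cfg parse_discard(const struct kv *kvs, size_t n)
{
    struct discard_cfg cfg = { 0, 0, 0, 0 };
    unsigned int v;

    assert(kvs != NULL || n == 0);

    if (lookup_uint(kvs, n, "feature-discard", &v) != 0 || v == 0)
        return cfg; /* no discard support advertised */
    cfg.enabled = 1;

    if (lookup_uint(kvs, n, "discard-granularity", &v) == 0)
        cfg.granularity = v;
    if (lookup_uint(kvs, n, "discard-alignment", &v) == 0)
        cfg.alignment = v;
    if (lookup_uint(kvs, n, "discard-secure", &v) == 0)
        cfg.secure = (v != 0);

    return cfg;
}
```

With only "feature-discard" present, the sketch reports discard as enabled and leaves the other three fields at 0, which matches the "it is all 'may'" reading. A lookup that fails for a reason other than absence would need separate handling, which this sketch does not attempt.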



Re: [GIT PULL] tick: A few more cleanups

2014-01-22 Thread Frederic Weisbecker
On Thu, Jan 16, 2014 at 04:41:48PM +0100, Frederic Weisbecker wrote:
> Ingo,
> 
> Please pull the timers/core branch that can be found at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
>   timers/core
> 
> HEAD: 8fe8ff09ce3b5750e1f3e45a1f4a81d59c7ff1f1
> 
> 
> Nothing very exciting, just a bunch of non-critical cleanups for the next
> merge window:
> 
> 1) Make the IRQ tick APIs naming more symmetric
> 
> 2) Optimize a bit jiffies_lock code coverage
> 
> 3) Whitespace fixes from Alex Shi
> 
> 4) Fix overflow in scheduler tick max deferment calculation. Given the
> current 1 second max limitation, this bug shouldn't happen in mainline.
> It's rather to prepare for making this value tunable. Or simply in case
> we change the current constant.
> 
> Thanks,
>   Frederic
> ---
> 
> Frederic Weisbecker (2):
>   tick: Rename tick_check_idle() to tick_irq_enter()
>   nohz: Get timekeeping max deferment outside jiffies_lock
> 
> Alex Shi (1):
>   nohz_full: fix code style issue of tick_nohz_full_stop_tick
> 
> Kevin Hilman (1):
>   sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
> 

Ping.

> 
>  include/linux/jiffies.h  |  6 ++
>  include/linux/tick.h |  6 +++---
>  kernel/sched/core.c  |  2 +-
>  kernel/softirq.c |  2 +-
>  kernel/time/tick-sched.c | 27 ++-
>  5 files changed, 25 insertions(+), 18 deletions(-)
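
Item 4 of the pull request mentions an overflow in the max deferment calculation without showing it. As a hedged illustration of that general class of bug (this is not the kernel code, and both function names are hypothetical), the sketch below converts microseconds to nanoseconds once in 32-bit and once in 64-bit arithmetic, assuming a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000u

/* Buggy variant: the multiplication is carried out in 32 bits and wraps. */
static uint64_t usecs_to_nsecs_32(uint32_t usecs)
{
    uint32_t nsecs = usecs * NSEC_PER_USEC; /* wraps modulo 2^32 */
    return nsecs;
}

/* Fixed variant: widen before multiplying so the product always fits. */
static uint64_t usecs_to_nsecs_64(uint32_t usecs)
{
    return (uint64_t)usecs * NSEC_PER_USEC;
}
```

One second (1,000,000 us) still fits in 32 bits, which is consistent with the remark that the bug should not trigger under the current 1 second limit. Anything above roughly 4.29 seconds wraps in the 32-bit variant.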


Re: [PATCH 1/1] gic: change access of gicc_ctrl register to read modify write.

2014-01-22 Thread Feng Kan
Just checking to see if anyone has had time to take a look at this and comment.

On Sun, Dec 8, 2013 at 12:22 PM, Feng Kan  wrote:
> This change is made to preserve the GIC v2 related bits in the
> GIC_CPU_CTRL register (also known as the GICC_CTLR register in spec).
> The original code only set the enable/disable group bit in this register.
> This code will preserve all other bits configured by the bootloader except
> the enable/disable bit. The main reason for this change is to allow the
> bypass bits specified in the v2 spec to remain untouched by the current
> GIC code. In the X-Gene platform, the bypass functionality is not used
> and bypass must be disabled at all times.
>
> Signed-off-by: Vinayak Kale 
> Acked-by: Anup Patel 
> Signed-off-by: Feng Kan 
> ---
>  drivers/irqchip/irq-gic.c |   19 ---
>  1 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index d0e9480..6550ac9 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -419,6 +419,7 @@ static void gic_cpu_init(struct gic_chip_data *gic)
> void __iomem *dist_base = gic_data_dist_base(gic);
> void __iomem *base = gic_data_cpu_base(gic);
> unsigned int cpu_mask, cpu = smp_processor_id();
> +   unsigned int ctrl_mask;
> int i;
>
> /*
> @@ -450,13 +451,21 @@ static void gic_cpu_init(struct gic_chip_data *gic)
> writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4 / 4);
>
> writel_relaxed(0xf0, base + GIC_CPU_PRIMASK);
> -   writel_relaxed(1, base + GIC_CPU_CTRL);
> +
> +   ctrl_mask = readl(base + GIC_CPU_CTRL);
> +   ctrl_mask |= 0x1;
> +   writel_relaxed(ctrl_mask, base + GIC_CPU_CTRL);
>  }
>
>  void gic_cpu_if_down(void)
>  {
> +   unsigned int ctrl_mask;
> +
> void __iomem *cpu_base = gic_data_cpu_base(&gic_data[0]);
> -   writel_relaxed(0, cpu_base + GIC_CPU_CTRL);
> +
> +   ctrl_mask = readl(cpu_base + GIC_CPU_CTRL);
> +   ctrl_mask &= 0xfffffffe;
> +   writel_relaxed(ctrl_mask, cpu_base + GIC_CPU_CTRL);
>  }
>
>  #ifdef CONFIG_CPU_PM
> @@ -567,6 +576,7 @@ static void gic_cpu_restore(unsigned int gic_nr)
>  {
> int i;
> u32 *ptr;
> +   unsigned int ctrl_mask;
> void __iomem *dist_base;
> void __iomem *cpu_base;
>
> @@ -591,7 +601,10 @@ static void gic_cpu_restore(unsigned int gic_nr)
> writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4);
>
> writel_relaxed(0xf0, cpu_base + GIC_CPU_PRIMASK);
> -   writel_relaxed(1, cpu_base + GIC_CPU_CTRL);
> +
> +   ctrl_mask = readl(cpu_base + GIC_CPU_CTRL);
> +   ctrl_mask |= 0x1;
> +   writel_relaxed(ctrl_mask, cpu_base + GIC_CPU_CTRL);
>  }
>
>  static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
> --
> 1.7.6.1
>
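
The read-modify-write pattern that the patch description argues for can be sketched in user space as below. A plain variable stands in for the memory-mapped register, the function names are hypothetical, and bit 0 is taken to be the enable bit as in the patch. It is an illustration of the idea, not the driver code.

```c
#include <stdint.h>

#define CTRL_ENABLE 0x1u /* bit 0: the enable bit the patch toggles */

/* Set the enable bit while preserving every other bit in the register. */
static void ctrl_enable(volatile uint32_t *reg)
{
    uint32_t val = *reg;   /* read */

    val |= CTRL_ENABLE;    /* modify */
    *reg = val;            /* write */
}

/* Clear the enable bit; ~CTRL_ENABLE keeps the other 31 bits intact. */
static void ctrl_disable(volatile uint32_t *reg)
{
    uint32_t val = *reg;

    val &= ~CTRL_ENABLE;
    *reg = val;
}
```

Deriving the clear mask from the bit definition avoids a hand-written constant that is narrower than the register. A 16-bit mask, for instance, would also clear the upper half of a 32-bit register, which defeats the purpose of preserving bootloader-configured bits.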


Re: [PATCH] mm/zswap: add writethrough option

2014-01-22 Thread Bob Liu

On 01/23/2014 08:18 AM, Minchan Kim wrote:
> Hello all,
> 
> On Wed, Jan 22, 2014 at 12:33:58PM -0800, Andrew Morton wrote:
>> On Wed, 22 Jan 2014 09:19:58 -0500 Dan Streetman  wrote:
>>
>>> Actually, I really don't know how much benefit we have that in-memory
>>> swap overcoming to the real storage but if you want, zRAM with dm-cache
>>> is another option rather than invent new wheel by "just having is
>>> better".
>>
>> I'm not sure if this patch is related to the zswap vs. zram discussions.
>> This only adds the option of using writethrough to zswap.  It's a first
>> step to possibly making zswap work more efficiently using writeback
>> and/or writethrough depending on the system and conditions.
>
> The patch size is small. Okay I don't want to be a party-pooper
> but at least, I should say my thought for Andrew to help judging.

 Sure, I'm glad to have your suggestions.
>>>
>>> To give this a bump - Andrew do you have any concerns about this
>>> patch?  Or can you pick this up?
>>
>> I don't pay much attention to new features during the merge window,
>> preferring to shove them into a folder to look at later.  Often they
>> have bitrotted by the time -rc1 comes around.
>>
>> I'm not sure that this review discussion has played out yet - is
>> Minchan happy?
> 
> From the beginning, zswap is for reducing swap I/O but if workingset
> overflows, it should write back rather than OOM with expecting a small
> number of writeback would make the system happy because the high memory
> pressure is temporal so soon most of workload would be hit in zswap
> without further writeback.
> 
> If memory pressure continues and writeback steadily, it means zswap's
> benefit would be mitigated, even worse by adding comp/decomp overhead.
> In that case, it would be better to disable zswap, even.
> 
> Dan said writethrough supporting is first step to make zswap smart
> but anybody didn't say further words to step into the smart and
> what's the *real* workload want it and what's the *real* number from
> that because dm-cache/zram might be a good fit.
> (I don't intend to argue zram VS zswap. If the concern is solved by
> existing solution, why should we invent new function and
> have maintenance cost?) so it's very hard for me to judge that we should
> accept and maintain it.
> 

Speaking of dm-cache, there are also bcache and flashcache.

> We need blueprint for the future and make an agreement on the
> direction before merging this patch.
> 
> But code size is not much and Seth already gave an his Ack so I don't
> want to hurt Dan any more(Sorry for Dan) and wasting my time so pass
> the decision to others(ex, Seth and Bob).

Since zswap is a cache layer, and write-back and write-through are two
common options for any cache, I'm fine with adding this write-through
option.

Thanks,
-Bob
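
For readers less familiar with the two policies named above, here is a toy single-entry cache showing the difference. It is a generic sketch, not zswap code: struct toy_cache and both functions are hypothetical, and the "backing store" is just an int.

```c
#include <stdbool.h>

enum policy { WRITE_BACK, WRITE_THROUGH };

struct toy_cache {
    enum policy policy;
    int cached;         /* value held in the cache */
    int backing;        /* value held in the backing store */
    bool dirty;         /* cache holds data not yet in the backing store */
    int backing_writes; /* number of writes issued to the backing store */
};

/* Store a value; write-through also updates the backing store right away. */
static void cache_store(struct toy_cache *c, int value)
{
    c->cached = value;
    if (c->policy == WRITE_THROUGH) {
        c->backing = value;
        c->backing_writes++;
        c->dirty = false;
    } else {
        c->dirty = true; /* write-back defers the backing-store write */
    }
}

/* Evict the entry; returns true if eviction had to write to the backing store. */
static bool cache_evict(struct toy_cache *c)
{
    if (!c->dirty)
        return false;
    c->backing = c->cached;
    c->backing_writes++;
    c->dirty = false;
    return true;
}
```

The trade-off the sketch makes visible: write-through pays for the backing-store write on every store but can drop an entry for free, while write-back stores cheaply and pays at eviction time.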



mm: BUG: Bad rss-counter state

2014-01-22 Thread Sasha Levin

Hi all,

While fuzzing with trinity running inside a KVM tools guest using latest -next
kernel, I've stumbled on a "mm: BUG: Bad rss-counter state" error which was
pretty non-obvious in the mix of the kernel spew (why?).

I've added a small BUG() after the printk() in check_mm(), and here's the full
output:

[  318.334905] BUG: Bad rss-counter state mm:8801e6dec000 idx:0 val:1
[  318.335955] [ cut here ]
[  318.336507] kernel BUG at kernel/fork.c:562!
[  318.336930] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  318.337826] Dumping ftrace buffer:
[  318.338431](ftrace buffer empty)
[  318.338951] Modules linked in:
[  318.339287] CPU: 45 PID: 10022 Comm: trinity-c190 Tainted: GW 3.13.0-next-20140122-sasha-00011-gcc8342a-dirty #4
[  318.340120] task: 8801e6a9b000 ti: 8801e6aee000 task.ti: 8801e6aee000
[  318.340120] RIP: 0010:[]  [] __mmdrop+0x9a/0xc0
[  318.340120] RSP: :8801e6aefe68  EFLAGS: 00010292
[  318.340120] RAX: 003a RBX: 8801e6dec000 RCX: 0001
[  318.340120] RDX:  RSI: 0001 RDI: 0286
[  318.340120] RBP: 8801e6aefe78 R08: 0001 R09: 
[  318.340120] R10: 0001 R11: 0001 R12: 8801e6dec138
[  318.340120] R13: 8801e6dec000 R14: 8801e6dec0a8 R15: 00a3
[  318.340120] FS:  7f6bc5915700() GS:88007b40() knlGS:
[  318.340120] CS:  0010 DS:  ES:  CR0: 8005003b
[  318.340120] CR2: 7fffd3d62588 CR3: 05e26000 CR4: 06e0
[  318.340120] Stack:
[  318.340120]  8801e6dec138 8801e6dec000 8801e6aefe98 8113cb3b
[  318.340120]  8801e6a9bbb0 8801e6a9b000 8801e6aefef8 81140ced
[  318.340120]  8801e6c4db00 8801e6c4db00 8801e6aefef8 811f3ea5
[  318.340120] Call Trace:
[  318.340120]  [] mmput+0xcb/0xe0
[  318.340120]  [] exit_mm+0x18d/0x1a0
[  318.340120]  [] ? acct_collect+0x175/0x1b0
[  318.340120]  [] do_exit+0x26f/0x520
[  318.355754]  [] do_group_exit+0xa9/0xe0
[  318.355754]  [] SyS_exit_group+0x17/0x20
[  318.355754]  [] tracesys+0xdd/0xe2
[  318.355754] Code: 00 00 eb 16 0f 1f 44 00 00 48 8b 8b 68 03 00 00 48 85 c9 74 24 ba 02 00 00 00 48 89 de 48 c7 c7 10 16 68 85 31 c0 e8 a2 d2 2f 03 <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 89 de 48 8b 3d 1e
[  318.355754] RIP  [] __mmdrop+0x9a/0xc0
[  318.355754]  RSP 
[  318.363991] ---[ end trace 7d85aceb881be62b ]---


Thanks,
Sasha


Re: [PATCH] clocksource: fix some comments typo in clocksource.c

2014-01-22 Thread Yijing Wang
On 2014/1/23 5:05, Thomas Gleixner wrote:
> On Thu, 2 Jan 2014, Yijing Wang wrote:
> 
>> Fix some trivial comments typo in kernel/time/clocksource.c
> 
> That's not a typo. That's a leftover. The function simply cannot fail
> anymore. So the subject of that patch should be something like:
> 
> clocksource: Remove outdated comments

Hi Thomas, sorry for my poor English, I will update this patch title and
changelog.

> 
> And the changelog should explain, that the functions always return 0,
> so the comment is just pointless. A nice follow up on that would be to
> actually make the function void instead of returning a pointless int,
> but that requires to check all call sites.

You are right, it's pointless to return 0, I will try to change the function
type to void in a separate patch, thanks!

>  
>> Signed-off-by: Yijing Wang 
>> ---
>>  kernel/time/clocksource.c |3 ---
>>  1 files changed, 0 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index ba3e502..9951575 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -779,8 +779,6 @@ EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
>>   * @scale:  Scale factor multiplied against freq to get clocksource hz
>>   * @freq:   clocksource frequency (cycles per second) divided by scale
>>   *
>> - * Returns -EBUSY if registration fails, zero otherwise.
>> - *
>>   * This *SHOULD NOT* be called directly! Please use the
>>   * clocksource_register_hz() or clocksource_register_khz helper functions.
>>   */
>> @@ -805,7 +803,6 @@ EXPORT_SYMBOL_GPL(__clocksource_register_scale);
>>   * clocksource_register - Used to install new clocksources
>>   * @cs: clocksource to be registered
>>   *
>> - * Returns -EBUSY if registration fails, zero otherwise.
>>   */
>>  int clocksource_register(struct clocksource *cs)
>>  {
>> -- 
>> 1.7.1
>>
>>
>>
> 
> 


-- 
Thanks!
Yijing



Re: linux-next: manual merge of the drm-intel tree with the drm tree

2014-01-22 Thread Olof Johansson
On Wed, Jan 22, 2014 at 2:06 AM, Daniel Vetter  wrote:
> Hi Stephen,
>
> On Wed, Jan 22, 2014 at 4:04 AM, Stephen Rothwell  wrote:
>> Hi all,
>>
>> Today's linux-next merge of the drm-intel tree got a conflict in
>> drivers/gpu/drm/i915/i915_irq.c between commit abca9e454498 ("drm: Pass
>> 'flags' from the caller to .get_scanout_position()") from the drm tree
>> and commit d59a63ad8234 ("drm/i915: Add intel_get_crtc_scanline()") from
>> the drm-intel tree.
>>
>> I fixed it up (I think - see below) and can carry the fix as necessary
>> (no action is required).
>
> Oops, this patch escaped - it's only for 3.15. I've shuffled my
> branches around now for the merge window so this should not pop up in
> your -next tree again until 3.15 starts.

I just bisected boot failures on x86 chromebooks with -next to this
merge commit. I'll take a look tomorrow morning and make sure they're
gone.


-Olof


Re: mm: BUG: Bad rss-counter state

2014-01-22 Thread David Rientjes
On Wed, 22 Jan 2014, Sasha Levin wrote:

> Hi all,
> 
> While fuzzing with trinity running inside a KVM tools guest using latest -next
> kernel, I've stumbled on a "mm: BUG: Bad rss-counter state" error which was
> pretty non-obvious in the mix of the kernel spew (why?).
> 

It's not a fatal condition and there's only a few possible stack traces 
that could be emitted during the exit() path.  I don't see how we could 
make it more visible other than its log-level which is already KERN_ALERT.

> I've added a small BUG() after the printk() in check_mm(), and here's the full
> output:
> 

Worst place to add it :)  At line 562 of kernel/fork.c in linux-next 
you're going to hit BUG() when there may be other counters that are also 
bad and they don't get printed.  

> [  318.334905] BUG: Bad rss-counter state mm:8801e6dec000 idx:0 val:1

So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was 
cited that would tell us what that is so there's not much to go on, unless 
someone already recognizes this as another issue.  Is this reproducible on 
3.13 or only on linux-next?
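
The objection to placing BUG() directly after the printk() can be illustrated with a small user-space sketch: report every bad counter first, and only let the caller abort once the whole set has been printed. This is not the check_mm() source; check_counters() and NR_COUNTERS are hypothetical names, and the counter count of 3 is an assumption of the sketch.

```c
#include <stdio.h>

#define NR_COUNTERS 3 /* hypothetical number of per-mm counters */

/*
 * Report every non-zero counter and return how many were bad, so that
 * the caller can decide to abort only after all of them were printed.
 */
static int check_counters(const long counters[NR_COUNTERS])
{
    int bad = 0;

    for (int i = 0; i < NR_COUNTERS; i++) {
        if (counters[i] != 0) {
            fprintf(stderr, "Bad counter state idx:%d val:%ld\n",
                    i, counters[i]);
            bad++;
        }
    }
    return bad;
}
```

A debugging hook equivalent to the added BUG() would then test the return value after the loop, rather than firing inside it on the first bad index.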


Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages

2014-01-22 Thread Cai Liu
Hello Dan

2014/1/22 Dan Streetman :
> On Wed, Jan 22, 2014 at 7:16 AM, Cai Liu  wrote:
>> Hello Minchan
>>
>>
>> 2014/1/22 Minchan Kim 
>>>
>>> Hello Cai,
>>>
>>> On Tue, Jan 21, 2014 at 09:52:25PM +0800, Cai Liu wrote:
>>> > Hello Minchan
>>> >
>>> > 2014/1/21 Minchan Kim :
>>> > > Hello,
>>> > >
>>> > > On Tue, Jan 21, 2014 at 02:35:07PM +0800, Cai Liu wrote:
>>> > >> 2014/1/21 Minchan Kim :
>>> > >> > Please check your MUA and don't break thread.
>>> > >> >
>>> > >> > On Tue, Jan 21, 2014 at 11:07:42AM +0800, Cai Liu wrote:
>>> > >> >> Thanks for your review.
>>> > >> >>
>>> > >> >> 2014/1/21 Minchan Kim :
>>> > >> >> > Hello Cai,
>>> > >> >> >
>>> > >> >> > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote:
>>> > >> >> >> zswap can support multiple swapfiles. So we need to check
>>> > >> >> >> all zbud pool pages in zswap.
>>> > >> >> >>
>>> > >> >> >> Version 2:
>>> > >> >> >>   * add *total_zbud_pages* in zbud to record all the pages in pools
>>> > >> >> >>   * move the updating of pool pages statistics to
>>> > >> >> >> alloc_zbud_page/free_zbud_page to hide the details
>>> > >> >> >>
>>> > >> >> >> Signed-off-by: Cai Liu 
>>> > >> >> >> ---
>>> > >> >> >>  include/linux/zbud.h |2 +-
>>> > >> >> >>  mm/zbud.c|   44 
>>> > >> >> >> 
>>> > >> >> >>  mm/zswap.c   |4 ++--
>>> > >> >> >>  3 files changed, 35 insertions(+), 15 deletions(-)
>>> > >> >> >>
>>> > >> >> >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h
>>> > >> >> >> index 2571a5c..1dbc13e 100644
>>> > >> >> >> --- a/include/linux/zbud.h
>>> > >> >> >> +++ b/include/linux/zbud.h
>>> > >> >> >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, unsigned long handle);
>>> > >> >> >>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries);
>>> > >> >> >>  void *zbud_map(struct zbud_pool *pool, unsigned long handle);
>>> > >> >> >>  void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
>>> > >> >> >> -u64 zbud_get_pool_size(struct zbud_pool *pool);
>>> > >> >> >> +u64 zbud_get_pool_size(void);
>>> > >> >> >>
>>> > >> >> >>  #endif /* _ZBUD_H_ */
>>> > >> >> >> diff --git a/mm/zbud.c b/mm/zbud.c
>>> > >> >> >> index 9451361..711aaf4 100644
>>> > >> >> >> --- a/mm/zbud.c
>>> > >> >> >> +++ b/mm/zbud.c
>>> > >> >> >> @@ -52,6 +52,13 @@
>>> > >> >> >>  #include 
>>> > >> >> >>  #include 
>>> > >> >> >>
>>> > >> >> >> +/*
>>> > >> >> >> +* statistics
>>> > >> >> >> +**/
>>> > >> >> >> +
>>> > >> >> >> +/* zbud pages in all pools */
>>> > >> >> >> +static u64 total_zbud_pages;
>>> > >> >> >> +
>>> > >> >> >>  /*
>>> > >> >> >>   * Structures
>>> > >> >> >>  */
>>> > >> >> >> @@ -142,10 +149,28 @@ static struct zbud_header *init_zbud_page(struct page *page)
>>> > >> >> >>   return zhdr;
>>> > >> >> >>  }
>>> > >> >> >>
>>> > >> >> >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, gfp_t gfp)
>>> > >> >> >> +{
>>> > >> >> >> + struct page *page;
>>> > >> >> >> +
>>> > >> >> >> + page = alloc_page(gfp);
>>> > >> >> >> +
>>> > >> >> >> + if (page) {
>>> > >> >> >> + pool->pages_nr++;
>>> > >> >> >> + total_zbud_pages++;
>>> > >> >> >
>>> > >> >> > Who protect race?
>>> > >> >>
>>> > >> >> Yes, here the pool->pages_nr and also the total_zbud_pages are not
>>> > >> >> protected.
>>> > >> >> I will re-do it.
>>> > >> >>
>>> > >> >> I will change *total_zbud_pages* to atomic type.
>>> > >> >
>>> > >> > Wait, it doesn't make sense. Now, you assume zbud allocator would be
>>> > >> > used for only zswap. It's true until now but we couldn't make sure it
>>> > >> > in future. If other user start to use zbud allocator, total_zbud_pages
>>> > >> > would be pointless.
>>> > >>
>>> > >> Yes, you are right.  ZBUD is a common module. So in this patch calculate
>>> > >> the zswap pool size in zbud is not suitable.
>>> > >>
>>> > >> >
>>> > >> > Another concern is that what's your scenario for above two swap?
>>> > >> > How often we need to call zbud_get_pool_size?
>>> > >> > In previous your patch, you reduced the number of call so IIRC,
>>> > >> > we only called it in zswap_is_full and for debugfs.
>>> > >>
>>> > >> zbud_get_pool_size() is called frequently when adding/freeing zswap
>>> > >> entry happen in zswap. This is why in this patch I added a counter in
>>> > >> zbud, and then in zswap the iteration of zswap_list to calculate the
>>> > >> pool size will not be needed.
>>> > >
>>> > > We can remove updating zswap_pool_pages in zswap_frontswap_store and
>>> > > zswap_free_entry as I said. So zswap_is_full is only hot spot.
>>> > > Do you think it's still big overhead? Why? Maybe locking to prevent
>>> > > destroying? Then, we can use RCU to minimize the overhead as 

Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff

2014-01-22 Thread shaobingqing
2014/1/23 J. Bruce Fields :
> On Tue, Jan 21, 2014 at 08:35:36AM -0700, Trond Myklebust wrote:
>>
>> On Jan 21, 2014, at 3:08, shaobingqing  wrote:
>>
>> > 2014/1/21 Trond Myklebust :
>> >> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
>> >>> In current code, there is only one struct rpc_rqst preallocated. If one
>> >>> callback request is received from two sk_buff, the xprt_alloc_bc_request
>> >>> would be executed two times with the same transport->xid. The first time
>> >>> xprt_alloc_bc_request will alloc one struct rpc_rqst and the
>> >>> TCP_RCV_COPY_DATA bit of transport->tcp_flags will not be cleared. The
>> >>> second time xprt_alloc_bc_request could not alloc struct rpc_rqst any more
>> >>> and NULL pointer will be returned, then xprt_force_disconnect occurs. I
>> >>> think one callback request can be allowed to be received from two sk_buff.
>> >>>
>> >>> Signed-off-by: shaobingqing 
>> >>> ---
>> >>> net/sunrpc/xprtsock.c |   11 +--
>> >>> 1 files changed, 9 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>> >>> index ee03d35..606950d 100644
>> >>> --- a/net/sunrpc/xprtsock.c
>> >>> +++ b/net/sunrpc/xprtsock.c
>> >>> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>> >>>  struct sock_xprt *transport =
>> >>>  container_of(xprt, struct sock_xprt, xprt);
>> >>>  struct rpc_rqst *req;
>> >>> + static struct rpc_rqst *req_partial;
>> >>> +
>> >>> + if (req_partial == NULL)
>> >>> + req = xprt_alloc_bc_request(xprt);
>> >>> + else if (req_partial->rq_xid == transport->tcp_xid)
>> >>> + req = req_partial;
>> >>
>> >> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
>> >> req will be undefined. Either way, you cannot use a static variable for
>> >> storage here: that isn't re-entrant.
>> >
>> > Because metadata server only has one slot for backchannel request,
>> > req_partial->rq_xid == transport->tcp_xid always happens, if the callback
>> > request is just being split in two sk_buffs. But req_partial->rq_xid !=
>> > transport->tcp_xid may also happen in some special cases, such as
>> > when retransmission occurs?
>>
>> If the server retransmits, then it is broken. The NFSv4.1 protocol does not
>> allow it to retransmit unless the connection breaks.
>
> shaobingqing, are you actually seeing retransmission?  (If so, are we
> setting up the callback client wrong?)
No, not actually. Here I just see that one client can receive two
callback requests with the same xid.
>
> --b.
>
>>
>> > If one callback request is split in two sk_buffs, xs_tcp_read_callback
>> > will be executed two times. The req_partial should be a static variable,
>> > because the second execution of xs_tcp_read_callback should use
>> > the rpc_rqst allocated for the first execution, which saves information
>> > copied from the first sk_buff.
>>
>> No! This is a multi-threaded/process environment which can support multiple
>> connections. It is a bug to use a static variable.
>>
>> --
>> Trond Myklebust
>> Linux NFS client maintainer
>>
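
Both review points (the request pointer being undefined when the xid does not match, and a function-local static being shared across connections) go away if the partially received request is tracked per transport. The following is a user-space sketch under that assumption; struct toy_transport, struct toy_req and the toy_* functions are hypothetical and do not correspond to the SUNRPC structures.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_req {
    uint32_t xid;
    size_t copied; /* bytes received so far */
};

struct toy_transport {
    struct toy_req slot;     /* the single preallocated request */
    struct toy_req *partial; /* request awaiting more data, or NULL */
};

/*
 * Return the request to keep filling for this xid, or NULL when the
 * slot is busy with a different request. State lives in the transport,
 * so two connections never share it.
 */
static struct toy_req *toy_get_req(struct toy_transport *t, uint32_t xid)
{
    if (t->partial != NULL) {
        /* Resume only when the fragment belongs to the same request. */
        return t->partial->xid == xid ? t->partial : NULL;
    }
    t->slot.xid = xid;
    t->slot.copied = 0;
    t->partial = &t->slot;
    return t->partial;
}

/* Called once the whole request has been received. */
static void toy_req_done(struct toy_transport *t)
{
    t->partial = NULL;
}
```

Every branch returns a defined value, so a mismatching xid yields NULL instead of an uninitialized pointer, and the caller can choose how to handle it.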


Re: [PATCH v4 00/36] mtd: st_spi_fsm: Add new driver

2014-01-22 Thread Brian Norris
Hi Lee,

On Wed, Jan 22, 2014 at 12:50:49PM +, Lee Jones wrote:
> > Version 4:
> >   Tended to Brian's previous review comments
> > - Checkpatch acceptance
> > - MODULE_DEVICE_TABLE() name slip correction
> > - Timeout issue(s) resolved
> > - Potential infinite loop mitigated
> > - Code clarity suggestions heeded
> > - Duplication with MTD core code removed
> > - Upgraded to using ROUND_UP() helper
> > - Moved non-shared header code into main driver
> > - Relocated dynamic msg sequence stores into main struct
> > - Averted adaption of static (table) data
> > - Basic whitespace/spelling/data type/dev_err suggestions accepted
> > 
> > Version 3:
> >   Okay, this thing should be fully functional now. Identify a chip
> >   based on its JEDEC ID, Read, Write, Erase (all or by sector).
> >   Support for various chip quirks added too.
> >  
> > Version 2:
> >   The first bunch of these patches have been on the MLs before, but
> >   didn't receive a great deal of attention for the most part. We are
> >   a little more featureful this time however. We can now successfully
> >   setup and configure the N25Q256. We still can't read/write/erase
> >   it though. I'll start work on that next week and will provide it in
> >   the next instalment.
> >  
> > Version 1:
> >   First stab at getting this thing Mainlined. It doesn't do a great deal
> >   yet, but we are able to initialise the device and dynamically set it up
> >   correctly based on an extracted JEDEC ID.
> > 
> >  Documentation/devicetree/bindings/mtd/st-fsm.txt |   26 ++
> >  arch/arm/boot/dts/stih416-b2105.dts  |   14 +
> >  arch/arm/boot/dts/stih416-pinctrl.dtsi   |   12 +
> >  drivers/mtd/devices/Kconfig  |8 +
> >  drivers/mtd/devices/Makefile |1 +
> >  drivers/mtd/devices/serial_flash_cmds.h  |   81 
> >  drivers/mtd/devices/st_spi_fsm.c | 2124 +
> >  7 files changed, 2266 insertions(+)
> 
> Can you confirm receipt of this set, or would you like me to resend?

Well, I personally have the patch set but haven't had a chance to review
it. Can you resend with MTD in the CC, since we haven't had any comments
anyway? I believe MTD people are much less likely to look at it if you
forget the CC :)

You can just title it [PATCH RESEND v4 X/Y], possibly with a LKML
link back to the original v4, if you want to help avoid confusion.

Brian


Re: [PATCH v2 0/4] Intel MPX support

2014-01-22 Thread Ren Qiaowei

On 01/22/2014 08:30 PM, Ingo Molnar wrote:
>
> * Ren, Qiaowei  wrote:
>
>> -----Original Message-----
>> From: Ingo Molnar [mailto:mingo.kernel@gmail.com] On Behalf Of Ingo
>> Molnar
>> Sent: Wednesday, January 22, 2014 7:53 PM
>> To: Ren, Qiaowei
>> Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; x...@kernel.org;
>> linux-kernel@vger.kernel.org; Peter Zijlstra
>> Subject: Re: [PATCH v2 0/4] Intel MPX support
>>
>>
>>> * Qiaowei Ren  wrote:
>>>
>>>> Changes since v1:
>>>>   * check to see if #BR occurred in userspace or kernel space.
>>>>   * use generic structure and macro as much as possible when
>>>>     decode mpx instructions.
>>>>
>>>> Qiaowei Ren (4):
>>>>   x86, mpx: add documentation on Intel MPX
>>>>   x86, mpx: hook #BR exception handler to allocate bound tables
>>>>   x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE
>>>>   x86, mpx: extend siginfo structure to include bound violation
>>>>     information
>>>>
>>>>  Documentation/x86/intel_mpx.txt    |   76 +++
>>>>  arch/x86/Kconfig                   |    4 +
>>>>  arch/x86/include/asm/mpx.h         |   63 ++
>>>>  arch/x86/include/asm/processor.h   |   16 ++
>>>>  arch/x86/kernel/Makefile           |    1 +
>>>>  arch/x86/kernel/mpx.c              |  417
>>>>  arch/x86/kernel/traps.c            |   61 +-
>>>>  include/uapi/asm-generic/siginfo.h |    9 +-
>>>>  include/uapi/linux/prctl.h         |    6 +
>>>>  kernel/signal.c                    |    4 +
>>>>  kernel/sys.c                       |   12 +
>>>>  11 files changed, 667 insertions(+), 2 deletions(-)
>>>>  create mode 100644 Documentation/x86/intel_mpx.txt
>>>>  create mode 100644 arch/x86/include/asm/mpx.h
>>>>  create mode 100644 arch/x86/kernel/mpx.c
>>>
>>> Such a patch submission is absolutely inadequate!
>>>
>>> Please outline:
>>>
>>>   - a short summary of what the feature does
>>>
>>>   - a short description of what hardware supports it today or will
>>>     support it in the future
>>>
>>>   - a short description of whether the feature needs any
>>>     configuration from the user or it's entirely auto-enabled on
>>>     hardware that supports it.
>>>
>>>   - a cost/benefit description to unrelated code: is this slowing down
>>>     anything else?
>>>
>>>   - how does user-space compiler support stand, what's the expected
>>>     status there, etc.
>>>
>>> Only a small fraction of that information can be found in
>>> Documentation/x86/intel_mpx.txt.
>>>
>>> I'm absolutely sick of these semi-anonymous patch submissions from Intel, so
>>> I'm NAK-ing it until it's communicated properly.
>>>
>>
>> Ok. I will add related content into this documentation.
>
> More importantly, put it into the 0/X mail! That's how people can
> review such a patch set effectively.
>

Ok. Thanks for your feedback. I will do it.

Thanks,
Qiaowei


Re: [PATCH v5 7/8] ARM: brcmstb: gic: add compatible string for Broadcom Brahma15

2014-01-22 Thread Marc C
Hi Florian,

> Do not we also need to update drivers/irqchip/irq-gic.c to look for
> this compatible property? Alternatively should the example DTS contain
> the following:
>
> compatible = "brcm,brahma-b15-gic", "arm,cortex-a15-gic"?

Patch #8 [1] of this series has the "compatible" string set exactly that way.
I was following the pattern seen in the other reference DTS files, where
"arm,cortex-a15-gic" is used as the fall-back.

Thanks,
Marc C

[1] https://lkml.org/lkml/2014/1/21/649
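
To make the fall-back discussed above concrete, here is a minimal userspace
sketch in C. It assumes a hypothetical driver table that lists only the generic
ARM strings; driver_table and first_match() are illustrative names, and the
kernel's real device-tree matching is more involved than this linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver match table: this sketch assumes the driver knows
 * only the generic ARM strings, not the Broadcom-specific one. */
static const char *const driver_table[] = {
    "arm,cortex-a15-gic",
    "arm,cortex-a9-gic",
};

/* Walk the node's "compatible" list (most specific entry first) and return
 * the first string the driver table also contains, or NULL if none match. */
static const char *first_match(const char *const *compat, size_t n_compat)
{
    size_t n_table = sizeof(driver_table) / sizeof(driver_table[0]);

    assert(compat != NULL);
    for (size_t i = 0; i < n_compat; i++)
        for (size_t j = 0; j < n_table; j++)
            if (strcmp(compat[i], driver_table[j]) == 0)
                return compat[i];
    return NULL;
}
```

With a node that lists "brcm,brahma-b15-gic" followed by "arm,cortex-a15-gic",
this sketch binds through the second, generic string. A node that lists only
the Broadcom string would not match until the driver table learned about it.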

On 01/22/2014 02:40 PM, Florian Fainelli wrote:
> Hi Marc,
> 
> 2014/1/21 Marc Carino :
>> Document the Broadcom Brahma B15 GIC implementation as compatible
>> with the ARM GIC standard.
>>
>> Signed-off-by: Marc Carino 
>> Acked-by: Florian Fainelli 
> 
> Do not we also need to update drivers/irqchip/irq-gic.c to look for
> this compatible property? Alternatively should the example DTS contain
> the following:
> 
> compatible = "brcm,brahma-b15-gic", "arm,cortex-a15-gic"?
> 
>> ---
>>  Documentation/devicetree/bindings/arm/gic.txt |1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/gic.txt 
>> b/Documentation/devicetree/bindings/arm/gic.txt
>> index 3dfb0c0..d7409fd 100644
>> --- a/Documentation/devicetree/bindings/arm/gic.txt
>> +++ b/Documentation/devicetree/bindings/arm/gic.txt
>> @@ -15,6 +15,7 @@ Main node required properties:
>> "arm,cortex-a9-gic"
>> "arm,cortex-a7-gic"
>> "arm,arm11mp-gic"
>> +   "brcm,brahma-b15-gic"
>>  - interrupt-controller : Identifies the node as an interrupt controller
>>  - #interrupt-cells : Specifies the number of cells needed to encode an
>>interrupt source.  The type shall be a <u32> and the value shall be 3.
>> --
>> 1.7.1
>>
> 
> 
> 





Re: [RFC PATCH 1/9] mtd: nand: retrieve ECC requirements from Hynix READ ID byte 4

2014-01-22 Thread Brian Norris
+ Huang

Hi Boris,

On Wed, Jan 08, 2014 at 03:21:56PM +0100, Boris BREZILLON wrote:
> The Hynix nand flashes store their ECC requirements in byte 4 of its id
> (returned on READ ID command).
> 
> Signed-off-by: Boris BREZILLON 

I haven't verified yet (perhaps Huang can confirm?), but this may be
similar to a patch Huang submitted recently. In his case, we found that
this table is actually quite unreliable and is likely hard to maintain.

Why do you need this ECC information, for my reference?

Brian


Re: [PATCH net-next v5 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy

2014-01-22 Thread David Miller
From: Zoltan Kiss 
Date: Mon, 20 Jan 2014 21:24:20 +

> A long known problem of the upstream netback implementation is that on the TX
> path (from guest to Dom0) it copies the whole packet from guest memory into
> Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
> huge performance penalty. The classic kernel version of netback used grant
> mapping, and to get notified when the page can be unmapped, it used page
> destructors. Unfortunately that destructor is not an upstreamable solution.
> Ian Campbell's skb fragment destructor patch series [1] tried to solve this
> problem, however it seems to be very invasive on the network stack's code,
> and therefore hasn't progressed very well.
> This patch series uses SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
> know when the skb is freed up.

This series does not apply to net-next due to some other recent changes.

Please respin, thanks.



Re: mm: BUG: Bad rss-counter state

2014-01-22 Thread Dave Jones
On Wed, Jan 22, 2014 at 05:39:25PM -0800, David Rientjes wrote:
 
 > > While fuzzing with trinity running inside a KVM tools guest using latest
 > > -next kernel, I've stumbled on a "mm: BUG: Bad rss-counter state" error
 > > which was pretty non-obvious in the mix of the kernel spew (why?).
 > > 
 > 
 > It's not a fatal condition and there's only a few possible stack traces 
 > that could be emitted during the exit() path.  I don't see how we could 
 > make it more visible other than its log-level which is already KERN_ALERT.
 > 
 > > I've added a small BUG() after the printk() in check_mm(), and here's the
 > > full output:
 > > 
 > 
 > Worst place to add it :)  At line 562 of kernel/fork.c in linux-next 
 > you're going to hit BUG() when there may be other counters that are also 
 > bad and they don't get printed.  
 > 
 > > [  318.334905] BUG: Bad rss-counter state mm:8801e6dec000 idx:0 val:1
 > 
 > So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was 
 > cited that would tell us what that is so there's not much to go on, unless 
 > someone already recognizes this as another issue.  Is this reproducible on 
 > 3.13 or only on linux-next?

Sasha, is this the current git tree version of Trinity?
(I'm wondering if yesterday's munmap changes might be tickling this bug).

Dave



Re: [PATCH 13/15] sched: Use a static_key for sched_clock_stable

2014-01-22 Thread Dave Young
On 01/22/14 at 12:59pm, Peter Zijlstra wrote:
> On Wed, Jan 22, 2014 at 11:45:32AM +0100, Peter Zijlstra wrote:
> > Ho humm.
> 
> OK, so I had me a ponder; does the below fix things for you and David?
> I've only done a boot test on real proper hardware :-)
> 
> ---
>  kernel/sched/clock.c | 42 +-
>  1 file changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 6bd6a6731b21..6bbcd97f4532 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -77,35 +77,45 @@ __read_mostly int sched_clock_running;
>  
>  #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
>  static struct static_key __sched_clock_stable = STATIC_KEY_INIT;
> +static int __sched_clock_stable_early;
>  
>  int sched_clock_stable(void)
>  {
> - if (static_key_false(&__sched_clock_stable))
> - return false;
> - return true;
> + return static_key_false(&__sched_clock_stable);
>  }
>  
>  void set_sched_clock_stable(void)
>  {
> + __sched_clock_stable_early = 1;
> +
> + smp_mb(); /* matches sched_clock_init() */
> +
> + if (!sched_clock_running)
> + return;
> +
>   if (!sched_clock_stable())
> - static_key_slow_dec(&__sched_clock_stable);
> + static_key_slow_inc(&__sched_clock_stable);
>  }
>  
>  static void __clear_sched_clock_stable(struct work_struct *work)
>  {
>   /* XXX worry about clock continuity */
>   if (sched_clock_stable())
> - static_key_slow_inc(&__sched_clock_stable);
> + static_key_slow_dec(&__sched_clock_stable);
>  }
>  
>  static DECLARE_WORK(sched_clock_work, __clear_sched_clock_stable);
>  
>  void clear_sched_clock_stable(void)
>  {
> - if (keventd_up())
> - schedule_work(&sched_clock_work);
> - else
> - __clear_sched_clock_stable(&sched_clock_work);
> + __sched_clock_stable_early = 0;
> +
> + smp_mb(); /* matches sched_clock_init() */
> +
> + if (!sched_clock_running)
> + return;
> +
> + schedule_work(&sched_clock_work);
>  }
>  
>  struct sched_clock_data {
> @@ -140,6 +150,20 @@ void sched_clock_init(void)
>   }
>  
>   sched_clock_running = 1;
> +
> + /*
> +  * Ensure that it is impossible to not do a static_key update.
> +  *
> +  * Either {set,clear}_sched_clock_stable() must see sched_clock_running
> +  * and do the update, or we must see their __sched_clock_stable_early
> +  * and do the update, or both.
> +  */
> + smp_mb(); /* matches {set,clear}_sched_clock_stable() */
> +
> + if (__sched_clock_stable_early)
> + set_sched_clock_stable();
> + else
> + clear_sched_clock_stable();
>  }
>  
>  /*

It does not fix the printk time issue, here is the log:
[0.00] efi: mem26: type=6, attr=0x800f, range=[0x0dbe-0x0dc0) (0MB)
[0.00] DMI not present or invalid.
[0.00] Hypervisor detected: KVM
[0.00] e820: last_pfn = 0xdbe0 max_arch_pfn = 0x4
[0.00] PAT not supported by CPU.
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00] init_memory_mapping: [mem 0x0aa0-0x0abf]
[0.00] init_memory_mapping: [mem 0x0800-0x0a9f]
[0.00] init_memory_mapping: [mem 0x0010-0x07ff]
[0.00] init_memory_mapping: [mem 0x0ac0-0x0bd93fff]
[0.00] init_memory_mapping: [mem 0x0bdc1000-0x0d580fff]
[0.00] init_memory_mapping: [mem 0x0d5e5000-0x0dbd]
[0.00] RAMDISK: [mem 0x0ac0e000-0x0b583fff]
[0.00] ACPI: RSDP 0d5e0014 24 (v02 OVMF  )
[0.00] ACPI: XSDT 0d5df0e8 3C (v01 OVMF   OVMFEDK2 20130221  0113)
[0.00] ACPI: FACP 0d5de000 F4 (v03 OVMF   OVMFEDK2 20130221 OVMF 0099)
[0.00] ACPI: DSDT 0d5dc000 000D57 (v01 INTEL  OVMF 0004 INTL 20120913)
[0.00] ACPI: FACS 0d5e4000 40
[0.00] ACPI: APIC 0d5dd000 78 (v01 OVMF   OVMFEDK2 20130221 OVMF 0099)
[0.00] ACPI: SSDT 0d5db000 57 (v01 REDHAT OVMF 0001 INTL 20120913)
[0.00] crashkernel reservation failed - No suitable area found.
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 0:d401001, boot clock
[65465.267798] Zone ranges:
[65465.268914]   DMA  [mem 0x1000-0x00ff]
[65465.271107]   DMA32[mem 0x0100-0x]
[65465.273348]   Normal   empty
[65465.274683] Movable zone start for each node
[65465.276646] Early memory node ranges
[65465.278321]   node   0: [mem 0x1000-0x0009]
[65465.280572]   node   0: [mem 0x0010-0x0bd93fff]
[65465.282825]   node   0: [mem 0x0bdc1000-0x0d580fff]
[65465.285084]   node   0: [mem 0x0d5e5000-0x0dbd]
[65465.289251] ACPI: PM-Timer IO Port: 0xb008
[65465.291105] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[65465.293766] 
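
The handshake in the quoted patch between {set,clear}_sched_clock_stable() and
sched_clock_init() can be modelled in userspace with C11 atomics. This is only
a sketch under stated assumptions: all model_* names are hypothetical, a
sequentially consistent fence stands in for smp_mb(), a plain atomic flag
stands in for the static key, and the workqueue indirection of the real
clear path is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int clock_running;  /* models sched_clock_running */
static atomic_int stable_early;   /* models __sched_clock_stable_early */
static atomic_bool stable_key;    /* models the state of the static key */

/* Stand-in for smp_mb(): a full, sequentially consistent fence. */
static void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

static void model_set_stable(void)
{
    atomic_store_explicit(&stable_early, 1, memory_order_relaxed);
    full_barrier();  /* pairs with the fence in model_clock_init() */
    if (!atomic_load_explicit(&clock_running, memory_order_relaxed))
        return;      /* too early: model_clock_init() applies it later */
    atomic_store(&stable_key, true);
}

static void model_clear_stable(void)
{
    atomic_store_explicit(&stable_early, 0, memory_order_relaxed);
    full_barrier();  /* pairs with the fence in model_clock_init() */
    if (!atomic_load_explicit(&clock_running, memory_order_relaxed))
        return;
    atomic_store(&stable_key, false);
}

static void model_clock_init(void)
{
    atomic_store_explicit(&clock_running, 1, memory_order_relaxed);
    full_barrier();  /* pairs with the fences in set/clear above */
    /* Either the setter saw clock_running, or we see its early flag. */
    if (atomic_load_explicit(&stable_early, memory_order_relaxed))
        model_set_stable();
    else
        model_clear_stable();
}

static bool model_is_stable(void)
{
    return atomic_load(&stable_key);
}
```

Because each side stores its own flag, issues a full fence, and then loads the
other side's flag, at least one of the two must observe the other's store, which
is the "or both" case the patch comment describes.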

Re: [PATCH 13/15] sched: Use a static_key for sched_clock_stable

2014-01-22 Thread Dave Young
On 01/22/14 at 10:08pm, Peter Zijlstra wrote:
> > 
> > I think its the right region to look through. My current suspect is the
> > linear continuity fit with the initial 'random' multiplier.
> > 
> > That initial 'random' multiplier can get us quite high, and we'll fit
> > the function to match that but continue at a sane rate.
> > 
> > I'll try and prod a little more later this evening as time permits.
> 
> Does this cure things?

Peter, the odd timestamp still happens with this patch for me.

> 
> ---
>  arch/x86/kernel/tsc.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index a3acbac2ee72..bb04148c5fe0 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -237,7 +237,7 @@ static inline unsigned long long cycles_2_ns(unsigned 
> long long cyc)
>  /* XXX surely we already have this someplace in the kernel?! */
>  #define DIV_ROUND(n, d) (((n) + ((d) / 2)) / (d))
>  
> -static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
> +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu, bool origin)
>  {
>   unsigned long long tsc_now, ns_now;
>   struct cyc2ns_data *data;
> @@ -252,7 +252,10 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int 
> cpu)
>   data = cyc2ns_write_begin(cpu);
>  
>   rdtscll(tsc_now);
> - ns_now = cycles_2_ns(tsc_now);
> + if (origin)
> + ns_now = 0;
> + else
> + ns_now = cycles_2_ns(tsc_now);
>  
>   /*
>* Compute a new multiplier as per the above comment and ensure our
> @@ -926,7 +929,7 @@ static int time_cpufreq_notifier(struct notifier_block 
> *nb, unsigned long val,
>   mark_tsc_unstable("cpufreq changes");
>   }
>  
> - set_cyc2ns_scale(tsc_khz, freq->cpu);
> + set_cyc2ns_scale(tsc_khz, freq->cpu, false);
>  
>   return 0;
>  }
> @@ -1199,7 +1202,7 @@ void __init tsc_init(void)
>*/
>   for_each_possible_cpu(cpu) {
>   cyc2ns_init(cpu);
> - set_cyc2ns_scale(cpu_khz, cpu);
> + set_cyc2ns_scale(cpu_khz, cpu, true);
>   }
>  
>   if (tsc_disabled > 0)
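
The arithmetic behind the quoted change can be sketched in userspace. This is a
simplified model, not the kernel code: struct cyc2ns_model and the model_*
functions are hypothetical names, the shift of 10 is an assumption, and the
plain 64-bit multiply would overflow for large cycle counts. It only shows how
a linear fit keeps the nanosecond value continuous across a rescale, and how
passing origin pins the fit to zero at the current cycle count.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SHIFT 10  /* assumed fixed-point shift for this sketch */
#define DIV_ROUND(n, d) (((n) + ((d) / 2)) / (d))

struct cyc2ns_model {
    uint32_t mul;     /* fixed-point ns-per-cycle multiplier */
    uint64_t offset;  /* ns offset, relies on unsigned wrap-around */
};

static uint64_t model_mul_shift(uint64_t cyc, uint32_t mul)
{
    return (cyc * mul) >> MODEL_SHIFT;
}

static uint64_t model_cycles_2_ns(const struct cyc2ns_model *m, uint64_t cyc)
{
    return model_mul_shift(cyc, m->mul) + m->offset;
}

/* Refit the line ns = mul * cyc + offset so that it passes through
 * (tsc_now, ns_now). With origin set, ns_now is forced to 0. */
static void model_set_scale(struct cyc2ns_model *m, uint32_t khz,
                            uint64_t tsc_now, bool origin)
{
    uint64_t ns_now = origin ? 0 : model_cycles_2_ns(m, tsc_now);

    m->mul = (uint32_t)DIV_ROUND((uint64_t)1000000 << MODEL_SHIFT, khz);
    m->offset = ns_now - model_mul_shift(tsc_now, m->mul);
}
```

For example, initialising at cycle 5000 with a 1 GHz clock and origin set makes
cycle 5000 map to 0 ns. Rescaling later to 2 GHz without origin keeps the value
at the rescale point unchanged and only alters the slope afterwards.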


[GIT PULL] f2fs updates for v3.14

2014-01-22 Thread Jaegeuk Kim
Hi Linus,

This is a pull request on f2fs updates for v3.14.

In this round, a couple of sysfs entries were introduced to tune the
f2fs at runtime.
In addition, f2fs starts to support inline_data and improves the
read/write performance in some workloads by refactoring bio-related
flows.
This patch-set also includes a number of clean-ups and several bug
fixes.

Thank you very much.

The following changes since commit 413541dd66d51f791a0b169d9b9014e4f56be13c:

  Linux 3.13-rc5 (2013-12-22 13:08:32 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git tags/for-f2fs-3.14

for you to fetch changes up to bf39c00a9a7f3cdb5ce7d6695d9f044daf8f0b53:

  f2fs: drop obsolete node page when it is truncated (2014-01-23 08:04:21 +0900)


f2fs updates for v3.14

This patch-set includes the following major enhancement patches.
o support inline_data
o refactor bio operations such as merge operations and rw type assignment
o enhance the direct IO path
o enhance bio operations
o truncate a node page when it becomes obsolete
o add sysfs entries: small_discards, max_victim_search, and in-place-update
o add a sysfs entry to control max_victim_search

The other bug fixes are as follows.
o fix a bug in truncate_partial_nodes
o avoid warnings during sparse and build process
o fix error handling flows
o fix potential bit overflows

And, there are a bunch of cleanups.


Changman Lee (7):
  f2fs: introduce __find_rev_next(_zero)_bit
  f2fs: improve searching speed of __next_free_blkoff
  f2fs: simplify IS_DATASEG and IS_NODESEG macro
  f2fs: send REQ_META or REQ_PRIO when reading meta area
  f2fs: missing kmem_cache_destroy for discard_entry
  f2fs: add delimiter to seperate name and value in debug phrase
  f2fs: missing REQ_META and REQ_PRIO when
sync_meta_pages(META_FLUSH)

Chao Yu (20):
  f2fs: use f2fs_put_page to release page for uniform style
  f2fs: add a new function to support for merging contiguous read
  f2fs: adds a tracepoint for submit_read_page
  f2fs: adds a tracepoint for f2fs_submit_read_bio
  f2fs: read contiguous sit entry pages by merging for mount
performance
  f2fs: remove unneeded code in punch_hole
  f2fs: avoid to calculate incorrect max orphan number
  f2fs: correct type of wait in struct bio_private
  f2fs: use true and false for boolean variable
  f2fs: check return value of f2fs_readpage in find_data_page
  f2fs: convert recover_orphan_inodes to void
  f2fs: readahead contiguous pages for restore_node_summary
  f2fs: use inner macro GFP_F2FS_ZERO for simplification
  f2fs: avoid unneeded page release for correct _count of page
  f2fs: add unlikely() macro for compiler optimization
  f2fs: update several comments
  f2fs: avoid to set wrong pino of inode when rename dir
  f2fs: check filename length in recover_dentry
  f2fs: avoid to left uninitialized data in page when read inline
data
  f2fs: avoid to read inline data except first page

Chris Fries (1):
  f2fs: clean checkpatch warnings

Fan Li (1):
  f2fs: merge pages with the same sync_mode flag

Gu Zheng (14):
  f2fs: convert remove_inode_page to void
  f2fs: convert dev_valid_block_count to void
  f2fs: convert inc/dec_valid_node_count to inc/dec one count
  f2fs: simplify write_orphan_inodes for better readable
  f2fs: move the list_head initialization into the lock protection
region
  f2fs: fix a potential out of range issue
  f2fs: move all the bio initialization into __bio_alloc
  f2fs: remove the rw_flag domain from f2fs_io_info
  f2fs: convert max_orphans to a field of f2fs_sb_info
  f2fs: move grabing orphan pages out of protection region
  f2fs: move alloc new orphan node out of lock protection region
  f2fs: use spinlock rather than mutex for better speed
  f2fs: add help function META_MAPPING
  f2fs: remove the orphan block page array

Huajun Li (6):
  f2fs: add a new function: f2fs_reserve_block()
  f2fs: add flags and helpers to support inline data
  f2fs: add a new mount option: inline_data
  f2fs: key functions to handle inline data
  f2fs: handle inline data operations
  f2fs: update f2fs Documentation

Jaegeuk Kim (42):
  f2fs: add a slab cache entry for small discards
  f2fs: add key functions for small discards
  f2fs: add a sysfs entry to control max_discards
  f2fs: introduce f2fs_issue_discard() to clean up
  f2fs: add a tracepoint for f2fs_issue_discard
  f2fs: clean up the do_submit_bio flow
  f2fs: use sbi->write_mutex for write bios
  f2fs: disable the extent cache ops on high fragmented files
  f2fs: introduce a bio array for per-page write bios
  f2fs: merge read IOs at ra_nat_pages()
  f2fs: avoid lock 

Re: [PATCH] clk: export __clk_get_hw for re-use in others

2014-01-22 Thread SeongJae Park
On Thu, Jan 23, 2014 at 3:11 AM, Mike Turquette  wrote:
> On Wed, Jan 22, 2014 at 9:59 AM, Stephen Boyd  wrote:
>> On 01/21/14 21:23, SeongJae Park wrote:
>>> On Wed, Jan 22, 2014 at 1:59 PM, Greg KH  wrote:
>>>> On Wed, Jan 22, 2014 at 12:05:57PM +0900, SeongJae Park wrote:
>>>>> Dear Greg, Mike,
>>>>>
>>>>> May I ask your answer or other opinion, please?
>>>> It's the middle of the merge window, it's not time for new development,
>>>> or much time for free-time for me, sorry.  Feel free to fix it the best
>>>> way you know how.
>>> Oops, I've forgot about the merge window. Thank you very much for your
>>> kind answer.
>>> Sorry if I bothered you while you're in busy time.
>>> Because the build problem is not a big deal because it exists only in
>>> -next tree,
>>> I will wait until merge window be closed and then fix it again if it
>>> still exist.
>>>
>>
>> I've already sent a patch that exports this and other clock provider
>> functions. Please use this one:
>>
>> https://patchwork.kernel.org/patch/3507921/
>
> I'm going to take Stephen's patch into a fixes branch and send it as
> part of a pull request. Maybe -rc1 or -rc2 at the latest.

Got it. Thank you for letting me know :)

>
> Thanks all.
>
> Regards,
> Mike
>
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> hosted by The Linux Foundation
>>


Re: Regression on next-20140116 [Was: [PATCH 3/3 v4] usb: chipidea: hw_phymode_configure moved before ci_usb_phy_init]

2014-01-22 Thread Peter Chen
On Wed, Jan 22, 2014 at 10:41:33PM +0100, Uwe Kleine-König wrote:
> Hello,
> 
> On Wed, Jan 22, 2014 at 10:49:51AM +0100, Uwe Kleine-König wrote:
> > On Tue, Dec 03, 2013 at 04:01:50PM +0800, Chris Ruehl wrote:
> > > usb: chipidea: hw_phymode_configure moved before ci_usb_phy_init
> > > hw_phymode_configure configures the PORTSC registers and allow the
> > > following phy_inits to operate on the right parameters. This fix a problem
> > > where the UPLI (ISP1504) could not detected, because the Viewport was not
> > > available and read the viewport return 0's only.
> > This patch (or a later revision of it to be more exact) made it into
> > mainline as cd0b42c2a6d2.
> > 
> > On an i.MX27 based machine I'm hitting an oops (see below) on
> > next-20140116 + a few patches. (I didn't switch to 3.13+ yet, as I think
> > not everything I need has landed there.) The oops goes away (and still
> > better, lsusb reports my connected devices instead of "unable to
> > initialize libusb: -99") when I do at least one of the following:
> > 
> >  - set CONFIG_USB_CHIPIDEA=y instead of =m
> >  - revert commit
> >   cd0b42c2a6d2 (usb: chipidea: put hw_phymode_configure before 
> > ci_usb_phy_init)
> I debugged that a bit further and the problem is that
> hw_phymode_configure depends on the phy's clk being enabled (i.e.
> usb_ipg_gate) and this is only enforced in ci_usb_phy_init (via
> usb_phy_init -> usb_gen_phy_init). When CONFIG_USB_CHIPIDEA=y the init
> call to disable all unused clocks wasn't run yet and so the clock is
> still on as this is the boot default.

Hi Uwe,
I am a little puzzled by your platform:

- Which PHY do you use: a ULPI PHY, the internal PHY, or another external PHY?
- If you use a ULPI PHY, why do you still need the nop PHY driver?
  Besides, according to Chris's patch, the ULPI can only be accessed after
  hw_phymode_configure.
- Do you do any hardware-related operations in the PHY's probe? If so,
  why not move them to phy->init?

Peter

> 
> Considering that it's already late today and that I don't know the
> chipidea driver I'm sure there are people who can come up with a better
> patch with less effort than me. Any volunteers?
> 
> Best regards
> Uwe
> 
> -- 
> Pengutronix e.K.   | Uwe Kleine-König|
> Industrial Linux Solutions | http://www.pengutronix.de/  |
> 
> 

-- 

Best Regards,
Peter Chen



Re: [PATCH 13/15] sched: Use a static_key for sched_clock_stable

2014-01-22 Thread Dave Young
On 01/23/14 at 09:53am, Dave Young wrote:
> On 01/22/14 at 10:08pm, Peter Zijlstra wrote:
> > > 
> > > I think its the right region to look through. My current suspect is the
> > > linear continuity fit with the initial 'random' multiplier.
> > > 
> > > That initial 'random' multiplier can get us quite high, and we'll fit
> > > the function to match that but continue at a sane rate.
> > > 
> > > I'll try and prod a little more later this evening as time permits.
> > 
> > Does this cure things?
> 
> Peter, the odd timestamp still happens with this patch for me.

Hmm, it seems my physical machine boots fine with this patch. The KVM
guest problem still exists, but that KVM issue might be a separate problem.

> 
> > 
> > ---
> >  arch/x86/kernel/tsc.c | 11 +++
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index a3acbac2ee72..bb04148c5fe0 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -237,7 +237,7 @@ static inline unsigned long long cycles_2_ns(unsigned 
> > long long cyc)
> >  /* XXX surely we already have this someplace in the kernel?! */
> >  #define DIV_ROUND(n, d) (((n) + ((d) / 2)) / (d))
> >  
> > -static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
> > +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu, bool origin)
> >  {
> > unsigned long long tsc_now, ns_now;
> > struct cyc2ns_data *data;
> > @@ -252,7 +252,10 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, 
> > int cpu)
> > data = cyc2ns_write_begin(cpu);
> >  
> > rdtscll(tsc_now);
> > -   ns_now = cycles_2_ns(tsc_now);
> > +   if (origin)
> > +   ns_now = 0;
> > +   else
> > +   ns_now = cycles_2_ns(tsc_now);
> >  
> > /*
> >  * Compute a new multiplier as per the above comment and ensure our
> > @@ -926,7 +929,7 @@ static int time_cpufreq_notifier(struct notifier_block 
> > *nb, unsigned long val,
> > mark_tsc_unstable("cpufreq changes");
> > }
> >  
> > -   set_cyc2ns_scale(tsc_khz, freq->cpu);
> > +   set_cyc2ns_scale(tsc_khz, freq->cpu, false);
> >  
> > return 0;
> >  }
> > @@ -1199,7 +1202,7 @@ void __init tsc_init(void)
> >  */
> > for_each_possible_cpu(cpu) {
> > cyc2ns_init(cpu);
> > -   set_cyc2ns_scale(cpu_khz, cpu);
> > +   set_cyc2ns_scale(cpu_khz, cpu, true);
> > }
> >  
> > if (tsc_disabled > 0)


Re: mm: BUG: Bad rss-counter state

2014-01-22 Thread Sasha Levin

On 01/22/2014 08:39 PM, David Rientjes wrote:
> On Wed, 22 Jan 2014, Sasha Levin wrote:
>
>> Hi all,
>>
>> While fuzzing with trinity running inside a KVM tools guest using latest
>> -next kernel, I've stumbled on a "mm: BUG: Bad rss-counter state" error
>> which was pretty non-obvious in the mix of the kernel spew (why?).
>>
>
> It's not a fatal condition and there's only a few possible stack traces
> that could be emitted during the exit() path.  I don't see how we could
> make it more visible other than its log-level which is already KERN_ALERT.

Would it make sense to add a VM_BUG_ON() to make it more obvious when we have
CONFIG_DEBUG_VM enabled? Many of the VM_BUG_ON test cases aren't fatal either,
and it would make this issue easier to spot.

>> I've added a small BUG() after the printk() in check_mm(), and here's the
>> full output:
>>
>
> Worst place to add it :)  At line 562 of kernel/fork.c in linux-next
> you're going to hit BUG() when there may be other counters that are also
> bad and they don't get printed.

I gave the condition before curly braces :)

	if (unlikely(x)) {
		printk(KERN_ALERT "BUG: Bad rss-counter state "
				  "mm:%p idx:%d val:%ld\n", mm, i, x);
		BUG();
	}

>> [  318.334905] BUG: Bad rss-counter state mm:8801e6dec000 idx:0 val:1
>
> So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was
> cited that would tell us what that is so there's not much to go on, unless
> someone already recognizes this as another issue.  Is this reproducible on
> 3.13 or only on linux-next?

Yup, I see it in v3.13 too, which is odd.


Thanks,
Sasha
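
David's objection to calling BUG() inside the loop can be illustrated with a
small userspace sketch. It assumes a hypothetical check_counters() helper and a
stand-in NR_COUNTERS constant; it is not the kernel's check_mm(). The point is
that every bad counter is reported first, and the caller decides whether to
abort only after the loop has finished.

```c
#include <assert.h>
#include <stdio.h>

#define NR_COUNTERS 3  /* stand-in for the kernel's per-mm counter count */

/* Report every non-zero counter and return how many were bad, so the
 * caller can abort only after all of them have been printed. */
static int check_counters(const long counters[NR_COUNTERS])
{
    int bad = 0;

    assert(counters != NULL);
    for (int i = 0; i < NR_COUNTERS; i++) {
        long x = counters[i];

        if (x != 0) {
            fprintf(stderr, "BUG: Bad rss-counter state idx:%d val:%ld\n",
                    i, x);
            bad++;
        }
    }
    return bad;
}
```

A caller that wants the failure to be loud could abort when the return value is
non-zero, which keeps the diagnostic for every index in the log.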


Re: mm: BUG: Bad rss-counter state

2014-01-22 Thread Sasha Levin

On 01/22/2014 08:52 PM, Dave Jones wrote:
> Sasha, is this the current git tree version of Trinity?
> (I'm wondering if yesterday's munmap changes might be tickling this bug).

Ah yes, my tree has the munmap patch from yesterday, which would explain why we
started seeing this issue just now.


Thanks,
Sasha


Re: [ANNOUNCE] Git v1.9-rc0

2014-01-22 Thread Jeff King
On Wed, Jan 22, 2014 at 08:30:30PM +, Ken Moffat wrote:

>  Two questions: Does regenerating (e.g. if the tarball has dropped
> out of the cache) change its sums (md5sum or similar) ?  In (beyond)
> linuxfromscratch we use md5sums to verify that a tarball has not
> changed.

The tarballs we auto-generate from tags are cached, but they can
change if the cached version expires _and_ the archive-generation code
changes.

We use "git archive" to generate the tarballs themselves, and then gzip
them with "gzip -n". So it should be consistent from run to run. However,
very occasionally there are bugfixes in "git archive" which can affect
the output. E.g., commit 22f0dcd (archive-tar: split long paths more
carefully, 2013-01-05) changes the representation of certain long paths,
and generating a tarball with and without it will result in different
checksums (for some repos).

So if you are planning on baking md5sums into a package-build system, it
is much better to point at "official" releases which are rolled once by
the project maintainer, rather than the automatic tag page.

Junio, since you prepare such tarballs[1] anyway for kernel.org, it
might be worth uploading them to the "Releases" page of git/git.  I
imagine there is a programmatic way to do so via GitHub's API, but I
don't know offhand. I can look into it if you are interested.

-Peff


Re: [PATCH v4] ACPI: Fix acpi_evaluate_object() return value check

2014-01-22 Thread Yijing Wang
On 2014/1/23 5:37, Bjorn Helgaas wrote:
> On Mon, Jan 20, 2014 at 7:46 PM, Yijing Wang  wrote:
>> Since acpi_evaluate_object() returns acpi_status and not plain int,
>> ACPI_FAILURE() should be used for checking its return value.
>>
>> Reviewed-by: Jani Nikula 
>> Signed-off-by: Yijing Wang 
>> ---
>> v3->v4: Fix spell error, add Jani Nikula reviewed-by.
>> v2->v3: Fix compile error pointed out by Hanjun.
>> v1->v2: Add CC to related subsystem MAINTAINERS
>> ---
>>  drivers/gpu/drm/i915/intel_acpi.c  |   24 
>> ++--
>>  drivers/gpu/drm/nouveau/core/subdev/mxm/base.c |9 +
>>  drivers/gpu/drm/nouveau/nouveau_acpi.c |   23 
>> +--
>>  drivers/pci/pci-label.c|9 ++---
> 
> For the drivers/pci/pci-label.c part,
> 
> Acked-by: Bjorn Helgaas 

Thanks.

> 
>> +   status = acpi_evaluate_object(handle, "_DSM", &input, &output);
>> +   if (ACPI_FAILURE(status)) {
>> +   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %s\n",
>> +   acpi_format_exception(status));
> 
> It's too bad there isn't an easy way to produce more informative error
> messages, e.g., by including a namespace path or something.  A message
> like:
> 
> failed to evaluate _DSM: A requested entity is not found
> 
> is only useful if there's enough context to figure out what's going on.

Yes, I will add the namespace path into the debug info, thanks!

> 
> Bjorn
> 
> .
> 


-- 
Thanks!
Yijing
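
Why the return value should go through ACPI_FAILURE() can be shown with a small
userspace mock. Everything below is a stand-in written for this sketch: the
typedef and macros imitate the ACPICA definitions, mock_evaluate() is a
hypothetical replacement for acpi_evaluate_object(), and the errno-style check
is just one assumed way a plain-int test can go wrong.

```c
/* Mock of the ACPICA status type and helpers, for illustration only. */
typedef unsigned int acpi_status;
#define AE_OK           ((acpi_status)0x0000)
#define AE_NOT_FOUND    ((acpi_status)0x0005)
#define ACPI_SUCCESS(a) (!(a))
#define ACPI_FAILURE(a) (a)

/* Hypothetical stand-in for acpi_evaluate_object(): fails with
 * AE_NOT_FOUND when the method is absent. */
static acpi_status mock_evaluate(int method_present)
{
    return method_present ? AE_OK : AE_NOT_FOUND;
}

/* Misleading: treats the status like a negative errno value. ACPI
 * status codes are small positive numbers, so this never fires. */
static int failed_errno_style(acpi_status status)
{
    return (int)status < 0;
}

/* Intended: let the ACPI macro decide what counts as a failure. */
static int failed_acpi_style(acpi_status status)
{
    return ACPI_FAILURE(status) ? 1 : 0;
}
```

With the method absent, failed_acpi_style() reports the failure while
failed_errno_style() silently treats AE_NOT_FOUND as success.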



Re: [PATCH 2/3] ARM: kexec: copying code to ioremapped area

2014-01-22 Thread Wang Nan
On 2014/1/22 21:27, Russell King - ARM Linux wrote:
> On Wed, Jan 22, 2014 at 07:25:15PM +0800, Wang Nan wrote:
>> ARM's kdump is actually corrupted (at least for omap4460), mainly because of
>> cache problem: flush_icache_range can't reliably ensure the copied data
>> correctly goes into RAM.
> 
> Quite right too.  Your mistake here is thinking that flush_icache_range()
> should push it to RAM.  That's incorrect.
> 
> flush_icache_range() is there to deal with such things as loadable modules
> and self modifying code, where the MMU is not being turned off.  Hence, it
> only flushes to the point of coherency between the I and D caches, and
> any further levels of cache between that point and memory are not touched.
> Why should it touch any more levels - it's not the function's purpose.
> 
>> After mmu turned off and jump to the trampoline, kexec always failed due
>> to random undef instructions.
> 
> We already have code in the kernel which deals with shutting the MMU off.
> An instance of how this can be done is illustrated in the soft_restart()
> code path, and kexec already uses this.
> 
> One of the first things soft_restart() does is turn off the outer cache -
> which OMAP4 does have, but this can only be done if there is a single CPU
> running.  If there's multiple CPUs running, then the outer cache can't be
> disabled, and that's the most likely cause of the problem you're seeing.
> 

You are right, commit b25f3e1c (OMAP4/highbank: Flush L2 cache before disabling)
solves my problem: it flushes the outer cache before disabling it. I have tested
it in UP and SMP situations and it works (actually, omap4 is not yet ready to
support kexec in the SMP case; I inserted an empty cpu_kill() to make it work),
so the first 2 patches are unneeded.

What about the 3rd one (ARM: allow kernel to be loaded in middle of phymem)?




Re: mm: BUG: Bad rss-counter state

2014-01-22 Thread Dave Jones
On Wed, Jan 22, 2014 at 09:16:03PM -0500, Sasha Levin wrote:
 > On 01/22/2014 08:52 PM, Dave Jones wrote:
 > > Sasha, is this the current git tree version of Trinity ?
 > > (I'm wondering if yesterdays munmap changes might be tickling this bug).
 > 
 > Ah yes, my tree has the munmap patch from yesterday, which would explain why
 > we started seeing this issue just now.

So that change is basically allowing trinity to munmap just part of a prior
mmap. So it may do things like..

mmap   |--|

munmap |XXX---|

munmap |--XXX-|

i.e., it might try unmapping some pages more than once, and may even overlap
prior munmaps.

Until yesterday's change, it would only munmap the entire mmap.

There's no easy way to tell exactly what happened without a trinity log, of
course.
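
As a rough illustration of the pattern described above (not trinity's
actual code), the following sketch maps six anonymous pages and then
issues two munmap() calls whose ranges overlap, so one page is unmapped
twice. The function name and the page counts are assumptions chosen for
the example. It relies on the Linux behaviour that munmap() does not
fail when part of the range is already unmapped.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map six anonymous pages, then unmap two overlapping sub-ranges.
 * Returns 0 if every call succeeded, -1 otherwise.
 */
static int partial_munmap_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t psz;
	char *base;

	if (page <= 0)
		return -1;
	psz = (size_t)page;

	base = mmap(NULL, 6 * psz, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* First munmap: pages 0-2. */
	if (munmap(base, 3 * psz) != 0)
		return -1;

	/* Second munmap: pages 2-4, so page 2 is unmapped a second time. */
	if (munmap(base + 2 * psz, 3 * psz) != 0)
		return -1;

	/* Clean up the one page that is still mapped (page 5). */
	return munmap(base + 5 * psz, psz);
}
```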

Dave



Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff

2014-01-22 Thread shaobingqing
2014/1/21 Trond Myklebust :
>
> On Jan 21, 2014, at 3:08, shaobingqing  wrote:
>
>> 2014/1/21 Trond Myklebust :
>>> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
 In the current code, only one struct rpc_rqst is preallocated. If one
 callback request is received from two sk_buffs, xprt_alloc_bc_request
 would be executed two times with the same transport->xid. The first time,
 xprt_alloc_bc_request will allocate one struct rpc_rqst and the
 TCP_RCV_COPY_DATA bit of transport->tcp_flags will not be cleared. The
 second time, xprt_alloc_bc_request cannot allocate a struct rpc_rqst any
 more and a NULL pointer will be returned, then xprt_force_disconnect occurs.
 I think one callback request can be allowed to be received from two sk_buffs.

 Signed-off-by: shaobingqing 
 ---
 net/sunrpc/xprtsock.c |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

 diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
 index ee03d35..606950d 100644
 --- a/net/sunrpc/xprtsock.c
 +++ b/net/sunrpc/xprtsock.c
 @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
  struct sock_xprt *transport =
  container_of(xprt, struct sock_xprt, xprt);
  struct rpc_rqst *req;
 + static struct rpc_rqst *req_partial;
 +
 + if (req_partial == NULL)
 + req = xprt_alloc_bc_request(xprt);
 + else if (req_partial->rq_xid == transport->tcp_xid)
 + req = req_partial;
>>>
>>> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
>>> req will be undefined. Either way, you cannot use a static variable for
>>> storage here: that isn't re-entrant.
>>
>> Because the metadata server only has one slot for backchannel requests,
>> req_partial->rq_xid == transport->tcp_xid always holds if the callback
>> request is just split across two sk_buffs. But req_partial->rq_xid !=
>> transport->tcp_xid may also happen in some special cases, such as
>> when retransmission occurs?
>
> If the server retransmits, then it is broken. The NFSv4.1 protocol does not
> allow it to retransmit unless the connection breaks.

What I said above is bogus. As far as I can see, if one callback
request is split into two sk_buffs, the function
xs_tcp_read_callback will be called two times with the same rpc_xprt
and the same xid. If between the two calls there is
another call with the same rpc_xprt but a different xid, we consider it
to be another callback request from the same server,
provided that there is no retransmission in our environment. But
this might not happen because there is only one
callback slot in each server.

>
>> If one callback request is split across two sk_buffs, xs_tcp_read_callback
>> will be executed two times. The req_partial should be a static variable,
>> because the second execution of xs_tcp_read_callback should use
>> the rpc_rqst allocated for the first execution, which saves information
>> copied from the first sk_buff.
>
> No! This is a multi-threaded/process environment which can support multiple
> connections. It is a bug to use a static variable.

I think I have misunderstood the question. A static variable cannot
be used here. Perhaps we should define a variable for each
rpc_client (or rpc_xprt).
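
To make the suggestion concrete, here is a small user-space sketch of
keeping the partially received request in per-transport state instead of
a function-local static. The structures and function names (demo_rqst,
demo_xprt, demo_get_callback_rqst, demo_receive) are hypothetical
simplifications and not the real SUNRPC types. Locking is deliberately
left out; a real transport would need it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct rpc_rqst / sock_xprt. */
struct demo_rqst {
	uint32_t xid;
	size_t copied;              /* bytes received so far */
};

struct demo_xprt {
	struct demo_rqst slot;      /* the single preallocated request */
	int slot_in_use;
	struct demo_rqst *partial;  /* per-transport, so no shared static */
};

/* Return the request for xid, resuming a partial one if the xid matches. */
static struct demo_rqst *demo_get_callback_rqst(struct demo_xprt *xprt,
						uint32_t xid)
{
	assert(xprt != NULL);

	if (xprt->partial != NULL)
		return xprt->partial->xid == xid ? xprt->partial : NULL;

	if (xprt->slot_in_use)
		return NULL;

	xprt->slot_in_use = 1;
	xprt->slot.xid = xid;
	xprt->slot.copied = 0;
	xprt->partial = &xprt->slot;
	return xprt->partial;
}

/* Record len more bytes; forget the partial state once total is reached. */
static void demo_receive(struct demo_xprt *xprt, struct demo_rqst *req,
			 size_t len, size_t total)
{
	req->copied += len;
	if (req->copied >= total)
		xprt->partial = NULL;
}
```

Because the pointer lives in each transport, two connections receiving
split callbacks at the same time cannot see each other's partial request,
and a mismatching xid yields a defined result (NULL) rather than an
uninitialized pointer.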

>
> --
> Trond Myklebust
> Linux NFS client maintainer
>


Re: Internal error: Oops: 17 [#1] ARM

2014-01-22 Thread Fabio Estevam
On Wed, Jan 22, 2014 at 9:49 PM, John Tobias  wrote:
> Hello all,
>
> Just to confirm that the error I posted previously exists in the 3.13
> release. Note that some patches related to eMMC/sdhci have
> been applied in order to boot 3.13 on my board.
> In addition to that, I was getting additional errors (please see below):
> - It happened during the reboot.
>
> Cc'ing Dong Aisheng.

What are the steps to reproduce this? Which SoC are you using?

Regards,

Fabio Estevam


[PATCH -tip 4/8] perf-probe: Use the actual address instead of the symbol name

2014-01-22 Thread Masami Hiramatsu
Since several local symbols can have the same name (e.g. t_show),
we need to use the actual address instead of the symbol name for
those points. Note that this works only with debuginfo.

E.g. without this change;

# ./perf probe -a t_show \$vars
Added new events:
  probe:t_show (on t_show with $vars)
  probe:t_show_1   (on t_show with $vars)
  probe:t_show_2   (on t_show with $vars)
  probe:t_show_3   (on t_show with $vars)

You can now use it in all perf tools, such as:

perf record -e probe:t_show_3 -aR sleep 1

OK, we have 4 different t_show()s. All functions have
different arguments as below;

# cat /sys/kernel/debug/tracing/kprobe_events
p:probe/t_show t_show m=%di:u64 v=%si:u64
p:probe/t_show_1 t_show m=%di:u64 v=%si:u64 t=%si:u64
p:probe/t_show_2 t_show m=%di:u64 v=%si:u64 fmt=%si:u64
p:probe/t_show_3 t_show m=%di:u64 v=%si:u64 file=%si:u64

However, all of them have been put on the *same* address.

# cat /sys/kernel/debug/kprobes/list
810d9720  k  t_show+0x0[DISABLED]
810d9720  k  t_show+0x0[DISABLED]
810d9720  k  t_show+0x0[DISABLED]
810d9720  k  t_show+0x0[DISABLED]


With this change;

# ./perf probe -a t_show \$vars
Added new events:
  probe:t_show (on t_show with $vars)
  probe:t_show_1   (on t_show with $vars)
  probe:t_show_2   (on t_show with $vars)
  probe:t_show_3   (on t_show with $vars)

You can now use it in all perf tools, such as:

perf record -e probe:t_show_3 -aR sleep 1

# cat /sys/kernel/debug/tracing/kprobe_events
p:probe/t_show 0x810d9720 m=%di:u64 v=%si:u64
p:probe/t_show_1 0x810e2e40 m=%di:u64 v=%si:u64 t=%si:u64
p:probe/t_show_2 0x810ece30 m=%di:u64 v=%si:u64 fmt=%si:u64
p:probe/t_show_3 0x810f4ad0 m=%di:u64 v=%si:u64 file=%si:u64

# cat /sys/kernel/debug/kprobes/list
810e2e40  k  t_show+0x0[DISABLED]
810ece30  k  t_show+0x0[DISABLED]
810f4ad0  k  t_show+0x0[DISABLED]
810d9720  k  t_show+0x0[DISABLED]

This time, each event is correctly put at a different
address.

Note that currently this doesn't support address-based
probes on modules (thus the probes on modules are symbol
based), since it requires relative address probe syntax
for kprobe-tracer, and that isn't implemented yet.

One more note: this allows us to put events on the correct
address, but the --list option should be updated to show
the correct corresponding source code.
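
The choice between the two command forms can be sketched in user space as
follows. This is only an illustration of the rule described above (use the
address for core-kernel probes, keep symbol+offset for modules); it leaves
out uprobes, and struct demo_point and demo_synthesize() are hypothetical
names, not the perf implementation. The sketch passes only the remaining
buffer space to the second snprintf() call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct probe_trace_point. */
struct demo_point {
	const char *symbol;
	const char *module;     /* NULL for the core kernel */
	unsigned long offset;
	unsigned long address;  /* 0 if unknown (no debuginfo) */
	int retprobe;
};

/* Build "p:group/event <place>" into buf; returns the length or -1. */
static int demo_synthesize(char *buf, size_t size, const char *group,
			   const char *event, const struct demo_point *tp)
{
	int len, ret;

	len = snprintf(buf, size, "%c:%s/%s ", tp->retprobe ? 'r' : 'p',
		       group, event);
	if (len <= 0 || (size_t)len >= size)
		return -1;

	if (tp->address && !tp->module)
		/* Core kernel with a known address: use it directly. */
		ret = snprintf(buf + len, size - (size_t)len, "0x%lx",
			       tp->address);
	else
		/* Modules, or no address: fall back to symbol+offset. */
		ret = snprintf(buf + len, size - (size_t)len, "%s%s%s+%lu",
			       tp->module ? tp->module : "",
			       tp->module ? ":" : "",
			       tp->symbol, tp->offset);

	if (ret <= 0 || (size_t)ret >= size - (size_t)len)
		return -1;
	return len + ret;
}
```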

Signed-off-by: Masami Hiramatsu 
---
 tools/perf/util/probe-event.c |   23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 2fb4486..92ab688 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1529,20 +1529,27 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
if (buf == NULL)
return NULL;
 
-   if (tev->uprobes)
-   len = e_snprintf(buf, MAX_CMDLEN, "%c:%s/%s %s:%s",
-tp->retprobe ? 'r' : 'p',
-tev->group, tev->event,
+   len = e_snprintf(buf, MAX_CMDLEN, "%c:%s/%s ", tp->retprobe ? 'r' : 'p',
+tev->group, tev->event);
+   if (len <= 0)
+   goto error;
+
+   /* Use the real address, except for kernel modules */
+   if (tp->address && !(tp->module && !tev->uprobes))
+   ret = e_snprintf(buf + len, MAX_CMDLEN, "%s%s0x%lx",
+tp->module ?: "", tp->module ? ":" : "",
+tp->address);
+   else if (tev->uprobes)
+   ret = e_snprintf(buf + len, MAX_CMDLEN, "%s:%s",
 tp->module, tp->symbol);
else
-   len = e_snprintf(buf, MAX_CMDLEN, "%c:%s/%s %s%s%s+%lu",
-tp->retprobe ? 'r' : 'p',
-tev->group, tev->event,
+   ret = e_snprintf(buf + len, MAX_CMDLEN, "%s%s%s+%lu",
 tp->module ?: "", tp->module ? ":" : "",
 tp->symbol, tp->offset);
 
-   if (len <= 0)
+   if (ret <= 0)
goto error;
+   len += ret;
 
for (i = 0; i < tev->nargs; i++) {
ret = synthesize_probe_trace_arg(&tev->args[i], buf + len,




[PATCH -tip 6/8] perf-probe: Show symbol+offset for address only kprobes

2014-01-22 Thread Masami Hiramatsu
Show the symbol+offset information for address-only kprobe
events when the --list operation runs without debuginfo. Currently
those events are shown by the address itself. With this change
perf probe finds the symbols at those addresses and shows them.

E.g. without this change (when debuginfo is not available);
# ./perf probe -l
  probe:t_show (on 0x810d9720 with m v)
  probe:t_show_1   (on 0x810e2e40 with m v t)
  probe:t_show_2   (on 0x810ece30 with m v fmt)
  probe:t_show_3   (on 0x810f4ad0 with m v file)

With this change;
# ./perf probe -l
  probe:t_show (on t_show with m v)
  probe:t_show_1   (on t_show with m v t)
  probe:t_show_2   (on t_show with m v fmt)
  probe:t_show_3   (on t_show with m v file)
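
The lookup behind this output can be sketched as a search over a sorted
symbol table: find the symbol whose range contains the address, then
report its name and the distance from its start. The sketch below uses a
plain sorted array and binary search as a stand-in for perf's symbol maps;
struct demo_sym, demo_find_sym() and demo_addr_to_place() are hypothetical
names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol record; the table must be sorted by start. */
struct demo_sym {
	uint64_t start;
	uint64_t end;           /* exclusive */
	const char *name;
};

/* Binary-search for the symbol containing addr; NULL if there is none. */
static const struct demo_sym *demo_find_sym(const struct demo_sym *tab,
					    size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < tab[mid].start)
			hi = mid;
		else if (addr >= tab[mid].end)
			lo = mid + 1;
		else
			return &tab[mid];
	}
	return NULL;
}

/* Fill name/offset for addr; returns 0 on success, -1 if unresolved. */
static int demo_addr_to_place(const struct demo_sym *tab, size_t n,
			      uint64_t addr, const char **name,
			      uint64_t *offset)
{
	const struct demo_sym *sym = demo_find_sym(tab, n, addr);

	if (sym == NULL)
		return -1;
	*name = sym->name;
	*offset = addr - sym->start;
	return 0;
}
```

When the lookup fails, a caller would fall back to printing the raw
address, as the patch does.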

Signed-off-by: Masami Hiramatsu 
---
 tools/perf/util/probe-event.c |   35 +++
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 3470934..bf1d73b 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -118,6 +118,7 @@ static void exit_symbol_maps(void)
symbol__exit();
 }
 
+/* Caller must call init_symbol_maps before use this */
 static struct symbol *__find_kernel_function_by_name(const char *name,
 struct map **mapp)
 {
@@ -125,6 +126,12 @@ static struct symbol *__find_kernel_function_by_name(const char *name,
 NULL);
 }
 
+/* Caller must call init_symbol_maps before use this */
+static struct symbol *__find_kernel_function(u64 addr, struct map **mapp)
+{
+   return machine__find_kernel_function(host_machine, addr, mapp, NULL);
+}
+
 static struct map *kernel_get_module_map(const char *module)
 {
struct rb_node *nd;
@@ -222,17 +229,29 @@ static int convert_to_perf_probe_point(struct probe_trace_point *tp,
 {
char buf[128];
int ret;
-
-   if (tp->symbol) {
+   struct symbol *sym;
+   struct map *map;
+   u64 addr;
+
+   if (!tp->symbol) {
+   sym = __find_kernel_function(tp->address, &map);
+   if (sym) {
+   pp->function = strdup(sym->name);
+   addr = map->unmap_ip(map, sym->start);
+   pp->offset = tp->address - addr;
+   } else {
+   ret = e_snprintf(buf, 128, "0x%" PRIx64,
+(u64)tp->address);
+   if (ret < 0)
+   return ret;
+   pp->function = strdup(buf);
+   pp->offset = 0;
+   }
+   } else {
pp->function = strdup(tp->symbol);
pp->offset = tp->offset;
-   } else {
-   ret = e_snprintf(buf, 128, "0x%" PRIx64, (u64)tp->address);
-   if (ret < 0)
-   return ret;
-   pp->function = strdup(buf);
-   pp->offset = 0;
}
+
if (pp->function == NULL)
return -ENOMEM;
 




[PATCH -tip 1/8] [BUGFIX] perf-probe: Fix to do exit call for symbol maps

2014-01-22 Thread Masami Hiramatsu
Some perf-probe commands do symbol_init() but don't
do the exit call. This fixes that to call symbol_exit()
and release the machine if needed.
This also merges init_vmlinux() and init_user_exec()
because both of them are doing similar things.
(init_user_exec() just skips initializing the vmlinux-related
 symbol maps)
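
The init/exit pairing described above can be sketched in isolation as
below. This is a simplified user-space model, assuming a lazily created
host object that is only needed for kernel probes. The demo_* names and
the malloc-based "machine" are inventions for the example and do not
reflect perf's real symbol or machine APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for perf's struct machine. */
struct demo_machine { int id; };

static struct demo_machine *demo_host;  /* NULL until first kernel use */
static bool demo_symbols_ready;         /* models symbol init state */

/* One entry point for both cases; user_only skips the kernel state. */
static int demo_init_symbol_maps(bool user_only)
{
	demo_symbols_ready = true;
	if (demo_host != NULL || user_only)
		return 0;       /* already initialized, or not needed */

	demo_host = malloc(sizeof(*demo_host));
	if (demo_host == NULL) {
		demo_symbols_ready = false;
		return -1;
	}
	demo_host->id = 0;
	return 0;
}

/* Always safe to call, even if the host object was never created. */
static void demo_exit_symbol_maps(void)
{
	free(demo_host);        /* free(NULL) is a no-op */
	demo_host = NULL;
	demo_symbols_ready = false;
}
```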

Signed-off-by: Masami Hiramatsu 
---
 tools/perf/util/probe-event.c |  110 +++--
 1 file changed, 61 insertions(+), 49 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index a8a9b6c..14c649df 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -73,31 +73,35 @@ static char *synthesize_perf_probe_point(struct perf_probe_point *pp);
 static int convert_name_to_addr(struct perf_probe_event *pev,
const char *exec);
 static void clear_probe_trace_event(struct probe_trace_event *tev);
-static struct machine machine;
+static struct machine *host_machine;
 
 /* Initialize symbol maps and path of vmlinux/modules */
-static int init_vmlinux(void)
+static int init_symbol_maps(bool user_only)
 {
int ret;
 
symbol_conf.sort_by_name = true;
-   if (symbol_conf.vmlinux_name == NULL)
-   symbol_conf.try_vmlinux_path = true;
-   else
-   pr_debug("Use vmlinux: %s\n", symbol_conf.vmlinux_name);
+   if (user_only)
+   symbol_conf.try_vmlinux_path = false;
+   else {
+   if (symbol_conf.vmlinux_name == NULL)
+   symbol_conf.try_vmlinux_path = true;
+   else
+   pr_debug("Use vmlinux: %s\n", symbol_conf.vmlinux_name);
+   }
ret = symbol__init();
if (ret < 0) {
pr_debug("Failed to init symbol map.\n");
goto out;
}
 
-   ret = machine__init(&machine, "", HOST_KERNEL_ID);
-   if (ret < 0)
-   goto out;
-
-   if (machine__create_kernel_maps(&machine) < 0) {
-   pr_debug("machine__create_kernel_maps() failed.\n");
-   goto out;
+   if (host_machine || user_only)  /* already initialized */
+   return 0;
+   host_machine = machine__new_host();
+   if (!host_machine) {
+   pr_debug("machine__new_host() failed.\n");
+   symbol__exit();
+   ret = -1;
}
 out:
if (ret < 0)
@@ -105,21 +109,30 @@ out:
return ret;
 }
 
+static void exit_symbol_maps(void)
+{
+   if (host_machine) {
+   machine__delete(host_machine);
+   host_machine = NULL;
+   }
+   symbol__exit();
+}
+
 static struct symbol *__find_kernel_function_by_name(const char *name,
 struct map **mapp)
 {
-   return machine__find_kernel_function_by_name(&machine, name, mapp,
+   return machine__find_kernel_function_by_name(host_machine, name, mapp,
 NULL);
 }
 
 static struct map *kernel_get_module_map(const char *module)
 {
struct rb_node *nd;
-   struct map_groups *grp = &machine.kmaps;
+   struct map_groups *grp = &host_machine->kmaps;
 
/* A file path -- this is an offline module */
if (module && strchr(module, '/'))
-   return machine__new_module(&machine, 0, module);
+   return machine__new_module(host_machine, 0, module);
 
if (!module)
module = "kernel";
@@ -141,7 +154,7 @@ static struct dso *kernel_get_module_dso(const char *module)
const char *vmlinux_name;
 
if (module) {
-   list_for_each_entry(dso, &machine.kernel_dsos, node) {
+   list_for_each_entry(dso, &host_machine->kernel_dsos, node) {
if (strncmp(dso->short_name + 1, module,
dso->short_name_len - 2) == 0)
goto found;
@@ -150,7 +163,7 @@ static struct dso *kernel_get_module_dso(const char *module)
return NULL;
}
 
-   map = machine.vmlinux_maps[MAP__FUNCTION];
+   map = host_machine->vmlinux_maps[MAP__FUNCTION];
dso = map->dso;
 
vmlinux_name = symbol_conf.vmlinux_name;
@@ -173,20 +186,6 @@ const char *kernel_get_module_path(const char *module)
return (dso) ? dso->long_name : NULL;
 }
 
-static int init_user_exec(void)
-{
-   int ret = 0;
-
-   symbol_conf.try_vmlinux_path = false;
-   symbol_conf.sort_by_name = true;
-   ret = symbol__init();
-
-   if (ret < 0)
-   pr_debug("Failed to init symbol map.\n");
-
-   return ret;
-}
-
 static int convert_exec_to_group(const char *exec, char **result)
 {
char *ptr1, *ptr2, *exec_copy;
@@ -563,7 +562,7 @@ static int _show_one_line(FILE *fp, int l, bool skip, bool show_num)
  * Show line-range always requires debuginfo to find source file and
  * line number.
  */
-int show_line_range(struct line_range *lr, const char *module)
+static int 

[PATCH -tip 7/8] perf-probe: Show source-level or symbol-level info for uprobes

2014-01-22 Thread Masami Hiramatsu
Show source-level or symbol-level information for uprobe events.

Without this change;
# ./perf probe -l
  probe_perf:dso__load_vmlinux (on 0x0006d110 in /kbuild/ksrc/linux-3/tools/perf/perf)

With this change;
# ./perf probe -l
  probe_perf:dso__load_vmlinux (on dso__load_vmlinux@util/symbol.c in /kbuild/ksrc/linux-3/tools/perf/perf)

Signed-off-by: Masami Hiramatsu 
---
 tools/perf/util/probe-event.c |  149 -
 1 file changed, 88 insertions(+), 61 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index bf1d73b..84c1807 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -224,42 +224,6 @@ out:
return ret;
 }
 
-static int convert_to_perf_probe_point(struct probe_trace_point *tp,
-   struct perf_probe_point *pp)
-{
-   char buf[128];
-   int ret;
-   struct symbol *sym;
-   struct map *map;
-   u64 addr;
-
-   if (!tp->symbol) {
-   sym = __find_kernel_function(tp->address, &map);
-   if (sym) {
-   pp->function = strdup(sym->name);
-   addr = map->unmap_ip(map, sym->start);
-   pp->offset = tp->address - addr;
-   } else {
-   ret = e_snprintf(buf, 128, "0x%" PRIx64,
-(u64)tp->address);
-   if (ret < 0)
-   return ret;
-   pp->function = strdup(buf);
-   pp->offset = 0;
-   }
-   } else {
-   pp->function = strdup(tp->symbol);
-   pp->offset = tp->offset;
-   }
-
-   if (pp->function == NULL)
-   return -ENOMEM;
-
-   pp->retprobe = tp->retprobe;
-
-   return 0;
-}
-
 #ifdef HAVE_DWARF_SUPPORT
 /* Open new debuginfo of given module */
 static struct debuginfo *open_debuginfo(const char *module)
@@ -285,8 +249,9 @@ static struct debuginfo *open_debuginfo(const char *module)
  * Convert trace point to probe point with debuginfo
  * Currently only handles kprobes.
  */
-static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
-   struct perf_probe_point *pp)
+static int find_perf_probe_point_from_dwarf(struct probe_trace_point *tp,
+   struct perf_probe_point *pp,
+   bool is_kprobe)
 {
struct symbol *sym;
struct map *map;
@@ -306,7 +271,11 @@ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
pr_debug("try to find information at %" PRIx64 " in %s\n", addr,
 tp->module ? : "kernel");
 
-   dinfo = debuginfo__new_online_kernel(addr);
+   if (is_kprobe)
+   dinfo = debuginfo__new_online_kernel(addr);
+   else
+   dinfo = open_debuginfo(tp->module);
+
if (dinfo) {
ret = debuginfo__find_probe_point(dinfo,
 (unsigned long)addr, pp);
@@ -319,9 +288,8 @@ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
 
if (ret <= 0) {
 error:
-   pr_debug("Failed to find corresponding probes from "
-"debuginfo. Use kprobe event information.\n");
-   return convert_to_perf_probe_point(tp, pp);
+   pr_debug("Failed to find corresponding probes from debuginfo.\n");
+   return ret ? : -ENOENT;
}
pp->retprobe = tp->retprobe;
 
@@ -776,21 +744,12 @@ out:
 
 #else  /* !HAVE_DWARF_SUPPORT */
 
-static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
-   struct perf_probe_point *pp)
+static int
+find_perf_probe_point_from_dwarf(struct probe_trace_point *tp __maybe_unused,
+struct perf_probe_point *pp __maybe_unused,
+bool is_kprobe __maybe_unused)
 {
-   struct symbol *sym;
-
-   if (tp->symbol) {
-   sym = __find_kernel_function_by_name(tp->symbol, NULL);
-   if (!sym) {
-   pr_err("Failed to find symbol %s in kernel.\n",
-   tp->symbol);
-   return -ENOENT;
-   }
-   }
-
-   return convert_to_perf_probe_point(tp, pp);
+   return -ENOSYS;
 }
 
 static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
@@ -1609,6 +1568,78 @@ error:
return NULL;
 }
 
+static int find_perf_probe_point_from_map(struct probe_trace_point *tp,
+ struct perf_probe_point *pp,
+ bool is_kprobe)
+{
+   struct symbol *sym = NULL;
+   struct map *map = NULL;
+   u64 addr;
+   int ret = 0;
+
+   if (is_kprobe)
+   

[PATCH -tip 8/8] perf-probe: Allow to add events on the local functions

2014-01-22 Thread Masami Hiramatsu
Allow to add events on the local functions without debuginfo.
(With the debuginfo, we can add events even on inlined functions)
Currently, probing on local functions requires debuginfo to
locate actual address. It is also possible without debuginfo since
we have symbol maps.

Without this change;

# ./perf probe -a t_show
Added new event:
  probe:t_show (on t_show)

You can now use it in all perf tools, such as:

perf record -e probe:t_show -aR sleep 1

# ./perf probe -x perf -a identity__map_ip
no symbols found in /kbuild/ksrc/linux-3/tools/perf/perf, maybe install a debug package?
Failed to load map.
  Error: Failed to add events. (-22)

As the above results show, perf probe just puts one event
on the first symbol found for a kprobe event. Moreover,
for a uprobe event, perf probe fails to find local
functions.

With this change;

# ./perf probe -a t_show
Added new events:
  probe:t_show (on t_show)
  probe:t_show_1   (on t_show)
  probe:t_show_2   (on t_show)
  probe:t_show_3   (on t_show)

You can now use it in all perf tools, such as:

perf record -e probe:t_show_3 -aR sleep 1

# ./perf probe -x perf -a identity__map_ip
Added new events:
  probe_perf:identity__map_ip (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
  probe_perf:identity__map_ip_1 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
  probe_perf:identity__map_ip_2 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
  probe_perf:identity__map_ip_3 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)

You can now use it in all perf tools, such as:

perf record -e probe_perf:identity__map_ip_3 -aR sleep 1

Now we succeed in putting events on every given local function
for both kprobes and uprobes. :)

Note that this also introduces some symbol rbtree
iteration macros: symbols__for_each_entry, dso__for_each_symbol,
and map__for_each_symbol. These are for walking through
the symbol list in a map.
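
The idea of walking every symbol and creating one event per match can be
sketched as follows. The example uses a flat array and a simple for-each
macro as a stand-in for the rbtree walkers named above; struct demo_lsym,
demo_for_each_symbol(), demo_find_all() and demo_event_name() are
hypothetical and exist only for this illustration. The _1, _2 suffixes
follow the event names shown in the output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol record; perf keeps symbols in an rbtree instead. */
struct demo_lsym {
	const char *name;
	unsigned long addr;
};

/* Iterate over an array of symbols; a stand-in for the rbtree walkers. */
#define demo_for_each_symbol(tab, n, pos) \
	for ((pos) = (tab); (pos) < (tab) + (n); (pos)++)

/*
 * Collect the address of every symbol called name, up to max entries.
 * Returns the number of matches stored in out.
 */
static size_t demo_find_all(const struct demo_lsym *tab, size_t n,
			    const char *name, unsigned long *out, size_t max)
{
	const struct demo_lsym *pos;
	size_t found = 0;

	demo_for_each_symbol(tab, n, pos) {
		if (found < max && strcmp(pos->name, name) == 0)
			out[found++] = pos->addr;
	}
	return found;
}

/* Name the idx-th event: "t_show", then "t_show_1", "t_show_2", ... */
static int demo_event_name(char *buf, size_t size, const char *base,
			   size_t idx)
{
	if (idx == 0)
		return snprintf(buf, size, "%s", base);
	return snprintf(buf, size, "%s_%zu", base, idx);
}
```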

Signed-off-by: Masami Hiramatsu 
---
 tools/perf/util/dso.h |   10 +
 tools/perf/util/map.h |   10 +
 tools/perf/util/probe-event.c |  351 ++---
 tools/perf/util/symbol.h  |   11 +
 4 files changed, 183 insertions(+), 199 deletions(-)

diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index cd7d6f0..ab06f1c 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -102,6 +102,16 @@ struct dso {
char name[0];
 };
 
+/* dso__for_each_symbol - iterate over the symbols of given type
+ *
+ * @dso: the 'struct dso *' in which symbols itereated
+ * @pos: the 'struct symbol *' to use as a loop cursor
+ * @n: the 'struct rb_node *' to use as a temporary storage
+ * @type: the 'enum map_type' type of symbols
+ */
+#define dso__for_each_symbol(dso, pos, n, type)\
+   symbols__for_each_entry(&(dso)->symbols[(type)], pos, n)
+
 static inline void dso__set_loaded(struct dso *dso, enum map_type type)
 {
dso->loaded |= (1 << type);
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 18068c6..ef18a48 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -89,6 +89,16 @@ u64 map__objdump_2mem(struct map *map, u64 ip);
 
 struct symbol;
 
+/* map__for_each_symbol - iterate over the symbols in the given map
+ *
+ * @map: the 'struct map *' in which symbols itereated
+ * @pos: the 'struct symbol *' to use as a loop cursor
+ * @n: the 'struct rb_node *' to use as a temporary storage
+ * Note: caller must ensure map->dso is not NULL (map is loaded).
+ */
+#define map__for_each_symbol(map, pos, n)  \
+   dso__for_each_symbol(map->dso, pos, n, map->type)
+
 typedef int (*symbol_filter_t)(struct map *map, struct symbol *sym);
 
 void map__init(struct map *map, enum map_type type,
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 84c1807..93087d7 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -70,8 +70,6 @@ static int e_snprintf(char *str, size_t size, const char *format, ...)
 }
 
 static char *synthesize_perf_probe_point(struct perf_probe_point *pp);
-static int convert_name_to_addr(struct perf_probe_event *pev,
-   const char *exec);
 static void clear_probe_trace_event(struct probe_trace_event *tev);
 static struct machine *host_machine;
 
@@ -119,14 +117,6 @@ static void exit_symbol_maps(void)
 }
 
 /* Caller must call init_symbol_maps before use this */
-static struct symbol *__find_kernel_function_by_name(const char *name,
-struct map **mapp)
-{
-   return machine__find_kernel_function_by_name(host_machine, name, mapp,
-NULL);
-}
-
-/* Caller must call init_symbol_maps before use this */
 static struct symbol *__find_kernel_function(u64 addr, struct map **mapp)
 {
return machine__find_kernel_function(host_machine, addr, 

[PATCH -tip 2/8] [BUGFIX] perf-tools: Load map before using map->map_ip

2014-01-22 Thread Masami Hiramatsu
In map_groups__find_symbol(), map->map_ip is used without
ensuring the map is loaded. As a result, the address passed
to map->map_ip is not translated correctly on the first call.

E.g. below code always fails to get a symbol at the first call;

addr = /* Somewhere in the kernel text */
symbol_conf.try_vmlinux_path = true;
symbol__init();
host_machine = machine__new_host();
sym = machine__find_kernel_function(host_machine,
 addr, NULL, NULL);
/* Note that machine__find_kernel_function calls
   map_groups__find_symbol */

This ensures it by calling map__load before using it in
map_groups__find_symbol().
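
The ordering problem can be modelled with a toy map whose address
translation only becomes valid after loading. This is a sketch under
stated assumptions: struct demo_map, the pgoff fix-up done by
demo_map_load(), and the numeric values are invented for the example and
are not how perf's map__load() works internally. Only the "load first,
then translate" ordering is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map whose translation is only valid once loaded. */
struct demo_map {
	uint64_t start;         /* runtime start address of the mapping */
	uint64_t pgoff;         /* file offset; filled in by the loader */
	int loaded;
};

/* Stand-in loader: fixes up pgoff the first time it runs. */
static int demo_map_load(struct demo_map *map)
{
	if (!map->loaded) {
		map->pgoff = 0x1000;    /* value chosen only for the demo */
		map->loaded = 1;
	}
	return 0;
}

/* Translate a runtime address to a map-relative one. */
static uint64_t demo_map_ip(const struct demo_map *map, uint64_t addr)
{
	return addr - map->start + map->pgoff;
}

/* Load first, then translate; -1 if the map is missing or unloadable. */
static int demo_lookup(struct demo_map *map, uint64_t addr, uint64_t *ip)
{
	if (map == NULL || demo_map_load(map) < 0)
		return -1;
	*ip = demo_map_ip(map, addr);
	return 0;
}
```

Calling demo_map_ip() directly on a fresh map gives a different result
from demo_lookup(), which mirrors the first-call failure described above.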

Signed-off-by: Masami Hiramatsu 
---
 tools/perf/util/map.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 9b9bd71..6a805e7 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -386,7 +386,8 @@ struct symbol *map_groups__find_symbol(struct map_groups *mg,
 {
struct map *map = map_groups__find(mg, type, addr);
 
-   if (map != NULL) {
+   /* Ensure map is loaded before using map->map_ip */
+   if (map != NULL && map__load(map, filter) >= 0) {
if (mapp != NULL)
*mapp = map;
return map__find_symbol(map, map->map_ip(map, addr), filter);




[PATCH -tip 3/8] perf-probe: Show in what binaries/modules probes are set

2014-01-22 Thread Masami Hiramatsu
Show the name of the binary file or module in which each probe
is set when using the --list option.

Without this change;

# ./perf probe -m drm drm_av_sync_delay
# ./perf probe -x perf dso__load_vmlinux

# ./perf probe -l
  probe:drm_av_sync_delay (on drm_av_sync_delay)
  probe_perf:dso__load_vmlinux (on 0x0006d110)

With this change;

# ./perf probe -l
  probe:drm_av_sync_delay (on drm_av_sync_delay in drm)
  probe_perf:dso__load_vmlinux (on 0x0006d110 in /kbuild/ksrc/linux-3/tools/perf/perf)

Signed-off-by: Masami Hiramatsu 
---
 tools/perf/util/probe-event.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 14c649df..2fb4486 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1742,7 +1742,8 @@ static struct strlist *get_probe_trace_command_rawlist(int fd)
 }
 
 /* Show an event */
-static int show_perf_probe_event(struct perf_probe_event *pev)
+static int show_perf_probe_event(struct perf_probe_event *pev,
+const char *module)
 {
int i, ret;
char buf[128];
@@ -1758,6 +1759,8 @@ static int show_perf_probe_event(struct perf_probe_event *pev)
return ret;
 
printf("  %-20s (on %s", buf, place);
+   if (module)
+   printf(" in %s", module);
 
if (pev->nargs > 0) {
printf(" with");
@@ -1795,7 +1798,8 @@ static int __show_perf_probe_events(int fd, bool is_kprobe)
ret = convert_to_perf_probe_event(&tev, &pev,
is_kprobe);
if (ret >= 0)
-   ret = show_perf_probe_event(&pev);
+   ret = show_perf_probe_event(&pev,
+   tev.point.module);
}
clear_perf_probe_event(&pev);
clear_probe_trace_event(&tev);
@@ -1994,7 +1998,7 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
group = pev->group;
pev->event = tev->event;
pev->group = tev->group;
-   show_perf_probe_event(pev);
+   show_perf_probe_event(pev, tev->point.module);
/* Trick here - restore current event/group */
pev->event = (char *)event;
pev->group = (char *)group;




[PATCH -tip 5/8] perf-probe: Show source level information for address only kprobes

2014-01-22 Thread Masami Hiramatsu
Show source-level information for address-only kprobe
events. Currently perf probe shows such information only
for symbol-based probes. With this change, perf-probe correctly
parses the address-based events and tries to find the actual
lines of code from the debuginfo.

E.g. without this patch;

# ./perf probe -l
  probe:t_show (on 0x810d9720 with m v)
  probe:t_show_1   (on 0x810e2e40 with m v t)
  probe:t_show_2   (on 0x810ece30 with m v fmt)
  probe:t_show_3   (on 0x810f4ad0 with m v file)

With this patch;

# ./perf probe -l
  probe:t_show (on t_show@linux-3/kernel/trace/ftrace.c with m v)
  probe:t_show_1   (on t_show@linux-3/kernel/trace/trace.c with m v t)
  probe:t_show_2   (on t_show@kernel/trace/trace_printk.c with m v fmt)
  probe:t_show_3   (on t_show@kernel/trace/trace_events.c with m v file)
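
The fallback described above (use the symbol when one is recorded,
otherwise show the raw address) can be sketched in user space as
below. This is only an illustration under assumptions: the helper
name probe_point_label() is hypothetical and is not part of perf.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): write a display label for a
 * probe point into buf. Uses the symbol when one is recorded, otherwise
 * falls back to the raw address in hex, e.g. "0x810d9720".
 * Returns 0 on success, -1 if buf is too small.
 */
int probe_point_label(const char *symbol, uint64_t address,
		      char *buf, size_t len)
{
	int ret;

	assert(buf != NULL && len > 0);

	if (symbol)
		ret = snprintf(buf, len, "%s", symbol);
	else
		ret = snprintf(buf, len, "0x%" PRIx64, address);

	/* snprintf() reports the length it wanted; treat truncation as error. */
	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

With a symbol the label is the symbol itself; without one it is the
hex address, which is what the "without this patch" listing shows.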

Signed-off-by: Masami Hiramatsu 
---
 tools/perf/util/probe-event.c |   87 ++---
 1 file changed, 56 insertions(+), 31 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 92ab688..3470934 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -153,7 +153,7 @@ static struct dso *kernel_get_module_dso(const char *module)
struct map *map;
const char *vmlinux_name;
 
-   if (module) {
+   if (module && strcmp(module, "kernel") != 0) {
list_for_each_entry(dso, _machine->kernel_dsos, node) {
if (strncmp(dso->short_name + 1, module,
dso->short_name_len - 2) == 0)
@@ -220,12 +220,22 @@ out:
 static int convert_to_perf_probe_point(struct probe_trace_point *tp,
struct perf_probe_point *pp)
 {
-   pp->function = strdup(tp->symbol);
+   char buf[128];
+   int ret;
 
+   if (tp->symbol) {
+   pp->function = strdup(tp->symbol);
+   pp->offset = tp->offset;
+   } else {
+   ret = e_snprintf(buf, 128, "0x%" PRIx64, (u64)tp->address);
+   if (ret < 0)
+   return ret;
+   pp->function = strdup(buf);
+   pp->offset = 0;
+   }
if (pp->function == NULL)
return -ENOMEM;
 
-   pp->offset = tp->offset;
pp->retprobe = tp->retprobe;
 
return 0;
@@ -261,28 +271,35 @@ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
 {
struct symbol *sym;
struct map *map;
-   u64 addr;
-   int ret = -ENOENT;
+   u64 addr = tp->address;
+   int ret;
struct debuginfo *dinfo;
 
-   sym = __find_kernel_function_by_name(tp->symbol, );
-   if (sym) {
+   if (!addr) {
+   sym = __find_kernel_function_by_name(tp->symbol, );
+   if (!sym) {
+   ret = -ENOENT;
+   goto error;
+   }
addr = map->unmap_ip(map, sym->start + tp->offset);
-   pr_debug("try to find %s+%ld@%" PRIx64 "\n", tp->symbol,
-tp->offset, addr);
+   }
+
+   pr_debug("try to find information at %" PRIx64 " in %s\n", addr,
+tp->module ? : "kernel");
 
-   dinfo = debuginfo__new_online_kernel(addr);
-   if (dinfo) {
-   ret = debuginfo__find_probe_point(dinfo,
+   dinfo = debuginfo__new_online_kernel(addr);
+   if (dinfo) {
+   ret = debuginfo__find_probe_point(dinfo,
 (unsigned long)addr, pp);
-   debuginfo__delete(dinfo);
-   } else {
-   pr_debug("Failed to open debuginfo at 0x%" PRIx64 "\n",
-addr);
-   ret = -ENOENT;
-   }
+   debuginfo__delete(dinfo);
+   } else {
+   pr_debug("Failed to open debuginfo at 0x%" PRIx64 "\n",
+addr);
+   ret = -ENOENT;
}
+
if (ret <= 0) {
+error:
pr_debug("Failed to find corresponding probes from "
 "debuginfo. Use kprobe event information.\n");
return convert_to_perf_probe_point(tp, pp);
@@ -745,10 +762,13 @@ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
 {
struct symbol *sym;
 
-   sym = __find_kernel_function_by_name(tp->symbol, NULL);
-   if (!sym) {
-   pr_err("Failed to find symbol %s in kernel.\n", tp->symbol);
-   return -ENOENT;
+   if (tp->symbol) {
+   sym = __find_kernel_function_by_name(tp->symbol, NULL);
+   if (!sym) {
+   pr_err("Failed to find symbol %s in kernel.\n",
+   tp->symbol);
+   return -ENOENT;
+   }
 

[PATCH -tip 0/8] perf-probe: Updates for handling local functions correctly

2014-01-22 Thread Masami Hiramatsu
Hi,

Here is a series of patches for handling local functions
correctly in perf-probe.

Issue 1)
 Current perf-probe can't handle probe-points for kprobes,
 since it uses symbol-based probe definitions. The symbol-based
 definition is easy to read and robust across different
 kernels and modules. However, when the user gives a local
 function name which has several different instances,
 it may put probes on a wrong (or unexpected) address.
 On the other hand, since uprobe events are based on the
 actual address, they can avoid this issue.

 E.g.
In the case of probing the t_show local functions (which have
4 different instances):

# grep " t_show\$" /proc/kallsyms
810d9720 t t_show
810e2e40 t t_show
810ece30 t t_show
810f4ad0 t t_show
# ./perf probe -f t_show \$vars
Added new events:
  probe:t_show (on t_show with $vars)
  probe:t_show_1   (on t_show with $vars)
  probe:t_show_2   (on t_show with $vars)
  probe:t_show_3   (on t_show with $vars)

You can now use it in all perf tools, such as:

perf record -e probe:t_show_3 -aR sleep 1

OK, we have 4 different t_show()s. All functions have
different arguments as below;

# cat /sys/kernel/debug/tracing/kprobe_events
p:probe/t_show t_show m=%di:u64 v=%si:u64
p:probe/t_show_1 t_show m=%di:u64 v=%si:u64 t=%si:u64
p:probe/t_show_2 t_show m=%di:u64 v=%si:u64 fmt=%si:u64
p:probe/t_show_3 t_show m=%di:u64 v=%si:u64 file=%si:u64

However, all of them have been put on the *same* address.

# cat /sys/kernel/debug/kprobes/list
810d9720  k  t_show+0x0[DISABLED]
810d9720  k  t_show+0x0[DISABLED]
810d9720  k  t_show+0x0[DISABLED]
810d9720  k  t_show+0x0[DISABLED]

 oops...
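
As a rough illustration of why a name alone is ambiguous, the sketch
below scans kallsyms-style text and collects every address that
carries a given symbol name. It is a user-space example under
assumptions: find_symbol_instances() is a hypothetical helper, not
a perf or kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan kallsyms-style text ("ADDR TYPE NAME" per line)
 * and store the address of every symbol called `name` in addrs[].
 * Returns the number of matches found; only the first `max` addresses
 * are stored.
 */
size_t find_symbol_instances(const char *text, const char *name,
			     unsigned long long *addrs, size_t max)
{
	size_t found = 0;

	assert(text != NULL && name != NULL);

	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t n = eol ? (size_t)(eol - text) : strlen(text);
		unsigned long long addr;
		char type;
		char sym[128];
		char line[256];

		/* Parse one line at a time so a short line cannot spill over. */
		if (n < sizeof(line)) {
			memcpy(line, text, n);
			line[n] = '\0';
			if (sscanf(line, "%llx %c %127s", &addr, &type, sym) == 3 &&
			    strcmp(sym, name) == 0) {
				if (found < max)
					addrs[found] = addr;
				found++;
			}
		}

		if (!eol)
			break;
		text = eol + 1;
	}
	return found;
}
```

Fed the four t_show lines from the grep output above, this reports
four distinct addresses for the one name, while the kprobes list
shows all four probes on the first of them.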

Issue 2)
 With the debuginfo, issue 1 can be solved by using
 address-based probe definition instead of symbol-based.
 However, without debuginfo, perf-probe can only use
 symbol-map in the binary (or kallsyms). The map provides
 symbol find methods, but it returns only the first matched
 symbol. To put probes on all functions which have given
 symbol, we need a symbol-list iterator for the map.

 E.g. (built perf with NO_DWARF=1)
In the case of probing t_show and identity__map_ip in perf:

# ./perf probe -a t_show
Added new event:
  probe:t_show (on t_show)

You can now use it in all perf tools, such as:

perf record -e probe:t_show -aR sleep 1

# ./perf probe -x perf -a identity__map_ip
no symbols found in /kbuild/ksrc/linux-3/tools/perf/perf, maybe install a debug package?
Failed to load map.
  Error: Failed to add events. (-22)

 oops.
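
The difference between a first-match lookup and a symbol-list
iterator can be sketched as follows. This is a simplified model
under assumptions: struct sym, find_first(), for_each_named() and
count_cb() are hypothetical names, and perf's real map/dso/symbol
structures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol record; perf's struct symbol differs. */
struct sym {
	const char *name;
	unsigned long long start;
};

/* First-match lookup: returns only one instance of a duplicated name. */
const struct sym *find_first(const struct sym *tab, size_t n,
			     const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}

/*
 * Iterator: call cb() for every symbol called `name`. Stops early if
 * cb() returns non-zero. Returns the number of symbols visited.
 */
size_t for_each_named(const struct sym *tab, size_t n, const char *name,
		      int (*cb)(const struct sym *, void *), void *arg)
{
	size_t visited = 0;

	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].name, name) != 0)
			continue;
		visited++;
		if (cb(&tab[i], arg))
			break;
	}
	return visited;
}

/* Example callback: count the matches through the opaque argument. */
int count_cb(const struct sym *s, void *arg)
{
	(void)s;
	(*(size_t *)arg)++;
	return 0;
}
```

A caller that wants one probe per instance would do its per-symbol
work in the callback instead of stopping at the first hit.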


Solutions)
To solve issue 1, this series changes perf probe to
use address-based probe definitions. This means that we
also need to fix the --list option to analyze probe
addresses instead of symbols (and that has been done
in this series).

E.g. with this series;

# ./perf probe -f t_show \$vars
Added new events:
  probe:t_show (on t_show with $vars)
  probe:t_show_1   (on t_show with $vars)
  probe:t_show_2   (on t_show with $vars)
  probe:t_show_3   (on t_show with $vars)

You can now use it in all perf tools, such as:

perf record -e probe:t_show_3 -aR sleep 1

# cat /sys/kernel/debug/tracing/kprobe_events
p:probe/t_show 0x810d9720 m=%di:u64 v=%si:u64
p:probe/t_show_1 0x810e2e40 m=%di:u64 v=%si:u64 t=%si:u64
p:probe/t_show_2 0x810ece30 m=%di:u64 v=%si:u64 fmt=%si:u64
p:probe/t_show_3 0x810f4ad0 m=%di:u64 v=%si:u64 file=%si:u64

# cat /sys/kernel/debug/kprobes/list
810e2e40  k  t_show+0x0[DISABLED]
810ece30  k  t_show+0x0[DISABLED]
810f4ad0  k  t_show+0x0[DISABLED]
810d9720  k  t_show+0x0[DISABLED]

This time we can see the events are set at different
addresses.
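
For illustration, an address-based definition in the format shown in
the kprobe_events listing above can be built like this. It is a
sketch under assumptions: format_kprobe_def() is a hypothetical
helper, not the code perf uses.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build one kprobe_events definition of the form
 *   p:GROUP/EVENT 0xADDR ARGS
 * ARGS is copied verbatim and may be NULL or empty.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_kprobe_def(char *buf, size_t len, const char *group,
		      const char *event, unsigned long long addr,
		      const char *args)
{
	int ret;

	assert(buf && group && event);

	ret = snprintf(buf, len, "p:%s/%s 0x%llx%s%s", group, event, addr,
		       (args && *args) ? " " : "", args ? args : "");

	/* snprintf() reports the length it wanted; treat truncation as error. */
	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

Because each definition carries its own address, two probes that
share a symbol name can no longer collapse onto the same location.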

And for issue 2, the last patch introduces symbol
iterators for map, dso and symbols (since the symbol
list lives in symbols, which is included in the dso, and
perf probe accesses the dso via the map).

E.g. with this series (built perf with NO_DWARF=1);

# ./perf probe -a t_show
Added new events:
  probe:t_show (on t_show)
  probe:t_show_1   (on t_show)
  probe:t_show_2   (on t_show)
  probe:t_show_3   (on t_show)

You can now use it in all perf tools, such as:

perf record -e probe:t_show_3 -aR sleep 1

# ./perf probe -x perf -a identity__map_ip
Added new events:
  probe_perf:identity__map_ip (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
  probe_perf:identity__map_ip_1 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
  probe_perf:identity__map_ip_2 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
  probe_perf:identity__map_ip_3 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)

You can now use it in all perf tools, such as:

perf record -e probe_perf:identity__map_ip_3 -aR sleep 1

Now, even without the debuginfo, both the kprobe and
uprobe are set 4 different 

Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

2014-01-22 Thread David Lang

On Wed, 22 Jan 2014, Chris Mason wrote:


On Wed, 2014-01-22 at 11:50 -0800, Andrew Morton wrote:

On Wed, 22 Jan 2014 11:30:19 -0800 James Bottomley 
 wrote:


But this, I think, is the fundamental point for debate.  If we can pull
alignment and other tricks to solve 99% of the problem is there a need
for radical VM surgery?  Is there anything coming down the pipe in the
future that may move the devices ahead of the tricks?


I expect it would be relatively simple to get large blocksizes working
on powerpc with 64k PAGE_SIZE.  So before diving in and doing huge
amounts of work, perhaps someone can do a proof-of-concept on powerpc
(or ia64) with 64k blocksize.



Maybe 5 drives in raid5 on MD, with 4K coming from each drive.  Well
aligned 16K IO will work, everything else will be about the same as a rmw
from a single drive.


I think this is the key point to think about here. How will these new hard drive 
large block sizes differ from RAID stripes and SSD eraseblocks?


In all of these cases there are very clear advantages to doing the writes in 
properly sized and aligned chunks that correspond with the underlying structure 
to avoid the RMW overhead.
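
To make the arithmetic concrete: with 5 drives in RAID5 and 4K per
drive, one stripe holds 4 data chunks, i.e. 16K. The sketch below
counts how many stripes a write covers only partially, which is a
rough proxy for RMW cycles. It is a simplified model under
assumptions: both function names are hypothetical, and caching or
merging in md/dm or the drive is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Data bytes in one full RAID5 stripe: one chunk per drive, minus parity. */
uint64_t raid5_stripe_bytes(unsigned int ndrives, uint64_t chunk_bytes)
{
	assert(ndrives >= 3 && chunk_bytes > 0);
	return (uint64_t)(ndrives - 1) * chunk_bytes;
}

/*
 * Number of stripes that a write [off, off + len) covers only partially.
 * Each such stripe would need a read-modify-write cycle in this model.
 */
unsigned int partial_stripes(uint64_t off, uint64_t len, uint64_t stripe)
{
	uint64_t end, first, last;
	int head, tail;

	assert(stripe > 0);
	if (len == 0)
		return 0;

	end = off + len;
	first = off / stripe;		/* stripe holding the first byte */
	last = (end - 1) / stripe;	/* stripe holding the last byte */
	head = (off % stripe) != 0;	/* write starts mid-stripe */
	tail = (end % stripe) != 0;	/* write ends mid-stripe */

	if (first == last)
		return (head || tail) ? 1 : 0;
	return (unsigned int)(head + tail);
}
```

An aligned 16K write touches no partial stripe, while the same 16K
shifted by 4K straddles two stripes and leaves both partial.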


It's extremely unlikely that drive manufacturers will produce drives that won't 
work with any existing OS, so they are going to support smaller writes in 
firmware. If they don't, they won't be able to sell their drives to anyone 
running existing software. Given the Enterprise software upgrade cycle compared 
to the expanding storage needs, whatever they ship will have to work on OS and 
firmware releases that happened several years ago.


I think what is needed is some way to be able to get a report on how many RMW
cycles have to happen. Then people can work on ways to reduce this number and
measure the results.


I don't know if md and dm are currently smart enough to realize that the entire 
stripe is being overwritten and avoid the RMW cycle. If they can't, I would 
expect that once we start measuring it, they will gain such support.


David Lang


Re: [BUG] sched: tip/master show soft lockup while running multiple VM

2014-01-22 Thread Michael wang
On 01/22/2014 08:36 PM, Peter Zijlstra wrote:
> On Wed, Jan 22, 2014 at 04:27:45PM +0800, Michael wang wrote:
>> # CONFIG_PREEMPT_NONE is not set
>> CONFIG_PREEMPT_VOLUNTARY=y
>> # CONFIG_PREEMPT is not set
> 
> Could you try the patch here:
> 
>   lkml.kernel.org/r/20140122102435.gh31...@twins.programming.kicks-ass.net
> 
> I suspect its the same issue.

Yup, it works.

Regards,
Michael Wang



Re: Internal error: Oops: 17 [#1] ARM

2014-01-22 Thread walimis
On Wed, Jan 22, 2014 at 08:23:36AM -0800, John Tobias wrote:
>Hello all,
>
>I am using 3.13-rc1 kernel on iMX6SL processor. My filesystem is in
>eMMC running SDR50.
>Is anyone here encountered these problem and if there's any existing
>patch that I can get?.
hi,

Do you use gcc 4.8.1? If so, maybe you should look at the following link
to see whether it's a similar issue.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854

Liming Wang

>
>Regards,
>
>john
>
>[ 1552.394899] Unable to handle kernel NULL pointer dereference at
>virtual address 0037
>[ 1552.403034] pgd = beef4000
>[ 1552.405855] [0037] *pgd=bef60831, *pte=, *ppte=
>[ 1552.412245] Internal error: Oops: 17 [#1] ARM
>[ 1552.416627] Modules linked in: bt8xxx(O) sd8xxx(O) mlan(O)
>[ 1552.422249] CPU: 0 PID: 232 Comm: commsd Tainted: G   O 3.13.0-rc1 
>#7
>[ 1552.429409] task: bfbc7500 ti: bec96000 task.ti: bec96000
>[ 1552.434844] PC is at lookup_fast+0x5c/0x318
>[ 1552.439067] LR is at mark_held_locks+0x78/0x13c
>[ 1552.443622] pc : [<80101184>]lr : [<80056e48>]psr: a00f0013
>[ 1552.443622] sp : bec97d88  ip : 00666e6f  fp : bec97ddc
>[ 1552.455124] r10:   r9 : bec97e08  r8 : 80102d94
>[ 1552.460370] r7 : bec97e60  r6 : bf133ac8  r5 : bec97e60  r4 : bec97e00
>[ 1552.466918] r3 : bee4f01d  r2 :   r1 :   r0 : 
>[ 1552.473471] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
>user
>[ 1552.480629] Control: 10c53c7d  Table: beef4059  DAC: 0015
>[ 1552.486397] Process commsd (pid: 232, stack limit = 0xbec96238)
>[ 1552.492341] Stack: (0xbec97d88 to 0xbec98000)
>[ 1552.496728] 7d80:   80102b94 80057108 bfb95310
>bf133ac8 bf15f4e8 bfb95310
>[ 1552.504936] 7da0: c08bb14d  bee4f015 0008 bfbc7500
>bec97e08  0041
>[ 1552.513142] 7dc0: bec97e60 bec96020 bec96000  bec97e3c
>bec97de0 80102d94 80101134
>[ 1552.521347] 7de0: bec97df8  800d982c bec96018 0010
>bec97e00  bec97e08
>[ 1552.529553] 7e00: 8026e25c 800d97e8 bee4f000 0ff0 80d4e3a4
>0001 bee4f000 bec97e60
>[ 1552.537758] 7e20: ff9c ff9c bec96000  bec97e5c
>bec97e40 801033bc 80102c70
>[ 1552.545964] 7e40: bee4f000 0001 bec97e60 bec97f00 bec97ee4
>bec97e60 80105dc0 80103398
>[ 1552.554170] 7e60: bfb95310 bf133ac8 c08bb14d 000b bee4f015
>8005992c bfb95310 bf133398
>[ 1552.562375] 7e80: bf15f4e8 0041 0002 008a 
> 600f0013 bec96000
>[ 1552.570581] 7ea0: ffea bf8c1840 807b4430 80115444 801156e4
>8011563c 0008 
>[ 1552.578786] 7ec0: bec97f04 733fe4e0 0001 ff9c 757e3810
>bec97f40 bec97efc bec97ee8
>[ 1552.586991] 7ee0: 80105e0c 80105d68  bc950fe0 bec97f2c
>bec97f00 800faf64 80105df4
>[ 1552.595196] 7f00: 801156e4 80115420 bec97f54 733fe4e0 733ff8f0
>733fe61c 00c3 8000f504
>[ 1552.603402] 7f20: bec97f3c bec97f30 800fafe0 800faf1c bec97fa4
>bec97f40 800fb71c 800fafc4
>[ 1552.611608] 7f40: 8000f310 bfbc7500 733fe550 8000f458 733fe61c
>00c3 bec97f84 bec97f68
>[ 1552.619813] 7f60: 80056f28 800a0270 733fe550 733ff8f0 733fe61c
>00c3 bec97f94 bec97f88
>[ 1552.628019] 7f80: 80057110 80056f18  bec97f98 8000f458
>733fe550  bec97fa8
>[ 1552.636226] 7fa0: 8000f280 800fb704 733fe550 733ff8f0 757e3810
>733fe4e0 733fe550 0003
>[ 1552.644431] 7fc0: 733fe550 733ff8f0 733fe61c 00c3 
>0002 733fe92c 0200
>[ 1552.652636] 7fe0: 00c3 733fe4d8 7579b7e5 7572e276 200f0030
>757e3810 bfffd821 bfffdc21
>[ 1552.660828] Backtrace:
>[ 1552.663343] [<80101128>] (lookup_fast+0x0/0x318) from [<80102d94>]
>(path_lookupat+0x130/0x728)
>[ 1552.671994] [<80102c64>] (path_lookupat+0x0/0x728) from
>[<801033bc>] (filename_lookup.isra.40+0x30/0x70)
>[ 1552.681515] [<8010338c>] (filename_lookup.isra.40+0x0/0x70) from
>[<80105dc0>] (user_path_at_empty+0x64/0x8c)
>[ 1552.691361]  r7:bec97f00 r6:bec97e60 r5:0001 r4:bee4f000
>[ 1552.697163] [<80105d5c>] (user_path_at_empty+0x0/0x8c) from
>[<80105e0c>] (user_path_at+0x24/0x2c)
>[ 1552.706053]  r8:bec97f40 r7:757e3810 r6:ff9c r5:0001 r4:733fe4e0
>[ 1552.712927] [<80105de8>] (user_path_at+0x0/0x2c) from [<800faf64>]
>(vfs_fstatat+0x54/0xa8)
>[ 1552.721232] [<800faf10>] (vfs_fstatat+0x0/0xa8) from [<800fafe0>]
>(vfs_stat+0x28/0x2c)
>[ 1552.729167]  r8:8000f504 r7:00c3 r6:733fe61c r5:733ff8f0 r4:733fe4e0
>[ 1552.736031] [<800fafb8>] (vfs_stat+0x0/0x2c) from [<800fb71c>]
>(SyS_stat64+0x24/0x40)
>[ 1552.743902] [<800fb6f8>] (SyS_stat64+0x0/0x40) from [<8000f280>]
>(ret_fast_syscall+0x0/0x48)
>[ 1552.752359]  r4:733fe550
>[ 1552.754946] Code: eb00352d e350 e50b0038 0a80 (e5903038)
>[ 1552.761270] ---[ end trace 02679086a39365e8 ]---
>[ 1552.765968] Kernel panic - not syncing: Fatal exception
>
>___
>linux-arm-kernel mailing list
>linux-arm-ker...@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages

2014-01-22 Thread Minchan Kim
Hello Cai,

On Thu, Jan 23, 2014 at 09:38:41AM +0800, Cai Liu wrote:
> Hello Dan
> 
> 2014/1/22 Dan Streetman :
> > On Wed, Jan 22, 2014 at 7:16 AM, Cai Liu  wrote:
> >> Hello Minchan
> >>
> >>
> >> 2014/1/22 Minchan Kim 
> >>>
> >>> Hello Cai,
> >>>
> >>> On Tue, Jan 21, 2014 at 09:52:25PM +0800, Cai Liu wrote:
> >>> > Hello Minchan
> >>> >
> >>> > 2014/1/21 Minchan Kim :
> >>> > > Hello,
> >>> > >
> >>> > > On Tue, Jan 21, 2014 at 02:35:07PM +0800, Cai Liu wrote:
> >>> > >> 2014/1/21 Minchan Kim :
> >>> > >> > Please check your MUA and don't break thread.
> >>> > >> >
> >>> > >> > On Tue, Jan 21, 2014 at 11:07:42AM +0800, Cai Liu wrote:
> >>> > >> >> Thanks for your review.
> >>> > >> >>
> >>> > >> >> 2014/1/21 Minchan Kim :
> >>> > >> >> > Hello Cai,
> >>> > >> >> >
> >>> > >> >> > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote:
> >>> > >> >> >> zswap can support multiple swapfiles. So we need to check
> >>> > >> >> >> all zbud pool pages in zswap.
> >>> > >> >> >>
> >>> > >> >> >> Version 2:
> >>> > >> >> >>   * add *total_zbud_pages* in zbud to record all the pages in 
> >>> > >> >> >> pools
> >>> > >> >> >>   * move the updating of pool pages statistics to
> >>> > >> >> >> alloc_zbud_page/free_zbud_page to hide the details
> >>> > >> >> >>
> >>> > >> >> >> Signed-off-by: Cai Liu 
> >>> > >> >> >> ---
> >>> > >> >> >>  include/linux/zbud.h |2 +-
> >>> > >> >> >>  mm/zbud.c|   44 
> >>> > >> >> >> 
> >>> > >> >> >>  mm/zswap.c   |4 ++--
> >>> > >> >> >>  3 files changed, 35 insertions(+), 15 deletions(-)
> >>> > >> >> >>
> >>> > >> >> >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h
> >>> > >> >> >> index 2571a5c..1dbc13e 100644
> >>> > >> >> >> --- a/include/linux/zbud.h
> >>> > >> >> >> +++ b/include/linux/zbud.h
> >>> > >> >> >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, 
> >>> > >> >> >> unsigned long handle);
> >>> > >> >> >>  int zbud_reclaim_page(struct zbud_pool *pool, unsigned int 
> >>> > >> >> >> retries);
> >>> > >> >> >>  void *zbud_map(struct zbud_pool *pool, unsigned long handle);
> >>> > >> >> >>  void zbud_unmap(struct zbud_pool *pool, unsigned long handle);
> >>> > >> >> >> -u64 zbud_get_pool_size(struct zbud_pool *pool);
> >>> > >> >> >> +u64 zbud_get_pool_size(void);
> >>> > >> >> >>
> >>> > >> >> >>  #endif /* _ZBUD_H_ */
> >>> > >> >> >> diff --git a/mm/zbud.c b/mm/zbud.c
> >>> > >> >> >> index 9451361..711aaf4 100644
> >>> > >> >> >> --- a/mm/zbud.c
> >>> > >> >> >> +++ b/mm/zbud.c
> >>> > >> >> >> @@ -52,6 +52,13 @@
> >>> > >> >> >>  #include 
> >>> > >> >> >>  #include 
> >>> > >> >> >>
> >>> > >> >> >> +/*
> >>> > >> >> >> +* statistics
> >>> > >> >> >> +**/
> >>> > >> >> >> +
> >>> > >> >> >> +/* zbud pages in all pools */
> >>> > >> >> >> +static u64 total_zbud_pages;
> >>> > >> >> >> +
> >>> > >> >> >>  /*
> >>> > >> >> >>   * Structures
> >>> > >> >> >>  */
> >>> > >> >> >> @@ -142,10 +149,28 @@ static struct zbud_header 
> >>> > >> >> >> *init_zbud_page(struct page *page)
> >>> > >> >> >>   return zhdr;
> >>> > >> >> >>  }
> >>> > >> >> >>
> >>> > >> >> >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, 
> >>> > >> >> >> gfp_t gfp)
> >>> > >> >> >> +{
> >>> > >> >> >> + struct page *page;
> >>> > >> >> >> +
> >>> > >> >> >> + page = alloc_page(gfp);
> >>> > >> >> >> +
> >>> > >> >> >> + if (page) {
> >>> > >> >> >> + pool->pages_nr++;
> >>> > >> >> >> + total_zbud_pages++;
> >>> > >> >> >
> >>> > >> >> > Who protect race?
> >>> > >> >>
> >>> > >> >> Yes, here the pool->pages_nr and also the total_zbud_pages are 
> >>> > >> >> not protected.
> >>> > >> >> I will re-do it.
> >>> > >> >>
> >>> > >> >> I will change *total_zbud_pages* to atomic type.
> >>> > >> >
> >>> > >> > Wait, it doesn't make sense. Now, you assume zbud allocator would 
> >>> > >> > be used
> >>> > >> > for only zswap. It's true until now but we couldn't make sure it 
> >>> > >> > in future.
> >>> > >> > If other user start to use zbud allocator, total_zbud_pages would 
> >>> > >> > be pointless.
> >>> > >>
> >>> > >> Yes, you are right.  ZBUD is a common module. So in this patch 
> >>> > >> calculate the
> >>> > >> zswap pool size in zbud is not suitable.
> >>> > >>
> >>> > >> >
> >>> > >> > Another concern is that what's your scenario for above two swap?
> >>> > >> > How often we need to call zbud_get_pool_size?
> >>> > >> > In previous your patch, you reduced the number of call so IIRC,
> >>> > >> > we only called it in zswap_is_full and for debugfs.
> >>> > >>
> >>> > >> zbud_get_pool_size() is called frequently when adding/freeing zswap
> >>> > >> entry happen in zswap . This is why in this patch I added a counter 
> >>> > >> in zbud,
> >>> > >> and then in zswap the iteration of zswap_list to calculate the pool 
> >>> > >> size will
> >>> > >> 

Re: kvm virtio ethernet ring on guest side over high throughput (packet per second)

2014-01-22 Thread Jason Wang
On 01/22/2014 11:22 PM, Stefan Hajnoczi wrote:
> On Tue, Jan 21, 2014 at 04:06:05PM -0200, Alejandro Comisario wrote:
>
> CCed Michael Tsirkin and Jason Wang who work on KVM networking.
>
>> Hi guys, in the past, when using physical servers, we had several
>> throughput issues with our APIs. In our case we measure throughput
>> in packets per second, since we don't use that much bandwidth (Mb/s):
>> our APIs respond with lots of very small packets (maximum response
>> of 3.5k and avg response of 1.5k). When we were using these physical
>> servers and reached throughput capacity (due to client timeouts),
>> we touched the ethernet ring configuration and made the problem
>> disappear.
>>
>> Today, with KVM and over 10k virtual instances, when we want to
>> increase the throughput of KVM instances, we run into the fact that
>> when using virtio on guests, we have a max configuration of the ring
>> of 256 TX/RX, and on the host side the attached vnet has a txqueuelen
>> of 500.
>>
>> What I want to know is, how can I tune the guest to support more
>> packets per second if I know that's my bottleneck?
> I suggest investigating performance in a systematic way.  Set up a
> benchmark that saturates the network.  Post the details of the benchmark
> and the results that you are seeing.
>
> Then, we can discuss how to investigate the root cause of the bottleneck.
>
>> * does virtio exposes more packets to configure in the virtual ethernet's 
>> ring ?
> No, ring size is hardcoded in QEMU (on the host).

Does it make sense to let the user configure it, at least through
something like the qemu command line?
>
>> * does the use of vhost_net helps me with increasing packets per
>> second and not only bandwidth?
> vhost_net is generally the most performant network option.
>
>> does anyone has to struggle with this before and knows where i can look into 
>> ?
>> there's LOOOTS of information about networking performance
>> tuning of kvm, but nothing related to increase throughput in pps
>> capacity.
>>
>> This is a couple of configurations that we are having right now on the
>> compute nodes:
>>
>> * 2x1Gb bonded interfaces (want to know the more than 20 models we are
>> using, just ask for it)
>> * Multi queue interfaces, pined via irq to different cores

Maybe you can try multiqueue virtio-net with vhost. It lets the guest
use more than one tx/rx virtqueue pair to do the network processing.
>> * Linux bridges,  no VLAN, no open-vswitch
>> * ubuntu 12.04 kernel 3.2.0-[40-48]



Re: mm: BUG: Bad rss-counter state

2014-01-22 Thread Sasha Levin

On 01/22/2014 09:21 PM, Dave Jones wrote:

On Wed, Jan 22, 2014 at 09:16:03PM -0500, Sasha Levin wrote:
  > On 01/22/2014 08:52 PM, Dave Jones wrote:
  > > Sasha, is this the current git tree version of Trinity ?
  > > (I'm wondering if yesterdays munmap changes might be tickling this bug).
  >
  > Ah yes, my tree has the munmap patch from yesterday, which would explain
  > why we started seeing this issue just now.

So that change is basically allowing trinity to munmap just part of a prior mmap.
So it may do things like..

mmap   |--|

munmap |XXX---|

munmap |--XXX-|

ie, it might try unmapping some pages more than once, and may even overlap
prior munmaps.

until yesterday's change, it would only munmap the entire mmap.
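
For reference, the partial and overlapping munmap pattern described
above looks roughly like this in plain C. It is a minimal sketch
under assumptions: partial_munmap_demo() is a hypothetical name,
this is not trinity's code, and it is only safe in a single-threaded
program, where nothing else can reuse the freed range in between.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map six anonymous pages, unmap the first three, then unmap a range
 * that overlaps the already-unmapped third page. Returns 0 if every
 * call succeeded, -1 otherwise.
 */
int partial_munmap_demo(void)
{
	const size_t npages = 6;
	long psz = sysconf(_SC_PAGESIZE);
	char *base;

	assert(psz > 0);

	base = mmap(NULL, npages * psz, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* munmap |XXX---| : drop pages 0..2 only. */
	if (munmap(base, 3 * psz) != 0)
		return -1;

	/* munmap |--XXX-| : pages 2..4; page 2 is already gone. */
	if (munmap(base + 2 * psz, 3 * psz) != 0)
		return -1;

	/* Clean up what is left (page 5). */
	return munmap(base + 5 * psz, psz) == 0 ? 0 : -1;
}
```

munmap() does not treat a range without mapped pages as an error,
so the overlapping second call is expected to succeed.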

There's no easy way to tell exactly what happened without a trinity log of course.


I've attached the trinity log of the child that triggered the bug. Odd thing is
that I don't see any munmaps in it.


Thanks,
Sasha

[child234:9994] [0] [32BIT] munlock(addr=0x7f724f784000, len=0x40) = -1 (Cannot allocate memory)
[child234:9994] [1] remap_file_pages(start=0x7f724e984000, size=0x406f79, prot=0, pgoff=6, flags=0x1) = 0
[child234:9994] [2] vmsplice(fd=682, iov=0x318d710, nr_segs=404, flags=2) = 0x5000
[child234:9994] [3] mbind(start=0x7f724f384000, len=0x40, mode=1, nmask=0, maxnode=0x8000, flags=0) = 0
[child234:9994] [4] mmap(addr=0, len=0x20, prot=7[PROT_READ|PROT_WRITE|PROT_EXEC], flags=0x43842, fd=682, off=0) = -1 (Invalid argument)
[child234:9994] [5] mprotect(start=0x7f7250384000, len=0x20, prot=0) = 0
[child234:9994] [6] mprotect(start=0x7f7250886000, len=8192, prot=0x205) = -1 (Invalid argument)
[child234:9994] [7] munlock(addr=0x7f7250584000, len=0x10) = 0
[child234:9994] [8] [32BIT] mlock(addr=0x7f7250684000, len=0x10) = -1 (Cannot allocate memory)
[child234:9994] [9] move_pages(pid=0, nr_pages=236, pages=0x3015ed0, nodes=0x3111010, status=0x31909d0, flags=4) = 0
[child234:9994] [10] mlock(addr=0x7f7250384000, len=0x20) = -1 (Cannot allocate memory)
[child234:9994] [11] remap_file_pages(start=0x7f724f784000, size=0x3bbfbd, prot=0, pgoff=19, flags=0) = 0
[child234:9994] [12] msync(start=0x7f724d584000, len=0xa0, flags=3) = 0
[child234:9994] [13] mlock(addr=0x7f7250684000, len=0x10) = 0
[child234:9994] [14] madvise(start=0x7f7250384000, len_in=0x20, advice=0) = 0
[child234:9994] [15] mlock(addr=0x7f7250888000, len=8192) = 0
[child234:9994] [16] mbind(start=0x7f7250584000, len=0x10, mode=0, nmask=0, maxnode=0x8000, flags=0x4000) = -1 (Invalid argument)
[child234:9994] [17] move_pages(pid=9896, nr_pages=124, pages=0x3015ed0, nodes=0x3190d90, status=0x3109500, flags=4) = -1 (Invalid argument)
[child234:9994] [18] mprotect(start=0x7f724df84000, len=0xa0, prot=8) = 0
[child234:9994] [19] move_pages(pid=0, nr_pages=221, pages=0x3015ed0, nodes=0x3109700, status=0x3109a80, flags=6) = 0
[child234:9994] [20] [32BIT] madvise(start=0x7f7250184000, len_in=0x20, advice=14) = -1 (Cannot allocate memory)
[child234:9994] [21] move_pages(pid=0, nr_pages=337, pages=0x3015ed0, nodes=0x318f790, status=0x318fce0, flags=4) = 0
[child234:9994] [22] move_pages(pid=9981, nr_pages=115, pages=0x3015ed0, nodes=0x3109e00, status=0x9db1a0, flags=4) = 0
[child234:9994] [23] migrate_pages(pid=0, maxnode=0x680016c3, old_nodes=0x6ba000[page_0xff], new_nodes=0x8100) = -1 (Invalid argument)
[child234:9994] [24] msync(start=0x7f7250384000, len=0x20, flags=1) = 0
[child234:9994] [25] msync(start=0x7f724fb84000, len=0x40, flags=6) = 0
[child234:9994] [26] mincore(start=0, len=0, vec=0x8100) = -1 (Bad address)
[child234:9994] [27] remap_file_pages(start=0x7f7250184000, size=0x1597ab, prot=0, pgoff=336, flags=0) = 0
[child234:9994] [28] move_pages(pid=0, nr_pages=99, pages=0x3015ed0, nodes=0x3190230, status=0x9db380, flags=0) = 0
[child234:9994] [29] mincore(start=0x7f724df84000, len=0x31978b, vec=0x7f724df84001) = -1 (Bad address)
[child234:9994] [30] move_pages(pid=9962, nr_pages=83, pages=0x3015ed0, nodes=0x31113d0, status=0x9db520, flags=6) = -1 (Invalid argument)
[child234:9994] [31] [32BIT] madvise(start=0x7f7250384000, len_in=0x20, advice=9) = -1 (Cannot allocate memory)
[child234:9994] [32] msync(start=0x7f7250886000, len=8192, flags=6) = 0
[child234:9994] [33] migrate_pages(pid=0, maxnode=0x929292929292, old_nodes=1, new_nodes=0x6c[page_allocs]) = -1 (Invalid argument)
[child234:9994] [34] mlock(addr=0x7f7250888000, len=8192) = 0
[child234:9994] [35] mbind(start=0x7f724fb84000, len=0x40, mode=3, nmask=0x6c[page_allocs], maxnode=0x8000, flags=0) = -1 (Invalid argument)
[child234:9994] [36] vmsplice(fd=681, iov=0x31903d0, nr_segs=68, flags=1) = 4096
[child234:9994] [37] mbind(start=0x7f7250584000, len=0x10, mode=3, nmask=0x3016ee0, maxnode=0x8000, flags=0x8000) = -1 (Invalid argument)
[child234:9994] [38] mmap(addr=0, len=0x4000, 

Re: kvm virtio ethernet ring on guest side over high throughput (packet per second)

2014-01-22 Thread Jason Wang
On 01/23/2014 05:32 AM, Alejandro Comisario wrote:
> Thank you so much Stefan for the help and cc'ing Michael & Jason.
> Like you advised yesterday on IRC, today we are making some tests with
> the application setting TCP_NODELAY in the socket options.
>
> So we will try that and get back to you with further information.
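
For reference, setting TCP_NODELAY on a socket looks roughly like
this in C. It is a minimal sketch under assumptions: the two helper
names are hypothetical, and the application under test may set the
option through a different language or library.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket. Returns 0 on success, -1 on error. */
int enable_tcp_nodelay(int fd)
{
	int one = 1;

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Read the flag back: 1 if set, 0 if clear, -1 on error. */
int tcp_nodelay_enabled(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
		return -1;
	return val != 0;
}
```

With the option set, small writes are sent without waiting to be
coalesced, which matters for a workload made of many small responses.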
> In the mean time, maybe showing what options the vms are using while running !
>
> # 
> --
> /usr/bin/kvm -S -M pc-1.0 -cpu
> core2duo,+lahf_lm,+rdtscp,+pdpe1gb,+aes,+popcnt,+x2apic,+sse4.2,+sse4.1,+dca,+xtpr,+cx16,+tm2,+est,+vmx,+ds_cpl,+pbe,+tm,+ht,+ss,+acpi,+ds
> -enable-kvm -m 32768 -smp 8,sockets=1,cores=6,threads=2 -name
> instance-0254 -uuid d25b1b20-409e-4d7f-bd92-2ef4073c7c2b
> -nodefconfig -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0254.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> -no-shutdown -kernel /var/lib/nova/instances/instance-0254/kernel
> -initrd /var/lib/nova/instances/instance-0254/ramdisk -append
> root=/dev/vda console=ttyS0 -drive
> file=/var/lib/nova/instances/instance-0254/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=writethrough
> -device 
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> -netdev tap,fd=19,id=hostnet0 -device

Better enable vhost as Stefan suggested. It may help a lot here.
> virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:27:d4:6d,bus=pci.0,addr=0x3
> -chardev 
> file,id=charserial0,path=/var/lib/nova/instances/instance-0254/console.log
> -device isa-serial,chardev=charserial0,id=serial0 -chardev
> pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1
> -usb -device usb-tablet,id=input0 -vnc 0.0.0.0:4 -k en-us -vga cirrus
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
> # 
> --
>
> best regards
>
>
> Alejandro Comisario
> #melicloud CloudBuilders
> Arias 3751, Piso 7 (C1430CRG)
> Ciudad de Buenos Aires - Argentina
> Cel: +549(11) 15-3770-1857
> Tel : +54(11) 4640-8443
>
>
> On Wed, Jan 22, 2014 at 12:22 PM, Stefan Hajnoczi  wrote:
>> On Tue, Jan 21, 2014 at 04:06:05PM -0200, Alejandro Comisario wrote:
>>
>> CCed Michael Tsirkin and Jason Wang who work on KVM networking.
>>
>>> Hi guys, in the past, when using physical servers, we had several
>>> throughput issues with our APIs. In our case we measure throughput
>>> in packets per second, since we don't use that much bandwidth (Mb/s):
>>> our APIs respond with lots of very small packets (maximum response
>>> of 3.5k and avg response of 1.5k). When we were using these physical
>>> servers and reached throughput capacity (due to client timeouts),
>>> we touched the ethernet ring configuration and made the problem
>>> disappear.
>>>
>>> Today, with KVM and over 10k virtual instances, when we want to
>>> increase the throughput of KVM instances, we run into the fact that
>>> when using virtio on guests, we have a max configuration of the ring
>>> of 256 TX/RX, and on the host side the attached vnet has a txqueuelen
>>> of 500.
>>>
>>> What I want to know is, how can I tune the guest to support more
>>> packets per second if I know that's my bottleneck?
>> I suggest investigating performance in a systematic way.  Set up a
>> benchmark that saturates the network.  Post the details of the benchmark
>> and the results that you are seeing.
>>
>> Then, we can discuss how to investigate the root cause of the bottleneck.
>>
>>> * Does virtio expose a way to configure more packets in the virtual
>>> Ethernet's ring?
>> No, ring size is hardcoded in QEMU (on the host).
>>
>>> * Does using vhost_net help increase packets per
>>> second, and not only bandwidth?
>> vhost_net is generally the most performant network option.
>>
>>> Has anyone had to struggle with this before, and does anyone know where
>>> I can look?
>>> There's lots of information about networking performance
>>> tuning of KVM, but nothing related to increasing throughput in terms of
>>> pps capacity.
>>>
>>> These are a couple of the configurations that we have right now on the
>>> compute nodes:
>>>
>>> * 2x1Gb bonded interfaces (if you want to know the more than 20 models
>>> we are using, just ask)
>>> * Multi-queue interfaces, pinned via IRQ to different cores
>>> * Linux bridges, no VLAN, no Open vSwitch
>>> * Ubuntu 12.04, kernel 3.2.0-[40-48]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start

2014-01-22 Thread Mukesh Rathor
On Mon, 20 Jan 2014 10:09:30 -0500
Konrad Rzeszutek Wilk  wrote:

> On Fri, Jan 17, 2014 at 06:24:55PM -0800, Mukesh Rathor wrote:
> > pvh was designed to start with pv flags, but a commit in xen tree
> 
> Thank you for posting this!
> 
> > 51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags
> > as
> 
> You need to always include the title of said commit.
> 
> > they are not necessary. As a result, these CR flags must be set in
> > the guest.
> 
> I sent out replies to this over the weekend but somehow they are not
> showing up.
> 

Well, they finally showed up today... US mail must be slow :)...


> 
> > +
> > +   if (!cpu)
> > +   return;
> 
> And what happens if we don't have this check? Would it be bad if we do
> multiple cr4 writes?

No, but it just confuses the reader/debugger of the code IMO :)...


> FYI, this (cr4) should have been a separate patch. I fixed it up that
> way.
> > +   /*
> > +* Unlike PV, for pvh xen does not set: PSE PGE OSFXSR OSXMMEXCPT
> > +* For BSP, PSE PGE will be set in probe_page_size_mask(), for AP
> > +* set them here. For all, OSFXSR OSXMMEXCPT will be set in fpu_init
> > +*/
> > +   if (cpu_has_pse)
> > +   set_in_cr4(X86_CR4_PSE);
> > +
> > +   if (cpu_has_pge)
> > +   set_in_cr4(X86_CR4_PGE);
> > +}
> 
> Separate patch, and since the PGE part is more complicated than just
> setting the CR4 - you also have to tweak this:
> 
>         /* Prevent unwanted bits from being set in PTEs. */
>         __supported_pte_mask &= ~_PAGE_GLOBAL;
> 
> I think it should be done once we have actually confirmed that you can
> do 2MB pages within the guest. (might need some more tweaking?)

Umm... well, the above is just setting PSE and PGE in the APs. The
BSP is already doing that in probe_page_size_mask(), which also sets
__supported_pte_mask, and that needs to be set just once. So, because it's
being set in the BSP, it's already broken/untested if we expose PGE
from xen to a linux PVH guest...

IOW, leaving the above in does no additional harm; otherwise we should
'if (pvh)' the code in probe_page_size_mask() for PSE, and wait till we
can test it...
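
To make the BSP/AP split concrete, here is a minimal userspace sketch. It
is only a model under stated assumptions: the sim_* names, NR_CPUS and the
simulated per-CPU CR4 array are hypothetical stand-ins and not kernel code;
only the two CR4 bit values are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4 bit values as defined by the x86 architecture. */
#define X86_CR4_PSE 0x00000010u /* page size extensions */
#define X86_CR4_PGE 0x00000080u /* page global enable */

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* Simulated per-CPU CR4; a real kernel writes the hardware register. */
static uint32_t sim_cr4[NR_CPUS];

/* Stand-in for set_in_cr4(): OR a mask into the given CPU's CR4. */
static void sim_set_in_cr4(int cpu, uint32_t mask)
{
	sim_cr4[cpu] |= mask;
}

/* Models the BSP path, where probe_page_size_mask() sets PSE/PGE. */
static void sim_bsp_probe(bool has_pse, bool has_pge)
{
	if (has_pse)
		sim_set_in_cr4(0, X86_CR4_PSE);
	if (has_pge)
		sim_set_in_cr4(0, X86_CR4_PGE);
}

/* Models the quoted PVH hook: skip the BSP, set the flags on an AP. */
static void sim_pvh_set_cr_flags(int cpu, bool has_pse, bool has_pge)
{
	if (!cpu)
		return; /* CPU 0 was already handled by sim_bsp_probe() */
	if (has_pse)
		sim_set_in_cr4(cpu, X86_CR4_PSE);
	if (has_pge)
		sim_set_in_cr4(cpu, X86_CR4_PGE);
}
```

In this model, dropping the `if (!cpu)` check would only repeat an
idempotent OR on CPU 0, which matches the point that the extra write is
harmless but confusing to read.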

thanks
Mukesh



Re: Internal error: Oops: 17 [#1] ARM

2014-01-22 Thread John Tobias
Hi Fabio,

Attached are the two patch files that I applied to the 3.13 release
so that the kernel detects my eMMC in DDR50 mode.
(Let me correct my previous email: I mentioned SDR50, but it
should be DDR50.)
eMMC info:

clock:          5200 Hz
actual clock:   4950 Hz
vdd:            21 (3.3 ~ 3.4 V)
bus mode:       2 (push-pull)
chip select:    0 (don't care)
power mode:     2 (on)
bus width:      3 (8 bits)
timing spec:    1 (mmc high-speed)
signal voltage: 0 (3.30 V)

I reboot my device often, and the error shows up during the reboot.

Regards,

john




On Wed, Jan 22, 2014 at 6:28 PM, Fabio Estevam  wrote:
> On Wed, Jan 22, 2014 at 9:49 PM, John Tobias  wrote:
>> Hello all,
>>
>> Just to confirm that the error I posted previously exists in the 3.13
>> release. Note that some patches related to eMMC/sdhci have
>> been applied in order to boot 3.13 on my board.
>> In addition to that, I was getting additional errors (please see below):
>> - It happened during the reboot.
>>
>> Cc'ng Dong Aisheng.
>
> What are the steps to reproduce this? Which SoC are you using?
>
> Regards,
>
> Fabio Estevam


sdhci-esdhc-imx.patch
Description: Binary data


sdhci.patch
Description: Binary data


Re: Internal error: Oops: 17 [#1] ARM

2014-01-22 Thread John Tobias
Hi Liming,

Yes, I am using 4.8.1. I switched back to 4.7.3 and will test again
to see whether I can reproduce it.

Regards,

john

On Wed, Jan 22, 2014 at 7:01 PM, walimis  wrote:
> On Wed, Jan 22, 2014 at 08:23:36AM -0800, John Tobias wrote:
>>Hello all,
>>
>>I am using the 3.13-rc1 kernel on an iMX6SL processor. My filesystem is on
>>eMMC running SDR50.
>>Has anyone here encountered this problem, and is there an existing
>>patch that I can get?
> hi,
>
> Do you use gcc 4.8.1? If so, maybe you should look at following link
> to see whether it's a similar issue.
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
>
> Liming Wang
>
>>
>>Regards,
>>
>>john
>>
>>[ 1552.394899] Unable to handle kernel NULL pointer dereference at
>>virtual address 0037
>>[ 1552.403034] pgd = beef4000
>>[ 1552.405855] [0037] *pgd=bef60831, *pte=, *ppte=
>>[ 1552.412245] Internal error: Oops: 17 [#1] ARM
>>[ 1552.416627] Modules linked in: bt8xxx(O) sd8xxx(O) mlan(O)
>>[ 1552.422249] CPU: 0 PID: 232 Comm: commsd Tainted: G   O 3.13.0-rc1 
>>#7
>>[ 1552.429409] task: bfbc7500 ti: bec96000 task.ti: bec96000
>>[ 1552.434844] PC is at lookup_fast+0x5c/0x318
>>[ 1552.439067] LR is at mark_held_locks+0x78/0x13c
>>[ 1552.443622] pc : [<80101184>]lr : [<80056e48>]psr: a00f0013
>>[ 1552.443622] sp : bec97d88  ip : 00666e6f  fp : bec97ddc
>>[ 1552.455124] r10:   r9 : bec97e08  r8 : 80102d94
>>[ 1552.460370] r7 : bec97e60  r6 : bf133ac8  r5 : bec97e60  r4 : bec97e00
>>[ 1552.466918] r3 : bee4f01d  r2 :   r1 :   r0 : 
>>[ 1552.473471] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
>>user
>>[ 1552.480629] Control: 10c53c7d  Table: beef4059  DAC: 0015
>>[ 1552.486397] Process commsd (pid: 232, stack limit = 0xbec96238)
>>[ 1552.492341] Stack: (0xbec97d88 to 0xbec98000)
>>[ 1552.496728] 7d80:   80102b94 80057108 bfb95310
>>bf133ac8 bf15f4e8 bfb95310
>>[ 1552.504936] 7da0: c08bb14d  bee4f015 0008 bfbc7500
>>bec97e08  0041
>>[ 1552.513142] 7dc0: bec97e60 bec96020 bec96000  bec97e3c
>>bec97de0 80102d94 80101134
>>[ 1552.521347] 7de0: bec97df8  800d982c bec96018 0010
>>bec97e00  bec97e08
>>[ 1552.529553] 7e00: 8026e25c 800d97e8 bee4f000 0ff0 80d4e3a4
>>0001 bee4f000 bec97e60
>>[ 1552.537758] 7e20: ff9c ff9c bec96000  bec97e5c
>>bec97e40 801033bc 80102c70
>>[ 1552.545964] 7e40: bee4f000 0001 bec97e60 bec97f00 bec97ee4
>>bec97e60 80105dc0 80103398
>>[ 1552.554170] 7e60: bfb95310 bf133ac8 c08bb14d 000b bee4f015
>>8005992c bfb95310 bf133398
>>[ 1552.562375] 7e80: bf15f4e8 0041 0002 008a 
>> 600f0013 bec96000
>>[ 1552.570581] 7ea0: ffea bf8c1840 807b4430 80115444 801156e4
>>8011563c 0008 
>>[ 1552.578786] 7ec0: bec97f04 733fe4e0 0001 ff9c 757e3810
>>bec97f40 bec97efc bec97ee8
>>[ 1552.586991] 7ee0: 80105e0c 80105d68  bc950fe0 bec97f2c
>>bec97f00 800faf64 80105df4
>>[ 1552.595196] 7f00: 801156e4 80115420 bec97f54 733fe4e0 733ff8f0
>>733fe61c 00c3 8000f504
>>[ 1552.603402] 7f20: bec97f3c bec97f30 800fafe0 800faf1c bec97fa4
>>bec97f40 800fb71c 800fafc4
>>[ 1552.611608] 7f40: 8000f310 bfbc7500 733fe550 8000f458 733fe61c
>>00c3 bec97f84 bec97f68
>>[ 1552.619813] 7f60: 80056f28 800a0270 733fe550 733ff8f0 733fe61c
>>00c3 bec97f94 bec97f88
>>[ 1552.628019] 7f80: 80057110 80056f18  bec97f98 8000f458
>>733fe550  bec97fa8
>>[ 1552.636226] 7fa0: 8000f280 800fb704 733fe550 733ff8f0 757e3810
>>733fe4e0 733fe550 0003
>>[ 1552.644431] 7fc0: 733fe550 733ff8f0 733fe61c 00c3 
>>0002 733fe92c 0200
>>[ 1552.652636] 7fe0: 00c3 733fe4d8 7579b7e5 7572e276 200f0030
>>757e3810 bfffd821 bfffdc21
>>[ 1552.660828] Backtrace:
>>[ 1552.663343] [<80101128>] (lookup_fast+0x0/0x318) from [<80102d94>]
>>(path_lookupat+0x130/0x728)
>>[ 1552.671994] [<80102c64>] (path_lookupat+0x0/0x728) from
>>[<801033bc>] (filename_lookup.isra.40+0x30/0x70)
>>[ 1552.681515] [<8010338c>] (filename_lookup.isra.40+0x0/0x70) from
>>[<80105dc0>] (user_path_at_empty+0x64/0x8c)
>>[ 1552.691361]  r7:bec97f00 r6:bec97e60 r5:0001 r4:bee4f000
>>[ 1552.697163] [<80105d5c>] (user_path_at_empty+0x0/0x8c) from
>>[<80105e0c>] (user_path_at+0x24/0x2c)
>>[ 1552.706053]  r8:bec97f40 r7:757e3810 r6:ff9c r5:0001 r4:733fe4e0
>>[ 1552.712927] [<80105de8>] (user_path_at+0x0/0x2c) from [<800faf64>]
>>(vfs_fstatat+0x54/0xa8)
>>[ 1552.721232] [<800faf10>] (vfs_fstatat+0x0/0xa8) from [<800fafe0>]
>>(vfs_stat+0x28/0x2c)
>>[ 1552.729167]  r8:8000f504 r7:00c3 r6:733fe61c r5:733ff8f0 r4:733fe4e0
>>[ 1552.736031] [<800fafb8>] (vfs_stat+0x0/0x2c) from [<800fb71c>]
>>(SyS_stat64+0x24/0x40)
>>[ 1552.743902] [<800fb6f8>] (SyS_stat64+0x0/0x40) from [<8000f280>]
>>(ret_fast_syscall+0x0/0x48)
>>[ 1552.752359]  r4:733fe550
>>[ 1552.754946] Code: eb00352d e350 e50b0038 0a80 (e5903038)
>>[ 1552.761270] ---[ end trace 

Re: [PATCH V5 6/8] time/cpuidle: Support in tick broadcast framework in the absence of external clock device

2014-01-22 Thread Preeti U Murthy
Hi Thomas,

Thank you very much for the review.

On 01/22/2014 06:57 PM, Thomas Gleixner wrote:
> On Wed, 15 Jan 2014, Preeti U Murthy wrote:
>> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
>> index 086ad60..d61404e 100644
>> --- a/kernel/time/clockevents.c
>> +++ b/kernel/time/clockevents.c
>> @@ -524,12 +524,13 @@ void clockevents_resume(void)
>>  #ifdef CONFIG_GENERIC_CLOCKEVENTS
>>  /**
>>   * clockevents_notify - notification about relevant events
>> + * Returns non zero on error.
>>   */
>> -void clockevents_notify(unsigned long reason, void *arg)
>> +int clockevents_notify(unsigned long reason, void *arg)
>>  {
> 
> The interface change of clockevents_notify wants to be a separate
> patch.
> 
>> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
>> index 9532690..1c23912 100644
>> --- a/kernel/time/tick-broadcast.c
>> +++ b/kernel/time/tick-broadcast.c
>> @@ -20,6 +20,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "tick-internal.h"
>>  
>> @@ -35,6 +36,15 @@ static cpumask_var_t tmpmask;
>>  static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
>>  static int tick_broadcast_force;
>>  
>> +/*
>> + * Helper variables for handling broadcast in the absence of a
>> + * tick_broadcast_device.
>> + * */
>> +static struct hrtimer *bc_hrtimer;
>> +static int bc_cpu = -1;
>> +static ktime_t bc_next_wakeup;
> 
> Why do you need another variable to store the expiry time? The
> broadcast code already knows it and the hrtimer expiry value gives you
> the same information for free.

The reason was functions like tick_handle_oneshot_broadcast() and
tick_broadcast_switch_to_oneshot() were using the
tick_broadcast_device.evtdev->next_event to set/get the next wakeups.

But since this patchset introduced an explicit hrtimer for archs which
do not have such a device, I wanted these functions to use a generic
variable to set/get the next wakeup without having to know whether this
hrtimer exists at all, and to program the hrtimer or the tick broadcast
device, whichever is present, only when the next event is to be set.
With your concept patch below, we will not need to do this.
> 
>> +static int hrtimer_initialized = 0;
> 
> What's the point of this hrtimer_initialized dance? Why not simply
> make the hrtimer static and avoid that altogether? Also, adding the
> initialization into tick_broadcast_oneshot_available() is
> braindamaged.  Why not add this to tick_broadcast_init(), which is
> the proper place to do it?

Right I agree, this hrtimer initialization should have been in
tick_broadcast_init() and a simple static declaration would have done
the job.
> 
> Aside of that you are making this hrtimer mode unconditional, which
> might break existing systems which are not aware of the hrtimer
> implications.
> 
> What you really want is a pseudo clock event device which has the
> proper functions for handling the timer and you can register it from
> your architecture code. The broadcast core code needs a few tweaks to
> avoid the shutdown of the cpu local clock event device, but aside of
> that the whole thing just falls into place. So architectures can use
> this if they want and are sure that their low level idle code knows
> about the deep idle preventing return value of
> clockevents_notify(). Once that works you can register the hrtimer
> based broadcast device and a real hardware broadcast device with a
> higher rating. It just works.

I now completely see your point. This will surely break on archs which
are not using the return value of the BROADCAST_ENTER notification.

I am not even giving them a choice about using the hrtimer mode of
broadcast framework and am expecting them to take action for the failed
return of BROADCAST_ENTER. I missed that critical point. I went through
the below patch and am able to see how you are solving this problem.
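
As a rough illustration of why the idle code must act on the return value,
here is a small userspace sketch. It is a model only: every model_* name is
hypothetical, and the rule that the CPU carrying the hrtimer-based broadcast
gets a non-zero return is an assumption made for this example, not the
behaviour of any posted patch.

```c
enum idle_state { IDLE_SHALLOW, IDLE_DEEP };

/* Hypothetical: CPU whose hrtimer carries the broadcast, -1 if none. */
static int model_bc_cpu = -1;

/*
 * Model of a BROADCAST_ENTER notification that can fail.
 * Returns 0 if the CPU may stop its local timer, non-zero otherwise.
 */
static int model_broadcast_enter(int cpu)
{
	return cpu == model_bc_cpu ? -1 : 0;
}

/* Idle code that honours the return value and avoids deep idle on failure. */
static enum idle_state model_select_idle(int cpu)
{
	if (model_broadcast_enter(cpu))
		return IDLE_SHALLOW; /* local timer must keep running */
	return IDLE_DEEP;
}
```

An architecture whose idle path ignored the return value would pick
IDLE_DEEP for every CPU in this model, including the one that has to keep
its timer alive, which is the breakage described above.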
> 
> Find an incomplete and nonfunctional concept patch below. It should be
> simple to make it work for real.

Thank you very much for the valuable review. The below patch makes your
points very clear. Let me try this out.

Regards
Preeti U Murthy
> 
> Thanks,
> 
>   tglx
> 
> Index: linux-2.6/include/linux/clockchips.h
> ===
> --- linux-2.6.orig/include/linux/clockchips.h
> +++ linux-2.6/include/linux/clockchips.h
> @@ -62,6 +62,11 @@ enum clock_event_mode {
>  #define CLOCK_EVT_FEAT_DYNIRQ        0x20
>  #define CLOCK_EVT_FEAT_PERCPU        0x40
> 
> +/*
> + * Clockevent device is based on a hrtimer for broadcast
> + */
> +#define CLOCK_EVT_FEAT_HRTIMER   0x80
> +
>  /**
>   * struct clock_event_device - clock event device descriptor
>   * @event_handler:   Assigned by the framework to be called by the low
> @@ -83,6 +88,7 @@ enum clock_event_mode {
>   * @name:            ptr to clock event name
>   * @rating:          variable to rate clock event devices
>   * @irq: 

[PATCH v5] ACPI: Fix acpi_evaluate_object() return value check

2014-01-22 Thread Yijing Wang
Since acpi_evaluate_object() returns acpi_status and not plain int,
ACPI_FAILURE() should be used for checking its return value. Also
add some detailed debug info for when acpi_evaluate_object() fails.
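
The pattern can be sketched in userspace as follows. The typedef, the AE_*
constants and the ACPI_FAILURE() macro here are simplified stand-ins written
for this example rather than the real ACPICA headers, and mock_evaluate()
and mock_dsm() are hypothetical helpers.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-ins for ACPICA definitions; not the real headers. */
typedef uint32_t acpi_status;
#define AE_OK           ((acpi_status)0x0000)
#define AE_NOT_FOUND    ((acpi_status)0x0005)
#define ACPI_FAILURE(s) ((s) != AE_OK)

/* Hypothetical evaluator: reports "not found" when the method is absent. */
static acpi_status mock_evaluate(int present)
{
	return present ? AE_OK : AE_NOT_FOUND;
}

/*
 * Keep the ACPI status in an acpi_status variable, test it with
 * ACPI_FAILURE(), and hand the caller an errno instead of the raw status.
 */
static int mock_dsm(int present)
{
	acpi_status status = mock_evaluate(present);

	if (ACPI_FAILURE(status))
		return -EINVAL;
	return 0;
}
```

The point of the separation is that callers expecting a negative errno never
see a positive ACPI status code leaking through an int return value.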

Reviewed-by: Jani Nikula 
Acked-by: Bjorn Helgaas 
Signed-off-by: Yijing Wang 
---
v4->v5: Add some detailed debug info for acpi_evaluate_object() 
failure suggested by Bjorn.
v3->v4: Fix spelling error, add Reviewed-by from Jani Nikula.
v2->v3: Fix compile error pointed out by Hanjun.
v1->v2: Add CC to related subsystem MAINTAINERS
---
 drivers/gpu/drm/i915/intel_acpi.c  |   33 ---
 drivers/gpu/drm/nouveau/core/subdev/mxm/base.c |   13 ++---
 drivers/gpu/drm/nouveau/nouveau_acpi.c |   25 +++---
 drivers/pci/pci-label.c|   10 +--
 4 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_acpi.c 
b/drivers/gpu/drm/i915/intel_acpi.c
index dfff090..e7b526b 100644
--- a/drivers/gpu/drm/i915/intel_acpi.c
+++ b/drivers/gpu/drm/i915/intel_acpi.c
@@ -31,11 +31,13 @@ static const u8 intel_dsm_guid[] = {
 static int intel_dsm(acpi_handle handle, int func)
 {
struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
+   struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL };
struct acpi_object_list input;
union acpi_object params[4];
union acpi_object *obj;
u32 result;
-   int ret = 0;
+   acpi_status status;
+   int ret;
 
input.count = 4;
input.pointer = params;
@@ -50,10 +52,14 @@ static int intel_dsm(acpi_handle handle, int func)
params[3].package.count = 0;
params[3].package.elements = NULL;
 
-   ret = acpi_evaluate_object(handle, "_DSM", &input, &output);
-   if (ret) {
-   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret);
-   return ret;
+   status = acpi_evaluate_object(handle, "_DSM", &input, &output);
+   if (ACPI_FAILURE(status)) {
+   acpi_get_name(handle, ACPI_FULL_PATHNAME, &string);
+   DRM_DEBUG_DRIVER(
+   "failed to evaluate _DSM for %s, exit status %u\n",
+   (char *)string.pointer, (unsigned int)status);
+   kfree(string.pointer);
+   return -EINVAL;
}
 
obj = (union acpi_object *)output.pointer;
@@ -138,10 +144,12 @@ static char *intel_dsm_mux_type(u8 type)
 static void intel_dsm_platform_mux_info(void)
 {
struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
+   struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL };
struct acpi_object_list input;
union acpi_object params[4];
union acpi_object *pkg;
-   int i, ret;
+   acpi_status status;
+   int i;
 
input.count = 4;
input.pointer = params;
@@ -156,10 +164,15 @@ static void intel_dsm_platform_mux_info(void)
params[3].package.count = 0;
params[3].package.elements = NULL;
 
-   ret = acpi_evaluate_object(intel_dsm_priv.dhandle, "_DSM", &input,
-  &output);
-   if (ret) {
-   DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret);
+   status = acpi_evaluate_object(intel_dsm_priv.dhandle,
+   "_DSM", &input, &output);
+   if (ACPI_FAILURE(status)) {
+   acpi_get_name(intel_dsm_priv.dhandle,
+   ACPI_FULL_PATHNAME, &string);
+   DRM_DEBUG_DRIVER(
+   "failed to evaluate _DSM for %s, exit status %u\n",
+   (char *)string.pointer, (unsigned int)status);
+   kfree(string.pointer);
goto out;
}
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c 
b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
index 1291204..c30ee88 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
@@ -112,17 +112,22 @@ mxm_shadow_dsm(struct nouveau_mxm *mxm, u8 version)
};
struct acpi_object_list list = { ARRAY_SIZE(args), args };
struct acpi_buffer retn = { ACPI_ALLOCATE_BUFFER, NULL };
+   struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL };
union acpi_object *obj;
acpi_handle handle;
-   int ret;
+   acpi_status status;
 
handle = ACPI_HANDLE(>pdev->dev);
if (!handle)
return false;
 
-   ret = acpi_evaluate_object(handle, "_DSM", &list, &retn);
-   if (ret) {
-   nv_debug(mxm, "DSM MXMS failed: %d\n", ret);
+   status = acpi_evaluate_object(handle, "_DSM", &list, &retn);
+   if (ACPI_FAILURE(status)) {
+   acpi_get_name(handle, ACPI_FULL_PATHNAME, &string);
+   nv_debug(mxm, "DSM MXMS failed for %s: exit status %u\n",
+   (char *)string.pointer,
+   (unsigned int)status);
+   kfree(string.pointer);
return false;
}
 
diff 

Re: Internal error: Oops: 17 [#1] ARM

2014-01-22 Thread walimis
On Wed, Jan 22, 2014 at 07:28:55PM -0800, John Tobias wrote:
>Hi Liming,
>
>Yes, I am using 4.8.1. I switched back to 4.7.3 and will test it again
>if I can re-produce it.

Hi,

Or you can use the latest Linaro 4.8.x toolchain, which already includes that
patch:

http://releases.linaro.org/13.12/components/toolchain/binaries/

Please select this one to try:

gcc-linaro-arm-linux-gnueabihf-4.8-2013.12_linux.tar.bz2

Liming Wang
>
>Regards,
>
>john
>
>On Wed, Jan 22, 2014 at 7:01 PM, walimis  wrote:
>> On Wed, Jan 22, 2014 at 08:23:36AM -0800, John Tobias wrote:
>>>Hello all,
>>>
>>>I am using the 3.13-rc1 kernel on an iMX6SL processor. My filesystem is on
>>>eMMC running SDR50.
>>>Has anyone here encountered this problem, and is there an existing
>>>patch that I can get?
>> hi,
>>
>> Do you use gcc 4.8.1? If so, maybe you should look at following link
>> to see whether it's a similar issue.
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
>>
>> Liming Wang
>>
>>>
>>>Regards,
>>>
>>>john
>>>

Re: [PATCH] tracing: Use task_nice() in function __update_max_tr() to get the nice value of task.

2014-01-22 Thread Steven Rostedt
On Wed, 22 Jan 2014 17:41:45 -0500
Dongsheng Yang  wrote:

> There is already a function named task_nice in sched.h to get the nice value
> of task_struct. We can use it in __update_max_tr() rather than calculate it
> manually.
> 
> Signed-off-by: Dongsheng Yang 
> ---
>  kernel/trace/trace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 9d20cd9..ec149b4 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -970,7 +970,7 @@ __update_max_tr(struct trace_array *tr, struct 
> task_struct *tsk, int cpu)
>   else
>   max_data->uid = task_uid(tsk);
>  
> - max_data->nice = tsk->static_prio - 20 - MAX_RT_PRIO;
> + max_data->nice = task_nice(tsk);

Except that's a function call in a critical path. Switch it to
TASK_NICE(), and I'll take the patch.

Thanks,

-- Steve

>   max_data->policy = tsk->policy;
>   max_data->rt_priority = tsk->rt_priority;
>  



Re: [PATCH] tracing: Use task_nice() in function __update_max_tr() to get the nice value of task.

2014-01-22 Thread Steven Rostedt
On Wed, 22 Jan 2014 22:56:32 -0500
Steven Rostedt  wrote:

> On Wed, 22 Jan 2014 17:41:45 -0500
> Dongsheng Yang  wrote:
> 
> > There is already a function named task_nice in sched.h to get the nice value
> > of task_struct. We can use it in __update_max_tr() rather than calculate it
> > manually.
> > 
> > Signed-off-by: Dongsheng Yang 
> > ---
> >  kernel/trace/trace.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 9d20cd9..ec149b4 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -970,7 +970,7 @@ __update_max_tr(struct trace_array *tr, struct 
> > task_struct *tsk, int cpu)
> > else
> > max_data->uid = task_uid(tsk);
> >  
> > -   max_data->nice = tsk->static_prio - 20 - MAX_RT_PRIO;
> > +   max_data->nice = task_nice(tsk);
> 
> Except that's a function call in a critical path. Switch it to
> TASK_NICE(), and I'll take the patch.

Bah, I just noticed that TASK_NICE is in kernel/sched/sched.h not
include/linux/sched.h

Peter, is there a reason that task_nice() is not a static inline in
sched.h and have these macros there too? They only reference fields in
task_struct that are already defined there. I don't see why they need
to be private to kernel/sched.
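
For reference, the arithmetic under discussion can be sketched in userspace
like this. The macros mirror the PRIO_TO_NICE()/TASK_NICE() definitions from
kernel/sched/sched.h as I understand them at the time of this thread, with
MAX_RT_PRIO assumed to be 100; struct mock_task and the two helper functions
are hypothetical stand-ins for task_struct and task_nice().

```c
#define MAX_RT_PRIO 100 /* assumed value for this sketch */

/* Convert a static priority to a nice value in the range -20..19. */
#define PRIO_TO_NICE(prio) ((prio) - MAX_RT_PRIO - 20)
#define TASK_NICE(p)       PRIO_TO_NICE((p)->static_prio)

/* Hypothetical stand-in for the one task_struct field that is needed. */
struct mock_task {
	int static_prio;
};

/* What a static inline task_nice() in a shared header could look like. */
static inline int mock_task_nice(const struct mock_task *p)
{
	return TASK_NICE(p);
}

/* The open-coded form that the patch replaces in __update_max_tr(). */
static inline int open_coded_nice(const struct mock_task *p)
{
	return p->static_prio - 20 - MAX_RT_PRIO;
}
```

Both helpers compute the same value; the macro or inline form simply avoids
an out-of-line function call in the tracing path.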

-- Steve

> 
> Thanks,
> 
> -- Steve
> 
> > max_data->policy = tsk->policy;
> > max_data->rt_priority = tsk->rt_priority;
> >  
> 



Re: [PATCH] tracing: Use task_nice() in function __update_max_tr() to get the nice value of task.

2014-01-22 Thread Dongsheng Yang

On 01/22/2014 11:00 PM, Steven Rostedt wrote:


> Bah, I just noticed that TASK_NICE is in kernel/sched/sched.h not
> include/linux/sched.h
>
> Peter, is there a reason that task_nice() is not a static inline in
> sched.h and have these macros there too? They only reference fields in
> task_struct that are already defined there. I don't see why they need
> to be private to kernel/sched.


Agreed. These macros are useful to code outside kernel/sched,
but they are currently private to kernel/sched.

If we move them to include/linux/sched.h, I will use TASK_NICE in this
patch.


> -- Steve
>
>> Thanks,
>>
>> -- Steve
>>
>>> max_data->policy = tsk->policy;
>>> max_data->rt_priority = tsk->rt_priority;






Re: Internal error: Oops: 17 [#1] ARM

2014-01-22 Thread John Tobias
Thanks!

I will try it tomorrow.

Regards,

John

Sent from my iPhone

> On Jan 22, 2014, at 7:46 PM, walimis  wrote:
> 
>> On Wed, Jan 22, 2014 at 07:28:55PM -0800, John Tobias wrote:
>> Hi Liming,
>> 
>> Yes, I am using 4.8.1. I switched back to 4.7.3 and will test it again
>> if I can re-produce it.
> 
> Hi,
> 
> Or you can use the latest Linaro 4.8.x toolchain, which already includes that
> patch:
> 
> http://releases.linaro.org/13.12/components/toolchain/binaries/
> 
> Please select this one to try:
> 
> gcc-linaro-arm-linux-gnueabihf-4.8-2013.12_linux.tar.bz2
> 
> Liming Wang
>> 
>> Regards,
>> 
>> john
>> 
>>> On Wed, Jan 22, 2014 at 7:01 PM, walimis  wrote:
>>>> On Wed, Jan 22, 2014 at 08:23:36AM -0800, John Tobias wrote:
>>>>> Hello all,
>>>>>
>>>>> I am using the 3.13-rc1 kernel on an iMX6SL processor. My filesystem is on
>>>>> eMMC running SDR50.
>>>>> Has anyone here encountered this problem, and is there an existing
>>>>> patch that I can get?
>>> hi,
>>> 
>>> Do you use gcc 4.8.1? If so, maybe you should look at following link
>>> to see whether it's a similar issue.
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
>>> 
>>> Liming Wang
>>> 
 
>>>>> Regards,
>>>>>
>>>>> john
 

linux-next: manual merge of the userns tree with the mips tree

2014-01-22 Thread Stephen Rothwell
Hi Eric,

Today's linux-next merge of the userns tree got conflicts in
arch/mips/include/asm/vpe.h and arch/mips/kernel/vpe.c between commits
1a2a6d7e8816 ("MIPS: APRP: Split VPE loader into separate files") and
5792bf643865 ("MIPS: APRP: Code formatting clean-ups") from the mips tree
and commit f58437f1f916 ("MIPS: VPE: Remove vpe_getuid and vpe_getgid")
from the userns tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/mips/include/asm/vpe.h
index e0684f5f0054,0880fe8809b1..
--- a/arch/mips/include/asm/vpe.h
+++ b/arch/mips/include/asm/vpe.h
@@@ -9,88 -18,7 +9,87 @@@
  #ifndef _ASM_VPE_H
  #define _ASM_VPE_H
  
 +#include 
 +#include 
 +#include 
 +#include 
 +
 +#define VPE_MODULE_NAME "vpe"
 +#define VPE_MODULE_MINOR 1
 +
 +/* grab the likely amount of memory we will need. */
 +#ifdef CONFIG_MIPS_VPE_LOADER_TOM
 +#define P_SIZE (2 * 1024 * 1024)
 +#else
 +/* add an overhead to the max kmalloc size for non-striped symbols/etc */
 +#define P_SIZE (256 * 1024)
 +#endif
 +
 +#define MAX_VPES 16
 +#define VPE_PATH_MAX 256
 +
 +static inline int aprp_cpu_index(void)
 +{
 +#ifdef CONFIG_MIPS_CMP
 +  return setup_max_cpus;
 +#else
 +  extern int tclimit;
 +  return tclimit;
 +#endif
 +}
 +
 +enum vpe_state {
 +  VPE_STATE_UNUSED = 0,
 +  VPE_STATE_INUSE,
 +  VPE_STATE_RUNNING
 +};
 +
 +enum tc_state {
 +  TC_STATE_UNUSED = 0,
 +  TC_STATE_INUSE,
 +  TC_STATE_RUNNING,
 +  TC_STATE_DYNAMIC
 +};
 +
 +struct vpe {
 +  enum vpe_state state;
 +
 +  /* (device) minor associated with this vpe */
 +  int minor;
 +
 +  /* elfloader stuff */
 +  void *load_addr;
 +  unsigned long len;
 +  char *pbuffer;
 +  unsigned long plen;
-   unsigned int uid, gid;
 +  char cwd[VPE_PATH_MAX];
 +
 +  unsigned long __start;
 +
 +  /* tc's associated with this vpe */
 +  struct list_head tc;
 +
 +  /* The list of vpe's */
 +  struct list_head list;
 +
 +  /* shared symbol address */
 +  void *shared_ptr;
 +
 +  /* the list of who wants to know when something major happens */
 +  struct list_head notify;
 +
 +  unsigned int ntcs;
 +};
 +
 +struct tc {
 +  enum tc_state state;
 +  int index;
 +
 +  struct vpe *pvpe;   /* parent VPE */
 +  struct list_head tc;/* The list of TC's with this VPE */
 +  struct list_head list;  /* The global list of tc's */
 +};
 +
  struct vpe_notifications {
void (*start)(int vpe);
void (*stop)(int vpe);
@@@ -98,36 -26,10 +97,34 @@@
struct list_head list;
  };
  
 +struct vpe_control {
 +  spinlock_t vpe_list_lock;
 +  struct list_head vpe_list;  /* Virtual processing elements */
 +  spinlock_t tc_list_lock;
 +  struct list_head tc_list;   /* Thread contexts */
 +};
 +
 +extern unsigned long physical_memsize;
 +extern struct vpe_control vpecontrol;
 +extern const struct file_operations vpe_fops;
 +
 +int vpe_notify(int index, struct vpe_notifications *notify);
 +
 +void *vpe_get_shared(int index);
- int vpe_getuid(int index);
- int vpe_getgid(int index);
 +char *vpe_getcwd(int index);
 +
 +struct vpe *get_vpe(int minor);
 +struct tc *get_tc(int index);
 +struct vpe *alloc_vpe(int minor);
 +struct tc *alloc_tc(int index);
 +void release_vpe(struct vpe *v);
  
 -extern int vpe_notify(int index, struct vpe_notifications *notify);
 +void *alloc_progmem(unsigned long len);
 +void release_progmem(void *ptr);
  
 -extern void *vpe_get_shared(int index);
 -extern char *vpe_getcwd(int index);
 +int __weak vpe_run(struct vpe *v);
 +void cleanup_tc(struct tc *tc);
  
 +int __init vpe_module_init(void);
 +void __exit vpe_module_exit(void);
  #endif /* _ASM_VPE_H */
diff --cc arch/mips/kernel/vpe.c
index 42d3ca08bd28,2d5c142bad67..
--- a/arch/mips/kernel/vpe.c
+++ b/arch/mips/kernel/vpe.c
@@@ -899,35 -1262,14 +896,13 @@@ void *vpe_get_shared(int index
  
return v->shared_ptr;
  }
 -
  EXPORT_SYMBOL(vpe_get_shared);
  
- int vpe_getuid(int index)
- {
-   struct vpe *v = get_vpe(index);
- 
-   if (v == NULL)
-   return -1;
- 
-   return v->uid;
- }
- EXPORT_SYMBOL(vpe_getuid);
- 
- int vpe_getgid(int index)
- {
-   struct vpe *v = get_vpe(index);
- 
-   if (v == NULL)
-   return -1;
- 
-   return v->gid;
- }
- EXPORT_SYMBOL(vpe_getgid);
- 
  int vpe_notify(int index, struct vpe_notifications *notify)
  {
 -  struct vpe *v;
 +  struct vpe *v = get_vpe(index);
  
 -  if ((v = get_vpe(index)) == NULL)
 +  if (v == NULL)
return -1;
  
list_add(&notify->list, &v->notify);




Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock

2014-01-22 Thread Rusty Russell
"Srivatsa S. Bhat"  writes:
> On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote:
>> Hi Paul,

I find an old patch for register_allcpu_notifier(), but the "bool
replay_history" should be eliminated (always true): it's too weird.

Then we should get rid of register_cpu_notifier, or at least hide it.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Regression on next-20140116 [Was: [PATCH 3/3 v4] usb: chipidea: hw_phymode_configure moved before ci_usb_phy_init]

2014-01-22 Thread Chris Ruehl

On Thursday, January 23, 2014 09:22 AM, Peter Chen wrote:

On Wed, Jan 22, 2014 at 10:41:33PM +0100, Uwe Kleine-König wrote:

Hello,

On Wed, Jan 22, 2014 at 10:49:51AM +0100, Uwe Kleine-König wrote:

On Tue, Dec 03, 2013 at 04:01:50PM +0800, Chris Ruehl wrote:

usb: chipidea: hw_phymode_configure moved before ci_usb_phy_init
hw_phymode_configure configures the PORTSC registers and allows the
following phy_inits to operate on the right parameters. This fixes a problem
where the ULPI (ISP1504) could not be detected, because the Viewport was not
available and reads of the viewport returned only 0's.

This patch (or a later revision of it to be more exact) made it into
mainline as cd0b42c2a6d2.

On an i.MX27 based machine I'm hitting an oops (see below) on
next-20140116 + a few patches. (I didn't switch to 3.13+ yet, as I think
not everything I need has landed there.) The oops goes away (and still
better, lsusb reports my connected devices instead of "unable to
initialize libusb: -99") when I do at least one of the following:

  - set CONFIG_USB_CHIPIDEA=y instead of =m
  - revert commit cd0b42c2a6d2 (usb: chipidea: put hw_phymode_configure
    before ci_usb_phy_init)

I debugged that a bit further and the problem is that
hw_phymode_configure depends on the phy's clk being enabled (i.e.
usb_ipg_gate) and this is only enforced in ci_usb_phy_init (via
usb_phy_init -> usb_gen_phy_init). When CONFIG_USB_CHIPIDEA=y the init
call to disable all unused clocks wasn't run yet and so the clock is
still on as this is the boot default.

Hi Uwe,
I am a little puzzled at your platform

- Which phy do you use: the ulpi phy, the internal phy, or another external phy?
- If you use the ulpi phy, why do you still need the nop phy driver?
  Besides, according to Chris's patch, the ulpi can only be accessed after
  hw_phymode_configure?
- Do you have some hardware related operation in the phy's probe? If so,
  why not move it to phy->init?

Peter

Peter,
I think that's my fault. I sent Uwe my patches, which call the phy-ulpi
from the nop driver in order to get the ISP1504 running with my board.

It's obviously wrong to call another driver from the nop driver; see
[PATCH 3/3] usb: phy-generic: Add ULPI VBUS support and the concerns
from Heikki (linux-usb mailing list).

Uwe, we could work together on this.

Chris


Considering that it's already late today and that I don't know the
chipidea driver I'm sure there are people who can come up with a better
patch with less effort than me. Any volunteers?

Best regards
Uwe

--
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |






Re: [patch 5/9] mm + fs: prepare for non-page entries in page cache radix trees

2014-01-22 Thread Minchan Kim
Hi Hannes,

On Wed, Jan 22, 2014 at 12:47:44PM -0500, Johannes Weiner wrote:
> On Mon, Jan 13, 2014 at 11:01:32AM +0900, Minchan Kim wrote:
> > On Fri, Jan 10, 2014 at 01:10:39PM -0500, Johannes Weiner wrote:
> > > shmem mappings already contain exceptional entries where swap slot
> > > information is remembered.
> > > 
> > > To be able to store eviction information for regular page cache,
> > > prepare every site dealing with the radix trees directly to handle
> > > entries other than pages.
> > > 
> > > The common lookup functions will filter out non-page entries and
> > > return NULL for page cache holes, just as before.  But provide a raw
> > > version of the API which returns non-page entries as well, and switch
> > > shmem over to use it.
> > > 
> > > Signed-off-by: Johannes Weiner 
> > Reviewed-by: Minchan Kim 
> 
> Thanks, Minchan!
> 
> > > @@ -890,6 +973,73 @@ repeat:
> > >  EXPORT_SYMBOL(find_or_create_page);
> > >  
> > >  /**
> > > + * __find_get_pages - gang pagecache lookup
> > > + * @mapping: The address_space to search
> > > + * @start:   The starting page index
> > > + * @nr_pages:The maximum number of pages
> > > + * @pages:   Where the resulting pages are placed
> > 
> > where is @indices?
> 
> Fixed :)
> 
> > > @@ -894,6 +894,53 @@ EXPORT_SYMBOL(__pagevec_lru_add);
> > >  
> > >  /**
> > >   * pagevec_lookup - gang pagecache lookup
> > 
> >   __pagevec_lookup?
> > 
> > > + * @pvec:Where the resulting entries are placed
> > > + * @mapping: The address_space to search
> > > + * @start:   The starting entry index
> > > + * @nr_pages:The maximum number of entries
> > 
> >   missing @indices?
> > 
> > > + *
> > > + * pagevec_lookup() will search for and return a group of up to
> > > + * @nr_pages pages and shadow entries in the mapping.  All entries are
> > > + * placed in @pvec.  pagevec_lookup() takes a reference against actual
> > > + * pages in @pvec.
> > > + *
> > > + * The search returns a group of mapping-contiguous entries with
> > > + * ascending indexes.  There may be holes in the indices due to
> > > + * not-present entries.
> > > + *
> > > + * pagevec_lookup() returns the number of entries which were found.
> > 
> >   __pagevec_lookup
> 
> Yikes, all three fixed.
> 
> > > @@ -22,6 +22,22 @@
> > >  #include 
> > >  #include "internal.h"
> > >  
> > > +static void clear_exceptional_entry(struct address_space *mapping,
> > > + pgoff_t index, void *entry)
> > > +{
> > > + /* Handled by shmem itself */
> > > + if (shmem_mapping(mapping))
> > > + return;
> > > +
> > > + spin_lock_irq(&mapping->tree_lock);
> > > + /*
> > > +  * Regular page slots are stabilized by the page lock even
> > > +  * without the tree itself locked.  These unlocked entries
> > > +  * need verification under the tree lock.
> > > +  */
> > 
> > Could you explain why repeated spin_lock with irq disabled isn't problem
> > in truncation path?
> 
> To modify the cache tree, we have to take the IRQ-safe tree_lock, this
> is no different than removing a page (see truncate_complete_page).

I meant we could batch the irq lock/unlock, with a periodic irq release,
because clear_exceptional_entry is always called with a gang pagecache
lookup.

This is just a comment about optimization, so it shouldn't be critical for
merging; we could do it in the future if it really is a scalability problem.
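
The batching idea above could be sketched roughly as follows. This is a
userspace toy under stated assumptions, not kernel code: struct toy_lock,
toy_lock_irq(), toy_unlock_irq(), clear_entries_batched() and the BATCH
value are all hypothetical names made up for illustration. The lock only
counts acquisitions so that the effect of batching is visible.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 16	/* assumed batch size, not taken from the patch */

/* Toy stand-in for an IRQ-safe spinlock; it only counts acquisitions. */
struct toy_lock {
	unsigned long acquisitions;
	int held;
};

static void toy_lock_irq(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void toy_unlock_irq(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * Clear up to nr slots, taking the lock once per BATCH entries instead
 * of once per entry.  The lock is dropped between batches, which is the
 * "periodic irq release".  Returns the number of entries cleared.
 */
size_t clear_entries_batched(struct toy_lock *lock, void **slots, size_t nr)
{
	size_t i, cleared = 0;

	for (i = 0; i < nr; i++) {
		if (i % BATCH == 0)
			toy_lock_irq(lock);
		if (slots[i]) {
			slots[i] = NULL;
			cleared++;
		}
		if (i % BATCH == BATCH - 1 || i == nr - 1)
			toy_unlock_irq(lock);
	}
	return cleared;
}
```

With 40 entries and BATCH set to 16, the toy lock is taken 3 times rather
than 40 times.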

-- 
Kind regards,
Minchan Kim


Re: [patch 9/9] mm: keep page cache radix tree nodes in check

2014-01-22 Thread Minchan Kim
On Wed, Jan 22, 2014 at 01:42:17PM -0500, Johannes Weiner wrote:
> On Mon, Jan 13, 2014 at 04:39:47PM +0900, Minchan Kim wrote:
> > On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote:
> > > Previously, page cache radix tree nodes were freed after reclaim
> > > emptied out their page pointers.  But now reclaim stores shadow
> > > entries in their place, which are only reclaimed when the inodes
> > > themselves are reclaimed.  This is problematic for bigger files that
> > > are still in use after they have a significant amount of their cache
> > > reclaimed, without any of those pages actually refaulting.  The shadow
> > > entries will just sit there and waste memory.  In the worst case, the
> > > shadow entries will accumulate until the machine runs out of memory.
> > > 
> > > To get this under control, the VM will track radix tree nodes
> > > exclusively containing shadow entries on a per-NUMA node list.
> > > Per-NUMA rather than global because we expect the radix tree nodes
> > > themselves to be allocated node-locally and we want to reduce
> > > cross-node references of otherwise independent cache workloads.  A
> > > simple shrinker will then reclaim these nodes on memory pressure.
> > > 
> > > A few things need to be stored in the radix tree node to implement the
> > > shadow node LRU and allow tree deletions coming from the list:
> > > 
> > > 1. There is no index available that would describe the reverse path
> > >from the node up to the tree root, which is needed to perform a
> > >deletion.  To solve this, encode in each node its offset inside the
> > >parent.  This can be stored in the unused upper bits of the same
> > >member that stores the node's height at no extra space cost.
> > > 
> > > 2. The number of shadow entries needs to be counted in addition to the
> > >regular entries, to quickly detect when the node is ready to go to
> > >the shadow node LRU list.  The current entry count is an unsigned
> > >int but the maximum number of entries is 64, so a shadow counter
> > >can easily be stored in the unused upper bits.
> > > 
> > > 3. Tree modification needs tree lock and tree root, which are located
> > >in the address space, so store an address_space backpointer in the
> > >node.  The parent pointer of the node is in a union with the 2-word
> > >rcu_head, so the backpointer comes at no extra cost as well.
> > > 
> > > 4. The node needs to be linked to an LRU list, which requires a list
> > >head inside the node.  This does increase the size of the node, but
> > >it does not change the number of objects that fit into a slab page.
> > > 
> > > Signed-off-by: Johannes Weiner 
> > > ---
> > >  include/linux/list_lru.h   |   2 +
> > >  include/linux/mmzone.h |   1 +
> > >  include/linux/radix-tree.h |  32 +---
> > >  include/linux/swap.h   |   1 +
> > >  lib/radix-tree.c   |  36 --
> > >  mm/filemap.c   |  77 +++--
> > >  mm/list_lru.c  |   8 +++
> > >  mm/truncate.c  |  20 +++-
> > >  mm/vmstat.c|   1 +
> > >  mm/workingset.c| 121 
> > > +
> > >  10 files changed, 259 insertions(+), 40 deletions(-)
> > > 
> > > diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> > > index 3ce541753c88..b02fc233eadd 100644
> > > --- a/include/linux/list_lru.h
> > > +++ b/include/linux/list_lru.h
> > > @@ -13,6 +13,8 @@
> > >  /* list_lru_walk_cb has to always return one of those */
> > >  enum lru_status {
> > >   LRU_REMOVED,/* item removed from list */
> > > + LRU_REMOVED_RETRY,  /* item removed, but lock has been
> > > +dropped and reacquired */
> > >   LRU_ROTATE, /* item referenced, give another pass */
> > >   LRU_SKIP,   /* item cannot be locked, skip */
> > >   LRU_RETRY,  /* item not freeable. May drop the lock
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index 118ba9f51e86..8cac5a7ef7a7 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -144,6 +144,7 @@ enum zone_stat_item {
> > >  #endif
> > >   WORKINGSET_REFAULT,
> > >   WORKINGSET_ACTIVATE,
> > > + WORKINGSET_NODERECLAIM,
> > >   NR_ANON_TRANSPARENT_HUGEPAGES,
> > >   NR_FREE_CMA_PAGES,
> > >   NR_VM_ZONE_STAT_ITEMS };
> > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > > index 13636c40bc42..33170dbd9db4 100644
> > > --- a/include/linux/radix-tree.h
> > > +++ b/include/linux/radix-tree.h
> > > @@ -72,21 +72,37 @@ static inline int radix_tree_is_indirect_ptr(void 
> > > *ptr)
> > >  #define RADIX_TREE_TAG_LONGS \
> > >   ((RADIX_TREE_MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)
> > >  
> > > +#define RADIX_TREE_INDEX_BITS  (8 /* CHAR_BIT */ * sizeof(unsigned long))
> > > +#define RADIX_TREE_MAX_PATH 

Re: [PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range for Video Macro Blocks

2014-01-22 Thread swaminathan

Hi All,
Are there any review comments for the patch "[PATCH] [media] s5p-mfc: Add
Horizontal and Vertical search range for Video Macro Blocks", posted on
30-Dec-2013?


Regards,
Swaminathan




--
From: "Amit Grover" 
Sent: Monday, December 30, 2013 4:13 PM
Cc: "Swami Nathan"
Subject: [PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range
for Video Macro Blocks



This patch adds Controls to set Horizontal and Vertical search range
for Motion Estimation block for Samsung MFC video Encoders.

Signed-off-by: Swami Nathan 
Signed-off-by: Amit Grover 
---
Documentation/DocBook/media/v4l/controls.xml|   14 +
drivers/media/platform/s5p-mfc/s5p_mfc_common.h |2 ++
drivers/media/platform/s5p-mfc/s5p_mfc_enc.c|   24 
+++

drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c |8 ++--
drivers/media/v4l2-core/v4l2-ctrls.c|   14 +
include/uapi/linux/v4l2-controls.h  |2 ++
6 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/Documentation/DocBook/media/v4l/controls.xml 
b/Documentation/DocBook/media/v4l/controls.xml

index 7a3b49b..70a0f6f 100644
--- a/Documentation/DocBook/media/v4l/controls.xml
+++ b/Documentation/DocBook/media/v4l/controls.xml
@@ -2258,6 +2258,20 @@ Applicable to the MPEG1, MPEG2, MPEG4 
encoders.

VBV buffer control.
   

+   
+   
+ spanname="id">V4L2_CID_MPEG_VIDEO_HORZ_SEARCH_RANGE

+ integer
+   Sets the Horizontal search 
range for Video Macro blocks.

+   
+
+ 
+   
+ spanname="id">V4L2_CID_MPEG_VIDEO_VERT_SEARCH_RANGE

+ integer
+   Sets the Vertical search range 
for Video Macro blocks.

+   
+
   
   
 spanname="id">V4L2_CID_MPEG_VIDEO_H264_CPB_SIZE
diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_common.h 
b/drivers/media/platform/s5p-mfc/s5p_mfc_common.h

index 6920b54..f2c13c3 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc_common.h
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc_common.h
@@ -430,6 +430,8 @@ struct s5p_mfc_vp8_enc_params {
struct s5p_mfc_enc_params {
 u16 width;
 u16 height;
+ u32 horz_range;
+ u32 vert_range;

 u16 gop_size;
 enum v4l2_mpeg_video_multi_slice_mode slice_mode;
diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c 
b/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c

index 4ff3b6c..a02e7b8 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c
@@ -208,6 +208,24 @@ static struct mfc_control controls[] = {
 .default_value = 0,
 },
 {
+ .id = V4L2_CID_MPEG_VIDEO_HORZ_SEARCH_RANGE,
+ .type = V4L2_CTRL_TYPE_INTEGER,
+ .name = "horizontal search range of video macro block",
+ .minimum = 16,
+ .maximum = 128,
+ .step = 16,
+ .default_value = 32,
+ },
+ {
+ .id = V4L2_CID_MPEG_VIDEO_VERT_SEARCH_RANGE,
+ .type = V4L2_CTRL_TYPE_INTEGER,
+ .name = "vertical search range of video macro block",
+ .minimum = 16,
+ .maximum = 128,
+ .step = 16,
+ .default_value = 32,
+ },
+ {
 .id = V4L2_CID_MPEG_VIDEO_H264_CPB_SIZE,
 .type = V4L2_CTRL_TYPE_INTEGER,
 .minimum = 0,
@@ -1377,6 +1395,12 @@ static int s5p_mfc_enc_s_ctrl(struct v4l2_ctrl 
*ctrl)

 case V4L2_CID_MPEG_VIDEO_VBV_SIZE:
 p->vbv_size = ctrl->val;
 break;
+ case V4L2_CID_MPEG_VIDEO_HORZ_SEARCH_RANGE:
+ p->horz_range = ctrl->val;
+ break;
+ case V4L2_CID_MPEG_VIDEO_VERT_SEARCH_RANGE:
+ p->vert_range = ctrl->val;
+ break;
 case V4L2_CID_MPEG_VIDEO_H264_CPB_SIZE:
 p->codec.h264.cpb_size = ctrl->val;
 break;
diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c 
b/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c

index 461358c..47e1807 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c
@@ -727,14 +727,10 @@ static int s5p_mfc_set_enc_params(struct s5p_mfc_ctx 
*ctx)

 WRITEL(reg, S5P_FIMV_E_RC_CONFIG_V6);

 /* setting for MV range [16, 256] */
- reg = 0;
- reg &= ~(0x3FFF);
- reg = 256;
+ reg = (p->horz_range & 0x3fff); /* conditional check in app */
 WRITEL(reg, S5P_FIMV_E_MV_HOR_RANGE_V6);

- reg = 0;
- reg &= ~(0x3FFF);
- reg = 256;
+ reg = (p->vert_range & 0x3fff); /* conditional check in app */
 WRITEL(reg, S5P_FIMV_E_MV_VER_RANGE_V6);

 WRITEL(0x0, S5P_FIMV_E_FRAME_INSERTION_V6);
diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c 
b/drivers/media/v4l2-core/v4l2-ctrls.c

index fb46790..7cf23d5 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -735,6 +735,8 @@ const char *v4l2_ctrl_get_name(u32 id)
 case V4L2_CID_MPEG_VIDEO_DEC_PTS: return "Video Decoder PTS";
 case V4L2_CID_MPEG_VIDEO_DEC_FRAME: return "Video Decoder Frame Count";
 case V4L2_CID_MPEG_VIDEO_VBV_DELAY: return "Initial Delay for VBV 
Control";
+ case V4L2_CID_MPEG_VIDEO_HORZ_SEARCH_RANGE: return "hor search range of 
video MB";
+ case V4L2_CID_MPEG_VIDEO_VERT_SEARCH_RANGE: return "vert search range of 
video MB";
 case 

Re: Kconfig errors

2014-01-22 Thread Prabhakar Lad
On Wed, Jan 22, 2014 at 5:56 PM, Russell King - ARM Linux
 wrote:
> On Wed, Jan 22, 2014 at 05:54:29PM +0530, Prabhakar Lad wrote:
>> Hi Russell,
>>
>> On Fri, Jan 17, 2014 at 1:07 PM, Prabhakar Lad
>>  wrote:
>> > Hi,
>> >
>> > On Linux-next branch I see following errors for davinci_all_defconfig
>> > & da8xx_omapl_defconfig configs,
>> >
>> > arch/arm/Kconfig:1966:error: recursive dependency detected!
>> > arch/arm/Kconfig:1966:symbol ZBOOT_ROM depends on AUTO_ZRELADDR
>> > arch/arm/Kconfig:2154:symbol AUTO_ZRELADDR is selected by ZBOOT_ROM
>> > #
>> > # configuration written to .config
>> > #
>> >
>> I am seeing this errors on linux-next, with your recent patch,
>> "[PATCH] Fix select-induced Kconfig warning for ZBOOT_ROM"
>> and strangely I see that AUTO_ZRELADDR doesnt select ZBOOT_ROM
>> but still an error.
>>
>> Note: For the davinci configs CONFIG_AUTO_ZRELADDR is not set and
>> CONFIG_ZBOOT_ROM_TEXT=0x0, CONFIG_ZBOOT_ROM_BSS=0x0
>
> I've killed off the "select AUTO_ZRELADDR if !ZBOOT_ROM" in the IMX
> Kconfig now, so when linux-next picks up my tree, that should be gone.
>
Thanks that helps.

Regards,
--Prabhakar Lad


Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

2014-01-22 Thread Theodore Ts'o
On Wed, Jan 22, 2014 at 06:46:11PM -0800, David Lang wrote:
> It's extremely unlikely that drive manufacturers will produce drives
> that won't work with any existing OS, so they are going to support
> smaller writes in firmware. If they don't, they won't be able to
> sell their drives to anyone running existing software. Given the
> Enterprise software upgrade cycle compared to the expanding storage
> needs, whatever they ship will have to work on OS and firmware
> releases that happened several years ago.

I've been talking to a number of HDD vendors, and while most of the
discussions have been about SMR, the topic of 64k sectors did come up
recently.  In the opinion of at least one drive vendor, the pressure
for 64k sectors will start increasing (roughly paraphrasing that
vendor's engineer, "it's a matter of physics"), and it might not be
surprising if in 2 or 3 years we start seeing drives with 64k
sectors.  Like with 4k sector drives, it's likely that at least
initially said drives will have an emulation mode where sub-64k writes
will require a read-modify-write cycle.
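
For illustration only, the read-modify-write cycle mentioned above can be
modelled like this.  It is a userspace sketch under stated assumptions, not
any vendor's firmware: struct fake_drive, emulated_write() and the
single-sector "media" are hypothetical, and the 64k size is simply the
figure discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PHYS_SECTOR (64 * 1024)	/* assumed physical sector size */

/* Hypothetical in-memory "drive" holding a single physical sector. */
struct fake_drive {
	unsigned char media[PHYS_SECTOR];
	unsigned long reads;	/* full-sector reads performed */
	unsigned long writes;	/* full-sector writes performed */
};

/*
 * Write len bytes at byte offset off within the sector.  A full-sector
 * write goes straight to the media; anything smaller has to read the
 * whole sector, patch it, and write the whole sector back.
 * Returns 0 on success, -1 if the range does not fit in the sector.
 */
int emulated_write(struct fake_drive *d, size_t off,
		   const void *buf, size_t len)
{
	unsigned char tmp[PHYS_SECTOR];

	assert(d != NULL && buf != NULL);
	if (off > PHYS_SECTOR || len > PHYS_SECTOR - off)
		return -1;
	if (len == 0)
		return 0;

	if (len == PHYS_SECTOR) {
		memcpy(d->media, buf, len);
		d->writes++;
		return 0;
	}

	memcpy(tmp, d->media, PHYS_SECTOR);	/* read   */
	d->reads++;
	memcpy(tmp + off, buf, len);		/* modify */
	memcpy(d->media, tmp, PHYS_SECTOR);	/* write  */
	d->writes++;
	return 0;
}
```

In this model a single 4k write costs one 64k read plus one 64k write,
which is the overhead that the VM and FS layers would want to avoid by
issuing aligned, full-sector I/O.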

What I told that vendor was that if this were the case, he should
seriously consider submitting a topic proposal to the LSF/MM, since if
he wants those drives to be well supported, we need to start thinking
about what changes might be necessary at the VM and FS layers now.  So
hopefully we'll see a topic proposal from that HDD vendor in the next
couple of days.

The bottom line is that I'm pretty well convinced that like SMR
drives, 64k sector drives will be coming, and it's not something we
can duck.  It might not come as quickly as the HDD vendor community
might like --- I remember attending an IDEMA conference in 2008 where
they confidently predicted that 4k sector drives would be the default
in 2 years, and it took a wee bit longer than that.  But nevertheless,
looking at the most likely roadmap and trajectory of hard drive
technology, these are two things that will very likely be coming down
the pike, and it would be best if we start thinking about how to
engage with these changes constructively sooner rather than putting it
off and then getting caught behind the eight-ball later.

Cheers,

- Ted


Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock

2014-01-22 Thread Srivatsa S. Bhat
On 01/23/2014 07:59 AM, Rusty Russell wrote:
> "Srivatsa S. Bhat"  writes:
>> On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote:
>>> Hi Paul,
> 
> I find an old patch for register_allcpu_notifier(), but the "bool
> replay_history" should be eliminated (always true): it's too weird.
> 

Sorry, I didn't get this part. Why do you say that replay_history
will always be true?

replay_history will be set to true whenever the caller wants to
get notified of CPU_UP_PREPARE and CPU_ONLINE notifications for the
already online CPUs, or wants to run a custom setup-routine of its
own. And it will be false whenever the caller simply wants to just
register the callback.

Note that passing NULL for the setup-routine, by itself isn't enough
to make a decision. NULL + replay_history == True will invoke the normal
CPU_UP_PREPARE/CPU_ONLINE notifiers for the already online CPUs before
registering the callback. NULL + replay_history == False will just
register the callback and do nothing else.
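
A toy userspace model of the semantics described above might look like the
following.  Everything here is hypothetical: the old patch is not quoted in
this thread, so the toy_* names, the argument order and the fixed CPU map
are assumptions.  In particular, what happens when a setup routine is
passed together with replay_history == false is not spelled out above; the
sketch simply ignores the setup routine in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS		4
#define TOY_MAX_NOTIFIERS	8

enum toy_action { TOY_CPU_UP_PREPARE, TOY_CPU_ONLINE };

typedef int (*toy_notifier_fn)(enum toy_action action, int cpu);
typedef void (*toy_setup_fn)(void);

/* Assumed state: CPUs 0 and 1 are already online. */
static const bool toy_cpu_online[TOY_NR_CPUS] = { true, true, false, false };
static toy_notifier_fn toy_chain[TOY_MAX_NOTIFIERS];
static size_t toy_chain_len;

/* Example callback that just counts the events it receives. */
static int toy_events_seen;

int toy_count_events(enum toy_action action, int cpu)
{
	(void)action;
	(void)cpu;
	toy_events_seen++;
	return 0;
}

/*
 * replay_history + setup   : run the caller's custom setup routine
 * replay_history + NULL    : replay UP_PREPARE/ONLINE for online CPUs
 * !replay_history          : just register the callback
 */
int toy_register_allcpu_notifier(toy_notifier_fn cb, bool replay_history,
				 toy_setup_fn setup)
{
	int cpu;

	assert(cb != NULL);
	if (toy_chain_len == TOY_MAX_NOTIFIERS)
		return -1;

	if (replay_history) {
		if (setup) {
			setup();
		} else {
			for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
				if (!toy_cpu_online[cpu])
					continue;
				cb(TOY_CPU_UP_PREPARE, cpu);
				cb(TOY_CPU_ONLINE, cpu);
			}
		}
	}

	toy_chain[toy_chain_len++] = cb;
	return 0;
}
```

Registering toy_count_events with replay_history == false leaves the event
counter untouched, while registering it with replay_history == true and a
NULL setup routine delivers two events per online CPU before the callback
is added to the chain.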

> Then we should get rid of register_cpu_notifier, or at least hide it.
> 

Why? Isn't it easier to use (since you don't have to pass 2 additional
parameters)? I see register_allcpu_notifier (or whatever better name
we can give it), as an API for special cases where there is something
more to be done than just registering the callback. And register_cpu_notifier
will continue to be the API for the regular case when the caller wants
to just register the callback. This latter case is the majority in the
kernel. So I don't think eliminating the regular API would be a good idea.


By the way, I'm still tempted to try out the simpler-looking alternative
idea of exporting cpu_maps_update_begin() and cpu_maps_update_done()
and then mandating that the callers do:

cpu_maps_update_begin();
for_each_online_cpu(cpu) {
...
}

__register_cpu_notifier(); // this doesn't take the add_remove_lock
cpu_maps_update_done();


I'm working on a patchset that does this and performs a tree-wide
conversion. Please let me know if you have any objections to exporting
cpu_maps_update_begin/done() in this manner.

I thought I'd give this solution a try first, before going to the much
fancier register_allcpu_notifier() method.

Regards,
Srivatsa S. Bhat



Re: Is it ok for deferrable timer wakeup the idle cpu?

2014-01-22 Thread Lei Wen
On Wed, Jan 22, 2014 at 10:07 PM, Thomas Gleixner  wrote:
> On Wed, 22 Jan 2014, Lei Wen wrote:
>> Recently I want to do the experiment for cpu isolation over 3.10 kernel.
>> But I find the isolated one is periodically waken up by IPI interrupt.
>>
>> By checking the trace, I find those IPI is generated by add_timer_on,
>> which would calls wake_up_nohz_cpu, and wake up the already idle cpu.
>>
>> With further checking, I find this timer is added by on_demand governor of
>> cpufreq. It would periodically check each cores' state.
>> The problem I see here is cpufreq_governor using INIT_DEFERRABLE_WORK
>> as the tool, while timer is made as deferrable anyway.
>> And what is more that cpufreq checking is very frequent. In my case, the
>> isolated cpu is wakenup by IPI every 5ms.
>>
>> So why kernel need to wake the remote processor when mount the deferrable
>> timer? As per my understanding, we'd better keep cpu as idle when use
>> the deferrable timer.
>
> Indeed, we can avoid the wakeup of the remote cpu when the timer is
> deferrable.

Glad to hear that we could fix this unwanted wakeup.
Do you have related patches already?
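
To make the intended behaviour concrete, here is a small userspace model of
the decision being discussed.  It is only a sketch: struct toy_timer,
struct toy_cpu and toy_add_timer_on() are hypothetical, and the real
add_timer_on()/wake_up_nohz_cpu() paths involve much more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_timer {
	bool deferrable;
};

struct toy_cpu {
	bool idle;
	unsigned long ipis;	/* wakeup IPIs received */
};

/*
 * Queue a timer on a remote cpu.  An idle cpu is only kicked with an
 * IPI when the timer is not deferrable; a deferrable timer can wait
 * until the cpu wakes up for some other reason.
 * Returns true if an IPI was sent.
 */
bool toy_add_timer_on(struct toy_cpu *cpu, const struct toy_timer *timer)
{
	assert(cpu != NULL && timer != NULL);

	if (cpu->idle && !timer->deferrable) {
		cpu->ipis++;
		return true;
	}
	return false;
}
```

In this model, arming a deferrable timer on an idle cpu sends no IPI, so an
isolated cpu would stay idle.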

>
> Though you really want to figure out why the cpufreq governor is
> arming timers on other cores every 5ms. That smells like an utterly
> stupid approach.

I'm not sure why cpufreq chooses such frequent profiling of each cpu.
As I understand it, since the kernel is SMP, running the profiler on one cpu
would be enough...

Thanks,
Lei


Re: [PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range for Video Macro Blocks

2014-01-22 Thread Prabhakar Lad
Hi Swaminathan,

On Thu, Jan 23, 2014 at 10:49 AM, swaminathan  wrote:
> Hi All,
> Is there any review Comments for the patch "[PATCH] [media] s5p-mfc: Add
> Horizontal and Vertical search range for Video Macro Blocks"
> posted on 30-Dec-2013 ?
>
>
Just a side note, please don’t top post and always reply as plain text.

[Snip]

> Subject: [PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range
> for Video Macro Blocks
>
>
>> This patch adds Controls to set Horizontal and Vertical search range
>> for Motion Estimation block for Samsung MFC video Encoders.
>>
>> Signed-off-by: Swami Nathan 
>> Signed-off-by: Amit Grover 
>> ---
>> Documentation/DocBook/media/v4l/controls.xml|   14 +
>> drivers/media/platform/s5p-mfc/s5p_mfc_common.h |2 ++
>> drivers/media/platform/s5p-mfc/s5p_mfc_enc.c|   24
>> +++
>> drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c |8 ++--
>> drivers/media/v4l2-core/v4l2-ctrls.c|   14 +
>> include/uapi/linux/v4l2-controls.h  |2 ++
>> 6 files changed, 58 insertions(+), 6 deletions(-)
>>
At first glance this patch looks OK, but you need to split it into two:
the first adding the v4l control and the second using it in the driver.

Regards,
--Prabhakar Lad


Re: [PATCH v2] backlight: turn backlight on/off when necessary

2014-01-22 Thread Jingoo Han
On Wednesday, January 22, 2014 6:36 PM, Jani Nikula wrote:
> On Mon, 20 Jan 2014, Liu Ying  wrote:
> > We don't have to turn backlight on/off every time a blanking
> > or unblanking event comes because the backlight status may
> > have already been what we want. Another thought is that one
> > backlight device may be shared by multiple framebuffers. We
> > don't want blanking one of the framebuffers to turn the
> > backlight off for all the other framebuffers, since they are
> > likely to be actively displaying something. This patch adds
> > some logic to record each framebuffer's backlight usage to
> > determine the backlight device use count and whether the
> > backlight should be turned on or off. To be more specific,
> > only one unblank operation on a certain blanked framebuffer
> > may increase the backlight device's use count by one, while
> > one blank operation on a certain unblanked framebuffer may
> > decrease the use count by one, because the userspace is
> > likely to unblank an unblanked framebuffer or blank a blanked
> > framebuffer.
> >
> > Signed-off-by: Liu Ying 
> > ---
> > v1 can be found at https://lkml.org/lkml/2013/5/30/139
> >
> > v1->v2:
> > * Make the commit message be more specific about the condition
> >   in which backlight device use count can be increased/decreased.
> > * Correct the setting for bd->props.fb_blank.
> >
> >  drivers/video/backlight/backlight.c |   28 +---
> >  include/linux/backlight.h   |6 ++
> >  2 files changed, 27 insertions(+), 7 deletions(-)
> >

[.]
> 
> Anything backlight worries me a little, and there are actually three
> changes bundled into one patch here:
> 
> 1. Changing bd->props.state and bd->props.fb_blank only when use_count
>    changes from 0->1 or 1->0.
> 
> 2. Calling backlight_update_status() only with the above change, and not
>    on all notifier callbacks.
> 
> 3. Setting bd->props.fb_blank always to either FB_BLANK_UNBLANK or
>    FB_BLANK_POWERDOWN instead of *(int *)evdata->data.
> 
> The rationale in the commit message seems plausible, and AFAICT the code
> does what it says on the box, so for that (and for that alone) you can
> have my
> 
> Reviewed-by: Jani Nikula 
> 
> *BUT* it would be laborious to figure out whether this change in
> behaviour might regress some drivers. I'm just punting on that. And that
> brings us back to the three changes above - from a bisect POV it might be
> helpful to split the patch up. Up to the maintainers.

I agree with Jani Nikula's opinion.
Please split this patch into three patches as mentioned above.

Best regards,
Jingoo Han
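
For readers following the use-count rationale in the quoted commit message, here is a minimal user-space sketch of the idea. It is not the driver code from the patch: the names toy_backlight and toy_fb_event, the fixed TOY_MAX_FB limit and the updates counter are all hypothetical and exist only for illustration. It shows that a repeated unblank (or blank) of the same framebuffer is ignored and that only the 0->1 and 1->0 use-count transitions would touch the backlight.

```c
#include <stdbool.h>

#define TOY_MAX_FB 4	/* hypothetical limit on framebuffers sharing one backlight */

struct toy_backlight {
	bool fb_in_use[TOY_MAX_FB];	/* per-framebuffer record: unblanked or not */
	int use_count;			/* number of currently unblanked framebuffers */
	bool on;			/* modelled backlight power state */
	int updates;			/* how often the hardware would be touched */
};

/*
 * Handle a blank/unblank event for framebuffer fb.
 * Returns true only if the backlight state actually changed.
 */
static bool toy_fb_event(struct toy_backlight *bl, int fb, bool unblank)
{
	if (fb < 0 || fb >= TOY_MAX_FB)
		return false;

	/* Unblanking an unblanked fb (or blanking a blanked one) is a no-op. */
	if (bl->fb_in_use[fb] == unblank)
		return false;

	bl->fb_in_use[fb] = unblank;
	bl->use_count += unblank ? 1 : -1;

	/* Only the 0->1 and 1->0 transitions touch the backlight. */
	if (unblank && bl->use_count == 1) {
		bl->on = true;
		bl->updates++;
		return true;
	}
	if (!unblank && bl->use_count == 0) {
		bl->on = false;
		bl->updates++;
		return true;
	}
	return false;
}
```

In this model, blanking one of two active framebuffers leaves the backlight on, which is the shared-backlight case the commit message describes.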



[PATCH] numa, mem-hotplug: Fix stack overflow in numa when setting kernel nodes to unhotpluggable.

2014-01-22 Thread Tang Chen
Dave found that the kernel will hang during boot. This is because
the nodemask_t type stack variable numa_kernel_nodes is large enough
to overflow the stack.

This doesn't always happen. According to Dave, this happened once
in about five boots. The backtrace is like the following:

dump_stack
panic
? numa_clear_kernel_node_hotplug
__stack_chk_fail
numa_clear_kernel_node_hotplug
? memblock_search_pfn_nid
? __early_pfn_to_nid
numa_init
x86_numa_init
initmem_init
setup_arch
start_kernel

This patch fixes this problem by defining numa_kernel_nodes as a
static global variable in the __initdata area.

Reported-by: Dave Jones 
Signed-off-by: Tang Chen 
Tested-by: Gu Zheng 
---
 arch/x86/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 81b2750..ebefeb7 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -562,10 +562,10 @@ static void __init numa_init_array(void)
 	}
 }
 
+static nodemask_t numa_kernel_nodes __initdata;
 static void __init numa_clear_kernel_node_hotplug(void)
 {
 	int i, nid;
-	nodemask_t numa_kernel_nodes;
 	unsigned long start, end;
 	struct memblock_type *type = &memblock.reserved;
 
-- 
1.7.11.7



Re: [PATCH] mtd: mtd_oobtest: fix verify errors due to incorrect use of prandom_bytes_state()

2014-01-22 Thread Lothar Waßmann
Hi,

Akinobu Mita wrote:
> 2014/1/23 Lothar Waßmann :
> > Hi,
> >
> > Akinobu Mita wrote:
> >> 2014/1/22 Lothar Waßmann :
> >> > Hi,
> >> >
> >> > Is anyone taking care of this?
> >> >
> >> > Lothar Waßmann wrote:
> >> >> When using prandom_bytes_state() it is critical to use the same block
> >> >> size in all invocations that are to produce the same random sequence.
> >> >> Otherwise the state of the PRNG will be out of sync if the blocksize
> >> >> is not divisible by 4.
> >> >> This leads to bogus verification errors in several tests which use
> >> >> different block sizes to initialize the buffer for writing and
> >> >> comparison.
> >> >>
> >> >> Signed-off-by: Lothar Waßmann 
> >> >> ---
> >> >>  drivers/mtd/tests/oobtest.c |   14 --
> >> >>  1 files changed, 12 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/drivers/mtd/tests/oobtest.c b/drivers/mtd/tests/oobtest.c
> >> >> index 2e9e2d1..72c7359 100644
> >> >> --- a/drivers/mtd/tests/oobtest.c
> >> >> +++ b/drivers/mtd/tests/oobtest.c
> >> >> @@ -213,8 +213,15 @@ static int verify_eraseblock_in_one_go(int ebnum)
> >> >>   int err = 0;
> >> >>   loff_t addr = ebnum * mtd->erasesize;
> >> >>   size_t len = mtd->ecclayout->oobavail * pgcnt;
> >> >> + int i;
> >> >> +
> >> >> + for (i = 0; i < pgcnt; i++)
> >> >> + prandom_bytes_state(&rnd_state, &writebuf[i * use_len],
> >> >> + use_len);
> >> >> + if (len % use_len)
> >> >> + prandom_bytes_state(&rnd_state, &writebuf[i * use_len],
> >> >> + len % use_len);
> >> >>
> >> >> - prandom_bytes_state(&rnd_state, writebuf, len);
> >> >>   ops.mode  = MTD_OPS_AUTO_OOB;
> >> >>   ops.len   = 0;
> >> >>   ops.retlen= 0;
> >>
> >> I would rather fix the use of prandom_bytes_state() in write_eraseblock()
> >> than fix in verify_eraseblock_in_one_go().
> >>
> > Why and how?
> 
> I thought that it could reduce calls of prandom_bytes_state() and
> it makes code simpler than increasing calls.
> 
> > write_whole_device() (which calls write_eraseblock()) is used multiple
> > times with different verification methods (all blocks in one go or each
> > block individually).
> > If prandom_bytes_state() in write_eraseblock() were changed, that
> > function would have to know how the blocks are going to be checked
> > later on in order to set up the write buffer.
> 
> Instead of calling prandom_bytes_state() in the for loop in
> write_eraseblock(), call prandom_bytes_state() once before going
> into the loop and use the correct offset in writebuf in the loop.
> Although, we also need to fix verify_eraseblock() in the same way.
> 
> Doesn't that fix this problem?
>
Of course one could fix it that way, but that would be a much more
invasive change that also needs more testing.


Lothar Waßmann
-- 
___

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

www.karo-electronics.de | i...@karo-electronics.de
___
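
The block-size dependency discussed in this thread can be reproduced in user space with a toy generator. The sketch below is an assumption-laden stand-in, not the kernel's prandom_bytes_state(): toy_state, toy_next (a plain xorshift32), toy_bytes and chunked_matches_one_go are hypothetical names. What it shares with the behaviour described in the patch is that the generator advances in whole 32-bit words, so a request whose length is not a multiple of 4 throws away the unused bytes of the last word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a PRNG state; NOT the kernel's struct rnd_state. */
struct toy_state { uint32_t s; };

/* xorshift32 step: advances the state by one 32-bit word. */
static uint32_t toy_next(struct toy_state *st)
{
	uint32_t x = st->s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	st->s = x;
	return x;
}

/* Fill buf with len bytes, consuming one word per started group of 4 bytes. */
static void toy_bytes(struct toy_state *st, uint8_t *buf, size_t len)
{
	while (len > 0) {
		uint32_t w = toy_next(st);
		size_t n = len < 4 ? len : 4;

		memcpy(buf, &w, n);	/* leftover bytes of w are discarded */
		buf += n;
		len -= n;
	}
}

/* Returns 1 if filling `total` bytes chunk by chunk equals filling in one go. */
static int chunked_matches_one_go(size_t total, size_t chunk)
{
	uint8_t a[64], b[64];
	struct toy_state s1 = { 12345 }, s2 = { 12345 };
	size_t off;

	if (total > sizeof(a) || chunk == 0)
		return 0;

	toy_bytes(&s1, a, total);
	for (off = 0; off < total; off += chunk) {
		size_t n = total - off < chunk ? total - off : chunk;

		toy_bytes(&s2, b + off, n);
	}
	return memcmp(a, b, total) == 0;
}
```

With a chunk size of 8 the two buffers agree, while a chunk size of 6 desynchronises the streams after the first chunk, which is the kind of bogus verification error the patch description reports.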


Re: Is it ok for deferrable timer wakeup the idle cpu?

2014-01-22 Thread Viresh Kumar
On 23 January 2014 11:11, Lei Wen  wrote:
> On Wed, Jan 22, 2014 at 10:07 PM, Thomas Gleixner  wrote:
>> On Wed, 22 Jan 2014, Lei Wen wrote:
>>> Recently I wanted to experiment with cpu isolation on a 3.10 kernel.
>>> But I find the isolated cpu is periodically woken up by an IPI.
>>>
>>> By checking the trace, I find those IPIs are generated by add_timer_on,
>>> which calls wake_up_nohz_cpu and wakes up the already idle cpu.
>>>
>>> With further checking, I find this timer is added by the ondemand governor of
>>> cpufreq. It periodically checks each core's state.
>>> The problem I see here is that cpufreq_governor uses INIT_DEFERRABLE_WORK
>>> as the tool, while the timer is made deferrable anyway.
>>> What is more, the cpufreq checking is very frequent. In my case, the
>>> isolated cpu is woken up by an IPI every 5ms.
>>>
>>> So why does the kernel need to wake the remote processor when adding a
>>> deferrable timer? As I understand it, we'd better keep the cpu idle when
>>> using a deferrable timer.
>>
>> Indeed, we can avoid the wakeup of the remote cpu when the timer is
>> deferrable.
>
> Glad to hear that we could fix this unwanted wakeup.
> Do you have related patches already?
>
>>
>> Though you really want to figure out why the cpufreq governor is
>> arming timers on other cores every 5ms. That smells like an utterly
>> stupid approach.
>
> Not sure why cpufreq chooses such frequent profiling on each cpu.
> As I understand it, since the kernel is SMP, running the profiler on one cpu
> would be enough...


Hi Guys,

So the first question is why cpufreq needs it and is it really stupid?
Yes, it is stupid but that's how it has been implemented for a long time. It does
so to get data about the load on CPUs, so that freq can be scaled up/down.

Though there is a solution in discussion currently, which will take
inputs from the scheduler, and so these background timers would go away.
But we need to wait until that time.

Now, why do we need that for every cpu, while that for a single cpu might
be enough? The answer is cpuidle here: What if the cpu responsible for
running the timer goes to sleep? Who will evaluate the load then? And if we
make this timer run on one cpu in non-deferrable mode then that cpu
would be woken up again and again from idle. So, it was decided to have
a per-cpu deferrable timer. Though to improve efficiency, once it is fired
on any cpu, the timers for all other CPUs are rescheduled, so that they don't
fire before 5ms (sampling time).

I think the diff below might fix this for you, though I am not sure if it
breaks something else. Probably Thomas/Frederic can answer here.
If this looks fine I will send it formally again:

diff --git a/kernel/timer.c b/kernel/timer.c
index accfd24..3a2c7fa 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -940,7 +940,8 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	 * makes sure that a CPU on the way to stop its tick can not
 	 * evaluate the timer wheel.
 	 */
-	wake_up_nohz_cpu(cpu);
+	if (!tbase_get_deferrable(timer->base))
+		wake_up_nohz_cpu(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
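
To make the proposed check easier to follow outside the kernel tree, here is a small user-space sketch of the decision it encodes. It assumes, as an illustration only, that the deferrable property is stored as a flag in the low bit of the timer's base pointer; toy_base, TOY_DEFERRABLE, toy_tag, toy_get_deferrable and toy_needs_wakeup are hypothetical names and not kernel API.

```c
#include <stdint.h>

/* Hypothetical flag kept in bit 0 of the base pointer. */
#define TOY_DEFERRABLE 0x1UL

/* Stand-in for a per-cpu timer base; its alignment leaves bit 0 of the address free. */
struct toy_base { int cpu; };

/* Return the base pointer with the deferrable flag folded into bit 0. */
static struct toy_base *toy_tag(struct toy_base *base, int deferrable)
{
	return (struct toy_base *)((uintptr_t)base |
				   (deferrable ? TOY_DEFERRABLE : 0));
}

/* Extract the flag from a tagged pointer. */
static unsigned int toy_get_deferrable(const struct toy_base *tagged)
{
	return (unsigned int)((uintptr_t)tagged & TOY_DEFERRABLE);
}

/* Mirror of the check in the diff: only non-deferrable timers need the wakeup. */
static int toy_needs_wakeup(const struct toy_base *tagged)
{
	return !toy_get_deferrable(tagged);
}
```

Whether skipping the wakeup is safe for every caller of add_timer_on() is exactly the open question raised above; the sketch only models the condition, not its side effects.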


Re: [patch 9/9] mm: keep page cache radix tree nodes in check

2014-01-22 Thread Minchan Kim
On Mon, Jan 20, 2014 at 06:17:37PM -0500, Johannes Weiner wrote:
> On Fri, Jan 17, 2014 at 11:05:17AM +1100, Dave Chinner wrote:
> > On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote:
> > > Previously, page cache radix tree nodes were freed after reclaim
> > > emptied out their page pointers.  But now reclaim stores shadow
> > > entries in their place, which are only reclaimed when the inodes
> > > themselves are reclaimed.  This is problematic for bigger files that
> > > are still in use after they have a significant amount of their cache
> > > reclaimed, without any of those pages actually refaulting.  The shadow
> > > entries will just sit there and waste memory.  In the worst case, the
> > > shadow entries will accumulate until the machine runs out of memory.
> > > 
> > > To get this under control, the VM will track radix tree nodes
> > > exclusively containing shadow entries on a per-NUMA node list.
> > > Per-NUMA rather than global because we expect the radix tree nodes
> > > themselves to be allocated node-locally and we want to reduce
> > > cross-node references of otherwise independent cache workloads.  A
> > > simple shrinker will then reclaim these nodes on memory pressure.
> > > 
> > > A few things need to be stored in the radix tree node to implement the
> > > shadow node LRU and allow tree deletions coming from the list:
> > 
> > Just a couple of things with the list_lru interfaces.
> > 
> > 
> > > @@ -123,9 +129,39 @@ static void page_cache_tree_delete(struct address_space *mapping,
> > >* same time and miss a shadow entry.
> > >*/
> > >   smp_wmb();
> > > - } else
> > > - radix_tree_delete(&mapping->page_tree, page->index);
> > > + }
> > >   mapping->nrpages--;
> > > +
> > > + if (!node) {
> > > + /* Clear direct pointer tags in root node */
> > > + mapping->page_tree.gfp_mask &= __GFP_BITS_MASK;
> > > + radix_tree_replace_slot(slot, shadow);
> > > + return;
> > > + }
> > > +
> > > + /* Clear tree tags for the removed page */
> > > + index = page->index;
> > > + offset = index & RADIX_TREE_MAP_MASK;
> > > + for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
> > > + if (test_bit(offset, node->tags[tag]))
> > > + radix_tree_tag_clear(&mapping->page_tree, index, tag);
> > > + }
> > > +
> > > + /* Delete page, swap shadow entry */
> > > + radix_tree_replace_slot(slot, shadow);
> > > + node->count--;
> > > + if (shadow)
> > > + node->count += 1U << RADIX_TREE_COUNT_SHIFT;
> > > + else
> > > + if (__radix_tree_delete_node(&mapping->page_tree, node))
> > > + return;
> > > +
> > > + /* Only shadow entries in there, keep track of this node */
> > > + if (!(node->count & RADIX_TREE_COUNT_MASK) &&
> > > + list_empty(&node->private_list)) {
> > > + node->private_data = mapping;
> > > + list_lru_add(&workingset_shadow_nodes, &node->private_list);
> > > + }
> > 
> > You can't do this list_empty(&node->private_list) check safely
> > externally to the list_lru code - the only time that entry can be
> > checked safely is under the LRU list locks. This is the reason that
> > list_lru_add/list_lru_del return a boolean to indicate if the object
> > was added/removed from the list - they do this list_empty() check
> > internally. i.e. the correct, safe way to conditionally update
> > state iff the object was added to the LRU is:
> > 
> > 	if (!(node->count & RADIX_TREE_COUNT_MASK)) {
> > 		if (list_lru_add(&workingset_shadow_nodes, &node->private_list))
> > 			node->private_data = mapping;
> > 	}
> > 
> > > + radix_tree_replace_slot(slot, page);
> > > + mapping->nrpages++;
> > > + if (node) {
> > > + node->count++;
> > > + /* Installed page, can't be shadow-only anymore */
> > > + if (!list_empty(&node->private_list))
> > > + list_lru_del(&workingset_shadow_nodes,
> > > +  &node->private_list);
> > > + }
> > 
> > Same issue here:
> > 
> > if (node) {
> > node->count++;
> > list_lru_del(&workingset_shadow_nodes, &node->private_list);
> > }
> 
> All modifications to node->private_list happen under
> mapping->tree_lock, and modifications of a neighboring link should not
> affect the outcome of the list_empty(), so I don't think the lru lock
> is necessary.
> 
> It would be cleaner to take it of course, but that would mean adding
> an unconditional NUMA node-wide lock to every page cache population.
> 
> > >  static int __add_to_page_cache_locked(struct page *page,
> > > diff --git a/mm/list_lru.c b/mm/list_lru.c
> > > index 72f9decb0104..47a9faf4070b 100644
> > > --- a/mm/list_lru.c
> > > +++ b/mm/list_lru.c
> > > @@ -88,10 +88,18 @@ restart:
> > >   ret = isolate(item, &nlru->lock, cb_arg);
> > >   switch (ret) {
> > >   case LRU_REMOVED:
> > > + case LRU_REMOVED_RETRY:
> > >   if (--nlru->nr_items == 0)
> > >   node_clear(nid, lru->active_nodes);
> > > 
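
The add/del return-value idiom discussed in the quoted review can be sketched in user space as follows. This is only an illustration: toy_item, toy_lru, toy_lru_add and toy_lru_del are hypothetical names, and the lock that the real list_lru code takes around these operations is reduced to a comment. The point is that the "is it already on the list?" check lives inside the add/del helpers, and the caller reacts to the boolean result instead of testing list_empty() itself.

```c
#include <stdbool.h>

struct toy_item {
	struct toy_item *prev, *next;	/* points to itself when not on any list */
};

struct toy_lru {
	struct toy_item head;
	long nr_items;
};

static void toy_item_init(struct toy_item *it)
{
	it->prev = it->next = it;
}

static void toy_lru_init(struct toy_lru *lru)
{
	toy_item_init(&lru->head);
	lru->nr_items = 0;
}

/* Returns true only if the item was not on a list and has now been added. */
static bool toy_lru_add(struct toy_lru *lru, struct toy_item *it)
{
	/* A real implementation would hold the list lock from here on. */
	if (it->next != it)
		return false;

	it->prev = lru->head.prev;
	it->next = &lru->head;
	lru->head.prev->next = it;
	lru->head.prev = it;
	lru->nr_items++;
	return true;
}

/* Returns true only if the item was on the list and has now been removed. */
static bool toy_lru_del(struct toy_lru *lru, struct toy_item *it)
{
	/* Same locking assumption as in toy_lru_add(). */
	if (it->next == it)
		return false;

	it->prev->next = it->next;
	it->next->prev = it->prev;
	toy_item_init(it);
	lru->nr_items--;
	return true;
}
```

A caller would then write `if (toy_lru_add(&lru, &item)) { /* update state */ }`, mirroring the shape of the snippet suggested in the review.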

[patch] mm, compaction: ignore pageblock skip when manually invoking compaction

2014-01-22 Thread David Rientjes
The cached pageblock hint should be ignored when triggering compaction
through /proc/sys/vm/compact_memory so all eligible memory is isolated.  
Manually invoking compaction is known to be expensive, so there's no need to
skip pageblocks based on heuristics (mainly for debugging).

Signed-off-by: David Rientjes 
---
 mm/compaction.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1177,6 +1177,7 @@ static void compact_node(int nid)
 	struct compact_control cc = {
 		.order = -1,
 		.sync = true,
+		.ignore_skip_hint = true,
 	};
 
 	__compact_pgdat(NODE_DATA(nid), &cc);
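
As a rough illustration of what the new flag is meant to achieve, the sketch below models the decision of whether a pageblock is considered for isolation. It is a simplified user-space model, not the code in mm/compaction.c: toy_compact_control and toy_isolation_suitable are hypothetical names, and the cached per-pageblock hint is passed in as a plain boolean.

```c
#include <stdbool.h>

/* Hypothetical, reduced version of the control structure. */
struct toy_compact_control {
	bool ignore_skip_hint;	/* set to true for manually invoked compaction */
};

/*
 * pageblock_skip stands in for the cached "skip this pageblock" hint.
 * Returns true if the pageblock should be scanned.
 */
static bool toy_isolation_suitable(const struct toy_compact_control *cc,
				   bool pageblock_skip)
{
	/* Manual compaction scans everything, whatever the hint says. */
	if (cc->ignore_skip_hint)
		return true;

	return !pageblock_skip;
}
```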


Re: [Bug 67651] Bisected: Lots of fragmented mmaps cause gimp to fail in 3.12 after exceeding vm_max_map_count

2014-01-22 Thread Cyrill Gorcunov
On Wed, Jan 22, 2014 at 02:45:53PM -0800, Andy Lutomirski wrote:
> > 
> > Thus when user space application track memory changes now it can detect if
> > vma area is renewed.
> 
> Presumably some path is failing to set VM_SOFTDIRTY, thus preventing mms
> from being merged.
> 
> That being said, this could cause vma blowups for programs that are
> actually using this thing.

Hi Andy, indeed, this could happen. The easiest way is to ignore the softdirty bit
when we're trying to merge vmas and set it on the newly merged one. I think this should
be correct. Once I finish I'll send the patch.

Cyrill
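
The idea described above can be sketched as a pure flag comparison. This is a user-space illustration under stated assumptions, not the eventual patch: the TOY_VM_* values, toy_flags_mergeable and toy_merged_flags are hypothetical, and real vma merging checks much more than the flags.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define TOY_VM_READ      0x00000001UL
#define TOY_VM_WRITE     0x00000002UL
#define TOY_VM_SOFTDIRTY 0x08000000UL

/* Two areas are flag-compatible if they differ at most in the softdirty bit. */
static bool toy_flags_mergeable(unsigned long a, unsigned long b)
{
	return ((a ^ b) & ~TOY_VM_SOFTDIRTY) == 0;
}

/* The merged area is marked softdirty so that trackers see it as changed. */
static unsigned long toy_merged_flags(unsigned long a, unsigned long b)
{
	return a | b | TOY_VM_SOFTDIRTY;
}
```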


Re: [PATCH 2/2] net/neighbour: queue work on power efficient wq

2014-01-22 Thread David Miller
From: Viresh Kumar 
Date: Wed, 22 Jan 2014 12:23:33 +0530

> Workqueue used in neighbour layer have no real dependency of scheduling these on
> the cpu which scheduled them.
> 
> On a idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which the
> scheduler believes to be the most appropriate one.
> 
> This patch replaces normal workqueues with power efficient versions. This
> doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
> enabled.
> 
> Signed-off-by: Viresh Kumar 

Applied.


Re: [PATCH 1/2] net/ipv4: queue work on power efficient wq

2014-01-22 Thread David Miller
From: Viresh Kumar 
Date: Wed, 22 Jan 2014 12:23:32 +0530

> Workqueue used in ipv4 layer have no real dependency of scheduling these on the
> cpu which scheduled them.
> 
> On a idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which the
> scheduler believes to be the most appropriate one.
> 
> This patch replaces normal workqueues with power efficient versions. This
> doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
> enabled.
> 
> Signed-off-by: Viresh Kumar 

Applied.


Re: [PATCH] 6lowpan: add a license to 6lowpan_iphc module

2014-01-22 Thread David Miller
From: Yann Droneaud 
Date: Wed, 22 Jan 2014 20:25:24 +0100

> Since commit 8df8c56a5abc, 6lowpan_iphc is a module of its own.
> 
> Unfortunately, it lacks some infrastructure to behave like a
> good kernel citizen:
> 
>   kernel: 6lowpan_iphc: module license 'unspecified' taints kernel.
>   kernel: Disabling lock debugging due to kernel taint
> 
> This patch adds the basic MODULE_LICENSE(); with GPL license:
> the code was copied from net/ieee802154/6lowpan.c which is GPL
> and the module exports symbol with EXPORT_SYMBOL_GPL();.
> 
> Cc: Jukka Rissanen 
> Cc: Alexander Aring 
> Cc: Marcel Holtmann 
> Signed-off-by: Yann Droneaud 

Applied.


Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when setting kernel nodes to unhotpluggable.

2014-01-22 Thread Dave Jones
On Thu, Jan 23, 2014 at 01:49:28PM +0800, Tang Chen wrote:
 
 > This doesn't always happen. According to Dave, this happened once
 > in about five boots. The backtrace is like the following:
 > 
 > dump_stack
 > panic
 > ? numa_clear_kernel_node_hotplug
 > __stack_chk_fail
 > numa_clear_kernel_node_hotplug
 > ? memblock_search_pfn_nid
 > ? __early_pfn_to_nid
 > numa_init
 > x86_numa_init
 > initmem_init
 > setup_arch
 > start_kernel
 > 
 > This patch fixes this problem by defining numa_kernel_nodes as a
 > static global variable in the __initdata area.
 > 
 > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
 > index 81b2750..ebefeb7 100644
 > --- a/arch/x86/mm/numa.c
 > +++ b/arch/x86/mm/numa.c
 > @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
 >  }
 >  }
 >  
 > +static nodemask_t numa_kernel_nodes __initdata;
 >  static void __init numa_clear_kernel_node_hotplug(void)
 >  {
 >  int i, nid;
 > -nodemask_t numa_kernel_nodes;
 >  unsigned long start, end;
 >  struct memblock_type *type = &memblock.reserved;

I'm surprised that this worked for anyone.
By my math, nodemask_t is 1024 longs, which should fill the whole stack.

Any idea why it only broke sometimes ?

There are other on-stack nodemask_t's in the tree too, why are they safe ?

Dave
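
For anyone who wants to check the size estimate in this thread, the arithmetic can be sketched as below. It assumes that nodemask_t is a plain bitmap with one bit per possible node; the node count passed in is a hypothetical value, since the real one depends on the kernel configuration, and toy_bits_to_longs and toy_nodemask_bytes are illustrative names only.

```c
#include <limits.h>
#include <stddef.h>

/* Number of unsigned longs needed to hold nbits bits, rounded up. */
static size_t toy_bits_to_longs(size_t nbits)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	return (nbits + bits_per_long - 1) / bits_per_long;
}

/* Size in bytes of a bitmap with one bit per possible node. */
static size_t toy_nodemask_bytes(size_t max_nodes)
{
	return toy_bits_to_longs(max_nodes) * sizeof(unsigned long);
}
```

Under that assumption, 1024 possible nodes need 128 bytes, so whether such a mask alone can exhaust the stack depends on the configured node count.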



[PATCH Resend 4/8] ASoC: simple-card: Add snd_card's name parsing from DT node support

2014-01-22 Thread Xiubo Li
If the DT is used and the CPU DAI device has only one DAI, the card
name will look like this:

ALSA device list:
0: 40031000.sai-sgtl5000

This name may look a little ugly to some customers, so add support
for parsing the card name from the DT node.

Signed-off-by: Xiubo Li 
---
 sound/soc/generic/simple-card.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/sound/soc/generic/simple-card.c b/sound/soc/generic/simple-card.c
index f38e56e..546b93d 100644
--- a/sound/soc/generic/simple-card.c
+++ b/sound/soc/generic/simple-card.c
@@ -140,6 +140,9 @@ static int asoc_simple_card_parse_of(struct device_node *node,
 	char *name;
 	int ret;
 
+	/* parsing the card name from DT */
+	snd_soc_of_parse_card_name(&priv->snd_card, "simple-audio-card,name");
+
 	/* get CPU/CODEC common format via simple-audio-card,format */
 	priv->daifmt = snd_soc_of_parse_daifmt(node, "simple-audio-card,") &
 		(SND_SOC_DAIFMT_FORMAT_MASK | SND_SOC_DAIFMT_INV_MASK);
@@ -184,7 +187,8 @@ static int asoc_simple_card_parse_of(struct device_node *node,
 			GFP_KERNEL);
 	sprintf(name, "%s-%s", dai_link->cpu_dai_name,
 				dai_link->codec_dai_name);
-	priv->snd_card.name = name;
+	if (!priv->snd_card.name)
+		priv->snd_card.name = name;
 	dai_link->name = dai_link->stream_name = name;
 
 	/* simple-card assumes platform == cpu */
-- 
1.8.4
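
The naming fallback this patch introduces can be summarised with a small user-space sketch. It is only a model of the logic, assuming that a name parsed from the DT takes precedence and the generated "<cpu>-<codec>" string is used otherwise; toy_pick_card_name and its parameters are hypothetical and not part of the ASoC API.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Returns dt_name when one was parsed from the DT, otherwise builds the
 * default "<cpu_dai>-<codec_dai>" name in buf and returns buf.
 */
static const char *toy_pick_card_name(const char *dt_name,
				      const char *cpu_dai,
				      const char *codec_dai,
				      char *buf, size_t buflen)
{
	if (dt_name)
		return dt_name;

	snprintf(buf, buflen, "%s-%s", cpu_dai, codec_dai);
	return buf;
}
```

With no DT name and the DAI names from the example above, the helper yields "40031000.sai-sgtl5000"; with a DT name set, that name is returned unchanged.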



