[PATCHSET 00/24] perf tools: Add support to accumulate hist periods (v7)
Hello,

This is a new attempt to implement cumulative hist period report. This
work began from Arun's SORT_INCLUSIVE patch [1], but I completely
rewrote it from scratch.

This patchset is based on my previous patchset [2], but I think it's
almost independent, so it can be applied separately.

Please see patch 03/24. I refactored the functions that add hist
entries to use struct hist_entry_iter. While I converted all functions
carefully, it'd be better if someone could test and confirm that I
didn't mess something up, especially for the branch stack and mem stuff.

This patchset basically adds the period of a sample to every node in
the callchain. A hist_entry now has additional fields to keep the
cumulative period if the --children option is given to perf report.

I changed the option to a separate --children and added a new
"Children" column (and renamed the default "Overhead" column to
"Self"). The output will be sorted by children (cumulative) overhead
for now. The reason I changed to --children is that I still think it's
much different from the other --call-graph options. The --call-graph
option will take care of it even with the --children option.

I know that the UI should also be changed to be more flexible, as Ingo
requested, but I'd like to do this first and then move on to the next
step. I also added a new config option to enable it by default.
 * changes in v7:
  - add Tested-by tags from Arun
  - rebase onto current acme/perf/core

 * changes in v6:
  - separate struct hist_iter_ops (Jiri)
  - check iter->he before calling ->add_entry_cb (Jiri)
  - fix locking issue on perf top (Jiri)

 * changes in v5:
  - support both of --children and --call-graph (Arun)
  - refactor hist_entry_iter to share with perf top (Jiri)
  - various cleanups and fixes (Jiri)
  - add ack's from Jiri

 * changes in v4:
  - change to --children option (Ingo)
  - rebased on new annotation change (Arnaldo)
  - support perf top also
  - enable --children option by default (Ingo)

 * changes in v3:
  - change to --cumulate option
  - fix a couple of bugs (Jiri, Rodrigo)
  - rename some help functions (Arnaldo)
  - cache previous hist entries rather than just symbol and dso
  - add some preparatory cleanups
  - add report.cumulate config option

Let me show you an example:

  $ cat abc.c
  #define barrier() asm volatile("" ::: "memory")

  void a(void)
  {
  	int i;

  	for (i = 0; i < 100; i++)
  		barrier();
  }

  void b(void)
  {
  	a();
  }

  void c(void)
  {
  	b();
  }

  int main(void)
  {
  	c();
  	return 0;
  }

With this simple program I ran perf record and report:

  $ perf record -g -e cycles:u ./abc

Case 1.

  $ perf report --stdio --no-call-graph --no-children

  # Overhead  Command  Shared Object      Symbol
  # ........  .......  .................  ..............
  #
      91.50%  abc      abc                [.] a
       8.18%  abc      ld-2.17.so         [.] strlen
       0.31%  abc      [kernel.kallsyms]  [k] page_fault
       0.01%  abc      ld-2.17.so         [.] _start

Case 2. (current default behavior)

  $ perf report --stdio --call-graph --no-children

  # Overhead  Command  Shared Object      Symbol
  # ........  .......  .................  ..............
  #
      91.50%  abc      abc                [.] a
              |
              --- a
                  b
                  c
                  main
                  __libc_start_main

       8.18%  abc      ld-2.17.so         [.] strlen
              |
              --- strlen
                  _dl_sysdep_start

       0.31%  abc      [kernel.kallsyms]  [k] page_fault
              |
              --- page_fault
                  _start

       0.01%  abc      ld-2.17.so         [.] _start
              |
              --- _start

Case 3.

  $ perf report --no-call-graph --children --stdio

  #   Self  Children  Command  Shared Object      Symbol
  # ......  ........  .......  .................  .....................
  #
     0.00%    91.50%  abc      libc-2.17.so       [.] __libc_start_main
     0.00%    91.50%  abc      abc                [.] main
     0.00%    91.50%  abc      abc                [.] c
     0.00%    91.50%  abc      abc                [.] b
    91.50%    91.50%  abc      abc                [.] a
     0.00%     8.18%  abc      ld-2.17.so         [.] _dl_sysdep_start
     8.18%     8.18%  abc      ld-2.17.so         [.] strlen
     0.01%     0.33%  abc      ld-2.17.so         [.] _start
     0.31%     0.31%  abc      [kernel.kallsyms]  [k] page_fault

As you can see, the __libc_start_main -> main -> c -> b -> a callchain
shows up in the output.

Finally, it looks like below with both options enabled:

Case 4. (default behavior?)

  $ perf report --call-graph
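
The Self/Children split in Case 3 can be illustrated with a minimal sketch. This is not the perf implementation: struct entry, struct hist, hist_find_or_add() and hist_add_sample() are hypothetical names invented for the illustration, the lookup is a linear search rather than an rbtree, and recursive callchains (the same symbol appearing twice in one chain) are not deduplicated here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Hypothetical, simplified stand-in for perf's struct hist_entry. */
struct entry {
	const char *sym;
	uint64_t self;		/* period of samples taken in this symbol */
	uint64_t children;	/* self plus period of samples taken in callees */
};

/* Hypothetical histogram; the caller must zero-initialize it. */
struct hist {
	struct entry entries[MAX_ENTRIES];
	int nr;
	uint64_t total;
};

/* Return the entry for @sym, creating a zeroed one if it does not exist. */
static struct entry *hist_find_or_add(struct hist *h, const char *sym)
{
	for (int i = 0; i < h->nr; i++)
		if (strcmp(h->entries[i].sym, sym) == 0)
			return &h->entries[i];

	assert(h->nr < MAX_ENTRIES);
	struct entry *e = &h->entries[h->nr++];
	e->sym = sym;
	e->self = 0;
	e->children = 0;
	return e;
}

/*
 * Account one sample. chain[0] is the sampled function and chain[1..]
 * are its callers. Only chain[0] gets the period as "self"; every
 * node in the chain, including chain[0], gets it as "children".
 */
static void hist_add_sample(struct hist *h, const char **chain, int depth,
			    uint64_t period)
{
	for (int i = 0; i < depth; i++) {
		struct entry *e = hist_find_or_add(h, chain[i]);

		if (i == 0)
			e->self += period;
		e->children += period;
	}
	/* The total is bumped once per sample, not once per chain node. */
	h->total += period;
}
```

Feeding the chain a, b, c, main with one period leaves main with self == 0 and children equal to that period, which is the shape of the Case 3 rows. Dividing by h->total rather than by the sum of the children values keeps the percentages at or below 100%.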
[PATCH 04/21] perf hists: Accumulate hist entry stat based on the callchain
Call __hists__add_entry() for each callchain node to get an
accumulated stat for an entry. Introduce new cumulative_iter ops to
process them properly.

Tested-by: Arun Sharma
Cc: Frederic Weisbecker
Signed-off-by: Namhyung Kim
---
 tools/perf/builtin-report.c |  2 ++
 tools/perf/util/hist.c      | 87 +++++++++++++++++++++++++++++++++++
 tools/perf/util/hist.h      |  1 +
 3 files changed, 90 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index b6618ecb474a..3ed0669d7620 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -114,6 +114,8 @@ static int process_sample_event(struct perf_tool *tool,
 		iter.ops = &hist_iter_branch;
 	else if (rep->mem_mode == 1)
 		iter.ops = &hist_iter_mem;
+	else if (symbol_conf.cumulate_callchain)
+		iter.ops = &hist_iter_cumulative;
 	else
 		iter.ops = &hist_iter_normal;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 2e9dd5d4ca1d..46402fbf4c0e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -704,6 +704,85 @@ iter_finish_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
 	return hist_entry__append_callchain(he, sample);
 }
 
+static int
+iter_prepare_cumulative_entry(struct hist_entry_iter *iter __maybe_unused,
+			      struct addr_location *al __maybe_unused)
+{
+	callchain_cursor_commit(&callchain_cursor);
+	return 0;
+}
+
+static int
+iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
+				 struct addr_location *al)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+	struct hist_entry *he;
+
+	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction, true);
+	if (he == NULL)
+		return -ENOMEM;
+
+	return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_next_cumulative_entry(struct hist_entry_iter *iter,
+			   struct addr_location *al)
+{
+	struct callchain_cursor_node *node;
+
+	node = callchain_cursor_current(&callchain_cursor);
+	if (node == NULL)
+		return 0;
+
+	al->map = node->map;
+	al->sym = node->sym;
+	if (node->map)
+		al->addr = node->map->map_ip(node->map, node->ip);
+	else
+		al->addr = node->ip;
+
+	if (iter->hide_unresolved && al->sym == NULL)
+		return 0;
+
+	callchain_cursor_advance(&callchain_cursor);
+	return 1;
+}
+
+static int
+iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
+			       struct addr_location *al)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+	struct hist_entry *he;
+
+	he = __hists__add_entry(&evsel->hists, al, iter->parent, NULL, NULL,
+				sample->period, sample->weight,
+				sample->transaction, false);
+	if (he == NULL)
+		return -ENOMEM;
+
+	return hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+}
+
+static int
+iter_finish_cumulative_entry(struct hist_entry_iter *iter,
+			     struct addr_location *al __maybe_unused)
+{
+	struct perf_evsel *evsel = iter->evsel;
+	struct perf_sample *sample = iter->sample;
+
+	evsel->hists.stats.total_period += sample->period;
+	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+
+	return 0;
+}
+
 const struct hist_iter_ops hist_iter_mem = {
 	.prepare_entry 		= iter_prepare_mem_entry,
 	.add_single_entry 	= iter_add_single_mem_entry,
@@ -728,6 +807,14 @@ const struct hist_iter_ops hist_iter_normal = {
 	.finish_entry 		= iter_finish_normal_entry,
 };
 
+const struct hist_iter_ops hist_iter_cumulative = {
+	.prepare_entry 		= iter_prepare_cumulative_entry,
+	.add_single_entry 	= iter_add_single_cumulative_entry,
+	.next_entry 		= iter_next_cumulative_entry,
+	.add_next_entry 	= iter_add_next_cumulative_entry,
+	.finish_entry 		= iter_finish_cumulative_entry,
+};
+
 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
 			 struct perf_evsel *evsel, const union perf_event *event,
 			 struct perf_sample *sample, int max_stack_depth)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index d482e673ecf5..091bf81df8c3 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -120,6 +120,7 @@ struct hist_entry_iter {
 
 extern const struct hist_iter_ops hist_iter_normal;
 extern const struct hist_iter_ops hist_iter_branch;
 extern const struct hist_iter_ops hist_iter_mem;
+extern const
[PATCH 03/21] perf hists: Check if accumulated when adding a hist entry
To support callchain accumulation, @entry should be recognized if it's accumulated or not when add_hist_entry() called. The period of an accumulated entry should be added to ->stat_acc but not ->stat. Add @sample_self arg for that. Tested-by: Arun Sharma Cc: Frederic Weisbecker Signed-off-by: Namhyung Kim --- tools/perf/builtin-annotate.c | 3 ++- tools/perf/builtin-diff.c | 2 +- tools/perf/builtin-top.c | 2 +- tools/perf/tests/hists_link.c | 4 ++-- tools/perf/util/hist.c| 29 ++--- tools/perf/util/hist.h| 3 ++- 6 files changed, 26 insertions(+), 17 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 0da603b79b61..70b2d52c3b2e 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -65,7 +65,8 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel, return 0; } - he = __hists__add_entry(>hists, al, NULL, NULL, NULL, 1, 1, 0); + he = __hists__add_entry(>hists, al, NULL, NULL, NULL, 1, 1, 0, + true); if (he == NULL) return -ENOMEM; diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index a77e31246c00..93912add75b5 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -308,7 +308,7 @@ static int hists__add_entry(struct hists *hists, u64 weight, u64 transaction) { if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight, - transaction) != NULL) + transaction, true) != NULL) return 0; return -ENOMEM; } diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 76cd510d34d0..c574c291383c 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -245,7 +245,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel, pthread_mutex_lock(>hists.lock); he = __hists__add_entry(>hists, al, NULL, NULL, NULL, sample->period, sample->weight, - sample->transaction); + sample->transaction, true); pthread_mutex_unlock(>hists.lock); if (he == NULL) return NULL; diff --git a/tools/perf/tests/hists_link.c 
b/tools/perf/tests/hists_link.c index 2b6519e0e36f..e4e931ec1dbb 100644 --- a/tools/perf/tests/hists_link.c +++ b/tools/perf/tests/hists_link.c @@ -223,7 +223,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine) goto out; he = __hists__add_entry(>hists, , NULL, - NULL, NULL, 1, 1, 0); + NULL, NULL, 1, 1, 0, true); if (he == NULL) goto out; @@ -246,7 +246,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine) goto out; he = __hists__add_entry(>hists, , NULL, - NULL, NULL, 1, 1, 0); + NULL, NULL, 1, 1, 0, true); if (he == NULL) goto out; diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 45a962f40cea..2e9dd5d4ca1d 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -272,7 +272,8 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel) * histogram, sorted on item, collects periods */ -static struct hist_entry *hist_entry__new(struct hist_entry *template) +static struct hist_entry *hist_entry__new(struct hist_entry *template, + bool sample_self) { size_t callchain_size = 0; struct hist_entry *he; @@ -292,6 +293,8 @@ static struct hist_entry *hist_entry__new(struct hist_entry *template) return NULL; } memcpy(he->stat_acc, >stat, sizeof(he->stat)); + if (!sample_self) + memset(>stat, 0, sizeof(he->stat)); } if (he->ms.map) @@ -354,7 +357,8 @@ static u8 symbol__parent_filter(const struct symbol *parent) static struct hist_entry *add_hist_entry(struct hists *hists, struct hist_entry *entry, -struct addr_location *al) +struct addr_location *al, +bool sample_self) { struct rb_node **p; struct rb_node *parent = NULL; @@ -378,7 +382,8 @@ static struct hist_entry *add_hist_entry(struct hists *hists, cmp =
[PATCH 05/21] perf tools: Update cpumode for each cumulative entry
The cpumode and level in struct addr_location were set for the sample
but not updated as cumulative callchain entries were added. This led to
non-matching symbol and cpumode in the output.

Update them based on whether the map is part of the kernel or not.
This is the reverse of what thread__find_addr_map() does.

Tested-by: Arun Sharma
Cc: Frederic Weisbecker
Signed-off-by: Namhyung Kim
---
 tools/perf/util/callchain.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/callchain.h |  2 ++
 tools/perf/util/hist.c      | 13 ++-----------
 3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 8d9db454f1a9..ac658135079f 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -551,3 +551,45 @@ int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *samp
 		return 0;
 	return callchain_append(he->callchain, &callchain_cursor, sample->period);
 }
+
+int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *node,
+			bool hide_unresolved)
+{
+	al->map = node->map;
+	al->sym = node->sym;
+	if (node->map)
+		al->addr = node->map->map_ip(node->map, node->ip);
+	else
+		al->addr = node->ip;
+
+	if (al->sym == NULL) {
+		if (hide_unresolved)
+			return 0;
+		if (al->map == NULL)
+			goto out;
+	}
+
+	if (al->map->groups == &al->machine->kmaps) {
+		if (machine__is_host(al->machine)) {
+			al->cpumode = PERF_RECORD_MISC_KERNEL;
+			al->level = 'k';
+		} else {
+			al->cpumode = PERF_RECORD_MISC_GUEST_KERNEL;
+			al->level = 'g';
+		}
+	} else {
+		if (machine__is_host(al->machine)) {
+			al->cpumode = PERF_RECORD_MISC_USER;
+			al->level = '.';
+		} else if (perf_guest) {
+			al->cpumode = PERF_RECORD_MISC_GUEST_USER;
+			al->level = 'u';
+		} else {
+			al->cpumode = PERF_RECORD_MISC_HYPERVISOR;
+			al->level = 'H';
+		}
+	}
+
+out:
+	return 1;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 8ad97e9b119f..66faae21370d 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -155,6 +155,8 @@ int sample__resolve_callchain(struct perf_sample *sample, struct symbol **parent
 			      struct perf_evsel *evsel, struct addr_location *al,
 			      int max_stack);
 int hist_entry__append_callchain(struct hist_entry *he, struct perf_sample *sample);
+int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *node,
+			bool hide_unresolved);
 
 extern const char record_callchain_help[];
 #endif	/* __PERF_CALLCHAIN_H */
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 46402fbf4c0e..beb9f96e4e4f 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -739,18 +739,9 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
 	if (node == NULL)
 		return 0;
 
-	al->map = node->map;
-	al->sym = node->sym;
-	if (node->map)
-		al->addr = node->map->map_ip(node->map, node->ip);
-	else
-		al->addr = node->ip;
-
-	if (iter->hide_unresolved && al->sym == NULL)
-		return 0;
-
 	callchain_cursor_advance(&callchain_cursor);
-	return 1;
+
+	return fill_callchain_info(al, node, iter->hide_unresolved);
 }
 
 static int
-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
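
The level selection in fill_callchain_info() boils down to a small decision table. The sketch below restates only that branch structure as a pure function so the five outcomes are easy to see. callchain_level() and its three boolean parameters are hypothetical names for this illustration; in the patch the inputs come from al->map->groups, machine__is_host() and perf_guest.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the level choice made by
 * fill_callchain_info():
 *
 *   kernel map, host machine   -> 'k'
 *   kernel map, guest machine  -> 'g'
 *   user map,   host machine   -> '.'
 *   user map,   guest machine  -> 'u' if guest profiling is enabled,
 *                                 'H' (hypervisor) otherwise
 */
static char callchain_level(bool kernel_map, bool host_machine,
			    bool guest_enabled)
{
	if (kernel_map)
		return host_machine ? 'k' : 'g';
	if (host_machine)
		return '.';
	return guest_enabled ? 'u' : 'H';
}
```

Note that guest_enabled only matters in the last row; for the other four input combinations it is ignored, as in the patch.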
[PATCH 07/21] perf callchain: Add callchain_cursor_snapshot()
The callchain_cursor_snapshot() is for saving current status of the
callchain. It'll be used to accumulate callchain information for each
node.

Tested-by: Arun Sharma
Cc: Frederic Weisbecker
Signed-off-by: Namhyung Kim
---
 tools/perf/util/callchain.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 66faae21370d..bbd63dfbe112 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -159,4 +159,13 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 			bool hide_unresolved);
 
 extern const char record_callchain_help[];
+
+static inline void callchain_cursor_snapshot(struct callchain_cursor *dest,
+					     struct callchain_cursor *src)
+{
+	*dest = *src;
+
+	dest->first = src->curr;
+	dest->nr -= src->pos;
+}
 #endif	/* __PERF_CALLCHAIN_H */
-- 
1.7.11.7
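
What the snapshot does can be tried out in isolation with a sketch under stated assumptions: struct cursor and struct cursor_node are hypothetical, trimmed-down stand-ins for perf's callchain cursor types, and cursor_commit(), cursor_current() and cursor_advance() are simplified guesses at the behaviour of the corresponding callchain_cursor_*() helpers. Only the body of cursor_snapshot() is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down versions of perf's cursor types. */
struct cursor_node {
	unsigned long ip;
	struct cursor_node *next;
};

struct cursor {
	unsigned int nr;		/* nodes reachable from first */
	unsigned int pos;		/* nodes consumed so far */
	struct cursor_node *first;
	struct cursor_node *curr;
};

/* Rewind the cursor to its first node. */
static void cursor_commit(struct cursor *c)
{
	c->curr = c->first;
	c->pos = 0;
}

/* Current node, or NULL once all nr nodes have been consumed. */
static struct cursor_node *cursor_current(struct cursor *c)
{
	return c->pos == c->nr ? NULL : c->curr;
}

/* Step to the next node; call only while cursor_current() is non-NULL. */
static void cursor_advance(struct cursor *c)
{
	c->curr = c->curr->next;
	c->pos++;
}

/*
 * Same logic as callchain_cursor_snapshot() in the patch: the copy
 * starts at the source's current node and covers only the nodes that
 * have not been consumed yet. The source is left untouched.
 */
static void cursor_snapshot(struct cursor *dest, const struct cursor *src)
{
	assert(dest != NULL && src != NULL);

	*dest = *src;

	dest->first = src->curr;
	dest->nr -= src->pos;
}
```

With a five-node chain and two nodes already consumed, the snapshot sees three nodes starting at the third one. In this sketch the copy inherits the source's pos, so cursor_commit() has to be called on it before walking it.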
linux-next: manual merge of the arm-soc tree with Linus' tree
Hi all, Today's linux-next merge of the arm-soc tree got a conflict in arch/arm/boot/dts/bcm11351.dtsi between commit 67a57be85e68 ("ARM: bcm11351: Enable pinctrl for Broadcom Capri SoCs") from Linus' tree and commit 0bd898b872ac ("ARM: dts: Declare clocks as fixed on bcm11351") and several following commits from the arm-soc tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc arch/arm/boot/dts/bcm11351.dtsi index dd8e878741c0,375a2f8eb878.. --- a/arch/arm/boot/dts/bcm11351.dtsi +++ b/arch/arm/boot/dts/bcm11351.dtsi @@@ -142,8 -146,159 +146,164 @@@ status = "disabled"; }; + pinctrl@35004800 { + compatible = "brcm,capri-pinctrl"; + reg = <0x35004800 0x430>; + }; ++ + i2c@3e016000 { + compatible = "brcm,bcm11351-i2c", "brcm,kona-i2c"; + reg = <0x3e016000 0x80>; + interrupts = ; + #address-cells = <1>; + #size-cells = <0>; + clocks = <_clk>; + status = "disabled"; + }; + + i2c@3e017000 { + compatible = "brcm,bcm11351-i2c", "brcm,kona-i2c"; + reg = <0x3e017000 0x80>; + interrupts = ; + #address-cells = <1>; + #size-cells = <0>; + clocks = <_clk>; + status = "disabled"; + }; + + i2c@3e018000 { + compatible = "brcm,bcm11351-i2c", "brcm,kona-i2c"; + reg = <0x3e018000 0x80>; + interrupts = ; + #address-cells = <1>; + #size-cells = <0>; + clocks = <_clk>; + status = "disabled"; + }; + + i2c@3500d000 { + compatible = "brcm,bcm11351-i2c", "brcm,kona-i2c"; + reg = <0x3500d000 0x80>; + interrupts = ; + #address-cells = <1>; + #size-cells = <0>; + clocks = <_bsc_clk>; + status = "disabled"; + }; + + clocks { + bsc1_clk: bsc1 { + compatible = "fixed-clock"; + clock-frequency = <1300>; + #clock-cells = <0>; + }; + + bsc2_clk: bsc2 { + compatible = "fixed-clock"; + clock-frequency = <1300>; + #clock-cells = <0>; + }; + + bsc3_clk: bsc3 { + compatible = "fixed-clock"; + clock-frequency = <1300>; + #clock-cells = <0>; + }; + + pmu_bsc_clk: pmu_bsc { + compatible = "fixed-clock"; + 
clock-frequency = <1300>; + #clock-cells = <0>; + }; + + hub_timer_clk: hub_timer { + compatible = "fixed-clock"; + clock-frequency = <32768>; + #clock-cells = <0>; + }; + + pwm_clk: pwm { + compatible = "fixed-clock"; + clock-frequency = <2600>; + #clock-cells = <0>; + }; + + sdio1_clk: sdio1 { + compatible = "fixed-clock"; + clock-frequency = <4800>; + #clock-cells = <0>; + }; + + sdio2_clk: sdio2 { + compatible = "fixed-clock"; + clock-frequency = <4800>; + #clock-cells = <0>; + }; + + sdio3_clk: sdio3 { + compatible = "fixed-clock"; + clock-frequency = <4800>; + #clock-cells = <0>; + }; + + sdio4_clk: sdio4 { + compatible = "fixed-clock"; + clock-frequency = <4800>; + #clock-cells = <0>; + }; + + tmon_1m_clk: tmon_1m { + compatible = "fixed-clock"; + clock-frequency = <100>; + #clock-cells = <0>; + }; + + uartb_clk: uartb { + compatible = "fixed-clock"; + clock-frequency = <1300>; + #clock-cells = <0>; + }; + + uartb2_clk: uartb2 { + compatible = "fixed-clock"; + clock-frequency = <1300>; + #clock-cells = <0>; + }; + + uartb3_clk: uartb3 { + compatible = "fixed-clock"; + clock-frequency =
Re: randconfig build error with next-20140122, in arch/x86/kernel/devicetree.c
On Wed, Jan 22, 2014 at 12:06 PM, Randy Dunlap wrote:
> On 01/22/2014 08:34 AM, Jim Davis wrote:
>> Building with the attached random configuration file,
>>
>> warning: (X86_INTEL_MID) selects INTEL_SCU_IPC which has unmet direct
>> dependencies (X86 && X86_PLATFORM_DEVICES && X86_INTEL_MID)
>> warning: (USB_OTG_FSM && FSL_USB2_OTG && USB_MV_OTG) selects USB_OTG
>> which has unmet direct dependencies (USB_SUPPORT && USB && PM_RUNTIME)
>> warning: (X86_INTEL_MID) selects INTEL_SCU_IPC which has unmet direct
>> dependencies (X86 && X86_PLATFORM_DEVICES && X86_INTEL_MID)
>> warning: (USB_OTG_FSM && FSL_USB2_OTG && USB_MV_OTG) selects USB_OTG
>> which has unmet direct dependencies (USB_SUPPORT && USB && PM_RUNTIME)
>>
>> arch/x86/kernel/devicetree.c:67:1: warning: data definition has no
>> type or storage class [enabled by default]
>>  module_init(add_bus_probe);
>>  ^
>
> For linux-next, devicetree.c needs to #include .
> For mainline, it would have needed to #include .
> However, it does neither of those.

Thanks guys; I've already queued a fix for this.

http://git.kernel.org/cgit/linux/kernel/git/paulg/init.git/commit/?id=3d83b6b84210066f0886b0916136fa49ca61704d

Paul.
--

>
> See Documentation/SubmitChecklist #1:
>
> 1: If you use a facility then #include the file that defines/declares
>    that facility. Don't depend on other header files pulling in ones
>    that you use.
>
>
>> arch/x86/kernel/devicetree.c:67:1: error: type defaults to ‘int’ in
>> declaration of ‘module_init’ [-Werror=implicit-int]
>> arch/x86/kernel/devicetree.c:67:1: warning: parameter names (without
>> types) in function declaration [enabled by default]
>> arch/x86/kernel/devicetree.c:60:19: warning: ‘add_bus_probe’ defined
>> but not used [-Wunused-function]
>>  static int __init add_bus_probe(void)
>>                    ^
>> cc1: some warnings being treated as errors
>> make[2]: *** [arch/x86/kernel/devicetree.o] Error 1
>>
>
>
> --
> ~Randy
Re: [PATCH RFC 04/10] base: power: Add generic OF-based power domain look-up
Hi Stephen, On 23.01.2014 01:18, Stephen Boyd wrote: On 01/11, Tomasz Figa wrote: + +/** + * of_genpd_lock() - Lock access to of_genpd_providers list + */ +static void of_genpd_lock(void) +{ + mutex_lock(_genpd_mutex); +} + +/** + * of_genpd_unlock() - Unlock access to of_genpd_providers list + */ +static void of_genpd_unlock(void) +{ + mutex_unlock(_genpd_mutex); +} Why do we need these functions? Can't we just call mutex_lock/unlock directly? That would be fine as well, I guess. Just duplicated the pattern used in CCF, but can remove them in next version if it's found to be better. + +/** + * of_genpd_add_provider() - Register a domain provider for a node + * @np: Device node pointer associated with domain provider + * @genpd_src_get: callback for decoding domain + * @data: context pointer for @genpd_src_get callback. These look a little outdated. Oops, missed this. + */ +int of_genpd_add_provider(struct device_node *np, genpd_xlate_t xlate, + void *data) +{ + struct of_genpd_provider *cp; + + cp = kzalloc(sizeof(struct of_genpd_provider), GFP_KERNEL); Please use sizeof(*cp) instead. Right. + if (!cp) + return -ENOMEM; + + cp->node = of_node_get(np); + cp->data = data; + cp->xlate = xlate; + + of_genpd_lock(); + list_add(>link, _genpd_providers); + of_genpd_unlock(); + pr_debug("Added domain provider from %s\n", np->full_name); + + return 0; +} +EXPORT_SYMBOL_GPL(of_genpd_add_provider); + [...] + +/* See of_genpd_get_from_provider(). */ +static struct generic_pm_domain *__of_genpd_get_from_provider( + struct of_phandle_args *genpdspec) +{ + struct of_genpd_provider *provider; + struct generic_pm_domain *genpd = ERR_PTR(-ENOENT); Can this be -EPROBE_DEFER so that we can defer probe until a later time if the power domain provider hasn't registered yet? Yes, this could be useful. Makes me wonder why clock code (on which I based this code) doesn't have it done this way. 
+ + /* Check if we have such a provider in our array */ + list_for_each_entry(provider, _genpd_providers, link) { + if (provider->node == genpdspec->np) + genpd = provider->xlate(genpdspec, provider->data); + if (!IS_ERR(genpd)) + break; + } + + return genpd; +} + [...] +static int of_genpd_notifier_call(struct notifier_block *nb, + unsigned long event, void *data) +{ + struct device *dev = data; + int ret; + + if (!dev->of_node) + return NOTIFY_DONE; + + switch (event) { + case BUS_NOTIFY_BIND_DRIVER: + ret = of_genpd_add_to_domain(dev); + break; + + case BUS_NOTIFY_UNBOUND_DRIVER: + ret = of_genpd_del_from_domain(dev); + break; + + default: + return NOTIFY_DONE; + } + + return notifier_from_errno(ret); +} + +static struct notifier_block of_genpd_notifier_block = { + .notifier_call = of_genpd_notifier_call, +}; + +static int of_genpd_init(void) +{ + return bus_register_notifier(_bus_type, + _genpd_notifier_block); +} +core_initcall(of_genpd_init); Would it be possible to call the of_genpd_add_to_domain() and of_genpd_del_from_domain() functions directly in the driver core, similar to how the pinctrl framework has a hook in there? That way we're not relying on any initcall ordering for this. Hmm, the initcall here just registers a notifier, which needs to be done just before any driver registers. So, IMHO, current variant is safe, given an early enough initcall level is used. However, doing it the pinctrl way might still have an advantage of not relying on specific bus type, so this is worth consideration indeed. I'd like to hear Rafael's and Kevin's opinions on this (and other comments above too). Best regards, Tomasz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
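
The -EPROBE_DEFER suggestion discussed above can be sketched in userspace C. This is not the patch's code: err_ptr(), ptr_err() and is_err() are local imitations of the kernel's ERR_PTR helpers, EPROBE_DEFER is defined by hand because it is a kernel-internal errno, and struct domain, struct provider, xlate_simple() and lookup_domain() are hypothetical names. The point shown is only the default return value: when no registered provider matches the node, the caller gets "try again later" instead of "no such entry".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value, absent from userspace errno.h */
#endif

/* Userspace imitations of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Hypothetical stand-ins for generic_pm_domain and of_genpd_provider. */
struct domain {
	const char *name;
};

struct provider {
	const void *node;	/* stand-in for struct device_node * */
	struct domain *(*xlate)(const void *node, void *data);
	void *data;
	struct provider *next;
};

/* Simplest possible translation callback: @data is the domain itself. */
static struct domain *xlate_simple(const void *node, void *data)
{
	(void)node;
	return data;
}

/*
 * Walk the provider list. If no provider is registered for @node, or
 * every matching provider fails to translate it, an error pointer is
 * returned; the default is -EPROBE_DEFER so that a consumer probing
 * before its provider has registered can be retried later.
 */
static struct domain *lookup_domain(struct provider *list, const void *node)
{
	struct domain *d = err_ptr(-EPROBE_DEFER);

	for (struct provider *p = list; p != NULL; p = p->next) {
		if (p->node != node)
			continue;
		d = p->xlate(node, p->data);
		if (!is_err(d))
			break;
	}

	return d;
}
```

A caller would test the result with is_err() and treat -EPROBE_DEFER as a request to retry. Whether a missing provider should always be reported as a deferral rather than -ENOENT is the design question raised in the review; the sketch takes the deferral side only to show what it looks like.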
Re: [PATCH RFC 04/10] base: power: Add generic OF-based power domain look-up
On 01/20, Tomasz Figa wrote:
> Hi Kevin,
>
> On 14.01.2014 16:42, Kevin Hilman wrote:
> >Tomasz Figa writes:
> >
> >>This patch introduces generic code to perform power domain look-up using
> >>device tree and automatically bind devices to their power domains.
> >>Generic device tree binding is introduced to specify power domains of
> >>devices in their device tree nodes.
> >>
> >>Backwards compatibility with legacy Samsung-specific power domain
> >>bindings is provided, but for now the new code is not compiled when
> >>CONFIG_ARCH_EXYNOS is selected to avoid collision with legacy code. This
> >>will change as soon as Exynos power domain code gets converted to use
> >>the generic framework in further patch.
> >>
> >>Signed-off-by: Tomasz Figa
> >
> >I haven't read through this in detail yet, but wanted to make sure that
> >the DT representation can handle nested power domains. At least
> >SH-mobile has a hierarchy of power domains and the genpd code can handle
> >that, so wanted to make sure that the DT representation can handle it as
> >well.
>
> The representation of power domains themselves as implied by this
> patch is fully platform-specific. The only generic part is the
> #power-domain-cells property, which defines the number of cells
> needed to identify the power domain of given provider. You are free
> to have any platform-specific properties (or even generic ones,
> added on top of this patch) to let you specify the hierarchy in DT.
>

(Semi-related to this thread, but not really the patchset)

I'd like to have a way to say that this power domain is a
subdomain of another domain provided by a different power domain
provider driver. From what I can tell, the only way to reparent
domains as of today is by name or reference, and you have to make
a function call to do it (pm_genpd_add_subdomain_names() or
pm_genpd_add_subdomain()). This is annoying in the case where
the power domains are not all registered within the same driver,
because we don't know which driver comes first.

It would be great if there was a way to specify this relationship
explicitly when initializing a power domain so that the
reparenting is done automatically without requiring any explicit
function call. Perhaps DT could specify this? Or we could add
another field to the generic_power_domain struct like
parent_name?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Re: fanotify use after free.
On Wed, Jan 22, 2014 at 04:08:52PM -0800, Linus Torvalds wrote:
> On Wed, Jan 22, 2014 at 3:36 PM, Jan Kara wrote:
> >
> > But refcounting seems like an overkill for this - there is exactly one
> > fanotify_response_event structure iff it is a permission event. So
> > something like the (completely untested) attached patch should fix the
> > problem. But I agree it's a bit ugly so we might want something different.
> > I'll try to think about something better tomorrow.
>
> Ok, In the meantime, Dave, can you verify whether this hacky patch
> fixes your problem?

It actually seems worse. I see the tail end of what looks like a slab
corruption trace, and then a total lockup.
And of course none of this makes it over ttyUSB0 because it happens so
early. Grr.

	Dave
Re: [Xen-devel] [PATCH v3] xen/grant-table: Avoid m2p_override during mapping
Zoltan Kiss wrote: >The grant mapping API does m2p_override unnecessarily: only gntdev >needs it, >for blkback and future netback patches it just cause a lock contention, >as >those pages never go to userspace. Therefore this series does the >following: >- the original functions were renamed to __gnttab_[un]map_refs, with a >new > parameter m2p_override >- based on m2p_override either they follow the original behaviour, or >just set > the private flag and call set_phys_to_machine >- gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs >with > m2p_override false >- a new function gnttab_[un]map_refs_userspace provides the old >behaviour You don't say anything about the 'return ret' changed to 'return 0'. Any particular reason for that? Thanks > >v2: >- move the storing of the old mfn in page->index to gnttab_map_refs >- move the function header update to a separate patch > >v3: >- a new approach to retain old behaviour where it needed >- squash the patches into one > >Signed-off-by: Zoltan Kiss >Suggested-by: David Vrabel >--- > drivers/block/xen-blkback/blkback.c | 15 +++ > drivers/xen/gntdev.c| 13 +++--- >drivers/xen/grant-table.c | 81 >+-- > include/xen/grant_table.h |8 +++- > 4 files changed, 87 insertions(+), 30 deletions(-) > >diff --git a/drivers/block/xen-blkback/blkback.c >b/drivers/block/xen-blkback/blkback.c >index 6620b73..875025f 100644 >--- a/drivers/block/xen-blkback/blkback.c >+++ b/drivers/block/xen-blkback/blkback.c >@@ -285,8 +285,7 @@ static void free_persistent_gnts(struct xen_blkif >*blkif, struct rb_root *root, > > if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST || > !rb_next(_gnt->node)) { >- ret = gnttab_unmap_refs(unmap, NULL, pages, >- segs_to_unmap); >+ ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap); > BUG_ON(ret); > put_free_pages(blkif, pages, segs_to_unmap); > segs_to_unmap = 0; >@@ -321,8 +320,7 @@ static void unmap_purged_grants(struct work_struct >*work) > pages[segs_to_unmap] = persistent_gnt->page; > > 
if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) { >- ret = gnttab_unmap_refs(unmap, NULL, pages, >- segs_to_unmap); >+ ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap); > BUG_ON(ret); > put_free_pages(blkif, pages, segs_to_unmap); > segs_to_unmap = 0; >@@ -330,7 +328,7 @@ static void unmap_purged_grants(struct work_struct >*work) > kfree(persistent_gnt); > } > if (segs_to_unmap > 0) { >- ret = gnttab_unmap_refs(unmap, NULL, pages, segs_to_unmap); >+ ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap); > BUG_ON(ret); > put_free_pages(blkif, pages, segs_to_unmap); > } >@@ -670,15 +668,14 @@ static void xen_blkbk_unmap(struct xen_blkif >*blkif, > GNTMAP_host_map, pages[i]->handle); > pages[i]->handle = BLKBACK_INVALID_HANDLE; > if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) { >- ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, >- invcount); >+ ret = gnttab_unmap_refs(unmap, unmap_pages, invcount); > BUG_ON(ret); > put_free_pages(blkif, unmap_pages, invcount); > invcount = 0; > } > } > if (invcount) { >- ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount); >+ ret = gnttab_unmap_refs(unmap, unmap_pages, invcount); > BUG_ON(ret); > put_free_pages(blkif, unmap_pages, invcount); > } >@@ -740,7 +737,7 @@ again: > } > > if (segs_to_map) { >- ret = gnttab_map_refs(map, NULL, pages_to_gnt, segs_to_map); >+ ret = gnttab_map_refs(map, pages_to_gnt, segs_to_map); > BUG_ON(ret); > } > >diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c >index e41c79c..e652c0e 100644 >--- a/drivers/xen/gntdev.c >+++ b/drivers/xen/gntdev.c >@@ -284,8 +284,10 @@ static int map_grant_pages(struct grant_map *map) > } > > pr_debug("map %d+%d\n", map->index, map->count); >- err = gnttab_map_refs(map->map_ops, use_ptemod ? map->kmap_ops : >NULL, >- map->pages, map->count); >+ err = gnttab_map_refs_userspace(map->map_ops, >+ use_ptemod ? map->kmap_ops : NULL, >+ map->pages, >+ map->count); > if (err) > return err; > >@@
Re: MAINTAINERS tree branches [xen tip as an example]
"Luis R. Rodriguez" wrote:
>On Mon, Jan 20, 2014 at 2:38 AM, David Vrabel wrote:
>> On 17/01/14 23:02, Luis R. Rodriguez wrote:
>>> As per linux-next Next/Trees [0], and a recent January MAINTAINERS patch [1]
>>> from David one of the xen development kernel git trees to track is
>>> xen/git.git [2], this tree however gives has undefined references when doing a
>>> fresh clone [shown below], but as expected does work well when only cloning
>>> the linux-next branch [also below]. While I'm sure this is fine for
>>> folks who can do the guess work do we really want to live with trees like
>>> these on MAINTAINERS ? The MAINTAINERS file doesn't let us specify branches
>>> required, so perhaps it should -- if we want to live with these ? Curious, how
>>> many other git are there with a similar situation ?
>>
>> We don't recommend doing development work for the Xen subsystem based on
>> xen/tip.git so I think it's fine to have to checkout the specific branch
>> you are interested in.
>
>OK thanks.
>
>>> The xen project web site actually lists [3] Konrad's xen git tree [4] for
>>> development as the primary development tree, that probably should be
>>> updated now, and likely with instructions to clone only the linux-next
>>> branch ?
>>
>> I've updated the wiki to read:
>>
>> For development the recommended branch is:
>>
>> The mainline Linus linux.git tree.
>
>Is the delta of what is queued for the next release typically small?

Depends

>Otherwise someone doing development based on linux.git alone should
>have conflicts with anything on the queue, no?

Potentially. Usually the maintainer will spot where there are potential
conflicts and give you a branch to base on.

>
>> To see what's queued for the next release, the next merge window,
>> and other work in progress:
>>
>> The Xen subsystem maintainers' tip.git tree.
>
>That's the thing, you can't clone the tip.git tree today well, there
>are undefined references and git gives up, asking for the linux-next
>branch however did work.

It should work now. I made master point to 3.13.

>
> Luis

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[git pull] device mapper changes for 3.14
The following changes since commit 319e2e3f63c348a9b66db4667efa73178e18b17d:

  Linux 3.13-rc4 (2013-12-15 12:31:33 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/dm-3.14-changes

for you to fetch changes up to 5066a4df1f427faac8372d20494483bb09a4a1cd:

  dm log userspace: allow mark requests to piggyback on flush requests (2014-01-21 23:46:27 -0500)

A set of device-mapper changes for 3.14.

A lot of attention was paid to improving the thin-provisioning target's
handling of metadata operation failures and running out of space. A new
'error_if_no_space' feature was added to allow users to error IOs rather
than queue them when either the data or metadata space is exhausted.

Additional fixes/features include:
- a few fixes to properly support thin metadata device resizing
- a solution for reliably waiting for a DM device's embedded kobject to
  be released before destroying the device
- old dm-snapshot is updated to use the dm-bufio interface to take
  advantage of readahead capabilities that improve snapshot activation
- new dm-cache target tunables to control how quickly data is promoted
  to the cache (fast) device
- improved write efficiency of cluster mirror target by combining
  userspace flush and mark requests

Chuansheng Liu (1):
      dm snapshot: call destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()

Dongmao Zhang (1):
      dm log userspace: allow mark requests to piggyback on flush requests

Joe Thornber (9):
      dm thin: fix discard support to a previously shared block
      dm thin: return error from alloc_data_block if pool is not in write mode
      dm thin: factor out check_low_water_mark and use bools
      dm thin: handle metadata failures more consistently
      dm cache policy mq: introduce three promotion threshold tunables
      dm space map common: make sure new space is used during extend
      dm space map metadata: fix extending the space map
      dm btree: add dm_btree_find_lowest_key
      dm space map metadata: fix bug in resizing of thin metadata

Mike Snitzer (14):
      dm thin: initialize dm_thin_new_mapping returned by get_next_mapping
      dm space map metadata: limit errors in sm_metadata_new_block
      dm persistent data: cleanup dm-thin specific references in text
      dm thin: use bool rather than unsigned for flags in structures
      dm thin: add mappings to end of prepared_* lists
      dm thin: log info when growing the data or metadata device
      dm thin: cleanup and improve no space handling
      dm thin: requeue bios to DM core if no_free_space and in read-only mode
      dm thin: add error_if_no_space feature
      dm thin: eliminate the no_free_space flag
      dm thin: fix set_pool_mode exposed pool operation races
      dm cache: add block sizes and total cache blocks to status output
      dm thin: fix pool feature parsing
      dm cache: add policy name to status output

Mikulas Patocka (9):
      dm table: remove unused buggy code that extends the targets array
      dm delay: use per-bio data instead of a mempool and slab cache
      dm: remove pointless kobject comparison in dm_get_from_kobject
      dm: wait until embedded kobject is released before destroying a device
      dm snapshot: use GFP_KERNEL when initializing exceptions
      dm snapshot: prepare for switch to using dm-bufio
      dm snapshot: use dm-bufio
      dm snapshot: use dm-bufio prefetch
      dm sysfs: fix a module unload race

Wei Yongjun (1):
      dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD

 Documentation/device-mapper/cache-policies.txt    |  16 +-
 Documentation/device-mapper/cache.txt             |  51 ++--
 Documentation/device-mapper/thin-provisioning.txt |   7 +
 drivers/md/Kconfig                                |  11 +-
 drivers/md/Makefile                               |   1 +
 drivers/md/dm-bufio.c                             |  36 ++-
 drivers/md/dm-bufio.h                             |  12 +
 drivers/md/dm-builtin.c                           |  48
 drivers/md/dm-cache-policy-mq.c                   |  70 +++--
 drivers/md/dm-cache-policy.c                      |   4 +
 drivers/md/dm-cache-policy.h                      |   6 +
 drivers/md/dm-cache-target.c                      |  20 +-
 drivers/md/dm-delay.c                             |  35 +--
 drivers/md/dm-log-userspace-base.c                | 206 +++
 drivers/md/dm-snap-persistent.c                   |  87 +--
 drivers/md/dm-snap.c                              |  10 +-
 drivers/md/dm-sysfs.c                             |   5 +-
 drivers/md/dm-table.c                             |  22 +-
 drivers/md/dm-thin-metadata.c                     |  20 ++
 drivers/md/dm-thin-metadata.h                     |   4 +-
Re: [PATCH v2] ACPI: Fix acpi_evaluate_object() return value check
Yijing Wang wrote: >Fix acpi_evaluate_object() return value check, >shoud acpi_status not int. Should be? Your mailer also ate the word 'to' . > >Signed-off-by: Yijing Wang >--- > >v1->v2: Add CC to the related subsystem MAINTAINERS. > >--- > drivers/gpu/drm/i915/intel_acpi.c | 13 +++-- > drivers/gpu/drm/nouveau/core/subdev/mxm/base.c |6 +++--- > drivers/gpu/drm/nouveau/nouveau_acpi.c | 13 +++-- > drivers/pci/pci-label.c|6 +++--- > 4 files changed, 20 insertions(+), 18 deletions(-) > >diff --git a/drivers/gpu/drm/i915/intel_acpi.c >b/drivers/gpu/drm/i915/intel_acpi.c >index dfff090..7ea00e5 100644 >--- a/drivers/gpu/drm/i915/intel_acpi.c >+++ b/drivers/gpu/drm/i915/intel_acpi.c >@@ -35,7 +35,7 @@ static int intel_dsm(acpi_handle handle, int func) > union acpi_object params[4]; > union acpi_object *obj; > u32 result; >- int ret = 0; >+ acpi_status status; > > input.count = 4; > input.pointer = params; >@@ -50,8 +50,8 @@ static int intel_dsm(acpi_handle handle, int func) > params[3].package.count = 0; > params[3].package.elements = NULL; > >- ret = acpi_evaluate_object(handle, "_DSM", , ); >- if (ret) { >+ status = acpi_evaluate_object(handle, "_DSM", , ); >+ if (ACPI_FAILURE(status)) { > DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret); > return ret; > } >@@ -141,7 +141,8 @@ static void intel_dsm_platform_mux_info(void) > struct acpi_object_list input; > union acpi_object params[4]; > union acpi_object *pkg; >- int i, ret; >+ acpi_status status; >+ int i; > > input.count = 4; > input.pointer = params; >@@ -156,9 +157,9 @@ static void intel_dsm_platform_mux_info(void) > params[3].package.count = 0; > params[3].package.elements = NULL; > >- ret = acpi_evaluate_object(intel_dsm_priv.dhandle, "_DSM", , >+ acpi_status = acpi_evaluate_object(intel_dsm_priv.dhandle, "_DSM", >, > ); >- if (ret) { >+ if (ACPI_FAILURE(status)) { > DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret); > goto out; > } >diff --git a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c 
>b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c >index 1291204..3920943 100644 >--- a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c >+++ b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c >@@ -114,14 +114,14 @@ mxm_shadow_dsm(struct nouveau_mxm *mxm, u8 >version) > struct acpi_buffer retn = { ACPI_ALLOCATE_BUFFER, NULL }; > union acpi_object *obj; > acpi_handle handle; >- int ret; >+ acpi_status status; > > handle = ACPI_HANDLE(>pdev->dev); > if (!handle) > return false; > >- ret = acpi_evaluate_object(handle, "_DSM", , ); >- if (ret) { >+ status = acpi_evaluate_object(handle, "_DSM", , ); >+ if (ACPI_FAILURE(status)) { > nv_debug(mxm, "DSM MXMS failed: %d\n", ret); > return false; > } >diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c >b/drivers/gpu/drm/nouveau/nouveau_acpi.c >index ba0183f..6f810f2 100644 >--- a/drivers/gpu/drm/nouveau/nouveau_acpi.c >+++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c >@@ -82,7 +82,8 @@ static int nouveau_optimus_dsm(acpi_handle handle, >int func, int arg, uint32_t * > struct acpi_object_list input; > union acpi_object params[4]; > union acpi_object *obj; >- int i, err; >+ acpi_status status; >+ int i; > char args_buff[4]; > > input.count = 4; >@@ -101,8 +102,8 @@ static int nouveau_optimus_dsm(acpi_handle handle, >int func, int arg, uint32_t * > args_buff[i] = (arg >> i * 8) & 0xFF; > params[3].buffer.pointer = args_buff; > >- err = acpi_evaluate_object(handle, "_DSM", , ); >- if (err) { >+ status = acpi_evaluate_object(handle, "_DSM", , ); >+ if (ACPI_FAILURE(status)) { > printk(KERN_INFO "failed to evaluate _DSM: %d\n", err); > return err; > } >@@ -134,7 +135,7 @@ static int nouveau_dsm(acpi_handle handle, int >func, int arg, uint32_t *result) > struct acpi_object_list input; > union acpi_object params[4]; > union acpi_object *obj; >- int err; >+ acpi_status status; > > input.count = 4; > input.pointer = params; >@@ -148,8 +149,8 @@ static int nouveau_dsm(acpi_handle handle, int >func, int arg, uint32_t *result) > params[3].type 
= ACPI_TYPE_INTEGER; > params[3].integer.value = arg; > >- err = acpi_evaluate_object(handle, "_DSM", , ); >- if (err) { >+ status = acpi_evaluate_object(handle, "_DSM", , ); >+ if (ACPI_FAILURE(status)) { > printk(KERN_INFO "failed to evaluate _DSM: %d\n", err); > return err; > } >diff --git a/drivers/pci/pci-label.c
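The idea behind the quoted change, keeping the ACPI status in its own type and testing it with a failure macro rather than treating it as an int error code, can be sketched outside the kernel as below. Everything prefixed sk_/SK_ is a local stand-in invented for this illustration (including the two status values), and returning -ENODEV on failure is an assumption, not something the patch does. In the hunks as quoted, the failure paths appear to still print and return the old ret/err variables, so some translation of this kind would seem to be needed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds outside the kernel. */
typedef uint32_t sk_acpi_status;
#define SK_AE_OK		((sk_acpi_status)0x0000)
#define SK_AE_NOT_FOUND		((sk_acpi_status)0x0005)
#define SK_ACPI_FAILURE(s)	((s) != SK_AE_OK)

/* Hypothetical evaluator: reports failure when no method name is given. */
static sk_acpi_status sk_evaluate(const char *method)
{
	return method != NULL ? SK_AE_OK : SK_AE_NOT_FOUND;
}

/* The status keeps its own type; the int return value is an errno. */
static int sk_dsm(const char *method)
{
	sk_acpi_status status = sk_evaluate(method);

	if (SK_ACPI_FAILURE(status))
		return -ENODEV;	/* translate rather than return the raw status */
	return 0;
}

/* Usage: success maps to 0, any failing status maps to one errno. */
static void sk_dsm_demo(void)
{
	assert(sk_dsm("_DSM") == 0);
	assert(sk_dsm(NULL) == -ENODEV);
}
```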
Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start
Mukesh Rathor wrote: >pvh was designed to start with pv flags, but a commit in xen tree >51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags as >they are not necessary. As a result, these CR flags must be set in the >guest. > >Signed-off-by: Roger Pau Monne >Signed-off-by: Mukesh Rathor >--- >arch/x86/xen/enlighten.c | 43 >+-- > arch/x86/xen/smp.c |2 +- > arch/x86/xen/xen-ops.h |2 +- > 3 files changed, 39 insertions(+), 8 deletions(-) > >diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >index 628099a..4a2aaa6 100644 >--- a/arch/x86/xen/enlighten.c >+++ b/arch/x86/xen/enlighten.c >@@ -1410,12 +1410,8 @@ static void __init >xen_boot_params_init_edd(void) > * Set up the GDT and segment registers for -fstack-protector. Until > * we do this, we have to be careful not to call any stack-protected > * function, which is most of the kernel. >- * >- * Note, that it is refok - because the only caller of this after init >- * is PVH which is not going to use xen_load_gdt_boot or other >- * __init functions. > */ >-void __ref xen_setup_gdt(int cpu) >+static void xen_setup_gdt(int cpu) > { > if (xen_feature(XENFEAT_auto_translated_physmap)) { > #ifdef CONFIG_X86_64 >@@ -1463,13 +1459,48 @@ void __ref xen_setup_gdt(int cpu) > pv_cpu_ops.load_gdt = xen_load_gdt; > } > >+/* >+ * A pv guest starts with default flags that are not set for pvh, set >them >+ * here asap. >+ */ >+static void xen_pvh_set_cr_flags(int cpu) >+{ >+ write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_WP | X86_CR0_AM); I think it would be good to mention that Xen unconditionally sets PE and ET for HVM guests and that additionally for PVH the PG is set. What about the NE? That looks to be missing from the list above? Should we set it? >+ >+ if (!cpu) >+ return; >+ /* >+ * Unlike PV, for pvh xen does not set: PSE PGE OSFXSR OSXMMEXCPT >+ * For BSP, PSE PGE will be set in probe_page_size_mask(), for AP >+ * set them here. 
For all, OSFXSR OSXMMEXCPT will be set in fpu_init >+ */ >+ if (cpu_has_pse) >+ set_in_cr4(X86_CR4_PSE); >+ >+ if (cpu_has_pge) >+ set_in_cr4(X86_CR4_PGE); >+} >+ >+/* >+ * Note, that it is refok - because the only caller of this after init >+ * is PVH which is not going to use xen_load_gdt_boot or other >+ * __init functions. >+ */ >+void __ref xen_pvh_secondary_vcpu_init(int cpu) >+{ >+ xen_setup_gdt(cpu); >+ xen_pvh_set_cr_flags(cpu); >+} >+ > static void __init xen_pvh_early_guest_init(void) > { > if (!xen_feature(XENFEAT_auto_translated_physmap)) > return; > >- if (xen_feature(XENFEAT_hvm_callback_vector)) >+ if (xen_feature(XENFEAT_hvm_callback_vector)) { > xen_have_vector_callback = 1; >+ xen_pvh_set_cr_flags(0); >+ } > > #ifdef CONFIG_X86_32 > BUG(); /* PVH: Implement proper support. */ >diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c >index 5e46190..a18eadd 100644 >--- a/arch/x86/xen/smp.c >+++ b/arch/x86/xen/smp.c >@@ -105,7 +105,7 @@ static void cpu_bringup_and_idle(int cpu) > #ifdef CONFIG_X86_64 > if (xen_feature(XENFEAT_auto_translated_physmap) && > xen_feature(XENFEAT_supervisor_mode_kernel)) >- xen_setup_gdt(cpu); >+ xen_pvh_secondary_vcpu_init(cpu); > #endif > cpu_bringup(); > cpu_startup_entry(CPUHP_ONLINE); >diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h >index 9059c24..1cb6f4c 100644 >--- a/arch/x86/xen/xen-ops.h >+++ b/arch/x86/xen/xen-ops.h >@@ -123,5 +123,5 @@ __visible void xen_adjust_exception_frame(void); > > extern int xen_panic_handler_init(void); > >-void xen_setup_gdt(int cpu); >+void xen_pvh_secondary_vcpu_init(int cpu); > #endif /* XEN_OPS_H */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
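To make the CR0 discussion above concrete, here is a small stand-alone sketch of the bit arithmetic. The bit positions are the architectural x86 CR0 positions; the SK_* names and both helper functions are made up for this illustration, and the starting value (PE, ET and PG set) is taken from the review comment above rather than from any Xen code.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 bit positions as defined by the x86 architecture. */
#define SK_CR0_PE (1u << 0)	/* protection enable */
#define SK_CR0_MP (1u << 1)	/* monitor coprocessor */
#define SK_CR0_ET (1u << 4)	/* extension type */
#define SK_CR0_NE (1u << 5)	/* numeric error */
#define SK_CR0_WP (1u << 16)	/* write protect */
#define SK_CR0_AM (1u << 18)	/* alignment mask */
#define SK_CR0_PG (1u << 31)	/* paging */

/* The three flags the quoted patch ORs into CR0. */
static uint32_t sk_pvh_cr0(uint32_t cr0)
{
	return cr0 | SK_CR0_MP | SK_CR0_WP | SK_CR0_AM;
}

/* Variant that also sets NE, which the review asks about. */
static uint32_t sk_pvh_cr0_with_ne(uint32_t cr0)
{
	return sk_pvh_cr0(cr0) | SK_CR0_NE;
}

/* Usage: start from PE|ET|PG, as the review says Xen provides for PVH. */
static void sk_cr0_demo(void)
{
	uint32_t start = SK_CR0_PE | SK_CR0_ET | SK_CR0_PG;

	assert(start == 0x80000011u);
	assert(sk_pvh_cr0(start) == 0x80050013u);
	assert(sk_pvh_cr0_with_ne(start) == 0x80050033u);
}
```

The only difference between the two results is bit 5, which is the NE question raised in the reply.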
Re: MAINTAINERS tree branches [xen tip as an example]
"Luis R. Rodriguez" wrote:
>As per linux-next Next/Trees [0], and a recent January MAINTAINERS patch [1]
>from David one of the xen development kernel git trees to track is
>xen/git.git [2], this tree however gives has undefined references when doing a
>fresh clone [shown below], but as expected does work well when only cloning
>the linux-next branch [also below]. While I'm sure this is fine for
>folks who can do the guess work do we really want to live with trees like
>these on MAINTAINERS ? The MAINTAINERS file doesn't let us specify branches
>required, so perhaps it should -- if we want to live with these ?

The master branch can be linked to the #linux-next or stable/for-linus.
That would solve the problem I think.

>Curious, how
>many other git are there with a similar situation ?
>
>The xen project web site actually lists [3] Konrad's xen git tree [4] for
>development as the primary development tree, that probably should be
>updated now, and likely with instructions to clone only the linux-next
>branch ?

Thank you for reporting. Will fix it next week if nobody else beats me to it.

>
>[0] https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/Next/Trees#n176
>[1] http://lists.xen.org/archives/html/xen-devel/2014-01/msg01504.html
>[2] git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git
>[3] http://wiki.xenproject.org/wiki/Xen_Repositories#Primary_Xen_Repository
>[4] git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
>
>mcgrof@bubbles ~ $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git --reference linux/.git
>Cloning into 'tip'...
>remote: Counting objects: 2806, done.
>remote: Compressing objects: 100% (334/334), done.
>remote: Total 1797 (delta 1511), reused 1646 (delta 1462)
>Receiving objects: 100% (1797/1797), 711.01 KiB | 640.00 KiB/s, done.
>Resolving deltas: 100% (1511/1511), completed with 306 local objects.
>Checking connectivity... done.
>warning: remote HEAD refers to nonexistent ref, unable to checkout.
>
>mcgrof@work ~ $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git -b linux-next --reference linux/.git
>Cloning into 'tip'...
>remote: Counting objects: 2806, done.
>remote: Compressing objects: 100% (377/377), done.
>remote: Total 1797 (delta 1545), reused 1607 (delta 1419)
>Receiving objects: 100% (1797/1797), 485.23 KiB | 0 bytes/s, done.
>Resolving deltas: 100% (1545/1545), completed with 327 local objects.
>Checking connectivity... done.
>Checking out files: 100% (44979/44979), done.
>
> Luis
Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start
Mukesh Rathor wrote: >pvh was designed to start with pv flags, but a commit in xen tree >51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags as "Name of the patch in the Xen tree" >they are not necessary. As a result, these CR flags must be set in the >guest. > >Signed-off-by: Roger Pau Monne You missed modifying the patch to reflect the authorship to be Roger's. Please use git commit --amend --author "somebody s name " Also Roger should be credited with Reported-by. I can add that. >Signed-off-by: Mukesh Rathor >--- >arch/x86/xen/enlighten.c | 43 >+-- > arch/x86/xen/smp.c |2 +- > arch/x86/xen/xen-ops.h |2 +- > 3 files changed, 39 insertions(+), 8 deletions(-) > >diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c >index 628099a..4a2aaa6 100644 >--- a/arch/x86/xen/enlighten.c >+++ b/arch/x86/xen/enlighten.c >@@ -1410,12 +1410,8 @@ static void __init >xen_boot_params_init_edd(void) > * Set up the GDT and segment registers for -fstack-protector. Until > * we do this, we have to be careful not to call any stack-protected > * function, which is most of the kernel. >- * >- * Note, that it is refok - because the only caller of this after init >- * is PVH which is not going to use xen_load_gdt_boot or other >- * __init functions. > */ >-void __ref xen_setup_gdt(int cpu) >+static void xen_setup_gdt(int cpu) > if (xen_feature(XENFEAT_auto_translated_physmap)) { > #ifdef CONFIG_X86_64 >@@ -1463,13 +1459,48 @@ void __ref xen_setup_gdt(int cpu) > pv_cpu_ops.load_gdt = xen_load_gdt; > } > >+/* >+ * A pv guest starts with default flags that are not set for pvh, set >them >+ * here asap. >+ */ >+static void xen_pvh_set_cr_flags(int cpu) >+{ Pls add: /* See 'secondary_startup_64' for how bare metal does it. 
*/ >+ write_cr0(read_cr0() | X86_CR0_MP | X86_CR0_WP | X86_CR0_AM); >+ >+ if (!cpu) >+ return; >+ /* >+ * Unlike PV, for pvh xen does not set: PSE PGE OSFXSR OSXMMEXCPT >+ * For BSP, PSE PGE will be set in probe_page_size_mask(), for AP >+ * set them here. For all, OSFXSR Might want to mention that for AP on bare metal they are set in 'secondary_start_64' ... >+ */ Is it OK to set this twice? Meaning remove the 'if (!cpu)..' check so that this code path is run for BSP and AP? >+ if (cpu_has_pse) >+ set_in_cr4(X86_CR4_PSE); >+ >+ if (cpu_has_pge) >+ set_in_cr4(X86_CR4_PGE); >+} >+ >+/* >+ * Note, that it is refok - because the only caller of this after init >+ * is PVH which is not going to use xen_load_gdt_boot or other >+ * __init functions. Hmm. You must be using and older tree. The new one has __ref comment. >+ */ >+void __ref xen_pvh_secondary_vcpu_init(int cpu) >+{ >+ xen_setup_gdt(cpu); >+ xen_pvh_set_cr_flags(cpu); >+} >+ > static void __init xen_pvh_early_guest_init(void) > { > if (!xen_feature(XENFEAT_auto_translated_physmap)) > return; > >- if (xen_feature(XENFEAT_hvm_callback_vector)) >+ if (xen_feature(XENFEAT_hvm_callback_vector)) > xen_have_vector_callback = 1; >+ xen_pvh_set_cr_flags(0); >+ } > > #ifdef CONFIG_X86_32 > BUG(); /* PVH: Implement proper support. 
*/ >diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c >index 5e46190..a18eadd 100644 >--- a/arch/x86/xen/smp.c >+++ b/arch/x86/xen/smp.c >@@ -105,7 +105,7 @@ static void cpu_bringup_and_idle(int cpu) > #ifdef CONFIG_X86_64 > if (xen_feature(XENFEAT_auto_translated_physmap) && > xen_feature(XENFEAT_supervisor_mode_kernel)) >- xen_setup_gdt(cpu); >+ xen_pvh_secondary_vcpu_init(cpu); > #endif > cpu_bringup(); > cpu_startup_entry(CPUHP_ONLINE); >diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h >index 9059c24..1cb6f4c 100644 >--- a/arch/x86/xen/xen-ops.h >+++ b/arch/x86/xen/xen-ops.h >@@ -123,5 +123,5 @@ __visible void xen_adjust_exception_frame(void); > > extern int xen_panic_handler_init(void); > >-void xen_setup_gdt(int cpu); >+void xen_pvh_secondary_vcpu_init(int cpu); > #endif /* XEN_OPS_H */ Thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include
[Re: [PATCH RFC 00/73] tree-wide: clean up some no longer required #include ] On 22/01/2014 (Wed 18:00) Stephen Rothwell wrote:

> Hi Paul,
>
> On Tue, 21 Jan 2014 16:22:03 -0500 Paul Gortmaker wrote:
> >
> > Where: This work exists as a queue of patches that I apply to
> > linux-next; since the changes are fixing some things that currently
> > can only be found there. The patch series can be found at:
> >
> >http://git.kernel.org/cgit/linux/kernel/git/paulg/init.git
> >git://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
> >
> > I've avoided annoying Stephen with another queue of patches for
> > linux-next while the development content was in flux, but now that
> > the merge window has opened, and new additions are fewer, perhaps he
> > wouldn't mind tacking it on the end... Stephen?
>
> OK, I have added this to the end of linux-next today - we will see how we
> go. It is called "init".

Thanks, it was a great help as it uncovered a few issues in fringe arch
that I didn't have toolchains for, and I've fixed all of those up.

I've noticed that powerpc has been un-buildable for a while now; I have
used this hack patch locally so I could run the ppc defconfigs to check
that I didn't break anything. Maybe useful for linux-next in the
interim? It is a hack patch -- Not-Signed-off-by: Paul Gortmaker. :)

Paul.
--

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c89a71..d0f070a2b395 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
-					 spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+					 struct spinlock *old_pmd_ptl)
 {
 	/*
 	 * Archs like ppc64 use pgtable to store per pmd
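The hack patch above swaps a typedef for a plain forward declaration of the struct. The message does not say why that fixes the build; a plausible reading is that a header which only passes pointers around needs nothing more than an incomplete type, and that repeating a forward declaration is always legal whereas repeating a typedef is not accepted by every C dialect. The sketch below illustrates that pattern only. All sk_* names are hypothetical, and the function body is illustrative, not the ppc64 implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Forward declaration: an incomplete type is enough for pointer
 * parameters, and it may be repeated in any number of headers.
 */
struct sk_spinlock;
struct sk_spinlock;	/* a second declaration is harmless */

static inline int sk_move_must_withdraw(struct sk_spinlock *new_ptl,
					struct sk_spinlock *old_ptl)
{
	/* Only pointer identity is used; the layout is never needed. */
	return new_ptl != old_ptl;
}

/* The full definition and the typedef can live elsewhere, later on. */
struct sk_spinlock { int locked; };
typedef struct sk_spinlock sk_spinlock_t;

/* Usage: callers holding the typedef'd type can still call the helper. */
static void sk_fwd_decl_demo(void)
{
	sk_spinlock_t a = { 0 }, b = { 0 };

	assert(sk_move_must_withdraw(&a, &b) == 1);
	assert(sk_move_must_withdraw(&a, &a) == 0);
}
```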
Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start
Mukesh Rathor wrote:
>Konrad,
>
>The following patch sets the bits in CR0 and CR4. Please note, I'm working
>on patch for the xen side. The CR4 features are not currently exported
>to a PVH guest.

The patch should really have been split in two - one for CR0 and one
for CR4. Especially as the ramifications of enabling PGE are much more
complex. For example - there is a need to fix up the __supported_pte_mask
to allow one to use PAGE_GLOBAL. There might be other things too that
need tweaking.

>
>Roger, I added your SOB line, please lmk if I need to add anything else.
>
>This patch was build on top of a71accb67e7645c68061cec2bee6067205e439fc in
>konrad devel/pvh.v13 branch.

Pls use #linux-next at this stage. Thank you!

>
>thanks
>Mukesh
Re: [PATCH] xen-blkfront: remove type check from blkfront_setup_discard
Boris Ostrovsky wrote:
>On 01/13/2014 04:30 AM, Olaf Hering wrote:
>> On Fri, Jan 10, Boris Ostrovsky wrote:
>>
>>> I don't know discard code works but it seems to me that if you pass, for
>>> example, zero as discard_granularity (which may happen if xenbus_gather()
>>> fails) then blkdev_issue_discard() in the backend will set granularity to 1
>>> and continue with discard. This may not be what the the guest admin
>>> requested. And he won't know about this since no error message is printed
>>> anywhere.
>> If I understand the code using granularity/alignment correctly, both are
>> optional properties. So if the granularity is just 1 it means byte
>> ranges, which is fine if the backend uses FALLOC_FL_PUNCH_HOLE. Also
>> both properties are not admin controlled, for phy the blkbk drivers just
>> passes on what it gets from the underlying hardware.
>>
>>> Similarly, if xenbug_gather("discard-secure") fails, I think the code will
>>> assume that secure discard has not been requested. I don't know what
>>> security implications this will have but it sounds bad to me.
>> There are no security implications, if the backend does not advertise it
>> then its not present.
>
>Right. But my questions was what if the backend does advertise it and
>wants the frontent to use it but xenbus_gather() in the frontend fails.
>
>Do we want to silently continue without discard-secure? Is this safe?
>

Yes

>
>-boris
>
>>
>> After poking around some more it seems that blkif.h is the spec, it does
>> not say anything that the three properties are optional. Also the
>> backend drivers in sles11sp2 and mainline create all three properties
>> unconditionally. So I think a better change is to expect all three
>> properties in the frontend. I will send another version of the patch.
>>
>>
>> Olaf
Re: [Xen-devel] [PATCH v2] xen-blkfront: remove type check from blkfront_setup_discard
Jan Beulich wrote:
On 13.01.14 at 14:45, David Vrabel wrote:
>> On 13/01/14 13:16, Jan Beulich wrote:
>> On 13.01.14 at 14:00, Ian Campbell wrote:
On Mon, 2014-01-13 at 12:34 +, Jan Beulich wrote:
On 13.01.14 at 13:01, Olaf Hering wrote:
>> On Mon, Jan 13, Jan Beulich wrote:
>>
>>> You can't do this in one go - the first two and the last one may be
>>> set independently (and are independent in their meaning), and
>>> hence need to be queried independently (xenbus_gather() fails
>>> on the first absent value).
>>
>> Yes, thats the purpose. Since the properties are required its an all or
>> nothing thing. If they are truly optional then blkif.h should be updated
>> to say that.
>
> They _are_ optional.
But is it true that either they are all present or they are all absent?
>>>
>>> No, it's not. discard-secure is independent of the other two (but
>>> those other two are tied together).
>>
>> Can we have a patch to blkif.h that clarifies this?
>>
>> e.g.,
>>
>> feature-discard
>>
>>...
>>
>>discard-granularity and discard-offset must also be present if
>>feature-discard is enabled
>
>It would be "may" here too afaict. But I'll defer to Konrad, who
>has done more work in this area...
>
>Jan
>
>>discard-secure may also be present if feature-discard is enabled.
>>
>> David

It is all 'may'. If there is just 'feature-discard' without any other
options that is OK.
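The point made in the thread above, that the optional discard properties have to be queried independently because an all-at-once gather fails on the first absent value, can be sketched with a toy key/value store. This is not the xenbus API: struct sk_kv, sk_read_uint() and sk_setup_discard() are hypothetical, and the key spellings used here ("discard-granularity", "discard-alignment", "discard-secure") are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the backend's advertised properties. */
struct sk_kv { const char *key; unsigned int val; };

struct sk_discard {
	unsigned int granularity;
	unsigned int alignment;
	unsigned int secure;
};

/* Look up one key; leaves *out untouched when the key is absent. */
static int sk_read_uint(const struct sk_kv *kv, size_t n,
			const char *key, unsigned int *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(kv[i].key, key) == 0) {
			*out = kv[i].val;
			return 0;
		}
	}
	return -ENOENT;
}

static void sk_setup_discard(const struct sk_kv *kv, size_t n,
			     struct sk_discard *d)
{
	unsigned int gran = 0, align = 0;

	memset(d, 0, sizeof(*d));
	/* The two geometry values are tied together: use both or neither. */
	if (sk_read_uint(kv, n, "discard-granularity", &gran) == 0 &&
	    sk_read_uint(kv, n, "discard-alignment", &align) == 0) {
		d->granularity = gran;
		d->alignment = align;
	}
	/* discard-secure is independent; absence just leaves it at 0. */
	(void)sk_read_uint(kv, n, "discard-secure", &d->secure);
}

/* Usage: a missing property never prevents reading the others. */
static void sk_discard_demo(void)
{
	const struct sk_kv only_secure[] = { { "discard-secure", 1 } };
	const struct sk_kv half_geometry[] = { { "discard-granularity", 4096 } };
	struct sk_discard d;

	sk_setup_discard(only_secure, 1, &d);
	assert(d.granularity == 0 && d.alignment == 0 && d.secure == 1);

	sk_setup_discard(half_geometry, 1, &d);
	assert(d.granularity == 0 && d.alignment == 0 && d.secure == 0);
}
```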
Re: [GIT PULL] tick: A few more cleanups
On Thu, Jan 16, 2014 at 04:41:48PM +0100, Frederic Weisbecker wrote:
> Ingo,
>
> Please pull the timers/core branch that can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> timers/core
>
> HEAD: 8fe8ff09ce3b5750e1f3e45a1f4a81d59c7ff1f1
>
>
> Nothing very exiting, just a bunch of non-critical cleanups for the next
> merge window:
>
> 1) Make the IRQ tick APIs naming more symetric
>
> 2) Optimize a bit jiffies_lock code coverage
>
> 3) Whitespace fixes from Alex Shi
>
> 4) Fix overflow in scheduler tick max deferment calculation. Given the
> current 1 second max limitation, this bug shouldn't happen in mainline.
> It's rather to prepare for making this value tunable. Or simply in case
> we change the current constant.
>
> Thanks,
> Frederic
> ---
>
> Frederic Weisbecker (2):
>       tick: Rename tick_check_idle() to tick_irq_enter()
>       nohz: Get timekeeping max deferment outside jiffies_lock
>
> Alex Shi (1):
>       nohz_full: fix code style issue of tick_nohz_full_stop_tick
>
> Kevin Hilman (1):
>       sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
>

Ping.

>
>  include/linux/jiffies.h  |  6 ++
>  include/linux/tick.h     |  6 +++---
>  kernel/sched/core.c      |  2 +-
>  kernel/softirq.c         |  2 +-
>  kernel/time/tick-sched.c | 27 ++-
>  5 files changed, 25 insertions(+), 18 deletions(-)
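Item 4 of the pull request above mentions an overflow that cannot trigger with the current 1 second maximum but would once the value becomes tunable. The message does not show the arithmetic involved, so the following is only a generic illustration of that class of bug, assuming a microsecond count multiplied up to nanoseconds in 32-bit arithmetic. The sk_* helpers are hypothetical and are not the kernel's scheduler_tick_max_deferment().

```c
#include <assert.h>
#include <stdint.h>

#define SK_NSEC_PER_USEC 1000u

/* 32-bit result: the product wraps once it exceeds UINT32_MAX. */
static uint32_t sk_deferment_ns_32(uint32_t usecs)
{
	return (uint32_t)(usecs * SK_NSEC_PER_USEC);
}

/* Widen before multiplying so the full nanosecond value is kept. */
static uint64_t sk_deferment_ns_64(uint32_t usecs)
{
	return (uint64_t)usecs * SK_NSEC_PER_USEC;
}

/* Usage: 1 second fits in 32 bits, 5 seconds does not. */
static void sk_deferment_demo(void)
{
	assert(sk_deferment_ns_32(1000000u) == 1000000000u);
	assert(sk_deferment_ns_64(1000000u) == 1000000000ull);

	/* 5,000,000,000 ns wraps modulo 2^32 to 705,032,704. */
	assert(sk_deferment_ns_32(5000000u) == 705032704u);
	assert(sk_deferment_ns_64(5000000u) == 5000000000ull);
}
```

With a 1 second limit the two helpers agree, which is consistent with the remark that the bug should not be observable in mainline today.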
Re: [PATCH 1/1] gic: change access of gicc_ctrl register to read modify write.
Just checking to see whether anyone has had time to take a look at this and
comment.

Thanks

On Sun, Dec 8, 2013 at 12:22 PM, Feng Kan wrote:
> This change is made to preserve the GIC v2 related bits in the
> GIC_CPU_CTRL register (also known as the GICC_CTLR register in spec).
> The original code only set the enable/disable group bit in this register.
> This code will preserve all other bits configured by the bootloader except
> the enable/disable bit. The main reason for this change is to allow the
> bypass bits specified in the v2 spec to remain untouched by the current
> GIC code. In the X-Gene platform, the bypass functionality is not used
> and bypass must be disabled at all times.
>
> Signed-off-by: Vinayak Kale
> Acked-by: Anup Patel
> Signed-off-by: Feng Kan
> ---
>  drivers/irqchip/irq-gic.c |   19 ++++++++++++++++---
>  1 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index d0e9480..6550ac9 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -419,6 +419,7 @@ static void gic_cpu_init(struct gic_chip_data *gic)
>  	void __iomem *dist_base = gic_data_dist_base(gic);
>  	void __iomem *base = gic_data_cpu_base(gic);
>  	unsigned int cpu_mask, cpu = smp_processor_id();
> +	unsigned int ctrl_mask;
>  	int i;
>
>  	/*
> @@ -450,13 +451,21 @@ static void gic_cpu_init(struct gic_chip_data *gic)
>  		writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4 / 4);
>
>  	writel_relaxed(0xf0, base + GIC_CPU_PRIMASK);
> -	writel_relaxed(1, base + GIC_CPU_CTRL);
> +
> +	ctrl_mask = readl(base + GIC_CPU_CTRL);
> +	ctrl_mask |= 0x1;
> +	writel_relaxed(ctrl_mask, base + GIC_CPU_CTRL);
>  }
>
>  void gic_cpu_if_down(void)
>  {
> +	unsigned int ctrl_mask;
> +
>  	void __iomem *cpu_base = gic_data_cpu_base(&gic_data[0]);
> -	writel_relaxed(0, cpu_base + GIC_CPU_CTRL);
> +
> +	ctrl_mask = readl(base + GIC_CPU_CTRL);
> +	ctrl_mask &= 0xfffe;
> +	writel_relaxed(ctrl_mask, cpu_base + GIC_CPU_CTRL);
>  }
>
>  #ifdef CONFIG_CPU_PM
> @@ -567,6 +576,7 @@ static void gic_cpu_restore(unsigned int gic_nr)
>  {
>  	int i;
>  	u32 *ptr;
> +	unsigned int ctrl_mask;
>  	void __iomem *dist_base;
>  	void __iomem *cpu_base;
>
> @@ -591,7 +601,10 @@ static void gic_cpu_restore(unsigned int gic_nr)
>  		writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4);
>
>  	writel_relaxed(0xf0, cpu_base + GIC_CPU_PRIMASK);
> -	writel_relaxed(1, cpu_base + GIC_CPU_CTRL);
> +
> +	ctrl_mask = readl(base + GIC_CPU_CTRL);
> +	ctrl_mask |= 0x1;
> +	writel_relaxed(ctrl_mask, cpu_base + GIC_CPU_CTRL);
>  }
>
>  static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
> --
> 1.7.6.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
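For readers following the thread, the read-modify-write idea the patch describes can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the driver code: the register is simulated by an ordinary variable, and the names fake_gicc_ctrl, reg_read, reg_write, cpu_if_enable and cpu_if_disable are hypothetical. The sketch clears the enable bit with ~GICC_ENABLE instead of a literal mask such as 0xfffe, so that bits above bit 15 are preserved as well.

```c
#include <stdint.h>

#define GICC_ENABLE 0x1u	/* bit 0: the enable bit the patch toggles */

/* Hypothetical stand-in for the memory-mapped GIC_CPU_CTRL register. */
static uint32_t fake_gicc_ctrl;

static uint32_t reg_read(void)
{
	return fake_gicc_ctrl;
}

static void reg_write(uint32_t value)
{
	fake_gicc_ctrl = value;
}

/* Set the enable bit and leave every other bit as it was found. */
static void cpu_if_enable(void)
{
	uint32_t ctrl = reg_read();

	ctrl |= GICC_ENABLE;
	reg_write(ctrl);
}

/* Clear only the enable bit; all other bits survive. */
static void cpu_if_disable(void)
{
	uint32_t ctrl = reg_read();

	ctrl &= ~GICC_ENABLE;
	reg_write(ctrl);
}
```

In the sketch every read and write goes through the same accessor pair, so the value written back is always derived from the register that was just read.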
Re: [PATCH] mm/zswap: add writethrough option
On 01/23/2014 08:18 AM, Minchan Kim wrote: > Hello all, > > On Wed, Jan 22, 2014 at 12:33:58PM -0800, Andrew Morton wrote: >> On Wed, 22 Jan 2014 09:19:58 -0500 Dan Streetman wrote: >> >>> Actually, I really don't know how much benefit we have that in-memory >>> swap overcoming to the real storage but if you want, zRAM with dm-cache >>> is another option rather than invent new wheel by "just having is >>> better". >> >> I'm not sure if this patch is related to the zswap vs. zram discussions. >> This >> only adds the option of using writethrough to zswap. It's a first >> step to possibly >> making zswap work more efficiently using writeback and/or writethrough >> depending on >> the system and conditions. > > The patch size is small. Okay I don't want to be a party-pooper > but at least, I should say my thought for Andrew to help judging. Sure, I'm glad to have your suggestions. >>> >>> To give this a bump - Andrew do you have any concerns about this >>> patch? Or can you pick this up? >> >> I don't pay much attention to new features during the merge window, >> preferring to shove them into a folder to look at later. Often they >> have bitrotted by the time -rc1 comes around. >> >> I'm not sure that this review discussion has played out yet - is >> Minchan happy? > > From the beginning, zswap is for reducing swap I/O but if workingset > overflows, it should write back rather than OOM with expecting a small > number of writeback would make the system happy because the high memory > pressure is temporal so soon most of workload would be hit in zswap > without further writeback. > > If memory pressure continues and writeback steadily, it means zswap's > benefit would be mitigated, even worse by adding comp/decomp overhead. > In that case, it would be better to disable zswap, even. 
> > Dan said writethrough supporting is first step to make zswap smart > but anybody didn't say further words to step into the smart and > what's the *real* workload want it and what's the *real* number from > that because dm-cache/zram might be a good fit. > (I don't intend to argue zram VS zswap. If the concern is solved by > existing solution, why should we invent new function and > have maintenance cost?) so it's very hard for me to judge that we should > accept and maintain it. > Speaking of dm-cache, there are also bcache, flashcache and bcache. > We need blueprint for the future and make an agreement on the > direction before merging this patch. > > But code size is not much and Seth already gave his Ack so I don't > want to hurt Dan any more(Sorry for Dan) and wasting my time so pass > the decision to others(ex, Seth and Bob). Since zswap is a cache layer, and write-back and write-through are two common options for any cache, I'm fine with adding this write-through option. Thanks, -Bob
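To make the two cache policies named in this thread concrete, here is a toy single-entry cache in C. It is a sketch under stated assumptions and has nothing to do with the actual zswap code: struct toy_cache, toy_store and toy_evict are hypothetical names, and the "backing" field merely stands in for the swap device.

```c
#include <stdbool.h>

enum cache_policy { CACHE_WRITEBACK, CACHE_WRITETHROUGH };

/* Toy one-entry cache in front of a one-entry backing store. */
struct toy_cache {
	enum cache_policy policy;
	int cached;		/* value held in the cache */
	bool dirty;		/* cache holds data the backing store lacks */
	int backing;		/* value on the backing device */
	unsigned int backing_writes;
};

/* Store a value; write-through also pushes it to the backing store now. */
static void toy_store(struct toy_cache *c, int value)
{
	c->cached = value;
	if (c->policy == CACHE_WRITETHROUGH) {
		c->backing = value;
		c->backing_writes++;
		c->dirty = false;
	} else {
		c->dirty = true;
	}
}

/* Evict the entry; only a dirty (write-back) entry costs a write here. */
static void toy_evict(struct toy_cache *c)
{
	if (c->dirty) {
		c->backing = c->cached;
		c->backing_writes++;
		c->dirty = false;
	}
}
```

The trade-off shows up in backing_writes: write-back defers and coalesces writes but must write at eviction time, while write-through pays on every store and can drop the cached copy at eviction without further I/O.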
mm: BUG: Bad rss-counter state
Hi all,

While fuzzing with trinity running inside a KVM tools guest using latest -next
kernel, I've stumbled on a "mm: BUG: Bad rss-counter state" error which was
pretty non-obvious in the mix of the kernel spew (why?).

I've added a small BUG() after the printk() in check_mm(), and here's the full
output:

[ 318.334905] BUG: Bad rss-counter state mm:8801e6dec000 idx:0 val:1
[ 318.335955] [ cut here ]
[ 318.336507] kernel BUG at kernel/fork.c:562!
[ 318.336930] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 318.337826] Dumping ftrace buffer:
[ 318.338431] (ftrace buffer empty)
[ 318.338951] Modules linked in:
[ 318.339287] CPU: 45 PID: 10022 Comm: trinity-c190 Tainted: GW 3.13.0-next-20140122-sasha-00011-gcc8342a-dirty #4
[ 318.340120] task: 8801e6a9b000 ti: 8801e6aee000 task.ti: 8801e6aee000
[ 318.340120] RIP: 0010:[] [] __mmdrop+0x9a/0xc0
[ 318.340120] RSP: :8801e6aefe68 EFLAGS: 00010292
[ 318.340120] RAX: 003a RBX: 8801e6dec000 RCX: 0001
[ 318.340120] RDX: RSI: 0001 RDI: 0286
[ 318.340120] RBP: 8801e6aefe78 R08: 0001 R09:
[ 318.340120] R10: 0001 R11: 0001 R12: 8801e6dec138
[ 318.340120] R13: 8801e6dec000 R14: 8801e6dec0a8 R15: 00a3
[ 318.340120] FS: 7f6bc5915700() GS:88007b40() knlGS:
[ 318.340120] CS: 0010 DS: ES: CR0: 8005003b
[ 318.340120] CR2: 7fffd3d62588 CR3: 05e26000 CR4: 06e0
[ 318.340120] Stack:
[ 318.340120] 8801e6dec138 8801e6dec000 8801e6aefe98 8113cb3b
[ 318.340120] 8801e6a9bbb0 8801e6a9b000 8801e6aefef8 81140ced
[ 318.340120] 8801e6c4db00 8801e6c4db00 8801e6aefef8 811f3ea5
[ 318.340120] Call Trace:
[ 318.340120] [] mmput+0xcb/0xe0
[ 318.340120] [] exit_mm+0x18d/0x1a0
[ 318.340120] [] ? acct_collect+0x175/0x1b0
[ 318.340120] [] do_exit+0x26f/0x520
[ 318.355754] [] do_group_exit+0xa9/0xe0
[ 318.355754] [] SyS_exit_group+0x17/0x20
[ 318.355754] [] tracesys+0xdd/0xe2
[ 318.355754] Code: 00 00 eb 16 0f 1f 44 00 00 48 8b 8b 68 03 00 00 48 85 c9 74 24 ba 02 00 00 00 48 89 de 48 c7 c7 10 16 68 85 31 c0 e8 a2 d2 2f 03 <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 89 de 48 8b 3d 1e
[ 318.355754] RIP [] __mmdrop+0x9a/0xc0
[ 318.355754] RSP
[ 318.363991] ---[ end trace 7d85aceb881be62b ]---

Thanks,
Sasha
Re: [PATCH] clocksource: fix some comments typo in clocksource.c
On 2014/1/23 5:05, Thomas Gleixner wrote:
> On Thu, 2 Jan 2014, Yijing Wang wrote:
>
>> Fix some trivial comments typo in kernel/time/clocksource.c
>
> That's not a typo. That's a leftover. The function simply cannot fail
> anymore. So the subject of that patch should be something like:
>
>   clocksource: Remove outdated comments

Hi Thomas, sorry for my poor English. I will update the patch title and
changelog.

>
> And the changelog should explain, that the functions always return 0,
> so the comment is just pointless. A nice follow up on that would be to
> actually make the function void instead of returning a pointless int,
> but that requires to check all call sites.

You are right, it's pointless to return 0. I will try to change the function
type to void in a separate patch, thanks!

>
>> Signed-off-by: Yijing Wang
>> ---
>>  kernel/time/clocksource.c |    3 ---
>>  1 files changed, 0 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index ba3e502..9951575 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -779,8 +779,6 @@ EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
>>   * @scale: Scale factor multiplied against freq to get clocksource hz
>>   * @freq: clocksource frequency (cycles per second) divided by scale
>>   *
>> - * Returns -EBUSY if registration fails, zero otherwise.
>> - *
>>   * This *SHOULD NOT* be called directly! Please use the
>>   * clocksource_register_hz() or clocksource_register_khz helper functions.
>>   */
>> @@ -805,7 +803,6 @@ EXPORT_SYMBOL_GPL(__clocksource_register_scale);
>>   * clocksource_register - Used to install new clocksources
>>   * @cs: clocksource to be registered
>>   *
>> - * Returns -EBUSY if registration fails, zero otherwise.
>>   */
>>  int clocksource_register(struct clocksource *cs)
>>  {
>> --
>> 1.7.1
>>
>

--
Thanks!
Yijing
Re: linux-next: manual merge of the drm-intel tree with the drm tree
On Wed, Jan 22, 2014 at 2:06 AM, Daniel Vetter wrote: > Hi Stephen, > > On Wed, Jan 22, 2014 at 4:04 AM, Stephen Rothwell > wrote: >> Hi all, >> >> Today's linux-next merge of the drm-intel tree got a conflict in >> drivers/gpu/drm/i915/i915_irq.c between commit abca9e454498 ("drm: Pass >> 'flags' from the caller to .get_scanout_position()") from the drm tree >> and commit d59a63ad8234 ("drm/i915: Add intel_get_crtc_scanline()") from >> the drm-intel tree. >> >> I fixed it up (I think - see below) and can carry the fix as necessary >> (no action is required). > > Oops, this patch escaped - it's only for 3.15. I've shuffled my > branches around now for the merge window so this should not pop up in > your -next tree again until 3.15 starts. I just bisected boot failures on x86 chromebooks with -next to this merge commit. I'll take a look tomorrow morning and make sure they're gone. -Olof
Re: mm: BUG: Bad rss-counter state
On Wed, 22 Jan 2014, Sasha Levin wrote: > Hi all, > > While fuzzing with trinity running inside a KVM tools guest using latest -next > kernel, > I've stumbled on a "mm: BUG: Bad rss-counter state" error which was pretty > non-obvious > in the mix of the kernel spew (why?). > It's not a fatal condition and there's only a few possible stack traces that could be emitted during the exit() path. I don't see how we could make it more visible other than its log-level which is already KERN_ALERT. > I've added a small BUG() after the printk() in check_mm(), and here's the full > output: > Worst place to add it :) At line 562 of kernel/fork.c in linux-next you're going to hit BUG() when there may be other counters that are also bad and they don't get printed. > [ 318.334905] BUG: Bad rss-counter state mm:8801e6dec000 idx:0 val:1 So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was cited that would tell us what that is so there's not much to go on, unless someone already recognizes this as another issue. Is this reproducible on 3.13 or only on linux-next?
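The point about where the BUG() was placed can be illustrated with a small user-space C sketch: report every non-zero counter first and let the caller decide whether to abort only after the loop has finished. This is a hedged illustration, not the kernel's check_mm(); the function name check_counters and the counter count of 3 are assumptions made for the example.

```c
#include <stdio.h>

#define NR_COUNTERS 3	/* assumed number of per-mm counters for this sketch */

/*
 * Report every non-zero counter and return how many were bad.
 * The caller can abort after the loop, once all of them are printed,
 * instead of stopping at the first bad one.
 */
static int check_counters(const long counters[NR_COUNTERS])
{
	int bad = 0;

	for (int i = 0; i < NR_COUNTERS; i++) {
		if (counters[i] != 0) {
			fprintf(stderr,
				"BUG: Bad rss-counter state idx:%d val:%ld\n",
				i, counters[i]);
			bad++;
		}
	}
	return bad;
}
```

Aborting inside the loop body, by contrast, would hide any later counters that are also non-zero, which is the concern raised above.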
Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages
Hello Dan 2014/1/22 Dan Streetman : > On Wed, Jan 22, 2014 at 7:16 AM, Cai Liu wrote: >> Hello Minchan >> >> >> 2014/1/22 Minchan Kim >>> >>> Hello Cai, >>> >>> On Tue, Jan 21, 2014 at 09:52:25PM +0800, Cai Liu wrote: >>> > Hello Minchan >>> > >>> > 2014/1/21 Minchan Kim : >>> > > Hello, >>> > > >>> > > On Tue, Jan 21, 2014 at 02:35:07PM +0800, Cai Liu wrote: >>> > >> 2014/1/21 Minchan Kim : >>> > >> > Please check your MUA and don't break thread. >>> > >> > >>> > >> > On Tue, Jan 21, 2014 at 11:07:42AM +0800, Cai Liu wrote: >>> > >> >> Thanks for your review. >>> > >> >> >>> > >> >> 2014/1/21 Minchan Kim : >>> > >> >> > Hello Cai, >>> > >> >> > >>> > >> >> > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote: >>> > >> >> >> zswap can support multiple swapfiles. So we need to check >>> > >> >> >> all zbud pool pages in zswap. >>> > >> >> >> >>> > >> >> >> Version 2: >>> > >> >> >> * add *total_zbud_pages* in zbud to record all the pages in >>> > >> >> >> pools >>> > >> >> >> * move the updating of pool pages statistics to >>> > >> >> >> alloc_zbud_page/free_zbud_page to hide the details >>> > >> >> >> >>> > >> >> >> Signed-off-by: Cai Liu >>> > >> >> >> --- >>> > >> >> >> include/linux/zbud.h |2 +- >>> > >> >> >> mm/zbud.c| 44 >>> > >> >> >> >>> > >> >> >> mm/zswap.c |4 ++-- >>> > >> >> >> 3 files changed, 35 insertions(+), 15 deletions(-) >>> > >> >> >> >>> > >> >> >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h >>> > >> >> >> index 2571a5c..1dbc13e 100644 >>> > >> >> >> --- a/include/linux/zbud.h >>> > >> >> >> +++ b/include/linux/zbud.h >>> > >> >> >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, >>> > >> >> >> unsigned long handle); >>> > >> >> >> int zbud_reclaim_page(struct zbud_pool *pool, unsigned int >>> > >> >> >> retries); >>> > >> >> >> void *zbud_map(struct zbud_pool *pool, unsigned long handle); >>> > >> >> >> void zbud_unmap(struct zbud_pool *pool, unsigned long handle); >>> > >> >> >> -u64 zbud_get_pool_size(struct 
zbud_pool *pool); >>> > >> >> >> +u64 zbud_get_pool_size(void); >>> > >> >> >> >>> > >> >> >> #endif /* _ZBUD_H_ */ >>> > >> >> >> diff --git a/mm/zbud.c b/mm/zbud.c >>> > >> >> >> index 9451361..711aaf4 100644 >>> > >> >> >> --- a/mm/zbud.c >>> > >> >> >> +++ b/mm/zbud.c >>> > >> >> >> @@ -52,6 +52,13 @@ >>> > >> >> >> #include >>> > >> >> >> #include >>> > >> >> >> >>> > >> >> >> +/* >>> > >> >> >> +* statistics >>> > >> >> >> +**/ >>> > >> >> >> + >>> > >> >> >> +/* zbud pages in all pools */ >>> > >> >> >> +static u64 total_zbud_pages; >>> > >> >> >> + >>> > >> >> >> /* >>> > >> >> >> * Structures >>> > >> >> >> */ >>> > >> >> >> @@ -142,10 +149,28 @@ static struct zbud_header >>> > >> >> >> *init_zbud_page(struct page *page) >>> > >> >> >> return zhdr; >>> > >> >> >> } >>> > >> >> >> >>> > >> >> >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, >>> > >> >> >> gfp_t gfp) >>> > >> >> >> +{ >>> > >> >> >> + struct page *page; >>> > >> >> >> + >>> > >> >> >> + page = alloc_page(gfp); >>> > >> >> >> + >>> > >> >> >> + if (page) { >>> > >> >> >> + pool->pages_nr++; >>> > >> >> >> + total_zbud_pages++; >>> > >> >> > >>> > >> >> > Who protect race? >>> > >> >> >>> > >> >> Yes, here the pool->pages_nr and also the total_zbud_pages are not >>> > >> >> protected. >>> > >> >> I will re-do it. >>> > >> >> >>> > >> >> I will change *total_zbud_pages* to atomic type. >>> > >> > >>> > >> > Wait, it doesn't make sense. Now, you assume zbud allocator would be >>> > >> > used >>> > >> > for only zswap. It's true until now but we couldn't make sure it in >>> > >> > future. >>> > >> > If other user start to use zbud allocator, total_zbud_pages would be >>> > >> > pointless. >>> > >> >>> > >> Yes, you are right. ZBUD is a common module. So in this patch >>> > >> calculate the >>> > >> zswap pool size in zbud is not suitable. >>> > >> >>> > >> > >>> > >> > Another concern is that what's your scenario for above two swap? 
>>> > >> > How often we need to call zbud_get_pool_size? >>> > >> > In previous your patch, you reduced the number of call so IIRC, >>> > >> > we only called it in zswap_is_full and for debugfs. >>> > >> >>> > >> zbud_get_pool_size() is called frequently when adding/freeing zswap >>> > >> entry happen in zswap . This is why in this patch I added a counter in >>> > >> zbud, >>> > >> and then in zswap the iteration of zswap_list to calculate the pool >>> > >> size will >>> > >> not be needed. >>> > > >>> > > We can remove updating zswap_pool_pages in zswap_frontswap_store and >>> > > zswap_free_entry as I said. So zswap_is_full is only hot spot. >>> > > Do you think it's still big overhead? Why? Maybe locking to prevent >>> > > destroying? Then, we can use RCU to minimize the overhead as
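The "Who protect race?" question above concerns a plain counter being incremented from concurrent contexts. A minimal sketch of a race-free counter, assuming C11 atomics in user space rather than the kernel's own atomic types, could look like this. The names struct toy_pool, toy_pool_page_added, toy_pool_page_removed and toy_pool_pages are hypothetical, and the counter is kept per pool, in line with the objection that a single global total would be meaningless once zbud has more than one user.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical pool with a page counter that is safe to update concurrently. */
struct toy_pool {
	atomic_uint_fast64_t pages_nr;
};

/* Account one newly allocated page to this pool. */
static void toy_pool_page_added(struct toy_pool *pool)
{
	atomic_fetch_add_explicit(&pool->pages_nr, 1, memory_order_relaxed);
}

/* Account one freed page. */
static void toy_pool_page_removed(struct toy_pool *pool)
{
	atomic_fetch_sub_explicit(&pool->pages_nr, 1, memory_order_relaxed);
}

/* Read the current page count without taking a lock. */
static uint64_t toy_pool_pages(struct toy_pool *pool)
{
	return atomic_load_explicit(&pool->pages_nr, memory_order_relaxed);
}
```

Relaxed ordering is used because the value is only a statistic here; a counter that gates other memory accesses would need stronger ordering.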
Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
2014/1/23 J. Bruce Fields : > On Tue, Jan 21, 2014 at 08:35:36AM -0700, Trond Myklebust wrote: >> >> On Jan 21, 2014, at 3:08, shaobingqing wrote: >> >> > 2014/1/21 Trond Myklebust : >> >> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote: >> >>> In current code, there only one struct rpc_rqst is prealloced. If one >> >>> callback request is received from two sk_buff, the xprt_alloc_bc_request >> >>> would be execute two times with the same transport->xid. The first time >> >>> xprt_alloc_bc_request will alloc one struct rpc_rqst and the >> >>> TCP_RCV_COPY_DATA >> >>> bit of transport->tcp_flags will not be cleared. The second time >> >>> xprt_alloc_bc_request could not alloc struct rpc_rqst any more and NULL >> >>> pointer will be returned, then xprt_force_disconnect occur. I think one >> >>> callback request can be allowed to be received from two sk_buff. >> >>> >> >>> Signed-off-by: shaobingqing >> >>> --- >> >>> net/sunrpc/xprtsock.c | 11 +-- >> >>> 1 files changed, 9 insertions(+), 2 deletions(-) >> >>> >> >>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >> >>> index ee03d35..606950d 100644 >> >>> --- a/net/sunrpc/xprtsock.c >> >>> +++ b/net/sunrpc/xprtsock.c >> >>> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct >> >>> rpc_xprt *xprt, >> >>> struct sock_xprt *transport = >> >>> container_of(xprt, struct sock_xprt, xprt); >> >>> struct rpc_rqst *req; >> >>> + static struct rpc_rqst *req_partial; >> >>> + >> >>> + if (req_partial == NULL) >> >>> + req = xprt_alloc_bc_request(xprt); >> >>> + else if (req_partial->rq_xid == transport->tcp_xid) >> >>> + req = req_partial; >> >> >> >> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS, >> >> req will be undefined. Either way, you cannot use a static variable for >> >> storage here: that isn't re-entrant. 
>> > >> > Because metadata server only have one slot for backchannel request, >> > req_partial->rq_xid == transport->tcp_xid always happens, if the callback >> > request just being split in two sk_buffs. But req_partial->rq_xid != >> > transport->tcp_xid may also happen in some special cases, such as >> > retransmission occurs? >> >> If the server retransmits, then it is broken. The NFSv4.1 protocol does not >> allow it to retransmit unless the connection breaks. > > shaobingqing, are you actually seeing retransmission? (If so, are we > setting up the callback client wrong?) No, not actually. Here I just see that one client can receive two callback requests with the same xid. > > --b. > >> >> > If one callback request is split in two sk_buffs, xs_tcp_read_callback >> > will be execute two times. The req_partial should be a static variable, >> > because the second execution of xs_tcp_read_callback should use >> > the rpc_rqst allocated for the first execution, which saves information >> > copies from the first sk_buff. >> >> No! This is a multi-threaded/process environment which can support multiple >> connection. It is a bug to use a static variable. >> >> -- >> Trond Myklebust >> Linux NFS client maintainer >> >
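The objection to the function-local static can be illustrated with a sketch in which the partially received request is remembered in per-connection state instead. This is a user-space illustration under stated assumptions, not the SUNRPC code: struct toy_conn, struct toy_request, toy_get_request and toy_request_complete are hypothetical names. It also gives the xid-mismatch case a defined result (NULL), which was the other problem pointed out in the review.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_request {
	uint32_t xid;
	size_t bytes_copied;
};

/* Each connection owns its own slot, so nothing is shared between them. */
struct toy_conn {
	struct toy_request slot;	/* the single preallocated request */
	struct toy_request *partial;	/* request still being received, or NULL */
};

/*
 * Return the request that the fragment with this xid belongs to.
 * A second fragment with the same xid continues the pending request;
 * a different xid while one is pending yields NULL, so the caller
 * always gets a well-defined answer.
 */
static struct toy_request *toy_get_request(struct toy_conn *conn, uint32_t xid)
{
	if (conn->partial != NULL) {
		if (conn->partial->xid == xid)
			return conn->partial;
		return NULL;
	}
	conn->slot.xid = xid;
	conn->slot.bytes_copied = 0;
	conn->partial = &conn->slot;
	return conn->partial;
}

/* Call once the whole request has been received. */
static void toy_request_complete(struct toy_conn *conn)
{
	conn->partial = NULL;
}
```

Because the pending pointer lives inside the connection object, two connections that happen to see the same xid never touch each other's state; any locking a real transport would need around this state is left out of the sketch.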
Re: [PATCH v4 00/36] mtd: st_spi_fsm: Add new driver
Hi Lee, On Wed, Jan 22, 2014 at 12:50:49PM +, Lee Jones wrote: > > Version 4: > > Tended to Brian's previous review comments > > - Checkpatch acceptance > > - MODULE_DEVICE_TABLE() name slip correction > > - Timeout issue(s) resolved > > - Potential infinite loop mitigated > > - Code clarity suggests heeded > > - Duplication with MTD core code removed > > - Upgraded to using ROUND_UP() helper > > - Moved non-shared header code into main driver > > - Relocated dynamic msg sequence stores into main struct > > - Averted adaption of static (table) data > > - Basic whitespace/spelling/data type/dev_err suggestions accepted > > > > Version 3: > > Okay, this thing should be fully functional now. Identify a chip > > based on it's JEDEC ID, Read, Write, Erase (all or by sector). > > Support for various chip quirks added too. > > > > Version 2: > > The first bunch of these patches have been on the MLs before, but > > didn't receive a great deal of attention for the most part. We are > > a little more featureful this time however. We can now successfully > > setup and configure the N25Q256. We still can't read/write/erase > > it though. I'll start work on that next week and will provide it in > > the next instalment. > > > > Version 1: > > First stab at getting this thing Mainlined. It doesn't do a great deal > > yet, but we are able to initialise the device and dynamically set it up > > correctly based on an extracted JEDEC ID. > > > > Documentation/devicetree/bindings/mtd/st-fsm.txt | 26 ++ > > arch/arm/boot/dts/stih416-b2105.dts | 14 + > > arch/arm/boot/dts/stih416-pinctrl.dtsi | 12 + > > drivers/mtd/devices/Kconfig |8 + > > drivers/mtd/devices/Makefile |1 + > > drivers/mtd/devices/serial_flash_cmds.h | 81 > > drivers/mtd/devices/st_spi_fsm.c | 2124 > > + > > 7 files changed, 2266 insertions(+) > > Can you confirm receipt of this set, or would you like me to resend? Well, I personally have the patch set but haven't had a chance to review it. 
Can you resend with MTD in the CC, since we haven't had any comments anyway? I believe MTD people are much less likely to look at it if you forget the CC :) You can just title it [PATCH RESEND v4 X/Y], possibly with a LKML link back to the original v4, if you want to help avoid confusion. Brian
Re: [PATCH v2 0/4] Intel MPX support
On 01/22/2014 08:30 PM, Ingo Molnar wrote: * Ren, Qiaowei wrote: -Original Message- From: Ingo Molnar [mailto:mingo.kernel@gmail.com] On Behalf Of Ingo Molnar Sent: Wednesday, January 22, 2014 7:53 PM To: Ren, Qiaowei Cc: H. Peter Anvin; Thomas Gleixner; Ingo Molnar; x...@kernel.org; linux-kernel@vger.kernel.org; Peter Zijlstra Subject: Re: [PATCH v2 0/4] Intel MPX support * Qiaowei Ren wrote: Changes since v1: * check to see if #BR occurred in userspace or kernel space. * use generic structure and macro as much as possible when decode mpx instructions. Qiaowei Ren (4): x86, mpx: add documentation on Intel MPX x86, mpx: hook #BR exception handler to allocate bound tables x86, mpx: add prctl commands PR_MPX_INIT, PR_MPX_RELEASE x86, mpx: extend siginfo structure to include bound violation information Documentation/x86/intel_mpx.txt| 76 +++ arch/x86/Kconfig |4 + arch/x86/include/asm/mpx.h | 63 ++ arch/x86/include/asm/processor.h | 16 ++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/mpx.c | 417 arch/x86/kernel/traps.c| 61 +- include/uapi/asm-generic/siginfo.h |9 +- include/uapi/linux/prctl.h |6 + kernel/signal.c|4 + kernel/sys.c | 12 + 11 files changed, 667 insertions(+), 2 deletions(-) create mode 100644 Documentation/x86/intel_mpx.txt create mode 100644 arch/x86/include/asm/mpx.h create mode 100644 arch/x86/kernel/mpx.c Such a patch submission is absolutely inadequate! Please outline: - a short summary of what the feature does - a short description of what hardware supports it today or will support it in the future - a short description of whether the feature needs any configuration from the user or it's entirely auto-enabled on hardware that supports it. - a cost/benefit description to unrelated code: is this slowing down anything else? - how does user-space compiler support stand, what's the expected status there, etc. Only a small fraction of that information can be found in Documentation/x86/intel_mpx.txt. 
I'm absolutely sick of these semi-anonymous patch submissions from Intel, so I'm NAK-ing it until it's communicated properly. Ok. I will add related content into this documentation. More importantly, put it into the 0/X mail! That's how people can review such a patch set effectively. Ok. Thanks for your feedback. I will do it. Thanks, Qiaowei
Re: [PATCH v5 7/8] ARM: brcmstb: gic: add compatible string for Broadcom Brahma15
Hi Florian, > Do not we also need to update drivers/irqchip/irq-gic.c to look for > this compatible property? Alternatively should the example DTS contain > the following: > > compatible = "brcm,brahma-b15-gic", "arm,cortex-a15-gic"? Patch #8 [1] of this series has the "compatible" string set exactly that way. I was following the pattern seen in the other reference DTS files, where "arm,cortex-a15-gic" is used as the fall-back. Thanks, Marc C [1] https://lkml.org/lkml/2014/1/21/649 On 01/22/2014 02:40 PM, Florian Fainelli wrote: > Hi Marc, > > 2014/1/21 Marc Carino : >> Document the Broadcom Brahma B15 GIC implementation as compatible >> with the ARM GIC standard. >> >> Signed-off-by: Marc Carino >> Acked-by: Florian Fainelli > > Do not we also need to update drivers/irqchip/irq-gic.c to look for > this compatible property? Alternatively should the example DTS contain > the following: > > compatible = "brcm,brahma-b15-gic", "arm,cortex-a15-gic"? > >> --- >> Documentation/devicetree/bindings/arm/gic.txt |1 + >> 1 files changed, 1 insertions(+), 0 deletions(-) >> >> diff --git a/Documentation/devicetree/bindings/arm/gic.txt >> b/Documentation/devicetree/bindings/arm/gic.txt >> index 3dfb0c0..d7409fd 100644 >> --- a/Documentation/devicetree/bindings/arm/gic.txt >> +++ b/Documentation/devicetree/bindings/arm/gic.txt >> @@ -15,6 +15,7 @@ Main node required properties: >> "arm,cortex-a9-gic" >> "arm,cortex-a7-gic" >> "arm,arm11mp-gic" >> + "brcm,brahma-b15-gic" >> - interrupt-controller : Identifies the node as an interrupt controller >> - #interrupt-cells : Specifies the number of cells needed to encode an >>interrupt source. The type shall be a and the value shall be 3. >> -- >> 1.7.1 >> > > >
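The fall-back arrangement discussed here works because a node's "compatible" list is ordered from most specific to most generic, and a driver binds if any entry is one it knows. A small C sketch of that matching rule follows. It is an illustration only, not the kernel's device-tree matching code; first_match and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Walk the node's compatible strings in order (most specific first)
 * and return the first one the driver also lists, or NULL if none match.
 */
static const char *first_match(const char *const *node_compat, size_t n_node,
			       const char *const *drv_compat, size_t n_drv)
{
	for (size_t i = 0; i < n_node; i++)
		for (size_t j = 0; j < n_drv; j++)
			if (strcmp(node_compat[i], drv_compat[j]) == 0)
				return node_compat[i];
	return NULL;
}
```

With the node listing "brcm,brahma-b15-gic" followed by "arm,cortex-a15-gic", a driver that only knows the ARM strings still matches through the second entry, so the driver's match table does not have to change.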
Re: [RFC PATCH 1/9] mtd: nand: retrieve ECC requirements from Hynix READ ID byte 4
+ Huang Hi Boris, On Wed, Jan 08, 2014 at 03:21:56PM +0100, Boris BREZILLON wrote: > The Hynix nand flashes store their ECC requirements in byte 4 of its id > (returned on READ ID command). > > Signed-off-by: Boris BREZILLON I haven't verified yet (perhaps Huang can confirm?), but this may be similar to a patch Huang submitted recently. In his case, we found that this table is actually quite unreliable and is likely hard to maintain. Why do you need this ECC information, for my reference? Brian
Re: [PATCH net-next v5 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
From: Zoltan Kiss Date: Mon, 20 Jan 2014 21:24:20 + > A long known problem of the upstream netback implementation is that on the TX > path (from guest to Dom0) it copies the whole packet from guest memory into > Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a > huge performance penalty. The classic kernel version of netback used grant > mapping, and to get notified when the page can be unmapped, it used page > destructors. Unfortunately that destructor is not an upstreamable solution. > Ian Campbell's skb fragment destructor patch series [1] tried to solve this > problem, however it seems to be very invasive on the network stack's code, > and therefore hasn't progressed very well. > This patch series uses SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to > know when the skb is freed up. This series does not apply to net-next due to some other recent changes. Please respin, thanks.
Re: mm: BUG: Bad rss-counter state
On Wed, Jan 22, 2014 at 05:39:25PM -0800, David Rientjes wrote: > > While fuzzing with trinity running inside a KVM tools guest using latest > > -next > > kernel, > > I've stumbled on a "mm: BUG: Bad rss-counter state" error which was pretty > > non-obvious > > in the mix of the kernel spew (why?). > > > > It's not a fatal condition and there's only a few possible stack traces > that could be emitted during the exit() path. I don't see how we could > make it more visible other than its log-level which is already KERN_ALERT. > > > I've added a small BUG() after the printk() in check_mm(), and here's the > > full > > output: > > > > Worst place to add it :) At line 562 of kernel/fork.c in linux-next > you're going to hit BUG() when there may be other counters that are also > bad and they don't get printed. > > > [ 318.334905] BUG: Bad rss-counter state mm:8801e6dec000 idx:0 val:1 > > So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was > cited that would tell us what that is so there's not much to go on, unless > someone already recognizes this as another issue. Is this reproducible on > 3.13 or only on linux-next? Sasha, is this the current git tree version of Trinity ? (I'm wondering if yesterdays munmap changes might be tickling this bug). Dave
Re: [PATCH 13/15] sched: Use a static_key for sched_clock_stable
On 01/22/14 at 12:59pm, Peter Zijlstra wrote:
> On Wed, Jan 22, 2014 at 11:45:32AM +0100, Peter Zijlstra wrote:
> > Ho humm.
>
> OK, so I had me a ponder; does the below fix things for you and David?
> I've only done a boot test on real proper hardware :-)
>
> ---
>  kernel/sched/clock.c | 42 +-
>  1 file changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 6bd6a6731b21..6bbcd97f4532 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -77,35 +77,45 @@ __read_mostly int sched_clock_running;
>
>  #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
>  static struct static_key __sched_clock_stable = STATIC_KEY_INIT;
> +static int __sched_clock_stable_early;
>
>  int sched_clock_stable(void)
>  {
> -	if (static_key_false(&__sched_clock_stable))
> -		return false;
> -	return true;
> +	return static_key_false(&__sched_clock_stable);
>  }
>
>  void set_sched_clock_stable(void)
>  {
> +	__sched_clock_stable_early = 1;
> +
> +	smp_mb(); /* matches sched_clock_init() */
> +
> +	if (!sched_clock_running)
> +		return;
> +
>  	if (!sched_clock_stable())
> -		static_key_slow_dec(&__sched_clock_stable);
> +		static_key_slow_inc(&__sched_clock_stable);
>  }
>
>  static void __clear_sched_clock_stable(struct work_struct *work)
>  {
>  	/* XXX worry about clock continuity */
>  	if (sched_clock_stable())
> -		static_key_slow_inc(&__sched_clock_stable);
> +		static_key_slow_dec(&__sched_clock_stable);
>  }
>
>  static DECLARE_WORK(sched_clock_work, __clear_sched_clock_stable);
>
>  void clear_sched_clock_stable(void)
>  {
> -	if (keventd_up())
> -		schedule_work(&sched_clock_work);
> -	else
> -		__clear_sched_clock_stable(&sched_clock_work);
> +	__sched_clock_stable_early = 0;
> +
> +	smp_mb(); /* matches sched_clock_init() */
> +
> +	if (!sched_clock_running)
> +		return;
> +
> +	schedule_work(&sched_clock_work);
>  }
>
>  struct sched_clock_data {
> @@ -140,6 +150,20 @@ void sched_clock_init(void)
>  	}
>
>  	sched_clock_running = 1;
> +
> +	/*
> +	 * Ensure that it is impossible to not do a static_key update.
> +	 *
> +	 * Either {set,clear}_sched_clock_stable() must see sched_clock_running
> +	 * and do the update, or we must see their __sched_clock_stable_early
> +	 * and do the update, or both.
> +	 */
> +	smp_mb(); /* matches {set,clear}_sched_clock_stable() */
> +
> +	if (__sched_clock_stable_early)
> +		set_sched_clock_stable();
> +	else
> +		clear_sched_clock_stable();
>  }
>
>  /*

It does not fix the printk time issue; here is the log:

[0.00] efi: mem26: type=6, attr=0x800f, range=[0x0dbe-0x0dc0) (0MB)
[0.00] DMI not present or invalid.
[0.00] Hypervisor detected: KVM
[0.00] e820: last_pfn = 0xdbe0 max_arch_pfn = 0x4
[0.00] PAT not supported by CPU.
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00] init_memory_mapping: [mem 0x0aa0-0x0abf]
[0.00] init_memory_mapping: [mem 0x0800-0x0a9f]
[0.00] init_memory_mapping: [mem 0x0010-0x07ff]
[0.00] init_memory_mapping: [mem 0x0ac0-0x0bd93fff]
[0.00] init_memory_mapping: [mem 0x0bdc1000-0x0d580fff]
[0.00] init_memory_mapping: [mem 0x0d5e5000-0x0dbd]
[0.00] RAMDISK: [mem 0x0ac0e000-0x0b583fff]
[0.00] ACPI: RSDP 0d5e0014 24 (v02 OVMF )
[0.00] ACPI: XSDT 0d5df0e8 3C (v01 OVMF OVMFEDK2 20130221 0113)
[0.00] ACPI: FACP 0d5de000 F4 (v03 OVMF OVMFEDK2 20130221 OVMF 0099)
[0.00] ACPI: DSDT 0d5dc000 000D57 (v01 INTEL OVMF 0004 INTL 20120913)
[0.00] ACPI: FACS 0d5e4000 40
[0.00] ACPI: APIC 0d5dd000 78 (v01 OVMF OVMFEDK2 20130221 OVMF 0099)
[0.00] ACPI: SSDT 0d5db000 57 (v01 REDHAT OVMF 0001 INTL 20120913)
[0.00] crashkernel reservation failed - No suitable area found.
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 0:d401001, boot clock
[65465.267798] Zone ranges:
[65465.268914] DMA [mem 0x1000-0x00ff]
[65465.271107] DMA32[mem 0x0100-0x]
[65465.273348] Normal empty
[65465.274683] Movable zone start for each node
[65465.276646] Early memory node ranges
[65465.278321] node 0: [mem 0x1000-0x0009]
[65465.280572] node 0: [mem 0x0010-0x0bd93fff]
[65465.282825] node 0: [mem 0x0bdc1000-0x0d580fff]
[65465.285084] node 0: [mem 0x0d5e5000-0x0dbd]
[65465.289251] ACPI: PM-Timer IO Port: 0xb008
[65465.291105] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[65465.293766]
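The comment in the quoted sched_clock_init() hunk ("either ... must see sched_clock_running ... or we must see their __sched_clock_stable_early ... or both") describes a store-then-barrier-then-load handshake. A rough userspace sketch of that idea in C11 atomics follows. Everything in it is a stand-in, not kernel code: `clock_running`, `stable_early`, `key_updates` and the `model_*` functions are hypothetical names, the seq_cst fence plays the role of smp_mb(), and the counter stands in for the static_key update. It also leaves out the clear path entirely.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the variables in the quoted patch. */
static atomic_int clock_running;  /* plays sched_clock_running */
static atomic_int stable_early;   /* plays __sched_clock_stable_early */
static atomic_int key_updates;    /* counts "static key" updates */

static void apply_update(void)
{
    atomic_fetch_add(&key_updates, 1);
}

/* Models set_sched_clock_stable(): publish intent, then look at the init flag. */
void model_set_stable(void)
{
    atomic_store_explicit(&stable_early, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* plays smp_mb() */
    if (!atomic_load_explicit(&clock_running, memory_order_relaxed))
        return;  /* init has not run yet; it will see stable_early */
    apply_update();
}

/* Models the tail of sched_clock_init(): publish "running", then look at intent. */
void model_clock_init(void)
{
    atomic_store_explicit(&clock_running, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* plays smp_mb() */
    if (atomic_load_explicit(&stable_early, memory_order_relaxed))
        apply_update();
}

int model_updates(void)
{
    return atomic_load(&key_updates);
}

void model_reset(void)
{
    atomic_store(&clock_running, 0);
    atomic_store(&stable_early, 0);
    atomic_store(&key_updates, 0);
}
```

With both fences in place, at least one side observes the other's store, so the update cannot be skipped. When the two calls race, both sides may apply it, which is why the patch guards the real update with a sched_clock_stable() check.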
Re: [PATCH 13/15] sched: Use a static_key for sched_clock_stable
On 01/22/14 at 10:08pm, Peter Zijlstra wrote:
> >
> > I think it's the right region to look through. My current suspect is the
> > linear continuity fit with the initial 'random' multiplier.
> >
> > That initial 'random' multiplier can get us quite high, and we'll fit
> > the function to match that but continue at a sane rate.
> >
> > I'll try and prod a little more later this evening as time permits.
>
> Does this cure things?

Peter, the odd timestamp still happens with this patch for me.

>
> ---
>  arch/x86/kernel/tsc.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index a3acbac2ee72..bb04148c5fe0 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -237,7 +237,7 @@ static inline unsigned long long cycles_2_ns(unsigned long long cyc)
>  /* XXX surely we already have this someplace in the kernel?! */
>  #define DIV_ROUND(n, d) (((n) + ((d) / 2)) / (d))
>
> -static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
> +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu, bool origin)
>  {
>  	unsigned long long tsc_now, ns_now;
>  	struct cyc2ns_data *data;
> @@ -252,7 +252,10 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
>  	data = cyc2ns_write_begin(cpu);
>
>  	rdtscll(tsc_now);
> -	ns_now = cycles_2_ns(tsc_now);
> +	if (origin)
> +		ns_now = 0;
> +	else
> +		ns_now = cycles_2_ns(tsc_now);
>
>  	/*
>  	 * Compute a new multiplier as per the above comment and ensure our
> @@ -926,7 +929,7 @@ static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
>  		mark_tsc_unstable("cpufreq changes");
>  	}
>
> -	set_cyc2ns_scale(tsc_khz, freq->cpu);
> +	set_cyc2ns_scale(tsc_khz, freq->cpu, false);
>
>  	return 0;
>  }
> @@ -1199,7 +1202,7 @@ void __init tsc_init(void)
>  	 */
>  	for_each_possible_cpu(cpu) {
>  		cyc2ns_init(cpu);
> -		set_cyc2ns_scale(cpu_khz, cpu);
> +		set_cyc2ns_scale(cpu_khz, cpu, true);
>  	}
>
>  	if (tsc_disabled > 0)
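The "linear continuity fit" suspected above can be illustrated with a small userspace model. This is only a sketch under assumptions: `struct cyc2ns_sketch`, `cyc_to_ns_sketch()` and `set_scale_sketch()` are hypothetical names, the shift of 10 and the sample numbers are arbitrary, and only DIV_ROUND and the `origin` flag are taken from the quoted diff. The model converts cycles with `ns = offset + (cyc * mul >> shift)` and, on a rescale, picks the offset so the result at `tsc_now` equals `ns_now`.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIV_ROUND(n, d) (((n) + ((d) / 2)) / (d))

/* Hypothetical, simplified model of the per-CPU cyc2ns parameters. */
struct cyc2ns_sketch {
    uint64_t mul;     /* multiplier: ns per cycle, scaled by 2^shift */
    unsigned shift;   /* scale factor */
    uint64_t offset;  /* may wrap; unsigned arithmetic keeps that well defined */
};

uint64_t cyc_to_ns_sketch(const struct cyc2ns_sketch *c, uint64_t cyc)
{
    return c->offset + ((cyc * c->mul) >> c->shift);
}

/*
 * Install a new multiplier for a clock running at khz. With origin == false
 * the old function's value at tsc_now is preserved (continuity); with
 * origin == true the timeline is restarted from zero at tsc_now.
 */
void set_scale_sketch(struct cyc2ns_sketch *c, uint64_t khz,
                      uint64_t tsc_now, bool origin)
{
    uint64_t ns_now = origin ? 0 : cyc_to_ns_sketch(c, tsc_now);

    c->mul = DIV_ROUND(1000000ULL << c->shift, khz);
    c->offset = ns_now - ((tsc_now * c->mul) >> c->shift);
}
```

Starting from a bogus multiplier of 4096 with shift 10 (4 ns per cycle), a rescale to 2 GHz at cycle 1000000 keeps the clock at 4000000 ns when `origin` is false, so the inflated starting value is carried forward. With `origin` true the same rescale starts the clock at 0.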
[GIT PULL] f2fs updates for v3.14
Hi Linus, This is a pull request on f2fs updates for v3.14. In this round, a couple of sysfs entries were introduced to tune the f2fs at runtime. In addition, f2fs starts to support inline_data and improves the read/write performance in some workloads by refactoring bio-related flows. This patch-set also includes a number of clean-ups and several bug fixes. Thank you very much. The following changes since commit 413541dd66d51f791a0b169d9b9014e4f56be13c: Linux 3.13-rc5 (2013-12-22 13:08:32 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git tags/for-f2fs-3.14 for you to fetch changes up to bf39c00a9a7f3cdb5ce7d6695d9f044daf8f0b53: f2fs: drop obsolete node page when it is truncated (2014-01-23 08:04:21 +0900) f2fs updates for v3.14 This patch-set includes the following major enhancement patches. o support inline_data o refactor bio operations such as merge operations and rw type assignment o enhance the direct IO path o enhance bio operations o truncate a node page when it becomes obsolete o add sysfs entries: small_discards, max_victim_search, and in-place-update o add a sysfs entry to control max_victim_search The other bug fixes are as follows. o fix a bug in truncate_partial_nodes o avoid warnings during sparse and build process o fix error handling flows o fix potential bit overflows And, there are a bunch of cleanups. 
Changman Lee (7): f2fs: introduce __find_rev_next(_zero)_bit f2fs: improve searching speed of __next_free_blkoff f2fs: simplify IS_DATASEG and IS_NODESEG macro f2fs: send REQ_META or REQ_PRIO when reading meta area f2fs: missing kmem_cache_destroy for discard_entry f2fs: add delimiter to seperate name and value in debug phrase f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH) Chao Yu (20): f2fs: use f2fs_put_page to release page for uniform style f2fs: add a new function to support for merging contiguous read f2fs: adds a tracepoint for submit_read_page f2fs: adds a tracepoint for f2fs_submit_read_bio f2fs: read contiguous sit entry pages by merging for mount performance f2fs: remove unneeded code in punch_hole f2fs: avoid to calculate incorrect max orphan number f2fs: correct type of wait in struct bio_private f2fs: use true and false for boolean variable f2fs: check return value of f2fs_readpage in find_data_page f2fs: convert recover_orphan_inodes to void f2fs: readahead contiguous pages for restore_node_summary f2fs: use inner macro GFP_F2FS_ZERO for simplification f2fs: avoid unneeded page release for correct _count of page f2fs: add unlikely() macro for compiler optimization f2fs: update several comments f2fs: avoid to set wrong pino of inode when rename dir f2fs: check filename length in recover_dentry f2fs: avoid to left uninitialized data in page when read inline data f2fs: avoid to read inline data except first page Chris Fries (1): f2fs: clean checkpatch warnings Fan Li (1): f2fs: merge pages with the same sync_mode flag Gu Zheng (14): f2fs: convert remove_inode_page to void f2fs: convert dev_valid_block_count to void f2fs: convert inc/dec_valid_node_count to inc/dec one count f2fs: simplify write_orphan_inodes for better readable f2fs: move the list_head initialization into the lock protection region f2fs: fix a potential out of range issue f2fs: move all the bio initialization into __bio_alloc f2fs: remove the rw_flag domain from 
f2fs_io_info f2fs: convert max_orphans to a field of f2fs_sb_info f2fs: move grabing orphan pages out of protection region f2fs: move alloc new orphan node out of lock protection region f2fs: use spinlock rather than mutex for better speed f2fs: add help function META_MAPPING f2fs: remove the orphan block page array Huajun Li (6): f2fs: add a new function: f2fs_reserve_block() f2fs: add flags and helpers to support inline data f2fs: add a new mount option: inline_data f2fs: key functions to handle inline data f2fs: handle inline data operations f2fs: update f2fs Documentation Jaegeuk Kim (42): f2fs: add a slab cache entry for small discards f2fs: add key functions for small discards f2fs: add a sysfs entry to control max_discards f2fs: introduce f2fs_issue_discard() to clean up f2fs: add a tracepoint for f2fs_issue_discard f2fs: clean up the do_submit_bio flow f2fs: use sbi->write_mutex for write bios f2fs: disable the extent cache ops on high fragmented files f2fs: introduce a bio array for per-page write bios f2fs: merge read IOs at ra_nat_pages() f2fs: avoid lock
Re: [PATCH] clk: export __clk_get_hw for re-use in others
On Thu, Jan 23, 2014 at 3:11 AM, Mike Turquette wrote:
> On Wed, Jan 22, 2014 at 9:59 AM, Stephen Boyd wrote:
>> On 01/21/14 21:23, SeongJae Park wrote:
>>> On Wed, Jan 22, 2014 at 1:59 PM, Greg KH wrote:
>>>> On Wed, Jan 22, 2014 at 12:05:57PM +0900, SeongJae Park wrote:
>>>>> Dear Greg, Mike,
>>>>>
>>>>> May I ask your answer or other opinion, please?
>>>>
>>>> It's the middle of the merge window, it's not time for new development,
>>>> or much time for free-time for me, sorry.
>>>>
>>>> Feel free to fix it the best way you know how.
>>>
>>> Oops, I'd forgotten about the merge window. Thank you very much for your
>>> kind answer.
>>> Sorry if I bothered you while you're busy.
>>> Because the build problem is not a big deal (it exists only in the
>>> -next tree),
>>> I will wait until the merge window is closed and then fix it again if it
>>> still exists.
>>>
>>
>> I've already sent a patch that exports this and other clock provider
>> functions. Please use this one:
>>
>> https://patchwork.kernel.org/patch/3507921/
>
> I'm going to take Stephen's patch into a fixes branch and send it as
> part of a pull request. Maybe -rc1 or -rc2 at the latest.

Got it. Thank you for letting me know :)

>
> Thanks all.
>
> Regards,
> Mike
>
>>
>> --
>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> hosted by The Linux Foundation
>>
Re: Regression on next-20140116 [Was: [PATCH 3/3 v4] usb: chipidea: hw_phymode_configure moved before ci_usb_phy_init]
On Wed, Jan 22, 2014 at 10:41:33PM +0100, Uwe Kleine-König wrote:
> Hello,
>
> On Wed, Jan 22, 2014 at 10:49:51AM +0100, Uwe Kleine-König wrote:
> > On Tue, Dec 03, 2013 at 04:01:50PM +0800, Chris Ruehl wrote:
> > > usb: chipidea: hw_phymode_configure moved before ci_usb_phy_init
> > > hw_phymode_configure configures the PORTSC registers and allows the
> > > following phy_inits to operate on the right parameters. This fixes a problem
> > > where the ULPI (ISP1504) could not be detected, because the Viewport was not
> > > available and reading the viewport returned 0's only.
> > This patch (or a later revision of it to be more exact) made it into
> > mainline as cd0b42c2a6d2.
> >
> > On an i.MX27 based machine I'm hitting an oops (see below) on
> > next-20140116 + a few patches. (I didn't switch to 3.13+ yet, as I think
> > not everything I need has landed there.) The oops goes away (and still
> > better, lsusb reports my connected devices instead of "unable to
> > initialize libusb: -99") when I do at least one of the following:
> >
> > - set CONFIG_USB_CHIPIDEA=y instead of =m
> > - revert commit
> >   cd0b42c2a6d2 (usb: chipidea: put hw_phymode_configure before
> >   ci_usb_phy_init)
> I debugged that a bit further and the problem is that
> hw_phymode_configure depends on the phy's clk being enabled (i.e.
> usb_ipg_gate) and this is only enforced in ci_usb_phy_init (via
> usb_phy_init -> usb_gen_phy_init). When CONFIG_USB_CHIPIDEA=y the init
> call to disable all unused clocks wasn't run yet and so the clock is
> still on as this is the boot default.

Hi Uwe,

I am a little puzzled by your platform:

- Which phy do you use? The ULPI phy, the internal phy, or another external phy?
- If you use the ULPI phy, why do you still need the nop phy driver?
  Besides, according to Chris's patch, the ULPI can only be accessed after
  hw_phymode_configure?
- Do you have some hardware-related operation in the phy's probe? If so,
  why not move it to phy->init?

Peter

>
> Considering that it's already late today and that I don't know the
> chipidea driver I'm sure there are people who can come up with a better
> patch with less effort than me. Any volunteers?
>
> Best regards
> Uwe
>
> --
> Pengutronix e.K. | Uwe Kleine-König|
> Industrial Linux Solutions | http://www.pengutronix.de/ |
>

--
Best Regards,
Peter Chen
Re: [PATCH 13/15] sched: Use a static_key for sched_clock_stable
On 01/23/14 at 09:53am, Dave Young wrote:
> On 01/22/14 at 10:08pm, Peter Zijlstra wrote:
> > >
> > > I think it's the right region to look through. My current suspect is the
> > > linear continuity fit with the initial 'random' multiplier.
> > >
> > > That initial 'random' multiplier can get us quite high, and we'll fit
> > > the function to match that but continue at a sane rate.
> > >
> > > I'll try and prod a little more later this evening as time permits.
> >
> > Does this cure things?
>
> Peter, the odd timestamp still happens with this patch for me.

Hmm, it seems my physical machine is booting fine with this patch. The KVM
guest problem still exists, but that KVM thing might be a different problem.

> >
> > ---
> >  arch/x86/kernel/tsc.c | 11 +++
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index a3acbac2ee72..bb04148c5fe0 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -237,7 +237,7 @@ static inline unsigned long long cycles_2_ns(unsigned long long cyc)
> >  /* XXX surely we already have this someplace in the kernel?! */
> >  #define DIV_ROUND(n, d) (((n) + ((d) / 2)) / (d))
> >
> > -static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
> > +static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu, bool origin)
> >  {
> >  	unsigned long long tsc_now, ns_now;
> >  	struct cyc2ns_data *data;
> > @@ -252,7 +252,10 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
> >  	data = cyc2ns_write_begin(cpu);
> >
> >  	rdtscll(tsc_now);
> > -	ns_now = cycles_2_ns(tsc_now);
> > +	if (origin)
> > +		ns_now = 0;
> > +	else
> > +		ns_now = cycles_2_ns(tsc_now);
> >
> >  	/*
> >  	 * Compute a new multiplier as per the above comment and ensure our
> > @@ -926,7 +929,7 @@ static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
> >  		mark_tsc_unstable("cpufreq changes");
> >  	}
> >
> > -	set_cyc2ns_scale(tsc_khz, freq->cpu);
> > +	set_cyc2ns_scale(tsc_khz, freq->cpu, false);
> >
> >  	return 0;
> >  }
> > @@ -1199,7 +1202,7 @@ void __init tsc_init(void)
> >  	 */
> >  	for_each_possible_cpu(cpu) {
> >  		cyc2ns_init(cpu);
> > -		set_cyc2ns_scale(cpu_khz, cpu);
> > +		set_cyc2ns_scale(cpu_khz, cpu, true);
> >  	}
> >
> >  	if (tsc_disabled > 0)
Re: mm: BUG: Bad rss-counter state
On 01/22/2014 08:39 PM, David Rientjes wrote:
> On Wed, 22 Jan 2014, Sasha Levin wrote:
>
>> Hi all,
>>
>> While fuzzing with trinity running inside a KVM tools guest using latest -next kernel,
>> I've stumbled on a "mm: BUG: Bad rss-counter state" error which was pretty non-obvious
>> in the mix of the kernel spew (why?).
>
> It's not a fatal condition and there's only a few possible stack traces
> that could be emitted during the exit() path. I don't see how we could
> make it more visible other than its log-level which is already KERN_ALERT.

Would it make sense to add a VM_BUG_ON() to make it more obvious when we have
CONFIG_DEBUG_VM enabled? Many of the VM_BUG_ON test cases are non-fatal too, and
it would make this issue easier to spot.

>> I've added a small BUG() after the printk() in check_mm(), and here's the full
>> output:
>
> Worst place to add it :) At line 562 of kernel/fork.c in linux-next
> you're going to hit BUG() when there may be other counters that are also
> bad and they don't get printed.

I gave the condition before the curly braces :)

	if (unlikely(x)) {
		printk(KERN_ALERT "BUG: Bad rss-counter state "
				  "mm:%p idx:%d val:%ld\n", mm, i, x);
		BUG();
	}

>> [ 318.334905] BUG: Bad rss-counter state mm:8801e6dec000 idx:0 val:1
>
> So our mm has a non-zero MM_FILEPAGES count, but there's nothing that was
> cited that would tell us what that is so there's not much to go on, unless
> someone already recognizes this as another issue. Is this reproducible on
> 3.13 or only on linux-next?

Yup, I see it in v3.13 too, which is odd.

Thanks,
Sasha
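David's objection to a BUG() inside the per-counter branch is that the first bad counter stops the loop before later ones are printed. A minimal userspace sketch of the alternative, reporting every bad counter first and letting the caller fail once afterwards, might look like this. `NR_COUNTERS_SKETCH` and `report_bad_counters()` are hypothetical names, and fprintf() stands in for printk(); this is not the kernel's check_mm().

```c
#include <stdio.h>

#define NR_COUNTERS_SKETCH 3  /* hypothetical; stands in for the number of rss counters */

/*
 * Report every non-zero counter and return how many were bad, so the
 * caller can decide once, after the loop, whether to treat it as fatal.
 */
int report_bad_counters(const long counters[NR_COUNTERS_SKETCH])
{
    int bad = 0;

    for (int i = 0; i < NR_COUNTERS_SKETCH; i++) {
        if (counters[i] != 0) {
            fprintf(stderr, "BUG: Bad rss-counter state idx:%d val:%ld\n",
                    i, counters[i]);
            bad++;
        }
    }
    return bad;
}
```

A debugging build could then assert or abort on a non-zero return value, which keeps the full list of bad counters in the log before the stop.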
Re: mm: BUG: Bad rss-counter state
On 01/22/2014 08:52 PM, Dave Jones wrote:
> Sasha, is this the current git tree version of Trinity?
> (I'm wondering if yesterday's munmap changes might be tickling this bug).

Ah yes, my tree has the munmap patch from yesterday, which would explain why we
started seeing this issue just now.

Thanks,
Sasha
Re: [ANNOUNCE] Git v1.9-rc0
On Wed, Jan 22, 2014 at 08:30:30PM +, Ken Moffat wrote:
> Two questions: Does regenerating (e.g. if the tarball has dropped
> out of the cache) change its sums (md5sum or similar) ? In (beyond)
> linuxfromscratch we use md5sums to verify that a tarball has not
> changed.

The tarballs we auto-generate from tags are cached, but they can change if the
cached version expires _and_ the archive-generation code changes. We use "git
archive" to generate the tarballs themselves, and then gzip them with "gzip -n".
So it should be consistent from run to run. However, very occasionally there are
bugfixes in "git archive" which can affect the output. E.g., commit 22f0dcd
(archive-tar: split long paths more carefully, 2013-01-05) changes the
representation of certain long paths, and generating a tarball with and without
it will result in different checksums (for some repos).

So if you are planning on baking md5sums into a package-build system, it is much
better to point at "official" releases which are rolled once by the project
maintainer, rather than the automatic tag page.

Junio, since you prepare such tarballs[1] anyway for kernel.org, it might be
worth uploading them to the "Releases" page of git/git. I imagine there is a
programmatic way to do so via GitHub's API, but I don't know offhand. I can look
into it if you are interested.

-Peff
Re: [PATCH v4] ACPI: Fix acpi_evaluate_object() return value check
On 2014/1/23 5:37, Bjorn Helgaas wrote:
> On Mon, Jan 20, 2014 at 7:46 PM, Yijing Wang wrote:
>> Since acpi_evaluate_object() returns acpi_status and not plain int,
>> ACPI_FAILURE() should be used for checking its return value.
>>
>> Reviewed-by: Jani Nikula
>> Signed-off-by: Yijing Wang
>> ---
>> v3->v4: Fix spell error, add Jani Nikula reviewed-by.
>> v2->v3: Fix compile error pointed out by Hanjun.
>> v1->v2: Add CC to related subsystem MAINTAINERS
>> ---
>>  drivers/gpu/drm/i915/intel_acpi.c              | 24 ++--
>>  drivers/gpu/drm/nouveau/core/subdev/mxm/base.c |  9 +
>>  drivers/gpu/drm/nouveau/nouveau_acpi.c         | 23 +--
>>  drivers/pci/pci-label.c                        |  9 ++---
>
> For the drivers/pci/pci-label.c part,
>
> Acked-by: Bjorn Helgaas

Thanks.

>
>> +	status = acpi_evaluate_object(handle, "_DSM", , );
>> +	if (ACPI_FAILURE(status)) {
>> +		DRM_DEBUG_DRIVER("failed to evaluate _DSM: %s\n",
>> +				 acpi_format_exception(status));
>
> It's too bad there isn't an easy way to produce more informative error
> messages, e.g., by including a namespace path or something. A message
> like:
>
>     failed to evaluate _DSM: A requested entity is not found
>
> is only useful if there's enough context to figure out what's going on.

Yes, I will add the namespace path into the debug info, thanks!

>
> Bjorn
>
> .
>

--
Thanks!
Yijing
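The two points in this exchange, checking the status through a failure macro rather than treating it as a plain int, and naming the object in the error message, can be sketched outside the kernel as follows. Nothing here is the real ACPICA API: `acpi_status_sketch`, the `MOCK_*` macros, `mock_evaluate()` and `call_dsm_sketch()` are all made-up stand-ins, and the namespace paths are invented for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Mock definitions only; the names loosely mimic ACPICA but are not its headers. */
typedef unsigned int acpi_status_sketch;
#define MOCK_AE_OK           0x0000u
#define MOCK_AE_NOT_FOUND    0x0005u
#define MOCK_ACPI_FAILURE(s) ((s) != MOCK_AE_OK)

/* Hypothetical evaluator: pretends only "\_SB.PCI0" implements _DSM. */
static acpi_status_sketch mock_evaluate(const char *path, const char *method)
{
    if (strcmp(path, "\\_SB.PCI0") == 0 && strcmp(method, "_DSM") == 0)
        return MOCK_AE_OK;
    return MOCK_AE_NOT_FOUND;
}

/*
 * Evaluate _DSM on path. On failure, write a message that names the path,
 * so the reader of the log can tell which object was missing the method.
 * Returns 0 on success, -1 on failure.
 */
int call_dsm_sketch(const char *path, char *msg, size_t msglen)
{
    acpi_status_sketch status = mock_evaluate(path, "_DSM");

    if (MOCK_ACPI_FAILURE(status)) {
        snprintf(msg, msglen, "failed to evaluate %s._DSM: status 0x%04x",
                 path, status);
        return -1;
    }
    msg[0] = '\0';
    return 0;
}
```

Because the status type is unsigned, a check such as `status < 0` could never fire, which is one reason a dedicated failure macro is the safer idiom for this kind of return value.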
Re: [PATCH 2/3] ARM: kexec: copying code to ioremapped area
On 2014/1/22 21:27, Russell King - ARM Linux wrote:
> On Wed, Jan 22, 2014 at 07:25:15PM +0800, Wang Nan wrote:
>> ARM's kdump is actually corrupted (at least for omap4460), mainly because of
>> cache problem: flush_icache_range can't reliably ensure the copied data
>> correctly goes into RAM.
>
> Quite right too. Your mistake here is thinking that flush_icache_range()
> should push it to RAM. That's incorrect.
>
> flush_icache_range() is there to deal with such things as loadable modules
> and self modifying code, where the MMU is not being turned off. Hence, it
> only flushes to the point of coherency between the I and D caches, and
> any further levels of cache between that point and memory are not touched.
> Why should it touch any more levels - it's not the function's purpose.
>
>> After mmu turned off and jump to the trampoline, kexec always failed due
>> to random undef instructions.
>
> We already have code in the kernel which deals with shutting the MMU off.
> An instance of how this can be done is illustrated in the soft_restart()
> code path, and kexec already uses this.
>
> One of the first things soft_restart() does is turn off the outer cache -
> which OMAP4 does have, but this can only be done if there is a single CPU
> running. If there's multiple CPUs running, then the outer cache can't be
> disabled, and that's the most likely cause of the problem you're seeing.
>

You are right, commit b25f3e1c (OMAP4/highbank: Flush L2 cache before
disabling) solves my problem: it flushes the outer cache before disabling it.
I have tested it in UP and SMP situations and it works (actually, omap4 is not
ready to support kexec in the SMP case; I inserted an empty cpu_kill() to make
it work), so the first 2 patches are unneeded.

What about the 3rd one (ARM: allow kernel to be loaded in middle of phymem)?
Re: mm: BUG: Bad rss-counter state
On Wed, Jan 22, 2014 at 09:16:03PM -0500, Sasha Levin wrote:
> On 01/22/2014 08:52 PM, Dave Jones wrote:
> > Sasha, is this the current git tree version of Trinity ?
> > (I'm wondering if yesterdays munmap changes might be tickling this bug).
>
> Ah yes, my tree has the munmap patch from yesterday, which would explain why we
> started seeing this issue just now.

So that change is basically allowing trinity to munmap just part of a prior mmap.
So it may do things like..

mmap   |--|
munmap |XXX---|
munmap |--XXX-|

i.e., it might try unmapping some pages more than once, and may even overlap
prior munmaps. Until yesterday's change, it would only munmap the entire mmap.

There's no easy way to tell exactly what happened without a trinity log of course.

Dave
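The access pattern Dave describes, unmapping part of a mapping and then a second range that overlaps the first, can be reproduced by hand. The following is a small userspace sketch, not Trinity's code: `partial_munmap_demo()` is a hypothetical helper, and the six-page size and the chosen ranges are arbitrary. It relies on Linux accepting munmap() on a range that contains already-unmapped pages.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map six anonymous pages, then unmap two overlapping sub-ranges:
 * pages 0-2 first, then pages 2-4 (page 2 is already gone by then).
 * Returns 0 if every call succeeded, -1 otherwise.
 */
int partial_munmap_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = 6 * (size_t)page;
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Touch every page so each one is actually populated. */
    for (size_t off = 0; off < len; off += (size_t)page)
        base[off] = 1;

    if (munmap(base, 3 * (size_t)page) != 0)                      /* pages 0-2 */
        return -1;
    if (munmap(base + 2 * (size_t)page, 3 * (size_t)page) != 0)   /* pages 2-4 */
        return -1;
    if (munmap(base + 5 * (size_t)page, (size_t)page) != 0)       /* page 5 */
        return -1;
    return 0;
}
```

This only exercises the syscall pattern; it makes no claim about whether it triggers the rss-counter report discussed in the thread.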
Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
2014/1/21 Trond Myklebust :
>
> On Jan 21, 2014, at 3:08, shaobingqing wrote:
>
>> 2014/1/21 Trond Myklebust :
>>> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
>>>> In the current code, only one struct rpc_rqst is preallocated. If one
>>>> callback request is received from two sk_buffs, xprt_alloc_bc_request
>>>> would be executed two times with the same transport->xid. The first time,
>>>> xprt_alloc_bc_request will allocate one struct rpc_rqst and the
>>>> TCP_RCV_COPY_DATA bit of transport->tcp_flags will not be cleared. The
>>>> second time, xprt_alloc_bc_request cannot allocate a struct rpc_rqst any
>>>> more and a NULL pointer will be returned, then xprt_force_disconnect
>>>> occurs. I think one callback request can be allowed to be received from
>>>> two sk_buffs.
>>>>
>>>> Signed-off-by: shaobingqing
>>>> ---
>>>>  net/sunrpc/xprtsock.c | 11 +--
>>>>  1 files changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>>> index ee03d35..606950d 100644
>>>> --- a/net/sunrpc/xprtsock.c
>>>> +++ b/net/sunrpc/xprtsock.c
>>>> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>>>>  	struct sock_xprt *transport =
>>>>  				container_of(xprt, struct sock_xprt, xprt);
>>>>  	struct rpc_rqst *req;
>>>> +	static struct rpc_rqst *req_partial;
>>>> +
>>>> +	if (req_partial == NULL)
>>>> +		req = xprt_alloc_bc_request(xprt);
>>>> +	else if (req_partial->rq_xid == transport->tcp_xid)
>>>> +		req = req_partial;
>>>
>>> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
>>> req will be undefined. Either way, you cannot use a static variable for
>>> storage here: that isn't re-entrant.
>>
>> Because the metadata server only has one slot for backchannel requests,
>> req_partial->rq_xid == transport->tcp_xid always holds if the callback
>> request is just being split across two sk_buffs. But req_partial->rq_xid !=
>> transport->tcp_xid may also happen in some special cases, such as when
>> retransmission occurs?
>
> If the server retransmits, then it is broken. The NFSv4.1 protocol does not
> allow it to retransmit unless the connection breaks.

What I said above is bogus. As far as I can see, if one callback request is
split into two sk_buffs, the function xs_tcp_read_callback will be called two
times with the same rpc_xprt and the same xid. If between the two calls there
is another call with the same rpc_xprt but a different xid, we consider it to
be another callback request from the same server, on the condition that there
is no retransmission in our environment. But this might not happen because
there is only one callback slot in each server.

>
>> If one callback request is split into two sk_buffs, xs_tcp_read_callback
>> will be executed two times. The req_partial should be a static variable,
>> because the second execution of xs_tcp_read_callback should use
>> the rpc_rqst allocated for the first execution, which saves the information
>> copied from the first sk_buff.
>
> No! This is a multi-threaded/process environment which can support multiple
> connections. It is a bug to use a static variable.

I think I misunderstood the question. A static variable cannot be used here.
Perhaps we should define a variable for each rpc_client (or rpc_xprt).

>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
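The closing suggestion, keeping the partially received request per transport instead of in a function-local static, can be sketched in plain C. This is an illustration only, not the SUNRPC code: `struct bc_request_sketch`, `struct transport_sketch` and both functions are hypothetical, and returning NULL on an XID mismatch is just one possible policy that the thread does not settle.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified request and transport types. */
struct bc_request_sketch {
    uint32_t xid;    /* XID this request was allocated for */
    size_t copied;   /* bytes copied so far across fragments */
};

struct transport_sketch {
    uint32_t tcp_xid;                   /* XID of the record being received */
    struct bc_request_sketch *partial;  /* in-flight request, or NULL */
};

/*
 * Return the request to copy the current fragment into. State lives in the
 * transport, so two connections never share it. Returns NULL when nothing
 * usable is available (allocation failure, or a different XID arrives while
 * one request is still in flight).
 */
struct bc_request_sketch *get_callback_request(struct transport_sketch *t)
{
    if (t->partial != NULL)
        return t->partial->xid == t->tcp_xid ? t->partial : NULL;

    t->partial = calloc(1, sizeof(*t->partial));
    if (t->partial != NULL)
        t->partial->xid = t->tcp_xid;
    return t->partial;
}

/* Called once the whole request has been received and handed off. */
void finish_callback_request(struct transport_sketch *t)
{
    free(t->partial);
    t->partial = NULL;
}
```

Unlike a static pointer, this gives every branch a defined result and lets two connections each receive a split request at the same time. Real code would still need whatever locking the transport already uses.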
Re: Internal error: Oops: 17 [#1] ARM
On Wed, Jan 22, 2014 at 9:49 PM, John Tobias wrote:
> Hello all,
>
> Just to confirm that the error I posted previously exist in 3.13
> released. Just be noted that some patches related to eMMC/sdhci has
> been applied in order to boot the 3.13 on my board.
> Addition to that, I was getting additional errors (please see below):
> - It happened during the reboot.
>
> Cc'ng Dong Aisheng.

What are the steps to reproduce this?

Which SoC are you using?

Regards,

Fabio Estevam
[PATCH -tip 4/8] perf-probe: Use the actual address instead of the symbol name
Since several local symbols can have same name (e.g. t_show), we need to use the actual address instead of symbol name for those points. Note that this works only with debuginfo. E.g. without this change; # ./perf probe -a t_show \$vars Added new events: probe:t_show (on t_show with $vars) probe:t_show_1 (on t_show with $vars) probe:t_show_2 (on t_show with $vars) probe:t_show_3 (on t_show with $vars) You can now use it in all perf tools, such as: perf record -e probe:t_show_3 -aR sleep 1 OK, we have 4 different t_show()s. All functions have different arguments as below; # cat /sys/kernel/debug/tracing/kprobe_events p:probe/t_show t_show m=%di:u64 v=%si:u64 p:probe/t_show_1 t_show m=%di:u64 v=%si:u64 t=%si:u64 p:probe/t_show_2 t_show m=%di:u64 v=%si:u64 fmt=%si:u64 p:probe/t_show_3 t_show m=%di:u64 v=%si:u64 file=%si:u64 However, all of them have been put on the *same* address. # cat /sys/kernel/debug/kprobes/list 810d9720 k t_show+0x0[DISABLED] 810d9720 k t_show+0x0[DISABLED] 810d9720 k t_show+0x0[DISABLED] 810d9720 k t_show+0x0[DISABLED] With this change; # ./perf probe -a t_show \$vars Added new events: probe:t_show (on t_show with $vars) probe:t_show_1 (on t_show with $vars) probe:t_show_2 (on t_show with $vars) probe:t_show_3 (on t_show with $vars) You can now use it in all perf tools, such as: perf record -e probe:t_show_3 -aR sleep 1 # cat /sys/kernel/debug/tracing/kprobe_events p:probe/t_show 0x810d9720 m=%di:u64 v=%si:u64 p:probe/t_show_1 0x810e2e40 m=%di:u64 v=%si:u64 t=%si:u64 p:probe/t_show_2 0x810ece30 m=%di:u64 v=%si:u64 fmt=%si:u64 p:probe/t_show_3 0x810f4ad0 m=%di:u64 v=%si:u64 file=%si:u64 # cat /sys/kernel/debug/kprobes/list 810e2e40 k t_show+0x0[DISABLED] 810ece30 k t_show+0x0[DISABLED] 810f4ad0 k t_show+0x0[DISABLED] 810d9720 k t_show+0x0[DISABLED] This time, each event is put in different address correctly. 
Note that currently this doesn't support address-based probes on modules
(thus the probes on modules are symbol based), since that requires a
relative address probe syntax for kprobe-tracer, which isn't implemented yet.

One more note: this allows us to put events on the correct addresses, but the
--list option should be updated to show the correct corresponding source code.

Signed-off-by: Masami Hiramatsu
---
 tools/perf/util/probe-event.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 2fb4486..92ab688 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1529,20 +1529,27 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	if (buf == NULL)
 		return NULL;
 
-	if (tev->uprobes)
-		len = e_snprintf(buf, MAX_CMDLEN, "%c:%s/%s %s:%s",
-				 tp->retprobe ? 'r' : 'p',
-				 tev->group, tev->event,
+	len = e_snprintf(buf, MAX_CMDLEN, "%c:%s/%s ", tp->retprobe ? 'r' : 'p',
+			 tev->group, tev->event);
+	if (len <= 0)
+		goto error;
+
+	/* Use the real address, except for kernel modules */
+	if (tp->address && !(tp->module && !tev->uprobes))
+		ret = e_snprintf(buf + len, MAX_CMDLEN, "%s%s0x%lx",
+				 tp->module ?: "", tp->module ? ":" : "",
+				 tp->address);
+	else if (tev->uprobes)
+		ret = e_snprintf(buf + len, MAX_CMDLEN, "%s:%s",
 				 tp->module, tp->symbol);
 	else
-		len = e_snprintf(buf, MAX_CMDLEN, "%c:%s/%s %s%s%s+%lu",
-				 tp->retprobe ? 'r' : 'p',
-				 tev->group, tev->event,
+		ret = e_snprintf(buf + len, MAX_CMDLEN, "%s%s%s+%lu",
 				 tp->module ?: "", tp->module ? ":" : "",
 				 tp->symbol, tp->offset);
 
-	if (len <= 0)
+	if (ret <= 0)
 		goto error;
+	len += ret;
 
 	for (i = 0; i < tev->nargs; i++) {
 		ret = synthesize_probe_trace_arg(&tev->args[i], buf + len,
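The place-selection logic in the hunk above can be tried in isolation. What follows is only a minimal sketch, not perf code: struct probe_point and format_probe_def() are hypothetical stand-ins for struct probe_trace_point and synthesize_probe_trace_command(), plain snprintf() replaces e_snprintf(), and the sketch passes the remaining buffer size to the second call, which is a choice made here rather than something taken from the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for perf's struct probe_trace_point. */
struct probe_point {
	const char *module;	/* module name or binary path, NULL for core kernel */
	const char *symbol;	/* symbol name, may be NULL when only address is known */
	unsigned long offset;	/* offset from the symbol */
	unsigned long address;	/* resolved address, 0 if unknown */
	int retprobe;		/* non-zero for a return probe */
};

/*
 * Write "<p|r>:<group>/<event> <place>" into buf.
 * Returns the total length written, or -1 on error or truncation.
 */
int format_probe_def(char *buf, size_t size, const struct probe_point *tp,
		     const char *group, const char *event, int is_uprobe)
{
	const char *mod = tp->module ? tp->module : "";
	const char *sep = tp->module ? ":" : "";
	int len, ret;

	len = snprintf(buf, size, "%c:%s/%s ", tp->retprobe ? 'r' : 'p',
		       group, event);
	if (len <= 0 || (size_t)len >= size)
		return -1;

	if (tp->address && !(tp->module && !is_uprobe))
		/* Use the real address, except for kernel modules */
		ret = snprintf(buf + len, size - len, "%s%s0x%lx",
			       mod, sep, tp->address);
	else if (is_uprobe)
		/* uprobe without an address: binary:symbol */
		ret = snprintf(buf + len, size - len, "%s:%s", mod, tp->symbol);
	else
		/* symbol-based kprobe, optionally prefixed by the module */
		ret = snprintf(buf + len, size - len, "%s%s%s+%lu",
			       mod, sep, tp->symbol, tp->offset);

	if (ret <= 0 || (size_t)ret >= size - len)
		return -1;
	return len + ret;
}
```

With a core-kernel point whose address is 0x810e2e40, this yields "p:probe/t_show_1 0x810e2e40"; with a module point it falls back to the "module:symbol+offset" form, as the commit message describes.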
[PATCH -tip 6/8] perf-probe: Show symbol+offset for address only kprobes
Show symbol+offset information for address-only kprobe events in the --list
operation when debuginfo is not available. Currently those events are shown
by the address itself. With this change, perf probe finds the symbols at
those addresses and shows them.

E.g. without this change (when debuginfo is not available);

  # ./perf probe -l
    probe:t_show         (on 0x810d9720 with m v)
    probe:t_show_1       (on 0x810e2e40 with m v t)
    probe:t_show_2       (on 0x810ece30 with m v fmt)
    probe:t_show_3       (on 0x810f4ad0 with m v file)

With this change;

  # ./perf probe -l
    probe:t_show         (on t_show with m v)
    probe:t_show_1       (on t_show with m v t)
    probe:t_show_2       (on t_show with m v fmt)
    probe:t_show_3       (on t_show with m v file)

Signed-off-by: Masami Hiramatsu
---
 tools/perf/util/probe-event.c | 35 +++
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 3470934..bf1d73b 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -118,6 +118,7 @@ static void exit_symbol_maps(void)
 	symbol__exit();
 }
 
+/* Caller must call init_symbol_maps before use this */
 static struct symbol *__find_kernel_function_by_name(const char *name,
 						     struct map **mapp)
 {
@@ -125,6 +126,12 @@ static struct symbol *__find_kernel_function_by_name(const char *name,
 						     NULL);
 }
 
+/* Caller must call init_symbol_maps before use this */
+static struct symbol *__find_kernel_function(u64 addr, struct map **mapp)
+{
+	return machine__find_kernel_function(host_machine, addr, mapp, NULL);
+}
+
 static struct map *kernel_get_module_map(const char *module)
 {
 	struct rb_node *nd;
@@ -222,17 +229,29 @@ static int convert_to_perf_probe_point(struct probe_trace_point *tp,
 {
 	char buf[128];
 	int ret;
-
-	if (tp->symbol) {
+	struct symbol *sym;
+	struct map *map;
+	u64 addr;
+
+	if (!tp->symbol) {
+		sym = __find_kernel_function(tp->address, &map);
+		if (sym) {
+			pp->function = strdup(sym->name);
+			addr = map->unmap_ip(map, sym->start);
+			pp->offset = tp->address - addr;
+		} else {
+			ret = e_snprintf(buf, 128, "0x%" PRIx64,
+					 (u64)tp->address);
+			if (ret < 0)
+				return ret;
+			pp->function = strdup(buf);
+			pp->offset = 0;
+		}
+	} else {
 		pp->function = strdup(tp->symbol);
 		pp->offset = tp->offset;
-	} else {
-		ret = e_snprintf(buf, 128, "0x%" PRIx64, (u64)tp->address);
-		if (ret < 0)
-			return ret;
-		pp->function = strdup(buf);
-		pp->offset = 0;
 	}
+
 	if (pp->function == NULL)
 		return -ENOMEM;
 
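The address-to-symbol+offset step used by this patch can be illustrated outside perf. This is only a sketch under assumptions: struct sym and addr_to_symbol() are hypothetical names, the table is a plain sorted array with non-overlapping ranges rather than perf's rbtree-backed map, and the symbol end addresses used by any caller are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record; perf's struct symbol carries more fields. */
struct sym {
	uint64_t start;		/* first address covered by the symbol */
	uint64_t end;		/* one past the last covered address */
	const char *name;
};

/*
 * Find the symbol covering addr in a table sorted by start address
 * (ranges must not overlap). On success, store addr - start in *offset
 * and return the symbol; return NULL if no symbol covers addr.
 */
const struct sym *addr_to_symbol(const struct sym *tab, size_t n,
				 uint64_t addr, uint64_t *offset)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < tab[mid].start) {
			hi = mid;		/* look in the lower half */
		} else if (addr >= tab[mid].end) {
			lo = mid + 1;		/* look in the upper half */
		} else {
			*offset = addr - tab[mid].start;
			return &tab[mid];
		}
	}
	return NULL;
}
```

When the lookup fails, a caller would fall back to printing the raw address, which is what the else branch in the patch does with e_snprintf().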
[PATCH -tip 1/8] [BUGFIX] perf-probe: Fix to do exit call for symbol maps
Some perf-probe commands call symbol__init() but never make the matching
exit call. This fixes that by calling symbol__exit() and releasing the
machine if needed.
This also merges init_vmlinux() and init_user_exec() because both of them
do similar things. (init_user_exec() just skips initializing the
vmlinux-related symbol maps.)

Signed-off-by: Masami Hiramatsu
---
 tools/perf/util/probe-event.c | 110 +++--
 1 file changed, 61 insertions(+), 49 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index a8a9b6c..14c649df 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -73,31 +73,35 @@ static char *synthesize_perf_probe_point(struct perf_probe_point *pp);
 static int convert_name_to_addr(struct perf_probe_event *pev,
 				const char *exec);
 static void clear_probe_trace_event(struct probe_trace_event *tev);
-static struct machine machine;
+static struct machine *host_machine;
 
 /* Initialize symbol maps and path of vmlinux/modules */
-static int init_vmlinux(void)
+static int init_symbol_maps(bool user_only)
 {
 	int ret;
 
 	symbol_conf.sort_by_name = true;
-	if (symbol_conf.vmlinux_name == NULL)
-		symbol_conf.try_vmlinux_path = true;
-	else
-		pr_debug("Use vmlinux: %s\n", symbol_conf.vmlinux_name);
+	if (user_only)
+		symbol_conf.try_vmlinux_path = false;
+	else {
+		if (symbol_conf.vmlinux_name == NULL)
+			symbol_conf.try_vmlinux_path = true;
+		else
+			pr_debug("Use vmlinux: %s\n", symbol_conf.vmlinux_name);
+	}
 	ret = symbol__init();
 	if (ret < 0) {
 		pr_debug("Failed to init symbol map.\n");
 		goto out;
 	}
 
-	ret = machine__init(&machine, "", HOST_KERNEL_ID);
-	if (ret < 0)
-		goto out;
-
-	if (machine__create_kernel_maps(&machine) < 0) {
-		pr_debug("machine__create_kernel_maps() failed.\n");
-		goto out;
+	if (host_machine || user_only)	/* already initialized */
+		return 0;
+	host_machine = machine__new_host();
+	if (!host_machine) {
+		pr_debug("machine__new_host() failed.\n");
+		symbol__exit();
+		ret = -1;
 	}
 out:
 	if (ret < 0)
@@ -105,21 +109,30 @@ out:
 	return ret;
 }
 
+static void exit_symbol_maps(void)
+{
+	if (host_machine) {
+		machine__delete(host_machine);
+		host_machine = NULL;
+	}
+	symbol__exit();
+}
+
 static struct symbol *__find_kernel_function_by_name(const char *name,
 						     struct map **mapp)
 {
-	return machine__find_kernel_function_by_name(&machine, name, mapp,
+	return machine__find_kernel_function_by_name(host_machine, name, mapp,
 						     NULL);
 }
 
 static struct map *kernel_get_module_map(const char *module)
 {
 	struct rb_node *nd;
-	struct map_groups *grp = &machine.kmaps;
+	struct map_groups *grp = &host_machine->kmaps;
 
 	/* A file path -- this is an offline module */
 	if (module && strchr(module, '/'))
-		return machine__new_module(&machine, 0, module);
+		return machine__new_module(host_machine, 0, module);
 
 	if (!module)
 		module = "kernel";
@@ -141,7 +154,7 @@ static struct dso *kernel_get_module_dso(const char *module)
 	const char *vmlinux_name;
 
 	if (module) {
-		list_for_each_entry(dso, &machine.kernel_dsos, node) {
+		list_for_each_entry(dso, &host_machine->kernel_dsos, node) {
 			if (strncmp(dso->short_name + 1, module,
 				    dso->short_name_len - 2) == 0)
 				goto found;
@@ -150,7 +163,7 @@ static struct dso *kernel_get_module_dso(const char *module)
 		return NULL;
 	}
 
-	map = machine.vmlinux_maps[MAP__FUNCTION];
+	map = host_machine->vmlinux_maps[MAP__FUNCTION];
 	dso = map->dso;
 
 	vmlinux_name = symbol_conf.vmlinux_name;
@@ -173,20 +186,6 @@ const char *kernel_get_module_path(const char *module)
 	return (dso) ? dso->long_name : NULL;
 }
 
-static int init_user_exec(void)
-{
-	int ret = 0;
-
-	symbol_conf.try_vmlinux_path = false;
-	symbol_conf.sort_by_name = true;
-	ret = symbol__init();
-
-	if (ret < 0)
-		pr_debug("Failed to init symbol map.\n");
-
-	return ret;
-}
-
 static int convert_exec_to_group(const char *exec, char **result)
 {
 	char *ptr1, *ptr2, *exec_copy;
@@ -563,7 +562,7 @@ static int _show_one_line(FILE *fp, int l, bool skip, bool show_num)
  * Show line-range always requires debuginfo to find source file and
  * line number.
  */
-int show_line_range(struct line_range *lr, const char *module)
+static int
[PATCH -tip 7/8] perf-probe: Show source-level or symbol-level info for uprobes
Show source-level or symbol-level information for uprobe events.

Without this change;

  # ./perf probe -l
    probe_perf:dso__load_vmlinux (on 0x0006d110 in /kbuild/ksrc/linux-3/tools/perf/perf)

With this change;

  # ./perf probe -l
    probe_perf:dso__load_vmlinux (on dso__load_vmlinux@util/symbol.c in /kbuild/ksrc/linux-3/tools/perf/perf)

Signed-off-by: Masami Hiramatsu
---
 tools/perf/util/probe-event.c | 149 -
 1 file changed, 88 insertions(+), 61 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index bf1d73b..84c1807 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -224,42 +224,6 @@ out:
 	return ret;
 }
 
-static int convert_to_perf_probe_point(struct probe_trace_point *tp,
-					struct perf_probe_point *pp)
-{
-	char buf[128];
-	int ret;
-	struct symbol *sym;
-	struct map *map;
-	u64 addr;
-
-	if (!tp->symbol) {
-		sym = __find_kernel_function(tp->address, &map);
-		if (sym) {
-			pp->function = strdup(sym->name);
-			addr = map->unmap_ip(map, sym->start);
-			pp->offset = tp->address - addr;
-		} else {
-			ret = e_snprintf(buf, 128, "0x%" PRIx64,
-					 (u64)tp->address);
-			if (ret < 0)
-				return ret;
-			pp->function = strdup(buf);
-			pp->offset = 0;
-		}
-	} else {
-		pp->function = strdup(tp->symbol);
-		pp->offset = tp->offset;
-	}
-
-	if (pp->function == NULL)
-		return -ENOMEM;
-
-	pp->retprobe = tp->retprobe;
-
-	return 0;
-}
-
 #ifdef HAVE_DWARF_SUPPORT
 /* Open new debuginfo of given module */
 static struct debuginfo *open_debuginfo(const char *module)
@@ -285,8 +249,9 @@ static struct debuginfo *open_debuginfo(const char *module)
  * Convert trace point to probe point with debuginfo
  * Currently only handles kprobes.
  */
-static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
-					struct perf_probe_point *pp)
+static int find_perf_probe_point_from_dwarf(struct probe_trace_point *tp,
+					    struct perf_probe_point *pp,
+					    bool is_kprobe)
 {
 	struct symbol *sym;
 	struct map *map;
@@ -306,7 +271,11 @@ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
 	pr_debug("try to find information at %" PRIx64 " in %s\n", addr,
 		 tp->module ? : "kernel");
 
-	dinfo = debuginfo__new_online_kernel(addr);
+	if (is_kprobe)
+		dinfo = debuginfo__new_online_kernel(addr);
+	else
+		dinfo = open_debuginfo(tp->module);
+
 	if (dinfo) {
 		ret = debuginfo__find_probe_point(dinfo,
 						 (unsigned long)addr, pp);
@@ -319,9 +288,8 @@ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
 
 	if (ret <= 0) {
 error:
-		pr_debug("Failed to find corresponding probes from "
-			 "debuginfo. Use kprobe event information.\n");
-		return convert_to_perf_probe_point(tp, pp);
+		pr_debug("Failed to find corresponding probes from debuginfo.\n");
+		return ret ? : -ENOENT;
 	}
 
 	pp->retprobe = tp->retprobe;
@@ -776,21 +744,12 @@ out:
 
 #else	/* !HAVE_DWARF_SUPPORT */
 
-static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
-					struct perf_probe_point *pp)
+static int
+find_perf_probe_point_from_dwarf(struct probe_trace_point *tp __maybe_unused,
+				 struct perf_probe_point *pp __maybe_unused,
+				 bool is_kprobe __maybe_unused)
 {
-	struct symbol *sym;
-
-	if (tp->symbol) {
-		sym = __find_kernel_function_by_name(tp->symbol, NULL);
-		if (!sym) {
-			pr_err("Failed to find symbol %s in kernel.\n",
-			       tp->symbol);
-			return -ENOENT;
-		}
-	}
-
-	return convert_to_perf_probe_point(tp, pp);
+	return -ENOSYS;
 }
 
 static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
@@ -1609,6 +1568,78 @@ error:
 	return NULL;
 }
 
+static int find_perf_probe_point_from_map(struct probe_trace_point *tp,
+					  struct perf_probe_point *pp,
+					  bool is_kprobe)
+{
+	struct symbol *sym = NULL;
+	struct map *map = NULL;
+	u64 addr;
+	int ret = 0;
+
+	if (is_kprobe)
[PATCH -tip 8/8] perf-probe: Allow to add events on the local functions
Allow adding events on local functions without debuginfo.
(With debuginfo, we can add events even on inlined functions.)
Currently, probing on local functions requires debuginfo to
locate the actual address. It is also possible without debuginfo,
since we have symbol maps.

Without this change;

  # ./perf probe -a t_show
  Added new event:
    probe:t_show         (on t_show)

  You can now use it in all perf tools, such as:

          perf record -e probe:t_show -aR sleep 1

  # ./perf probe -x perf -a identity__map_ip
  no symbols found in /kbuild/ksrc/linux-3/tools/perf/perf, maybe install a debug package?
  Failed to load map.
    Error: Failed to add events. (-22)

As the results above show, perf probe puts only one event on the first
symbol found for a kprobe event. Moreover, for a uprobe event, perf probe
fails to find local functions.

With this change;

  # ./perf probe -a t_show
  Added new events:
    probe:t_show         (on t_show)
    probe:t_show_1       (on t_show)
    probe:t_show_2       (on t_show)
    probe:t_show_3       (on t_show)

  You can now use it in all perf tools, such as:

          perf record -e probe:t_show_3 -aR sleep 1

  # ./perf probe -x perf -a identity__map_ip
  Added new events:
    probe_perf:identity__map_ip (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
    probe_perf:identity__map_ip_1 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
    probe_perf:identity__map_ip_2 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
    probe_perf:identity__map_ip_3 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)

  You can now use it in all perf tools, such as:

          perf record -e probe_perf:identity__map_ip_3 -aR sleep 1

Now we succeed in putting events on every given local function
for both kprobes and uprobes. :)

Note that this also introduces some symbol rbtree iteration macros:
symbols__for_each_entry, dso__for_each_symbol, and map__for_each_symbol.
These are for walking through the symbol list in a map.
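The "walk every symbol, probe each match" idea described above can be sketched without the perf tree. This is only an illustration under assumptions: struct sym, struct event, syms_for_each() and make_events() are hypothetical names, and the symbol list is a plain array here, whereas the macros named in the commit message iterate over an rbtree. The event naming (name, name_1, name_2, ...) follows the output shown above.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat symbol record (perf keeps symbols in an rbtree). */
struct sym {
	const char *name;
	uint64_t addr;
};

/* Hypothetical event record: generated event name plus probe address. */
struct event {
	char name[64];
	uint64_t addr;
};

/* Iteration macro in the spirit of map__for_each_symbol (array-based). */
#define syms_for_each(tab, n, pos) \
	for ((pos) = (tab); (pos) < (tab) + (n); (pos)++)

/*
 * Create one event per symbol named func, instead of stopping at the
 * first match. Returns the number of events written to out (at most max).
 */
size_t make_events(const struct sym *tab, size_t n, const char *func,
		   struct event *out, size_t max)
{
	const struct sym *pos;
	size_t found = 0;

	syms_for_each(tab, n, pos) {
		if (strcmp(pos->name, func) != 0)
			continue;
		if (found == max)
			break;
		if (found == 0)
			snprintf(out[found].name, sizeof(out[found].name),
				 "%s", func);
		else	/* later instances get a numeric suffix */
			snprintf(out[found].name, sizeof(out[found].name),
				 "%s_%zu", func, found);
		out[found].addr = pos->addr;
		found++;
	}
	return found;
}
```

A name-based lookup that returns only the first match corresponds to the "Without this change" output; iterating over the whole list is what yields all four t_show events.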
Signed-off-by: Masami Hiramatsu
---
 tools/perf/util/dso.h         |  10 +
 tools/perf/util/map.h         |  10 +
 tools/perf/util/probe-event.c | 351 ++---
 tools/perf/util/symbol.h      |  11 +
 4 files changed, 183 insertions(+), 199 deletions(-)

diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index cd7d6f0..ab06f1c 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -102,6 +102,16 @@ struct dso {
 	char name[0];
 };
 
+/* dso__for_each_symbol - iterate over the symbols of given type
+ *
+ * @dso: the 'struct dso *' in which symbols itereated
+ * @pos: the 'struct symbol *' to use as a loop cursor
+ * @n: the 'struct rb_node *' to use as a temporary storage
+ * @type: the 'enum map_type' type of symbols
+ */
+#define dso__for_each_symbol(dso, pos, n, type)	\
+	symbols__for_each_entry(&(dso)->symbols[(type)], pos, n)
+
 static inline void dso__set_loaded(struct dso *dso, enum map_type type)
 {
 	dso->loaded |= (1 << type);
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 18068c6..ef18a48 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -89,6 +89,16 @@ u64 map__objdump_2mem(struct map *map, u64 ip);
 
 struct symbol;
 
+/* map__for_each_symbol - iterate over the symbols in the given map
+ *
+ * @map: the 'struct map *' in which symbols itereated
+ * @pos: the 'struct symbol *' to use as a loop cursor
+ * @n: the 'struct rb_node *' to use as a temporary storage
+ * Note: caller must ensure map->dso is not NULL (map is loaded).
+ */
+#define map__for_each_symbol(map, pos, n)	\
+	dso__for_each_symbol(map->dso, pos, n, map->type)
+
 typedef int (*symbol_filter_t)(struct map *map, struct symbol *sym);
 
 void map__init(struct map *map, enum map_type type,
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 84c1807..93087d7 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -70,8 +70,6 @@ static int e_snprintf(char *str, size_t size, const char *format, ...)
 }
 
 static char *synthesize_perf_probe_point(struct perf_probe_point *pp);
-static int convert_name_to_addr(struct perf_probe_event *pev,
-				const char *exec);
 static void clear_probe_trace_event(struct probe_trace_event *tev);
 static struct machine *host_machine;
 
@@ -119,14 +117,6 @@ static void exit_symbol_maps(void)
 }
 
 /* Caller must call init_symbol_maps before use this */
-static struct symbol *__find_kernel_function_by_name(const char *name,
-						     struct map **mapp)
-{
-	return machine__find_kernel_function_by_name(host_machine, name, mapp,
-						     NULL);
-}
-
-/* Caller must call init_symbol_maps before use this */
 static struct symbol *__find_kernel_function(u64 addr, struct map **mapp)
 {
 	return machine__find_kernel_function(host_machine, addr,
[PATCH -tip 2/8] [BUGFIX] perf-tools: Load map before using map->map_ip
In map_groups__find_symbol(), map->map_ip is used without ensuring that the
map is loaded. As a result, the address passed to map->map_ip is not mapped
correctly on the first call.
E.g. the code below always fails to get a symbol at the first call;

  addr = /* Somewhere in the kernel text */
  symbol_conf.try_vmlinux_path = true;
  symbol__init();
  host_machine = machine__new_host();
  sym = machine__find_kernel_function(host_machine,
                                      addr, NULL, NULL);
  /* Note that machine__find_kernel_function calls
     map_groups__find_symbol */

This patch ensures the map is loaded by calling map__load() before
map->map_ip is used in map_groups__find_symbol().

Signed-off-by: Masami Hiramatsu
---
 tools/perf/util/map.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 9b9bd71..6a805e7 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -386,7 +386,8 @@ struct symbol *map_groups__find_symbol(struct map_groups *mg,
 {
 	struct map *map = map_groups__find(mg, type, addr);
 
-	if (map != NULL) {
+	/* Ensure map is loaded before using map->map_ip */
+	if (map != NULL && map__load(map, filter) >= 0) {
 		if (mapp != NULL)
 			*mapp = map;
 		return map__find_symbol(map, map->map_ip(map, addr), filter);
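The ordering requirement fixed by this patch (load first, translate second) can be shown with a toy model. Everything below is hypothetical and far simpler than perf's struct map: toy_map, toy_map_load() and toy_translate() are invented for illustration, and the value filled in by the loader is arbitrary. It only demonstrates why translating before loading would use stale data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy map; the translation data is valid only after loading. */
struct toy_map {
	bool loaded;
	uint64_t start;		/* runtime start address of the mapping */
	uint64_t pgoff;		/* offset filled in by the loader */
};

/* Pretend loader: fills in pgoff once. Returns 0 on success. */
static int toy_map_load(struct toy_map *m)
{
	if (!m->loaded) {
		m->pgoff = 0x1000;	/* arbitrary value for the sketch */
		m->loaded = true;
	}
	return 0;
}

/* Translate a runtime address into a map-relative one. */
static uint64_t toy_map_ip(const struct toy_map *m, uint64_t ip)
{
	return ip - m->start + m->pgoff;
}

/*
 * Mirror of the fixed check: translate only if the map exists and
 * loading succeeded. Returns 0 and stores the result, or -1.
 */
int toy_translate(struct toy_map *m, uint64_t ip, uint64_t *out)
{
	if (m == NULL || toy_map_load(m) < 0)
		return -1;
	*out = toy_map_ip(m, ip);
	return 0;
}
```

Without the load step, the first call in this model would compute the result with pgoff still zero, which is the kind of first-call failure the commit message describes.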
[PATCH -tip 3/8] perf-probe: Show in what binaries/modules probes are set
Show the name of the binary file or module in which the probes are set
with the --list option.

Without this change;

  # ./perf probe -m drm drm_av_sync_delay
  # ./perf probe -x perf dso__load_vmlinux

  # ./perf probe -l
    probe:drm_av_sync_delay (on drm_av_sync_delay)
    probe_perf:dso__load_vmlinux (on 0x0006d110)

With this change;

  # ./perf probe -l
    probe:drm_av_sync_delay (on drm_av_sync_delay in drm)
    probe_perf:dso__load_vmlinux (on 0x0006d110 in /kbuild/ksrc/linux-3/tools/perf/perf)

Signed-off-by: Masami Hiramatsu
---
 tools/perf/util/probe-event.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 14c649df..2fb4486 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1742,7 +1742,8 @@ static struct strlist *get_probe_trace_command_rawlist(int fd)
 }
 
 /* Show an event */
-static int show_perf_probe_event(struct perf_probe_event *pev)
+static int show_perf_probe_event(struct perf_probe_event *pev,
+				 const char *module)
 {
 	int i, ret;
 	char buf[128];
@@ -1758,6 +1759,8 @@ static int show_perf_probe_event(struct perf_probe_event *pev)
 		return ret;
 
 	printf(" %-20s (on %s", buf, place);
+	if (module)
+		printf(" in %s", module);
 
 	if (pev->nargs > 0) {
 		printf(" with");
@@ -1795,7 +1798,8 @@ static int __show_perf_probe_events(int fd, bool is_kprobe)
 			ret = convert_to_perf_probe_event(&tev, &pev,
 							  is_kprobe);
 			if (ret >= 0)
-				ret = show_perf_probe_event(&pev);
+				ret = show_perf_probe_event(&pev,
+							    tev.point.module);
 		}
 		clear_perf_probe_event(&pev);
 		clear_probe_trace_event(&tev);
@@ -1994,7 +1998,7 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
 		group = pev->group;
 		pev->event = tev->event;
 		pev->group = tev->group;
-		show_perf_probe_event(pev);
+		show_perf_probe_event(pev, tev->point.module);
 		/* Trick here - restore current event/group */
 		pev->event = (char *)event;
 		pev->group = (char *)group;
[PATCH -tip 5/8] perf-probe: Show source level information for address only kprobes
Show the source code level information for address-only kprobe events.
Currently perf probe shows such information only for symbol-based probes.
With this change, perf-probe correctly parses the address-based events and
tries to find the actual lines of code from the debuginfo.

E.g. without this patch;

  # ./perf probe -l
    probe:t_show         (on 0x810d9720 with m v)
    probe:t_show_1       (on 0x810e2e40 with m v t)
    probe:t_show_2       (on 0x810ece30 with m v fmt)
    probe:t_show_3       (on 0x810f4ad0 with m v file)

With this patch;

  # ./perf probe -l
    probe:t_show         (on t_show@linux-3/kernel/trace/ftrace.c with m v)
    probe:t_show_1       (on t_show@linux-3/kernel/trace/trace.c with m v t)
    probe:t_show_2       (on t_show@kernel/trace/trace_printk.c with m v fmt)
    probe:t_show_3       (on t_show@kernel/trace/trace_events.c with m v file)

Signed-off-by: Masami Hiramatsu
---
 tools/perf/util/probe-event.c | 87 ++---
 1 file changed, 56 insertions(+), 31 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 92ab688..3470934 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -153,7 +153,7 @@ static struct dso *kernel_get_module_dso(const char *module)
 	struct map *map;
 	const char *vmlinux_name;
 
-	if (module) {
+	if (module && strcmp(module, "kernel") != 0) {
 		list_for_each_entry(dso, &host_machine->kernel_dsos, node) {
 			if (strncmp(dso->short_name + 1, module,
 				    dso->short_name_len - 2) == 0)
@@ -220,12 +220,22 @@ out:
 static int convert_to_perf_probe_point(struct probe_trace_point *tp,
 					struct perf_probe_point *pp)
 {
-	pp->function = strdup(tp->symbol);
+	char buf[128];
+	int ret;
 
+	if (tp->symbol) {
+		pp->function = strdup(tp->symbol);
+		pp->offset = tp->offset;
+	} else {
+		ret = e_snprintf(buf, 128, "0x%" PRIx64, (u64)tp->address);
+		if (ret < 0)
+			return ret;
+		pp->function = strdup(buf);
+		pp->offset = 0;
+	}
 	if (pp->function == NULL)
 		return -ENOMEM;
 
-	pp->offset = tp->offset;
 	pp->retprobe = tp->retprobe;
 
 	return 0;
@@ -261,28 +271,35 @@ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
 {
 	struct symbol *sym;
 	struct map *map;
-	u64 addr;
-	int ret = -ENOENT;
+	u64 addr = tp->address;
+	int ret;
 	struct debuginfo *dinfo;
 
-	sym = __find_kernel_function_by_name(tp->symbol, &map);
-	if (sym) {
+	if (!addr) {
+		sym = __find_kernel_function_by_name(tp->symbol, &map);
+		if (!sym) {
+			ret = -ENOENT;
+			goto error;
+		}
 		addr = map->unmap_ip(map, sym->start + tp->offset);
-		pr_debug("try to find %s+%ld@%" PRIx64 "\n", tp->symbol,
-			 tp->offset, addr);
+	}
+
+	pr_debug("try to find information at %" PRIx64 " in %s\n", addr,
+		 tp->module ? : "kernel");
 
-		dinfo = debuginfo__new_online_kernel(addr);
-		if (dinfo) {
-			ret = debuginfo__find_probe_point(dinfo,
+	dinfo = debuginfo__new_online_kernel(addr);
+	if (dinfo) {
+		ret = debuginfo__find_probe_point(dinfo,
 						 (unsigned long)addr, pp);
-			debuginfo__delete(dinfo);
-		} else {
-			pr_debug("Failed to open debuginfo at 0x%" PRIx64 "\n",
-				 addr);
-			ret = -ENOENT;
-		}
+		debuginfo__delete(dinfo);
+	} else {
+		pr_debug("Failed to open debuginfo at 0x%" PRIx64 "\n",
+			 addr);
+		ret = -ENOENT;
 	}
+
 	if (ret <= 0) {
+error:
 		pr_debug("Failed to find corresponding probes from "
 			 "debuginfo. Use kprobe event information.\n");
 		return convert_to_perf_probe_point(tp, pp);
@@ -745,10 +762,13 @@ static int kprobe_convert_to_perf_probe(struct probe_trace_point *tp,
 {
 	struct symbol *sym;
 
-	sym = __find_kernel_function_by_name(tp->symbol, NULL);
-	if (!sym) {
-		pr_err("Failed to find symbol %s in kernel.\n", tp->symbol);
-		return -ENOENT;
+	if (tp->symbol) {
+		sym = __find_kernel_function_by_name(tp->symbol, NULL);
+		if (!sym) {
+			pr_err("Failed to find symbol %s in kernel.\n",
+			       tp->symbol);
+			return -ENOENT;
+		}
[PATCH -tip 0/8] perf-probe: Updates for handling local functions correctly
Hi,

Here is a series of patches for handling local functions correctly in
perf-probe.

Issue 1)
Current perf-probe can't correctly handle probe points on local functions
for kprobes, since it uses symbol-based probe definitions. The symbol-based
definition is easy to read and robust across different kernels and modules.
However, when the user gives a local function name which has several
different instances, it may put probes on a wrong (or unexpected) address.
On the other hand, since uprobe events are based on the actual address,
they avoid this issue.

E.g. in the case of probing the t_show local functions (which have 4
different instances):

  # grep " t_show\$" /proc/kallsyms
  810d9720 t t_show
  810e2e40 t t_show
  810ece30 t t_show
  810f4ad0 t t_show

  # ./perf probe -f t_show \$vars
  Added new events:
    probe:t_show         (on t_show with $vars)
    probe:t_show_1       (on t_show with $vars)
    probe:t_show_2       (on t_show with $vars)
    probe:t_show_3       (on t_show with $vars)

  You can now use it in all perf tools, such as:

          perf record -e probe:t_show_3 -aR sleep 1

OK, we have 4 different t_show()s. All functions have different arguments
as below;

  # cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/t_show t_show m=%di:u64 v=%si:u64
  p:probe/t_show_1 t_show m=%di:u64 v=%si:u64 t=%si:u64
  p:probe/t_show_2 t_show m=%di:u64 v=%si:u64 fmt=%si:u64
  p:probe/t_show_3 t_show m=%di:u64 v=%si:u64 file=%si:u64

However, all of them have been put on the *same* address.

  # cat /sys/kernel/debug/kprobes/list
  810d9720  k  t_show+0x0    [DISABLED]
  810d9720  k  t_show+0x0    [DISABLED]
  810d9720  k  t_show+0x0    [DISABLED]
  810d9720  k  t_show+0x0    [DISABLED]

oops...

Issue 2)
With debuginfo, issue 1 can be solved by using address-based probe
definitions instead of symbol-based ones. However, without debuginfo,
perf-probe can only use the symbol map in the binary (or kallsyms). The map
provides symbol find methods, but they return only the first matched symbol.
To put probes on all functions which have the given symbol name, we need a
symbol-list iterator for the map.

E.g. (built perf with NO_DWARF=1) in the case of probing t_show and
identity__map_ip in perf:

  # ./perf probe -a t_show
  Added new event:
    probe:t_show         (on t_show)

  You can now use it in all perf tools, such as:

          perf record -e probe:t_show -aR sleep 1

  # ./perf probe -x perf -a identity__map_ip
  no symbols found in /kbuild/ksrc/linux-3/tools/perf/perf, maybe install a debug package?
  Failed to load map.
    Error: Failed to add events. (-22)

oops.

Solutions)
To solve issue 1, this series changes perf probe to use address-based probe
definitions. This means that we also need to fix the --list option to
analyze probe addresses instead of symbols (and that has been done in this
series).

E.g. with this series;

  # ./perf probe -f t_show \$vars
  Added new events:
    probe:t_show         (on t_show with $vars)
    probe:t_show_1       (on t_show with $vars)
    probe:t_show_2       (on t_show with $vars)
    probe:t_show_3       (on t_show with $vars)

  You can now use it in all perf tools, such as:

          perf record -e probe:t_show_3 -aR sleep 1

  # cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/t_show 0x810d9720 m=%di:u64 v=%si:u64
  p:probe/t_show_1 0x810e2e40 m=%di:u64 v=%si:u64 t=%si:u64
  p:probe/t_show_2 0x810ece30 m=%di:u64 v=%si:u64 fmt=%si:u64
  p:probe/t_show_3 0x810f4ad0 m=%di:u64 v=%si:u64 file=%si:u64

  # cat /sys/kernel/debug/kprobes/list
  810e2e40  k  t_show+0x0    [DISABLED]
  810ece30  k  t_show+0x0    [DISABLED]
  810f4ad0  k  t_show+0x0    [DISABLED]
  810d9720  k  t_show+0x0    [DISABLED]

This time we can see the events are set on different addresses.

And for issue 2, the last patch introduces symbol iterators for map, dso
and symbols (since the symbol list is the symbols structure contained in the
dso, and perf probe accesses the dso via the map).

E.g. with this series (built perf with NO_DWARF=1);

  # ./perf probe -a t_show
  Added new events:
    probe:t_show         (on t_show)
    probe:t_show_1       (on t_show)
    probe:t_show_2       (on t_show)
    probe:t_show_3       (on t_show)

  You can now use it in all perf tools, such as:

          perf record -e probe:t_show_3 -aR sleep 1

  # ./perf probe -x perf -a identity__map_ip
  Added new events:
    probe_perf:identity__map_ip (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
    probe_perf:identity__map_ip_1 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
    probe_perf:identity__map_ip_2 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)
    probe_perf:identity__map_ip_3 (on identity__map_ip in /kbuild/ksrc/linux-3/tools/perf/perf)

  You can now use it in all perf tools, such as:

          perf record -e probe_perf:identity__map_ip_3 -aR sleep 1

Now, even without the debuginfo, both the kprobe and uprobe are set 4 different
Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
On Wed, 22 Jan 2014, Chris Mason wrote:

> On Wed, 2014-01-22 at 11:50 -0800, Andrew Morton wrote:
>> On Wed, 22 Jan 2014 11:30:19 -0800 James Bottomley wrote:
>>
>>> But this, I think, is the fundamental point for debate. If we can pull
>>> alignment and other tricks to solve 99% of the problem is there a need
>>> for radical VM surgery? Is there anything coming down the pipe in the
>>> future that may move the devices ahead of the tricks?
>>
>> I expect it would be relatively simple to get large blocksizes working
>> on powerpc with 64k PAGE_SIZE. So before diving in and doing huge
>> amounts of work, perhaps someone can do a proof-of-concept on powerpc
>> (or ia64) with 64k blocksize.
>
> Maybe 5 drives in raid5 on MD, with 4K coming from each drive. Well
> aligned 16K IO will work, everything else will about the same as a rmw
> from a single drive.

I think this is the key point to think about here. How will these new hard
drive large block sizes differ from RAID stripes and SSD eraseblocks?

In all of these cases there are very clear advantages to doing the writes in
properly sized and aligned chunks that correspond with the underlying
structure to avoid the RMW overhead.

It's extremely unlikely that drive manufacturers will produce drives that
won't work with any existing OS, so they are going to support smaller writes
in firmware. If they don't, they won't be able to sell their drives to anyone
running existing software. Given the Enterprise software upgrade cycle
compared to the expanding storage needs, whatever they ship will have to work
on OS and firmware releases that happened several years ago.

I think what is needed is some way to get a report on how many RMW cycles
have to happen. Then people can work on ways to reduce this number and
measure the results.

I don't know if md and dm are currently smart enough to realize that the
entire stripe is being overwritten and avoid the RMW cycle. If they can't, I
would expect that once we start measuring it, they will gain such support.

David Lang
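As a rough illustration of the kind of RMW accounting suggested above, the sketch below counts how many stripes a single write only partially covers. It is a simplified model, not md or dm code: count_rmw_stripes() is a hypothetical helper, and it assumes that any partially covered stripe costs one read-modify-write while fully covered stripes cost none. In the 5-drive RAID5 example with 4K per drive, one stripe holds 4 data chunks, i.e. 16K of data, which matches the "well aligned 16K IO" remark.

```c
#include <stdint.h>

/*
 * Count the stripes touched by the write [offset, offset + len) that are
 * only partially covered, i.e. would need a read-modify-write in this
 * simplified model. stripe_size is the data payload of one full stripe
 * in bytes (16384 for 4 data chunks of 4K each).
 */
uint64_t count_rmw_stripes(uint64_t offset, uint64_t len, uint64_t stripe_size)
{
	uint64_t end, first, last, rmw = 0;

	if (len == 0 || stripe_size == 0)
		return 0;

	end = offset + len;			/* exclusive end of the write */
	first = offset / stripe_size;		/* first stripe touched */
	last = (end - 1) / stripe_size;		/* last stripe touched */

	if (first == last)	/* one stripe: RMW unless it is fully covered */
		return (offset % stripe_size || end % stripe_size) ? 1 : 0;

	if (offset % stripe_size)	/* partial head stripe */
		rmw++;
	if (end % stripe_size)		/* partial tail stripe */
		rmw++;
	return rmw;
}
```

In this model an aligned 16K write costs no RMW, a lone 4K write costs one, and a 16K write starting at offset 4K straddles two stripes and costs two. Summing such counts over a workload would give the sort of number that could be tracked before and after an optimization.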
Re: [BUG] sched: tip/master show soft lockup while running multiple VM
On 01/22/2014 08:36 PM, Peter Zijlstra wrote:
> On Wed, Jan 22, 2014 at 04:27:45PM +0800, Michael wang wrote:
>> # CONFIG_PREEMPT_NONE is not set
>> CONFIG_PREEMPT_VOLUNTARY=y
>> # CONFIG_PREEMPT is not set
>
> Could you try the patch here:
>
>   lkml.kernel.org/r/20140122102435.gh31...@twins.programming.kicks-ass.net
>
> I suspect its the same issue.

Yup, it works.

Regards,
Michael Wang
Re: Internal error: Oops: 17 [#1] ARM
On Wed, Jan 22, 2014 at 08:23:36AM -0800, John Tobias wrote: >Hello all, > >I am using 3.13-rc1 kernel on iMX6SL processor. My filesystem is in >eMMC running SDR50. >Is anyone here encountered these problem and if there's any existing >patch that I can get?. hi, Do you use gcc 4.8.1? If so, maybe you should look at following link to see whether it's a similar issue. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854 Liming Wang > >Regards, > >john > >[ 1552.394899] Unable to handle kernel NULL pointer dereference at >virtual address 0037 >[ 1552.403034] pgd = beef4000 >[ 1552.405855] [0037] *pgd=bef60831, *pte=, *ppte= >[ 1552.412245] Internal error: Oops: 17 [#1] ARM >[ 1552.416627] Modules linked in: bt8xxx(O) sd8xxx(O) mlan(O) >[ 1552.422249] CPU: 0 PID: 232 Comm: commsd Tainted: G O 3.13.0-rc1 >#7 >[ 1552.429409] task: bfbc7500 ti: bec96000 task.ti: bec96000 >[ 1552.434844] PC is at lookup_fast+0x5c/0x318 >[ 1552.439067] LR is at mark_held_locks+0x78/0x13c >[ 1552.443622] pc : [<80101184>]lr : [<80056e48>]psr: a00f0013 >[ 1552.443622] sp : bec97d88 ip : 00666e6f fp : bec97ddc >[ 1552.455124] r10: r9 : bec97e08 r8 : 80102d94 >[ 1552.460370] r7 : bec97e60 r6 : bf133ac8 r5 : bec97e60 r4 : bec97e00 >[ 1552.466918] r3 : bee4f01d r2 : r1 : r0 : >[ 1552.473471] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment >user >[ 1552.480629] Control: 10c53c7d Table: beef4059 DAC: 0015 >[ 1552.486397] Process commsd (pid: 232, stack limit = 0xbec96238) >[ 1552.492341] Stack: (0xbec97d88 to 0xbec98000) >[ 1552.496728] 7d80: 80102b94 80057108 bfb95310 >bf133ac8 bf15f4e8 bfb95310 >[ 1552.504936] 7da0: c08bb14d bee4f015 0008 bfbc7500 >bec97e08 0041 >[ 1552.513142] 7dc0: bec97e60 bec96020 bec96000 bec97e3c >bec97de0 80102d94 80101134 >[ 1552.521347] 7de0: bec97df8 800d982c bec96018 0010 >bec97e00 bec97e08 >[ 1552.529553] 7e00: 8026e25c 800d97e8 bee4f000 0ff0 80d4e3a4 >0001 bee4f000 bec97e60 >[ 1552.537758] 7e20: ff9c ff9c bec96000 bec97e5c >bec97e40 801033bc 80102c70 >[ 
1552.545964] 7e40: bee4f000 0001 bec97e60 bec97f00 bec97ee4 >bec97e60 80105dc0 80103398 >[ 1552.554170] 7e60: bfb95310 bf133ac8 c08bb14d 000b bee4f015 >8005992c bfb95310 bf133398 >[ 1552.562375] 7e80: bf15f4e8 0041 0002 008a > 600f0013 bec96000 >[ 1552.570581] 7ea0: ffea bf8c1840 807b4430 80115444 801156e4 >8011563c 0008 >[ 1552.578786] 7ec0: bec97f04 733fe4e0 0001 ff9c 757e3810 >bec97f40 bec97efc bec97ee8 >[ 1552.586991] 7ee0: 80105e0c 80105d68 bc950fe0 bec97f2c >bec97f00 800faf64 80105df4 >[ 1552.595196] 7f00: 801156e4 80115420 bec97f54 733fe4e0 733ff8f0 >733fe61c 00c3 8000f504 >[ 1552.603402] 7f20: bec97f3c bec97f30 800fafe0 800faf1c bec97fa4 >bec97f40 800fb71c 800fafc4 >[ 1552.611608] 7f40: 8000f310 bfbc7500 733fe550 8000f458 733fe61c >00c3 bec97f84 bec97f68 >[ 1552.619813] 7f60: 80056f28 800a0270 733fe550 733ff8f0 733fe61c >00c3 bec97f94 bec97f88 >[ 1552.628019] 7f80: 80057110 80056f18 bec97f98 8000f458 >733fe550 bec97fa8 >[ 1552.636226] 7fa0: 8000f280 800fb704 733fe550 733ff8f0 757e3810 >733fe4e0 733fe550 0003 >[ 1552.644431] 7fc0: 733fe550 733ff8f0 733fe61c 00c3 >0002 733fe92c 0200 >[ 1552.652636] 7fe0: 00c3 733fe4d8 7579b7e5 7572e276 200f0030 >757e3810 bfffd821 bfffdc21 >[ 1552.660828] Backtrace: >[ 1552.663343] [<80101128>] (lookup_fast+0x0/0x318) from [<80102d94>] >(path_lookupat+0x130/0x728) >[ 1552.671994] [<80102c64>] (path_lookupat+0x0/0x728) from >[<801033bc>] (filename_lookup.isra.40+0x30/0x70) >[ 1552.681515] [<8010338c>] (filename_lookup.isra.40+0x0/0x70) from >[<80105dc0>] (user_path_at_empty+0x64/0x8c) >[ 1552.691361] r7:bec97f00 r6:bec97e60 r5:0001 r4:bee4f000 >[ 1552.697163] [<80105d5c>] (user_path_at_empty+0x0/0x8c) from >[<80105e0c>] (user_path_at+0x24/0x2c) >[ 1552.706053] r8:bec97f40 r7:757e3810 r6:ff9c r5:0001 r4:733fe4e0 >[ 1552.712927] [<80105de8>] (user_path_at+0x0/0x2c) from [<800faf64>] >(vfs_fstatat+0x54/0xa8) >[ 1552.721232] [<800faf10>] (vfs_fstatat+0x0/0xa8) from [<800fafe0>] >(vfs_stat+0x28/0x2c) >[ 1552.729167] r8:8000f504 
r7:00c3 r6:733fe61c r5:733ff8f0 r4:733fe4e0 >[ 1552.736031] [<800fafb8>] (vfs_stat+0x0/0x2c) from [<800fb71c>] >(SyS_stat64+0x24/0x40) >[ 1552.743902] [<800fb6f8>] (SyS_stat64+0x0/0x40) from [<8000f280>] >(ret_fast_syscall+0x0/0x48) >[ 1552.752359] r4:733fe550 >[ 1552.754946] Code: eb00352d e350 e50b0038 0a80 (e5903038) >[ 1552.761270] ---[ end trace 02679086a39365e8 ]--- >[ 1552.765968] Kernel panic - not syncing: Fatal exception > >___ >linux-arm-kernel mailing list >linux-arm-ker...@lists.infradead.org >http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- To unsubscribe from this list:
Re: [PATCH v2] mm/zswap: Check all pool pages instead of one pool pages
Hello Cai, On Thu, Jan 23, 2014 at 09:38:41AM +0800, Cai Liu wrote: > Hello Dan > > 2014/1/22 Dan Streetman : > > On Wed, Jan 22, 2014 at 7:16 AM, Cai Liu wrote: > >> Hello Minchan > >> > >> > >> 2014/1/22 Minchan Kim > >>> > >>> Hello Cai, > >>> > >>> On Tue, Jan 21, 2014 at 09:52:25PM +0800, Cai Liu wrote: > >>> > Hello Minchan > >>> > > >>> > 2014/1/21 Minchan Kim : > >>> > > Hello, > >>> > > > >>> > > On Tue, Jan 21, 2014 at 02:35:07PM +0800, Cai Liu wrote: > >>> > >> 2014/1/21 Minchan Kim : > >>> > >> > Please check your MUA and don't break thread. > >>> > >> > > >>> > >> > On Tue, Jan 21, 2014 at 11:07:42AM +0800, Cai Liu wrote: > >>> > >> >> Thanks for your review. > >>> > >> >> > >>> > >> >> 2014/1/21 Minchan Kim : > >>> > >> >> > Hello Cai, > >>> > >> >> > > >>> > >> >> > On Mon, Jan 20, 2014 at 03:50:18PM +0800, Cai Liu wrote: > >>> > >> >> >> zswap can support multiple swapfiles. So we need to check > >>> > >> >> >> all zbud pool pages in zswap. > >>> > >> >> >> > >>> > >> >> >> Version 2: > >>> > >> >> >> * add *total_zbud_pages* in zbud to record all the pages in > >>> > >> >> >> pools > >>> > >> >> >> * move the updating of pool pages statistics to > >>> > >> >> >> alloc_zbud_page/free_zbud_page to hide the details > >>> > >> >> >> > >>> > >> >> >> Signed-off-by: Cai Liu > >>> > >> >> >> --- > >>> > >> >> >> include/linux/zbud.h |2 +- > >>> > >> >> >> mm/zbud.c| 44 > >>> > >> >> >> > >>> > >> >> >> mm/zswap.c |4 ++-- > >>> > >> >> >> 3 files changed, 35 insertions(+), 15 deletions(-) > >>> > >> >> >> > >>> > >> >> >> diff --git a/include/linux/zbud.h b/include/linux/zbud.h > >>> > >> >> >> index 2571a5c..1dbc13e 100644 > >>> > >> >> >> --- a/include/linux/zbud.h > >>> > >> >> >> +++ b/include/linux/zbud.h > >>> > >> >> >> @@ -17,6 +17,6 @@ void zbud_free(struct zbud_pool *pool, > >>> > >> >> >> unsigned long handle); > >>> > >> >> >> int zbud_reclaim_page(struct zbud_pool *pool, unsigned int > >>> > >> >> >> retries); > >>> > >> >> >> void 
*zbud_map(struct zbud_pool *pool, unsigned long handle); > >>> > >> >> >> void zbud_unmap(struct zbud_pool *pool, unsigned long handle); > >>> > >> >> >> -u64 zbud_get_pool_size(struct zbud_pool *pool); > >>> > >> >> >> +u64 zbud_get_pool_size(void); > >>> > >> >> >> > >>> > >> >> >> #endif /* _ZBUD_H_ */ > >>> > >> >> >> diff --git a/mm/zbud.c b/mm/zbud.c > >>> > >> >> >> index 9451361..711aaf4 100644 > >>> > >> >> >> --- a/mm/zbud.c > >>> > >> >> >> +++ b/mm/zbud.c > >>> > >> >> >> @@ -52,6 +52,13 @@ > >>> > >> >> >> #include > >>> > >> >> >> #include > >>> > >> >> >> > >>> > >> >> >> +/* > >>> > >> >> >> +* statistics > >>> > >> >> >> +**/ > >>> > >> >> >> + > >>> > >> >> >> +/* zbud pages in all pools */ > >>> > >> >> >> +static u64 total_zbud_pages; > >>> > >> >> >> + > >>> > >> >> >> /* > >>> > >> >> >> * Structures > >>> > >> >> >> */ > >>> > >> >> >> @@ -142,10 +149,28 @@ static struct zbud_header > >>> > >> >> >> *init_zbud_page(struct page *page) > >>> > >> >> >> return zhdr; > >>> > >> >> >> } > >>> > >> >> >> > >>> > >> >> >> +static struct page *alloc_zbud_page(struct zbud_pool *pool, > >>> > >> >> >> gfp_t gfp) > >>> > >> >> >> +{ > >>> > >> >> >> + struct page *page; > >>> > >> >> >> + > >>> > >> >> >> + page = alloc_page(gfp); > >>> > >> >> >> + > >>> > >> >> >> + if (page) { > >>> > >> >> >> + pool->pages_nr++; > >>> > >> >> >> + total_zbud_pages++; > >>> > >> >> > > >>> > >> >> > Who protect race? > >>> > >> >> > >>> > >> >> Yes, here the pool->pages_nr and also the total_zbud_pages are > >>> > >> >> not protected. > >>> > >> >> I will re-do it. > >>> > >> >> > >>> > >> >> I will change *total_zbud_pages* to atomic type. > >>> > >> > > >>> > >> > Wait, it doesn't make sense. Now, you assume zbud allocator would > >>> > >> > be used > >>> > >> > for only zswap. It's true until now but we couldn't make sure it > >>> > >> > in future. > >>> > >> > If other user start to use zbud allocator, total_zbud_pages would > >>> > >> > be pointless. 
> >>> > >> > >>> > >> Yes, you are right. ZBUD is a common module. So in this patch > >>> > >> calculate the > >>> > >> zswap pool size in zbud is not suitable. > >>> > >> > >>> > >> > > >>> > >> > Another concern is that what's your scenario for above two swap? > >>> > >> > How often we need to call zbud_get_pool_size? > >>> > >> > In previous your patch, you reduced the number of call so IIRC, > >>> > >> > we only called it in zswap_is_full and for debugfs. > >>> > >> > >>> > >> zbud_get_pool_size() is called frequently when adding/freeing zswap > >>> > >> entry happen in zswap . This is why in this patch I added a counter > >>> > >> in zbud, > >>> > >> and then in zswap the iteration of zswap_list to calculate the pool > >>> > >> size will > >>> > >>
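The race Minchan points out (two CPUs incrementing `pool->pages_nr` and `total_zbud_pages` with no protection) and the proposed "atomic type" fix can be sketched in userspace with C11 atomics. This is only an analogue under stated assumptions: the names `demo_pool`, `demo_account_alloc`, `demo_account_free` and `demo_total_pages` are hypothetical, C11 `<stdatomic.h>` stands in for the kernel's own atomic types, and the sketch does not address Minchan's separate objection that a global counter inside zbud stops being meaningful once zbud has users other than zswap.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-pool state, mirroring the idea of pool->pages_nr. */
struct demo_pool {
	atomic_uint_fast64_t pages_nr;
};

/* Global total across all pools, updated without taking a lock. */
static atomic_uint_fast64_t total_pages;

static void demo_pool_init(struct demo_pool *pool)
{
	atomic_init(&pool->pages_nr, 0);
}

/* Account one page added to a pool; each increment is a single atomic RMW. */
static void demo_account_alloc(struct demo_pool *pool)
{
	atomic_fetch_add(&pool->pages_nr, 1);
	atomic_fetch_add(&total_pages, 1);
}

/* Account one page removed from a pool. */
static void demo_account_free(struct demo_pool *pool)
{
	atomic_fetch_sub(&pool->pages_nr, 1);
	atomic_fetch_sub(&total_pages, 1);
}

/* Read the global total without walking any list of pools. */
static uint64_t demo_total_pages(void)
{
	return atomic_load(&total_pages);
}
```

Note that the two counters are updated by separate atomic operations, so a concurrent reader can briefly observe a per-pool count and a total that disagree by one; that is usually acceptable for statistics but not for anything that needs an exact invariant.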
Re: kvm virtio ethernet ring on guest side over high throughput (packet per second)
On 01/22/2014 11:22 PM, Stefan Hajnoczi wrote:
> On Tue, Jan 21, 2014 at 04:06:05PM -0200, Alejandro Comisario wrote:
>
> CCed Michael Tsirkin and Jason Wang who work on KVM networking.
>
>> Hi guys, in the past, when we were using physical servers, we had
>> several throughput issues with our APIs. We measure throughput in
>> packets per second rather than bandwidth (Mb/s), because our APIs
>> respond with lots of very small packets (maximum response of 3.5k
>> and average response of 1.5k). On those physical servers, when we
>> reached throughput capacity (due to client timeouts), we touched the
>> ethernet ring configuration and the problem disappeared.
>>
>> Today, with kvm and over 10k virtual instances, when we want to
>> increase the throughput of KVM instances, we run into the fact that
>> when using virtio on guests, the ring has a maximum configuration
>> of 256 TX/RX, and on the host side the attached vnet has a
>> txqueuelen of 500.
>>
>> What I want to know is: how can I tune the guest to support more
>> packets per second if I know that's my bottleneck?
> I suggest investigating performance in a systematic way. Set up a
> benchmark that saturates the network. Post the details of the benchmark
> and the results that you are seeing.
>
> Then, we can discuss how to investigate the root cause of the bottleneck.
>
>> * Does virtio expose more packets to configure in the virtual
>> ethernet's ring?
> No, ring size is hardcoded in QEMU (on the host).

Does it make sense to let the user configure it, at least through
something like the qemu command line?

>
>> * Does the use of vhost_net help with increasing packets per
>> second and not only bandwidth?
> vhost_net is generally the most performant network option.
>
>> Has anyone had to struggle with this before and knows where I can
>> look? There's LOOOTS of information about networking performance
>> tuning of kvm, but nothing related to increasing throughput in pps
>> capacity.
>>
>> This is a couple of the configurations that we have right now on the
>> compute nodes:
>>
>> * 2x1Gb bonded interfaces (if you want to know the more than 20
>> models we are using, just ask)
>> * Multi queue interfaces, pinned via irq to different cores

Maybe you can try multiqueue virtio-net with vhost. It lets the guest
use more than one tx/rx virtqueue pair to do the network processing.

>> * Linux bridges, no VLAN, no open-vswitch
>> * ubuntu 12.04 kernel 3.2.0-[40-48]

Re: mm: BUG: Bad rss-counter state
On 01/22/2014 09:21 PM, Dave Jones wrote:
> On Wed, Jan 22, 2014 at 09:16:03PM -0500, Sasha Levin wrote:
> > On 01/22/2014 08:52 PM, Dave Jones wrote:
> > > Sasha, is this the current git tree version of Trinity ?
> > > (I'm wondering if yesterday's munmap changes might be tickling this bug).
> >
> > Ah yes, my tree has the munmap patch from yesterday, which would explain why we
> > started seeing this issue just now.
>
> So that change is basically allowing trinity to munmap just part of a prior mmap.
> So it may do things like..
>
> mmap   |--|
> munmap |XXX---|
> munmap |--XXX-|
>
> ie, it might try unmapping some pages more than once, and may even overlap prior munmaps.
> Until yesterday's change, it would only munmap the entire mmap.
>
> There's no easy way to tell exactly what happened without a trinity log of course.

I've attached the trinity log of the child that triggered the bug. Odd thing is that I don't see
any munmaps in it.

Thanks,
Sasha

[child234:9994] [0] [32BIT] munlock(addr=0x7f724f784000, len=0x40) = -1 (Cannot allocate memory)
[child234:9994] [1] remap_file_pages(start=0x7f724e984000, size=0x406f79, prot=0, pgoff=6, flags=0x1) = 0
[child234:9994] [2] vmsplice(fd=682, iov=0x318d710, nr_segs=404, flags=2) = 0x5000
[child234:9994] [3] mbind(start=0x7f724f384000, len=0x40, mode=1, nmask=0, maxnode=0x8000, flags=0) = 0
[child234:9994] [4] mmap(addr=0, len=0x20, prot=7[PROT_READ|PROT_WRITE|PROT_EXEC], flags=0x43842, fd=682, off=0) = -1 (Invalid argument)
[child234:9994] [5] mprotect(start=0x7f7250384000, len=0x20, prot=0) = 0
[child234:9994] [6] mprotect(start=0x7f7250886000, len=8192, prot=0x205) = -1 (Invalid argument)
[child234:9994] [7] munlock(addr=0x7f7250584000, len=0x10) = 0
[child234:9994] [8] [32BIT] mlock(addr=0x7f7250684000, len=0x10) = -1 (Cannot allocate memory)
[child234:9994] [9] move_pages(pid=0, nr_pages=236, pages=0x3015ed0, nodes=0x3111010, status=0x31909d0, flags=4) = 0
[child234:9994] [10] mlock(addr=0x7f7250384000, len=0x20) = -1 (Cannot allocate memory)
[child234:9994] [11] remap_file_pages(start=0x7f724f784000, size=0x3bbfbd, prot=0, pgoff=19, flags=0) = 0
[child234:9994] [12] msync(start=0x7f724d584000, len=0xa0, flags=3) = 0
[child234:9994] [13] mlock(addr=0x7f7250684000, len=0x10) = 0
[child234:9994] [14] madvise(start=0x7f7250384000, len_in=0x20, advice=0) = 0
[child234:9994] [15] mlock(addr=0x7f7250888000, len=8192) = 0
[child234:9994] [16] mbind(start=0x7f7250584000, len=0x10, mode=0, nmask=0, maxnode=0x8000, flags=0x4000) = -1 (Invalid argument)
[child234:9994] [17] move_pages(pid=9896, nr_pages=124, pages=0x3015ed0, nodes=0x3190d90, status=0x3109500, flags=4) = -1 (Invalid argument)
[child234:9994] [18] mprotect(start=0x7f724df84000, len=0xa0, prot=8) = 0
[child234:9994] [19] move_pages(pid=0, nr_pages=221, pages=0x3015ed0, nodes=0x3109700, status=0x3109a80, flags=6) = 0
[child234:9994] [20] [32BIT] madvise(start=0x7f7250184000, len_in=0x20, advice=14) = -1 (Cannot allocate memory)
[child234:9994] [21] move_pages(pid=0, nr_pages=337, pages=0x3015ed0, nodes=0x318f790, status=0x318fce0, flags=4) = 0
[child234:9994] [22] move_pages(pid=9981, nr_pages=115, pages=0x3015ed0, nodes=0x3109e00, status=0x9db1a0, flags=4) = 0
[child234:9994] [23] migrate_pages(pid=0, maxnode=0x680016c3, old_nodes=0x6ba000[page_0xff], new_nodes=0x8100) = -1 (Invalid argument)
[child234:9994] [24] msync(start=0x7f7250384000, len=0x20, flags=1) = 0
[child234:9994] [25] msync(start=0x7f724fb84000, len=0x40, flags=6) = 0
[child234:9994] [26] mincore(start=0, len=0, vec=0x8100) = -1 (Bad address)
[child234:9994] [27] remap_file_pages(start=0x7f7250184000, size=0x1597ab, prot=0, pgoff=336, flags=0) = 0
[child234:9994] [28] move_pages(pid=0, nr_pages=99, pages=0x3015ed0, nodes=0x3190230, status=0x9db380, flags=0) = 0
[child234:9994] [29] mincore(start=0x7f724df84000, len=0x31978b, vec=0x7f724df84001) = -1 (Bad address)
[child234:9994] [30] move_pages(pid=9962, nr_pages=83, pages=0x3015ed0, nodes=0x31113d0, status=0x9db520, flags=6) = -1 (Invalid argument)
[child234:9994] [31] [32BIT] madvise(start=0x7f7250384000, len_in=0x20, advice=9) = -1 (Cannot allocate memory)
[child234:9994] [32] msync(start=0x7f7250886000, len=8192, flags=6) = 0
[child234:9994] [33] migrate_pages(pid=0, maxnode=0x929292929292, old_nodes=1, new_nodes=0x6c[page_allocs]) = -1 (Invalid argument)
[child234:9994] [34] mlock(addr=0x7f7250888000, len=8192) = 0
[child234:9994] [35] mbind(start=0x7f724fb84000, len=0x40, mode=3, nmask=0x6c[page_allocs], maxnode=0x8000, flags=0) = -1 (Invalid argument)
[child234:9994] [36] vmsplice(fd=681, iov=0x31903d0, nr_segs=68, flags=1) = 4096
[child234:9994] [37] mbind(start=0x7f7250584000, len=0x10, mode=3, nmask=0x3016ee0, maxnode=0x8000, flags=0x8000) = -1 (Invalid argument)
[child234:9994] [38] mmap(addr=0, len=0x4000,
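
The partial and overlapping munmap pattern described in this thread can be reproduced in a few lines of userspace C. The sketch below is not trinity code; the function name `partial_munmap_demo` and the choice of which pages to unmap are illustrative assumptions. It relies on the documented Linux behaviour that munmap() on a range containing no mapped pages is not an error, which is why unmapping the same page twice still returns 0.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `npages` anonymous pages, then unmap two overlapping sub-ranges so
 * that one page is unmapped twice. Returns 0 if every call succeeded and
 * -1 otherwise. Only safe in a single-threaded program: with other
 * threads, an unrelated mapping could land in the hole between calls.
 */
static int partial_munmap_demo(size_t npages)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t psz;
	char *base;
	int ret = 0;

	if (page <= 0 || npages < 4)
		return -1;
	psz = (size_t)page;

	base = mmap(NULL, npages * psz, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* Unmap pages [0, 2) of the mapping. */
	if (munmap(base, 2 * psz))
		ret = -1;

	/* Unmap pages [1, 3): page 1 is already gone, which is not an error. */
	if (munmap(base + psz, 2 * psz))
		ret = -1;

	/* Clean up the remainder by unmapping the whole original range. */
	if (munmap(base, npages * psz))
		ret = -1;

	return ret;
}
```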
Re: kvm virtio ethernet ring on guest side over high throughput (packet per second)
On 01/23/2014 05:32 AM, Alejandro Comisario wrote:
> Thank you so much Stefan for the help and cc'ing Michael & Jason.
> Like you advised yesterday on IRC, today we are making some tests with
> the application setting TCP_NODELAY in the socket options.
>
> So we will try that and get back to you with further information.
> In the meantime, maybe it helps to show what options the VMs are using while running!
>
> #
> --
> /usr/bin/kvm -S -M pc-1.0 -cpu
> core2duo,+lahf_lm,+rdtscp,+pdpe1gb,+aes,+popcnt,+x2apic,+sse4.2,+sse4.1,+dca,+xtpr,+cx16,+tm2,+est,+vmx,+ds_cpl,+pbe,+tm,+ht,+ss,+acpi,+ds
> -enable-kvm -m 32768 -smp 8,sockets=1,cores=6,threads=2 -name
> instance-0254 -uuid d25b1b20-409e-4d7f-bd92-2ef4073c7c2b
> -nodefconfig -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0254.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> -no-shutdown -kernel /var/lib/nova/instances/instance-0254/kernel
> -initrd /var/lib/nova/instances/instance-0254/ramdisk -append
> root=/dev/vda console=ttyS0 -drive
> file=/var/lib/nova/instances/instance-0254/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=writethrough
> -device
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> -netdev tap,fd=19,id=hostnet0 -device

Better to enable vhost, as Stefan suggested. It may help a lot here.

> virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:27:d4:6d,bus=pci.0,addr=0x3
> -chardev
> file,id=charserial0,path=/var/lib/nova/instances/instance-0254/console.log
> -device isa-serial,chardev=charserial0,id=serial0 -chardev
> pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1
> -usb -device usb-tablet,id=input0 -vnc 0.0.0.0:4 -k en-us -vga cirrus
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
> #
> --
>
> best regards
>
> Alejandro Comisario
> #melicloud CloudBuilders
>
> On Wed, Jan 22, 2014 at 12:22 PM, Stefan Hajnoczi wrote:
>> On Tue, Jan 21, 2014 at 04:06:05PM -0200, Alejandro Comisario wrote:
>>
>> CCed Michael Tsirkin and Jason Wang who work on KVM networking.
>>
>> [...]

Re: [V0 PATCH] xen/pvh: set some cr flags upon vcpu start
On Mon, 20 Jan 2014 10:09:30 -0500
Konrad Rzeszutek Wilk wrote:

> On Fri, Jan 17, 2014 at 06:24:55PM -0800, Mukesh Rathor wrote:
> > pvh was designed to start with pv flags, but a commit in xen tree
>
> Thank you for posting this!
>
> > 51e2cac257ec8b4080d89f0855c498cbbd76a5e5 removed some of the flags
> > as
>
> You need to always include the title of said commit.
>
> > they are not necessary. As a result, these CR flags must be set in
> > the guest.
>
> I sent out replies to this over the weekend but somehow they are not
> showing up.
>

Well, they finally showed up today... US mail must be slow :)...

>
> > +
> > +	if (!cpu)
> > +		return;
>
> And what happens if we don't have this check? Will it be bad if we do
> multiple cr4 writes?

No, but it just confuses the reader/debugger of the code IMO :)...

> FYI, this (cr4) should have been a separate patch. I fixed it up that
> way.
> > +	/*
> > +	 * Unlike PV, for pvh xen does not set: PSE PGE OSFXSR OSXMMEXCPT
> > +	 * For BSP, PSE PGE will be set in probe_page_size_mask(), for AP
> > +	 * set them here. For all, OSFXSR OSXMMEXCPT will be set in fpu_init
> > +	 */
> > +	if (cpu_has_pse)
> > +		set_in_cr4(X86_CR4_PSE);
> > +
> > +	if (cpu_has_pge)
> > +		set_in_cr4(X86_CR4_PGE);
> > +}
>
> Separate patch, and since the PGE part is more complicated than just
> setting the CR4 - you also have to tweak this:
>
>	/* Prevent unwanted bits from being set in PTEs. */
>	__supported_pte_mask &= ~_PAGE_GLOBAL;
>
> I think it should be done once we have actually confirmed that you can
> do 2MB pages within the guest. (might need some more tweaking?)

Umm... well, the above is just setting the PSE and PGE in the APs; the
BSP is already doing that in probe_page_size_mask, and setting
__supported_pte_mask, which needs to be set just once. So, because it's
being set in the BSP, it's already broken/untested if we expose PGE
from xen to a linux PVH guest...

IOW, leaving the above does no more harm, or we should 'if (pvh)' the
code in probe_page_size_mask() for PSE, and wait till we can test it...

thanks
Mukesh

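For readers following the CR4 discussion above, the logic of the quoted hunk can be modelled in plain userspace C. This is a sketch only: the `DEMO_CR4_*` constants carry the architectural bit positions of the corresponding CR4 flags, but `demo_cpu_features` and `demo_pvh_ap_cr4` are hypothetical stand-ins for `cpu_has_pse`/`cpu_has_pge` and `set_in_cr4()`, and nothing here touches a real control register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions for the flags named in the patch. */
#define DEMO_CR4_PSE		(1UL << 4)	/* page size extensions */
#define DEMO_CR4_PGE		(1UL << 7)	/* global pages */
#define DEMO_CR4_OSFXSR		(1UL << 9)	/* FXSAVE/FXRSTOR enabled */
#define DEMO_CR4_OSXMMEXCPT	(1UL << 10)	/* unmasked SIMD FP exceptions */

/* Hypothetical stand-in for the cpu_has_pse / cpu_has_pge checks. */
struct demo_cpu_features {
	bool pse;
	bool pge;
};

/*
 * Model of the quoted hunk: cpu 0 (the BSP) is left alone because,
 * per the patch comment, PSE/PGE are set for it elsewhere; on any
 * other cpu (an AP) the supported bits are OR-ed into the CR4 value.
 * OSFXSR and OSXMMEXCPT are deliberately not handled here, since the
 * patch comment says they are set in fpu_init for all cpus.
 */
static unsigned long demo_pvh_ap_cr4(unsigned int cpu, unsigned long cr4,
				     const struct demo_cpu_features *f)
{
	if (!cpu)
		return cr4;

	if (f->pse)
		cr4 |= DEMO_CR4_PSE;
	if (f->pge)
		cr4 |= DEMO_CR4_PGE;

	return cr4;
}
```

Because the update is a pure OR, applying it twice to the same value changes nothing, which matches Mukesh's answer that repeated cr4 writes would be harmless and that the `!cpu` check is there for readability.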
Re: Internal error: Oops: 17 [#1] ARM
Hi Fabio,

Attached are the two patch files that I applied to the 3.13 release so
that the kernel will detect my eMMC in DDR50. (Let me correct my
previous email: I mentioned SDR50, but it should be DDR50.)

eMMC info:

clock:          5200 Hz
actual clock:   4950 Hz
vdd:            21 (3.3 ~ 3.4 V)
bus mode:       2 (push-pull)
chip select:    0 (don't care)
power mode:     2 (on)
bus width:      3 (8 bits)
timing spec:    1 (mmc high-speed)
signal voltage: 0 (3.30 V)

I reboot my device often, and the error shows up during the reboot.

Regards,

john

On Wed, Jan 22, 2014 at 6:28 PM, Fabio Estevam wrote:
> On Wed, Jan 22, 2014 at 9:49 PM, John Tobias wrote:
>> Hello all,
>>
>> Just to confirm that the error I posted previously exists in the 3.13
>> release. Note that some patches related to eMMC/sdhci have been
>> applied in order to boot 3.13 on my board.
>> In addition to that, I was getting additional errors (please see below):
>> - It happened during the reboot.
>>
>> Cc'ing Dong Aisheng.
>
> What are the steps to reproduce this? Which SoC are you using?
>
> Regards,
>
> Fabio Estevam

sdhci-esdhc-imx.patch
Description: Binary data

sdhci.patch
Description: Binary data
Re: Internal error: Oops: 17 [#1] ARM
Hi Liming,

Yes, I am using 4.8.1. I switched back to 4.7.3 and will test again to
see whether I can reproduce it.

Regards,

john

On Wed, Jan 22, 2014 at 7:01 PM, walimis wrote:
> On Wed, Jan 22, 2014 at 08:23:36AM -0800, John Tobias wrote:
>>Hello all,
>>
>>I am using 3.13-rc1 kernel on iMX6SL processor. My filesystem is in
>>eMMC running SDR50.
>>Has anyone here encountered this problem, and is there an existing
>>patch that I can get?
> hi,
>
> Do you use gcc 4.8.1? If so, maybe you should look at the following link
> to see whether it's a similar issue.
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
>
> Liming Wang
>
>>
>>Regards,
>>
>>john
>>
>>[...]
Re: [PATCH V5 6/8] time/cpuidle: Support in tick broadcast framework in the absence of external clock device
Hi Thomas,

Thank you very much for the review.

On 01/22/2014 06:57 PM, Thomas Gleixner wrote:
> On Wed, 15 Jan 2014, Preeti U Murthy wrote:
>> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
>> index 086ad60..d61404e 100644
>> --- a/kernel/time/clockevents.c
>> +++ b/kernel/time/clockevents.c
>> @@ -524,12 +524,13 @@ void clockevents_resume(void)
>>  #ifdef CONFIG_GENERIC_CLOCKEVENTS
>>  /**
>>   * clockevents_notify - notification about relevant events
>> + * Returns non zero on error.
>>   */
>> -void clockevents_notify(unsigned long reason, void *arg)
>> +int clockevents_notify(unsigned long reason, void *arg)
>>  {
>
> The interface change of clockevents_notify wants to be a separate
> patch.
>
>> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
>> index 9532690..1c23912 100644
>> --- a/kernel/time/tick-broadcast.c
>> +++ b/kernel/time/tick-broadcast.c
>> @@ -20,6 +20,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #include "tick-internal.h"
>>
>> @@ -35,6 +36,15 @@ static cpumask_var_t tmpmask;
>>  static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
>>  static int tick_broadcast_force;
>>
>> +/*
>> + * Helper variables for handling broadcast in the absence of a
>> + * tick_broadcast_device.
>> + * */
>> +static struct hrtimer *bc_hrtimer;
>> +static int bc_cpu = -1;
>> +static ktime_t bc_next_wakeup;
>
> Why do you need another variable to store the expiry time? The
> broadcast code already knows it and the hrtimer expiry value gives you
> the same information for free.

The reason was that functions like tick_handle_oneshot_broadcast() and
tick_broadcast_switch_to_oneshot() were using
tick_broadcast_device.evtdev->next_event to set/get the next wakeups.

But since this patchset introduced an explicit hrtimer for archs which
do not have such a device, I wanted these functions to use a generic
parameter to set/get the next wakeups without having to know about the
existence of this hrtimer, if at all, and to program the hrtimer or the
tick broadcast device, whichever was present, only when the next event
was to be set.

But with your concept patch below, we will not be required to do this.

>
>> +static int hrtimer_initialized = 0;
>
> What's the point of this hrtimer_initialized dance? Why not simply
> making the hrtimer static and avoid that all together. Also adding the
> initialization into tick_broadcast_oneshot_available() is
> braindamaged. Why not adding this to tick_broadcast_init() which is
> the proper place to do?

Right, I agree. This hrtimer initialization should have been in
tick_broadcast_init(), and a simple static declaration would have done
the job.

>
> Aside of that you are making this hrtimer mode unconditional, which
> might break existing systems which are not aware of the hrtimer
> implications.
>
> What you really want is a pseudo clock event device which has the
> proper functions for handling the timer and you can register it from
> your architecture code. The broadcast core code needs a few tweaks to
> avoid the shutdown of the cpu local clock event device, but aside of
> that the whole thing just falls into place. So architectures can use
> this if they want and are sure that their low level idle code knows
> about the deep idle preventing return value of
> clockevents_notify(). Once that works you can register the hrtimer
> based broadcast device and a real hardware broadcast device with a
> higher rating. It just works.

I now completely see your point. This will surely break on archs which
are not using the return value of the BROADCAST_ENTER notification. I am
not even giving them a choice about using the hrtimer mode of the
broadcast framework, and am expecting them to take action on a failed
return of BROADCAST_ENTER. I missed that critical point.

I went through the patch below and am able to see how you are solving
this problem.

>
> Find an incomplete and nonfunctional concept patch below. It should be
> simple to make it work for real.

Thank you very much for the valuable review. The patch below makes your
points very clear. Let me try this out.

Regards
Preeti U Murthy

>
> Thanks,
>
>	tglx
>
> Index: linux-2.6/include/linux/clockchips.h
> ===
> --- linux-2.6.orig/include/linux/clockchips.h
> +++ linux-2.6/include/linux/clockchips.h
> @@ -62,6 +62,11 @@ enum clock_event_mode {
>  #define CLOCK_EVT_FEAT_DYNIRQ	0x20
>  #define CLOCK_EVT_FEAT_PERCPU	0x40
>
> +/*
> + * Clockevent device is based on a hrtimer for broadcast
> + */
> +#define CLOCK_EVT_FEAT_HRTIMER	0x80
> +
>  /**
>   * struct clock_event_device - clock event device descriptor
>   * @event_handler:	Assigned by the framework to be called by the low
> @@ -83,6 +88,7 @@ enum clock_event_mode {
>   * @name:		ptr to clock event name
>   * @rating:		variable to rate clock event devices
>   * @irq:
[PATCH v5] ACPI: Fix acpi_evaluate_object() return value check
Since acpi_evaluate_object() returns acpi_status and not plain int, ACPI_FAILURE() should be used for checking its return value. Also add some detailed debug info when acpi_evaluate_object() failed.

Reviewed-by: Jani Nikula
Acked-by: Bjorn Helgaas
Signed-off-by: Yijing Wang
---
v4->v5: Add some detailed debug info for acpi_evaluate_object() failure suggested by Bjorn.
v3->v4: Fix spell error, add Jani Nikula reviewed-by.
v2->v3: Fix compile error pointed out by Hanjun.
v1->v2: Add CC to related subsystem MAINTAINERS
---
 drivers/gpu/drm/i915/intel_acpi.c | 33 ---
 drivers/gpu/drm/nouveau/core/subdev/mxm/base.c | 13 ++---
 drivers/gpu/drm/nouveau/nouveau_acpi.c | 25 +++---
 drivers/pci/pci-label.c | 10 +--
 4 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_acpi.c b/drivers/gpu/drm/i915/intel_acpi.c
index dfff090..e7b526b 100644
--- a/drivers/gpu/drm/i915/intel_acpi.c
+++ b/drivers/gpu/drm/i915/intel_acpi.c
@@ -31,11 +31,13 @@ static const u8 intel_dsm_guid[] = {
 static int intel_dsm(acpi_handle handle, int func)
 {
 	struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
+	struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL };
 	struct acpi_object_list input;
 	union acpi_object params[4];
 	union acpi_object *obj;
 	u32 result;
-	int ret = 0;
+	acpi_status status;
+	int ret;

 	input.count = 4;
 	input.pointer = params;
@@ -50,10 +52,14 @@ static int intel_dsm(acpi_handle handle, int func)
 	params[3].package.count = 0;
 	params[3].package.elements = NULL;

-	ret = acpi_evaluate_object(handle, "_DSM", &input, &output);
-	if (ret) {
-		DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret);
-		return ret;
+	status = acpi_evaluate_object(handle, "_DSM", &input, &output);
+	if (ACPI_FAILURE(status)) {
+		acpi_get_name(handle, ACPI_FULL_PATHNAME, &string);
+		DRM_DEBUG_DRIVER(
+			"failed to evaluate _DSM for %s, exit status %u\n",
+			(char *)string.pointer, (unsigned int)status);
+		kfree(string.pointer);
+		return -EINVAL;
 	}

 	obj = (union acpi_object *)output.pointer;
@@ -138,10 +144,12 @@ static char *intel_dsm_mux_type(u8 type)
 static void intel_dsm_platform_mux_info(void)
 {
 	struct acpi_buffer output = { ACPI_ALLOCATE_BUFFER, NULL };
+	struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL };
 	struct acpi_object_list input;
 	union acpi_object params[4];
 	union acpi_object *pkg;
-	int i, ret;
+	acpi_status status;
+	int i;

 	input.count = 4;
 	input.pointer = params;
@@ -156,10 +164,15 @@ static void intel_dsm_platform_mux_info(void)
 	params[3].package.count = 0;
 	params[3].package.elements = NULL;

-	ret = acpi_evaluate_object(intel_dsm_priv.dhandle, "_DSM", &input,
-				   &output);
-	if (ret) {
-		DRM_DEBUG_DRIVER("failed to evaluate _DSM: %d\n", ret);
+	status = acpi_evaluate_object(intel_dsm_priv.dhandle,
+				      "_DSM", &input, &output);
+	if (ACPI_FAILURE(status)) {
+		acpi_get_name(intel_dsm_priv.dhandle,
+			      ACPI_FULL_PATHNAME, &string);
+		DRM_DEBUG_DRIVER(
+			"failed to evaluate _DSM for %s, exit status %u\n",
+			(char *)string.pointer, (unsigned int)status);
+		kfree(string.pointer);
 		goto out;
 	}

diff --git a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
index 1291204..c30ee88 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/mxm/base.c
@@ -112,17 +112,22 @@ mxm_shadow_dsm(struct nouveau_mxm *mxm, u8 version)
 	};
 	struct acpi_object_list list = { ARRAY_SIZE(args), args };
 	struct acpi_buffer retn = { ACPI_ALLOCATE_BUFFER, NULL };
+	struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL };
 	union acpi_object *obj;
 	acpi_handle handle;
-	int ret;
+	acpi_status status;

 	handle = ACPI_HANDLE(>pdev->dev);
 	if (!handle)
 		return false;

-	ret = acpi_evaluate_object(handle, "_DSM", &list, &retn);
-	if (ret) {
-		nv_debug(mxm, "DSM MXMS failed: %d\n", ret);
+	status = acpi_evaluate_object(handle, "_DSM", &list, &retn);
+	if (ACPI_FAILURE(status)) {
+		acpi_get_name(handle, ACPI_FULL_PATHNAME, &string);
+		nv_debug(mxm, "DSM MXMS failed for %s: exit status %u\n",
+			 (char *)string.pointer,
+			 (unsigned int)status);
+		kfree(string.pointer);
 		return false;
 	}

diff
Re: Internal error: Oops: 17 [#1] ARM
On Wed, Jan 22, 2014 at 07:28:55PM -0800, John Tobias wrote: >Hi Liming, > >Yes, I am using 4.8.1. I switched back to 4.7.3 and will test it again >if I can re-produce it. Hi, Or you can use the latest linaro 4.8.x toolchain, which has been applied that patch: http://releases.linaro.org/13.12/components/toolchain/binaries/ Please select this one to try: gcc-linaro-arm-linux-gnueabihf-4.8-2013.12_linux.tar.bz2 Liming Wang > >Regards, > >john > >On Wed, Jan 22, 2014 at 7:01 PM, walimis wrote: >> On Wed, Jan 22, 2014 at 08:23:36AM -0800, John Tobias wrote: >>>Hello all, >>> >>>I am using 3.13-rc1 kernel on iMX6SL processor. My filesystem is in >>>eMMC running SDR50. >>>Is anyone here encountered these problem and if there's any existing >>>patch that I can get?. >> hi, >> >> Do you use gcc 4.8.1? If so, maybe you should look at following link >> to see whether it's a similar issue. >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854 >> >> Liming Wang >> >>> >>>Regards, >>> >>>john >>> >>>[ 1552.394899] Unable to handle kernel NULL pointer dereference at >>>virtual address 0037 >>>[ 1552.403034] pgd = beef4000 >>>[ 1552.405855] [0037] *pgd=bef60831, *pte=, *ppte= >>>[ 1552.412245] Internal error: Oops: 17 [#1] ARM >>>[ 1552.416627] Modules linked in: bt8xxx(O) sd8xxx(O) mlan(O) >>>[ 1552.422249] CPU: 0 PID: 232 Comm: commsd Tainted: G O >>>3.13.0-rc1 #7 >>>[ 1552.429409] task: bfbc7500 ti: bec96000 task.ti: bec96000 >>>[ 1552.434844] PC is at lookup_fast+0x5c/0x318 >>>[ 1552.439067] LR is at mark_held_locks+0x78/0x13c >>>[ 1552.443622] pc : [<80101184>]lr : [<80056e48>]psr: a00f0013 >>>[ 1552.443622] sp : bec97d88 ip : 00666e6f fp : bec97ddc >>>[ 1552.455124] r10: r9 : bec97e08 r8 : 80102d94 >>>[ 1552.460370] r7 : bec97e60 r6 : bf133ac8 r5 : bec97e60 r4 : bec97e00 >>>[ 1552.466918] r3 : bee4f01d r2 : r1 : r0 : >>>[ 1552.473471] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment >>>user >>>[ 1552.480629] Control: 10c53c7d Table: beef4059 DAC: 0015 >>>[ 
1552.486397] Process commsd (pid: 232, stack limit = 0xbec96238) >>>[ 1552.492341] Stack: (0xbec97d88 to 0xbec98000) >>>[ 1552.496728] 7d80: 80102b94 80057108 bfb95310 >>>bf133ac8 bf15f4e8 bfb95310 >>>[ 1552.504936] 7da0: c08bb14d bee4f015 0008 bfbc7500 >>>bec97e08 0041 >>>[ 1552.513142] 7dc0: bec97e60 bec96020 bec96000 bec97e3c >>>bec97de0 80102d94 80101134 >>>[ 1552.521347] 7de0: bec97df8 800d982c bec96018 0010 >>>bec97e00 bec97e08 >>>[ 1552.529553] 7e00: 8026e25c 800d97e8 bee4f000 0ff0 80d4e3a4 >>>0001 bee4f000 bec97e60 >>>[ 1552.537758] 7e20: ff9c ff9c bec96000 bec97e5c >>>bec97e40 801033bc 80102c70 >>>[ 1552.545964] 7e40: bee4f000 0001 bec97e60 bec97f00 bec97ee4 >>>bec97e60 80105dc0 80103398 >>>[ 1552.554170] 7e60: bfb95310 bf133ac8 c08bb14d 000b bee4f015 >>>8005992c bfb95310 bf133398 >>>[ 1552.562375] 7e80: bf15f4e8 0041 0002 008a >>> 600f0013 bec96000 >>>[ 1552.570581] 7ea0: ffea bf8c1840 807b4430 80115444 801156e4 >>>8011563c 0008 >>>[ 1552.578786] 7ec0: bec97f04 733fe4e0 0001 ff9c 757e3810 >>>bec97f40 bec97efc bec97ee8 >>>[ 1552.586991] 7ee0: 80105e0c 80105d68 bc950fe0 bec97f2c >>>bec97f00 800faf64 80105df4 >>>[ 1552.595196] 7f00: 801156e4 80115420 bec97f54 733fe4e0 733ff8f0 >>>733fe61c 00c3 8000f504 >>>[ 1552.603402] 7f20: bec97f3c bec97f30 800fafe0 800faf1c bec97fa4 >>>bec97f40 800fb71c 800fafc4 >>>[ 1552.611608] 7f40: 8000f310 bfbc7500 733fe550 8000f458 733fe61c >>>00c3 bec97f84 bec97f68 >>>[ 1552.619813] 7f60: 80056f28 800a0270 733fe550 733ff8f0 733fe61c >>>00c3 bec97f94 bec97f88 >>>[ 1552.628019] 7f80: 80057110 80056f18 bec97f98 8000f458 >>>733fe550 bec97fa8 >>>[ 1552.636226] 7fa0: 8000f280 800fb704 733fe550 733ff8f0 757e3810 >>>733fe4e0 733fe550 0003 >>>[ 1552.644431] 7fc0: 733fe550 733ff8f0 733fe61c 00c3 >>>0002 733fe92c 0200 >>>[ 1552.652636] 7fe0: 00c3 733fe4d8 7579b7e5 7572e276 200f0030 >>>757e3810 bfffd821 bfffdc21 >>>[ 1552.660828] Backtrace: >>>[ 1552.663343] [<80101128>] (lookup_fast+0x0/0x318) from [<80102d94>] >>>(path_lookupat+0x130/0x728) 
>>>[ 1552.671994] [<80102c64>] (path_lookupat+0x0/0x728) from >>>[<801033bc>] (filename_lookup.isra.40+0x30/0x70) >>>[ 1552.681515] [<8010338c>] (filename_lookup.isra.40+0x0/0x70) from >>>[<80105dc0>] (user_path_at_empty+0x64/0x8c) >>>[ 1552.691361] r7:bec97f00 r6:bec97e60 r5:0001 r4:bee4f000 >>>[ 1552.697163] [<80105d5c>] (user_path_at_empty+0x0/0x8c) from >>>[<80105e0c>] (user_path_at+0x24/0x2c) >>>[ 1552.706053] r8:bec97f40 r7:757e3810 r6:ff9c r5:0001 r4:733fe4e0 >>>[ 1552.712927] [<80105de8>] (user_path_at+0x0/0x2c) from [<800faf64>] >>>(vfs_fstatat+0x54/0xa8) >>>[ 1552.721232] [<800faf10>] (vfs_fstatat+0x0/0xa8) from [<800fafe0>]
Re: [PATCH] tracing: Use task_nice() in function __update_max_tr() to get the nice value of task.
On Wed, 22 Jan 2014 17:41:45 -0500 Dongsheng Yang wrote: > There is already a function named task_nice in sched.h to get the nice value > of task_struct. We can use it in __update_max_tr() rather than calculate it > manually. > > Signed-off-by: Dongsheng Yang > --- > kernel/trace/trace.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 9d20cd9..ec149b4 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -970,7 +970,7 @@ __update_max_tr(struct trace_array *tr, struct > task_struct *tsk, int cpu) > else > max_data->uid = task_uid(tsk); > > - max_data->nice = tsk->static_prio - 20 - MAX_RT_PRIO; > + max_data->nice = task_nice(tsk); Except that's a function call in a critical path. Switch it to TASK_NICE(), and I'll take the patch. Thanks, -- Steve > max_data->policy = tsk->policy; > max_data->rt_priority = tsk->rt_priority; > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tracing: Use task_nice() in function __update_max_tr() to get the nice value of task.
On Wed, 22 Jan 2014 22:56:32 -0500 Steven Rostedt wrote:

> On Wed, 22 Jan 2014 17:41:45 -0500
> Dongsheng Yang wrote:
>
> > There is already a function named task_nice in sched.h to get the nice value
> > of task_struct. We can use it in __update_max_tr() rather than calculate it
> > manually.
> >
> > Signed-off-by: Dongsheng Yang
> > ---
> > kernel/trace/trace.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 9d20cd9..ec149b4 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -970,7 +970,7 @@ __update_max_tr(struct trace_array *tr, struct
> > task_struct *tsk, int cpu)
> > else
> > max_data->uid = task_uid(tsk);
> >
> > - max_data->nice = tsk->static_prio - 20 - MAX_RT_PRIO;
> > + max_data->nice = task_nice(tsk);
>
> Except that's a function call in a critical path. Switch it to
> TASK_NICE(), and I'll take the patch.

Bah, I just noticed that TASK_NICE is in kernel/sched/sched.h not include/linux/sched.h

Peter, is there a reason that task_nice() is not a static inline in sched.h and have these macros there too? They only reference fields in task_struct that are already defined there. I don't see why they need to be private to kernel/sched.

-- Steve

>
> Thanks,
>
> -- Steve
>
> > max_data->policy = tsk->policy;
> > max_data->rt_priority = tsk->rt_priority;
> >
Re: [PATCH] tracing: Use task_nice() in function __update_max_tr() to get the nice value of task.
On 01/22/2014 11:00 PM, Steven Rostedt wrote:

> Bah, I just noticed that TASK_NICE is in kernel/sched/sched.h not include/linux/sched.h
>
> Peter, is there a reason that task_nice() is not a static inline in sched.h and have these macros there too? They only reference fields in task_struct that are already defined there. I don't see why they need to be private to kernel/sched.

Agreed. These macros are useful to code outside kernel/sched, but they are currently private to kernel/sched. If we move them to include/linux/sched.h, I will use TASK_NICE in this patch.

> -- Steve
>
> > Thanks,
> >
> > -- Steve
> >
> > > max_data->policy = tsk->policy;
> > > max_data->rt_priority = tsk->rt_priority;
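For readers following the arithmetic in this thread, here is a minimal user-space sketch of how the open-coded expression in __update_max_tr() relates to the TASK_NICE() macro form. It is not kernel code: struct mock_task, mock_task_nice(), open_coded_nice() and nice_forms_agree() are hypothetical names invented for illustration, and the macro definitions are reproduced here on the assumption that they match the scheduler headers of this kernel era (MAX_RT_PRIO of 100, nice 0 mapping to static_prio 120).

```c
#include <assert.h>

/* Assumed to mirror the kernel's definitions; reproduced for illustration. */
#define MAX_RT_PRIO		100
#define NICE_TO_PRIO(nice)	(MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio)	((prio) - MAX_RT_PRIO - 20)
#define TASK_NICE(p)		PRIO_TO_NICE((p)->static_prio)

/* Hypothetical stand-in for the one task_struct field used here. */
struct mock_task {
	int static_prio;
};

/* The expression the patch removes from __update_max_tr(). */
static int open_coded_nice(const struct mock_task *tsk)
{
	return tsk->static_prio - 20 - MAX_RT_PRIO;
}

/* What a static inline task_nice() in a shared header could look like. */
static inline int mock_task_nice(const struct mock_task *p)
{
	return TASK_NICE(p);
}

/* Returns 1 when both forms give the same answer for every nice level. */
int nice_forms_agree(void)
{
	for (int nice = -20; nice <= 19; nice++) {
		struct mock_task t = { .static_prio = NICE_TO_PRIO(nice) };

		if (open_coded_nice(&t) != nice || mock_task_nice(&t) != nice)
			return 0;
	}
	return 1;
}

/* Convenience self-check for callers that want a hard failure. */
void check_nice_mapping(void)
{
	assert(nice_forms_agree());
}
```

Because the macro and the inline wrapper expand to the same subtraction, moving them into a shared header would give callers such as the tracer the readable form without an out-of-line call, which is the point raised above.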
Re: Internal error: Oops: 17 [#1] ARM
Thanks! I will try it tomorrow.

Regards,

John

Sent from my iPhone

> On Jan 22, 2014, at 7:46 PM, walimis wrote:
>
>> On Wed, Jan 22, 2014 at 07:28:55PM -0800, John Tobias wrote:
>> Hi Liming,
>>
>> Yes, I am using 4.8.1. I switched back to 4.7.3 and will test it again
>> if I can re-produce it.
>
> Hi,
>
> Or you can use the latest linaro 4.8.x toolchain, which has been applied that
> patch:
>
> http://releases.linaro.org/13.12/components/toolchain/binaries/
>
> Please select this one to try:
>
> gcc-linaro-arm-linux-gnueabihf-4.8-2013.12_linux.tar.bz2
>
> Liming Wang
>>
>> Regards,
>>
>> john
>>
>>> On Wed, Jan 22, 2014 at 7:01 PM, walimis wrote:
On Wed, Jan 22, 2014 at 08:23:36AM -0800, John Tobias wrote:
Hello all,
I am using 3.13-rc1 kernel on iMX6SL processor. My filesystem is in eMMC running SDR50. Is anyone here encountered these problem and if there's any existing patch that I can get?.
>>> hi,
>>>
>>> Do you use gcc 4.8.1? If so, maybe you should look at following link
>>> to see whether it's a similar issue.
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854 >>> >>> Liming Wang >>> Regards, john [ 1552.394899] Unable to handle kernel NULL pointer dereference at virtual address 0037 [ 1552.403034] pgd = beef4000 [ 1552.405855] [0037] *pgd=bef60831, *pte=, *ppte= [ 1552.412245] Internal error: Oops: 17 [#1] ARM [ 1552.416627] Modules linked in: bt8xxx(O) sd8xxx(O) mlan(O) [ 1552.422249] CPU: 0 PID: 232 Comm: commsd Tainted: G O 3.13.0-rc1 #7 [ 1552.429409] task: bfbc7500 ti: bec96000 task.ti: bec96000 [ 1552.434844] PC is at lookup_fast+0x5c/0x318 [ 1552.439067] LR is at mark_held_locks+0x78/0x13c [ 1552.443622] pc : [<80101184>]lr : [<80056e48>]psr: a00f0013 [ 1552.443622] sp : bec97d88 ip : 00666e6f fp : bec97ddc [ 1552.455124] r10: r9 : bec97e08 r8 : 80102d94 [ 1552.460370] r7 : bec97e60 r6 : bf133ac8 r5 : bec97e60 r4 : bec97e00 [ 1552.466918] r3 : bee4f01d r2 : r1 : r0 : [ 1552.473471] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 1552.480629] Control: 10c53c7d Table: beef4059 DAC: 0015 [ 1552.486397] Process commsd (pid: 232, stack limit = 0xbec96238) [ 1552.492341] Stack: (0xbec97d88 to 0xbec98000) [ 1552.496728] 7d80: 80102b94 80057108 bfb95310 bf133ac8 bf15f4e8 bfb95310 [ 1552.504936] 7da0: c08bb14d bee4f015 0008 bfbc7500 bec97e08 0041 [ 1552.513142] 7dc0: bec97e60 bec96020 bec96000 bec97e3c bec97de0 80102d94 80101134 [ 1552.521347] 7de0: bec97df8 800d982c bec96018 0010 bec97e00 bec97e08 [ 1552.529553] 7e00: 8026e25c 800d97e8 bee4f000 0ff0 80d4e3a4 0001 bee4f000 bec97e60 [ 1552.537758] 7e20: ff9c ff9c bec96000 bec97e5c bec97e40 801033bc 80102c70 [ 1552.545964] 7e40: bee4f000 0001 bec97e60 bec97f00 bec97ee4 bec97e60 80105dc0 80103398 [ 1552.554170] 7e60: bfb95310 bf133ac8 c08bb14d 000b bee4f015 8005992c bfb95310 bf133398 [ 1552.562375] 7e80: bf15f4e8 0041 0002 008a 600f0013 bec96000 [ 1552.570581] 7ea0: ffea bf8c1840 807b4430 80115444 801156e4 8011563c 0008 [ 1552.578786] 7ec0: bec97f04 733fe4e0 0001 ff9c 757e3810 bec97f40 bec97efc bec97ee8 [ 
1552.586991] 7ee0: 80105e0c 80105d68 bc950fe0 bec97f2c bec97f00 800faf64 80105df4 [ 1552.595196] 7f00: 801156e4 80115420 bec97f54 733fe4e0 733ff8f0 733fe61c 00c3 8000f504 [ 1552.603402] 7f20: bec97f3c bec97f30 800fafe0 800faf1c bec97fa4 bec97f40 800fb71c 800fafc4 [ 1552.611608] 7f40: 8000f310 bfbc7500 733fe550 8000f458 733fe61c 00c3 bec97f84 bec97f68 [ 1552.619813] 7f60: 80056f28 800a0270 733fe550 733ff8f0 733fe61c 00c3 bec97f94 bec97f88 [ 1552.628019] 7f80: 80057110 80056f18 bec97f98 8000f458 733fe550 bec97fa8 [ 1552.636226] 7fa0: 8000f280 800fb704 733fe550 733ff8f0 757e3810 733fe4e0 733fe550 0003 [ 1552.644431] 7fc0: 733fe550 733ff8f0 733fe61c 00c3 0002 733fe92c 0200 [ 1552.652636] 7fe0: 00c3 733fe4d8 7579b7e5 7572e276 200f0030 757e3810 bfffd821 bfffdc21 [ 1552.660828] Backtrace: [ 1552.663343] [<80101128>] (lookup_fast+0x0/0x318) from [<80102d94>] (path_lookupat+0x130/0x728) [ 1552.671994] [<80102c64>] (path_lookupat+0x0/0x728) from [<801033bc>] (filename_lookup.isra.40+0x30/0x70) [ 1552.681515] [<8010338c>] (filename_lookup.isra.40+0x0/0x70) from [<80105dc0>] (user_path_at_empty+0x64/0x8c) [ 1552.691361] r7:bec97f00 r6:bec97e60 r5:0001 r4:bee4f000 [ 1552.697163]
linux-next: manual merge of the userns tree with the mips tree
Hi Eric, Today's linux-next merge of the userns tree got conflicts in arch/mips/include/asm/vpe.h and arch/mips/kernel/vpe.c between commits 1a2a6d7e8816 ("MIPS: APRP: Split VPE loader into separate files") and 5792bf643865 ("MIPS: APRP: Code formatting clean-ups") from the mips tree and commit f58437f1f916 ("MIPS: VPE: Remove vpe_getuid and vpe_getgid") from the userns tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc arch/mips/include/asm/vpe.h index e0684f5f0054,0880fe8809b1.. --- a/arch/mips/include/asm/vpe.h +++ b/arch/mips/include/asm/vpe.h @@@ -9,88 -18,7 +9,87 @@@ #ifndef _ASM_VPE_H #define _ASM_VPE_H +#include +#include +#include +#include + +#define VPE_MODULE_NAME "vpe" +#define VPE_MODULE_MINOR 1 + +/* grab the likely amount of memory we will need. */ +#ifdef CONFIG_MIPS_VPE_LOADER_TOM +#define P_SIZE (2 * 1024 * 1024) +#else +/* add an overhead to the max kmalloc size for non-striped symbols/etc */ +#define P_SIZE (256 * 1024) +#endif + +#define MAX_VPES 16 +#define VPE_PATH_MAX 256 + +static inline int aprp_cpu_index(void) +{ +#ifdef CONFIG_MIPS_CMP + return setup_max_cpus; +#else + extern int tclimit; + return tclimit; +#endif +} + +enum vpe_state { + VPE_STATE_UNUSED = 0, + VPE_STATE_INUSE, + VPE_STATE_RUNNING +}; + +enum tc_state { + TC_STATE_UNUSED = 0, + TC_STATE_INUSE, + TC_STATE_RUNNING, + TC_STATE_DYNAMIC +}; + +struct vpe { + enum vpe_state state; + + /* (device) minor associated with this vpe */ + int minor; + + /* elfloader stuff */ + void *load_addr; + unsigned long len; + char *pbuffer; + unsigned long plen; - unsigned int uid, gid; + char cwd[VPE_PATH_MAX]; + + unsigned long __start; + + /* tc's associated with this vpe */ + struct list_head tc; + + /* The list of vpe's */ + struct list_head list; + + /* shared symbol address */ + void *shared_ptr; + + /* the list of who wants to know when something major happens */ + struct list_head 
notify; + + unsigned int ntcs; +}; + +struct tc { + enum tc_state state; + int index; + + struct vpe *pvpe; /* parent VPE */ + struct list_head tc;/* The list of TC's with this VPE */ + struct list_head list; /* The global list of tc's */ +}; + struct vpe_notifications { void (*start)(int vpe); void (*stop)(int vpe); @@@ -98,36 -26,10 +97,34 @@@ struct list_head list; }; +struct vpe_control { + spinlock_t vpe_list_lock; + struct list_head vpe_list; /* Virtual processing elements */ + spinlock_t tc_list_lock; + struct list_head tc_list; /* Thread contexts */ +}; + +extern unsigned long physical_memsize; +extern struct vpe_control vpecontrol; +extern const struct file_operations vpe_fops; + +int vpe_notify(int index, struct vpe_notifications *notify); + +void *vpe_get_shared(int index); - int vpe_getuid(int index); - int vpe_getgid(int index); +char *vpe_getcwd(int index); + +struct vpe *get_vpe(int minor); +struct tc *get_tc(int index); +struct vpe *alloc_vpe(int minor); +struct tc *alloc_tc(int index); +void release_vpe(struct vpe *v); -extern int vpe_notify(int index, struct vpe_notifications *notify); +void *alloc_progmem(unsigned long len); +void release_progmem(void *ptr); -extern void *vpe_get_shared(int index); -extern char *vpe_getcwd(int index); +int __weak vpe_run(struct vpe *v); +void cleanup_tc(struct tc *tc); +int __init vpe_module_init(void); +void __exit vpe_module_exit(void); #endif /* _ASM_VPE_H */ diff --cc arch/mips/kernel/vpe.c index 42d3ca08bd28,2d5c142bad67.. 
--- a/arch/mips/kernel/vpe.c +++ b/arch/mips/kernel/vpe.c @@@ -899,35 -1262,14 +896,13 @@@ void *vpe_get_shared(int index return v->shared_ptr; } - EXPORT_SYMBOL(vpe_get_shared); - int vpe_getuid(int index) - { - struct vpe *v = get_vpe(index); - - if (v == NULL) - return -1; - - return v->uid; - } - EXPORT_SYMBOL(vpe_getuid); - - int vpe_getgid(int index) - { - struct vpe *v = get_vpe(index); - - if (v == NULL) - return -1; - - return v->gid; - } - EXPORT_SYMBOL(vpe_getgid); - int vpe_notify(int index, struct vpe_notifications *notify) { - struct vpe *v; + struct vpe *v = get_vpe(index); - if ((v = get_vpe(index)) == NULL) + if (v == NULL) return -1; list_add(>list, >notify); pgpclBr1_jBTi.pgp Description: PGP signature
Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock
"Srivatsa S. Bhat" writes: > On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote: >> Hi Paul, I find an old patch for register_allcpu_notifier(), but the "bool replay_history" should be eliminated (always true): it's too weird. Then we should get rid of register_cpu_notifier, or at least hide it. Thanks, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Regression on next-20140116 [Was: [PATCH 3/3 v4] usb: chipidea: hw_phymode_configure moved before ci_usb_phy_init]
On Thursday, January 23, 2014 09:22 AM, Peter Chen wrote: On Wed, Jan 22, 2014 at 10:41:33PM +0100, Uwe Kleine-König wrote: Hello, On Wed, Jan 22, 2014 at 10:49:51AM +0100, Uwe Kleine-König wrote: On Tue, Dec 03, 2013 at 04:01:50PM +0800, Chris Ruehl wrote: usb: chipidea: hw_phymode_configure moved before ci_usb_phy_init hw_phymode_configure configures the PORTSC registers and allow the following phy_inits to operate on the right parameters. This fix a problem where the UPLI (ISP1504) could not detected, because the Viewport was not available and read the viewport return 0's only. This patch (or a later revision of it to be more exact) made it into mainline as cd0b42c2a6d2. On an i.MX27 based machine I'm hitting an oops (see below) on next-20140116 + a few patches. (I didn't switch to 3.13+ yet, as I think not everything I need has landed there.) The oops goes away (and still better, lsusb reports my connected devices instead of "unable to initialize libusb: -99") when I do at least one of the following: - set CONFIG_USB_CHIPIDEA=y instead of =m - revert commit cd0b42c2a6d2 (usb: chipidea: put hw_phymode_configure before ci_usb_phy_init) I debugged that a bit further and the problem is that hw_phymode_configure depends on the phy's clk being enabled (i.e. usb_ipg_gate) and this is only enforced in ci_usb_phy_init (via usb_phy_init -> usb_gen_phy_init). When CONFIG_USB_CHIPIDEA=y the init call to disable all unused clocks wasn't run yet and so the clock is still on as this is the boot default. Hi Uwe, I am a little puzzled at your platform - Which phy you have used? ulpi phy ,internal phy or other external phy? - If you use ulpi phy, why you still need to use nop phy driver? Besides, according to chris patch, the ulpi can only be visited after hw_phymode_configure? - Do you have some hardware related operation at phy's probe? If it exists, why not move it to phy->init? 
Peter

Peter, I think that's my fault. I sent Uwe my patches, which call phy-ulpi from the nop driver in order to get the ISP1504 running on my board. It's obviously wrong to call another driver from the nop driver; see "[PATCH 3/3] usb: phy-generic: Add ULPI VBUS support" and the concerns from Heikki (linux-usb mailing list).

Uwe, we may work together on this.

Chris

Considering that it's already late today and that I don't know the chipidea driver I'm sure there are people who can come up with a better patch with less effort than me. Any volunteers?

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Re: [patch 5/9] mm + fs: prepare for non-page entries in page cache radix trees
Hi Hannes, On Wed, Jan 22, 2014 at 12:47:44PM -0500, Johannes Weiner wrote: > On Mon, Jan 13, 2014 at 11:01:32AM +0900, Minchan Kim wrote: > > On Fri, Jan 10, 2014 at 01:10:39PM -0500, Johannes Weiner wrote: > > > shmem mappings already contain exceptional entries where swap slot > > > information is remembered. > > > > > > To be able to store eviction information for regular page cache, > > > prepare every site dealing with the radix trees directly to handle > > > entries other than pages. > > > > > > The common lookup functions will filter out non-page entries and > > > return NULL for page cache holes, just as before. But provide a raw > > > version of the API which returns non-page entries as well, and switch > > > shmem over to use it. > > > > > > Signed-off-by: Johannes Weiner > > Reviewed-by: Minchan Kim > > Thanks, Minchan! > > > > @@ -890,6 +973,73 @@ repeat: > > > EXPORT_SYMBOL(find_or_create_page); > > > > > > /** > > > + * __find_get_pages - gang pagecache lookup > > > + * @mapping: The address_space to search > > > + * @start: The starting page index > > > + * @nr_pages:The maximum number of pages > > > + * @pages: Where the resulting pages are placed > > > > where is @indices? > > Fixed :) > > > > @@ -894,6 +894,53 @@ EXPORT_SYMBOL(__pagevec_lru_add); > > > > > > /** > > > * pagevec_lookup - gang pagecache lookup > > > > __pagevec_lookup? > > > > > + * @pvec:Where the resulting entries are placed > > > + * @mapping: The address_space to search > > > + * @start: The starting entry index > > > + * @nr_pages:The maximum number of entries > > > > missing @indices? > > > > > + * > > > + * pagevec_lookup() will search for and return a group of up to > > > + * @nr_pages pages and shadow entries in the mapping. All entries are > > > + * placed in @pvec. pagevec_lookup() takes a reference against actual > > > + * pages in @pvec. > > > + * > > > + * The search returns a group of mapping-contiguous entries with > > > + * ascending indexes. 
There may be holes in the indices due to
> > > + * not-present entries.
> > > + *
> > > + * pagevec_lookup() returns the number of entries which were found.
> >
> > __pagevec_lookup
>
> Yikes, all three fixed.
>
> > > @@ -22,6 +22,22 @@
> > > #include
> > > #include "internal.h"
> > >
> > > +static void clear_exceptional_entry(struct address_space *mapping,
> > > + pgoff_t index, void *entry)
> > > +{
> > > + /* Handled by shmem itself */
> > > + if (shmem_mapping(mapping))
> > > + return;
> > > +
> > > + spin_lock_irq(>tree_lock);
> > > + /*
> > > + * Regular page slots are stabilized by the page lock even
> > > + * without the tree itself locked. These unlocked entries
> > > + * need verification under the tree lock.
> > > + */
> >
> > Could you explain why repeated spin_lock with irq disabled isn't problem
> > in truncation path?
>
> To modify the cache tree, we have to take the IRQ-safe tree_lock, this
> is no different than removing a page (see truncate_complete_page).

I meant that we could batch the irq lock/unlock part, with a periodic IRQ release, because clear_exceptional_entry() is always called from a gang pagecache lookup. It's just a comment about optimization, so it shouldn't be critical for merging; we could do it in the future if it really turns out to be a problem for scalability.

--
Kind regards,
Minchan Kim
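To make the batching suggestion above concrete, here is a rough user-space sketch comparing one lock round trip per entry with one round trip per batch. Nothing here is kernel code or part of the patch: struct mock_lock, mock_lock_irq(), mock_unlock_irq(), clear_per_entry() and clear_batched() are hypothetical names, the lock only counts acquisitions, and the batch size is an arbitrary parameter chosen by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Mock lock that only counts acquisitions; stands in for an IRQ-safe lock. */
struct mock_lock {
	unsigned long acquisitions;
	int held;
};

static void mock_lock_irq(struct mock_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void mock_unlock_irq(struct mock_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* One lock/unlock pair per entry, as in the quoted helper. */
unsigned long clear_per_entry(struct mock_lock *l, void **entries, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		mock_lock_irq(l);
		entries[i] = NULL;
		mock_unlock_irq(l);
	}
	return l->acquisitions;
}

/*
 * Batched variant: hold the lock across up to 'batch' entries, then drop it
 * so that pending interrupts get a chance to run before the next batch.
 */
unsigned long clear_batched(struct mock_lock *l, void **entries, size_t n,
			    size_t batch)
{
	if (batch == 0)
		batch = 1;	/* avoid a loop that never advances */

	for (size_t i = 0; i < n; ) {
		size_t end = (n - i < batch) ? n : i + batch;

		mock_lock_irq(l);
		for (; i < end; i++)
			entries[i] = NULL;
		mock_unlock_irq(l);
	}
	return l->acquisitions;
}
```

With 14 entries and a batch of 4, the batched variant takes the lock 4 times instead of 14. Whether that saving matters in the real truncation path is exactly the open scalability question in the mail, so this is only an illustration of the trade-off.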
Re: [patch 9/9] mm: keep page cache radix tree nodes in check
On Wed, Jan 22, 2014 at 01:42:17PM -0500, Johannes Weiner wrote: > On Mon, Jan 13, 2014 at 04:39:47PM +0900, Minchan Kim wrote: > > On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote: > > > Previously, page cache radix tree nodes were freed after reclaim > > > emptied out their page pointers. But now reclaim stores shadow > > > entries in their place, which are only reclaimed when the inodes > > > themselves are reclaimed. This is problematic for bigger files that > > > are still in use after they have a significant amount of their cache > > > reclaimed, without any of those pages actually refaulting. The shadow > > > entries will just sit there and waste memory. In the worst case, the > > > shadow entries will accumulate until the machine runs out of memory. > > > > > > To get this under control, the VM will track radix tree nodes > > > exclusively containing shadow entries on a per-NUMA node list. > > > Per-NUMA rather than global because we expect the radix tree nodes > > > themselves to be allocated node-locally and we want to reduce > > > cross-node references of otherwise independent cache workloads. A > > > simple shrinker will then reclaim these nodes on memory pressure. > > > > > > A few things need to be stored in the radix tree node to implement the > > > shadow node LRU and allow tree deletions coming from the list: > > > > > > 1. There is no index available that would describe the reverse path > > >from the node up to the tree root, which is needed to perform a > > >deletion. To solve this, encode in each node its offset inside the > > >parent. This can be stored in the unused upper bits of the same > > >member that stores the node's height at no extra space cost. > > > > > > 2. The number of shadow entries needs to be counted in addition to the > > >regular entries, to quickly detect when the node is ready to go to > > >the shadow node LRU list. 
The current entry count is an unsigned > > >int but the maximum number of entries is 64, so a shadow counter > > >can easily be stored in the unused upper bits. > > > > > > 3. Tree modification needs tree lock and tree root, which are located > > >in the address space, so store an address_space backpointer in the > > >node. The parent pointer of the node is in a union with the 2-word > > >rcu_head, so the backpointer comes at no extra cost as well. > > > > > > 4. The node needs to be linked to an LRU list, which requires a list > > >head inside the node. This does increase the size of the node, but > > >it does not change the number of objects that fit into a slab page. > > > > > > Signed-off-by: Johannes Weiner > > > --- > > > include/linux/list_lru.h | 2 + > > > include/linux/mmzone.h | 1 + > > > include/linux/radix-tree.h | 32 +--- > > > include/linux/swap.h | 1 + > > > lib/radix-tree.c | 36 -- > > > mm/filemap.c | 77 +++-- > > > mm/list_lru.c | 8 +++ > > > mm/truncate.c | 20 +++- > > > mm/vmstat.c| 1 + > > > mm/workingset.c| 121 > > > + > > > 10 files changed, 259 insertions(+), 40 deletions(-) > > > > > > diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h > > > index 3ce541753c88..b02fc233eadd 100644 > > > --- a/include/linux/list_lru.h > > > +++ b/include/linux/list_lru.h > > > @@ -13,6 +13,8 @@ > > > /* list_lru_walk_cb has to always return one of those */ > > > enum lru_status { > > > LRU_REMOVED,/* item removed from list */ > > > + LRU_REMOVED_RETRY, /* item removed, but lock has been > > > +dropped and reacquired */ > > > LRU_ROTATE, /* item referenced, give another pass */ > > > LRU_SKIP, /* item cannot be locked, skip */ > > > LRU_RETRY, /* item not freeable. 
May drop the lock > > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > > > index 118ba9f51e86..8cac5a7ef7a7 100644 > > > --- a/include/linux/mmzone.h > > > +++ b/include/linux/mmzone.h > > > @@ -144,6 +144,7 @@ enum zone_stat_item { > > > #endif > > > WORKINGSET_REFAULT, > > > WORKINGSET_ACTIVATE, > > > + WORKINGSET_NODERECLAIM, > > > NR_ANON_TRANSPARENT_HUGEPAGES, > > > NR_FREE_CMA_PAGES, > > > NR_VM_ZONE_STAT_ITEMS }; > > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h > > > index 13636c40bc42..33170dbd9db4 100644 > > > --- a/include/linux/radix-tree.h > > > +++ b/include/linux/radix-tree.h > > > @@ -72,21 +72,37 @@ static inline int radix_tree_is_indirect_ptr(void > > > *ptr) > > > #define RADIX_TREE_TAG_LONGS \ > > > ((RADIX_TREE_MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG) > > > > > > +#define RADIX_TREE_INDEX_BITS (8 /* CHAR_BIT */ * sizeof(unsigned long)) > > > +#define RADIX_TREE_MAX_PATH
Re: [PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range for Video Macro Blocks
Hi All, Is there any review Comments for the patch "[PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range for Video Macro Blocks" posted on 30-Dec-2013 ? Regards, Swaminathan -- From: "Amit Grover" Sent: Monday, December 30, 2013 4:13 PM To: ; ; ; ; ; ; ; ; ; Cc: ; ; ; ; ; ; ; "Swami Nathan" Subject: [PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range for Video Macro Blocks This patch adds Controls to set Horizontal and Vertical search range for Motion Estimation block for Samsung MFC video Encoders. Signed-off-by: Swami Nathan Signed-off-by: Amit Grover --- Documentation/DocBook/media/v4l/controls.xml| 14 + drivers/media/platform/s5p-mfc/s5p_mfc_common.h |2 ++ drivers/media/platform/s5p-mfc/s5p_mfc_enc.c| 24 +++ drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c |8 ++-- drivers/media/v4l2-core/v4l2-ctrls.c| 14 + include/uapi/linux/v4l2-controls.h |2 ++ 6 files changed, 58 insertions(+), 6 deletions(-) diff --git a/Documentation/DocBook/media/v4l/controls.xml b/Documentation/DocBook/media/v4l/controls.xml index 7a3b49b..70a0f6f 100644 --- a/Documentation/DocBook/media/v4l/controls.xml +++ b/Documentation/DocBook/media/v4l/controls.xml @@ -2258,6 +2258,20 @@ Applicable to the MPEG1, MPEG2, MPEG4 encoders. VBV buffer control. + + + spanname="id">V4L2_CID_MPEG_VIDEO_HORZ_SEARCH_RANGE + integer + Sets the Horizontal search range for Video Macro blocks. + + + + + spanname="id">V4L2_CID_MPEG_VIDEO_VERT_SEARCH_RANGE + integer + Sets the Vertical search range for Video Macro blocks. 
+ + spanname="id">V4L2_CID_MPEG_VIDEO_H264_CPB_SIZE diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_common.h b/drivers/media/platform/s5p-mfc/s5p_mfc_common.h index 6920b54..f2c13c3 100644 --- a/drivers/media/platform/s5p-mfc/s5p_mfc_common.h +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_common.h @@ -430,6 +430,8 @@ struct s5p_mfc_vp8_enc_params { struct s5p_mfc_enc_params { u16 width; u16 height; + u32 horz_range; + u32 vert_range; u16 gop_size; enum v4l2_mpeg_video_multi_slice_mode slice_mode; diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c b/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c index 4ff3b6c..a02e7b8 100644 --- a/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c @@ -208,6 +208,24 @@ static struct mfc_control controls[] = { .default_value = 0, }, { + .id = V4L2_CID_MPEG_VIDEO_HORZ_SEARCH_RANGE, + .type = V4L2_CTRL_TYPE_INTEGER, + .name = "horizontal search range of video macro block", + .minimum = 16, + .maximum = 128, + .step = 16, + .default_value = 32, + }, + { + .id = V4L2_CID_MPEG_VIDEO_VERT_SEARCH_RANGE, + .type = V4L2_CTRL_TYPE_INTEGER, + .name = "vertical search range of video macro block", + .minimum = 16, + .maximum = 128, + .step = 16, + .default_value = 32, + }, + { .id = V4L2_CID_MPEG_VIDEO_H264_CPB_SIZE, .type = V4L2_CTRL_TYPE_INTEGER, .minimum = 0, @@ -1377,6 +1395,12 @@ static int s5p_mfc_enc_s_ctrl(struct v4l2_ctrl *ctrl) case V4L2_CID_MPEG_VIDEO_VBV_SIZE: p->vbv_size = ctrl->val; break; + case V4L2_CID_MPEG_VIDEO_HORZ_SEARCH_RANGE: + p->horz_range = ctrl->val; + break; + case V4L2_CID_MPEG_VIDEO_VERT_SEARCH_RANGE: + p->vert_range = ctrl->val; + break; case V4L2_CID_MPEG_VIDEO_H264_CPB_SIZE: p->codec.h264.cpb_size = ctrl->val; break; diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c b/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c index 461358c..47e1807 100644 --- a/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c @@ 
-727,14 +727,10 @@ static int s5p_mfc_set_enc_params(struct s5p_mfc_ctx *ctx) WRITEL(reg, S5P_FIMV_E_RC_CONFIG_V6); /* setting for MV range [16, 256] */ - reg = 0; - reg &= ~(0x3FFF); - reg = 256; + reg = (p->horz_range & 0x3fff); /* conditional check in app */ WRITEL(reg, S5P_FIMV_E_MV_HOR_RANGE_V6); - reg = 0; - reg &= ~(0x3FFF); - reg = 256; + reg = (p->vert_range & 0x3fff); /* conditional check in app */ WRITEL(reg, S5P_FIMV_E_MV_VER_RANGE_V6); WRITEL(0x0, S5P_FIMV_E_FRAME_INSERTION_V6); diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c index fb46790..7cf23d5 100644 --- a/drivers/media/v4l2-core/v4l2-ctrls.c +++ b/drivers/media/v4l2-core/v4l2-ctrls.c @@ -735,6 +735,8 @@ const char *v4l2_ctrl_get_name(u32 id) case V4L2_CID_MPEG_VIDEO_DEC_PTS: return "Video Decoder PTS"; case V4L2_CID_MPEG_VIDEO_DEC_FRAME: return "Video Decoder Frame Count"; case V4L2_CID_MPEG_VIDEO_VBV_DELAY: return "Initial Delay for VBV Control"; + case V4L2_CID_MPEG_VIDEO_HORZ_SEARCH_RANGE: return "hor search range of video MB"; + case V4L2_CID_MPEG_VIDEO_VERT_SEARCH_RANGE: return "vert search range of video MB"; case
Re: Kconfig errors
On Wed, Jan 22, 2014 at 5:56 PM, Russell King - ARM Linux wrote:
> On Wed, Jan 22, 2014 at 05:54:29PM +0530, Prabhakar Lad wrote:
>> Hi Russell,
>>
>> On Fri, Jan 17, 2014 at 1:07 PM, Prabhakar Lad
>> wrote:
>> > Hi,
>> >
>> > On Linux-next branch I see following errors for davinci_all_defconfig
>> > & da8xx_omapl_defconfig configs,
>> >
>> > arch/arm/Kconfig:1966:error: recursive dependency detected!
>> > arch/arm/Kconfig:1966:symbol ZBOOT_ROM depends on AUTO_ZRELADDR
>> > arch/arm/Kconfig:2154:symbol AUTO_ZRELADDR is selected by ZBOOT_ROM
>> > #
>> > # configuration written to .config
>> > #
>> >
>> I am seeing this errors on linux-next, with your recent patch,
>> "[PATCH] Fix select-induced Kconfig warning for ZBOOT_ROM"
>> and strangely I see that AUTO_ZRELADDR doesnt select ZBOOT_ROM
>> but still an error.
>>
>> Note: For the davinci configs CONFIG_AUTO_ZRELADDR is not set and
>> CONFIG_ZBOOT_ROM_TEXT=0x0, CONFIG_ZBOOT_ROM_BSS=0x0
>
> I've killed off the "select AUTO_ZRELADDR if !ZBOOT_ROM" in the IMX
> Kconfig now, so when linux-next picks up my tree, that should be gone.
>
Thanks, that helps.

Regards,
--Prabhakar Lad
Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
On Wed, Jan 22, 2014 at 06:46:11PM -0800, David Lang wrote:
> It's extremely unlikely that drive manufacturers will produce drives
> that won't work with any existing OS, so they are going to support
> smaller writes in firmware. If they don't, they won't be able to
> sell their drives to anyone running existing software. Given the
> Enterprise software upgrade cycle compared to the expanding storage
> needs, whatever they ship will have to work on OS and firmware
> releases that happened several years ago.

I've been talking to a number of HDD vendors, and while most of the
discussions have been about SMR, the topic of 64k sectors did come up
recently. In the opinion of at least one drive vendor, the pressure
for 64k sectors will start increasing (roughly paraphrasing that
vendor's engineer, "it's a matter of physics"), and it would not be
surprising if, in 2 or 3 years, we start seeing drives with 64k
sectors. As with 4k sector drives, it's likely that, at least
initially, said drives will have an emulation mode where sub-64k
writes will require a read-modify-write cycle.

What I told that vendor was that if this were the case, he should
seriously consider submitting a topic proposal to the LSF/MM, since if
he wants those drives to be well supported, we need to start thinking
about what changes might be necessary at the VM and FS layers now. So
hopefully we'll see a topic proposal from that HDD vendor in the next
couple of days.

The bottom line is that I'm pretty well convinced that, like SMR
drives, 64k sector drives will be coming, and it's not something we
can duck. It might not come as quickly as the HDD vendor community
might like --- I remember attending an IDEMA conference in 2008 where
they confidently predicted that 4k sector drives would be the default
in 2 years, and it took a wee bit longer than that. But nevertheless,
looking at the most likely roadmap and trajectory of hard drive
technology, these are two things that will very likely be coming down
the pike, and it would be best if we start thinking about how to
engage with these changes constructively sooner rather than putting it
off and then getting caught behind the eight-ball later.

Cheers,

- Ted
Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock
On 01/23/2014 07:59 AM, Rusty Russell wrote:
> "Srivatsa S. Bhat" writes:
>> On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote:
>>> Hi Paul,
>
> I find an old patch for register_allcpu_notifier(), but the "bool
> replay_history" should be eliminated (always true): it's too weird.
>

Sorry, I didn't get this part. Why do you say that replay_history will
always be true?

replay_history will be set to true whenever the caller wants to get
notified of CPU_UP_PREPARE and CPU_ONLINE notifications for the already
online CPUs, or wants to run a custom setup-routine of its own. And it
will be false whenever the caller simply wants to just register the
callback.

Note that passing NULL for the setup-routine, by itself, isn't enough
to make a decision.

NULL + replay_history == True will invoke the normal
CPU_UP_PREPARE/CPU_ONLINE notifiers for the already online CPUs before
registering the callback.

NULL + replay_history == False will just register the callback and do
nothing else.

> Then we should get rid of register_cpu_notifier, or at least hide it.
>

Why? Isn't it easier to use (since you don't have to pass 2 additional
parameters)? I see register_allcpu_notifier (or whatever better name we
can give it) as an API for special cases where there is something more
to be done than just registering the callback. And
register_cpu_notifier will continue to be the API for the regular case
when the caller wants to just register the callback. This latter case
is the majority in the kernel. So I don't think eliminating the regular
API would be a good idea.

By the way, I'm still tempted to try out the simpler-looking
alternative idea of exporting cpu_maps_update_begin() and
cpu_maps_update_done() and then mandating that the callers do:

	cpu_maps_update_begin();

	for_each_online_cpu(cpu) {
		...
	}

	__register_cpu_notifier(); // this doesn't take the add_remove_lock

	cpu_maps_update_done();

I'm working on a patchset that does this and performs a tree-wide
conversion. Please let me know if you have any objections to exporting
cpu_maps_update_begin/done() in this manner. I thought I'd give this
solution a try first, before going to the much fancier
register_allcpu_notifier() method.

Regards,
Srivatsa S. Bhat
Re: Is it ok for deferrable timer wakeup the idle cpu?
On Wed, Jan 22, 2014 at 10:07 PM, Thomas Gleixner wrote:
> On Wed, 22 Jan 2014, Lei Wen wrote:
>> Recently I want to do the experiment for cpu isolation over 3.10 kernel.
>> But I find the isolated one is periodically waken up by IPI interrupt.
>>
>> By checking the trace, I find those IPI is generated by add_timer_on,
>> which would calls wake_up_nohz_cpu, and wake up the already idle cpu.
>>
>> With further checking, I find this timer is added by on_demand governor of
>> cpufreq. It would periodically check each cores' state.
>> The problem I see here is cpufreq_governor using INIT_DEFERRABLE_WORK
>> as the tool, while timer is made as deferrable anyway.
>> And what is more that cpufreq checking is very frequent. In my case, the
>> isolated cpu is wakenup by IPI every 5ms.
>>
>> So why kernel need to wake the remote processor when mount the deferrable
>> timer? As per my understanding, we'd better keep cpu as idle when use
>> the deferrable timer.
>
> Indeed, we can avoid the wakeup of the remote cpu when the timer is
> deferrable.

Glad to hear that we could fix this unwanted wakeup.
Do you have related patches already?

>
> Though you really want to figure out why the cpufreq governor is
> arming timers on other cores every 5ms. That smells like an utterly
> stupid approach.

I'm not sure why cpufreq chose such frequent profiling on each cpu.
As I understand it, since the kernel is SMP, running the profiler on
one cpu would be enough...

Thanks,
Lei
Re: [PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range for Video Macro Blocks
Hi Swaminathan,

On Thu, Jan 23, 2014 at 10:49 AM, swaminathan wrote:
> Hi All,
> Is there any review Comments for the patch "[PATCH] [media] s5p-mfc: Add
> Horizontal and Vertical search range for Video Macro Blocks"
> posted on 30-Dec-2013 ?
>
>
Just a side note: please don't top post, and always reply as plain text.

[Snip]
> Subject: [PATCH] [media] s5p-mfc: Add Horizontal and Vertical search range
> for Video Macro Blocks
>
>
>> This patch adds Controls to set Horizontal and Vertical search range
>> for Motion Estimation block for Samsung MFC video Encoders.
>>
>> Signed-off-by: Swami Nathan
>> Signed-off-by: Amit Grover
>> ---
>> Documentation/DocBook/media/v4l/controls.xml    | 14 +
>> drivers/media/platform/s5p-mfc/s5p_mfc_common.h |  2 ++
>> drivers/media/platform/s5p-mfc/s5p_mfc_enc.c    | 24
>> +++
>> drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c |  8 ++--
>> drivers/media/v4l2-core/v4l2-ctrls.c            | 14 +
>> include/uapi/linux/v4l2-controls.h              |  2 ++
>> 6 files changed, 58 insertions(+), 6 deletions(-)
>>
At first glance this patch looks OK, but you need to split it into two:
the first adding the V4L2 controls, and the second using them in the
driver.

Regards,
--Prabhakar Lad
Re: [PATCH v2] backlight: turn backlight on/off when necessary
On Wednesday, January 22, 2014 6:36 PM, Jani Nikula wrote:
> On Mon, 20 Jan 2014, Liu Ying wrote:
> > We don't have to turn backlight on/off everytime a blanking
> > or unblanking event comes because the backlight status may
> > have already been what we want. Another thought is that one
> > backlight device may be shared by multiple framebuffers. We
> > don't hope blanking one of the framebuffers may turn the
> > backlight off for all the other framebuffers, since they are
> > likely being active to display something. This patch adds
> > some logics to record each framebuffer's backlight usage to
> > determine the backlight device use count and whether the
> > backlight should be turned on or off. To be more specific,
> > only one unblank operation on a certain blanked framebuffer
> > may increase the backlight device's use count by one, while
> > one blank operation on a certain unblanked framebuffer may
> > decrease the use count by one, because the userspace is
> > likely to unblank a unblanked framebuffer or blank a blanked
> > framebuffer.
> >
> > Signed-off-by: Liu Ying
> > ---
> > v1 can be found at https://lkml.org/lkml/2013/5/30/139
> >
> > v1->v2:
> > * Make the commit message be more specific about the condition
> >   in which backlight device use count can be increased/decreased.
> > * Correct the setting for bd->props.fb_blank.
> >
> >  drivers/video/backlight/backlight.c | 28 +---
> >  include/linux/backlight.h           |  6 ++
> >  2 files changed, 27 insertions(+), 7 deletions(-)
>
> [.]
>
> Anything backlight worries me a little, and there are actually three
> changes bundled into one patch here:
>
> 1. Changing bd->props.state and bd->props.fb_blank only when use_count
>    changes from 0->1 or 1->0.
>
> 2. Calling backlight_update_status() only with the above change, and not
>    on all notifier callbacks.
>
> 3. Setting bd->props.fb_blank always to either FB_BLANK_UNBLANK or
>    FB_BLANK_POWERDOWN instead of *(int *)evdata->data.
>
> The rationale in the commit message seems plausible, and AFAICT the code
> does what it says on the box, so for that (and for that alone) you can
> have my
>
> Reviewed-by: Jani Nikula
>
> *BUT* it would be laborious to figure out whether this change in
> behaviour might regress some drivers. I'm just punting on that. And that
> brings us back to the three changes above - in a bisect POV it might be
> helpful to split the patch up. Up to the maintainers.

I agree with Jani Nikula's opinion.
Please split this patch into three patches, as described above.

Best regards,
Jingoo Han
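The use-count rule described in the quoted commit message can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: `struct backlight_sim`, `fb_event()` and `MAX_FBS` are hypothetical names, and the actual patch keeps its state in the backlight device rather than in a standalone struct.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FBS 4 /* hypothetical number of framebuffers sharing one backlight */

struct backlight_sim {
	bool fb_on[MAX_FBS]; /* per framebuffer: is it currently unblanked? */
	int use_count;       /* number of unblanked framebuffers */
	bool lit;            /* simulated backlight state */
	int updates;         /* how many times the backlight was actually switched */
};

/* Handle one blank/unblank event for framebuffer fb. */
static void fb_event(struct backlight_sim *bl, int fb, bool unblank)
{
	assert(fb >= 0 && fb < MAX_FBS);

	/* Unblanking an unblanked fb (or blanking a blanked one) changes nothing. */
	if (bl->fb_on[fb] == unblank)
		return;

	bl->fb_on[fb] = unblank;
	if (unblank) {
		if (bl->use_count++ == 0) { /* 0 -> 1: first user, switch on */
			bl->lit = true;
			bl->updates++;
		}
	} else {
		if (--bl->use_count == 0) { /* 1 -> 0: last user gone, switch off */
			bl->lit = false;
			bl->updates++;
		}
	}
}
```

With a zero-initialised `struct backlight_sim`, unblanking the same framebuffer twice leaves `use_count` at 1, and blanking one of two active framebuffers leaves the backlight on; the backlight is only switched on the 0->1 and 1->0 transitions.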
[PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
Dave found that the kernel will hang during boot. This is because
the nodemask_t type stack variable numa_kernel_nodes is large enough
to overflow the stack.

This doesn't always happen. According to Dave, this happened once
in about five boots. The backtrace is like the following:

dump_stack
panic
? numa_clear_kernel_node_hotplug
__stack_chk_fail
numa_clear_kernel_node_hotplug
? memblock_search_pfn_nid
? __early_pfn_to_nid
numa_init
x86_numa_init
initmem_init
setup_arch
start_kernel

This patch fixes the problem by defining numa_kernel_nodes as a
static global variable in the __initdata area.

Reported-by: Dave Jones
Signed-off-by: Tang Chen
Tested-by: Gu Zheng
---
 arch/x86/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 81b2750..ebefeb7 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -562,10 +562,10 @@ static void __init numa_init_array(void)
 	}
 }

+static nodemask_t numa_kernel_nodes __initdata;
 static void __init numa_clear_kernel_node_hotplug(void)
 {
 	int i, nid;
-	nodemask_t numa_kernel_nodes;
 	unsigned long start, end;
 	struct memblock_type *type =
--
1.7.11.7
Re: [PATCH] mtd: mtd_oobtest: fix verify errors due to incorrect use of prandom_bytes_state()
Hi,

Akinobu Mita wrote:
> 2014/1/23 Lothar Waßmann :
> > Hi,
> >
> > Akinobu Mita wrote:
> >> 2014/1/22 Lothar Waßmann :
> >> > Hi,
> >> >
> >> > Is anyone taking care of this?
> >> >
> >> > Lothar Waßmann wrote:
> >> >> When using prandom_bytes_state() it is critical to use the same block
> >> >> size in all invocations that are to produce the same random sequence.
> >> >> Otherwise the state of the PRNG will be out of sync if the blocksize
> >> >> is not divisible by 4.
> >> >> This leads to bogus verification errors in several tests which use
> >> >> different block sizes to initialize the buffer for writing and
> >> >> comparison.
> >> >>
> >> >> Signed-off-by: Lothar Waßmann
> >> >> ---
> >> >>  drivers/mtd/tests/oobtest.c | 14 --
> >> >>  1 files changed, 12 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/drivers/mtd/tests/oobtest.c b/drivers/mtd/tests/oobtest.c
> >> >> index 2e9e2d1..72c7359 100644
> >> >> --- a/drivers/mtd/tests/oobtest.c
> >> >> +++ b/drivers/mtd/tests/oobtest.c
> >> >> @@ -213,8 +213,15 @@ static int verify_eraseblock_in_one_go(int ebnum)
> >> >>  	int err = 0;
> >> >>  	loff_t addr = ebnum * mtd->erasesize;
> >> >>  	size_t len = mtd->ecclayout->oobavail * pgcnt;
> >> >> +	int i;
> >> >> +
> >> >> +	for (i = 0; i < pgcnt; i++)
> >> >> +		prandom_bytes_state(&rnd_state, &writebuf[i * use_len],
> >> >> +				    use_len);
> >> >> +	if (len % use_len)
> >> >> +		prandom_bytes_state(&rnd_state, &writebuf[i * use_len],
> >> >> +				    len % use_len);
> >> >>
> >> >> -	prandom_bytes_state(&rnd_state, writebuf, len);
> >> >>  	ops.mode = MTD_OPS_AUTO_OOB;
> >> >>  	ops.len = 0;
> >> >>  	ops.retlen= 0;
> >>
> >> I would rather fix the use of prandom_bytes_state() in write_eraseblock()
> >> than fix in verify_eraseblock_in_one_go().
> >>
> > Why and how?
>
> I thought that it could reduce calls of prandom_bytes_state() and
> it makes code simpler than increasing calls.
>
> > write_whole_device() (which calls write_eraseblock()) is used multiple
> > times with different verification methods (all blocks in one go or each
> > block individually).
> > If prandom_state_bytes() in write_eraseblock() would be changed, that
> > function would have to know, how the block are going to be checked
> > lateron to know how to set up the writebuffer.
>
> Instead of calling prandom_bytes_state() in the for loop in
> write_eraseblock(), call prandom_bytes_state() at once before going
> into the loop and use correct offset in writebuf in the loop.
> Although, we also need to fix verify_eraseblock() in the same way.
>
> Doesn't that fix this problem?
>
Of course one could fix it that way, but that would be a much more
invasive change that also needs more testing.


Lothar Waßmann
--
___

Ka-Ro electronics GmbH | Pascalstraße 22 | D - 52076 Aachen
Phone: +49 2408 1402-0 | Fax: +49 2408 1402-10
Geschäftsführer: Matthias Kaussen
Handelsregistereintrag: Amtsgericht Aachen, HRB 4996

www.karo-electronics.de | i...@karo-electronics.de
___
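The block-size dependency discussed in this thread can be reproduced with a small user-space sketch. The assumptions are stated plainly: `next_word()` is a stand-in xorshift32 generator and not the kernel's prandom algorithm, and `fill_bytes()` and `chunked_matches_single()` are hypothetical helpers. The only property borrowed from the discussion is that bytes are produced one 32-bit word at a time and the unused bytes of a trailing partial word are thrown away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in generator (xorshift32); NOT the kernel's prandom algorithm. */
static uint32_t next_word(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Fill buf one 32-bit word at a time; a trailing partial word is truncated. */
static void fill_bytes(uint32_t *state, unsigned char *buf, size_t len)
{
	assert(*state != 0); /* xorshift32 must not be seeded with zero */

	while (len > 0) {
		uint32_t w = next_word(state);
		size_t n = len < sizeof(w) ? len : sizeof(w);

		memcpy(buf, &w, n);
		buf += n;
		len -= n;
	}
}

/* Returns 1 if filling total bytes in chunk-sized calls equals a single call. */
static int chunked_matches_single(size_t total, size_t chunk)
{
	unsigned char a[64], b[64];
	uint32_t sa = 12345, sb = 12345; /* same seed for both runs */
	size_t off;

	assert(total <= sizeof(a) && chunk > 0);

	fill_bytes(&sa, a, total);
	for (off = 0; off < total; off += chunk)
		fill_bytes(&sb, b + off,
			   total - off < chunk ? total - off : chunk);

	return memcmp(a, b, total) == 0;
}
```

In this sketch, `chunked_matches_single(24, 8)` holds because 8 is a multiple of the word size, while `chunked_matches_single(24, 6)` does not: each 6-byte call discards two bytes of its second word, so the chunked stream drifts away from the single-call stream.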
Re: Is it ok for deferrable timer wakeup the idle cpu?
On 23 January 2014 11:11, Lei Wen wrote:
> On Wed, Jan 22, 2014 at 10:07 PM, Thomas Gleixner wrote:
>> On Wed, 22 Jan 2014, Lei Wen wrote:
>>> Recently I want to do the experiment for cpu isolation over 3.10 kernel.
>>> But I find the isolated one is periodically waken up by IPI interrupt.
>>>
>>> By checking the trace, I find those IPI is generated by add_timer_on,
>>> which would calls wake_up_nohz_cpu, and wake up the already idle cpu.
>>>
>>> With further checking, I find this timer is added by on_demand governor of
>>> cpufreq. It would periodically check each cores' state.
>>> The problem I see here is cpufreq_governor using INIT_DEFERRABLE_WORK
>>> as the tool, while timer is made as deferrable anyway.
>>> And what is more that cpufreq checking is very frequent. In my case, the
>>> isolated cpu is wakenup by IPI every 5ms.
>>>
>>> So why kernel need to wake the remote processor when mount the deferrable
>>> timer? As per my understanding, we'd better keep cpu as idle when use
>>> the deferrable timer.
>>
>> Indeed, we can avoid the wakeup of the remote cpu when the timer is
>> deferrable.
>
> Glad to hear that we could fix this unwanted wakeup.
> Do you have related patches already?
>
>>
>> Though you really want to figure out why the cpufreq governor is
>> arming timers on other cores every 5ms. That smells like an utterly
>> stupid approach.
>
> Not sure why cpufreq choose such frequent profiling over each cpu.
> As my understanding, since kernel is smp, launching profiler over one cpu
> would be enough...

Hi Guys,

So the first question is: why does cpufreq need it, and is it really
stupid? Yes, it is stupid, but that's how it has been implemented for a
long time. It does so to get data about the load on the CPUs, so that
the frequency can be scaled up or down.

There is a solution under discussion currently which will take inputs
from the scheduler, so these background timers would go away. But we
need to wait until then.

Now, why do we need that for every cpu, when doing it for a single cpu
might be enough? The answer here is cpuidle: what if the cpu
responsible for running the timer goes to sleep? Who will evaluate the
load then? And if we make this timer run on one cpu in non-deferrable
mode, then that cpu would be woken up from idle again and again. So it
was decided to have a per-cpu deferrable timer. To improve efficiency,
though, once it has fired on any cpu, the timers for all other CPUs are
rescheduled so that they don't fire before 5ms (the sampling time) has
passed.

I think the diff below might fix this for you, though I am not sure
whether it breaks something else. Probably Thomas/Frederic can answer
here. If this looks fine I will send it formally again:

diff --git a/kernel/timer.c b/kernel/timer.c
index accfd24..3a2c7fa 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -940,7 +940,8 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	 * makes sure that a CPU on the way to stop its tick can not
 	 * evaluate the timer wheel.
 	 */
-	wake_up_nohz_cpu(cpu);
+	if (!tbase_get_deferrable(timer->base))
+		wake_up_nohz_cpu(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
 }
 EXPORT_SYMBOL_GPL(add_timer_on);
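The check proposed in the diff above can be modelled in user space as follows. This is a hedged sketch, not the kernel implementation: `struct base_sim`, `struct timer_sim`, `DEFERRABLE_FLAG` and all helper functions are hypothetical, and the idea of keeping the deferrable flag in the low bit of the base pointer is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEFERRABLE_FLAG ((uintptr_t)1) /* hypothetical: low pointer bit as a flag */

struct base_sim {
	int wakeups; /* counts simulated wake-ups of the target CPU */
};

struct timer_sim {
	uintptr_t base; /* pointer to a base_sim, with the flag in bit 0 */
};

static void timer_init(struct timer_sim *t, struct base_sim *b, int deferrable)
{
	/* The trick only works if the base is at least 2-byte aligned. */
	assert(((uintptr_t)b & DEFERRABLE_FLAG) == 0);
	t->base = (uintptr_t)b | (deferrable ? DEFERRABLE_FLAG : 0);
}

static int timer_is_deferrable(const struct timer_sim *t)
{
	return (t->base & DEFERRABLE_FLAG) != 0;
}

static struct base_sim *timer_base(const struct timer_sim *t)
{
	return (struct base_sim *)(t->base & ~DEFERRABLE_FLAG);
}

/* Queue the timer; only a non-deferrable timer wakes the (idle) target CPU. */
static void add_timer_on_sim(struct timer_sim *t)
{
	if (!timer_is_deferrable(t))
		timer_base(t)->wakeups++;
}
```

Queueing a deferrable timer in this model leaves `wakeups` untouched, while a regular timer increments it, which is the behaviour the one-line change is meant to produce for an idle remote CPU.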
Re: [patch 9/9] mm: keep page cache radix tree nodes in check
On Mon, Jan 20, 2014 at 06:17:37PM -0500, Johannes Weiner wrote:
> On Fri, Jan 17, 2014 at 11:05:17AM +1100, Dave Chinner wrote:
> > On Fri, Jan 10, 2014 at 01:10:43PM -0500, Johannes Weiner wrote:
> > > Previously, page cache radix tree nodes were freed after reclaim
> > > emptied out their page pointers. But now reclaim stores shadow
> > > entries in their place, which are only reclaimed when the inodes
> > > themselves are reclaimed. This is problematic for bigger files that
> > > are still in use after they have a significant amount of their cache
> > > reclaimed, without any of those pages actually refaulting. The shadow
> > > entries will just sit there and waste memory. In the worst case, the
> > > shadow entries will accumulate until the machine runs out of memory.
> > >
> > > To get this under control, the VM will track radix tree nodes
> > > exclusively containing shadow entries on a per-NUMA node list.
> > > Per-NUMA rather than global because we expect the radix tree nodes
> > > themselves to be allocated node-locally and we want to reduce
> > > cross-node references of otherwise independent cache workloads. A
> > > simple shrinker will then reclaim these nodes on memory pressure.
> > >
> > > A few things need to be stored in the radix tree node to implement the
> > > shadow node LRU and allow tree deletions coming from the list:
> >
> > Just a couple of things with the list_lru interfaces.
> >
> > 
> > > @@ -123,9 +129,39 @@ static void page_cache_tree_delete(struct
> > > address_space *mapping,
> > >  	 * same time and miss a shadow entry.
> > >  	 */
> > >  	smp_wmb();
> > > -	} else
> > > -		radix_tree_delete(&mapping->page_tree, page->index);
> > > +	}
> > >  	mapping->nrpages--;
> > > +
> > > +	if (!node) {
> > > +		/* Clear direct pointer tags in root node */
> > > +		mapping->page_tree.gfp_mask &= __GFP_BITS_MASK;
> > > +		radix_tree_replace_slot(slot, shadow);
> > > +		return;
> > > +	}
> > > +
> > > +	/* Clear tree tags for the removed page */
> > > +	index = page->index;
> > > +	offset = index & RADIX_TREE_MAP_MASK;
> > > +	for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
> > > +		if (test_bit(offset, node->tags[tag]))
> > > +			radix_tree_tag_clear(&mapping->page_tree, index, tag);
> > > +	}
> > > +
> > > +	/* Delete page, swap shadow entry */
> > > +	radix_tree_replace_slot(slot, shadow);
> > > +	node->count--;
> > > +	if (shadow)
> > > +		node->count += 1U << RADIX_TREE_COUNT_SHIFT;
> > > +	else
> > > +		if (__radix_tree_delete_node(&mapping->page_tree, node))
> > > +			return;
> > > +
> > > +	/* Only shadow entries in there, keep track of this node */
> > > +	if (!(node->count & RADIX_TREE_COUNT_MASK) &&
> > > +	    list_empty(&node->private_list)) {
> > > +		node->private_data = mapping;
> > > +		list_lru_add(&workingset_shadow_nodes, &node->private_list);
> > > +	}
> >
> > You can't do this list_empty(&node->private_list) check safely
> > externally to the list_lru code - the only time that entry can be
> > checked safely is under the LRU list locks. This is the reason that
> > list_lru_add/list_lru_del return a boolean to indicate if the object
> > was added/removed from the list - they do this list_empty() check
> > internally. i.e. the correct, safe way to conditionally update
> > state iff the object was added to the LRU is:
> >
> > 	if (!(node->count & RADIX_TREE_COUNT_MASK)) {
> > 		if (list_lru_add(&workingset_shadow_nodes, &node->private_list))
> > 			node->private_data = mapping;
> > 	}
> >
> > > +	radix_tree_replace_slot(slot, page);
> > > +	mapping->nrpages++;
> > > +	if (node) {
> > > +		node->count++;
> > > +		/* Installed page, can't be shadow-only anymore */
> > > +		if (!list_empty(&node->private_list))
> > > +			list_lru_del(&workingset_shadow_nodes,
> > > +				     &node->private_list);
> > > +	}
> >
> > Same issue here:
> >
> > 	if (node) {
> > 		node->count++;
> > 		list_lru_del(&workingset_shadow_nodes, &node->private_list);
> > 	}
>
> All modifications to node->private_list happen under
> mapping->tree_lock, and modifications of a neighboring link should not
> affect the outcome of the list_empty(), so I don't think the lru lock
> is necessary.
>
> It would be cleaner to take it of course, but that would mean adding
> an unconditional NUMA node-wide lock to every page cache population.
>
> > >  static int __add_to_page_cache_locked(struct page *page,
> > > diff --git a/mm/list_lru.c b/mm/list_lru.c
> > > index 72f9decb0104..47a9faf4070b 100644
> > > --- a/mm/list_lru.c
> > > +++ b/mm/list_lru.c
> > > @@ -88,10 +88,18 @@ restart:
> > >  		ret = isolate(item, &nlru->lock, cb_arg);
> > >  		switch (ret) {
> > >  		case LRU_REMOVED:
> > > +		case LRU_REMOVED_RETRY:
> > >  			if (--nlru->nr_items == 0)
> > >  				node_clear(nid, lru->active_nodes);
> > >
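The "add/del report whether they changed anything" convention discussed in this thread can be illustrated with a small user-space model. This is a sketch under stated assumptions rather than the real `list_lru` code: `struct lru_sim`, `lru_add()` and `lru_del()` are hypothetical names, and the per-node spinlock that the real implementation takes around the emptiness check is only indicated by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

struct lru_sim {
	struct list_node head;
	long nr_items;
};

static void lru_init(struct lru_sim *lru)
{
	list_init(&lru->head);
	lru->nr_items = 0;
}

/* Add item to the tail unless it is already queued; true only if added. */
static bool lru_add(struct lru_sim *lru, struct list_node *item)
{
	assert(lru != NULL && item != NULL);

	/* A real implementation would hold the LRU lock from here on. */
	if (!list_is_empty(item))
		return false;

	item->prev = lru->head.prev;
	item->next = &lru->head;
	lru->head.prev->next = item;
	lru->head.prev = item;
	lru->nr_items++;
	return true;
}

/* Remove item if it is queued; true only if it was actually removed. */
static bool lru_del(struct lru_sim *lru, struct list_node *item)
{
	assert(lru != NULL && item != NULL);

	/* A real implementation would hold the LRU lock from here on. */
	if (list_is_empty(item))
		return false;

	item->prev->next = item->next;
	item->next->prev = item->prev;
	list_init(item);
	lru->nr_items--;
	return true;
}
```

A caller can then update its own state only when `lru_add()` returns true, instead of testing the item's list linkage itself before the call, which is the pattern Dave suggests in the reply above.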
[patch] mm, compaction: ignore pageblock skip when manually invoking compaction
The cached pageblock hint should be ignored when triggering compaction
through /proc/sys/vm/compact_memory so all eligible memory is isolated.
Manually invoking compaction is known to be expensive, there's no need
to skip pageblocks based on heuristics (mainly for debugging).

Signed-off-by: David Rientjes
---
 mm/compaction.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1177,6 +1177,7 @@ static void compact_node(int nid)
         struct compact_control cc = {
                 .order = -1,
                 .sync = true,
+                .ignore_skip_hint = true,
         };
 
         __compact_pgdat(NODE_DATA(nid), &cc);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
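
The effect of the new field can be illustrated with a small userspace model. This is a sketch under assumptions: the sk_ names and the per-block "skip" array are hypothetical, and the only thing taken from the patch is that ignore_skip_hint overrides the cached hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of compact_control shown in the patch. */
struct sk_compact_control {
    int order;
    bool sync;
    bool ignore_skip_hint;
};

/*
 * Decide whether one pageblock should be scanned. skip_bit models the
 * cached hint saying the block yielded nothing on an earlier pass.
 */
bool sk_isolation_suitable(const struct sk_compact_control *cc, bool skip_bit)
{
    assert(cc != NULL);
    if (cc->ignore_skip_hint)
        return true;    /* manual compaction: scan everything */
    return !skip_bit;   /* otherwise honour the cached heuristic */
}

/* Count how many of nr_blocks pageblocks would be scanned. */
size_t sk_count_scanned(const struct sk_compact_control *cc,
                        const bool *skip_bits, size_t nr_blocks)
{
    size_t scanned = 0;

    for (size_t i = 0; i < nr_blocks; i++)
        if (sk_isolation_suitable(cc, skip_bits[i]))
            scanned++;
    return scanned;
}
```

With two of four blocks marked as skippable, a control structure without the flag scans two blocks, while one with .ignore_skip_hint = true scans all four.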
Re: [Bug 67651] Bisected: Lots of fragmented mmaps cause gimp to fail in 3.12 after exceeding vm_max_map_count
On Wed, Jan 22, 2014 at 02:45:53PM -0800, Andy Lutomirski wrote:
> >
> > Thus when user space application track memory changes now it can detect if
> > vma area is renewed.
>
> Presumably some path is failing to set VM_SOFTDIRTY, thus preventing mms
> from being merged.
>
> That being said, this could cause vma blowups for programs that are
> actually using this thing.

Hi Andy, indeed, this could happen. The easiest way is to ignore the
softdirty bit when we're trying to merge vmas, and to set it on the newly
merged one. I think this should be correct. Once I finish I'll send the
patch.

        Cyrill
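
The approach proposed above (ignore the softdirty bit when comparing flags, then set it on the merged area) can be sketched as follows. This is an illustration only, not the eventual patch: the sk_ names and the bit values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define SK_VM_READ      0x00000001UL
#define SK_VM_WRITE     0x00000002UL
#define SK_VM_SOFTDIRTY 0x08000000UL

/* Two flag sets are merge-compatible if they differ at most in softdirty. */
bool sk_flags_mergeable(unsigned long a, unsigned long b)
{
    return ((a ^ b) & ~SK_VM_SOFTDIRTY) == 0;
}

/* Flags for the merged area: the common flags plus softdirty. */
unsigned long sk_merged_flags(unsigned long a, unsigned long b)
{
    assert(sk_flags_mergeable(a, b));
    return a | SK_VM_SOFTDIRTY;
}
```

Marking the merged area softdirty errs on the safe side for a tracker: it may see an area reported as changed that was not, but it should not miss one that was.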
Re: [PATCH 2/2] net/neighbour: queue work on power efficient wq
From: Viresh Kumar
Date: Wed, 22 Jan 2014 12:23:33 +0530

> Workqueue used in neighbour layer have no real dependency of scheduling these on
> the cpu which scheduled them.
>
> On a idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which the
> scheduler believes to be the most appropriate one.
>
> This patch replaces normal workqueues with power efficient versions. This
> doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
> enabled.
>
> Signed-off-by: Viresh Kumar

Applied.
Re: [PATCH 1/2] net/ipv4: queue work on power efficient wq
From: Viresh Kumar
Date: Wed, 22 Jan 2014 12:23:32 +0530

> Workqueue used in ipv4 layer have no real dependency of scheduling these on the
> cpu which scheduled them.
>
> On a idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which the
> scheduler believes to be the most appropriate one.
>
> This patch replaces normal workqueues with power efficient versions. This
> doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
> enabled.
>
> Signed-off-by: Viresh Kumar

Applied.
Re: [PATCH] 6lowpan: add a license to 6lowpan_iphc module
From: Yann Droneaud
Date: Wed, 22 Jan 2014 20:25:24 +0100

> Since commit 8df8c56a5abc, 6lowpan_iphc is a module of its own.
>
> Unfortunately, it lacks some infrastructure to behave like a
> good kernel citizen:
>
>   kernel: 6lowpan_iphc: module license 'unspecified' taints kernel.
>   kernel: Disabling lock debugging due to kernel taint
>
> This patch adds the basic MODULE_LICENSE(); with GPL license:
> the code was copied from net/ieee802154/6lowpan.c which is GPL
> and the module exports symbol with EXPORT_SYMBOL_GPL();.
>
> Cc: Jukka Rissanen
> Cc: Alexander Aring
> Cc: Marcel Holtmann
> Signed-off-by: Yann Droneaud

Applied.
Re: [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting kernel nodes to unhotpluggable.
On Thu, Jan 23, 2014 at 01:49:28PM +0800, Tang Chen wrote:
> This doesn't always happen. According to Dave, this happened once
> in about five boots. The backtrace is like the following:
>
> dump_stack
> panic
> ? numa_clear_kernel_node_hotplug
> __stack_chk_fail
> numa_clear_kernel_node_hotplug
> ? memblock_search_pfn_nid
> ? __early_pfn_to_nid
> numa_init
> x86_numa_init
> initmem_init
> setup_arch
> start_kernel
>
> This patch fix this problem by defining numa_kernel_nodes as a
> static global variable in __initdata area.
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 81b2750..ebefeb7 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -562,10 +562,10 @@ static void __init numa_init_array(void)
>          }
>  }
>
> +static nodemask_t numa_kernel_nodes __initdata;
>  static void __init numa_clear_kernel_node_hotplug(void)
>  {
>          int i, nid;
> -        nodemask_t numa_kernel_nodes;
>          unsigned long start, end;
>          struct memblock_type *type = &memblock.reserved;

I'm surprised that this worked for anyone. By my math, nodemask_t is 1024
longs, which should fill the whole stack. Any idea why it only broke
sometimes? There are other on-stack nodemask_t's in the tree too, why are
they safe?

        Dave
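
The size question raised above can be checked with a little arithmetic. This is a minimal sketch under two assumptions: that nodemask_t is a bitmap of MAX_NUMNODES bits (as declared in include/linux/nodemask.h), and that the kernel in question was built with NODES_SHIFT=10, which is not stated in the thread. All sk_ names are hypothetical. Under those assumptions the mask holds 1024 bits, which is 128 bytes (16 longs on a 64-bit build) rather than 1024 longs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: NODES_SHIFT=10, so MAX_NUMNODES is 1024. */
#define SK_NODES_SHIFT   10
#define SK_MAX_NUMNODES  (1UL << SK_NODES_SHIFT)

/* Same rounding-up idea as the kernel's BITS_TO_LONGS(). */
#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_BITS_TO_LONGS(n) \
    (((n) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

/* Stand-in for nodemask_t: one bit per possible node. */
typedef struct {
    unsigned long bits[SK_BITS_TO_LONGS(SK_MAX_NUMNODES)];
} sk_nodemask_t;

/* Size of the mask in bytes. */
size_t sk_nodemask_bytes(void)
{
    return sizeof(sk_nodemask_t);
}

/* Size of the mask in longs. */
size_t sk_nodemask_longs(void)
{
    /* The struct holds only an array of longs, so it divides evenly. */
    assert(sizeof(sk_nodemask_t) % sizeof(unsigned long) == 0);
    return sizeof(sk_nodemask_t) / sizeof(unsigned long);
}
```

If that is right, the size of the mask alone would not fill the stack, which fits the observation that other on-stack nodemask_t users survive. The sketch says nothing about what actually triggered __stack_chk_fail here.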
[PATCH Resend 4/8] ASoC: simple-card: Add snd_card's name parsing from DT node support
If the DT is used and the CPU DAI device has only one DAI, the card
name will look like this:

ALSA device list:
  0: 40031000.sai-sgtl5000

This name may be a little ugly to some customers, so this patch adds
support for parsing the card name from the DT node.

Signed-off-by: Xiubo Li
---
 sound/soc/generic/simple-card.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/sound/soc/generic/simple-card.c b/sound/soc/generic/simple-card.c
index f38e56e..546b93d 100644
--- a/sound/soc/generic/simple-card.c
+++ b/sound/soc/generic/simple-card.c
@@ -140,6 +140,9 @@ static int asoc_simple_card_parse_of(struct device_node *node,
         char *name;
         int ret;
 
+        /* parsing the card name from DT */
+        snd_soc_of_parse_card_name(&priv->snd_card, "simple-audio-card,name");
+
         /* get CPU/CODEC common format via simple-audio-card,format */
         priv->daifmt = snd_soc_of_parse_daifmt(node, "simple-audio-card,") &
                 (SND_SOC_DAIFMT_FORMAT_MASK | SND_SOC_DAIFMT_INV_MASK);
@@ -184,7 +187,8 @@ static int asoc_simple_card_parse_of(struct device_node *node,
                         GFP_KERNEL);
         sprintf(name, "%s-%s", dai_link->cpu_dai_name,
                                 dai_link->codec_dai_name);
-        priv->snd_card.name = name;
+        if (!priv->snd_card.name)
+                priv->snd_card.name = name;
         dai_link->name = dai_link->stream_name = name;
 
         /* simple-card assumes platform == cpu */
--
1.8.4
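
The naming rule the patch introduces (prefer the name from the DT property, fall back to the generated "cpu-codec" string) can be modelled in userspace. This is a sketch with hypothetical sk_ names: dt_name stands in for whatever the "simple-audio-card,name" property yields, with NULL meaning the property is absent, and the DAI names in the usage note are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the name field of struct snd_soc_card. */
struct sk_card {
    const char *name;
};

/*
 * Build the "<cpu>-<codec>" link name into buf, then use it as the card
 * name only when the device tree did not supply one. Returns the name
 * the card ends up with.
 */
const char *sk_pick_card_name(struct sk_card *card, const char *dt_name,
                              const char *cpu_dai, const char *codec_dai,
                              char *buf, size_t buflen)
{
    assert(card != NULL && cpu_dai != NULL && codec_dai != NULL && buf != NULL);
    /* Room for both names, the '-' separator and the trailing NUL. */
    assert(strlen(cpu_dai) + strlen(codec_dai) + 2 <= buflen);

    card->name = dt_name;               /* models the DT lookup result */
    snprintf(buf, buflen, "%s-%s", cpu_dai, codec_dai);
    if (!card->name)
        card->name = buf;               /* fallback: generated name */
    return card->name;
}
```

Called with dt_name == NULL and the illustrative DAI names "40031000.sai" and "sgtl5000", the card name is the generated string from the commit message. Called with a DT name such as "my-audio-card", that name is kept, and the generated string is still available in buf for the link name.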