RE: Dangerous devm_request_irq() conversions
On Friday, February 22, 2013 4:27 PM, Dmitry Torokhov wrote:
> On Fri, Feb 22, 2013 at 04:12:36PM +0900, Jingoo Han wrote:
> > On Friday, February 22, 2013 3:54 PM, Dmitry Torokhov wrote:
> > >
> > > Hi,
> > >
> > > It looks like a whole slew of devm_request_irq() conversions just got
> > > applied to mainline and many of them are quite broken.
> > >
> > > Consider fd5231ce336e038037b4f0190a6838bdd6e17c6d or
> > > c1879fe80c61f3be6f2ddb82509c2e7f92a484fe: the drivers used to first
> > > free the IRQ and then unregister the corresponding device, ensuring
> > > that the IRQ handler, while it runs, has the device available. The
> > > mechanical conversion to devm_request_irq() reverses the order of
> > > these operations, opening a race window where the IRQ handler can
> > > reference a device (or other resource) that is already gone.
> > >
> > > It would be nice if these could be reverted and reviewed again for
> > > correctness.
> >
> > Um, other RTC drivers already have been using devm_request_threaded_irq()
> > or devm_request_irq() like this, before I added these patches.
> >
> > For example,
> > ./drivers/rtc/rtc-tegra.c
> > ./drivers/rtc/rtc-spear.c
> > ./drivers/rtc/rtc-s3c.c
> > ./drivers/rtc/rtc-mxc.c
> > ./drivers/rtc/rtc-ds1553.c
> > ./drivers/rtc/rtc-ds1511.c
> > ./drivers/rtc/rtc-snvs.c
> > ./drivers/rtc/rtc-imxdi.c
> > ./drivers/rtc/rtc-tx4939.c
> > ./drivers/rtc/rtc-mv.c
> > ./drivers/rtc/rtc-coh901331.c
> > ./drivers/rtc/rtc-stk17ta8.c
> > ./drivers/rtc/rtc-lpc32xx.c
> > ./drivers/rtc/rtc-tps65910.c
> > ./drivers/rtc/rtc-rc5t583.c
> >
> > Also, even more, some RTC drivers call rtc_device_unregister() first,
> > then call free_irq() later.
> >
> > For example,
> > ./drivers/rtc/rtc-vr41xx.c
> > ./drivers/rtc/rtc-da9052.c
> > ./drivers/rtc/rtc-isl1208.c
> > ./drivers/rtc/rtc-88pm860x.c
> > ./drivers/rtc/rtc-tps6586x.c
> > ./drivers/rtc/rtc-mpc5121.c
> > ./drivers/rtc/rtc-m48t59.c
> >
> > Please don't argue for a revert without concrete reasons.
> What more concrete reason do you need? I explained to you the exact
> reason on the patches I noticed before and also on the 2 commits
> referenced above: blind conversion to devm_* changes the order of
> operations, which may be deadly with IRQs (but others, like clocks and
> regulators, are important too).
>
> The fact that crap slipped into the kernel before is not a valid reason
> for adding more of the same crap.
>
> Please *understand* the APIs you are using before making changes.
>
> > If these devm_request_threaded_irq() or devm_request_irq() make the
> > problem, devm_free_irq() will be added later.
>
> And the point? If you use devm_request_irq() and then call
> devm_free_irq() manually in all paths, what you have achieved is a waste
> of the memory required for devm_* tracking.

CC'ed Al Viro, Tejun Heo.

So, is there any report that devm_request_threaded_irq() causes this deadly
IRQ-related problem in such cases? According to your comment, it seems that
there is no reason to use devm_request_irq() or devm_request_threaded_irq().
Please argue, then, that it would be better to deprecate devm_request_irq()
or devm_request_threaded_irq().

> --
> Dmitry

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/35] mfd: ab8500-gpadc: Implemented suspend/resume
On Thu, 21 Feb 2013, Ulf Hansson wrote:
> On 20 February 2013 14:19, Mark Brown wrote:
> > On Fri, Feb 15, 2013 at 12:56:32PM +, Lee Jones wrote:
> >
> >> +static int ab8500_gpadc_suspend(struct device *dev)
> >> +{
> >> +	struct ab8500_gpadc *gpadc = dev_get_drvdata(dev);
> >> +
> >> +	mutex_lock(&gpadc->ab8500_gpadc_lock);
> >> +
> >> +	pm_runtime_get_sync(dev);
> >> +
> >> +	regulator_disable(gpadc->regu);
> >> +	return 0;
> >> +}
> >
> > This doesn't look especially sane... You're doing a runtime get, taking
> > the lock without releasing it and disabling the regulator. This is
> > *very* odd, both the changelog and the code need to explain what's going
> > on and why it's safe in a lot more detail here.
>
> You need to do pm_runtime_get_sync to be able to make sure resources
> (which seems to be only the regulator) are safe to switch off. To my
> understanding this is a generic way to use for being able to switch
> off resources at a device suspend when runtime pm is used in
> conjunction.
>
> Regarding the mutex, I can't tell the reason behind it. It seems
> strange but not sure. Daniel, any thoughts?

I'm happy to fixup, once I have the full story.

--
Lee Jones
Linaro ST-Ericsson Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog
[PATCH] PM: align buffers for LZO compression
Hi,

for performance reasons I'd strongly suggest that you explicitly align all
buffers passed to the LZO compress and decompress functions.

Below is a small (and completely untested!) patch, but I think you get the
idea.

BTW, it might be even more beneficial (esp. for NUMA systems) to align
*all* individual unc/cmp/wrk pointers to a multiple of the PAGE_SIZE, but
this would require some code restructuring.

Cheers,
Markus

completely untested patch:

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 7c33ed2..7af4293 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -532,9 +532,9 @@ struct cmp_data {
 	wait_queue_head_t done;                  /* compression done */
 	size_t unc_len;                          /* uncompressed length */
 	size_t cmp_len;                          /* compressed length */
-	unsigned char unc[LZO_UNC_SIZE];         /* uncompressed buffer */
-	unsigned char cmp[LZO_CMP_SIZE];         /* compressed buffer */
-	unsigned char wrk[LZO1X_1_MEM_COMPRESS]; /* compression workspace */
+	unsigned char unc[LZO_UNC_SIZE] cacheline_aligned;         /* uncompressed buffer */
+	unsigned char cmp[LZO_CMP_SIZE] cacheline_aligned;         /* compressed buffer */
+	unsigned char wrk[LZO1X_1_MEM_COMPRESS] cacheline_aligned; /* compression workspace */
 };
 
 /**
@@ -1021,8 +1021,8 @@ struct dec_data {
 	wait_queue_head_t done;          /* decompression done */
 	size_t unc_len;                  /* uncompressed length */
 	size_t cmp_len;                  /* compressed length */
-	unsigned char unc[LZO_UNC_SIZE]; /* uncompressed buffer */
-	unsigned char cmp[LZO_CMP_SIZE]; /* compressed buffer */
+	unsigned char unc[LZO_UNC_SIZE] cacheline_aligned;  /* uncompressed buffer */
+	unsigned char cmp[LZO_CMP_SIZE] cacheline_aligned;  /* compressed buffer */
 };
 
 /**

Signed-off-by: Markus F.X.J. Oberhumer

--
Markus Oberhumer, http://www.oberhumer.com/
Re: [PATCH 1/2] cpustat: use accessor functions for get/set/add
On 22 February 2013 12:47, Amit Kucheria wrote:
> On Fri, Feb 22, 2013 at 11:51 AM, Viresh Kumar wrote:
>> BTW, i don't see kcpustat_cpu() used in
>>
>>  kernel/sched/core.c    | 12 +---
>>  kernel/sched/cputime.c | 29 +--
>>
>> I searched tip/master as well as lnext/master.
>
> Added by Frederic's Adaptive NOHZ patchset?

I don't even see them on our unused-nohz-adaptive-tickless-v2 branch :)
Maybe some other latest work.
[PATCH] acpi: sleep: Avoid interleaved message on errors
Got this dmesg log on an Acer Aspire 725:

[0.256351] ACPI: (supports S0ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20130117/hwxface-568)
[0.256373] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20130117/hwxface-568)
[0.256391]  S3 S4 S5)

Avoid these interleaved error messages.

Signed-off-by: Joe Perches
---
 drivers/acpi/sleep.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 6d3a06a..2421303 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -599,7 +599,6 @@ static void acpi_sleep_suspend_setup(void)
 		status = acpi_get_sleep_type_data(i, &type_a, &type_b);
 		if (ACPI_SUCCESS(status)) {
 			sleep_states[i] = 1;
-			pr_cont(" S%d", i);
 		}
 	}
 
@@ -742,7 +741,6 @@ static void acpi_sleep_hibernate_setup(void)
 	hibernation_set_ops(old_suspend_ordering ?
 		&acpi_hibernation_ops_old : &acpi_hibernation_ops);
 	sleep_states[ACPI_STATE_S4] = 1;
-	pr_cont(KERN_CONT " S4");
 	if (nosigcheck)
 		return;
 
@@ -788,6 +786,9 @@ int __init acpi_sleep_init(void)
 {
 	acpi_status status;
 	u8 type_a, type_b;
+	char supported[ACPI_S_STATE_COUNT * 3 + 1];
+	char *pos = supported;
+	int i;
 
 	if (acpi_disabled)
 		return 0;
@@ -795,7 +796,6 @@ int __init acpi_sleep_init(void)
 	acpi_sleep_dmi_check();
 
 	sleep_states[ACPI_STATE_S0] = 1;
-	pr_info(PREFIX "(supports S0");
 
 	acpi_sleep_suspend_setup();
 	acpi_sleep_hibernate_setup();
@@ -803,11 +803,17 @@ int __init acpi_sleep_init(void)
 	status = acpi_get_sleep_type_data(ACPI_STATE_S5, &type_a, &type_b);
 	if (ACPI_SUCCESS(status)) {
 		sleep_states[ACPI_STATE_S5] = 1;
-		pr_cont(" S5");
 		pm_power_off_prepare = acpi_power_off_prepare;
 		pm_power_off = acpi_power_off;
 	}
-	pr_cont(")\n");
+
+	supported[0] = 0;
+	for (i = 0; i < ACPI_S_STATE_COUNT; i++) {
+		if (sleep_states[i])
+			pos += sprintf(pos, " S%d", i);
+	}
+	pr_info(PREFIX "(supports%s)\n", supported);
+
 	/*
 	 * Register the tts_notifier to reboot notifier list so that the _TTS
 	 * object can also be evaluated when the system enters S5.
[PATCH 4/8] ARM: PRIMA2: Divorce timer-marco from local timer API
Separate the marco local timers from the local timer API. This will allow us to remove ARM local timer support in the near future and gets us closer to moving this driver to drivers/clocksource. Cc: Barry Song Signed-off-by: Stephen Boyd --- arch/arm/mach-prima2/timer-marco.c | 98 -- 1 file changed, 52 insertions(+), 46 deletions(-) diff --git a/arch/arm/mach-prima2/timer-marco.c b/arch/arm/mach-prima2/timer-marco.c index f4eea2e..d54aac2 100644 --- a/arch/arm/mach-prima2/timer-marco.c +++ b/arch/arm/mach-prima2/timer-marco.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -18,7 +19,6 @@ #include #include #include -#include #include #include "common.h" @@ -154,13 +154,7 @@ static void sirfsoc_clocksource_resume(struct clocksource *cs) BIT(1) | BIT(0), sirfsoc_timer_base + SIRFSOC_TIMER_64COUNTER_CTRL); } -static struct clock_event_device sirfsoc_clockevent = { - .name = "sirfsoc_clockevent", - .rating = 200, - .features = CLOCK_EVT_FEAT_ONESHOT, - .set_mode = sirfsoc_timer_set_mode, - .set_next_event = sirfsoc_timer_set_next_event, -}; +static struct clock_event_device __percpu *sirfsoc_clockevent; static struct clocksource sirfsoc_clocksource = { .name = "sirfsoc_clocksource", @@ -176,11 +170,8 @@ static struct irqaction sirfsoc_timer_irq = { .name = "sirfsoc_timer0", .flags = IRQF_TIMER | IRQF_NOBALANCING, .handler = sirfsoc_timer_interrupt, - .dev_id = &sirfsoc_clockevent, }; -#ifdef CONFIG_LOCAL_TIMERS - static struct irqaction sirfsoc_timer1_irq = { .name = "sirfsoc_timer1", .flags = IRQF_TIMER | IRQF_NOBALANCING, @@ -189,56 +180,75 @@ static struct irqaction sirfsoc_timer1_irq = { static int __cpuinit sirfsoc_local_timer_setup(struct clock_event_device *ce) { - /* Use existing clock_event for cpu 0 */ - if (!smp_processor_id()) - return 0; + int cpu = smp_processor_id(); + struct irqaction *action; + + if (cpu == 0) + action = &sirfsoc_timer_irq; + else + action = &sirfsoc_timer1_irq; - ce->irq = sirfsoc_timer1_irq.irq; + 
ce->irq = action->irq; ce->name = "local_timer"; - ce->features = sirfsoc_clockevent.features; - ce->rating = sirfsoc_clockevent.rating; + ce->features = CLOCK_EVT_FEAT_ONESHOT; + ce->rating = 200; ce->set_mode = sirfsoc_timer_set_mode; ce->set_next_event = sirfsoc_timer_set_next_event; - ce->shift = sirfsoc_clockevent.shift; - ce->mult = sirfsoc_clockevent.mult; - ce->max_delta_ns = sirfsoc_clockevent.max_delta_ns; - ce->min_delta_ns = sirfsoc_clockevent.min_delta_ns; + clockevents_calc_mult_shift(ce, CLOCK_TICK_RATE, 60); + ce->max_delta_ns = clockevent_delta2ns(-2, ce); + ce->min_delta_ns = clockevent_delta2ns(2, ce); + ce->cpumask = cpumask_of(cpu); - sirfsoc_timer1_irq.dev_id = ce; - BUG_ON(setup_irq(ce->irq, &sirfsoc_timer1_irq)); - irq_set_affinity(sirfsoc_timer1_irq.irq, cpumask_of(1)); + action->dev_id = ce; + BUG_ON(setup_irq(ce->irq, action)); + irq_set_affinity(action->irq, cpumask_of(cpu)); clockevents_register_device(ce); return 0; } -static void sirfsoc_local_timer_stop(struct clock_event_device *ce) +static void __cpuinit sirfsoc_local_timer_stop(struct clock_event_device *ce) { + int cpu = smp_processor_id(); + sirfsoc_timer_count_disable(1); - remove_irq(sirfsoc_timer1_irq.irq, &sirfsoc_timer1_irq); + if (cpu == 0) + remove_irq(sirfsoc_timer_irq.irq, &sirfsoc_timer_irq); + else + remove_irq(sirfsoc_timer1_irq.irq, &sirfsoc_timer1_irq); +} + +static int __cpuinit sirfsoc_cpu_notify(struct notifier_block *self, + unsigned long action, void *hcpu) +{ + struct clock_event_device *evt = this_cpu_ptr(sirfsoc_clockevent); + + switch (action & ~CPU_TASKS_FROZEN) { + case CPU_STARTING: + sirfsoc_local_timer_setup(evt); + break; + case CPU_DYING: + sirfsoc_local_timer_stop(evt); + break; + } + + return NOTIFY_OK; } -static struct local_timer_ops sirfsoc_local_timer_ops __cpuinitdata = { - .setup = sirfsoc_local_timer_setup, - .stop = sirfsoc_local_timer_stop, +static struct notifier_block sirfsoc_cpu_nb __cpuinitdata = { + .notifier_call = 
sirfsoc_cpu_notify, }; -#endif /* CONFIG_LOCAL_TIMERS */ static void __init sirfsoc_clockevent_init(void) { - clockevents_calc_mult_shift(&sirfsoc_clockevent, CLOCK_TICK_RATE, 60); - - sirfsoc_clockevent.max_delta_ns = - clockevent_delta2ns(-2, &sirfsoc_clockevent); - sirfsoc_clockevent.min_delta_ns = - clockevent_
Re: [PATCH v4] mfd: syscon: Add non-DT support
On Fri, Feb 22, 2013 at 03:13:12PM +0800, Dong Aisheng wrote:
> On Fri, Feb 22, 2013 at 11:01:18AM +0400, Alexander Shiyan wrote:
> > > On Thu, Feb 21, 2013 at 07:29:02PM +0400, Alexander Shiyan wrote:
> > > > This patch allows using the syscon driver from platform data, i.e.
> > > > it makes it possible to use the driver on systems without oftree
> > > > support. To look up a syscon device from client drivers, the
> > > > "syscon_regmap_lookup_by_pdevname" function was added.
> > > >
> > > > Signed-off-by: Alexander Shiyan
> > >
> > > [...]
> > >
> > > > +	syscon->base = devm_ioremap_resource(dev, res);
> > > > +	if (!syscon->base)
> > >
> > > Is this correct?
> >
> > Hmm, of course IS_ERR should be used here... v5?
>
> Yes. From here:
> https://lkml.org/lkml/2013/1/21/140
> it seems it is.
>
> > > > +		return -EBUSY;
> >
> > Both this line could also be changed.
>
> > > Otherwise, I'm also ok with this patch.
> > > Acked-by: Dong Aisheng
> > >
> > > BTW, I did not see Samuel's tree having this new API.
> > > So, who will pick this patch?
> >
> > I have the same question.
>
> I CC'ed Thierry and Greg who may know it.

Yes, devm_ioremap_resource() never returns NULL. You always need to check
the returned pointer with IS_ERR(). The value that you return should be
extracted from the pointer with PTR_ERR().

Thierry
[PATCH 5/8] ARM: MSM: Divorce msm_timer from local timer API
Separate the msm_timer from the local timer API. This will allow us to remove ARM local timer support in the near future and gets us closer to moving this driver to drivers/clocksource. Cc: David Brown Cc: Daniel Walker Cc: Bryan Huntsman Signed-off-by: Stephen Boyd --- arch/arm/mach-msm/timer.c | 125 +- 1 file changed, 67 insertions(+), 58 deletions(-) diff --git a/arch/arm/mach-msm/timer.c b/arch/arm/mach-msm/timer.c index 2969027..4675c5e 100644 --- a/arch/arm/mach-msm/timer.c +++ b/arch/arm/mach-msm/timer.c @@ -16,6 +16,7 @@ #include #include +#include #include #include #include @@ -25,7 +26,6 @@ #include #include -#include #include #include "common.h" @@ -46,7 +46,7 @@ static void __iomem *event_base; static irqreturn_t msm_timer_interrupt(int irq, void *dev_id) { - struct clock_event_device *evt = *(struct clock_event_device **)dev_id; + struct clock_event_device *evt = dev_id; /* Stop the timer tick */ if (evt->mode == CLOCK_EVT_MODE_ONESHOT) { u32 ctrl = readl_relaxed(event_base + TIMER_ENABLE); @@ -90,18 +90,7 @@ static void msm_timer_set_mode(enum clock_event_mode mode, writel_relaxed(ctrl, event_base + TIMER_ENABLE); } -static struct clock_event_device msm_clockevent = { - .name = "gp_timer", - .features = CLOCK_EVT_FEAT_ONESHOT, - .rating = 200, - .set_next_event = msm_timer_set_next_event, - .set_mode = msm_timer_set_mode, -}; - -static union { - struct clock_event_device *evt; - struct clock_event_device * __percpu *percpu_evt; -} msm_evt; +static struct clock_event_device __percpu *msm_evt; static void __iomem *source_base; @@ -127,40 +116,66 @@ static struct clocksource msm_clocksource = { .flags = CLOCK_SOURCE_IS_CONTINUOUS, }; -#ifdef CONFIG_LOCAL_TIMERS +static int msm_timer_irq; +static int msm_timer_has_ppi; + static int __cpuinit msm_local_timer_setup(struct clock_event_device *evt) { - /* Use existing clock_event for cpu 0 */ - if (!smp_processor_id()) - return 0; + int cpu = smp_processor_id(); + int err; writel_relaxed(0, event_base + 
TIMER_ENABLE); writel_relaxed(0, event_base + TIMER_CLEAR); writel_relaxed(~0, event_base + TIMER_MATCH_VAL); - evt->irq = msm_clockevent.irq; + evt->irq = msm_timer_irq; evt->name = "local_timer"; - evt->features = msm_clockevent.features; - evt->rating = msm_clockevent.rating; + evt->features = CLOCK_EVT_FEAT_ONESHOT; + evt->rating = 200; evt->set_mode = msm_timer_set_mode; evt->set_next_event = msm_timer_set_next_event; + evt->cpumask = cpumask_of(cpu); + + clockevents_config_and_register(evt, GPT_HZ, 4, 0x); + + if (msm_timer_has_ppi) { + enable_percpu_irq(evt->irq, IRQ_TYPE_EDGE_RISING); + } else { + err = request_irq(evt->irq, msm_timer_interrupt, + IRQF_TIMER | IRQF_NOBALANCING | + IRQF_TRIGGER_RISING, "gp_timer", evt); + if (err) + pr_err("request_irq failed\n"); + } - *__this_cpu_ptr(msm_evt.percpu_evt) = evt; - clockevents_config_and_register(evt, GPT_HZ, 4, 0xf000); - enable_percpu_irq(evt->irq, IRQ_TYPE_EDGE_RISING); return 0; } -static void msm_local_timer_stop(struct clock_event_device *evt) +static void __cpuinit msm_local_timer_stop(struct clock_event_device *evt) { evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt); disable_percpu_irq(evt->irq); } -static struct local_timer_ops msm_local_timer_ops __cpuinitdata = { - .setup = msm_local_timer_setup, - .stop = msm_local_timer_stop, +static int __cpuinit msm_timer_cpu_notify(struct notifier_block *self, + unsigned long action, void *hcpu) +{ + struct clock_event_device *evt = this_cpu_ptr(msm_evt); + + switch (action & ~CPU_TASKS_FROZEN) { + case CPU_STARTING: + msm_local_timer_setup(evt); + break; + case CPU_DYING: + msm_local_timer_stop(evt); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block msm_timer_cpu_nb __cpuinitdata = { + .notifier_call = msm_timer_cpu_notify, }; -#endif /* CONFIG_LOCAL_TIMERS */ static notrace u32 msm_sched_clock_read(void) { @@ -170,41 +185,35 @@ static notrace u32 msm_sched_clock_read(void) static void __init msm_timer_init(u32 dgt_hz, int sched_bits, int irq, 
bool percpu) { - struct clock_event_device *ce = &msm_clockevent; struct clocksource *cs = &msm_clocksource; - int res; + int res = 0; - writel_relaxed(0, event_base + TIMER_ENABLE); - writel_relaxed(0, ev
[PATCH 3/8] ARM: EXYNOS4: Divorce mct from local timer API
Separate the mct local timers from the local timer API. This will allow us to remove ARM local timer support in the near future and gets us closer to moving this driver to drivers/clocksource. Cc: Kukjin Kim Signed-off-by: Stephen Boyd --- arch/arm/mach-exynos/mct.c | 53 -- 1 file changed, 37 insertions(+), 16 deletions(-) diff --git a/arch/arm/mach-exynos/mct.c b/arch/arm/mach-exynos/mct.c index c9d6650..5a9a73f 100644 --- a/arch/arm/mach-exynos/mct.c +++ b/arch/arm/mach-exynos/mct.c @@ -16,13 +16,13 @@ #include #include #include +#include #include #include #include #include #include -#include #include @@ -42,7 +42,7 @@ static unsigned long clk_rate; static unsigned int mct_int_type; struct mct_clock_event_device { - struct clock_event_device *evt; + struct clock_event_device evt; void __iomem *base; char name[10]; }; @@ -264,8 +264,6 @@ static void exynos4_clockevent_init(void) setup_irq(EXYNOS4_IRQ_MCT_G0, &mct_comp_event_irq); } -#ifdef CONFIG_LOCAL_TIMERS - static DEFINE_PER_CPU(struct mct_clock_event_device, percpu_mct_tick); /* Clock event handling */ @@ -338,7 +336,7 @@ static inline void exynos4_tick_set_mode(enum clock_event_mode mode, static int exynos4_mct_tick_clear(struct mct_clock_event_device *mevt) { - struct clock_event_device *evt = mevt->evt; + struct clock_event_device *evt = &mevt->evt; /* * This is for supporting oneshot mode. 
@@ -360,7 +358,7 @@ static int exynos4_mct_tick_clear(struct mct_clock_event_device *mevt) static irqreturn_t exynos4_mct_tick_isr(int irq, void *dev_id) { struct mct_clock_event_device *mevt = dev_id; - struct clock_event_device *evt = mevt->evt; + struct clock_event_device *evt = &mevt->evt; exynos4_mct_tick_clear(mevt); @@ -388,7 +386,6 @@ static int __cpuinit exynos4_local_timer_setup(struct clock_event_device *evt) int mct_lx_irq; mevt = this_cpu_ptr(&percpu_mct_tick); - mevt->evt = evt; mevt->base = EXYNOS4_MCT_L_BASE(cpu); sprintf(mevt->name, "mct_tick%d", cpu); @@ -426,7 +423,7 @@ static int __cpuinit exynos4_local_timer_setup(struct clock_event_device *evt) return 0; } -static void exynos4_local_timer_stop(struct clock_event_device *evt) +static void __cpuinit exynos4_local_timer_stop(struct clock_event_device *evt) { unsigned int cpu = smp_processor_id(); evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt); @@ -439,22 +436,38 @@ static void exynos4_local_timer_stop(struct clock_event_device *evt) disable_percpu_irq(EXYNOS_IRQ_MCT_LOCALTIMER); } -static struct local_timer_ops exynos4_mct_tick_ops __cpuinitdata = { - .setup = exynos4_local_timer_setup, - .stop = exynos4_local_timer_stop, +static int __cpuinit exynos4_mct_cpu_notify(struct notifier_block *self, + unsigned long action, void *hcpu) +{ + struct mct_clock_event_device *mevt = this_cpu_ptr(&percpu_mct_tick); + struct clock_event_device *evt = &mevt->evt; + + switch (action & ~CPU_TASKS_FROZEN) { + case CPU_STARTING: + exynos4_local_timer_setup(evt); + break; + case CPU_DYING: + exynos4_local_timer_stop(evt); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block exynos4_mct_cpu_nb __cpuinitdata = { + .notifier_call = exynos4_mct_cpu_notify, }; -#endif /* CONFIG_LOCAL_TIMERS */ static void __init exynos4_timer_resources(void) { + int err; + struct mct_clock_event_device *mevt = this_cpu_ptr(&percpu_mct_tick); struct clk *mct_clk; mct_clk = clk_get(NULL, "xtal"); clk_rate = 
clk_get_rate(mct_clk); -#ifdef CONFIG_LOCAL_TIMERS if (mct_int_type == MCT_INT_PPI) { - int err; err = request_percpu_irq(EXYNOS_IRQ_MCT_LOCALTIMER, exynos4_mct_tick_isr, "MCT", @@ -463,8 +476,16 @@ static void __init exynos4_timer_resources(void) EXYNOS_IRQ_MCT_LOCALTIMER, err); } - local_timer_register(&exynos4_mct_tick_ops); -#endif /* CONFIG_LOCAL_TIMERS */ + err = register_cpu_notifier(&exynos4_mct_cpu_nb); + if (err) + goto out_irq; + + /* Immediately configure the timer on the boot CPU */ + exynos4_local_timer_setup(&mevt->evt); + return; + +out_irq: + free_percpu_irq(EXYNOS_IRQ_MCT_LOCALTIMER, &percpu_mct_tick); } void __init exynos4_timer_init(void) -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html P
[PATCH 2/8] ARM: smp_twd: Divorce smp_twd from local timer API
Separate the smp_twd timers from the local timer API. This will allow us to remove ARM local timer support in the near future and gets us closer to moving this driver to drivers/clocksource. Cc: Russell King Signed-off-by: Stephen Boyd --- arch/arm/kernel/smp_twd.c | 48 +++ 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c index c092115..2439843 100644 --- a/arch/arm/kernel/smp_twd.c +++ b/arch/arm/kernel/smp_twd.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -23,7 +24,6 @@ #include #include -#include /* set up by the platform code */ static void __iomem *twd_base; @@ -32,7 +32,7 @@ static struct clk *twd_clk; static unsigned long twd_timer_rate; static DEFINE_PER_CPU(bool, percpu_setup_called); -static struct clock_event_device __percpu **twd_evt; +static struct clock_event_device __percpu *twd_evt; static int twd_ppi; static void twd_set_mode(enum clock_event_mode mode, @@ -105,7 +105,7 @@ static void twd_update_frequency(void *new_rate) { twd_timer_rate = *((unsigned long *) new_rate); - clockevents_update_freq(*__this_cpu_ptr(twd_evt), twd_timer_rate); + clockevents_update_freq(__this_cpu_ptr(twd_evt), twd_timer_rate); } static int twd_rate_change(struct notifier_block *nb, @@ -131,7 +131,7 @@ static struct notifier_block twd_clk_nb = { static int twd_clk_init(void) { - if (twd_evt && *__this_cpu_ptr(twd_evt) && !IS_ERR(twd_clk)) + if (twd_evt && __this_cpu_ptr(twd_evt) && !IS_ERR(twd_clk)) return clk_notifier_register(twd_clk, &twd_clk_nb); return 0; @@ -150,7 +150,7 @@ static void twd_update_frequency(void *data) { twd_timer_rate = clk_get_rate(twd_clk); - clockevents_update_freq(*__this_cpu_ptr(twd_evt), twd_timer_rate); + clockevents_update_freq(__this_cpu_ptr(twd_evt), twd_timer_rate); } static int twd_cpufreq_transition(struct notifier_block *nb, @@ -176,7 +176,7 @@ static struct notifier_block twd_cpufreq_nb = { static int twd_cpufreq_init(void) { - 
if (twd_evt && *__this_cpu_ptr(twd_evt) && !IS_ERR(twd_clk)) + if (twd_evt && __this_cpu_ptr(twd_evt) && !IS_ERR(twd_clk)) return cpufreq_register_notifier(&twd_cpufreq_nb, CPUFREQ_TRANSITION_NOTIFIER); @@ -266,7 +266,6 @@ static void twd_get_clock(struct device_node *np) */ static int __cpuinit twd_timer_setup(struct clock_event_device *clk) { - struct clock_event_device **this_cpu_clk; int cpu = smp_processor_id(); /* @@ -275,7 +274,7 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk) */ if (per_cpu(percpu_setup_called, cpu)) { __raw_writel(0, twd_base + TWD_TIMER_CONTROL); - clockevents_register_device(*__this_cpu_ptr(twd_evt)); + clockevents_register_device(__this_cpu_ptr(twd_evt)); enable_percpu_irq(clk->irq, 0); return 0; } @@ -296,9 +295,7 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk) clk->set_mode = twd_set_mode; clk->set_next_event = twd_set_next_event; clk->irq = twd_ppi; - - this_cpu_clk = __this_cpu_ptr(twd_evt); - *this_cpu_clk = clk; + clk->cpumask = cpumask_of(cpu); clockevents_config_and_register(clk, twd_timer_rate, 0xf, 0x); @@ -307,16 +304,32 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk) return 0; } -static struct local_timer_ops twd_lt_ops __cpuinitdata = { - .setup = twd_timer_setup, - .stop = twd_timer_stop, +static int __cpuinit twd_timer_cpu_notify(struct notifier_block *self, + unsigned long action, void *hcpu) +{ + struct clock_event_device *evt = this_cpu_ptr(twd_evt); + + switch (action & ~CPU_TASKS_FROZEN) { + case CPU_STARTING: + twd_timer_setup(evt); + break; + case CPU_DYING: + twd_timer_stop(evt); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block twd_timer_cpu_nb __cpuinitdata = { + .notifier_call = twd_timer_cpu_notify, }; static int __init twd_local_timer_common_register(struct device_node *np) { int err; - twd_evt = alloc_percpu(struct clock_event_device *); + twd_evt = alloc_percpu(struct clock_event_device); if (!twd_evt) { err = 
-ENOMEM; goto out_free; @@ -328,10 +341,13 @@ static int __init twd_local_timer_common_register(struct device_node *np) goto out_free; } - err = local_timer_register(&twd_lt_ops); + err = register_cpu_notifier(&twd_timer_cpu_nb); if (err) goto
[PATCH 8/8] ARM: smp: Remove local timer API
There are no more users of this API, remove it. Cc: Russell King Signed-off-by: Stephen Boyd --- arch/arm/Kconfig | 12 +-- arch/arm/include/asm/localtimer.h | 34 arch/arm/kernel/smp.c | 67 ++- arch/arm/mach-omap2/Kconfig | 1 - arch/arm/mach-omap2/timer.c | 7 5 files changed, 11 insertions(+), 110 deletions(-) delete mode 100644 arch/arm/include/asm/localtimer.h diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index dedf02b..7d4338d 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1527,6 +1527,7 @@ config SMP depends on HAVE_SMP depends on MMU select HAVE_ARM_SCU if !ARCH_MSM_SCORPIONMP + select HAVE_ARM_TWD if (!ARCH_MSM_SCORPIONMP && !EXYNOS4_MCT) select USE_GENERIC_SMP_HELPERS help This enables support for systems with more than one CPU. If you have @@ -1646,17 +1647,6 @@ config ARM_PSCI 0022A ("Power State Coordination Interface System Software on ARM processors"). -config LOCAL_TIMERS - bool "Use local timer interrupts" - depends on SMP - default y - select HAVE_ARM_TWD if (!ARCH_MSM_SCORPIONMP && !EXYNOS4_MCT) - help - Enable support for local timers on SMP platforms, rather then the - legacy IPI broadcast method. Local timers allows the system - accounting to be spread across the timer interval, preventing a - "thundering herd" at every timer tick. - config ARCH_NR_GPIO int default 1024 if ARCH_SHMOBILE || ARCH_TEGRA diff --git a/arch/arm/include/asm/localtimer.h b/arch/arm/include/asm/localtimer.h deleted file mode 100644 index f77ffc1..000 --- a/arch/arm/include/asm/localtimer.h +++ /dev/null @@ -1,34 +0,0 @@ -/* - * arch/arm/include/asm/localtimer.h - * - * Copyright (C) 2004-2005 ARM Ltd. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. 
- */ -#ifndef __ASM_ARM_LOCALTIMER_H -#define __ASM_ARM_LOCALTIMER_H - -#include - -struct clock_event_device; - -struct local_timer_ops { - int (*setup)(struct clock_event_device *); - void (*stop)(struct clock_event_device *); -}; - -#ifdef CONFIG_LOCAL_TIMERS -/* - * Register a local timer driver - */ -int local_timer_register(struct local_timer_ops *); -#else -static inline int local_timer_register(struct local_timer_ops *ops) -{ - return -ENXIO; -} -#endif - -#endif diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 2d5197d..f628c79 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -41,7 +41,6 @@ #include #include #include -#include #include #include #include @@ -133,8 +132,6 @@ int __cpuinit boot_secondary(unsigned int cpu, struct task_struct *idle) } #ifdef CONFIG_HOTPLUG_CPU -static void percpu_timer_stop(void); - static int platform_cpu_kill(unsigned int cpu) { if (smp_ops.cpu_kill) @@ -178,11 +175,6 @@ int __cpuinit __cpu_disable(void) migrate_irqs(); /* -* Stop the local timer for this CPU. -*/ - percpu_timer_stop(); - - /* * Flush user cache and TLB mappings, and then remove this CPU * from the vm mask set of all processes. * @@ -269,7 +261,7 @@ static void __cpuinit smp_store_cpu_info(unsigned int cpuid) store_cpu_topology(cpuid); } -static void percpu_timer_setup(void); +static void broadcast_timer_setup(void); /* * This is the secondary CPU boot entry. We're using this CPUs @@ -325,9 +317,9 @@ asmlinkage void __cpuinit secondary_start_kernel(void) complete(&cpu_running); /* -* Setup the percpu timer for this CPU. +* Setup the dummy broadcast timer for this CPU. 
*/ - percpu_timer_setup(); + broadcast_timer_setup(); local_irq_enable(); local_fiq_enable(); @@ -375,10 +367,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus) max_cpus = ncores; if (ncores > 1 && max_cpus) { /* -* Enable the local timer or broadcast device for the +* Enable the dummy broadcast device for the * boot CPU, but only if we have more than one CPU. */ - percpu_timer_setup(); + broadcast_timer_setup(); /* * Initialise the present map, which describes the set of CPUs @@ -473,8 +465,12 @@ static void broadcast_timer_set_mode(enum clock_event_mode mode, { } -static void __cpuinit broadcast_timer_setup(struct clock_event_device *evt) +static void __cpuinit broadcast_timer_setup(void) { + unsigned int cpu = smp_processor_id(); + struct clock_event_device *evt = &per_cpu(percpu_clockevent, cpu); + + evt->cpumask= cpumask_
[PATCH 6/8] clocksource: time-armada-370-xp: Fix sparse warning
drivers/clocksource/time-armada-370-xp.c:217:13: warning: symbol 'armada_370_xp_timer_init' was not declared. Should it be static? Also remove the __init marking in the prototype as it's unnecessary and drop the init.h file. Cc: Gregory CLEMENT Signed-off-by: Stephen Boyd --- drivers/clocksource/time-armada-370-xp.c | 3 ++- include/linux/time-armada-370-xp.h | 4 +--- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/clocksource/time-armada-370-xp.c b/drivers/clocksource/time-armada-370-xp.c index 47a6730..efe4aef 100644 --- a/drivers/clocksource/time-armada-370-xp.c +++ b/drivers/clocksource/time-armada-370-xp.c @@ -27,10 +27,11 @@ #include #include #include +#include +#include #include #include -#include /* * Timer block registers. */ diff --git a/include/linux/time-armada-370-xp.h b/include/linux/time-armada-370-xp.h index dfdfdc0..6fb0856 100644 --- a/include/linux/time-armada-370-xp.h +++ b/include/linux/time-armada-370-xp.h @@ -11,8 +11,6 @@ #ifndef __TIME_ARMADA_370_XPPRCMU_H #define __TIME_ARMADA_370_XPPRCMU_H -#include - -void __init armada_370_xp_timer_init(void); +void armada_370_xp_timer_init(void); #endif -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 7/8] clocksource: time-armada-370-xp: Divorce from local timer API
Separate the armada 370xp local timers from the local timer API. This will allow us to remove ARM local timer support in the near future and makes this driver multi-architecture friendly. Cc: Gregory CLEMENT Signed-off-by: Stephen Boyd --- drivers/clocksource/time-armada-370-xp.c | 85 ++-- 1 file changed, 38 insertions(+), 47 deletions(-) diff --git a/drivers/clocksource/time-armada-370-xp.c b/drivers/clocksource/time-armada-370-xp.c index efe4aef..ee2e50c5 100644 --- a/drivers/clocksource/time-armada-370-xp.c +++ b/drivers/clocksource/time-armada-370-xp.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -31,7 +32,6 @@ #include #include -#include /* * Timer block registers. */ @@ -70,7 +70,7 @@ static bool timer25Mhz = true; */ static u32 ticks_per_jiffy; -static struct clock_event_device __percpu **percpu_armada_370_xp_evt; +static struct clock_event_device __percpu *armada_370_xp_evt; static u32 notrace armada_370_xp_read_sched_clock(void) { @@ -143,14 +143,7 @@ armada_370_xp_clkevt_mode(enum clock_event_mode mode, } } -static struct clock_event_device armada_370_xp_clkevt = { - .name = "armada_370_xp_per_cpu_tick", - .features = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_PERIODIC, - .shift = 32, - .rating = 300, - .set_next_event = armada_370_xp_clkevt_next_event, - .set_mode = armada_370_xp_clkevt_mode, -}; +static int armada_370_xp_clkevt_irq; static irqreturn_t armada_370_xp_timer_interrupt(int irq, void *dev_id) { @@ -173,42 +166,53 @@ static int __cpuinit armada_370_xp_timer_setup(struct clock_event_device *evt) u32 u; int cpu = smp_processor_id(); - /* Use existing clock_event for cpu 0 */ - if (!smp_processor_id()) - return 0; - u = readl(local_base + TIMER_CTRL_OFF); if (timer25Mhz) writel(u | TIMER0_25MHZ, local_base + TIMER_CTRL_OFF); else writel(u & ~TIMER0_25MHZ, local_base + TIMER_CTRL_OFF); - evt->name = armada_370_xp_clkevt.name; - evt->irq= armada_370_xp_clkevt.irq; - evt->features = armada_370_xp_clkevt.features; 
- evt->shift = armada_370_xp_clkevt.shift; - evt->rating = armada_370_xp_clkevt.rating, + evt->name = "armada_370_xp_per_cpu_tick", + evt->features = CLOCK_EVT_FEAT_ONESHOT | + CLOCK_EVT_FEAT_PERIODIC; + evt->shift = 32, + evt->rating = 300, evt->set_next_event = armada_370_xp_clkevt_next_event, evt->set_mode = armada_370_xp_clkevt_mode, + evt->irq= armada_370_xp_clkevt_irq; evt->cpumask= cpumask_of(cpu); - *__this_cpu_ptr(percpu_armada_370_xp_evt) = evt; - clockevents_config_and_register(evt, timer_clk, 1, 0xfffe); enable_percpu_irq(evt->irq, 0); return 0; } -static void armada_370_xp_timer_stop(struct clock_event_device *evt) +static void __cpuinit armada_370_xp_timer_stop(struct clock_event_device *evt) { evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt); disable_percpu_irq(evt->irq); } -static struct local_timer_ops armada_370_xp_local_timer_ops __cpuinitdata = { - .setup = armada_370_xp_timer_setup, - .stop = armada_370_xp_timer_stop, +static int __cpuinit armada_370_xp_timer_cpu_notify(struct notifier_block *self, + unsigned long action, void *hcpu) +{ + struct clock_event_device *evt = this_cpu_ptr(armada_370_xp_evt); + + switch (action & ~CPU_TASKS_FROZEN) { + case CPU_STARTING: + armada_370_xp_timer_setup(evt); + break; + case CPU_DYING: + armada_370_xp_timer_stop(evt); + break; + } + + return NOTIFY_OK; +} + +static struct notifier_block armada_370_xp_timer_cpu_nb __cpuinitdata = { + .notifier_call = armada_370_xp_timer_cpu_notify, }; void __init armada_370_xp_timer_init(void) @@ -224,9 +228,6 @@ void __init armada_370_xp_timer_init(void) if (of_find_property(np, "marvell,timer-25Mhz", NULL)) { /* The fixed 25MHz timer is available so let's use it */ - u = readl(local_base + TIMER_CTRL_OFF); - writel(u | TIMER0_25MHZ, - local_base + TIMER_CTRL_OFF); u = readl(timer_base + TIMER_CTRL_OFF); writel(u | TIMER0_25MHZ, timer_base + TIMER_CTRL_OFF); @@ -236,9 +237,6 @@ void __init armada_370_xp_timer_init(void) struct clk *clk = of_clk_get(np, 0); WARN_ON(IS_ERR(clk)); 
rate = clk_get_rate(clk); - u = readl(local_base +
[PATCH 1/8] ARM: smp: Lower rating of dummy broadcast device
In the near future the dummy broadcast device will always be registered with the clockevent core. If the rating of the dummy is higher than the rating of the real clockevent, the clockevents core will try to replace the real clockevent with the dummy broadcast. We don't want this to happen, so lower the rating to something no good clockevent should choose. Cc: Russell King Signed-off-by: Stephen Boyd --- arch/arm/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index fa86d1c..2d5197d 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -479,7 +479,7 @@ static void __cpuinit broadcast_timer_setup(struct clock_event_device *evt) evt->features = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_DUMMY; - evt->rating = 400; + evt->rating = 100; evt->mult = 1; evt->set_mode = broadcast_timer_set_mode; -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
[PATCH 0/8] Remove ARM local timer API
In light of Mark Rutland's recent work on divorcing the ARM architected timers from the ARM local timer API and introducing a generic arch hook for broadcast, it seems that we should remove the local timer API entirely. Doing so will reduce the architecture dependencies of our timer drivers, reduce code in ARM core, and simplify timer drivers because they no longer go through an architecture layer that is essentially a hotplug notifier. Previous attempts have been made[1] unsuccessfully. I'm hoping this can be accepted now so that we can clean up the timer drivers that are used in both UP and SMP situations. Right now these drivers have to ignore the timer setup callback on the boot CPU to avoid registering clockevents twice. This is not very symmetric and causes convoluted code that does the same thing in two places. Patches based on linux-next-20130221. Mostly compile tested as I don't have access to the hardware. [1] http://article.gmane.org/gmane.linux.ports.arm.kernel/145705 Note: A hotplug notifier is used by both x86 for the apb_timer (see apbt_cpuhp_notify) and by metag (see arch_timer_cpu_notify in metag_generic.c) so this is not new. 
Stephen Boyd (8): ARM: smp: Lower rating of dummy broadcast device ARM: smp_twd: Divorce smp_twd from local timer API ARM: EXYNOS4: Divorce mct from local timer API ARM: PRIMA2: Divorce timer-marco from local timer API ARM: MSM: Divorce msm_timer from local timer API clocksource: time-armada-370-xp: Fix sparse warning clocksource: time-armada-370-xp: Divorce from local timer API ARM: smp: Remove local timer API arch/arm/Kconfig | 12 +-- arch/arm/include/asm/localtimer.h| 34 - arch/arm/kernel/smp.c| 69 +++-- arch/arm/kernel/smp_twd.c| 48 arch/arm/mach-exynos/mct.c | 53 + arch/arm/mach-msm/timer.c| 125 +-- arch/arm/mach-omap2/Kconfig | 1 - arch/arm/mach-omap2/timer.c | 7 -- arch/arm/mach-prima2/timer-marco.c | 98 drivers/clocksource/time-armada-370-xp.c | 88 ++ include/linux/time-armada-370-xp.h | 4 +- 11 files changed, 241 insertions(+), 298 deletions(-) delete mode 100644 arch/arm/include/asm/localtimer.h -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v2] memcg: Add memory.pressure_level events
On Thu, Feb 21, 2013 at 10:55:52PM -0800, Anton Vorontsov wrote: > On Fri, Feb 22, 2013 at 08:56:08AM +0900, Minchan Kim wrote: > > [...] The my point is that you have a plan to support? Why I have a > > question is that you said your goal is to replace lowmemory killer > > In short: yes, of course, if the non-memcg interface will be in demand. > > > but android don't have enabled CONFIG_MEMCG as you know well > > so they should enable it for using just notifier? or they need another hack > > to > > connect notifier to global thing? > > A hack is not an option for me. :-) My final goal is to switch Android to > use the notifier without need for hacks/external patches or > drivers/staging. > > But my current goal is to make the most generic case work, and do this in > the most correct way. That is, vmpressure + MEMCG. Once I accomplish this, > I can then think of any niche needs (such as Android). > > There will be two possibilities for Android: > > 1. Obviously, turn on CONFIG_MEMCG. We need to measure its effect on real >devices, and see if it makes sense. (Plus, maybe there are other uses >for MEMCG on Android?) I'd like to see this one. > > or > > 2. Implement /sys/fs/cgroups/memory/memory.pressure_level interface >without MEMCG. Doing this will be really easy as we'll already have >vmpressure() core, and Android has CROUPS=y. But I do expect some >discussion like 'why don't you fix memcg instead?'. We'll have to >answer this question by looking back at '1.' Of course. > > Also note that cgroups vmpressure notifiers were tried by QEMU folks, and > it seemed to be useful: > >http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02821.html I saw that. > > So, nowadays it is not only about Android. Some time ago I also got an > email from Orna Agmon Ben-Yehuda, who suggested to use vmpressure stuff > with 'memcached' (but I didn't find time to actually try it, so far. :( > Thanks for the email, btw!). 
I also received email from other people on the embedded side about the memory notifier I worked on a long time ago, and I introduced your work to them instead of my old solution. It seems they don't use Android and have very small RAM, so they want to handle memory very efficiently. For that purpose, I hope vmpressure becomes tiny and can support even NOMMU systems. > > So it is useful with or without MEMCG, and if we will really need to > support vmpressure without MEMCG, I will have to implement the support in > addition to MEMCG case, yes. Thanks for your clarification. > > Thanks, > > Anton -- Kind regards, Minchan Kim
Re: Dangerous devm_request_irq() conversions
On Fri, Feb 22, 2013 at 04:12:36PM +0900, Jingoo Han wrote: > On Friday, February 22, 2013 3:54 PM, Dmitry Torokhov wrote: > > > > Hi, > > > > It looks like a whole slew of devm_request_irq() conversions just got > > applied to mainline and many of them are quite broken. > > > > Consider fd5231ce336e038037b4f0190a6838bdd6e17c6d or > > c1879fe80c61f3be6f2ddb82509c2e7f92a484fe: the drivers udsed first to > > free IRQ and then unregister the corresponding device ensuring that IRQ > > handler, while it runs, has the device available. The mechanic > > conversion to devm_request_irq() reverses the order of these operations > > opening the race window where IRQ can reference device (or other > > resource) that is already gone. > > > > It would be nice if these could be reverted and revioewed again for > > correctness. > > Um, other RTC drivers already have been using devm_request_threaded_irq() or > devm_request_irq() like this, before I added these patches. > > For example, > ./drivers/rtc/rtc-tegra.c > ./drivers/rtc/rtc-spear.c > ./drivers/rtc/rtc-s3c.c > ./drivers/rtc/rtc-mxc.c > ./drivers/rtc/rtc-ds1553.c > ./drivers/rtc/rtc-ds1511.c > ./drivers/rtc/rtc-snvs.c > ./drivers/rtc/rtc-imxdi.c > ./drivers/rtc/rtc-tx4939.c > ./drivers/rtc/rtc-mv.c > ./drivers/rtc/rtc-coh901331.c > ./drivers/rtc/rtc-stk17ta8.c > ./drivers/rtc/rtc-lpc32xx.c > ./drivers/rtc/rtc-tps65910.c > ./drivers/rtc/rtc-rc5t583.c > > > Also, even more, some RTC drivers calls rtc_device_unregister() first, > then calls free_irq() later. > > For example, > ./drivers/rtc/rtc-vr41xx.c > ./drivers/rtc/rtc-da9052.c > ./drivers/rtc/rtc-isl1208.c > ./drivers/rtc/rtc-88pm860x.c > ./drivers/rtc/rtc-tps6586x.c > ./drivers/rtc/rtc-mpc5121.c > ./drivers/rtc/rtc-m48t59.c > > > Please, don't argue revert without concrete reasons. What more concrete reason do you need? 
I explained to you the exact reason on the patches I noticed before and also on the 2 commits referenced above: blind conversion to devm_* changes the order of operations, which may be deadly with IRQs (but others, like clocks and regulators, are important too). The fact that crap slipped into the kernel before is not a valid reason for adding more of the same crap. Please *understand* APIs you are using before making changes. > > If these devm_request_threaded_irq() or devm_request_irq() make the problem, > devm_free_irq() will be added later. And the point? If you use devm_request_irq() and then call devm_free_irq() manually in all paths, all you achieved is a waste of the memory required for devm_* tracking. -- Dmitry
Re: [PATCH 1/2] cpustat: use accessor functions for get/set/add
On Fri, Feb 22, 2013 at 11:51 AM, Viresh Kumar wrote: > On Fri, Feb 22, 2013 at 11:26 AM, Kevin Hilman wrote: >> Add some accessor functions in order to facilitate the conversion to >> atomic reads/writes of cpustat values. >> >> Signed-off-by: Kevin Hilman >> --- >> drivers/cpufreq/cpufreq_governor.c | 18 - >> drivers/cpufreq/cpufreq_ondemand.c | 2 +- > >> diff --git a/drivers/cpufreq/cpufreq_governor.c >> b/drivers/cpufreq/cpufreq_governor.c >> index 6c5f1d3..ec6c315 100644 >> --- a/drivers/cpufreq/cpufreq_governor.c >> +++ b/drivers/cpufreq/cpufreq_governor.c >> @@ -36,12 +36,12 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int >> cpu, u64 *wall) >> >> cur_wall_time = jiffies64_to_cputime64(get_jiffies_64()); >> >> - busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER]; >> - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM]; >> - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ]; >> - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ]; >> - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL]; >> - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE]; >> + busy_time = kcpustat_cpu_get(cpu, CPUTIME_USER); >> + busy_time += kcpustat_cpu_get(cpu, CPUTIME_SYSTEM); >> + busy_time += kcpustat_cpu_get(cpu, CPUTIME_IRQ); >> + busy_time += kcpustat_cpu_get(cpu, CPUTIME_SOFTIRQ); >> + busy_time += kcpustat_cpu_get(cpu, CPUTIME_STEAL); >> + busy_time += kcpustat_cpu_get(cpu, CPUTIME_NICE); >> >> idle_time = cur_wall_time - busy_time; >> if (wall) >> @@ -103,7 +103,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) >> u64 cur_nice; >> unsigned long cur_nice_jiffies; >> >> - cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE] - >> + cur_nice = kcpustat_cpu_get(j, CPUTIME_NICE) - >> cdbs->prev_cpu_nice; >> /* >> * Assumption: nice time between sampling periods >> will >> @@ -113,7 +113,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) >> cputime64_to_jiffies64(cur_nice); >> >> cdbs->prev_cpu_nice = >> - 
kcpustat_cpu(j).cpustat[CPUTIME_NICE]; >> + kcpustat_cpu_get(j, CPUTIME_NICE); >> idle_time += jiffies_to_usecs(cur_nice_jiffies); >> } >> >> @@ -216,7 +216,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data, >> &j_cdbs->prev_cpu_wall); >> if (ignore_nice) >> j_cdbs->prev_cpu_nice = >> - >> kcpustat_cpu(j).cpustat[CPUTIME_NICE]; >> + kcpustat_cpu_get(j, CPUTIME_NICE); >> } >> >> /* >> diff --git a/drivers/cpufreq/cpufreq_ondemand.c >> b/drivers/cpufreq/cpufreq_ondemand.c >> index 7731f7c..ac5d49f 100644 >> --- a/drivers/cpufreq/cpufreq_ondemand.c >> +++ b/drivers/cpufreq/cpufreq_ondemand.c >> @@ -403,7 +403,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, >> struct attribute *b, >> >> &dbs_info->cdbs.prev_cpu_wall); >> if (od_tuners.ignore_nice) >> dbs_info->cdbs.prev_cpu_nice = >> - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; >> + kcpustat_cpu_get(j, CPUTIME_NICE); >> >> } >> return count; > > For cpufreq: > > Acked-by: Viresh Kumar > > Though i believe you also need this: > > diff --git a/drivers/cpufreq/cpufreq_conservative.c > b/drivers/cpufreq/cpufreq_conservative.c > index 64ef737..38e3ad7 100644 > --- a/drivers/cpufreq/cpufreq_conservative.c > +++ b/drivers/cpufreq/cpufreq_conservative.c > @@ -242,7 +242,7 @@ static ssize_t store_ignore_nice_load(struct > kobject *a, struct attribute *b, > > &dbs_info->cdbs.prev_cpu_wall); > if (cs_tuners.ignore_nice) > dbs_info->cdbs.prev_cpu_nice = > - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; > + kcpustat_cpu_get(j, CPUTIME_NICE); > } > return count; > } > > BTW, i don't see kcpustat_cpu() used in > > kernel/sched/core.c| 12 +--- > kernel/sched/cputime.c | 29 +-- > > I searched tip/master as well as lnext/master. Added by Frederic's Adaptive NOHZ patchset? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://ww
Re: [PATCH v4] mfd: syscon: Add non-DT support
On Fri, Feb 22, 2013 at 11:01:18AM +0400, Alexander Shiyan wrote: > > On Thu, Feb 21, 2013 at 07:29:02PM +0400, Alexander Shiyan wrote: > > > This patch allow using syscon driver from the platform data, i.e. > > > possibility using driver on systems without oftree support. > > > For search syscon device from the client drivers, > > > "syscon_regmap_lookup_by_pdevname" function was added. > > > > > > Signed-off-by: Alexander Shiyan > > > > [...] > > > > > + syscon->base = devm_ioremap_resource(dev, res); > > > + if (!syscon->base) > > > > Is this correct? > > Hmm, of course IS_ERR should be used here... > v5? > Yes. From here: https://lkml.org/lkml/2013/1/21/140 It seems it is. > > > > > + return -EBUSY; This line could also be changed. > > > > > > > Otherwise, i'm also ok with this patch. > > Acked-by: Dong Aisheng > > > > BTW, i did not see Samuel's tree having this new API. > > So, who will pick this patch? > > I have same question. I CCed Thierry and Greg who may know it. Regards Dong Aisheng
Re: [PATCH 2/7] ksm: treat unstable nid like in stable tree
On 02/21/2013 04:20 PM, Hugh Dickins wrote: An inconsistency emerged in reviewing the NUMA node changes to KSM: when meeting a page from the wrong NUMA node in a stable tree, we say that it's okay for comparisons, but not as a leaf for merging; whereas when meeting a page from the wrong NUMA node in an unstable tree, we bail out immediately. IIUC - ksm page from the wrong NUMA node will be added to current node's stable tree - normal page from the wrong NUMA node will be merged to current node's stable tree <- what am I missing here? I didn't see any special handling in function stable_tree_search for this case. - normal page from the wrong NUMA node will be compared, but not used as a leaf for merging, after the patch Now, it might be that a wrong NUMA node in an unstable tree is more likely to correlate with instability (different content, with rbnode now misplaced) than page migration; but even so, we are accustomed to instability in the unstable tree. Without strong evidence for which strategy is generally better, I'd rather be consistent with what's done in the stable tree: accept a page from the wrong NUMA node for comparison, but not as a leaf for merging. Signed-off-by: Hugh Dickins --- mm/ksm.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) --- mmotm.orig/mm/ksm.c 2013-02-20 22:28:23.584001392 -0800 +++ mmotm/mm/ksm.c 2013-02-20 22:28:27.288001480 -0800 @@ -1340,16 +1340,6 @@ struct rmap_item *unstable_tree_search_i return NULL; } - /* -* If tree_page has been migrated to another NUMA node, it -* will be flushed out and put into the right unstable tree -* next time: only merge with it if merge_across_nodes. 
-*/ - if (!ksm_merge_across_nodes && page_to_nid(tree_page) != nid) { - put_page(tree_page); - return NULL; - } - ret = memcmp_pages(page, tree_page); parent = *new; @@ -1359,6 +1349,15 @@ struct rmap_item *unstable_tree_search_i } else if (ret > 0) { put_page(tree_page); new = &parent->rb_right; + } else if (!ksm_merge_across_nodes && + page_to_nid(tree_page) != nid) { + /* +* If tree_page has been migrated to another NUMA node, +* it will be flushed out and put in the right unstable +* tree next time: only merge with it when across_nodes. +*/ + put_page(tree_page); + return NULL; } else { *tree_pagep = tree_page; return tree_rmap_item; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: mailto:"d...@kvack.org";> em...@kvack.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Dangerous devm_request_irq() conversions
On Friday, February 22, 2013 3:54 PM, Dmitry Torokhov wrote: > > Hi, > > It looks like a whole slew of devm_request_irq() conversions just got > applied to mainline and many of them are quite broken. > > Consider fd5231ce336e038037b4f0190a6838bdd6e17c6d or > c1879fe80c61f3be6f2ddb82509c2e7f92a484fe: the drivers used to first > free the IRQ and then unregister the corresponding device, ensuring that the IRQ > handler, while it runs, has the device available. The mechanical > conversion to devm_request_irq() reverses the order of these operations, > opening a race window where the IRQ handler can reference a device (or other > resource) that is already gone. > > It would be nice if these could be reverted and reviewed again for > correctness. Um, other RTC drivers already have been using devm_request_threaded_irq() or devm_request_irq() like this, before I added these patches. For example, ./drivers/rtc/rtc-tegra.c ./drivers/rtc/rtc-spear.c ./drivers/rtc/rtc-s3c.c ./drivers/rtc/rtc-mxc.c ./drivers/rtc/rtc-ds1553.c ./drivers/rtc/rtc-ds1511.c ./drivers/rtc/rtc-snvs.c ./drivers/rtc/rtc-imxdi.c ./drivers/rtc/rtc-tx4939.c ./drivers/rtc/rtc-mv.c ./drivers/rtc/rtc-coh901331.c ./drivers/rtc/rtc-stk17ta8.c ./drivers/rtc/rtc-lpc32xx.c ./drivers/rtc/rtc-tps65910.c ./drivers/rtc/rtc-rc5t583.c Also, even more, some RTC drivers call rtc_device_unregister() first, then call free_irq() later. For example, ./drivers/rtc/rtc-vr41xx.c ./drivers/rtc/rtc-da9052.c ./drivers/rtc/rtc-isl1208.c ./drivers/rtc/rtc-88pm860x.c ./drivers/rtc/rtc-tps6586x.c ./drivers/rtc/rtc-mpc5121.c ./drivers/rtc/rtc-m48t59.c Please don't argue for a revert without concrete reasons. If these devm_request_threaded_irq() or devm_request_irq() calls cause a problem, devm_free_irq() will be added later.
> > -- > Dmitry
Re: [PATCH RFC 4/5] UBIFS: Add security.* XATTR support for the UBIFS
OK, the lockdep warnings clearly show the reason: CPU0 CPU1 lock(&ui->ui_mutex); lock(&sb->s_type->i_mutex_key#10); lock(&ui->ui_mutex); lock(&sb->s_type->i_mutex_key#10); And then there are 2 tracebacks which are useful and show that you unnecessarily initialize the inode security context while holding the parent inode lock. I think you do not need to hold that lock. Move the initialization out of the protected section. See my suggestions below. On Wed, 2013-02-13 at 11:23 +0100, Marc Kleine-Budde wrote: > @@ -280,6 +280,10 @@ static int ubifs_create(struct inode *dir, struct dentry > *dentry, umode_t mode, > err = ubifs_jnl_update(c, dir, &dentry->d_name, inode, 0, 0); > if (err) > goto out_cancel; > + > + err = ubifs_init_security(dir, inode, &dentry->d_name); > + if (err) > + goto out_cancel; > mutex_unlock(&dir_ui->ui_mutex); Can you move ubifs_init_security() up to before 'mutex_lock(&dir_ui->ui_mutex)'? > @@ -742,6 +746,10 @@ static int ubifs_mkdir(struct inode *dir, struct dentry > *dentry, umode_t mode) ... > + err = ubifs_init_security(dir, inode, &dentry->d_name); > + if (err) > + goto out_cancel; > mutex_unlock(&dir_ui->ui_mutex); Ditto. > @@ -818,6 +826,10 @@ static int ubifs_mknod(struct inode *dir, struct dentry > *dentry, ... > + err = ubifs_init_security(dir, inode, &dentry->d_name); > + if (err) > + goto out_cancel; > mutex_unlock(&dir_ui->ui_mutex); Ditto. > @@ -894,6 +906,10 @@ static int ubifs_symlink(struct inode *dir, struct > dentry *dentry, ... > + err = ubifs_init_security(dir, inode, &dentry->d_name); > + if (err) > + goto out_cancel; > mutex_unlock(&dir_ui->ui_mutex); Ditto. 
> +int ubifs_init_security(struct inode *dentry, struct inode *inode, > + const struct qstr *qstr) > +{ > + int err; > + > + mutex_lock(&inode->i_mutex); > + err = security_inode_init_security(inode, dentry, qstr, > +&ubifs_initxattrs, 0); > + mutex_unlock(&inode->i_mutex); I did not verify, but I doubt that you need i_mutex here, because you only call this function when you create an inode, before it becomes visible to VFS. Please, double-check this. Thanks! -- Best Regards, Artem Bityutskiy
[PATCH] fusb300_udc: modify stall clear and idma reset procedure
From: Yuan-Hsin Chen Due to fusb300 controller modification, stall clear procedure should be modified consistantly. This patch also fixes software bugs: only enter IDMA_RESET when the condition matched and disable corresponding PRD interrupt in IDMA_RESET. Signed-off-by: Yuan-Hsin Chen --- drivers/usb/gadget/fusb300_udc.c |9 ++--- drivers/usb/gadget/fusb300_udc.h |2 +- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/usb/gadget/fusb300_udc.c b/drivers/usb/gadget/fusb300_udc.c index 72cd5e6..109cab1 100644 --- a/drivers/usb/gadget/fusb300_udc.c +++ b/drivers/usb/gadget/fusb300_udc.c @@ -394,7 +394,7 @@ static void fusb300_clear_epnstall(struct fusb300 *fusb300, u8 ep) if (reg & FUSB300_EPSET0_STL) { printk(KERN_DEBUG "EP%d stall... Clear!!\n", ep); - reg &= ~FUSB300_EPSET0_STL; + reg |= FUSB300_EPSET0_STL_CLR; iowrite32(reg, fusb300->reg + FUSB300_OFFSET_EPSET0(ep)); } } @@ -930,9 +930,12 @@ static void fusb300_wait_idma_finished(struct fusb300_ep *ep) fusb300_clear_int(ep->fusb300, FUSB300_OFFSET_IGR0, FUSB300_IGR0_EPn_PRD_INT(ep->epnum)); + return; + IDMA_RESET: - fusb300_clear_int(ep->fusb300, FUSB300_OFFSET_IGER0, - FUSB300_IGER0_EEPn_PRD_INT(ep->epnum)); + reg = ioread32(ep->fusb300->reg + FUSB300_OFFSET_IGER0); + reg &= ~FUSB300_IGER0_EEPn_PRD_INT(ep->epnum); + iowrite32(reg, ep->fusb300->reg + FUSB300_OFFSET_IGER0); } static void fusb300_set_idma(struct fusb300_ep *ep, diff --git a/drivers/usb/gadget/fusb300_udc.h b/drivers/usb/gadget/fusb300_udc.h index 542cd83..ccae1b5 100644 --- a/drivers/usb/gadget/fusb300_udc.h +++ b/drivers/usb/gadget/fusb300_udc.h @@ -111,8 +111,8 @@ /* * * EPn Setting 0 (EPn_SET0, offset = 020H+(n-1)*30H, n=1~15 ) * */ +#define FUSB300_EPSET0_STL_CLR (1 << 3) #define FUSB300_EPSET0_CLRSEQNUM (1 << 2) -#define FUSB300_EPSET0_EPn_TX0BYTE (1 << 1) #define FUSB300_EPSET0_STL (1 << 0) /* -- 1.7.4.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to 
majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re[2]: [PATCH v4] mfd: syscon: Add non-DT support
> On Thu, Feb 21, 2013 at 07:29:02PM +0400, Alexander Shiyan wrote: > > This patch allow using syscon driver from the platform data, i.e. > > possibility using driver on systems without oftree support. > > For search syscon device from the client drivers, > > "syscon_regmap_lookup_by_pdevname" function was added. > > > > Signed-off-by: Alexander Shiyan > > [...] > > > + syscon->base = devm_ioremap_resource(dev, res); > > + if (!syscon->base) > > Is this correct? Hmm, of course IS_ERR should be used here... v5? > > > + return -EBUSY; > > > > Otherwise, i'm also ok with this patch. > Acked-by: Dong Aisheng > > BTW, i did not see Samuel's tree having this new API. > So, who will pick this patch? I have same question. ---
Re: [PATCH v2] memcg: Add memory.pressure_level events
On Fri, Feb 22, 2013 at 08:56:08AM +0900, Minchan Kim wrote:
> [...] My point is: do you have a plan to support it? The reason I ask is
> that you said your goal is to replace the lowmemory killer

In short: yes, of course, if the non-memcg interface will be in demand.

> but android doesn't have CONFIG_MEMCG enabled, as you know well, so
> should they enable it just to use the notifier? Or do they need another
> hack to connect the notifier to the global thing?

A hack is not an option for me. :-)

My final goal is to switch Android to use the notifier without the need for hacks/external patches or drivers/staging. But my current goal is to make the most generic case work, and do this in the most correct way. That is, vmpressure + MEMCG. Once I accomplish this, I can then think of any niche needs (such as Android). There will be two possibilities for Android:

1. Obviously, turn on CONFIG_MEMCG. We need to measure its effect on real devices, and see if it makes sense. (Plus, maybe there are other uses for MEMCG on Android?)

or

2. Implement the /sys/fs/cgroups/memory/memory.pressure_level interface without MEMCG. Doing this will be really easy as we'll already have the vmpressure() core, and Android has CGROUPS=y. But I do expect some discussion like 'why don't you fix memcg instead?'. We'll have to answer this question by looking back at '1.'

Also note that cgroups vmpressure notifiers were tried by QEMU folks, and they seemed to be useful: http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02821.html

So, nowadays it is not only about Android. Some time ago I also got an email from Orna Agmon Ben-Yehuda, who suggested using the vmpressure stuff with 'memcached' (but I didn't find time to actually try it, so far. :( Thanks for the email, btw!).

So it is useful with or without MEMCG, and if we really need to support vmpressure without MEMCG, I will have to implement that support in addition to the MEMCG case, yes.
Thanks,
Anton
[PATCH] ARM: EXYNOS: Keep USB related LDOs always active on Origen
LDO3 and LDO8 are used for powering both the device and host phy controllers. These regulators are not handled in the USB host driver. Hence we get unexpected behaviour when the regulators are disabled elsewhere. It would be best to keep these regulators always on.

Signed-off-by: Tushar Behera
---
Based on v3.8.

 arch/arm/mach-exynos/mach-origen.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-exynos/mach-origen.c b/arch/arm/mach-exynos/mach-origen.c
index 5e34b9c..7351063 100644
--- a/arch/arm/mach-exynos/mach-origen.c
+++ b/arch/arm/mach-exynos/mach-origen.c
@@ -169,6 +169,7 @@ static struct regulator_init_data __initdata max8997_ldo3_data = {
 	.min_uV = 110,
 	.max_uV = 110,
 	.apply_uV = 1,
+	.always_on = 1,
 	.valid_ops_mask = REGULATOR_CHANGE_STATUS,
 	.state_mem = {
 		.disabled = 1,
@@ -227,6 +228,7 @@ static struct regulator_init_data __initdata max8997_ldo8_data = {
 	.min_uV = 330,
 	.max_uV = 330,
 	.apply_uV = 1,
+	.always_on = 1,
 	.valid_ops_mask = REGULATOR_CHANGE_STATUS,
 	.state_mem = {
 		.disabled = 1,
--
1.7.4.1
[GIT PULL] arch/arc for v3.9-rc1
Hi Linus,

I would like to introduce the Linux port to ARC Processors (from Synopsys) for 3.9-rc1. The patch-set has been discussed on the public lists since Nov and has received a fair bit of review, especially from Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb. The arch bits are in arch/arc, some asm-generic changes (acked by Arnd), and a minor change to PARISC (acked by Helge).

The series is a touch bigger for a new port for 2 main reasons:

1. It enables a basic kernel in the first sub-series and adds ptrace/kgdb/.. later.
2. Some of the fallout of review (DeviceTree support, multi-platform-image support) was added on top of the orig series, primarily to record the revision history.

Please consider pulling.

Thanks,
Vineet

The following changes since commit 949db153b6466c6f7cad5a427ecea94985927311:

  Linux 3.8-rc5 (2013-01-25 11:57:28 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git tags/arc-v3.9-rc1

for you to fetch changes up to fc32781bfdb56dad883469b65e468e749ef35fe5:

  ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed" (2013-02-15 23:16:22 +0530)

----
Introducing Linux port to Synopsys ARC Processors (for 3.9-rc1)

This patchset contains architecture specific bits (arch/arc) to enable Linux on the ARC700 Processor and some minor adjustments to generic code (reviewed/acked).
- Gilad Ben-Yossef (1): ARC: Add support for ioremap_prot API Mischa Jonker (1): ARC: kgdb support Vineet Gupta (75): ARC: Generic Headers ARC: Build system: Makefiles, Kconfig, Linker script ARC: irqflags - Interrupt enabling/disabling at in-core intc ARC: Atomic/bitops/cmpxchg/barriers asm-generic headers: uaccess.h to conditionally define segment_eq() ARC: uaccess friends asm-generic: uaccess: Allow arches to over-ride __{get,put}_user_fn() ARC: [optim] uaccess __{get,put}_user() optimised asm-generic headers: Allow yet more arch overrides in checksum.h ARC: Checksum/byteorder/swab routines ARC: Fundamental ARCH data-types/defines ARC: Spinlock/rwlock/mutex primitives ARC: String library ARC: Low level IRQ/Trap/Exception Handling ARC: Interrupt Handling ARC: Non-MMU Exception Handling ARC: Syscall support (no-legacy-syscall ABI) ARC: Process-creation/scheduling/idle-loop ARC: Timers/counters/delay management ARC: Signal handling ARC: [Review] Preparing to fix incorrect syscall restarts due to signals ARC: [Review] Prevent incorrect syscall restarts ARC: Cache Flush Management ARC: Page Table Management ARC: MMU Context Management ARC: MMU Exception Handling ARC: TLB flush Handling ARC: Page Fault handling ARC: I/O and DMA Mappings ARC: Boot #1: low-level, setup_arch(), /proc/cpuinfo, mem init ARC: [plat-arcfpga] Static platform device for CONFIG_SERIAL_ARC ARC: [DeviceTree] Basic support ARC: [DeviceTree] Convert some Kconfig items to runtime values ARC: [plat-arcfpga]: Enabling DeviceTree for Angel4 board ARC: Last bits (stubs) to get to a running kernel with UART ARC: [plat-arcfpga] defconfig ARC: [optim] Cache "current" in Register r25 ARC: ptrace support ARC: Futex support ARC: OProfile support ARC: Support for high priority interrupts in the in-core intc ARC: Module support ARC: Diagnostics: show_regs() etc ARC: SMP support ARC: DWARF2 .debug_frame based stack unwinder ARC: stacktracing APIs based on dw2 unwinder ARC: disassembly (needed by 
kprobes/kgdb/unaligned-access-emul) ARC: kprobes support sysctl: Enable PARISC "unaligned-trap" to be used cross-arch ARC: Unaligned access emulation ARC: Boot #2: Verbose Boot reporting / feature verification ARC: [plat-arfpga] BVCI Latency Unit setup perf, ARC: Enable building perf tools for ARC ARC: perf support (software counters only) ARC: Support for single cycle Close Coupled Mem (CCM) ARC: Hostlink Pseudo-Driver for Metaware Debugger ARC: UAPI Disintegrate arch/arc/include/asm ARC: [Review] Multi-platform image #1: Kconfig enablement ARC: Fold boards sub-menu into platform/SoC menu ARC: [Review] Multi-platform image #2: Board callback Infrastructure ARC: [Review] Multi-platform image #3: switch to board callback ARC: [Review] Multi-platform image #4: Isolate platform headers ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional ARC: [Review] Multi-platform image #7: SMP common code to use callbacks ARC: [Review] Multi-plat
Re: [PATCH v4] mfd: syscon: Add non-DT support
On Thu, Feb 21, 2013 at 07:29:02PM +0400, Alexander Shiyan wrote:
> This patch allows using the syscon driver from the platform data, i.e. it
> makes it possible to use the driver on systems without oftree support.
> To look up the syscon device from client drivers, the
> "syscon_regmap_lookup_by_pdevname" function was added.
>
> Signed-off-by: Alexander Shiyan

[...]

> +	syscon->base = devm_ioremap_resource(dev, res);
> +	if (!syscon->base)

Is this correct?

> +		return -EBUSY;

Otherwise, I'm also OK with this patch.

Acked-by: Dong Aisheng

BTW, I did not see Samuel's tree having this new API.
So, who will pick this patch?

Regards
Dong Aisheng
Re: Origen board hang with functionfs
On 02/22/2013 12:03 AM, John Stultz wrote:
> On 02/20/2013 06:01 PM, John Stultz wrote:
>> Hey Kukjin, Andrzej,
>> I recently started playing around with functionfs, and have
>> noticed some strange behavior with my origen board.
>>
>> If I enable the FunctionFS gadget driver, I see the board hang at boot
>> here:
>>
>> [2.36] USB Mass Storage support registered.
>> [2.365000] s3c-hsotg s3c-hsotg: regs f004, irq 103
>> [2.375000] s3c-hsotg s3c-hsotg: EPs:15
>> [2.38] s3c-hsotg s3c-hsotg: dedicated fifos
>> [2.385000] g_ffs: file system registered

I think the issue is because of the USB phy regulators. LDO3 and LDO8 power the phy regulators for OTG and HOST. These regulators are disabled in the OTG probe whereas they are not handled at all in the HOST driver.

Keeping these LDOs always active should solve the problem for the time being. I will follow up with a patch shortly. But I am not sure if this patch will be considered for mainline as board patches are not getting accepted these days.

--
Tushar Behera
Dangerous devm_request_irq() conversions
Hi,

It looks like a whole slew of devm_request_irq() conversions just got applied to mainline and many of them are quite broken.

Consider fd5231ce336e038037b4f0190a6838bdd6e17c6d or c1879fe80c61f3be6f2ddb82509c2e7f92a484fe: the drivers used to first free the IRQ and then unregister the corresponding device, ensuring that the IRQ handler, while it runs, has the device available. The mechanical conversion to devm_request_irq() reverses the order of these operations, opening a race window where the IRQ can reference a device (or other resource) that is already gone.

It would be nice if these could be reverted and reviewed again for correctness. In general any conversion to devm_request_irq() needs double and triple checking.

Thanks.

--
Dmitry
Re: [GIT PATCH] USB patches for 3.9-rc1
On Thu, Feb 21, 2013 at 01:58:39PM -0800, Greg KH wrote:
> On Thu, Feb 21, 2013 at 12:25:24PM -0800, Linus Torvalds wrote:
> > On Thu, Feb 21, 2013 at 10:40 AM, Greg KH wrote:
> > >
> > > USB patches for 3.9-rc1
> > >
> > > Here's the big USB merge for 3.9-rc1
> > >
> > > Nothing major, lots of gadget fixes, and of course, xhci stuff.
> >
> > Ok, so there were a couple of conflicts with Thierry Reding's series
> > to convert devm_request_and_ioremap() users into
> > devm_ioremap_resource(), where some of the old users had been
> > converted to use other helper functions (eg omap_get_control_dev()).
> > That's fine.
> >
> > I left the omap_get_control_dev() users alone, but I do want to note
> > that omap_control_usb_probe() itself now uses that
> > devm_request_and_ioremap() function. And I did *not* extend the merge
> > to do that kind of conversion in the helper function, so I'm assuming
> > Thierry might want to extend his work. Assuming people care enough..
>
> Yes, his plan was to do another sweep of the calls and hopefully remove
> the old api in 3.10 or so once that is all cleaned up.

Given that even devm_request_and_ioremap() is rather new and people have been busy sending patches to use it I had expected that the initial series wouldn't catch all uses once it had been merged. grepping is easy and I even have a semantic patch to help with the conversion so I'll keep an eye out for any new occurrences.

Thierry
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On 02/22/2013 01:02 PM, Mike Galbraith wrote: > On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: >> On 02/21/2013 05:43 PM, Mike Galbraith wrote: >>> On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote: >>> But is this patch set really cause regression on your Q6600? It may sacrificed some thing, but I still think it will benefit far more, especially on huge systems. >>> >>> We spread on FORK/EXEC, and will no longer will pull communicating tasks >>> back to a shared cache with the new logic preferring to leave wakee >>> remote, so while no, I haven't tested (will try to find round tuit) it >>> seems it _must_ hurt. Dragging data from one llc to the other on Q6600 >>> hurts a LOT. Every time a client and server are cross llc, it's a huge >>> hit. The previous logic pulled communicating tasks together right when >>> it matters the most, intermittent load... or interactive use. >> >> I agree that this is a problem need to be solved, but don't agree that >> wake_affine() is the solution. > > It's not perfect, but it's better than no countering force at all. It's > a relic of the dark ages, when affine meant L2, ie this cpu. Now days, > affine has a whole new meaning, L3, so it could be done differently, but > _some_ kind of opposing force is required. > >> According to my understanding, in the old world, wake_affine() will only >> be used if curr_cpu and prev_cpu share cache, which means they are in >> one package, whatever search in llc sd of curr_cpu or prev_cpu, we won't >> have the chance to spread the task out of that package. > > ? affine_sd is the first domain spanning both cpus, that may be NODE. > True we won't ever spread in the wakeup path unless SD_WAKE_BALANCE is > set that is. Would be nice to be able to do that without shredding > performance. That's right, we need two conditions in each select instance: 1. prev_cpu and curr_cpu are not affine 2. 
SD_WAKE_BALANCE > > Off the top of my pointy head, I can think of a way to _maybe_ improve > the "affine" wakeup criteria: Add a small (package size? and very fast) > FIFO queue to task struct, record waker/wakee relationship. If > relationship exists in that queue (rbtree), try to wake local, if not, > wake remote. The thought is to identify situations ala 1:N pgbench > where you really need to keep the load spread. That need arises when > the sum wakees + waker won't fit in one cache. True buddies would > always hit (hm, hit rate), always try to become affine where they > thrive. 1:N stuff starts missing when client count exceeds package > size, starts expanding it's horizons. 'Course you would still need to > NAK if imbalanced too badly, and let NUMA stuff NAK touching lard-balls > and whatnot. With a little more smarts, we could have happy 1:N, and > buddies don't have to chat through 2m thick walls to make 1:N scale as > well as it can before it dies of stupidity. So this is trying to take care the condition when curr_cpu(local) and prev_cpu(remote) are on different nodes, which in the old world, wake_affine() won't be invoked, correct? Hmm...I think this maybe a good additional checking before enter balance path, but I could not estimate the cost to record the relationship at this moment of time... Whatever, after applied the affine logical into new world, it will gain the ability to spread tasks cross nodes just like the old world, your idea may be an optimization, but the logical is out of the changing in this patch set, which means if it benefits, the beneficiary will be not only new but also old. 
Regards,
Michael Wang

> -Mike
Re: sched: Fix signedness bug in yield_to()
On Fri, Feb 22, 2013 at 4:56 AM, Marcelo Tosatti wrote: > On Thu, Feb 21, 2013 at 09:56:54AM +0100, Ingo Molnar wrote: >> >> * Shuah Khan wrote: >> >> > On Tue, Feb 19, 2013 at 7:27 PM, Linux Kernel Mailing List >> > wrote: >> > > Gitweb: >> > > http://git.kernel.org/linus/;a=commit;h=c3c186403c6abd32e719f005f0af950155a9e54d >> > > Commit: c3c186403c6abd32e719f005f0af950155a9e54d >> > > Parent: e0a79f529d5ba2507486d498b25da40911d95cf6 >> > > Author: Dan Carpenter >> > > AuthorDate: Tue Feb 5 14:37:51 2013 +0300 >> > > Committer: Ingo Molnar >> > > CommitDate: Tue Feb 5 12:59:29 2013 +0100 >> > > >> > > sched: Fix signedness bug in yield_to() >> > > >> > > In 7b270f6099 "sched: Bail out of yield_to when source and >> > > target runqueue has one task" we changed this to store -ESRCH so >> > > it needs to be signed. >> > >> > Dan, Ingo, >> > >> > I can't find the 7b270f6099 "sched: Bail out of yield_to when >> > source and target runqueue has one task" in the latest Linus's >> > git. Am I missing something. >> > >> > The current kenel/sched/core.c doesn't have the code from the >> > associated patch https://patchwork.kernel.org/patch/2016651/ >> >> As per the lkml discussion that one was supposed to go upstream >> via the KVM tree. Marcelo? > > commit c3c186403c6abd32e719f005f0af950155a9e54d > Author: Dan Carpenter > Date: Tue Feb 5 14:37:51 2013 +0300 > > sched: Fix signedness bug in yield_to() > > In 7b270f6099 "sched: Bail out of yield_to when source and > target runqueue has one task" we changed this to store -ESRCH so > it needs to be signed. > > Signed-off-by: Dan Carpenter > Cc: Peter Zijlstra > Cc: kbu...@01.org > Cc: Steven Rostedt > Cc: Mike Galbraith > Link: http://lkml.kernel.org/r/20130205113751.GA20521@elgon.mountain > Signed-off-by: Ingo Molnar > IIUC, we are only changing variable in yield_to from bool to int. 
I am curious whether we need changes in struct sched_class (sched.h):

 bool (*yield_to_task) (struct rq *rq, struct task_struct *p, bool preempt);

==>

 int (*yield_to_task) (struct rq *rq, struct task_struct *p, bool preempt);

Otherwise we would assign a bool value to an int here:

 yielded = curr->sched_class->yield_to_task(rq, p, preempt);

This return value also cascades to kvm_main.c. If we need to patch up the entire thing, I can cook a correction patch.

Thanks and Regards
Raghu
Resend: [patch] hsi : Avoid race condition between HSI controller and HSI client when system restart and power down
Avoid a race condition between the HSI controller and HSI client when the system restarts or powers down.

hsi_isr_tasklet is disabled in the HSI controller exit, but before the HSI controller exits, the HSI client will clean up. This cleanup destroys the spinlock used by hsi_isr_tasklet, so if such a tasklet is still running after the HSI client cleanup, the issue will happen. Here is the issue stack:

hsi-ctrl: WAKEf514b000: f9a800e5 0010 6006 013f5bf9 582bf9b8 3d565244 hsi-dlp: TTY device close request (mmgr, 133) hsi-dlp: port shutdown request mdm_ctrl: Unexpected RESET_OUT 0x0 BUG: spinlock bad magic on CPU#3, zygote/137 lock: f53a0fbc, .magic: , .owner: /-1, .owner_cpu: 0 Pid: 137, comm: zygote Tainted: G C 3.0.34-141888-g9e0a6fb #1 Call Trace: [] ? printk+0x1d/0x1f [] spin_bug+0xa4/0xac [] do_raw_spin_lock+0x7d/0x170 [] ? _raw_spin_unlock_irqrestore+0x26/0x50 [] _raw_spin_lock_irqsave+0x2c/0x40 [] complete+0x20/0x60 [] ? _raw_spin_unlock_irqrestore+0x26/0x50 [] dlp_ctrl_complete_tx+0x29/0x40 [] hsi_isr_tasklet+0x394/0x11a0 [] ? sched_clock_cpu+0xe5/0x150 [] tasklet_hi_action+0x59/0x120 [] ? it_real_fn+0x18/0xb0 [] __do_softirq+0x9b/0x220 [] ? remote_softirq_receive+0x110/0x110

Change-Id: I6a0ca0c14409bfc4cd7a2767a4f203c171ece007 Signed-off-by: xiaobing tu Signed-off-by: chao bi --- drivers/hsi/clients/dlp_ctrl.c |4 drivers/hsi/clients/dlp_flash.c |5 - drivers/hsi/clients/dlp_net.c |4 drivers/hsi/clients/dlp_trace.c |1 - drivers/hsi/clients/dlp_tty.c |5 - 5 files changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/hsi/clients/dlp_ctrl.c b/drivers/hsi/clients/dlp_ctrl.c index b09f9e6..e980f0c 100644 --- a/drivers/hsi/clients/dlp_ctrl.c +++ b/drivers/hsi/clients/dlp_ctrl.c @@ -394,6 +394,8 @@ static void dlp_ctrl_complete_tx(struct hsi_msg *msg) struct dlp_command *dlp_cmd = msg->context; struct dlp_channel *ch_ctx = dlp_cmd->channel; +if (dlp_drv.channels[DLP_CHANNEL_CTRL] == NULL) +return; dlp_cmd->status = (msg->status == HSI_STATUS_COMPLETED) ?
0 : -EIO; /* Command done, notify the sender */ @@ -433,6 +435,8 @@ static void dlp_ctrl_complete_rx(struct hsi_msg *msg) unsigned long flags; int hsi_channel, elp_channel, ret, response, msg_complete, state; +if (dlp_drv.channels[DLP_CHANNEL_CTRL] == NULL) +return; /* Copy the reponse */ memcpy(¶ms, sg_virt(msg->sgt.sgl), sizeof(struct dlp_command_params)); diff --git a/drivers/hsi/clients/dlp_flash.c b/drivers/hsi/clients/dlp_flash.c index 885b73a..b333d74 100644 --- a/drivers/hsi/clients/dlp_flash.c +++ b/drivers/hsi/clients/dlp_flash.c @@ -42,7 +42,6 @@ */ #define DLP_FLASH_NB_RX_MSG 10 - /* * struct flashing_driver - HSI Modem flashing driver protocol * @@ -259,6 +258,8 @@ static struct hsi_msg *dlp_boot_rx_dequeue(struct dlp_channel *ch_ctx) */ static void dlp_flash_complete_tx(struct hsi_msg *msg) { +if (dlp_drv.channels[DLP_CHANNEL_FLASH] == NULL) +return; /* Delete the received msg */ dlp_pdu_free(msg, -1); } @@ -274,6 +275,8 @@ static void dlp_flash_complete_rx(struct hsi_msg *msg) struct dlp_flash_ctx *flash_ctx = ch_ctx->ch_data; int ret; +if (dlp_drv.channels[DLP_CHANNEL_FLASH] == NULL) +return; if (msg->status != HSI_STATUS_COMPLETED) { pr_err(DRVNAME ": Invalid msg status: %d (ignored)\n", msg->status); diff --git a/drivers/hsi/clients/dlp_net.c b/drivers/hsi/clients/dlp_net.c index f3ca817..0c3e672 100644 --- a/drivers/hsi/clients/dlp_net.c +++ b/drivers/hsi/clients/dlp_net.c @@ -158,6 +158,8 @@ static void dlp_net_complete_tx(struct hsi_msg *pdu) struct dlp_net_context *net_ctx = ch_ctx->ch_data; struct dlp_xfer_ctx *xfer_ctx = &ch_ctx->tx; +if (dlp_drv.channels[ch_ctx->hsi_channel] == NULL) +return; /* TX done, free the skb */ dev_kfree_skb(msg_param->skb); @@ -197,6 +199,8 @@ static void dlp_net_complete_rx(struct hsi_msg *pdu) unsigned int *ptr; unsigned long flags; +if (dlp_drv.channels[ch_ctx->hsi_channel] == NULL) +return; /* Pop the CTRL queue */ write_lock_irqsave(&xfer_ctx->lock, flags); dlp_hsi_controller_pop(xfer_ctx); diff --git 
a/drivers/hsi/clients/dlp_trace.c b/drivers/hsi/clients/dlp_trace.c index fa91985..0067798 100644 --- a/drivers/hsi/clients/dlp_trace.c +++ b/drivers/hsi/clients/dlp_trace.c @@ -84,7 +84,6 @@ static unsigned int log_dropped_data; module_param_named(log_dropped_data, log_dropped_data, int, S_IRUGO | S_IWUSR); #endif - /* * */ diff --git a/drivers/hsi/clients/dlp_tty.c b/drivers/hsi/clients/dlp_tty.c index 7774484..47f6697 100644 --- a/drivers/hsi/clients/dlp_tty.c +++ b/drivers/hsi/clients/dlp_tty.c @@ -68,7 +68,6 @@ struct dlp_tty_context { struct work_structdo_tty_forward; }; - /** * Push as many RX PDUs as possible to the contro
Re: Re: [PATCH] PM / devfreq: fix missing unlock on error in exynos4_busfreq_pm_notifier_event()
> On 12:33-20130222, Wei Yongjun wrote:
> > From: Wei Yongjun
> >
> > Add the missing unlock before return from function
> > exynos4_busfreq_pm_notifier_event() in the error
> > handling case.
> >
> > This issue was introduced by commit 8fa938
> > (PM / devfreq: exynos4_bus: honor RCU lock usage)
> >
> > Signed-off-by: Wei Yongjun
>
> Arrgh.. Thanks for catching this :( My bad.
> Fix looks good to me. upto MyungJoo.

Applied to the devfreq repository. I'll send a pull request to Rafael soon along with other patches.

> MyungJoo, Rafael,
> btw, adding linux...@vger.kernel.org to MAINTAINERS for devfreq might
> be a nice idea to have the right audience.

It appears that replacing the current mailing list address with linux-pm is appropriate. If no one objects, I'll post the suggestion later.

Cheers,
MyungJoo
Re: [PATCH v2] staging: comedi: drivers: usbduxsigma.c: fix DMA buffers on stack
Looks good.

Reviewed-by: Dan Carpenter

regards,
dan carpenter
Re: [PATCH 1/2] cpustat: use accessor functions for get/set/add
On Fri, Feb 22, 2013 at 11:26 AM, Kevin Hilman wrote: > Add some accessor functions in order to facilitate the conversion to > atomic reads/writes of cpustat values. > > Signed-off-by: Kevin Hilman > --- > drivers/cpufreq/cpufreq_governor.c | 18 - > drivers/cpufreq/cpufreq_ondemand.c | 2 +- > diff --git a/drivers/cpufreq/cpufreq_governor.c > b/drivers/cpufreq/cpufreq_governor.c > index 6c5f1d3..ec6c315 100644 > --- a/drivers/cpufreq/cpufreq_governor.c > +++ b/drivers/cpufreq/cpufreq_governor.c > @@ -36,12 +36,12 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int > cpu, u64 *wall) > > cur_wall_time = jiffies64_to_cputime64(get_jiffies_64()); > > - busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER]; > - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM]; > - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ]; > - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ]; > - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL]; > - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE]; > + busy_time = kcpustat_cpu_get(cpu, CPUTIME_USER); > + busy_time += kcpustat_cpu_get(cpu, CPUTIME_SYSTEM); > + busy_time += kcpustat_cpu_get(cpu, CPUTIME_IRQ); > + busy_time += kcpustat_cpu_get(cpu, CPUTIME_SOFTIRQ); > + busy_time += kcpustat_cpu_get(cpu, CPUTIME_STEAL); > + busy_time += kcpustat_cpu_get(cpu, CPUTIME_NICE); > > idle_time = cur_wall_time - busy_time; > if (wall) > @@ -103,7 +103,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) > u64 cur_nice; > unsigned long cur_nice_jiffies; > > - cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE] - > + cur_nice = kcpustat_cpu_get(j, CPUTIME_NICE) - > cdbs->prev_cpu_nice; > /* > * Assumption: nice time between sampling periods will > @@ -113,7 +113,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) > cputime64_to_jiffies64(cur_nice); > > cdbs->prev_cpu_nice = > - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; > + kcpustat_cpu_get(j, CPUTIME_NICE); > idle_time += jiffies_to_usecs(cur_nice_jiffies); > } > > @@ 
-216,7 +216,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data, > &j_cdbs->prev_cpu_wall); > if (ignore_nice) > j_cdbs->prev_cpu_nice = > - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; > + kcpustat_cpu_get(j, CPUTIME_NICE); > } > > /* > diff --git a/drivers/cpufreq/cpufreq_ondemand.c > b/drivers/cpufreq/cpufreq_ondemand.c > index 7731f7c..ac5d49f 100644 > --- a/drivers/cpufreq/cpufreq_ondemand.c > +++ b/drivers/cpufreq/cpufreq_ondemand.c > @@ -403,7 +403,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, > struct attribute *b, > > &dbs_info->cdbs.prev_cpu_wall); > if (od_tuners.ignore_nice) > dbs_info->cdbs.prev_cpu_nice = > - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; > + kcpustat_cpu_get(j, CPUTIME_NICE); > > } > return count; For cpufreq: Acked-by: Viresh Kumar Though i believe you also need this: diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c index 64ef737..38e3ad7 100644 --- a/drivers/cpufreq/cpufreq_conservative.c +++ b/drivers/cpufreq/cpufreq_conservative.c @@ -242,7 +242,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b, &dbs_info->cdbs.prev_cpu_wall); if (cs_tuners.ignore_nice) dbs_info->cdbs.prev_cpu_nice = - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; + kcpustat_cpu_get(j, CPUTIME_NICE); } return count; } BTW, i don't see kcpustat_cpu() used in kernel/sched/core.c| 12 +--- kernel/sched/cputime.c | 29 +-- I searched tip/master as well as lnext/master. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On Fri, 2013-02-22 at 14:06 +0800, Michael Wang wrote: > On 02/22/2013 01:08 PM, Mike Galbraith wrote: > > On Fri, 2013-02-22 at 10:37 +0800, Michael Wang wrote: > > > >> According to the testing result, I could not agree this purpose of > >> wake_affine() benefit us, but I'm sure that wake_affine() is a terrible > >> performance killer when system is busy. > > > > (hm, result is singular.. pgbench in 1:N mode only?) > > I'm not sure about how pgbench implemented, all I know is it will create > several instance and access the database, I suppose no different from > several threads access database (1 server and N clients?). It's user switchable. -Mike
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On Fri, 2013-02-22 at 13:26 +0800, Michael Wang wrote: > Just confirm that I'm not on the wrong way, did the 1:N mode here means > 1 task forked N threads, and child always talk with father? Yes, one server, many clients. -Mike
[PATCH v2] staging: comedi: drivers: usbduxsigma.c: fix DMA buffers on stack
This patch fixes an instance of a DMA buffer on the stack (being passed to usb_control_msg) for the USB-DUXsigma board driver. Found using smatch. Signed-off-by: Kumar Amit Mehta --- drivers/staging/comedi/drivers/usbduxsigma.c | 27 -- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/drivers/staging/comedi/drivers/usbduxsigma.c b/drivers/staging/comedi/drivers/usbduxsigma.c index dc6b017..9e99a4b 100644 --- a/drivers/staging/comedi/drivers/usbduxsigma.c +++ b/drivers/staging/comedi/drivers/usbduxsigma.c @@ -681,7 +681,11 @@ static void usbduxsub_ao_IsocIrq(struct urb *urb) static int usbduxsub_start(struct usbduxsub *usbduxsub) { int errcode = 0; - uint8_t local_transfer_buffer[16]; + uint8_t *local_transfer_buffer; + + local_transfer_buffer = kmalloc(16, GFP_KERNEL); + if (!local_transfer_buffer) + return -ENOMEM; /* 7f92 to zero */ local_transfer_buffer[0] = 0; @@ -702,19 +706,22 @@ static int usbduxsub_start(struct usbduxsub *usbduxsub) 1, /* Timeout */ BULK_TIMEOUT); - if (errcode < 0) { + if (errcode < 0) dev_err(&usbduxsub->interface->dev, "comedi_: control msg failed (start)\n"); - return errcode; - } - return 0; + + kfree(local_transfer_buffer); + return errcode; } static int usbduxsub_stop(struct usbduxsub *usbduxsub) { int errcode = 0; + uint8_t *local_transfer_buffer; - uint8_t local_transfer_buffer[16]; + local_transfer_buffer = kmalloc(16, GFP_KERNEL); + if (!local_transfer_buffer) + return -ENOMEM; /* 7f92 to one */ local_transfer_buffer[0] = 1; @@ -732,12 +739,12 @@ static int usbduxsub_stop(struct usbduxsub *usbduxsub) 1, /* Timeout */ BULK_TIMEOUT); - if (errcode < 0) { + if (errcode < 0) dev_err(&usbduxsub->interface->dev, "comedi_: control msg failed (stop)\n"); - return errcode; - } - return 0; + + kfree(local_transfer_buffer); + return errcode; } static int usbduxsub_upload(struct usbduxsub *usbduxsub, -- 1.7.9.5
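The rule behind this fix: buffers handed to usb_control_msg() may be used for DMA, and DMA into stack memory is unsafe, so the buffer must come from the heap and be freed on every return path. A minimal userspace sketch of the resulting pattern (malloc/free standing in for kmalloc/kfree, and a dummy transfer function standing in for usb_control_msg — all names here are illustrative, not the driver's API):

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for usb_control_msg(): fills the buffer, returns bytes "sent". */
int fake_control_msg(unsigned char *buf, size_t len)
{
	memset(buf, 0xA5, len);
	return (int)len;
}

/* Heap-allocated transfer buffer, freed on every exit path. */
int duxsub_start_pattern(void)
{
	int errcode;
	unsigned char *buf = malloc(16);  /* kmalloc(16, GFP_KERNEL) in the driver */

	if (!buf)
		return -12;                   /* -ENOMEM */

	buf[0] = 0;                       /* command byte, as in the driver */
	errcode = fake_control_msg(buf, 16);

	free(buf);                        /* kfree() before returning, success or not */
	return errcode < 0 ? errcode : 0;
}
```

The single free() before the shared return mirrors the patch's single kfree(); the earlier version with a stack array had no such cleanup obligation, which is why the conversion reshapes the error paths.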
[patch] remoteproc: off by one in rproc_virtio_new_vringh()
It should be >= ARRAY_SIZE() instead of > ARRAY_SIZE() because it is an index. Signed-off-by: Dan Carpenter diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c index dba33ff..b5e3af5 100644 --- a/drivers/remoteproc/remoteproc_virtio.c +++ b/drivers/remoteproc/remoteproc_virtio.c @@ -208,7 +208,7 @@ rproc_virtio_new_vringh(struct virtio_device *vdev, unsigned index, struct vringh *vrh; int err; - if (index > ARRAY_SIZE(rvdev->vring)) { + if (index >= ARRAY_SIZE(rvdev->vring)) { dev_err(&rvdev->vdev.dev, "bad vring index: %d\n", index); return NULL; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
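The off-by-one generalizes: for an array of N elements the valid indices are 0..N-1, so a bounds check must reject index == N, not just index > N. A tiny self-contained sketch of the check (VRING_COUNT stands in for ARRAY_SIZE(rvdev->vring)):

```c
#define VRING_COUNT 2  /* stand-in for ARRAY_SIZE(rvdev->vring) */

/* Returns 1 if the index can be used safely, 0 otherwise.
 * The buggy form "index > VRING_COUNT" would wrongly accept
 * index == VRING_COUNT, which is one past the last element. */
int vring_index_ok(unsigned int index)
{
	return index < VRING_COUNT;  /* i.e. reject index >= VRING_COUNT */
}
```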
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On 02/22/2013 01:08 PM, Mike Galbraith wrote: > On Fri, 2013-02-22 at 10:37 +0800, Michael Wang wrote: > >> According to the testing result, I could not agree this purpose of >> wake_affine() benefit us, but I'm sure that wake_affine() is a terrible >> performance killer when system is busy. > > (hm, result is singular.. pgbench in 1:N mode only?) I'm not sure about how pgbench is implemented; all I know is it will create several instances and access the database, which I suppose is no different from several threads accessing the database (1 server and N clients?). There is improvement since, when the system is busy, wake_affine() will be skipped. And in the old world, when the system is busy, wake_affine() will only be skipped if prev_cpu and curr_cpu belong to different nodes. Regards, Michael Wang
Re: [RFC/PATCH 2/5] kernel_cpustat: convert to atomic 64-bit accessors
Frederic Weisbecker writes: > 2013/2/21 Frederic Weisbecker : >> 2013/2/21 Kevin Hilman : >>> Subject: [PATCH 2/5] kernel_cpustat: convert to atomic 64-bit accessors >>> >>> Use the atomic64_* accessors for all the kernel_cpustat fields to >>> ensure atomic access on non-64 bit platforms. >>> >>> Thanks to Mats Liljegren for CGROUP_CPUACCT related fixes. >>> >>> Cc: Mats Liljegren >>> Signed-off-by: Kevin Hilman >> >> Funny stuff, I thought struct kernel_cpustat was made of cputime_t >> field. Actually it's u64. So the issue is independent from the new >> full dynticks cputime accounting. It was already broken before. >> >> But yeah that's not the point, we still want to fix this anyway. But >> let's just treat this patch as independent. OK, I just sent an updated series based on your proposal. Thanks for the review, Kevin
[PATCH 1/2] cpustat: use accessor functions for get/set/add
Add some accessor functions in order to facilitate the conversion to atomic reads/writes of cpustat values. Signed-off-by: Kevin Hilman --- arch/s390/appldata/appldata_os.c | 16 +++ drivers/cpufreq/cpufreq_governor.c | 18 - drivers/cpufreq/cpufreq_ondemand.c | 2 +- drivers/macintosh/rack-meter.c | 6 +++--- fs/proc/stat.c | 40 +++--- fs/proc/uptime.c | 2 +- include/linux/kernel_stat.h| 7 ++- kernel/sched/core.c| 12 +--- kernel/sched/cputime.c | 29 +-- 9 files changed, 66 insertions(+), 66 deletions(-) diff --git a/arch/s390/appldata/appldata_os.c b/arch/s390/appldata/appldata_os.c index 87521ba..eff76f8 100644 --- a/arch/s390/appldata/appldata_os.c +++ b/arch/s390/appldata/appldata_os.c @@ -113,21 +113,21 @@ static void appldata_get_os_data(void *data) j = 0; for_each_online_cpu(i) { os_data->os_cpu[j].per_cpu_user = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]); + cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_USER)); os_data->os_cpu[j].per_cpu_nice = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]); + cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_NICE)); os_data->os_cpu[j].per_cpu_system = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]); + cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_SYSTEM)); os_data->os_cpu[j].per_cpu_idle = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]); + cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_IDLE)); os_data->os_cpu[j].per_cpu_irq = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]); + cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_IRQ)); os_data->os_cpu[j].per_cpu_softirq = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]); + cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_SOFTIRQ)); os_data->os_cpu[j].per_cpu_iowait = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]); + cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_IOWAIT)); os_data->os_cpu[j].per_cpu_steal = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]); + 
cputime_to_jiffies(kcpustat_cpu_get(i, CPUTIME_STEAL)); os_data->os_cpu[j].cpu_id = i; j++; } diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index 6c5f1d3..ec6c315 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -36,12 +36,12 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall) cur_wall_time = jiffies64_to_cputime64(get_jiffies_64()); - busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE]; + busy_time = kcpustat_cpu_get(cpu, CPUTIME_USER); + busy_time += kcpustat_cpu_get(cpu, CPUTIME_SYSTEM); + busy_time += kcpustat_cpu_get(cpu, CPUTIME_IRQ); + busy_time += kcpustat_cpu_get(cpu, CPUTIME_SOFTIRQ); + busy_time += kcpustat_cpu_get(cpu, CPUTIME_STEAL); + busy_time += kcpustat_cpu_get(cpu, CPUTIME_NICE); idle_time = cur_wall_time - busy_time; if (wall) @@ -103,7 +103,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) u64 cur_nice; unsigned long cur_nice_jiffies; - cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE] - + cur_nice = kcpustat_cpu_get(j, CPUTIME_NICE) - cdbs->prev_cpu_nice; /* * Assumption: nice time between sampling periods will @@ -113,7 +113,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) cputime64_to_jiffies64(cur_nice); cdbs->prev_cpu_nice = - kcpustat_cpu(j).cpustat[CPUTIME_NICE]; + kcpustat_cpu_get(j, CPUTIME_NICE); idle_time += jiffies_to_usecs(cur_nice_jiffies); } @@ -216,7 +216,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data, &j_cdbs->pre
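The accessor pattern in this patch is small but important: callers stop indexing the cpustat array directly and go through get/set/add helpers, so the underlying representation can later change (e.g. to an atomic type) in exactly one place. A minimal sketch with plain functions instead of the kernel's per-CPU macros (names and the single static instance are illustrative, not the kernel API):

```c
#include <stdint.h>

enum cpu_usage_stat { CPUTIME_USER, CPUTIME_SYSTEM, CPUTIME_NICE, NR_STATS };

struct kernel_cpustat { uint64_t cpustat[NR_STATS]; };

static struct kernel_cpustat stat_of_cpu0;  /* stand-in for per-CPU storage */

/* Callers use only these three helpers; the array stays an
 * implementation detail that a later patch can swap out. */
uint64_t cpustat_get(int i)             { return stat_of_cpu0.cpustat[i]; }
void     cpustat_set(int i, uint64_t v) { stat_of_cpu0.cpustat[i] = v; }
void     cpustat_add(int i, uint64_t v) { stat_of_cpu0.cpustat[i] += v; }
```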
Re: [PATCH v3 linux-next] cpufreq: ondemand: Calculate gradient of CPU load to early increase frequency
On Fri, Feb 22, 2013 at 7:26 AM, Viresh Kumar wrote: > On 21 February 2013 23:09, Stratos Karafotis wrote: >> Instead of checking only the absolute value of CPU load_freq to increase >> frequency, we detect forthcoming CPU load rise and increase frequency >> earlier. >> >> Every sampling rate, we calculate the gradient of load_freq. If it is >> too steep we assume that the load most probably will go over >> up_threshold in next iteration(s) and we increase frequency immediately. >> >> New tuners are introduced: >> - early_demand: to enable this functionality (disabled by default). >> - grad_up_threshold: over this gradient of load we will increase >> frequency immediately. >> >> Signed-off-by: Stratos Karafotis > > Acked-by: Viresh Kumar Rafael, I applied it here with my Ack over my patches, for getting a run by "kbuild test robot". http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-for-3.10
[PATCH 2/2] cpustat: convert to atomic operations
For non 64-bit platforms, convert cpustat fields to atomic64 type so reads and updates of cpustats are atomic on those platforms as well. For 64-bit platforms, the cpustat field is left as u64 because on 64-bit, using atomic64_add will have the additional overhead of a lock. We could also have used atomic64_set(atomic64_read() + delta), but on 32-bit platforms using the generic 64-bit ops (lib/atomic64.c), that results in taking a lock twice. Signed-off-by: Kevin Hilman --- include/linux/kernel_stat.h | 16 1 file changed, 16 insertions(+) diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index df8ad75..a433f87 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -32,7 +32,11 @@ enum cpu_usage_stat { }; struct kernel_cpustat { +#ifdef CONFIG_64BIT u64 _cpustat[NR_STATS]; +#else + atomic64_t _cpustat[NR_STATS]; +#endif }; struct kernel_stat { @@ -51,11 +55,23 @@ DECLARE_PER_CPU(struct kernel_cpustat, kernel_cpustat); #define kcpustat_this_cpu (&__get_cpu_var(kernel_cpustat)) #define kstat_cpu(cpu) per_cpu(kstat, cpu) #define kcpustat_cpu(cpu) per_cpu(kernel_cpustat, cpu) +#ifdef CONFIG_64BIT #define kcpustat_cpu_get(cpu, i) (kcpustat_cpu(cpu)._cpustat[i]) #define kcpustat_cpu_set(cpu, i, val) (kcpustat_cpu(cpu)._cpustat[i] = (val)) #define kcpustat_cpu_add(cpu, i, val) (kcpustat_cpu(cpu)._cpustat[i] += (val)) #define kcpustat_this_cpu_set(i, val) (kcpustat_this_cpu->_cpustat[i] = (val)) #define kcpustat_this_cpu_add(i, val) (kcpustat_this_cpu->_cpustat[i] += (val)) +#else +#define kcpustat_cpu_get(cpu, i) atomic64_read(&kcpustat_cpu(cpu)._cpustat[i]) +#define kcpustat_cpu_set(cpu, i, val) \ + atomic64_set(&kcpustat_cpu(cpu)._cpustat[i], val) +#define kcpustat_cpu_add(cpu, i, val) \ + atomic64_add(val, &kcpustat_cpu(cpu)._cpustat[i]) +#define kcpustat_this_cpu_set(i, val) \ + atomic64_set(&kcpustat_this_cpu->_cpustat[i], val) +#define kcpustat_this_cpu_add(i, val) \ + atomic64_add(val, &kcpustat_this_cpu->_cpustat[i])
+#endif extern unsigned long long nr_context_switches(void); -- 1.8.1.2
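The 32-bit problem this patch addresses is that a plain 64-bit load or store compiles to two 32-bit accesses, so a concurrent reader can observe half of an update (a torn value). The same get/set/add conversion can be sketched in userspace with C11 atomics (atomic_load/atomic_store/atomic_fetch_add standing in for the kernel's atomic64_read/atomic64_set/atomic64_add; the single global is illustrative only):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Representation switched from plain uint64_t to an atomic 64-bit type;
 * because all callers already go through accessors, nothing else changes. */
static _Atomic uint64_t cpustat_user;

uint64_t stat_get(void)       { return atomic_load(&cpustat_user); }
void     stat_set(uint64_t v) { atomic_store(&cpustat_user, v); }
void     stat_add(uint64_t v) { atomic_fetch_add(&cpustat_user, v); }
```

Note that C11's atomic_store takes (object, value) while atomic_fetch_add also takes the object first; the kernel's atomic64_set(v, i) likewise takes the atomic64_t pointer first, whereas atomic64_add(i, v) takes the value first — an easy argument order to trip over.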
[PATCH 0/2] cpustat: use atomic operations to read/update stats
On 64-bit platforms, reads/writes of the various cpustat fields are atomic due to native 64-bit loads/stores. However, on non 64-bit platforms, reads/writes of the cpustat fields are not atomic and could lead to inconsistent statistics. This problem was originally reported by Frederic Weisbecker as a 64-bit limitation with the nsec granularity cputime accounting for full dynticks, but then we realized that it's a problem that's been around for a while and not specific to the new cputime accounting. This series fixes this by first converting all access to the cputime fields to use accessor functions, and then converting the accessor functions to use the atomic64 functions. Implemented based on an idea proposed by Frederic Weisbecker. Kevin Hilman (2): cpustat: use accessor functions for get/set/add cpustat: convert to atomic operations arch/s390/appldata/appldata_os.c | 16 +++ drivers/cpufreq/cpufreq_governor.c | 18 - drivers/cpufreq/cpufreq_ondemand.c | 2 +- drivers/macintosh/rack-meter.c | 6 +++--- fs/proc/stat.c | 40 +++--- fs/proc/uptime.c | 2 +- include/linux/kernel_stat.h| 11 ++- kernel/sched/core.c| 12 +--- kernel/sched/cputime.c | 29 +-- 9 files changed, 70 insertions(+), 66 deletions(-) -- 1.8.1.2
[rebased][PATCH 0/4] acpi: do some changes for numa info
Just do some trivial changes to make ACPI's NUMA info handling cleaner. ChangeLog v3->v4 1. fix srat_disabled function (spotted by Yasuaki Ishimatsu) v2->v3 1. rebase on linux-next 2. bring back lost Makefile changes (spotted by David Rientjes, Yasuaki Ishimatsu) v1->v2 1. fix up several coding issues 2. finish srat.c change (spotted by David Rientjes) Li Guang (4): acpi: move x86/mm/srat.c to x86/kernel/acpi/srat.c numa: avoid export acpi_numa variable acpi: add clock_domain field to acpi_srat_cpu_affinity remove include asm/acpi.h in processor_driver.c arch/x86/include/asm/acpi.h | 2 +- arch/x86/kernel/acpi/Makefile | 1 + arch/x86/kernel/acpi/srat.c | 299 + arch/x86/mm/Makefile| 1 - arch/x86/mm/numa.c | 2 +- arch/x86/mm/srat.c | 278 - arch/x86/xen/enlighten.c| 2 +- drivers/acpi/processor_driver.c | 1 - include/acpi/actbl1.h | 2 +- 9 files changed, 296 insertions(+), 292 deletions(-)
[rebased][PATCH 2/4] numa: avoid export acpi_numa variable
acpi_numa is used to prevent the SRAT table from being parsed, which seems a little misnamed. If 'noacpi' was specified on the cmdline and CONFIG_ACPI_NUMA was enabled, acpi_numa would be manipulated directly from everywhere that needed to disable/enable NUMA in ACPI mode, which was a bad thing; so export a function to get the SRAT table enable/disable info instead. Signed-off-by: liguang --- arch/x86/include/asm/acpi.h |2 +- arch/x86/kernel/acpi/srat.c | 21 + arch/x86/mm/numa.c |2 +- arch/x86/xen/enlighten.c|2 +- 4 files changed, 16 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h index b31bf97..449e12a 100644 --- a/arch/x86/include/asm/acpi.h +++ b/arch/x86/include/asm/acpi.h @@ -177,7 +177,7 @@ static inline void disable_acpi(void) { } #define ARCH_HAS_POWER_INIT1 #ifdef CONFIG_ACPI_NUMA -extern int acpi_numa; +extern void disable_acpi_numa(void); extern int x86_acpi_numa_init(void); #endif /* CONFIG_ACPI_NUMA */ diff --git a/arch/x86/kernel/acpi/srat.c b/arch/x86/kernel/acpi/srat.c index b20b5b7..469a0af 100644 --- a/arch/x86/kernel/acpi/srat.c +++ b/arch/x86/kernel/acpi/srat.c @@ -24,22 +24,27 @@ #include #include -int acpi_numa __initdata; +static bool acpi_numa __initdata; static __init int setup_node(int pxm) { return acpi_map_pxm_to_node(pxm); } -static __init void bad_srat(void) +void __init disable_acpi_numa(void) { - printk(KERN_ERR "SRAT: SRAT not used.\n"); - acpi_numa = -1; + acpi_numa = false; } -static __init inline int srat_disabled(void) +static void __init bad_srat(void) { - return acpi_numa < 0; + disable_acpi_numa(); + printk(KERN_ERR "SRAT: SRAT will not be used.\n"); +} + +static bool __init srat_disabled(void) +{ + return acpi_numa == false; } /* Callback for SLIT parsing */ @@ -88,7 +93,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa) } set_apicid_to_node(apic_id, node); node_set(node, numa_nodes_parsed); - acpi_numa = 1; + acpi_numa = true; printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x 
-> Node %u\n", pxm, apic_id, node); } @@ -130,7 +135,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa) set_apicid_to_node(apic_id, node); node_set(node, numa_nodes_parsed); - acpi_numa = 1; + acpi_numa = true; printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n", pxm, apic_id, node); } diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 3545585..62e3b2a 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -47,7 +47,7 @@ static __init int numa_setup(char *opt) #endif #ifdef CONFIG_ACPI_NUMA if (!strncmp(opt, "noacpi", 6)) - acpi_numa = -1; + disable_acpi_numa(); #endif return 0; } diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index bd4c134..724ac84 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1447,7 +1447,7 @@ asmlinkage void __init xen_start_kernel(void) * any NUMA information the kernel tries to get from ACPI will * be meaningless. Prevent it from trying. */ - acpi_numa = -1; + disable_acpi_numa(); #endif /* Don't do the full vcpu_info placement stuff until we have a -- 1.7.2.5
[rebased][PATCH 4/4] remove include asm/acpi.h in processor_driver.c
processor_driver.c includes linux/acpi.h, which already includes asm/acpi.h, so remove the direct include. Reviewed-by: Yasuaki Ishimatsu Acked-by: David Rientjes Signed-off-by: liguang --- drivers/acpi/processor_driver.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c index df34bd0..341258a 100644 --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -53,7 +53,6 @@ #include #include #include -#include #include #include -- 1.7.2.5
[rebased][PATCH 1/4] acpi: move x86/mm/srat.c to x86/kernel/acpi/srat.c
srat table should present only on acpi domain, seems mm/ is not the right place for it. Reviewed-by: Yasuaki Ishimatsu Signed-off-by: liguang --- arch/x86/kernel/acpi/Makefile |1 + arch/x86/kernel/acpi/srat.c | 278 + arch/x86/mm/Makefile |1 - arch/x86/mm/srat.c| 278 - 4 files changed, 279 insertions(+), 279 deletions(-) create mode 100644 arch/x86/kernel/acpi/srat.c delete mode 100644 arch/x86/mm/srat.c diff --git a/arch/x86/kernel/acpi/Makefile b/arch/x86/kernel/acpi/Makefile index 163b225..98cea92 100644 --- a/arch/x86/kernel/acpi/Makefile +++ b/arch/x86/kernel/acpi/Makefile @@ -1,5 +1,6 @@ obj-$(CONFIG_ACPI) += boot.o obj-$(CONFIG_ACPI_SLEEP) += sleep.o wakeup_$(BITS).o +obj-$(CONFIG_ACPI_NUMA)+= srat.o ifneq ($(CONFIG_ACPI_PROCESSOR),) obj-y += cstate.o diff --git a/arch/x86/kernel/acpi/srat.c b/arch/x86/kernel/acpi/srat.c new file mode 100644 index 000..b20b5b7 --- /dev/null +++ b/arch/x86/kernel/acpi/srat.c @@ -0,0 +1,278 @@ +/* + * ACPI 3.0 based NUMA setup + * Copyright 2004 Andi Kleen, SuSE Labs. + * + * Reads the ACPI SRAT table to figure out what memory belongs to which CPUs. + * + * Called from acpi_numa_init while reading the SRAT and SLIT tables. + * Assumes all memory regions belonging to a single proximity domain + * are in one chunk. Holes between them will be included in the node. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +int acpi_numa __initdata; + +static __init int setup_node(int pxm) +{ + return acpi_map_pxm_to_node(pxm); +} + +static __init void bad_srat(void) +{ + printk(KERN_ERR "SRAT: SRAT not used.\n"); + acpi_numa = -1; +} + +static __init inline int srat_disabled(void) +{ + return acpi_numa < 0; +} + +/* Callback for SLIT parsing */ +void __init acpi_numa_slit_init(struct acpi_table_slit *slit) +{ + int i, j; + + for (i = 0; i < slit->locality_count; i++) + for (j = 0; j < slit->locality_count; j++) + numa_set_distance(pxm_to_node(i), pxm_to_node(j), + slit->entry[slit->locality_count * i + j]); +} + +/* Callback for Proximity Domain -> x2APIC mapping */ +void __init +acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa) +{ + int pxm, node; + int apic_id; + + if (srat_disabled()) + return; + if (pa->header.length < sizeof(struct acpi_srat_x2apic_cpu_affinity)) { + bad_srat(); + return; + } + if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0) + return; + pxm = pa->proximity_domain; + apic_id = pa->apic_id; + if (!apic->apic_id_valid(apic_id)) { + printk(KERN_INFO "SRAT: PXM %u -> X2APIC 0x%04x ignored\n", +pxm, apic_id); + return; + } + node = setup_node(pxm); + if (node < 0) { + printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm); + bad_srat(); + return; + } + + if (apic_id >= MAX_LOCAL_APIC) { + printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node); + return; + } + set_apicid_to_node(apic_id, node); + node_set(node, numa_nodes_parsed); + acpi_numa = 1; + printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n", + pxm, apic_id, node); +} + +/* Callback for Proximity Domain -> LAPIC mapping */ +void __init +acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa) +{ + int pxm, node; + int apic_id; + + if (srat_disabled()) + 
return; + if (pa->header.length != sizeof(struct acpi_srat_cpu_affinity)) { + bad_srat(); + return; + } + if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0) + return; + pxm = pa->proximity_domain_lo; + if (acpi_srat_revision >= 2) + pxm |= *((unsigned int*)pa->proximity_domain_hi) << 8; + node = setup_node(pxm); + if (node < 0) { + printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm); + bad_srat(); + return; + } + + if (get_uv_system_type() >= UV_X2APIC) + apic_id = (pa->apic_id << 8) | pa->local_sapic_eid; + else + apic_id = pa->apic_id; + + if (apic_id >= MAX_LOCAL_APIC) { + printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u skipped apicid that is too big\n", pxm, apic_id, node); + return; + } + + set_apicid_to_node(apic_id, node); + node_set(node, numa_nodes_parsed); + acpi_numa
[rebased][PATCH 3/4] acpi: add clock_domain field to acpi_srat_cpu_affinity
According to ACPI spec v5.0, page 152, 5.2.16.1 Processor Local APIC/SAPIC Affinity Structure, its last member is clock_domain. Reviewed-by: Yasuaki Ishimatsu Acked-by: David Rientjes Signed-off-by: liguang --- include/acpi/actbl1.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h index 0bd750e..e21d22b 100644 --- a/include/acpi/actbl1.h +++ b/include/acpi/actbl1.h @@ -922,7 +922,7 @@ struct acpi_srat_cpu_affinity { u32 flags; u8 local_sapic_eid; u8 proximity_domain_hi[3]; - u32 reserved; /* Reserved, must be zero */ + u32 clock_domain; }; /* Flags */ -- 1.7.2.5
[PATCH] lib: devres: Fix misplaced #endif
A misplaced #endif causes link errors related to pcim_*() functions. Signed-off-by: Jingoo Han --- lib/devres.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/lib/devres.c b/lib/devres.c index 88ad759..8235331 100644 --- a/lib/devres.c +++ b/lib/devres.c @@ -227,6 +227,7 @@ void devm_ioport_unmap(struct device *dev, void __iomem *addr) devm_ioport_map_match, (void *)addr)); } EXPORT_SYMBOL(devm_ioport_unmap); +#endif /* CONFIG_HAS_IOPORT */ #ifdef CONFIG_PCI /* @@ -432,4 +433,3 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask) } EXPORT_SYMBOL(pcim_iounmap_regions); #endif /* CONFIG_PCI */ -#endif /* CONFIG_HAS_IOPORT */ -- 1.7.2.5
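The bug class is easy to reproduce: if the outer `#endif` lands after a second `#ifdef` block, that block is silently compiled out whenever only the outer symbol is undefined, and its exported functions fail to link. A toy illustration of the corrected layout (the CONFIG_* symbols here are just local defines, not real Kconfig options):

```c
#define CONFIG_HAS_IOPORT 1
#define CONFIG_PCI 1

#ifdef CONFIG_HAS_IOPORT
int ioport_helper(void) { return 1; }
#endif /* CONFIG_HAS_IOPORT */  /* closed before the next guard opens */

#ifdef CONFIG_PCI
/* No longer nested under CONFIG_HAS_IOPORT: this compiles even when
 * CONFIG_HAS_IOPORT is undefined, which is exactly what the fix restores. */
int pci_helper(void) { return 2; }
#endif /* CONFIG_PCI */
```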
Re: thermal governor: does it actually work??
Adding Boris, sorry, I can't do anything currently, I'm down with influenza. kind regards, --peter; Zhang Rui writes: On Thu, 2013-02-14 at 16:32 +0100, Andreas Mohr wrote: For me after having loaded acerhdf the fan never stops (with kernelmode active), despite staying safely below trip point (acerhdf_set_cur_state() actually never gets called). BTW, could you please check if this one fixes the problem for you? http://git.kernel.org/?p=linux/kernel/git/rzhang/linux.git;a=commit;h=b8bb6cb999858043489c1ddef08eed2127559169 thanks, rui And AFAIR in a 3.2.0 kernel acerhdf fan operation seemed to just work (i.e., no fan for low temps, from the beginning). Needless to say 3.2.0 didn't even feature all the modern thermal governor crapyard yet ;) (ok, well, it's more complex but it's also a very nice environment capability) 3.8-rc7: CONFIG_ACPI_THERMAL=m CONFIG_THERMAL=m CONFIG_THERMAL_HWMON=y CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y # CONFIG_THERMAL_DEFAULT_GOV_FAIR_SHARE is not set # CONFIG_THERMAL_DEFAULT_GOV_USER_SPACE is not set CONFIG_FAIR_SHARE=y CONFIG_STEP_WISE=y # CONFIG_USER_SPACE is not set # CONFIG_CPU_THERMAL is not set Terminology in this area seems to be quite a bit off, too, at several docs places, at least according to my understanding: e.g. drivers/thermal/step_wise.c has the following comment: /** * step_wise_throttle - throttles devices associated with the given zone * @tz - thermal_zone_device * @trip - the trip point * @trip_type - type of the trip point * * Throttling Logic: This uses the trend of the thermal zone to * throttle. * If the thermal zone is 'heating up' this throttles all the cooling * devices associated with the zone and its particular trip point, by * one * step. If the zone is 'cooling down' it brings back the performance of * the devices by one step. if ... heating up ... throttles ... Sorry, but at least for P4 clockmod stuff (or some such), throttle states (P1...P8 IIRC) meant that the CPU operation was *reduced*, i.e. 
with pause intervals. And the translation of throttle clearly says that it does go that way and not the other way... (yes, you managed to confuse me that much that I even had to look up things to verify) ... cooling down ... brings back ... This should certainly be worded "reduces" or some such. So, any idea why I'm missing callbacks in acerhdf (if that is what I'm supposed to expect to happen)? Kernel bug, .config mistake, missing/wrong user-side setup? Needless to say if kernel bug this ought to be fixed pre-3.8 ideally. Thanks, Andreas Mohr
[GIT PULL] Update LZO compression code for v3.9
Hi Linus,

please pull my "lzo-update" branch from

  git://github.com/markus-oberhumer/linux.git lzo-update

You can also browse the branch at

  https://github.com/markus-oberhumer/linux/compare/lzo-update

The LZO update actually had been approved by akpm for a 3.7 merge and is available in linux-next since October, but I've only recently learned that there is no automatic flow from linux-next to linux and I have to personally send a pull request.

Many thanks,
Markus

Summary: Update the Linux kernel LZO compression and decompression code to the current upstream version which features significant performance improvements on modern machines.

$ git shortlog v3.8..lzo-update
Markus F.X.J. Oberhumer (3):
      lib/lzo: Rename lzo1x_decompress.c to lzo1x_decompress_safe.c
      lib/lzo: Update LZO compression to current upstream version
      crypto: testmgr - update LZO compression test vectors

$ git diff --stat v3.8..lzo-update
 crypto/testmgr.h                |  38 +++--
 include/linux/lzo.h             |  15 +-
 lib/decompress_unlzo.c          |   2 +-
 lib/lzo/Makefile                |   2 +-
 lib/lzo/lzo1x_compress.c        | 335 ++
 lib/lzo/lzo1x_decompress.c      | 255 -
 lib/lzo/lzo1x_decompress_safe.c | 237 +++
 lib/lzo/lzodefs.h               |  38 +++--
 8 files changed, 488 insertions(+), 434 deletions(-)

Some *synthetic* benchmarks:

x86_64 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

              compression speed   decompression speed
  LZO-2005:      150 MB/sec          468 MB/sec
  LZO-2012:      434 MB/sec         1210 MB/sec

i386 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

              compression speed   decompression speed
  LZO-2005:      143 MB/sec          409 MB/sec
  LZO-2012:      372 MB/sec         1121 MB/sec

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

              compression speed   decompression speed
  LZO-2005:       27 MB/sec           84 MB/sec
  LZO-2012:       44 MB/sec          117 MB/sec
**LZO-2013-UA:    47 MB/sec          167 MB/sec

Legend:
  LZO-2005: LZO version in current 3.8 kernel (which is based on the LZO 2.02 release from 2005)
  LZO-2012: updated LZO version available in linux-next
  **LZO-2013-UA : updated LZO
version available in linux-next plus experimental ARM Unaligned Access patch. This needs approval from some ARM maintainer and is NOT YET INCLUDED.

--
Markus Oberhumer, , http://www.oberhumer.com/
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On 02/22/2013 01:02 PM, Mike Galbraith wrote: > On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: >> On 02/21/2013 05:43 PM, Mike Galbraith wrote: >>> On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote: >>> But is this patch set really cause regression on your Q6600? It may sacrificed some thing, but I still think it will benefit far more, especially on huge systems. >>> >>> We spread on FORK/EXEC, and will no longer will pull communicating tasks >>> back to a shared cache with the new logic preferring to leave wakee >>> remote, so while no, I haven't tested (will try to find round tuit) it >>> seems it _must_ hurt. Dragging data from one llc to the other on Q6600 >>> hurts a LOT. Every time a client and server are cross llc, it's a huge >>> hit. The previous logic pulled communicating tasks together right when >>> it matters the most, intermittent load... or interactive use. >> >> I agree that this is a problem need to be solved, but don't agree that >> wake_affine() is the solution. > > It's not perfect, but it's better than no countering force at all. It's > a relic of the dark ages, when affine meant L2, ie this cpu. Now days, > affine has a whole new meaning, L3, so it could be done differently, but > _some_ kind of opposing force is required. > >> According to my understanding, in the old world, wake_affine() will only >> be used if curr_cpu and prev_cpu share cache, which means they are in >> one package, whatever search in llc sd of curr_cpu or prev_cpu, we won't >> have the chance to spread the task out of that package. > > ? affine_sd is the first domain spanning both cpus, that may be NODE. > True we won't ever spread in the wakeup path unless SD_WAKE_BALANCE is > set that is. Would be nice to be able to do that without shredding > performance. > > Off the top of my pointy head, I can think of a way to _maybe_ improve > the "affine" wakeup criteria: Add a small (package size? 
and very fast) > FIFO queue to task struct, record waker/wakee relationship. If > relationship exists in that queue (rbtree), try to wake local, if not, > wake remote. The thought is to identify situations ala 1:N pgbench > where you really need to keep the load spread. That need arises when > the sum wakees + waker won't fit in one cache. True buddies would > always hit (hm, hit rate), always try to become affine where they > thrive. 1:N stuff starts missing when client count exceeds package > size, starts expanding it's horizons. 'Course you would still need to > NAK if imbalanced too badly, and let NUMA stuff NAK touching lard-balls > and whatnot. With a little more smarts, we could have happy 1:N, and > buddies don't have to chat through 2m thick walls to make 1:N scale as > well as it can before it dies of stupidity.

Just to confirm I'm not on the wrong track: does the 1:N mode here mean 1 task forks N threads, and the children always talk with the parent?

Regards,
Michael Wang

> -Mike
Re: linux-next: build failure after merge of the final tree (drm tree related)
Hmm, maybe DRM_GEM_CMA_HELPER should depend on ARM (or !PPC)? Or maybe there is an alternative fxn to use on other archs? In truth, it is fine to make TILCDC depend on ARM, as it wouldn't be used on any other platform (today.. until TI comes up with some crazy chip w/ some TI DSP plus display controller), although that doesn't quite feel like the right fix. It would be nice to make the CMA helpers do the right thing on other archs somehow. BR, -R On Thu, Feb 21, 2013 at 11:17 PM, Stephen Rothwell wrote: > Hi all, > > After merging the final tree, today's linux-next build (powerpc > allyesconfig) failed like this: > > drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_buf_destroy': > drivers/gpu/drm/drm_gem_cma_helper.c:38:2: error: implicit declaration of > function 'dma_free_writecombine' [-Werror=implicit-function-declaration] > drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_create': > drivers/gpu/drm/drm_gem_cma_helper.c:61:2: error: implicit declaration of > function 'dma_alloc_writecombine' [-Werror=implicit-function-declaration] > > Probably caused by commit 16ea975eac67 ("drm/tilcdc: add TI LCD > Controller DRM driver (v4)") which forced CONFIG_DRM_GEM_CMA_HELPER to > 'y'. dma_alloc/free_writecombine are only defined on ARM. > > I added this patch for today. 
> > From: Stephen Rothwell > Date: Fri, 22 Feb 2013 15:14:50 +1100 > Subject: [PATCH] drm/tilcdc: only build on arm > > Signed-off-by: Stephen Rothwell > --- > drivers/gpu/drm/tilcdc/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/tilcdc/Kconfig b/drivers/gpu/drm/tilcdc/Kconfig > index ae14fd6..d24d040 100644 > --- a/drivers/gpu/drm/tilcdc/Kconfig > +++ b/drivers/gpu/drm/tilcdc/Kconfig > @@ -1,6 +1,6 @@ > config DRM_TILCDC > tristate "DRM Support for TI LCDC Display Controller" > - depends on DRM && OF > + depends on DRM && OF && ARM > select DRM_KMS_HELPER > select DRM_KMS_CMA_HELPER > select DRM_GEM_CMA_HELPER > -- > 1.8.1 > > -- > Cheers, > Stephen Rothwells...@canb.auug.org.au -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On Fri, 2013-02-22 at 10:37 +0800, Michael Wang wrote: > According to the testing result, I could not agree this purpose of > wake_affine() benefit us, but I'm sure that wake_affine() is a terrible > performance killer when system is busy. (hm, result is singular.. pgbench in 1:N mode only?)
Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation
On 02/22/2013 12:46 PM, Alex Shi wrote: > On 02/22/2013 12:19 PM, Michael Wang wrote: >> Why not seek other way to change O(n^2) to O(n)? Access 2G memory is unbelievable performance cost. >> Not access 2G memory, but (2G / 16K) memory, the sbm size is O(N). >> >> And please notice that on 16k cpus system, topology will be deep if NUMA >> enabled (O(log N) as Peter said), and that's really a good stage for >> this idea to perform on, we could save lot's of recursed 'for' cycles. >> > > CPU execute part is very very fast compare to the memory access, the > 'for' cycles cost is most on the memory access for many domain/groups > data, not instruction execution. > > In a hot patch, several KB memory access will cause clear cpu cache > pollution then make kernel slowly.

Hmm...that's a good catch. Comparison between memory access and cpu execution, no doubt the latter will win, you are right. But that was the same in the old world when we access the struct sched_domain, wasn't it?

	for_each_domain(cpu, tmp) {
		if (weight <= tmp->span_weight)
			break;
		if (tmp->flags & sd_flag)
			sd = tmp;
	}

Both old and new may access data across nodes, but the old one will access several times more, won't it?

Regards,
Michael Wang
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote: > On 02/21/2013 05:43 PM, Mike Galbraith wrote: > > On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote: > > > >> But is this patch set really cause regression on your Q6600? It may > >> sacrificed some thing, but I still think it will benefit far more, > >> especially on huge systems. > > > > We spread on FORK/EXEC, and will no longer will pull communicating tasks > > back to a shared cache with the new logic preferring to leave wakee > > remote, so while no, I haven't tested (will try to find round tuit) it > > seems it _must_ hurt. Dragging data from one llc to the other on Q6600 > > hurts a LOT. Every time a client and server are cross llc, it's a huge > > hit. The previous logic pulled communicating tasks together right when > > it matters the most, intermittent load... or interactive use. > > I agree that this is a problem need to be solved, but don't agree that > wake_affine() is the solution. It's not perfect, but it's better than no countering force at all. It's a relic of the dark ages, when affine meant L2, ie this cpu. Now days, affine has a whole new meaning, L3, so it could be done differently, but _some_ kind of opposing force is required. > According to my understanding, in the old world, wake_affine() will only > be used if curr_cpu and prev_cpu share cache, which means they are in > one package, whatever search in llc sd of curr_cpu or prev_cpu, we won't > have the chance to spread the task out of that package. ? affine_sd is the first domain spanning both cpus, that may be NODE. True we won't ever spread in the wakeup path unless SD_WAKE_BALANCE is set that is. Would be nice to be able to do that without shredding performance. Off the top of my pointy head, I can think of a way to _maybe_ improve the "affine" wakeup criteria: Add a small (package size? and very fast) FIFO queue to task struct, record waker/wakee relationship. 
If relationship exists in that queue (rbtree), try to wake local, if not, wake remote. The thought is to identify situations ala 1:N pgbench where you really need to keep the load spread. That need arises when the sum wakees + waker won't fit in one cache. True buddies would always hit (hm, hit rate), always try to become affine where they thrive. 1:N stuff starts missing when client count exceeds package size, starts expanding its horizons. 'Course you would still need to NAK if imbalanced too badly, and let NUMA stuff NAK touching lard-balls and whatnot. With a little more smarts, we could have happy 1:N, and buddies don't have to chat through 2m thick walls to make 1:N scale as well as it can before it dies of stupidity.

-Mike
[GIT PULL] x86/microcode for v3.9-rc1
Hi Linus, This patchset lets us update the CPU microcode very, very early in initialization if the BIOS fails to do so (never happens, right?) This is handy for dealing with things like the Atom erratum where we have to run without PSE because microcode loading happens too late. As I mentioned in the x86/mm push request it depends on that infrastructure but it is otherwise a standalone feature. The following changes since commit ac2cbab21f318e19bc176a7f38a120cec835220f: x86: Don't panic if can not alloc buffer for swiotlb (2013-01-29 19:36:53 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/microcode for you to fetch changes up to da76f64e7eb28b718501d15c1b79af560b7ca4ea: x86/Kconfig: Make early microcode loading a configuration feature (2013-01-31 13:20:42 -0800) Fenghua Yu (12): x86, doc: Documentation for early microcode loading x86/microcode_intel.h: Define functions and macros for early loading ucode x86/common.c: Make have_cpuid_p() a global function x86/common.c: load ucode in 64 bit or show loading ucode info in 32 bit on AP x86/microcode_core_early.c: Define interfaces for early loading ucode x86/microcode_intel_lib.c: Early update ucode on Intel's CPU x86/tlbflush.h: Define __native_flush_tlb_global_irq_disabled() x86/microcode_intel_early.c: Early update ucode on Intel's CPU x86/head_32.S: Early update ucode in 32-bit x86/head64.c: Early update ucode in 64-bit x86/mm/init.c: Copy ucode from initrd image to kernel memory x86/Kconfig: Make early microcode loading a configuration feature Documentation/x86/early-microcode.txt | 43 ++ arch/x86/Kconfig| 18 + arch/x86/include/asm/microcode.h| 14 + arch/x86/include/asm/microcode_intel.h | 85 arch/x86/include/asm/processor.h| 8 + arch/x86/include/asm/tlbflush.h | 18 +- arch/x86/kernel/Makefile| 3 + arch/x86/kernel/cpu/common.c| 17 +- arch/x86/kernel/head64.c| 6 + arch/x86/kernel/head_32.S | 11 + arch/x86/kernel/microcode_core.c| 7 +- 
arch/x86/kernel/microcode_core_early.c | 76 +++ arch/x86/kernel/microcode_intel.c | 198 ++-- arch/x86/kernel/microcode_intel_early.c | 796 arch/x86/kernel/microcode_intel_lib.c | 174 +++ arch/x86/mm/init.c | 10 + 16 files changed, 1301 insertions(+), 183 deletions(-) [full diff omitted due to length] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] kexec: avoid freeing NULL pointer in function kimage_crash_alloc
Though there is no error if we free a NULL pointer, I think we could avoid this behaviour. Change the code a little in kimage_crash_alloc() could avoid this kind of unnecessary free. Cc: "Eric W. Biederman" Cc: Andrew Morton Signed-off-by: Zhang Yanfei --- kernel/kexec.c | 15 +++ 1 files changed, 7 insertions(+), 8 deletions(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 5e4bd78..4e96fa7 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -310,7 +310,7 @@ static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry, mend = mstart + image->segment[i].memsz - 1; /* Ensure we are within the crash kernel limits */ if ((mstart < crashk_res.start) || (mend > crashk_res.end)) - goto out; + goto out_free; } /* @@ -323,16 +323,15 @@ static int kimage_crash_alloc(struct kimage **rimage, unsigned long entry, get_order(KEXEC_CONTROL_PAGE_SIZE)); if (!image->control_code_page) { printk(KERN_ERR "Could not allocate control_code_buffer\n"); - goto out; + goto out_free; } - result = 0; -out: - if (result == 0) - *rimage = image; - else - kfree(image); + *rimage = image; + return 0; +out_free: + kfree(image); +out: return result; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] PM / devfreq: fix missing unlock on error in exynos4_busfreq_pm_notifier_event()
On 12:33-20130222, Wei Yongjun wrote: > From: Wei Yongjun > > Add the missing unlock before return from function > exynos4_busfreq_pm_notifier_event() in the error > handling case. > > This issue introduced by commit 8fa938 > (PM / devfreq: exynos4_bus: honor RCU lock usage) > > Signed-off-by: Wei Yongjun > --- > drivers/devfreq/exynos4_bus.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/devfreq/exynos4_bus.c b/drivers/devfreq/exynos4_bus.c > index 46d94e9..6208a68 100644 > --- a/drivers/devfreq/exynos4_bus.c > +++ b/drivers/devfreq/exynos4_bus.c > @@ -974,6 +974,7 @@ static int exynos4_busfreq_pm_notifier_event(struct > notifier_block *this, > rcu_read_unlock(); > dev_err(data->dev, "%s: unable to find a min freq\n", > __func__); > + mutex_unlock(&data->lock); > return PTR_ERR(opp); > } > new_oppinfo.rate = opp_get_freq(opp); > > Arrgh.. Thanks for catching this :( My bad. Fix looks good to me. upto MyungJoo. MyungJoo, Rafael, btw, adding linux...@vger.kernel.org to MAINTAINERS for devfreq might be a nice idea to have right audience. -- Regards, Nishanth Menon -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation
On 02/22/2013 12:19 PM, Michael Wang wrote: > >> > Why not seek other way to change O(n^2) to O(n)? >> > >> > Access 2G memory is unbelievable performance cost. > Not access 2G memory, but (2G / 16K) memory, the sbm size is O(N). > > And please notice that on 16k cpus system, topology will be deep if NUMA > enabled (O(log N) as Peter said), and that's really a good stage for > this idea to perform on, we could save lot's of recursed 'for' cycles. > CPU execute part is very very fast compare to the memory access, the 'for' cycles cost is most on the memory access for many domain/groups data, not instruction execution. In a hot patch, several KB memory access will cause clear cpu cache pollution then make kernel slowly. -- Thanks Alex
[PATCH] kexec: fix memory leak in function kimage_normal_alloc
If kimage_normal_alloc() fails to alloc pages for image->swap_page, it should call kimage_free_page_list() to free allocated pages in image->control_pages list before it frees image. Cc: "Eric W. Biederman" Cc: Andrew Morton Cc: Sasha Levin Signed-off-by: Zhang Yanfei --- kernel/kexec.c | 18 ++ 1 files changed, 10 insertions(+), 8 deletions(-) diff --git a/kernel/kexec.c b/kernel/kexec.c index 5e4bd78..a57face 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -223,6 +223,8 @@ out: } +static void kimage_free_page_list(struct list_head *list); + static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry, unsigned long nr_segments, struct kexec_segment __user *segments) @@ -248,22 +250,22 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry, get_order(KEXEC_CONTROL_PAGE_SIZE)); if (!image->control_code_page) { printk(KERN_ERR "Could not allocate control_code_buffer\n"); - goto out; + goto out_free; } image->swap_page = kimage_alloc_control_pages(image, 0); if (!image->swap_page) { printk(KERN_ERR "Could not allocate swap buffer\n"); - goto out; + goto out_free; } - result = 0; - out: - if (result == 0) - *rimage = image; - else - kfree(image); + *rimage = image; + return 0; +out_free: + kimage_free_page_list(&image->control_pages); + kfree(image); +out: return result; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] PM / devfreq: fix missing unlock on error in exynos4_busfreq_pm_notifier_event()
From: Wei Yongjun Add the missing unlock before return from function exynos4_busfreq_pm_notifier_event() in the error handling case. This issue introduced by commit 8fa938 (PM / devfreq: exynos4_bus: honor RCU lock usage) Signed-off-by: Wei Yongjun --- drivers/devfreq/exynos4_bus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/devfreq/exynos4_bus.c b/drivers/devfreq/exynos4_bus.c index 46d94e9..6208a68 100644 --- a/drivers/devfreq/exynos4_bus.c +++ b/drivers/devfreq/exynos4_bus.c @@ -974,6 +974,7 @@ static int exynos4_busfreq_pm_notifier_event(struct notifier_block *this, rcu_read_unlock(); dev_err(data->dev, "%s: unable to find a min freq\n", __func__); + mutex_unlock(&data->lock); return PTR_ERR(opp); } new_oppinfo.rate = opp_get_freq(opp); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] slave-dmaengine updates
Hi Linus, Here is the slave-dmaengine updates. This is fairly big pull by my standards as I had missed last merge window. So we have the support for device tree for slave-dmaengine, large updates to dw_dmac driver from Andy for reusing on different architectures. Along with this we have fixes on bunch of the drivers Thanks ~Vinod -- The following changes since commit d1c3ed669a2d452cacfb48c2d171a1f364dae2ed: are available in the git repository at: git://git.infradead.org/users/vkoul/slave-dma.git next Akinobu Mita (4): dmaengine: use for_each_set_bit dma: amba-pl08x: use vchan_dma_desc_free_list dmatest: adjust invalid module parameters for number of source buffers async_tx: use memchr_inv Alessandro Rubini (1): pl080.h: moved from arm/include/asm/hardware to include/linux/amba/ Andy Shevchenko (34): dw_dmac: change dev_printk() to corresponding macros dw_dmac: don't call platform_get_drvdata twice dw_dmac: change dev_crit to dev_WARN in dwc_handle_error dw_dmac: introduce to_dw_desc() macro dw_dmac: absence of pdata isn't critical when autocfg is set dw_dmac: check for mapping errors dw_dmac: remove redundant check dw_dmac: update tx_node_active in dwc_do_single_block dma: dw_dmac: add dwc_chan_pause and dwc_chan_resume dmaengine: introduce is_slave_direction function dma: at_hdmac: check direction properly for cyclic transfers dma: dw_dmac: check direction properly in dw_dma_cyclic_prep dma: ep93xx_dma: reuse is_slave_direction helper dma: ipu_idmac: reuse is_slave_direction helper dma: ste_dma40: reuse is_slave_direction helper dw_dmac: call .probe after we have a device in place dw_dmac: store direction in the custom channel structure dw_dmac: make usage of dw_dma_slave optional dw_dmac: backlink to dw_dma in dw_dma_chan is superfluous dw_dmac: allocate dma descriptors from DMA_COHERENT memory dw_dmac: don't exceed AHB master number in dwc_get_data_width dw_dmac: move soft LLP code from tasklet to dwc_scan_descriptors dw_dmac: print out DW_PARAMS and DWC_PARAMS 
when debug dw_dmac: remove unnecessary tx_list field in dw_dma_chan dw_dmac: introduce total_len field in struct dw_desc dw_dmac: fill individual length of descriptor dw_dmac: return proper residue value dw_dmac: apply default dma_mask if needed dma: of-dma: protect list write operation by spin_lock dmaengine.h: remove redundant else keyword dma: coh901318: avoid unbalanced locking dma: coh901318: set residue only if dma is in progress edma: do not waste memory for dma_mask dma: tegra20-apb-dma: remove unnecessary assignment Arnd Bergmann (1): Revert "ARM: SPEAr13xx: Pass DW DMAC platform data from DT" Barry Song (4): dmaengine: sirf: enable the driver support new SiRFmarco SoC DMAEngine: add dmaengine_prep_interleaved_dma wrapper for interleaved api DMAEngine: sirf: add DMA pause/resume support DMAEngine: sirf: lock the shared registers access in sirfsoc_dma_terminate_all Bartlomiej Zolnierkiewicz (10): async_tx: add missing DMA unmap to async_memcpy() ioat: add missing DMA unmap to ioat_dma_self_test() mtd: fsmc_nand: add missing DMA unmap to dma_xfer() carma-fpga: pass correct flags to ->device_prep_dma_memcpy() ioat3: add missing DMA unmap to ioat_xor_val_self_test() async_tx: fix build for async_memset dmaengine: remove dma_async_memcpy_pending() macro dmaengine: remove dma_async_memcpy_complete() macro dmaengine: add cpu_relax() to busy-loop in dma_sync_wait() async_tx: fix checking of dma_wait_for_async_tx() return value Cong Ding (3): dma: remove unnecessary null pointer check in mmp_pdma.c dma: sh/shdma-base.c: remove unnecessary null pointer check dma: of-dma.c: fix memory leakage Dave Jiang (3): ioat: Add alignment workaround for IVB platforms ioat: remove chanerr mask setting for IOAT v3.x ioatdma: fix race between updating ioat->head and IOAT_COMPLETION_PENDING Fabio Baltieri (8): dmaengine: ste_dma40: add a done queue for completed descriptors dmaengine: ste_dma40: add missing kernel-doc entry dmaengine: ste_dma40: minor cosmetic fixes dmaengine: 
ste_dma40: minor code readability fixes dmaengine: ste_dma40: add software lli support dmaengine: set_dma40: ignore spurious interrupts dmaengine: set_dma40: balance clock in probe fail code dmaengine: ste_dma40: do not remove descriptors for cyclic transfers Fabio Estevam (1): dma: mxs-dma: Fix build warnings with W=1 Fengguang Wu (1): dmaengine: ioat - fix spare sparse complain Gerald Baeza (2): dmaengine: ste_dma40: support fixed physical channel allocation dmaengine: ste_dma40: physical channels number correction Guennad
Re: [PATCH 1/7] ksm: add some comments
On 02/21/2013 04:19 PM, Hugh Dickins wrote: Added slightly more detail to the Documentation of merge_across_nodes, a few comments in areas indicated by review, and renamed get_ksm_page()'s argument from "locked" to "lock_it". No functional change. Signed-off-by: Hugh Dickins --- Documentation/vm/ksm.txt | 16 mm/ksm.c | 18 ++ 2 files changed, 26 insertions(+), 8 deletions(-) --- mmotm.orig/Documentation/vm/ksm.txt 2013-02-20 22:28:09.456001057 -0800 +++ mmotm/Documentation/vm/ksm.txt 2013-02-20 22:28:23.580001392 -0800 @@ -60,10 +60,18 @@ sleep_millisecs - how many milliseconds merge_across_nodes - specifies if pages from different numa nodes can be merged. When set to 0, ksm merges only pages which physically - reside in the memory area of same NUMA node. It brings - lower latency to access to shared page. Value can be - changed only when there is no ksm shared pages in system. - Default: 1 + reside in the memory area of same NUMA node. That brings + lower latency to access of shared pages. Systems with more + nodes, at significant NUMA distances, are likely to benefit + from the lower latency of setting 0. Smaller systems, which + need to minimize memory usage, are likely to benefit from + the greater sharing of setting 1 (default). You may wish to + compare how your system performs under each setting, before + deciding on which to use. merge_across_nodes setting can be + changed only when there are no ksm shared pages in system: + set run 2 to unmerge pages first, then to 1 after changing + merge_across_nodes, to remerge according to the new setting. What's the root reason merge_across_nodes setting just can be changed only when there are no ksm shared pages in system? Can they be unmerged and merged again during ksmd scan? + Default: 1 (merging across nodes as in earlier releases) run - set 0 to stop ksmd from running but keep merged pages, set 1 to run ksmd e.g. 
"echo 1 > /sys/kernel/mm/ksm/run", --- mmotm.orig/mm/ksm.c 2013-02-20 22:28:09.456001057 -0800 +++ mmotm/mm/ksm.c 2013-02-20 22:28:23.584001392 -0800 @@ -87,6 +87,9 @@ *take 10 attempts to find a page in the unstable tree, once it is found, *it is secured in the stable tree. (When we scan a new page, we first *compare it against the stable tree, and then against the unstable tree.) + * + * If the merge_across_nodes tunable is unset, then KSM maintains multiple + * stable trees and multiple unstable trees: one of each for each NUMA node. */ /** @@ -524,7 +527,7 @@ static void remove_node_from_stable_tree * a page to put something that might look like our key in page->mapping. * is on its way to being freed; but it is an anomaly to bear in mind. */ -static struct page *get_ksm_page(struct stable_node *stable_node, bool locked) +static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it) { struct page *page; void *expected_mapping; @@ -573,7 +576,7 @@ again: goto stale; } - if (locked) { + if (lock_it) { lock_page(page); if (ACCESS_ONCE(page->mapping) != expected_mapping) { unlock_page(page); @@ -703,10 +706,17 @@ static int remove_stable_node(struct sta return 0; } - if (WARN_ON_ONCE(page_mapped(page))) + if (WARN_ON_ONCE(page_mapped(page))) { + /* +* This should not happen: but if it does, just refuse to let +* merge_across_nodes be switched - there is no need to panic. +*/ err = -EBUSY; - else { + } else { /* +* The stable node did not yet appear stale to get_ksm_page(), +* since that allows for an unmapped ksm page to be recognized +* right up until it is freed; but the node is safe to remove. * This page might be in a pagevec waiting to be freed, * or it might be PageSwapCache (perhaps under writeback), * or it might have been removed from swapcache a moment ago. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: mailto:"d...@kvack.org";> em...@kvack.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkm
Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation
On 02/22/2013 11:33 AM, Alex Shi wrote: > On 02/22/2013 10:53 AM, Michael Wang wrote: >> And the final cost is 3000 int and 103 pointer, and some padding, >> but won't bigger than 10M, not a big deal for a system with 1000 cpu >> too. Maybe, but quadric stuff should be frowned upon at all times, these things tend to explode when you least expect it. For instance, IIRC the biggest single image system SGI booted had 16k cpus in there, that ends up at something like 14+14+3=31 aka as 2G of storage just for your lookup -- that seems somewhat preposterous. >> Honestly, if I'm a admin who own 16k cpus system (I could not even image >> how many memory it could have...), I really prefer to exchange 2G memory >> to gain some performance. >> >> I see your point here, the cost of space will grow exponentially, but >> the memory of system will also grow, and according to my understanding , >> it's faster. > Hi, Alex Thanks for your reply. > Why not seek other way to change O(n^2) to O(n)? > > Access 2G memory is unbelievable performance cost. Not access 2G memory, but (2G / 16K) memory, the sbm size is O(N). And please notice that on 16k cpus system, topology will be deep if NUMA enabled (O(log N) as Peter said), and that's really a good stage for this idea to perform on, we could save lot's of recursed 'for' cycles. > > There are too many jokes on the short-sight of compute scalability, like > Gates' 64K memory in 2000. Please do believe me that I won't give up any chance to solve or lighten this issue (like apply Mike's suggestion), and please let me know if you have any suggestions to reduce the memory cost. May be I could make this idea as an option, override the select_task_rq_fair() when people want the new logical, and if they don't want to trade with memory, just !CONFIG. 
Regards,
Michael Wang
linux-next: build failure after merge of the final tree (drm tree related)
Hi all,

After merging the final tree, today's linux-next build (powerpc allyesconfig) failed like this:

drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_buf_destroy':
drivers/gpu/drm/drm_gem_cma_helper.c:38:2: error: implicit declaration of function 'dma_free_writecombine' [-Werror=implicit-function-declaration]
drivers/gpu/drm/drm_gem_cma_helper.c: In function 'drm_gem_cma_create':
drivers/gpu/drm/drm_gem_cma_helper.c:61:2: error: implicit declaration of function 'dma_alloc_writecombine' [-Werror=implicit-function-declaration]

Probably caused by commit 16ea975eac67 ("drm/tilcdc: add TI LCD Controller DRM driver (v4)"), which forced CONFIG_DRM_GEM_CMA_HELPER to 'y'. dma_alloc/free_writecombine are only defined on ARM. I added this patch for today.

From: Stephen Rothwell
Date: Fri, 22 Feb 2013 15:14:50 +1100
Subject: [PATCH] drm/tilcdc: only build on arm

Signed-off-by: Stephen Rothwell
---
 drivers/gpu/drm/tilcdc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tilcdc/Kconfig b/drivers/gpu/drm/tilcdc/Kconfig
index ae14fd6..d24d040 100644
--- a/drivers/gpu/drm/tilcdc/Kconfig
+++ b/drivers/gpu/drm/tilcdc/Kconfig
@@ -1,6 +1,6 @@
 config DRM_TILCDC
 	tristate "DRM Support for TI LCDC Display Controller"
-	depends on DRM && OF
+	depends on DRM && OF && ARM
 	select DRM_KMS_HELPER
 	select DRM_KMS_CMA_HELPER
 	select DRM_GEM_CMA_HELPER
--
1.8.1

--
Cheers,
Stephen Rothwell
s...@canb.auug.org.au
Re: [PATCH] staging/zcache: Fix/improve zcache writeback code, tie to a config option
On 02/07/2013 02:27 AM, Dan Magenheimer wrote:

It was observed by Andrea Arcangeli in 2011 that zcache can get "full" and there must be some way for compressed swap pages to be (uncompressed and then) sent through to the backing swap disk. A prototype of this functionality, called "unuse", was added in 2012 as part of a major update to zcache (aka "zcache2"), but was left unfinished due to the unfortunate temporary fork of zcache.

This earlier version of the code had an unresolved memory leak and was anyway dependent on not-yet-upstream frontswap and mm changes. The code was meanwhile adapted by Seth Jennings for similar functionality in zswap (which he calls "flush"). Seth also made some clever simplifications which are herein ported back to zcache. As a result of those simplifications, the frontswap changes are no longer necessary, but a slightly different (and simpler) set of mm changes are still required [1]. The memory leak is also fixed.

Due to feedback from akpm in a zswap thread, this functionality in zcache has now been renamed from "unuse" to "writeback". Although this zcache writeback code now works, there are open questions as to how best to handle the policy that drives it. As a result, this patch also ties writeback to a new config option. And, since the code still depends on not-yet-upstreamed mm patches, to avoid build problems, the config option added by this patch temporarily depends on "BROKEN"; this config dependency can be removed in trees that contain the necessary mm patches.

[1] https://lkml.org/lkml/2013/1/29/540/
https://lkml.org/lkml/2013/1/29/539/

shrink_zcache_memory:

	while (nr_evict-- > 0) {
		page = zcache_evict_eph_pageframe();
		if (page == NULL)
			break;
		zcache_free_page(page);
	}

zcache_evict_eph_pageframe
 -> zbud_evict_pageframe_lru
  -> zbud_evict_tmem
   -> tmem_flush_page
    -> zcache_pampd_free
     -> zcache_free_page  <- the zbudpage has already been freed here

Can the zcache_free_page() called in shrink_zcache_memory() then be treated as a double free?
Signed-off-by: Dan Magenheimer
---
 drivers/staging/zcache/Kconfig       |  17 ++
 drivers/staging/zcache/zcache-main.c | 332 +++---
 2 files changed, 284 insertions(+), 65 deletions(-)

diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
index c1dbd04..7358270 100644
--- a/drivers/staging/zcache/Kconfig
+++ b/drivers/staging/zcache/Kconfig
@@ -24,3 +24,20 @@ config RAMSTER
 	  while minimizing total RAM across the cluster.  RAMster, like
 	  zcache2, compresses swap pages into local RAM, but then remotifies
 	  the compressed pages to another node in the RAMster cluster.
+
+# Depends on not-yet-upstreamed mm patches to export end_swap_bio_write and
+# __add_to_swap_cache, and implement __swap_writepage (which is swap_writepage
+# without the frontswap call. When these are in-tree, the dependency on
+# BROKEN can be removed
+config ZCACHE_WRITEBACK
+	bool "Allow compressed swap pages to be writtenback to swap disk"
+	depends on ZCACHE=y && BROKEN
+	default n
+	help
+	  Zcache caches compressed swap pages (and other data) in RAM which
+	  often improves performance by avoiding I/O's due to swapping.
+	  In some workloads with very long-lived large processes, it can
+	  instead reduce performance.  Writeback decompresses zcache-compressed
+	  pages (in LRU order) when under memory pressure and writes them to
+	  the backing swap disk to ameliorate this problem.  Policy driving
+	  writeback is still under development.
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c index c1ac905..5bf14c3 100644 --- a/drivers/staging/zcache/zcache-main.c +++ b/drivers/staging/zcache/zcache-main.c @@ -22,6 +22,10 @@ #include #include #include +#include +#include +#include +#include #include #include @@ -55,6 +59,9 @@ static inline void frontswap_tmem_exclusive_gets(bool b) } #endif +/* enable (or fix code) when Seth's patches are accepted upstream */ +#define zcache_writeback_enabled 0 + static int zcache_enabled __read_mostly; static int disable_cleancache __read_mostly; static int disable_frontswap __read_mostly; @@ -181,6 +188,8 @@ static unsigned long zcache_last_active_anon_pageframes; static unsigned long zcache_last_inactive_anon_pageframes; static unsigned long zcache_eph_nonactive_puts_ignored; static unsigned long zcache_pers_nonactive_puts_ignored; +static unsigned long zcache_writtenback_pages; +static long zcache_outstanding_writeback_pages; #ifdef CONFIG_DEBUG_FS #include @@ -239,6 +248,9 @@ static int zcache_debugfs_init(void) zdfs64("eph_zbytes_max", S_IRUGO, root, &zcache_eph_zbytes_max); zdfs64("pers_zbytes", S_IRUGO, root, &zcache_pers_zbytes); zdfs64("pers_zbytes_max", S_IRUGO, root, &zcache_pers_zbytes_max); + zd
Re: [PATCH 0/4] dcache: make Oracle more scalable on large systems
On 02/21/2013 07:13 PM, Andi Kleen wrote:

Dave Chinner writes:

On Tue, Feb 19, 2013 at 01:50:55PM -0500, Waiman Long wrote:

It was found that the Oracle database software issues a lot of calls to the seq_path() kernel function, which translates a (dentry, mnt) pair to an absolute path. The seq_path() function will eventually take the following two locks:

Nobody should be doing reverse dentry-to-name lookups in a quantity sufficient for it to become a performance limiting factor. What is the Oracle DB actually using this path for?

Yes, calling d_path frequently is usually a bug elsewhere. Is that through /proc?

-Andi

A sample strace of Oracle indicates that it opens a lot of /proc filesystem files, such as stat, maps, etc., many times while running. Oracle has a very detailed system performance reporting infrastructure in place to report almost all aspects of system performance through its AWR reporting tool or the browser-based enterprise manager. Maybe that is the reason why it is hitting this performance bottleneck.

Regards,
Longman
RE: [PATCH RFC] video: Add Hyper-V Synthetic Video Frame Buffer Driver
> From: Olaf Hering
> Sent: Thursday, February 21, 2013 10:53 AM
> To: Haiyang Zhang
> Cc: florianschandi...@gmx.de; linux-fb...@vger.kernel.org; KY Srinivasan;
> jasow...@redhat.com; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org
> Subject: Re: [PATCH RFC] video: Add Hyper-V Synthetic Video Frame Buffer
> Driver
>
> On Tue, Feb 19, Haiyang Zhang wrote:
>
> > In my test, the vesafb doesn't automatically give up the emulated video
> > device, unless I add the DMI based mechanism to let it exit on Hyper-V.
>
> From reading the code, it seems to do that via
> do_remove_conflicting_framebuffers(). hypervfb does not set apertures
> etc, so that function is a noop.

We are currently allocating a new framebuffer for hyperv_fb, which is different from the framebuffer for the emulated video. So this cannot be detected by do_remove_conflicting_framebuffers() based on apertures_overlap().

> My point is that with this new driver distro kernels will have no console
> output until hypervfb is loaded. On native hardware there is at least
> vesafb which can display something until initrd is running. So if the
> hypervisor allows that hypervfb can shutdown the emulated vesa hardware
> then it should do that.

Since the generic vga driver starts to work early in the boot process, the console messages are still displayed without vesafb. Actually, I didn't see any console messages missing when comparing it to the original VM before my patch.

Thanks,
- Haiyang
Re: [PATCH] kexec: prevent double free on image allocation failure
On 02/22/2013 11:41, Sasha Levin wrote:
> On 02/21/2013 09:46 PM, Zhang Yanfei wrote:
>> On 02/22/2013 09:55, Eric W. Biederman wrote:
>>> Sasha Levin writes:
>>>
>>>> If kimage_normal_alloc() fails to initialize an allocated kimage, it will
>>>> free the image but would still set 'rimage'; as a result kexec_load will
>>>> try to free it again.
>>>>
>>>> This would explode, as part of the freeing process is accessing internal
>>>> members which point to uninitialized memory.
>>>
>>> Agreed.
>>>
>>> I don't think that failure path has ever actually been exercised.
>>>
>>> The code is wrong, and it is worth fixing.
>>>
>>> Andrew, do you think you could queue this up? I don't have a handy tree.
>>
>> I still found another malloc/free problem in this function. So I updated
>> the patch.
>>
>> -
>>
>> From 1fb76a35e4109e1435f55048c20ea58622e7f87b Mon Sep 17 00:00:00 2001
>> From: Zhang Yanfei
>> Date: Fri, 22 Feb 2013 10:34:02 +0800
>> Subject: [PATCH] kexec: fix allocation problems in function
>>  kimage_normal_alloc
>>
>> The function kimage_normal_alloc() has 2 allocation problems that may
>> cause failures:
>>
>> 1. If kimage_normal_alloc() fails to initialize an allocated kimage, it
>>    will free the image but would still set 'rimage'; as a result
>>    kexec_load will try to free it again.
>>
>>    This would explode, as part of the freeing process is accessing
>>    internal members which point to uninitialized memory.
>>
>> 2. If kimage_normal_alloc() fails to alloc pages for image->swap_page,
>>    it should call kimage_free_page_list() to free the allocated pages in
>>    the image->control_pages list before it frees the image.
>> Signed-off-by: Sasha Levin
>> Signed-off-by: Zhang Yanfei
>> ---
>>  kernel/kexec.c | 10 ++
>>  1 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index 5e4bd78..f219357 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -223,6 +223,8 @@ out:
>>
>>  }
>>
>> +static void kimage_free_page_list(struct list_head *list);
>> +
>>  static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>>  				unsigned long nr_segments,
>>  				struct kexec_segment __user *segments)
>> @@ -236,8 +238,6 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>>  	if (result)
>>  		goto out;
>>
>> -	*rimage = image;
>> -
>>  	/*
>>  	 * Find a location for the control code buffer, and add it
>>  	 * the vector of segments so that it's pages will also be
>> @@ -259,10 +259,12 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>>
>>  	result = 0;
>> out:
>> -	if (result == 0)
>> +	if (result == 0) {
>>  		*rimage = image;
>> -	else
>> +	} else {
>> +		kimage_free_page_list(&image->control_pages);
>>  		kfree(image);
>> +	}
>>
>>  	return result;
>>  }

> And if do_kimage_alloc() fails instead of kimage_alloc_control_pages()
> you will NULL deref 'image', so now instead of leaking pages the kernel
> will explode.

Oh, I missed this.

> Either way, this issue you've pointed out should be fixed in a separate
> patch.

OK, I will send another patch.

Thanks
Zhang
Re: [PATCH] iommu: making IOMMU sysfs nodes API public
On Fri, 2013-02-22 at 11:04 +1100, David Gibson wrote:
> On Tue, Feb 19, 2013 at 01:11:51PM -0700, Alex Williamson wrote:
> > On Tue, 2013-02-19 at 18:38 +1100, David Gibson wrote:
> > > On Mon, Feb 18, 2013 at 10:24:00PM -0700, Alex Williamson wrote:
> > > > On Mon, 2013-02-18 at 17:15 +1100, Alexey Kardashevskiy wrote:
> [snip]
> > > > Adding the window size to sysfs seems more readily convenient,
> > > > but is it so hard for userspace to open the files and call a couple
> > > > ioctls to get far enough to call IOMMU_GET_INFO? I'm unconvinced the
> > > > clutter in sysfs is more than just a quick fix. Thanks,
> > >
> > > And finally, as Alexey points out, isn't the point here so we know how
> > > much rlimit to give qemu? Using ioctls we'd need a special tool just
> > > to check the dma window sizes, which seems a bit hideous.
> >
> > Is it more hideous than using iommu groups to report a vfio imposed
> > restriction? Are a couple open files and a handful of ioctls worse than
> > code to parse directory entries and the future maintenance of an
> > unrestricted grab bag of sysfs entries?
>
> The fact that the memory is locked is a vfio restriction, but the
> actual dma window size is, genuinely, a property of the group.

A group is an association of devices based on isolation and visibility. The dma window happens to be associated with a group on your platform, but that's not always the case. This is why I was hoping something in sysfs already reported the dma window, so that we could point to it rather than creating an interface where it doesn't really belong. Thanks,

Alex
Re: [GIT] Networking
On 13-02-21 09:26 PM, Paul Gortmaker wrote:
> On Thu, Feb 21, 2013 at 9:37 AM, Mark Lord wrote:
>> On 13-02-20 10:05 PM, Linus Torvalds wrote:
>>> On Wed, Feb 20, 2013 at 2:09 PM, David Miller wrote:
..
>>> Nooo You killed the 3c501 and 3c503 drivers! Snif.
>>>
>>> I wonder if they still worked..
>>
>> I hope they're not really dead, because we still use them in several
>> machines here as secondary interfaces for test rigs and whatnot.
..
> Did you actually look at the drivers deleted?
..

Finally got to one of the boxes here to check. And you're right, I was confusing drivers. I always seem to get the 3c509 (ISA) stuff confused with the 3c59x (PCI). Our boxes here have the 3c59x (PCI) cards.

R.I.P. 3c50x. :)
Re: [PATCH] staging/zcache: Fix/improve zcache writeback code, tie to a config option
On 02/07/2013 02:27 AM, Dan Magenheimer wrote:

It was observed by Andrea Arcangeli in 2011 that zcache can get "full" and there must be some way for compressed swap pages to be (uncompressed and then) sent through to the backing swap disk. A prototype of this functionality, called "unuse", was added in 2012 as part of a major update to zcache (aka "zcache2"), but was left unfinished due to the unfortunate temporary fork of zcache.

This earlier version of the code had an unresolved memory leak and was anyway dependent on not-yet-upstream frontswap and mm changes. The code was meanwhile adapted by Seth Jennings for similar functionality in zswap (which he calls "flush"). Seth also made some clever simplifications which are herein ported back to zcache. As a result of those simplifications, the frontswap changes are no longer necessary, but a slightly different (and simpler) set of mm changes are still required [1]. The memory leak is also fixed.

Due to feedback from akpm in a zswap thread, this functionality in zcache has now been renamed from "unuse" to "writeback". Although this zcache writeback code now works, there are open questions as to how best to handle the policy that drives it. As a result, this patch also ties writeback to a new config option. And, since the code still depends on not-yet-upstreamed mm patches, to avoid build problems, the config option added by this patch temporarily depends on "BROKEN"; this config dependency can be removed in trees that contain the necessary mm patches.

[1] https://lkml.org/lkml/2013/1/29/540/
https://lkml.org/lkml/2013/1/29/539/

This patch makes the backend interact with the core mm directly. Shouldn't the core mm interact with the frontend instead of the backend? In addition, frontswap already has a shrink function; should we take advantage of it?
Signed-off-by: Dan Magenheimer
---
 drivers/staging/zcache/Kconfig       |  17 ++
 drivers/staging/zcache/zcache-main.c | 332 +++---
 2 files changed, 284 insertions(+), 65 deletions(-)

diff --git a/drivers/staging/zcache/Kconfig b/drivers/staging/zcache/Kconfig
index c1dbd04..7358270 100644
--- a/drivers/staging/zcache/Kconfig
+++ b/drivers/staging/zcache/Kconfig
@@ -24,3 +24,20 @@ config RAMSTER
 	  while minimizing total RAM across the cluster.  RAMster, like
 	  zcache2, compresses swap pages into local RAM, but then remotifies
 	  the compressed pages to another node in the RAMster cluster.
+
+# Depends on not-yet-upstreamed mm patches to export end_swap_bio_write and
+# __add_to_swap_cache, and implement __swap_writepage (which is swap_writepage
+# without the frontswap call. When these are in-tree, the dependency on
+# BROKEN can be removed
+config ZCACHE_WRITEBACK
+	bool "Allow compressed swap pages to be writtenback to swap disk"
+	depends on ZCACHE=y && BROKEN
+	default n
+	help
+	  Zcache caches compressed swap pages (and other data) in RAM which
+	  often improves performance by avoiding I/O's due to swapping.
+	  In some workloads with very long-lived large processes, it can
+	  instead reduce performance.  Writeback decompresses zcache-compressed
+	  pages (in LRU order) when under memory pressure and writes them to
+	  the backing swap disk to ameliorate this problem.  Policy driving
+	  writeback is still under development.
diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c index c1ac905..5bf14c3 100644 --- a/drivers/staging/zcache/zcache-main.c +++ b/drivers/staging/zcache/zcache-main.c @@ -22,6 +22,10 @@ #include #include #include +#include +#include +#include +#include #include #include @@ -55,6 +59,9 @@ static inline void frontswap_tmem_exclusive_gets(bool b) } #endif +/* enable (or fix code) when Seth's patches are accepted upstream */ +#define zcache_writeback_enabled 0 + static int zcache_enabled __read_mostly; static int disable_cleancache __read_mostly; static int disable_frontswap __read_mostly; @@ -181,6 +188,8 @@ static unsigned long zcache_last_active_anon_pageframes; static unsigned long zcache_last_inactive_anon_pageframes; static unsigned long zcache_eph_nonactive_puts_ignored; static unsigned long zcache_pers_nonactive_puts_ignored; +static unsigned long zcache_writtenback_pages; +static long zcache_outstanding_writeback_pages; #ifdef CONFIG_DEBUG_FS #include @@ -239,6 +248,9 @@ static int zcache_debugfs_init(void) zdfs64("eph_zbytes_max", S_IRUGO, root, &zcache_eph_zbytes_max); zdfs64("pers_zbytes", S_IRUGO, root, &zcache_pers_zbytes); zdfs64("pers_zbytes_max", S_IRUGO, root, &zcache_pers_zbytes_max); + zdfs("outstanding_writeback_pages", S_IRUGO, root, + &zcache_outstanding_writeback_pages); + zdfs("writtenback_pages", S_IRUGO, root, &zcache_writtenback_pages); return 0; } #undefz
linux-next: manual merge of the akpm-current tree with the tree
Hi Andrew,

Today's linux-next merge of the akpm-current tree got conflicts in fs/btrfs/file.c and fs/btrfs/inode.c between commit 55e301fd57a6 ("Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h") from the btrfs tree and commit "aio: don't include aio.h in sched.h" from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell
s...@canb.auug.org.au

diff --cc fs/btrfs/file.c
index 8614c5b,39f556f..000
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@@ -30,7 -30,7 +30,8 @@@
 #include
 #include
 #include
+#include
+ #include
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"

diff --cc fs/btrfs/inode.c
index 40d49da,ed7ea0a..000
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@@ -39,8 -39,7 +39,9 @@@
 #include
 #include
 #include
+#include
+#include
+ #include
 #include "compat.h"
 #include "ctree.h"
 #include "disk-io.h"
Re: [PATCH 0/7] ksm: responses to NUMA review
On 02/21/2013 04:17 PM, Hugh Dickins wrote:

Here's a second KSM series, based on mmotm 2013-02-19-17-20: partly in response to Mel's review feedback, partly fixes to issues that I found myself in doing more review and testing. None of the issues fixed are truly show-stoppers, though I would prefer them fixed sooner than later.

Do you have any plans for KSM to support the page cache and tmpfs?

1 ksm: add some comments
2 ksm: treat unstable nid like in stable tree
3 ksm: shrink 32-bit rmap_item back to 32 bytes
4 mm,ksm: FOLL_MIGRATION do migration_entry_wait
5 mm,ksm: swapoff might need to copy
6 mm: cleanup "swapcache" in do_swap_page
7 ksm: allocate roots when needed

 Documentation/vm/ksm.txt |  16 +++-
 include/linux/mm.h       |   1
 mm/ksm.c                 | 137 +++--
 mm/memory.c              |  38 +++---
 mm/swapfile.c            |  15 +++-
 5 files changed, 140 insertions(+), 67 deletions(-)

Thanks,
Hugh
Re: [PATCH] kexec: prevent double free on image allocation failure
On 02/21/2013 09:46 PM, Zhang Yanfei wrote:
> On 02/22/2013 09:55, Eric W. Biederman wrote:
>> Sasha Levin writes:
>>
>>> If kimage_normal_alloc() fails to initialize an allocated kimage, it will
>>> free the image but would still set 'rimage'; as a result kexec_load will
>>> try to free it again.
>>>
>>> This would explode, as part of the freeing process is accessing internal
>>> members which point to uninitialized memory.
>>
>> Agreed.
>>
>> I don't think that failure path has ever actually been exercised.
>>
>> The code is wrong, and it is worth fixing.
>>
>> Andrew, do you think you could queue this up? I don't have a handy tree.
>
> I still found another malloc/free problem in this function. So I updated
> the patch.
>
> -
>
> From 1fb76a35e4109e1435f55048c20ea58622e7f87b Mon Sep 17 00:00:00 2001
> From: Zhang Yanfei
> Date: Fri, 22 Feb 2013 10:34:02 +0800
> Subject: [PATCH] kexec: fix allocation problems in function
>  kimage_normal_alloc
>
> The function kimage_normal_alloc() has 2 allocation problems that may
> cause failures:
>
> 1. If kimage_normal_alloc() fails to initialize an allocated kimage, it
>    will free the image but would still set 'rimage'; as a result
>    kexec_load will try to free it again.
>
>    This would explode, as part of the freeing process is accessing
>    internal members which point to uninitialized memory.
>
> 2. If kimage_normal_alloc() fails to alloc pages for image->swap_page,
>    it should call kimage_free_page_list() to free the allocated pages in
>    the image->control_pages list before it frees the image.
> Signed-off-by: Sasha Levin
> Signed-off-by: Zhang Yanfei
> ---
>  kernel/kexec.c | 10 ++
>  1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 5e4bd78..f219357 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -223,6 +223,8 @@ out:
>
>  }
>
> +static void kimage_free_page_list(struct list_head *list);
> +
>  static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>  				unsigned long nr_segments,
>  				struct kexec_segment __user *segments)
> @@ -236,8 +238,6 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>  	if (result)
>  		goto out;
>
> -	*rimage = image;
> -
>  	/*
>  	 * Find a location for the control code buffer, and add it
>  	 * the vector of segments so that it's pages will also be
> @@ -259,10 +259,12 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
>
>  	result = 0;
> out:
> -	if (result == 0)
> +	if (result == 0) {
>  		*rimage = image;
> -	else
> +	} else {
> +		kimage_free_page_list(&image->control_pages);
>  		kfree(image);
> +	}
>
>  	return result;
>  }

And if do_kimage_alloc() fails instead of kimage_alloc_control_pages(), you will NULL deref 'image', so now instead of leaking pages the kernel will explode.

Either way, this issue you've pointed out should be fixed in a separate patch.

Thanks,
Sasha
Re: [RFC] arm: use built-in byte swap function
On Thu, 21 Feb 2013, Kim Phillips wrote:

> Here's the asm version I'm working on now, based on compiler
> output of the C version. Haven't tested beyond defconfig builds,
> which pass ok.
>
> Is there anything I have to do for thumb mode? If so, how to test?

You just need to pick a config that uses some ARMv7 processor, and enable CONFIG_THUMB2_KERNEL. I don't see any problem with your patch wrt Thumb2. Still, I have minor comments below.

> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index dedf02b..e8a41d0 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
>  	select CLONE_BACKWARDS
>  	select OLD_SIGSUSPEND3
>  	select OLD_SIGACTION
> +	select ARCH_USE_BUILTIN_BSWAP
>  	help
>  	  The ARM series is a line of low-power-consumption RISC chip designs
>  	  licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..a277e97 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -108,12 +108,12 @@ endif
>
>  targets := vmlinux vmlinux.lds \
>  	   piggy.$(suffix_y) piggy.$(suffix_y).o \
> -	   lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> +	   lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \

Should be both bswapsdi2.o bswapsdi2.S

>  	   font.o font.c head.o misc.o $(OBJS)
>
>  # Make sure files are removed during clean
>  extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
> -	   lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> +	   lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)

Should be bswapsdi2.S.
> ifeq ($(CONFIG_FUNCTION_TRACER),y)
> ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
>  $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
>  	$(call cmd,shipped)
>
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
> +	$(call cmd,shipped)
> +
>  # We need to prevent any GOTOFF relocs being used with references
>  # to symbols in the .bss section since we cannot relocate them
>  # independently from the rest at run time. This can be achieved by
> @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
>  fi
>
>  $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
> -	$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
> +	$(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
> +	$(bswapsdi2) FORCE
>  	@$(check_for_multiple_zreladdr)
>  	$(call if_changed,ld)
>  	@$(check_for_bad_syms)
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..ba578f7 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
>  extern void __udivsi3(void);
>  extern void __umodsi3(void);
>  extern void __do_div64(void);
> +extern void __bswapsi2(void);
> +extern void __bswapdi2(void);
>
>  extern void __aeabi_idiv(void);
>  extern void __aeabi_idivmod(void);
> @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
>  EXPORT_SYMBOL(__udivsi3);
>  EXPORT_SYMBOL(__umodsi3);
>  EXPORT_SYMBOL(__do_div64);
> +EXPORT_SYMBOL(__bswapsi2);
> +EXPORT_SYMBOL(__bswapdi2);
>
>  #ifdef CONFIG_AEABI
>  EXPORT_SYMBOL(__aeabi_idiv);
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..5383df7 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
>  	   ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
>  	   ucmpdi2.o lib1funcs.o div64.o \
>  	   io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
> -	   call_with_stack.o
> +	   call_with_stack.o bswapsdi2.o
>
>  mmu-y := clear_user.o copy_page.o getuser.o putuser.o
>
> diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
> new file mode 100644
> index 000..e9c8ca7
> --- /dev/null
> +++ b/arch/arm/lib/bswapsdi2.S
> @@ -0,0 +1,36 @@
> +#include
> +
> +#if __LINUX_ARM_ARCH__ >= 6
> +ENTRY(__bswapsi2)
> +	rev r0, r0
> +	bx lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> +	rev r3, r0
> +	rev r0, r1
> +	mov r1, r3
> +	bx lr
> +ENDPROC(__bswapdi2)
> +#else
> +ENTRY(__bswapsi2)
> +	eor r3, r0, r0, ror #16
> +	lsr r3, r3, #8

Some older binutils used with pre-ARMv6 platforms don't understand the latest unified syntax. So in this case it is better to use:

	mov r3, r3, lsr #8

> +	bic r3, r3, #65280 @ 0xff00

Please use #0xff00 directly rather than keeping it as a comment.

> +
Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation
On 02/22/2013 10:53 AM, Michael Wang wrote:
>>> And the final cost is 3000 int and 103 pointer, and some padding,
>>> but won't be bigger than 10M, not a big deal for a system with 1000
>>> cpus either.
>>
>> Maybe, but quadratic stuff should be frowned upon at all times, these
>> things tend to explode when you least expect it.
>>
>> For instance, IIRC the biggest single image system SGI booted had 16k
>> cpus in there, that ends up at something like 14+14+3=31 aka 2G of
>> storage just for your lookup -- that seems somewhat preposterous.
>
> Honestly, if I'm an admin who owns a 16k-cpu system (I could not even
> imagine how much memory it could have...), I really prefer to exchange
> 2G of memory to gain some performance.
>
> I see your point here, the cost of space will grow quadratically, but
> the memory of the system will also grow, and according to my
> understanding, it grows faster.

Why not seek another way to change O(n^2) to O(n)?

Accessing 2G of memory is an unbelievable performance cost.

There are too many jokes on the short-sightedness of compute scalability predictions, like Gates' 64K memory in 2000.

--
Thanks
Alex
Re: PROBLEM: Crash cgdeleting empty memory cgroups with memory.kmem.limit_in_bytes set
(2013/02/21 17:34), Glauber Costa wrote:
> On 02/21/2013 03:00 AM, Tejun Heo wrote:
>> (cc'ing cgroup / memcg people and quoting whole body)
>>
>> Looks like something is going wrong with memcg cache destruction.
>> Glauber, any ideas?
>>
>> Also, can we please not use names as generic as
>> kmem_cache_destroy_work_func for something specific to memcg? How
>> about something like memcg_destroy_cache_workfn?
>
> I will take a look. Thanks for the report. For the reporter: I tested
> cgroup deletion quite extensively (quite an important feature for me),
> so it is nice to have an uncaught case.
>
> About naming, I can change it, no problem.

This seems reproduced on linux-3.8, on a KVM guest with Fedora 18's
config + kmemcg.

-Kame

==
[ 250.533831] general protection fault: [#1] SMP
[ 250.538096] Modules linked in: ebtable_nat xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables be2iscsi iscsi_boot_sysfs ip6table_filter ip6_tables bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc 8139too snd_timer microcode snd 8139cp mii floppy pcspkr virtio_balloon soundcore i2c_piix4 btrfs libcrc32c zlib_deflate cirrus drm_kms_helper ttm drm virtio_blk i2c_core
[ 250.538096] CPU 1
[ 250.538096] Pid: 38, comm: kworker/1:1 Not tainted 3.8.0 #3 Bochs Bochs
[ 250.538096] RIP: 0010:[] [] kmem_cache_free+0x13a/0x1d0
[ 250.538096] RSP: 0018:880214345cc8 EFLAGS: 00010286
[ 250.538096] RAX: 81d84020 RBX: 880217000f00 RCX: 0068
[ 250.538096] RDX: RSI: 880217000f00 RDI: 880217000f00
[ 250.538096] RBP: 880214345ce8 R08: 13c0 R09: 006c
[ 250.538096] R10: 0007ebc0ffe0 R11: 0007ebc0ffe0 R12: 880217001100
[ 250.538096] R13: 
880214042c00 R14: 0200 R15: 880217000ef0
[ 250.538096] FS: () GS:88021fc8() knlGS:
[ 250.538096] CS: 0010 DS: ES: CR0: 8005003b
[ 250.538096] CR2: 003e98ae6ef0 CR3: 00021365 CR4: 06e0
[ 250.538096] DR0: DR1: DR2:
[ 250.538096] DR3: DR6: 0ff0 DR7: 0400
[ 250.538096] Process kworker/1:1 (pid: 38, threadinfo 880214344000, task 88021435)
[ 250.538096] Stack:
[ 250.538096] e8c013c0 880214042c00
[ 250.538096] 880214345d18 81182084 880214042c00 880217000ef0
[ 250.538096] 880217000ef0 880214042c00 880214345d88 81184d7e
[ 250.538096] Call Trace:
[ 250.538096] [] free_kmem_cache_nodes+0x64/0xb0
[ 250.538096] [] __kmem_cache_shutdown+0x24e/0x320
[ 250.538096] [] ? kmem_cache_shrink+0x210/0x230
[ 250.538096] [] kmem_cache_destroy+0x3f/0xe0
[ 250.538096] [] kmem_cache_destroy_work_func+0x30/0x60
[ 250.538096] [] process_one_work+0x147/0x490
[ 250.538096] [] ? mem_cgroup_slabinfo_read+0xb0/0xb0
[ 250.538096] [] worker_thread+0x15e/0x450
[ 250.538096] [] ? busy_worker_rebind_fn+0x110/0x110
[ 250.538096] [] kthread+0xc0/0xd0
[ 250.538096] [] ? ftrace_define_fields_xen_mc_entry+0xa0/0xf0
[ 250.538096] [] ? kthread_create_on_node+0x120/0x120
[ 250.538096] [] ret_from_fork+0x7c/0xb0
[ 250.538096] [] ? 
kthread_create_on_node+0x120/0x120
[ 250.538096] Code: c1 e0 06 48 01 d0 48 8b 10 80 e6 80 0f 85 98 00 00 00 48 8b 40 30 49 39 c4 0f 84 f9 fe ff ff 48 8b 90 b8 00 00 00 48 85 d2 74 06 <4c> 3b 62 20 74 50 48 8b 50 60 49 8b 4c 24 60 31 c0 48 c7 c6 68
[ 250.538096] RIP [] kmem_cache_free+0x13a/0x1d0
[ 250.538096] RSP
[ 250.746175] ---[ end trace 91abe13b8481aaaf ]---
[ 250.748879] BUG: unable to handle kernel paging request at ffd8
[ 250.749818] IP: [] kthread_data+0x10/0x20
[ 250.749818] PGD 1c0e067 PUD 1c0f067 PMD 0
[ 250.749818] Oops: [#2] SMP
[ 250.749818] Modules linked in: ebtable_nat xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables be2iscsi iscsi_boot_sysfs ip6table_filter ip6_tables bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc 8139too snd_timer microcode snd 8139cp mii floppy pcspkr virtio_balloon soundcore i2c_piix4 btrfs libc
Re: Question about swap_slot free and invalidate page
On 02/22/2013 05:42 AM, Dan Magenheimer wrote:

From: Ric Mason [mailto:ric.mas...@gmail.com]
Subject: Re: Question about swap_slot free and invalidate page

On 02/19/2013 11:27 PM, Dan Magenheimer wrote:

From: Ric Mason [mailto:ric.mas...@gmail.com]

Hugh is right that handling the possibility of duplicates is part of the tmem ABI. If there is any possibility of duplicates, the ABI defines how a backend must handle them to avoid data coherency issues. The kernel implements an in-kernel API which implements the tmem ABI. If the frontend and backend can always agree that duplicate

Which ABI in zcache implements that?

https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf

The in-kernel APIs are frontswap and cleancache. For more information about tmem, see http://lwn.net/Articles/454795/

But you mentioned that you have an in-kernel API which can handle duplicates. Do you mean zcache_cleancache/frontswap_put_page? I think they just overwrite instead of optionally flushing the page on the second (duplicate) put, as described in your tmem spec.

Maybe I am misunderstanding your question... The spec allows overwrite (and return success) OR flush the page (and return failure). Zcache does the latter (flush). The code that implements it is in tmem_put.

Thanks for pointing that out. Persistent pages can see duplicate puts since a swap cache page can be reused. Can ephemeral pages also see duplicate puts? If yes, when can that happen?
Re: [PATCH] sched: Skip looking at skip if next or last is set
On 02/22/2013 12:06 AM, Srikar Dronamraju wrote:
> * Peter Zijlstra [2013-02-20 09:46:25]:
>
>> On Mon, 2013-02-18 at 18:31 +0530, Srikar Dronamraju wrote:
>>> pick_next_entity() prefers next, then last. However code checks if the
>>> left entity can be skipped even if next / last is set.
>>>
>>> Check if left entity should be skipped only if next/last is not set.
>>
>> You fail to explain why its a problem and continue to make a horrid mess
>> of the code..
>>
>
> If we look at the comments above pick_next_entity(), it states:
> /*
>  * Pick the next process, keeping these things in mind, in this order:
>  * 1) keep things fair between processes/task groups
>  * 2) pick the "next" process, since someone really wants that to run
>  * 3) pick the "last" process, for cache locality
>  * 4) do not run the "skip" process, if something else is available
>  */
>
> Currently the code checks in the reverse order, though the preference is
> correctly maintained as listed in comments. But in some cases, we might be
> doing redundant checks. Lets assume next is set, then we should avoid
> checking for skip, last and their fairness with left.
>
> So what I intended to do was change the order, i.e check for last only if next
> is not set (or picking next was unfair wrt left) and check for "something
> else (second from left)" if last is not set (or picking last was unfair wrt
> left).
>
> However after sending the patch, I stumbled across these links.
> https://lkml.org/lkml/2012/1/16/500 and https://lkml.org/lkml/2012/1/25/195

Hi, Srikar

That drags me back to the time when I was starting to look at the
scheduler ;-)

Actually I gave up on this idea since I missed one point: the code will
be optimized by the compiler, and usually it becomes some logic we could
not imagine.

My patch is logically correct, but it may not benefit the scheduler
much. I don't think any benchmark will show better results, and in the
scheduler world, benchmarks talk...
Regards,
Michael Wang

>
>>> Signed-off-by: Srikar Dronamraju
>>> ---
>>>  kernel/sched/fair.c | 31 +++
>>>  1 files changed, 15 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index fdee793..cc97b12 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -1900,27 +1900,26 @@ static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
>>>  	struct sched_entity *left = se;
>>>
>>>  	/*
>>> -	 * Avoid running the skip buddy, if running something else can
>>> -	 * be done without getting too unfair.
>>> +	 * Someone really wants next to run. If it's not unfair, run it.
>>>  	 */
>>> -	if (cfs_rq->skip == se) {
>>> -		struct sched_entity *second = __pick_next_entity(se);
>>> +	if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1) {
>>> +		se = cfs_rq->next;
>>> +	} else if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1) {
>>> +		/*
>>> +		 * Prefer last buddy, try to return the CPU to a preempted
>>> +		 * task.
>>> +		 */
>>> +		se = cfs_rq->last;
>>> +	} else if (cfs_rq->skip == left) {
>>> +		/*
>>> +		 * Avoid running the skip buddy, if running something else
>>> +		 * can be done without getting too unfair.
>>> +		 */
>>> +		struct sched_entity *second = __pick_next_entity(left);
>>> 		if (second && wakeup_preempt_entity(second, left) < 1)
>>> 			se = second;
>>> 	}
>>>
>>> -	/*
>>> -	 * Prefer last buddy, try to return the CPU to a preempted task.
>>> -	 */
>>> -	if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, left) < 1)
>>> -		se = cfs_rq->last;
>>>
>>> -	/*
>>> -	 * Someone really wants this to run. If it's not unfair, run it.
>>> -	 */
>>> -	if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, left) < 1)
>>> -		se = cfs_rq->next;
>>> -
>>> 	clear_buddies(cfs_rq, se);
>>>
>>> 	return se;
Re: [RFC] arm: use built-in byte swap function
On Thu, 21 Feb 2013 11:40:54 -0500 Nicolas Pitre wrote: > On Thu, 21 Feb 2013, Kim Phillips wrote: > > > On Wed, 20 Feb 2013 23:29:58 -0500 > > Nicolas Pitre wrote: > > > > > On Wed, 20 Feb 2013, Kim Phillips wrote: > > > > > > > On Wed, 20 Feb 2013 10:43:18 -0500 > > > > Nicolas Pitre wrote: > > > > > > > > > On Wed, 20 Feb 2013, Woodhouse, David wrote: > > > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote: > > > > > > > ... in which case there is no harm shipping a .c file and > > > > > > > trivially > > > > > > > enforcing -O2, the rest being equal. > > > > > > > > > > > > For today's compilers, unless the wind changes. > > > > > > > > > > We'll adapt if necessary. Going with -O2 should remain pretty safe > > > > > anyway. > > > > > > > > Alas, not so for gcc 4.4 - I had forgotten I had tested > > > > Ubuntu/Linaro 4.4.7-1ubuntu2 here: > > > > > > > > https://patchwork.kernel.org/patch/2101491/ > > > > > > > > add -O2 to that test script and gcc 4.4 *always* emits calls to > > > > __bswap[sd]i2, even with -march=armv6k+. > > > > argh, sorry - that script was testing support for > > __builtin_bswap{16,32,64} directly, which isn't the same as testing > > code generation of a byte swap pattern in C. > > Still, I'm not as confident as I was about this. which part exactly? Having -O2 as "protection"? Yes, me neither. > > I'll still try the assembly approach - gcc 4.4's armv6 output looks > > worse than both the pre-armv6 and post-armv6 __arch_swab32 > > implementations currently in use: > > > > mov ip, sp > > push{fp, ip, lr, pc} > > sub fp, ip, #4 > > You should use -fomit-frame-pointer to compile this. We don't need a > frame pointer here, especially for a leaf function that the compiler > decides to call on its own. > > > and r2, r0, #65280 ; 0xff00 > > lsl ip, r0, #24 > > orr r1, ip, r0, lsr #24 > > and r0, r0, #16711680 ; 0xff > > orr r3, r1, r2, lsl #8 > > orr r0, r3, r0, lsr #8 > > Other than that, it is true that the above is slightly suboptimal. 
Here's the asm version I'm working on now, based on compiler output of the C version. Haven't tested beyond defconfig builds, which pass ok. Is there anything I have to do for thumb mode? If so, how to test? diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index dedf02b..e8a41d0 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -59,6 +59,7 @@ config ARM select CLONE_BACKWARDS select OLD_SIGSUSPEND3 select OLD_SIGACTION + select ARCH_USE_BUILTIN_BSWAP help The ARM series is a line of low-power-consumption RISC chip designs licensed by ARM Ltd and targeted at embedded applications and diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile index 5cad8a6..a277e97 100644 --- a/arch/arm/boot/compressed/Makefile +++ b/arch/arm/boot/compressed/Makefile @@ -108,12 +108,12 @@ endif targets := vmlinux vmlinux.lds \ piggy.$(suffix_y) piggy.$(suffix_y).o \ -lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \ +lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \ font.o font.c head.o misc.o $(OBJS) # Make sure files are removed during clean extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \ -lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs) +lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs) ifeq ($(CONFIG_FUNCTION_TRACER),y) ORIG_CFLAGS := $(KBUILD_CFLAGS) @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S $(call cmd,shipped) +# For __bswapsi2, __bswapdi2 +bswapsdi2 = $(obj)/bswapsdi2.o + +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S + $(call cmd,shipped) + # We need to prevent any GOTOFF relocs being used with references # to symbols in the .bss section since we cannot relocate them # independently from the rest at run time. 
This can be achieved by @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \ fi $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \ - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \ + $(bswapsdi2) FORCE @$(check_for_multiple_zreladdr) $(call if_changed,ld) @$(check_for_bad_syms) diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b73..ba578f7 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -35,6 +35,8 @@ extern void __ucmpdi2(void); extern void __udivsi3(void); extern void __umodsi3(void); extern void __do_div64(void); +extern void __bswapsi2(void); +extern void __bswapdi2(void); extern void __aeabi_idiv(void); extern void __aeabi_idivmod(void); @@ -114,6 +116,8 @@ EXPORT_SY
Re: [PATCH] kexec: prevent double free on image allocation failure
On 2013/02/22 09:55, Eric W. Biederman wrote:
> Sasha Levin writes:
>
>> If kimage_normal_alloc() fails to initialize an allocated kimage, it
>> will free the image but would still set 'rimage'; as a result
>> kexec_load will try to free it again.
>>
>> This would explode as part of the freeing process is accessing internal
>> members which point to uninitialized memory.
>
> Agreed.
>
> I don't think that failure path has ever actually been exercised.
>
> The code is wrong, and it is worth fixing.
>
> Andrew, do you think you could queue this up? I don't have a handy tree.

I still found another malloc/free problem in this function, so I have
updated the patch.

---

From 1fb76a35e4109e1435f55048c20ea58622e7f87b Mon Sep 17 00:00:00 2001
From: Zhang Yanfei
Date: Fri, 22 Feb 2013 10:34:02 +0800
Subject: [PATCH] kexec: fix allocation problems in function kimage_normal_alloc

The function kimage_normal_alloc() has 2 allocation problems that may
cause failures:

1. If kimage_normal_alloc() fails to initialize an allocated kimage, it
will free the image but would still set 'rimage'; as a result kexec_load
will try to free it again. This would explode as part of the freeing
process is accessing internal members which point to uninitialized
memory.

2. If kimage_normal_alloc() fails to alloc pages for image->swap_page,
it should call kimage_free_page_list() to free allocated pages in the
image->control_pages list before it frees the image.
Signed-off-by: Sasha Levin
Signed-off-by: Zhang Yanfei
---
 kernel/kexec.c | 10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..f219357 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -223,6 +223,8 @@ out:
 }
 
+static void kimage_free_page_list(struct list_head *list);
+
 static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
 				unsigned long nr_segments,
 				struct kexec_segment __user *segments)
@@ -236,8 +238,6 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
 	if (result)
 		goto out;
 
-	*rimage = image;
-
 	/*
 	 * Find a location for the control code buffer, and add it
 	 * the vector of segments so that it's pages will also be
@@ -259,10 +259,12 @@ static int kimage_normal_alloc(struct kimage **rimage, unsigned long entry,
 	result = 0;
 out:
-	if (result == 0)
+	if (result == 0) {
 		*rimage = image;
-	else
+	} else {
+		kimage_free_page_list(&image->control_pages);
 		kfree(image);
+	}
 	return result;
 }
-- 
1.7.1
Re: [PATCHv5 2/8] zsmalloc: add documentation
On 02/21/2013 11:50 PM, Seth Jennings wrote: On 02/21/2013 02:49 AM, Ric Mason wrote: On 02/19/2013 03:16 AM, Seth Jennings wrote: On 02/16/2013 12:21 AM, Ric Mason wrote: On 02/14/2013 02:38 AM, Seth Jennings wrote: This patch adds a documentation file for zsmalloc at Documentation/vm/zsmalloc.txt Signed-off-by: Seth Jennings --- Documentation/vm/zsmalloc.txt | 68 + 1 file changed, 68 insertions(+) create mode 100644 Documentation/vm/zsmalloc.txt diff --git a/Documentation/vm/zsmalloc.txt b/Documentation/vm/zsmalloc.txt new file mode 100644 index 000..85aa617 --- /dev/null +++ b/Documentation/vm/zsmalloc.txt @@ -0,0 +1,68 @@ +zsmalloc Memory Allocator + +Overview + +zmalloc a new slab-based memory allocator, +zsmalloc, for storing compressed pages. It is designed for +low fragmentation and high allocation success rate on +large object, but <= PAGE_SIZE allocations. + +zsmalloc differs from the kernel slab allocator in two primary +ways to achieve these design goals. + +zsmalloc never requires high order page allocations to back +slabs, or "size classes" in zsmalloc terms. Instead it allows +multiple single-order pages to be stitched together into a +"zspage" which backs the slab. This allows for higher allocation +success rate under memory pressure. + +Also, zsmalloc allows objects to span page boundaries within the +zspage. This allows for lower fragmentation than could be had +with the kernel slab allocator for objects between PAGE_SIZE/2 +and PAGE_SIZE. With the kernel slab allocator, if a page compresses +to 60% of it original size, the memory savings gained through +compression is lost in fragmentation because another object of +the same size can't be stored in the leftover space. + +This ability to span pages results in zsmalloc allocations not being +directly addressable by the user. The user is given an +non-dereferencable handle in response to an allocation request. 
+That handle must be mapped, using zs_map_object(), which returns
+a pointer to the mapped region that can be used. The mapping is
+necessary since the object data may reside in two different
+noncontigious pages.

Do you mean the reason a zsmalloc object must be mapped after allocation is that the object data may reside in two different noncontiguous pages?

Yes, that is one reason for the mapping. The other reason (more of an added bonus) is below.

+
+For 32-bit systems, zsmalloc has the added benefit of being
+able to back slabs with HIGHMEM pages, something not possible

What's the meaning of "back slabs with HIGHMEM pages"?

By HIGHMEM, I'm referring to the HIGHMEM memory zone on 32-bit systems with larger than 1GB (actually a little less) of RAM. The upper 3GB of the 4GB address space, depending on kernel build options, is not directly addressable by the kernel, but can be mapped into the kernel address space with functions like kmap() or kmap_atomic().

These pages can't be used by slab/slub because they are not continuously mapped into the kernel address space. However, since zsmalloc requires a mapping anyway to handle objects that span non-contiguous page boundaries, we do the kernel mapping as part of the process. So zspages, the conceptual slabs in zsmalloc backed by single-order pages, can include pages from the HIGHMEM zone as well.

Thanks for the clarification. In http://lwn.net/Articles/537422/, your article about zswap on LWN: "Additionally, the kernel slab allocator does not allow objects that are less than a page in size to span a page boundary. This means that if an object is PAGE_SIZE/2 + 1 bytes in size, it effectively uses an entire page, resulting in ~50% waste. Hence there are *no kmalloc() cache sizes* between PAGE_SIZE/2 and PAGE_SIZE."

Are you sure? It seems that kmalloc caches support big sizes; you can check include/linux/kmalloc_sizes.h.

Yes, kmalloc can allocate large objects > PAGE_SIZE, but there are no cache sizes _between_ PAGE_SIZE/2 and PAGE_SIZE.
For example, on a system with 4k pages, there are no caches between kmalloc-2048 and kmalloc-4096.

Since slub caches can merge, is that the root reason?

Seth
Re: [PATCHv5 2/8] zsmalloc: add documentation
On 02/21/2013 11:50 PM, Seth Jennings wrote: On 02/21/2013 02:49 AM, Ric Mason wrote: On 02/19/2013 03:16 AM, Seth Jennings wrote: On 02/16/2013 12:21 AM, Ric Mason wrote: On 02/14/2013 02:38 AM, Seth Jennings wrote: This patch adds a documentation file for zsmalloc at Documentation/vm/zsmalloc.txt Signed-off-by: Seth Jennings --- Documentation/vm/zsmalloc.txt | 68 + 1 file changed, 68 insertions(+) create mode 100644 Documentation/vm/zsmalloc.txt diff --git a/Documentation/vm/zsmalloc.txt b/Documentation/vm/zsmalloc.txt new file mode 100644 index 000..85aa617 --- /dev/null +++ b/Documentation/vm/zsmalloc.txt @@ -0,0 +1,68 @@ +zsmalloc Memory Allocator + +Overview + +zmalloc a new slab-based memory allocator, +zsmalloc, for storing compressed pages. It is designed for +low fragmentation and high allocation success rate on +large object, but <= PAGE_SIZE allocations. + +zsmalloc differs from the kernel slab allocator in two primary +ways to achieve these design goals. + +zsmalloc never requires high order page allocations to back +slabs, or "size classes" in zsmalloc terms. Instead it allows +multiple single-order pages to be stitched together into a +"zspage" which backs the slab. This allows for higher allocation +success rate under memory pressure. + +Also, zsmalloc allows objects to span page boundaries within the +zspage. This allows for lower fragmentation than could be had +with the kernel slab allocator for objects between PAGE_SIZE/2 +and PAGE_SIZE. With the kernel slab allocator, if a page compresses +to 60% of it original size, the memory savings gained through +compression is lost in fragmentation because another object of +the same size can't be stored in the leftover space. + +This ability to span pages results in zsmalloc allocations not being +directly addressable by the user. The user is given an +non-dereferencable handle in response to an allocation request. 
+That handle must be mapped, using zs_map_object(), which returns +a pointer to the mapped region that can be used. The mapping is +necessary since the object data may reside in two different +noncontigious pages. Do you mean the reason of to use a zsmalloc object must map after malloc is object data maybe reside in two different nocontiguous pages? Yes, that is one reason for the mapping. The other reason (more of an added bonus) is below. + +For 32-bit systems, zsmalloc has the added benefit of being +able to back slabs with HIGHMEM pages, something not possible What's the meaning of "back slabs with HIGHMEM pages"? By HIGHMEM, I'm referring to the HIGHMEM memory zone on 32-bit systems with larger that 1GB (actually a little less) of RAM. The upper 3GB of the 4GB address space, depending on kernel build options, is not directly addressable by the kernel, but can be mapped into the kernel address space with functions like kmap() or kmap_atomic(). These pages can't be used by slab/slub because they are not continuously mapped into the kernel address space. However, since zsmalloc requires a mapping anyway to handle objects that span non-contiguous page boundaries, we do the kernel mapping as part of the process. So zspages, the conceptual slab in zsmalloc backed by single-order pages can include pages from the HIGHMEM zone as well. Thanks for your clarify, http://lwn.net/Articles/537422/, your article about zswap in lwn. "Additionally, the kernel slab allocator does not allow objects that are less than a page in size to span a page boundary. This means that if an object is PAGE_SIZE/2 + 1 bytes in size, it effectively use an entire page, resulting in ~50% waste. Hense there are *no kmalloc() cache size* between PAGE_SIZE/2 and PAGE_SIZE." Are your sure? It seems that kmalloc cache support big size, your can check in include/linux/kmalloc_sizes.h Yes, kmalloc can allocate large objects > PAGE_SIZE, but there are no cache sizes _between_ PAGE_SIZE/2 and PAGE_SIZE. 
For example, on a system with 4k pages, there are no caches between kmalloc-2048 and kmalloc-4096.

kmalloc objects > PAGE_SIZE/2 or > PAGE_SIZE should also be allocated from a slab cache, correct? Then how can an object be allocated without a slab cache that holds objects of its size?

Seth
Re: [RFC PATCH v3 1/3] sched: schedule balance map foundation
On 02/21/2013 07:37 PM, Peter Zijlstra wrote:
> On Thu, 2013-02-21 at 12:58 +0800, Michael Wang wrote:
>>
>> You are right, it costs space in order to accelerate the system. I've
>> calculated the cost once before (I'm really not good at this, please
>> let me know if I make any silly calculation...),
>
> The exact size isn't that important, but it's trivial to see it's quadratic.
> You have a NR_CPUS array per-cpu, thus O(n^2).
>
> ( side note; invest in getting good at complexity analysis -- or at
> least competent, it's the single most important aspect of programming. )
>
> ...
>
>> And the final cost is 3000 int and 103 pointer, and some padding,
>> but won't be bigger than 10M, not a big deal for a system with 1000
>> cpus either.
>
> Maybe, but quadratic stuff should be frowned upon at all times, these
> things tend to explode when you least expect it.
>
> For instance, IIRC the biggest single image system SGI booted had 16k
> cpus in there, that ends up at something like 14+14+3=31 aka 2G of
> storage just for your lookup -- that seems somewhat preposterous.

Honestly, if I'm an admin who owns a 16k-cpu system (I can't even
imagine how much memory it would have...), I would really prefer to
exchange 2G of memory for some performance.

I see your point here, the cost of space will grow quadratically, but
the memory of systems will also grow, and as I understand it, faster.

Regards,
Michael Wang

> The domain levels are roughly O(log n) related to the total cpus, so
> what you're doing is replacing an O(log n) lookup with an O(1) lookup,
> but at the cost of O(n^2) storage.
Re: [PATCH V2 0/4] CPUFreq: Implement per policy instances of governors
On 11 February 2013 13:19, Viresh Kumar wrote:
> This is targeted for 3.10-rc1 or linux-next just after the merge window.

Hi Rafael,

I have pushed this patchset again with the modifications/fixups I
posted: cpufreq-for-3.10

Also I have swapped patches 3 & 4, in case you decide to drop that
Kconfig patch, which is no. 4 now :)

--
viresh
Re: [PATCH V2 3/4] cpufreq: Add Kconfig option to enable/disable have_multiple_policies
On 22 February 2013 07:59, Rafael J. Wysocki wrote:
> On Friday, February 22, 2013 07:44:23 AM Viresh Kumar wrote:
>> If you don't like this one then we can add another entry
>> into struct policy like: gov_sysfs_parent.
>
> I don't know. This is going to look kind of ugly this way or another I think.
>
> Maybe I'll figure out something ...

Another simple way of doing this is to leave this patch in, and here is
why I say so.

struct policy is allocated dynamically with kzalloc(), and so every
field is zero, including have_multiple_policies. Only the platforms
needing this feature would set it to 1, and all the remaining ones
would stay unchanged. This variable wastes just 4 bytes on platforms
that don't need the feature.

About performance: this if/else is called only on policy creation or
destruction. For platforms that don't have multiple policies, and thus
all cpus share the same policy struct, the destruction might never
happen unless we rmmod/insmod the cpufreq driver, because policy
destruction would only happen when all the cpus are removed :) So it
will execute only once, at boot time, when we initialize the policy
struct.

Is this patch worth keeping then?

--
viresh
[GIT PULL] Blackfin updates for 3.9
Hi Linus,

The following changes since commit 19f949f52599ba7c3f67a5897ac6be14bfcb1200:

  Linux 3.8 (2013-02-18 15:58:34 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin.git for-linus

for you to fetch changes up to f656c240ae07c48ddf8552e83b64692121044c42:

  blackfin: time-ts: Remove duplicate assignment (2013-02-20 15:21:23 +0800)

----------------------------------------------------------------
Akinobu Mita (1):
      blackfin: use bitmap library functions

Bob Liu (1):
      blackfin: mem_init: update dmc config register

Sonic Zhang (1):
      blackfin: sync data in blackfin write buffer

Stephen Boyd (1):
      blackfin: time-ts: Remove duplicate assignment

Steven Miao (1):
      blackfin: pm: fix build error

 arch/blackfin/include/asm/mem_init.h      |  2 +-
 arch/blackfin/include/asm/uaccess.h       |  1 +
 arch/blackfin/kernel/dma-mapping.c        | 23 +++
 arch/blackfin/kernel/time-ts.c            |  6 --
 arch/blackfin/mach-common/ints-priority.c |  4
 5 files changed, 13 insertions(+), 23 deletions(-)
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On 02/21/2013 06:20 PM, Peter Zijlstra wrote:
> On Thu, 2013-02-21 at 12:51 +0800, Michael Wang wrote:
>> The old logic when locating affine_sd is:
>>
>>	if prev_cpu != curr_cpu
>>		if wake_affine()
>>			prev_cpu = curr_cpu
>>	new_cpu = select_idle_sibling(prev_cpu)
>>	return new_cpu
>>
>> The new logic is the same as the old one if prev_cpu == curr_cpu, so let's
>> simplify the old logic like:
>>
>>	if wake_affine()
>>		new_cpu = select_idle_sibling(curr_cpu)
>>	else
>>		new_cpu = select_idle_sibling(prev_cpu)
>>
>>	return new_cpu
>>
>> Actually that doesn't make sense.
>
> It does :-)
>
>> I think wake_affine() is trying to check whether moving a task from
>> prev_cpu to curr_cpu will break the balance in affine_sd or not, but why
>> does not breaking balance mean curr_cpu is better than prev_cpu for
>> searching for the idle cpu?
>
> It doesn't; the whole affine wakeup stuff is meant to pull waking tasks
> towards the cpu that does the wakeup, and we limit this by putting bounds on
> the imbalance this may create.
>
> The reason we want to run tasks on the cpu that does the wakeup is
> because that cpu 'obviously' is running something related and it seems
> like a good idea to run related tasks close together.
>
> So look at affine wakeups as a force that groups related tasks.

That's right, and it's a point I missed when judging wake_affine()...

But that benefit is really hard to estimate, especially when the workload is heavy: the cost of wake_affine() is very high, since it calculates the load of sched entities one by one. Is that worth paying for a benefit we cannot promise?

According to the test results, I can't agree that this purpose of wake_affine() benefits us, but I am sure that wake_affine() is a terrible performance killer when the system is busy.
>
>> So the new logic in this patch set is:
>>
>>	new_cpu = select_idle_sibling(prev_cpu)
>>	if idle_cpu(new_cpu)
>>		return new_cpu
>>
>>	new_cpu = select_idle_sibling(curr_cpu)
>>	if idle_cpu(new_cpu) {
>>		if wake_affine()
>>			return new_cpu
>>	}
>>
>>	return prev_cpu
>>
>> And now, unless we are really going to move load from prev_cpu to
>> curr_cpu, we won't use wake_affine() any more.
>
> That completely breaks stuff, not cool.

Could you please give more details on which point you think is bad?

Regards,
Michael Wang
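For reference, the two orderings under debate can be condensed into a compilable sketch. The helpers below are trivial fixed stubs standing in for wake_affine(), select_idle_sibling(), and idle_cpu(); they are not the real scheduler code, and only exist to make the control flow concrete:

```c
#include <assert.h>

/* Stand-in stubs: pick_idle() models select_idle_sibling() (here it just
 * echoes its argument), wake_affine_ok() models wake_affine(), and
 * cpu_is_idle() models idle_cpu(). All are fixed, for illustration only. */
static int pick_idle(int cpu)      { return cpu; }
static int wake_affine_ok(void)    { return 1; }
static int cpu_is_idle(int cpu)    { (void)cpu; return 1; }

/* Old ordering: wake_affine() is consulted first, and the idle search
 * then starts from whichever cpu it picked. */
static int select_old(int curr_cpu, int prev_cpu)
{
	if (wake_affine_ok())
		return pick_idle(curr_cpu);
	return pick_idle(prev_cpu);
}

/* Proposed ordering: search near prev_cpu first; wake_affine() is only
 * consulted when the task is actually about to move to curr_cpu. */
static int select_new(int curr_cpu, int prev_cpu)
{
	int cpu = pick_idle(prev_cpu);

	if (cpu_is_idle(cpu))
		return cpu;

	cpu = pick_idle(curr_cpu);
	if (cpu_is_idle(cpu) && wake_affine_ok())
		return cpu;

	return prev_cpu;
}
```

With these stubs the difference is visible directly: the old ordering prefers the waker's cpu whenever wake_affine() agrees, while the new ordering prefers prev_cpu whenever an idle cpu is found there, which is exactly the behavioral change Peter objects to.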
linux-next: manual merge of the writeback tree with the btrfs tree
Hi Wu,

Today's linux-next merge of the writeback tree got a conflict in fs/btrfs/extent-tree.c between commit da633a421701 ("Btrfs: flush all dirty inodes if writeback can not start") from the btrfs tree and commit 10ee27a06cc8 ("vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them") from the writeback tree.

I fixed it up (I assumed that the former supersedes the latter and used that) and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell  s...@canb.auug.org.au
Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
On 02/21/2013 05:43 PM, Mike Galbraith wrote:
> On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote:
>
>> But does this patch set really cause a regression on your Q6600? It may
>> have sacrificed something, but I still think it will benefit far more,
>> especially on huge systems.
>
> We spread on FORK/EXEC, and will no longer pull communicating tasks
> back to a shared cache with the new logic preferring to leave the wakee
> remote, so while no, I haven't tested (will try to find a round tuit), it
> seems it _must_ hurt. Dragging data from one llc to the other on Q6600
> hurts a LOT. Every time a client and server are cross llc, it's a huge
> hit. The previous logic pulled communicating tasks together right when
> it matters the most: intermittent load... or interactive use.

I agree that this is a problem that needs to be solved, but I don't agree that wake_affine() is the solution.

According to my understanding, in the old world wake_affine() will only be used if curr_cpu and prev_cpu share cache, which means they are in one package; whether we search the llc sd of curr_cpu or of prev_cpu, we won't have the chance to spread the task out of that package.

I'm going to restore the logic that only does select_idle_sibling() when prev_cpu and curr_cpu are affine, so the new logic will only prefer leaving the task in the old package if both prev_cpu and curr_cpu are in that package. I think this could solve the problem, couldn't it?

Regards,
Michael Wang

> -Mike
Re: [PATCH] kexec: prevent double free on image allocation failure
On 02/21/2013 08:55 PM, ebied...@xmission.com wrote:
> Sasha Levin writes:
>
>> If kimage_normal_alloc() fails to initialize an allocated kimage, it will free
>> the image but would still set 'rimage'; as a result kexec_load will try
>> to free it again.
>>
>> This would explode, as part of the freeing process is accessing internal
>> members which point to uninitialized memory.
>
> Agreed.
>
> I don't think that failure path has ever actually been exercised.

trinity is actually quite good at hitting that, which is how I discovered it:

[  418.138251] Could not allocate control_code_buffer
[  418.143739] general protection fault: [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  418.147131] Dumping ftrace buffer:
[  418.147901]    (ftrace buffer empty)
[  418.148697] Modules linked in:
[  418.153440] CPU 1
[  418.153440] Pid: 18098, comm: trinity Tainted: G W 3.8.0-next-20130220-sasha-00037-gc07b3b2-dirty #7
[  418.153440] RIP: 0010:[] [] kimage_free_page_list+0x16/0x50
[  418.153440] RSP: 0018:88009bfade78 EFLAGS: 00010292
[  418.153440] RAX: 00180004 RBX: 0002 RCX:
[  418.153440] RDX: 88009c1a RSI: 0001 RDI: 6b6b6b6b6b6b6b6b
[  418.153440] RBP: 88009bfade98 R08: 2782 R09:
[  418.153440] R10: R11: R12: 88009c6cb4d0
[  418.153440] R13: 88009c6cb720 R14: 88009c6cb4d0 R15: 00f6
[  418.153440] FS: 7fb7eb95b700() GS:8800bb80() knlGS:
[  418.153440] CS: 0010 DS: ES: CR0: 80050033
[  418.153440] CR2: 004808e0 CR3: 9eaaa000 CR4: 000406e0
[  418.153440] DR0: DR1: DR2:
[  418.153440] DR3: DR6: 0ff0 DR7: 0400
[  418.153440] Process trinity (pid: 18098, threadinfo 88009bfac000, task 88009c1a)
[  418.153440] Stack:
[  418.153440]  8546e948 0002 88009c6cb4d0
[  418.153440]  88009bfaded8 8119b60f 0002 0002
[  418.153440]  88009c6cb4d0 88009c6cb4d0 fff4 00f6
[  418.153440] Call Trace:
[  418.153440]  [] kimage_free+0x2f/0x100
[  418.153440]  [] sys_kexec_load+0x593/0x660
[  418.153440]  [] ? trace_hardirqs_on+0xd/0x10
[  418.153440]  [] tracesys+0xe1/0xe6
[  418.153440] Code: c1 ef 0c 55 48 c1 e7 06 48 89 e5 48 01 c7 e8 82 ff ff ff 5d c3 55 48 89 e5 41 55 49 89 fd 41 54 53 48 83 ec 08 48 8b 3f 49 39 fd <48> 8b 1f 75 08 eb 22 0f 1f 00 48 89 d3 4c 8d 67 e0 e8 54 a6 8a
[  418.153440] RIP [] kimage_free_page_list+0x16/0x50
[  418.153440] RSP
[  418.219646] ---[ end trace 0adb1d6b71fefb29 ]---

Thanks,
Sasha
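The bug class is easy to reproduce in miniature. Below is a userspace sketch (not the kexec code itself): a callee that frees its allocation on an internal failure must not leave the caller's out-pointer aimed at the freed memory, or the caller's error path frees it a second time. image_alloc_fixed() shows the safe shape:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct kimage. */
struct image {
	int *pages;
};

/* Stand-in for kimage_normal_alloc(). An internal failure is modelled by
 * the 'fail' flag (think of the control page allocation failing). On that
 * path the callee frees the image itself; the fix is to also clear the
 * caller's pointer so the caller's cleanup cannot free the same memory
 * again. */
static int image_alloc_fixed(struct image **rimage, int fail)
{
	struct image *img = calloc(1, sizeof(*img));

	if (!img)
		return -1;

	if (fail) {
		free(img);
		*rimage = NULL;	/* never publish a pointer you just freed */
		return -1;
	}

	*rimage = img;
	return 0;
}
```

The oops above is the buggy variant of this pattern: the freed image was still handed back, and kimage_free() then walked list heads living in poisoned (0x6b) memory.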
Re: [PATCH V2 4/4] cpufreq: Get rid of "struct global_attr"
On 22 February 2013 08:03, Rafael J. Wysocki wrote:
> On Friday, February 22, 2013 07:47:44 AM Viresh Kumar wrote:
>> On 22 February 2013 05:15, Rafael J. Wysocki wrote:
>>> Why did you change all of the lines of this macro instead of changing just the
>>> one line you needed to change?
>>
>> I didn't like the indentation used within the macro. So I did it.
>
> In general, things like that are for separate cleanup patches. If you mix
> functional changes with cleanups, people get confused and it's difficult to see
> what's needed and what's "optional".
>
> I know it's tempting to fix stuff like that along with doing functional
> changes and I do that sometimes. Not very often, though, and with care.

Even I give similar comments sometimes, but forget them while writing my patches :)

Anyway, here is the fixup:

commit b1bbb99467d56140cf3a8a2b70e61b456aa46e48
Author: Viresh Kumar
Date:   Fri Feb 22 07:59:20 2013 +0530

    fixup! cpufreq: Get rid of "struct global_attr"
---
 drivers/cpufreq/intel_pstate.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index e795134..49846b9 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -273,12 +273,12 @@ static void intel_pstate_debug_expose_params(void)
 /************ debugfs end ************/

 /************ sysfs begin ************/
-#define show_one(file_name, object)			\
-static ssize_t show_##file_name				\
-(struct cpufreq_policy *policy, char *buf)		\
-{							\
-	return sprintf(buf, "%u\n", limits.object);	\
-}
+#define show_one(file_name, object)			\
+	static ssize_t show_##file_name			\
+	(struct cpufreq_policy *policy, char *buf)	\
+	{						\
+		return sprintf(buf, "%u\n", limits.object); \
+	}
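What the show_one() macro expands to can be demonstrated with a userspace analogue. The struct and field names below are simplified stand-ins for the intel_pstate limits struct, and the generated callbacks drop the policy argument for brevity:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Simplified stand-in for the intel_pstate 'limits' struct. */
static struct {
	unsigned int min_perf_pct;
	unsigned int max_perf_pct;
} limits = { .min_perf_pct = 25, .max_perf_pct = 100 };

/* Each invocation stamps out one "show" callback that formats a single
 * field of 'limits' via token pasting; the real kernel macro also takes
 * the struct cpufreq_policy pointer. */
#define show_one(file_name, object)			\
static ssize_t show_##file_name(char *buf)		\
{							\
	return sprintf(buf, "%u\n", limits.object);	\
}

show_one(min_perf_pct, min_perf_pct);
show_one(max_perf_pct, max_perf_pct);
```

The preprocessor turns each `show_one(...)` line into a full function definition (show_min_perf_pct, show_max_perf_pct), which is why the whole macro body, not just one line, carries the continuation backslashes that the indentation argument above was about.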
Re: [PATCH V2 1/4] cpufreq: Add per policy governor-init/exit infrastructure
On 22 February 2013 05:05, Rafael J. Wysocki wrote:
> Why don't you use different values here?
>
> If you need only one value, one #define should be sufficient.

This is the fixup I have for this; I will push all patches again to the cpufreq-for-3.10 branch:

commit 4d7296fb64f2353aafad5104f0a046466d0f4ea9
Author: Viresh Kumar
Date:   Fri Feb 22 07:56:31 2013 +0530

    fixup! cpufreq: Add per policy governor-init/exit infrastructure
---
 include/linux/cpufreq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 3b822ce..b7393b5 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -183,7 +183,7 @@ static inline unsigned long cpufreq_scale(unsigned long old, u_int div, u_int mu
 #define CPUFREQ_GOV_STOP		2
 #define CPUFREQ_GOV_LIMITS		3
 #define CPUFREQ_GOV_POLICY_INIT	4
-#define CPUFREQ_GOV_POLICY_EXIT	4
+#define CPUFREQ_GOV_POLICY_EXIT	5

 struct cpufreq_governor {
	char	name[CPUFREQ_NAME_LEN];
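The reason this fixup matters: with the original copy-paste bug, POLICY_INIT and POLICY_EXIT both expanded to 4, so governor code dispatching on the event could never tell the two apart (and a switch with both case labels would not even compile, since duplicate case values are a constraint violation). A minimal sketch with the corrected values, mirroring the constants shown in the diff above:

```c
#include <assert.h>
#include <string.h>

/* Mirrors of the cpufreq.h event constants after the fixup. Before it,
 * POLICY_INIT and POLICY_EXIT both expanded to 4, and the switch below
 * would have failed to compile with duplicate case labels. */
#define CPUFREQ_GOV_STOP	2
#define CPUFREQ_GOV_LIMITS	3
#define CPUFREQ_GOV_POLICY_INIT	4
#define CPUFREQ_GOV_POLICY_EXIT	5

/* Illustrative dispatcher: only distinct values let init and exit take
 * different paths. */
static const char *gov_event_name(unsigned int event)
{
	switch (event) {
	case CPUFREQ_GOV_POLICY_INIT:	return "policy init";
	case CPUFREQ_GOV_POLICY_EXIT:	return "policy exit";
	default:			return "other";
	}
}
```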