Re: [PATCH 1/1] pwm: Use pr_* functions in pwm-samsung.c file
On Fri, Jul 06, 2012 at 02:43:50PM +0530, Sachin Kamat wrote: > Replace printk with pr_* functions to avoid checkpatch warnings. > > Signed-off-by: Sachin Kamat > --- > drivers/pwm/pwm-samsung.c |6 -- > 1 files changed, 4 insertions(+), 2 deletions(-) Applied, thanks, Thierry > > diff --git a/drivers/pwm/pwm-samsung.c b/drivers/pwm/pwm-samsung.c > index 35fa0e8..d103865 100644 > --- a/drivers/pwm/pwm-samsung.c > +++ b/drivers/pwm/pwm-samsung.c > @@ -11,6 +11,8 @@ > * the Free Software Foundation; either version 2 of the License. > */ > > +#define pr_fmt(fmt) "pwm-samsung: " fmt > + > #include > #include > #include > @@ -340,13 +342,13 @@ static int __init pwm_init(void) > clk_scaler[1] = clk_get(NULL, "pwm-scaler1"); > > if (IS_ERR(clk_scaler[0]) || IS_ERR(clk_scaler[1])) { > - printk(KERN_ERR "%s: failed to get scaler clocks\n", __func__); > + pr_err("failed to get scaler clocks\n"); > return -EINVAL; > } > > ret = platform_driver_register(_pwm_driver); > if (ret) > - printk(KERN_ERR "%s: failed to add pwm driver\n", __func__); > + pr_err("failed to add pwm driver\n"); > > return ret; > } > -- > 1.7.4.1 > > > pgpsYSC8a6JMF.pgp Description: PGP signature
Antw: Re: /sys and access(2): Correctly implemented?
>>> Ryan Mallon schrieb am 09.07.2012 um 01:24 in Nachricht <4ffa16b6.9050...@gmail.com>: > On 06/07/12 16:27, Ulrich Windl wrote: > > Hi! > > > > Recently I found a problem with the command (kernel 3.0.34-0.7-default from > SLES 11 SP2, run as root): > > test -r "$file" && cat "$file" > > emitting "Permission denied" > > > > Investigating, I found that "test" actually uses "access()" to check for > permissions. Unfortunately there are some files in /sys that have > "write-only" > permission bits set (e.g. /sys/devices/system/cpu/probe). > > > > ~ # ll /sys/devices/system/cpu/probe > > --w--- 1 root root 4096 Jun 29 12:43 /sys/devices/system/cpu/probe > > ~ # F=/sys/devices/system/cpu/probe > > ~ # test "$F" && cat "$F" > > cat: /sys/devices/system/cpu/probe: Permission denied > > Looks like you have a typo here, I think you wanted "test -r $F", not > "test $F", the latter will just evaluate "$F" as an expression which > will be true, and so you get the permission denied error running cat. Hi! You are right: It's a typo, but only in the message; the actual test was done correctly, and the outcome is quite the same. > > Using "test -r $F" on a write-only sysfs file correctly returns false on > my machine (Ubuntu 10.04.4 LTS/2.6.32-41-generic). Not here, unfortunately: # ll /sys/devices/system/cpu/probe --w--- 1 root root 4096 Jul 2 11:52 /sys/devices/system/cpu/probe # F=/sys/devices/system/cpu/probe # test -r "$F" && cat "$F" cat: /sys/devices/system/cpu/probe: Permission denied # uname -a Linux h07 2.6.32.59-0.3-default #1 SMP 2012-04-27 11:14:44 +0200 x86_64 x86_64 x86_64 GNU/Linux Regards, Ulrich -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC 2/2] kvm PLE handler: Choose better candidate for directed yield
From: Raghavendra K T Currently PLE handler can repeatedly do a directed yield to same vcpu that has recently done PL exit. This can degrade the performance Try to yield to most eligible guy instead, by alternate yielding. Precisely, give chance to a VCPU which has: (a) Not done PLE exit at all (probably he is preempted lock-holder) (b) VCPU skipped in last iteration because it did PL exit, and probably has become eligible now (next eligible lock holder) Signed-off-by: Raghavendra K T --- arch/s390/include/asm/kvm_host.h |5 + arch/x86/include/asm/kvm_host.h |2 +- arch/x86/kvm/x86.c | 14 ++ virt/kvm/kvm_main.c |3 +++ 4 files changed, 23 insertions(+), 1 deletions(-) diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index dd17537..884f2c4 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -256,5 +256,10 @@ struct kvm_arch{ struct gmap *gmap; }; +static inline bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *v) +{ + return true; +} + extern int sie64a(struct kvm_s390_sie_block *, u64 *); #endif diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 857ca68..ce01db3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -962,7 +962,7 @@ extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn); void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err); int kvm_is_in_guest(void); - +bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu); void kvm_pmu_init(struct kvm_vcpu *vcpu); void kvm_pmu_destroy(struct kvm_vcpu *vcpu); void kvm_pmu_reset(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 07dbd14..24ceae8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6623,6 +6623,20 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu) kvm_x86_ops->interrupt_allowed(vcpu); } +bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu) +{ + bool eligible; + + eligible = !vcpu->arch.plo.pause_loop_exited || + (vcpu->arch.plo.pause_loop_exited && +vcpu->arch.plo.dy_eligible); + + if (vcpu->arch.plo.pause_loop_exited) + vcpu->arch.plo.dy_eligible = !vcpu->arch.plo.dy_eligible; + + return eligible; +} + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7e14068..519321a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1595,6 +1595,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me) continue; if (waitqueue_active(>wq)) continue; + if (!kvm_arch_vcpu_check_and_update_eligible(vcpu)) { + continue; + } if (kvm_vcpu_yield_to(vcpu)) { kvm->last_boosted_vcpu = i; yielded = 1; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC 0/2] kvm: Improving directed yield in PLE handler
Currently Pause Looop Exit (PLE) handler is doing directed yield to a random VCPU on PL exit. Though we already have filtering while choosing the candidate to yield_to, we can do better. Problem is, for large vcpu guests, we have more probability of yielding to a bad vcpu. We are not able to prevent directed yield to same guy who has done PL exit recently, who perhaps spins again and wastes CPU. Fix that by keeping track of who has done PL exit. So The Algorithm in series give chance to a VCPU which has: (a) Not done PLE exit at all (probably he is preempted lock-holder) (b) VCPU skipped in last iteration because it did PL exit, and probably has become eligible now (next eligible lock holder) Future enhancemnets: (1) Currently we have a boolean to decide on eligibility of vcpu. It would be nice if I get feedback on guest (>32 vcpu) whether we can improve better with integer counter. (with counter = say f(log n )). (2) We have not considered system load during iteration of vcpu. With that information we can limit the scan and also decide whether schedule() is better. [ I am able to use #kicked vcpus to decide on this But may be there are better ideas like information from global loadavg.] (3) We can exploit this further with PV patches since it also knows about next eligible lock-holder. Summary: There is a huge improvement for moderate / no overcommit scenario for kvm based guest on PLE machine (which is difficult ;) ). Result: Base : kernel 3.5.0-rc5 with Rik's Ple handler fix Machine : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz, 4 numa node, 256GB RAM, 32 core machine Host: enterprise linux gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) with test kernels Guest: fedora 16 with 32 vcpus 8GB memory. Benchmarks: 1) kernbench: kernbench-0.5 (kernbench -f -H -M -o 2*vcpu) Very first run in kernbench is omitted. 2) sysbench: 0.4.12 sysbench --test=oltp --db-driver=pgsql prepare sysbench --num-threads=2*vcpu --max-requests=10 --test=oltp --oltp-table-size=50 --db-driver=pgsql --oltp-read-only run Note that driver for this pgsql. 3) ebizzy: release 0.3 cmd: ebizzy -S 120 1) kernbench (time in sec lesser is better) +---+---+---++---+ base_rikstdev patched stdev %improve +---+---+---++---+ 1x 49.2300 1.0171 38.3792 1.3659 28.27261% 2x 91.9358 1.7768 85.8842 1.6654 7.04623% +---+---+---++---+ 2) sysbench (time in sec lesser is better) +---+---+---++---+ base_rikstdev patched stdev %improve +---+---+---++---+ 1x 12.1623 0.0942 12.1674 0.3126-0.04192% 2x 14.3069 0.8520 14.1879 0.6811 0.83874% +---+---+---++---+ Note that 1x scenario differs in only third decimal place and degradation/improvemnet for sysbench will not be seen even with higher confidence interval. 3) ebizzy (records/sec more is better) +---+---+---++---+ base_rikstdev patched stdev %improve +---+---+---++---+ 1x 1129.2500 28.67932316.625053.0066 105.14722% 2x 1892.3750 75.11122386.5000 168.8033 26.11137% +---+---+---++---+ kernbench 1x: 4 fast runs = 12 runs avg kernbench 2x: 4 fast runs = 12 runs avg sysbench 1x: 8runs avg sysbench 2x: 8runs avg ebizzy 1x: 8runs avg ebizzy 2x: 8runs avg Thanks Vatsa and Srikar for brainstorming discussions regarding optimizations. Raghavendra K T (2): kvm vcpu: Note down pause loop exit kvm PLE handler: Choose better candidate for directed yield arch/s390/include/asm/kvm_host.h |5 + arch/x86/include/asm/kvm_host.h |9 - arch/x86/kvm/svm.c |1 + arch/x86/kvm/vmx.c |1 + arch/x86/kvm/x86.c | 18 +- virt/kvm/kvm_main.c |3 +++ 6 files changed, 35 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC 1/2] kvm vcpu: Note down pause loop exit
Signed-off-by: Raghavendra K T Noting pause loop exited vcpu helps in filtering right candidate to yield. Yielding to same vcpu may result in more wastage of cpu. From: Raghavendra K T --- arch/x86/include/asm/kvm_host.h |7 +++ arch/x86/kvm/svm.c |1 + arch/x86/kvm/vmx.c |1 + arch/x86/kvm/x86.c |4 +++- 4 files changed, 12 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index db7c1f2..857ca68 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -484,6 +484,13 @@ struct kvm_vcpu_arch { u64 length; u64 status; } osvw; + + /* Pause loop exit optimization */ + struct { + bool pause_loop_exited; + bool dy_eligible; + } plo; + }; struct kvm_lpage_info { diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f75af40..a492f5d 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm) static int pause_interception(struct vcpu_svm *svm) { + svm->vcpu.arch.plo.pause_loop_exited = true; kvm_vcpu_on_spin(&(svm->vcpu)); return 1; } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 32eb588..600fb3c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4945,6 +4945,7 @@ out: static int handle_pause(struct kvm_vcpu *vcpu) { skip_emulated_instruction(vcpu); + vcpu->arch.plo.pause_loop_exited = true; kvm_vcpu_on_spin(vcpu); return 1; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index be6d549..07dbd14 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (req_immediate_exit) smp_send_reschedule(vcpu->cpu); - + vcpu->arch.plo.pause_loop_exited = false; kvm_guest_enter(); if (unlikely(vcpu->arch.switch_db_regs)) { @@ -6168,6 +6168,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) BUG_ON(vcpu->kvm == NULL); kvm = vcpu->kvm; + vcpu->arch.plo.pause_loop_exited = false; + vcpu->arch.plo.dy_eligible = true; vcpu->arch.emulate_ctxt.ops = _ops; if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu)) vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] pwm-backlight: add regulator and GPIO support
On 07/09/2012 02:19 PM, Jingoo Han wrote: I couldn't agree with Stephen Warren more. Could you support DT and non-DT case for backwards compatibility? Both cases are handled in the new version I just sent. I hope all other concerns have also been addressed properly. If I forgot something please ping me. Alex. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH V2 1/3] power sequences interpreter for device tree
Some device drivers (panel backlights especially) need to follow precise sequences for powering on and off, involving gpios, regulators, PWMs with a precise powering order and delays to respect between each steps. These sequences are board-specific, and do not belong to a particular driver - therefore they have been performed by board-specific hook functions to far. With the advent of the device tree, we cannot rely of board-specific hooks anymore, but still need a way to implement these sequences in a portable manner. This patch introduces a simple interpreter that can execute such power sequences encoded either as platform data or within the device tree. Signed-off-by: Alexandre Courbot --- drivers/video/backlight/Makefile| 2 +- drivers/video/backlight/power_seq.c | 298 drivers/video/backlight/pwm_bl.c| 3 +- include/linux/power_seq.h | 96 4 files changed, 397 insertions(+), 2 deletions(-) create mode 100644 drivers/video/backlight/power_seq.c create mode 100644 include/linux/power_seq.h diff --git a/drivers/video/backlight/Makefile b/drivers/video/backlight/Makefile index a2ac9cf..6bff124 100644 --- a/drivers/video/backlight/Makefile +++ b/drivers/video/backlight/Makefile @@ -28,7 +28,7 @@ obj-$(CONFIG_BACKLIGHT_OMAP1) += omap1_bl.o obj-$(CONFIG_BACKLIGHT_PANDORA)+= pandora_bl.o obj-$(CONFIG_BACKLIGHT_PROGEAR) += progear_bl.o obj-$(CONFIG_BACKLIGHT_CARILLO_RANCH) += cr_bllcd.o -obj-$(CONFIG_BACKLIGHT_PWM)+= pwm_bl.o +obj-$(CONFIG_BACKLIGHT_PWM)+= pwm_bl.o power_seq.o obj-$(CONFIG_BACKLIGHT_DA903X) += da903x_bl.o obj-$(CONFIG_BACKLIGHT_DA9052) += da9052_bl.o obj-$(CONFIG_BACKLIGHT_MAX8925)+= max8925_bl.o diff --git a/drivers/video/backlight/power_seq.c b/drivers/video/backlight/power_seq.c new file mode 100644 index 000..f54cb7d --- /dev/null +++ b/drivers/video/backlight/power_seq.c @@ -0,0 +1,298 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#define PWM_SEQ_TYPE(type) [POWER_SEQ_ ## type] = #type +static const char *pwm_seq_types[] = { + PWM_SEQ_TYPE(STOP), + PWM_SEQ_TYPE(DELAY), + PWM_SEQ_TYPE(REGULATOR), + PWM_SEQ_TYPE(PWM), + PWM_SEQ_TYPE(GPIO), +}; +#undef PWM_SEQ_TYPE + +static bool power_seq_step_run(struct power_seq_step *step) +{ + switch (step->type) { + case POWER_SEQ_DELAY: + msleep(step->parameter); + break; + case POWER_SEQ_REGULATOR: + if (step->parameter) + regulator_enable(step->resource->regulator); + else + regulator_disable(step->resource->regulator); + break; + case POWER_SEQ_PWM: + if (step->parameter) + pwm_enable(step->resource->pwm); + else + pwm_disable(step->resource->pwm); + break; + case POWER_SEQ_GPIO: + gpio_set_value_cansleep(step->resource->gpio, step->parameter); + break; + /* should never happen since we verify the data when building it */ + default: + return -EINVAL; + } + + return 0; +} + +int power_seq_run(power_seq *seq) +{ + int err; + + if (!seq) return 0; + + while (seq->type != POWER_SEQ_STOP) { + if ((err = power_seq_step_run(seq++))) { + return err; + } + } + + return 0; +} + +static int of_parse_power_seq_step(struct device *dev, struct property *prop, + struct platform_power_seq_step *seq, + int max_steps) +{ + void *value = prop->value; + void *end = prop->value + prop->length; + int slen, smax, cpt = 0, i, ret; + char tmp_buf[32]; + + while (value < end && cpt < max_steps) { + smax = value - end; + slen = strnlen(value, end - value); + + /* Unterminated string / not a string? */ + if (slen >= end - value) + goto invalid_seq; + + /* Find a matching sequence step type */ + for (i = 0; i < POWER_SEQ_MAX; i++) + if (!strcmp(value, pwm_seq_types[i])) + break; + + if (i >= POWER_SEQ_MAX) + goto unknown_step; + + value += slen + 1; + + seq[cpt].type = i; + switch (seq[cpt].type) { + case POWER_SEQ_DELAY: + /* integer parameter */ + seq[cpt].parameter = be32_to_cpup(value); + value += sizeof(__be32); + break; + case POWER_SEQ_REGULATOR: + case POWER_SEQ_PWM: + case POWER_SEQ_GPIO: + /* consumer string */ +
[RFC][PATCH V2 3/3] tegra: add pwm backlight device tree nodes
Signed-off-by: Alexandre Courbot --- arch/arm/boot/dts/tegra20-ventana.dts | 31 +++ arch/arm/boot/dts/tegra20.dtsi| 2 +- 2 files changed, 32 insertions(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/tegra20-ventana.dts b/arch/arm/boot/dts/tegra20-ventana.dts index be90544..c67d9e1 100644 --- a/arch/arm/boot/dts/tegra20-ventana.dts +++ b/arch/arm/boot/dts/tegra20-ventana.dts @@ -317,6 +317,37 @@ bus-width = <8>; }; + backlight { + compatible = "pwm-backlight"; + brightness-levels = <0 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 255>; + default-brightness-level = <12>; + + pwms = < 2 500>; + pwm-names = "backlight"; + power-supply = <_reg>; + enable-gpios = < 28 0>; + + power-on-sequence = "REGULATOR", "power", <1>, + "DELAY", <10>, + "PWM", "backlight", <1>, + "GPIO", "enable", <1>; + power-off-sequence = "GPIO", "enable", <0>, +"PWM", "backlight", <0>, +"DELAY", <10>, +"REGULATOR", "power", <0>; + }; + + backlight_reg: fixedregulator@176 { + compatible = "regulator-fixed"; + regulator-name = "backlight_regulator"; + regulator-min-microvolt = <180>; + regulator-max-microvolt = <180>; + gpio = < 176 0>; + startup-delay-us = <0>; + enable-active-high; + regulator-boot-off; + }; + sound { compatible = "nvidia,tegra-audio-wm8903-ventana", "nvidia,tegra-audio-wm8903"; diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi index 405d167..67a6cd9 100644 --- a/arch/arm/boot/dts/tegra20.dtsi +++ b/arch/arm/boot/dts/tegra20.dtsi @@ -123,7 +123,7 @@ status = "disabled"; }; - pwm { + pwm: pwm { compatible = "nvidia,tegra20-pwm"; reg = <0x7000a000 0x100>; #pwm-cells = <2>; -- 1.7.11.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH V2 2/3] pwm_backlight: use power sequences
Make use of the power sequences specified in the device tree or platform data, if any. Signed-off-by: Alexandre Courbot --- .../bindings/video/backlight/pwm-backlight.txt | 28 ++- drivers/video/backlight/power_seq.c| 44 ++--- drivers/video/backlight/pwm_bl.c | 210 +++-- include/linux/pwm_backlight.h | 37 +++- 4 files changed, 239 insertions(+), 80 deletions(-) diff --git a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt index 1e4fc72..86c9253 100644 --- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt +++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt @@ -2,7 +2,10 @@ pwm-backlight bindings Required properties: - compatible: "pwm-backlight" - - pwms: OF device-tree PWM specification (see PWM binding[0]) + - pwms: OF device-tree PWM specification (see PWM binding[0]). Exactly one PWM + must be specified + - pwm-names: a list of names for the PWM devices specified in the + "pwms" property (see PWM binding[0]) - brightness-levels: Array of distinct brightness levels. Typically these are in the range from 0 to 255, but any range starting at 0 will do. The actual brightness level (PWM duty cycle) will be interpolated @@ -10,10 +13,18 @@ Required properties: last value in the array represents a 100% duty cycle (brightest). - default-brightness-level: the default brightness level (index into the array defined by the "brightness-levels" property) + - power-on-sequence: Power sequence that will bring the backlight on. This + sequence must reference the PWM specified in the pwms property by its + name. It can also reference extra GPIOs or regulators, and introduce + delays between sequence steps + - power-off-sequence: Power sequence that will bring the backlight off. This + sequence must reference the PWM specified in the pwms property by its + name. It can also reference extra GPIOs or regulators, and introduce + delays between sequence steps Optional properties: - - pwm-names: a list of names for the PWM devices specified in the - "pwms" property (see PWM binding[0]) + - *-supply: a reference to a regulator used within a power sequence + - *-gpios: a reference to a GPIO used within a power sequence. [0]: Documentation/devicetree/bindings/pwm/pwm.txt @@ -22,7 +33,18 @@ Example: backlight { compatible = "pwm-backlight"; pwms = < 0 500>; + pwm-names = "backlight"; brightness-levels = <0 4 8 16 32 64 128 255>; default-brightness-level = <6>; + power-supply = <_reg>; + enable-gpios = < 6 0>; + power-on-sequence = "REGULATOR", "power", <1>, + "DELAY", <10>, + "PWM", "backlight", <1>, + "GPIO", "enable", <1>; + power-off-sequence = "GPIO", "enable", <0>, +"PWM", "backlight", <0>, +"DELAY", <10>, +"REGULATOR", "power", <0>; }; diff --git a/drivers/video/backlight/power_seq.c b/drivers/video/backlight/power_seq.c index f54cb7d..f8737db 100644 --- a/drivers/video/backlight/power_seq.c +++ b/drivers/video/backlight/power_seq.c @@ -118,9 +118,9 @@ static int of_parse_power_seq_step(struct device *dev, struct property *prop, tmp_buf[sizeof(tmp_buf) - 6] = 0; strcat(tmp_buf, "-gpios"); ret = of_get_named_gpio(dev->of_node, tmp_buf, 0); - if (ret >= 0) + if (ret >= 0) { seq[cpt].value = ret; - else { + } else { if (ret != -EPROBE_DEFER) dev_err(dev, "cannot get gpio \"%s\"\n", seq[cpt].id); @@ -218,26 +218,26 @@ power_seq *power_seq_build(struct device *dev, power_seq_resources *ress, seq->type = pseq->type; switch (pseq->type) { - case POWER_SEQ_REGULATOR: - case POWER_SEQ_GPIO: - case POWER_SEQ_PWM: - if (!(res = power_seq_find_resource(ress, pseq))) { - /* create resource node */ - res = devm_kzalloc(dev, sizeof(*res), - GFP_KERNEL); - if (!res) - return ERR_PTR(-ENOMEM); -
[RFC][PATCHv2 0/3] Power sequences interpreter for pwm_backlight
This is a RFC since this patch largely drifted beyond its original goal of supporting one GPIO and one regulator for the pwm_backlight driver. The issue to address is that backlight power sequences, which were implemented using board-specific callbacks so far, could not be used with the device tree. This series of patches adds a small power sequence interpreter that allows to acquire and control regulators, GPIOs, and PWMs during sequences defined in the device tree. It is easy to use, low-footprint, and takes care of managing the resources that it acquires. The implementation is working and should be complete, but documentation is lacking. Also since the interpreter could be used by other drivers (which ones?), it may make sense to have it in a better place than drivers/video/backlight/. The tegra device tree nodes are just here as an example usage. Alexandre Courbot (3): Power sequences interpreter for device tree pwm-backlight: use power sequences tegra: add pwm backlight device tree nodes .../bindings/video/backlight/pwm-backlight.txt | 28 +- arch/arm/boot/dts/tegra20-ventana.dts | 31 +++ arch/arm/boot/dts/tegra20.dtsi | 2 +- drivers/video/backlight/Makefile | 2 +- drivers/video/backlight/power_seq.c| 298 + drivers/video/backlight/pwm_bl.c | 212 +++ include/linux/power_seq.h | 96 +++ include/linux/pwm_backlight.h | 37 ++- 8 files changed, 645 insertions(+), 61 deletions(-) create mode 100644 drivers/video/backlight/power_seq.c create mode 100644 include/linux/power_seq.h -- 1.7.11.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
linux-next: manual merge of the arm-soc tree with the gpio-lw tree
Hi all, Today's linux-next merge of the arm-soc tree got a conflict in drivers/gpio/gpio-mxc.c between commit fef2bca203e9 ("gpio/mxc: use the edge_sel feature if available") from the gpio-lw tree and commit 1ab7ef158dfb ("gpio/mxc: move irq_domain_add_legacy call into gpio driver") from the arm-soc tree. I fixed it up (I think - see below) and can carry the fix as necessary. -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc drivers/gpio/gpio-mxc.c index f45bb54,e5db670..000 --- a/drivers/gpio/gpio-mxc.c +++ b/drivers/gpio/gpio-mxc.c @@@ -184,19 -160,15 +184,19 @@@ static int gpio_set_irq_type(struct irq edge = GPIO_INT_FALL_EDGE; break; case IRQ_TYPE_EDGE_BOTH: - val = gpio_get_value(gpio); - if (val) { - edge = GPIO_INT_LOW_LEV; - pr_debug("mxc: set GPIO %d to low trigger\n", gpio); + if (GPIO_EDGE_SEL >= 0) { + edge = GPIO_INT_BOTH_EDGES; } else { - edge = GPIO_INT_HIGH_LEV; - pr_debug("mxc: set GPIO %d to high trigger\n", gpio); + val = gpio_get_value(gpio); + if (val) { + edge = GPIO_INT_LOW_LEV; + pr_debug("mxc: set GPIO %d to low trigger\n", gpio); + } else { + edge = GPIO_INT_HIGH_LEV; + pr_debug("mxc: set GPIO %d to high trigger\n", gpio); + } - port->both_edges |= 1 << (gpio & 31); ++ port->both_edges |= 1 << gpio_idx; } - port->both_edges |= 1 << gpio_idx; break; case IRQ_TYPE_LEVEL_LOW: edge = GPIO_INT_LOW_LEV; @@@ -208,24 -180,11 +208,24 @@@ return -EINVAL; } - reg += GPIO_ICR1 + ((gpio_idx & 0x10) >> 2); /* ICR1 or ICR2 */ - bit = gpio_idx & 0xf; - val = readl(reg) & ~(0x3 << (bit << 1)); - writel(val | (edge << (bit << 1)), reg); + if (GPIO_EDGE_SEL >= 0) { + val = readl(port->base + GPIO_EDGE_SEL); + if (edge == GPIO_INT_BOTH_EDGES) - writel(val | (1 << (gpio & 0x1f)), ++ writel(val | (1 << gpio_idx), + port->base + GPIO_EDGE_SEL); + else - writel(val & ~(1 << (gpio & 0x1f)), ++ writel(val & ~(1 << gpio_idx), + port->base + GPIO_EDGE_SEL); + } + + if (edge != GPIO_INT_BOTH_EDGES) { - reg += GPIO_ICR1 + ((gpio & 0x10) >> 2); /* lower or upper register */ - bit = gpio & 0xf; ++ reg += GPIO_ICR1 + ((gpio_idx & 0x10) >> 2); /* ICR1 or ICR2 */ ++ bit = gpio_idx & 0xf; + val = readl(reg) & ~(0x3 << (bit << 1)); + writel(val | (edge << (bit << 1)), reg); + } + - writel(1 << (gpio & 0x1f), port->base + GPIO_ISR); + writel(1 << gpio_idx, port->base + GPIO_ISR); return 0; } pgpvSImW71mr8.pgp Description: PGP signature
[PATCHv3] pwm_backlight: pass correct brightness to callback
pwm_backlight_update_status calls the notify() and notify_after() callbacks before and after applying the new PWM settings. However, if brightness levels are used, the brightness value will be changed from the index into the levels array to the PWM duty cycle length before being passed to notify_after(), which results in inconsistent behavior. Signed-off-by: Alexandre Courbot --- drivers/video/backlight/pwm_bl.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c index 057389d..be48517 100644 --- a/drivers/video/backlight/pwm_bl.c +++ b/drivers/video/backlight/pwm_bl.c @@ -54,14 +54,17 @@ static int pwm_backlight_update_status(struct backlight_device *bl) pwm_config(pb->pwm, 0, pb->period); pwm_disable(pb->pwm); } else { + int duty_cycle; if (pb->levels) { - brightness = pb->levels[brightness]; + duty_cycle = pb->levels[brightness]; max = pb->levels[max]; + } else { + duty_cycle = brightness; } - brightness = pb->lth_brightness + - (brightness * (pb->period - pb->lth_brightness) / max); - pwm_config(pb->pwm, brightness, pb->period); + duty_cycle = pb->lth_brightness + +(duty_cycle * (pb->period - pb->lth_brightness) / max); + pwm_config(pb->pwm, duty_cycle, pb->period); pwm_enable(pb->pwm); } -- 1.7.11.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv3] pwm_backlight: pass correct brightness to callback
pwm_backlight_update_status calls the notify() and notify_after() callbacks before and after applying the new PWM settings. However, if brightness levels are used, the brightness value will be changed from the index into the levels array to the PWM duty cycle length before being passed to notify_after(), which results in inconsistent behavior. Signed-off-by: Alexandre Courbot acour...@nvidia.com --- drivers/video/backlight/pwm_bl.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c index 057389d..be48517 100644 --- a/drivers/video/backlight/pwm_bl.c +++ b/drivers/video/backlight/pwm_bl.c @@ -54,14 +54,17 @@ static int pwm_backlight_update_status(struct backlight_device *bl) pwm_config(pb-pwm, 0, pb-period); pwm_disable(pb-pwm); } else { + int duty_cycle; if (pb-levels) { - brightness = pb-levels[brightness]; + duty_cycle = pb-levels[brightness]; max = pb-levels[max]; + } else { + duty_cycle = brightness; } - brightness = pb-lth_brightness + - (brightness * (pb-period - pb-lth_brightness) / max); - pwm_config(pb-pwm, brightness, pb-period); + duty_cycle = pb-lth_brightness + +(duty_cycle * (pb-period - pb-lth_brightness) / max); + pwm_config(pb-pwm, duty_cycle, pb-period); pwm_enable(pb-pwm); } -- 1.7.11.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
linux-next: manual merge of the arm-soc tree with the gpio-lw tree
Hi all, Today's linux-next merge of the arm-soc tree got a conflict in drivers/gpio/gpio-mxc.c between commit fef2bca203e9 (gpio/mxc: use the edge_sel feature if available) from the gpio-lw tree and commit 1ab7ef158dfb (gpio/mxc: move irq_domain_add_legacy call into gpio driver) from the arm-soc tree. I fixed it up (I think - see below) and can carry the fix as necessary. -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc drivers/gpio/gpio-mxc.c index f45bb54,e5db670..000 --- a/drivers/gpio/gpio-mxc.c +++ b/drivers/gpio/gpio-mxc.c @@@ -184,19 -160,15 +184,19 @@@ static int gpio_set_irq_type(struct irq edge = GPIO_INT_FALL_EDGE; break; case IRQ_TYPE_EDGE_BOTH: - val = gpio_get_value(gpio); - if (val) { - edge = GPIO_INT_LOW_LEV; - pr_debug(mxc: set GPIO %d to low trigger\n, gpio); + if (GPIO_EDGE_SEL = 0) { + edge = GPIO_INT_BOTH_EDGES; } else { - edge = GPIO_INT_HIGH_LEV; - pr_debug(mxc: set GPIO %d to high trigger\n, gpio); + val = gpio_get_value(gpio); + if (val) { + edge = GPIO_INT_LOW_LEV; + pr_debug(mxc: set GPIO %d to low trigger\n, gpio); + } else { + edge = GPIO_INT_HIGH_LEV; + pr_debug(mxc: set GPIO %d to high trigger\n, gpio); + } - port-both_edges |= 1 (gpio 31); ++ port-both_edges |= 1 gpio_idx; } - port-both_edges |= 1 gpio_idx; break; case IRQ_TYPE_LEVEL_LOW: edge = GPIO_INT_LOW_LEV; @@@ -208,24 -180,11 +208,24 @@@ return -EINVAL; } - reg += GPIO_ICR1 + ((gpio_idx 0x10) 2); /* ICR1 or ICR2 */ - bit = gpio_idx 0xf; - val = readl(reg) ~(0x3 (bit 1)); - writel(val | (edge (bit 1)), reg); + if (GPIO_EDGE_SEL = 0) { + val = readl(port-base + GPIO_EDGE_SEL); + if (edge == GPIO_INT_BOTH_EDGES) - writel(val | (1 (gpio 0x1f)), ++ writel(val | (1 gpio_idx), + port-base + GPIO_EDGE_SEL); + else - writel(val ~(1 (gpio 0x1f)), ++ writel(val ~(1 gpio_idx), + port-base + GPIO_EDGE_SEL); + } + + if (edge != GPIO_INT_BOTH_EDGES) { - reg += GPIO_ICR1 + ((gpio 0x10) 2); /* lower or upper register */ - bit = gpio 0xf; ++ reg += GPIO_ICR1 + ((gpio_idx 0x10) 2); /* ICR1 or ICR2 */ ++ bit = gpio_idx 0xf; + val = readl(reg) ~(0x3 (bit 1)); + writel(val | (edge (bit 1)), reg); + } + - writel(1 (gpio 0x1f), port-base + GPIO_ISR); + writel(1 gpio_idx, port-base + GPIO_ISR); return 0; } pgpvSImW71mr8.pgp Description: PGP signature
[RFC][PATCHv2 0/3] Power sequences interpreter for pwm_backlight
This is a RFC since this patch largely drifted beyond its original goal of supporting one GPIO and one regulator for the pwm_backlight driver. The issue to address is that backlight power sequences, which were implemented using board-specific callbacks so far, could not be used with the device tree. This series of patches adds a small power sequence interpreter that allows to acquire and control regulators, GPIOs, and PWMs during sequences defined in the device tree. It is easy to use, low-footprint, and takes care of managing the resources that it acquires. The implementation is working and should be complete, but documentation is lacking. Also since the interpreter could be used by other drivers (which ones?), it may make sense to have it in a better place than drivers/video/backlight/. The tegra device tree nodes are just here as an example usage. Alexandre Courbot (3): Power sequences interpreter for device tree pwm-backlight: use power sequences tegra: add pwm backlight device tree nodes .../bindings/video/backlight/pwm-backlight.txt | 28 +- arch/arm/boot/dts/tegra20-ventana.dts | 31 +++ arch/arm/boot/dts/tegra20.dtsi | 2 +- drivers/video/backlight/Makefile | 2 +- drivers/video/backlight/power_seq.c| 298 + drivers/video/backlight/pwm_bl.c | 212 +++ include/linux/power_seq.h | 96 +++ include/linux/pwm_backlight.h | 37 ++- 8 files changed, 645 insertions(+), 61 deletions(-) create mode 100644 drivers/video/backlight/power_seq.c create mode 100644 include/linux/power_seq.h -- 1.7.11.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH V2 2/3] pwm_backlight: use power sequences
Make use of the power sequences specified in the device tree or platform data, if any. Signed-off-by: Alexandre Courbot acour...@nvidia.com --- .../bindings/video/backlight/pwm-backlight.txt | 28 ++- drivers/video/backlight/power_seq.c| 44 ++--- drivers/video/backlight/pwm_bl.c | 210 +++-- include/linux/pwm_backlight.h | 37 +++- 4 files changed, 239 insertions(+), 80 deletions(-) diff --git a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt index 1e4fc72..86c9253 100644 --- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt +++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt @@ -2,7 +2,10 @@ pwm-backlight bindings Required properties: - compatible: pwm-backlight - - pwms: OF device-tree PWM specification (see PWM binding[0]) + - pwms: OF device-tree PWM specification (see PWM binding[0]). Exactly one PWM + must be specified + - pwm-names: a list of names for the PWM devices specified in the + pwms property (see PWM binding[0]) - brightness-levels: Array of distinct brightness levels. Typically these are in the range from 0 to 255, but any range starting at 0 will do. The actual brightness level (PWM duty cycle) will be interpolated @@ -10,10 +13,18 @@ Required properties: last value in the array represents a 100% duty cycle (brightest). - default-brightness-level: the default brightness level (index into the array defined by the brightness-levels property) + - power-on-sequence: Power sequence that will bring the backlight on. This + sequence must reference the PWM specified in the pwms property by its + name. It can also reference extra GPIOs or regulators, and introduce + delays between sequence steps + - power-off-sequence: Power sequence that will bring the backlight off. This + sequence must reference the PWM specified in the pwms property by its + name. It can also reference extra GPIOs or regulators, and introduce + delays between sequence steps Optional properties: - - pwm-names: a list of names for the PWM devices specified in the - pwms property (see PWM binding[0]) + - *-supply: a reference to a regulator used within a power sequence + - *-gpios: a reference to a GPIO used within a power sequence. [0]: Documentation/devicetree/bindings/pwm/pwm.txt @@ -22,7 +33,18 @@ Example: backlight { compatible = pwm-backlight; pwms = pwm 0 500; + pwm-names = backlight; brightness-levels = 0 4 8 16 32 64 128 255; default-brightness-level = 6; + power-supply = backlight_reg; + enable-gpios = gpio 6 0; + power-on-sequence = REGULATOR, power, 1, + DELAY, 10, + PWM, backlight, 1, + GPIO, enable, 1; + power-off-sequence = GPIO, enable, 0, +PWM, backlight, 0, +DELAY, 10, +REGULATOR, power, 0; }; diff --git a/drivers/video/backlight/power_seq.c b/drivers/video/backlight/power_seq.c index f54cb7d..f8737db 100644 --- a/drivers/video/backlight/power_seq.c +++ b/drivers/video/backlight/power_seq.c @@ -118,9 +118,9 @@ static int of_parse_power_seq_step(struct device *dev, struct property *prop, tmp_buf[sizeof(tmp_buf) - 6] = 0; strcat(tmp_buf, -gpios); ret = of_get_named_gpio(dev-of_node, tmp_buf, 0); - if (ret = 0) + if (ret = 0) { seq[cpt].value = ret; - else { + } else { if (ret != -EPROBE_DEFER) dev_err(dev, cannot get gpio \%s\\n, seq[cpt].id); @@ -218,26 +218,26 @@ power_seq *power_seq_build(struct device *dev, power_seq_resources *ress, seq-type = pseq-type; switch (pseq-type) { - case POWER_SEQ_REGULATOR: - case POWER_SEQ_GPIO: - case POWER_SEQ_PWM: - if (!(res = power_seq_find_resource(ress, pseq))) { - /* create resource node */ - res = devm_kzalloc(dev, sizeof(*res), - GFP_KERNEL); - if (!res) - return ERR_PTR(-ENOMEM); - memcpy(res-plat,
[RFC][PATCH V2 3/3] tegra: add pwm backlight device tree nodes
Signed-off-by: Alexandre Courbot acour...@nvidia.com --- arch/arm/boot/dts/tegra20-ventana.dts | 31 +++ arch/arm/boot/dts/tegra20.dtsi| 2 +- 2 files changed, 32 insertions(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/tegra20-ventana.dts b/arch/arm/boot/dts/tegra20-ventana.dts index be90544..c67d9e1 100644 --- a/arch/arm/boot/dts/tegra20-ventana.dts +++ b/arch/arm/boot/dts/tegra20-ventana.dts @@ -317,6 +317,37 @@ bus-width = 8; }; + backlight { + compatible = pwm-backlight; + brightness-levels = 0 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 255; + default-brightness-level = 12; + + pwms = pwm 2 500; + pwm-names = backlight; + power-supply = backlight_reg; + enable-gpios = gpio 28 0; + + power-on-sequence = REGULATOR, power, 1, + DELAY, 10, + PWM, backlight, 1, + GPIO, enable, 1; + power-off-sequence = GPIO, enable, 0, +PWM, backlight, 0, +DELAY, 10, +REGULATOR, power, 0; + }; + + backlight_reg: fixedregulator@176 { + compatible = regulator-fixed; + regulator-name = backlight_regulator; + regulator-min-microvolt = 180; + regulator-max-microvolt = 180; + gpio = gpio 176 0; + startup-delay-us = 0; + enable-active-high; + regulator-boot-off; + }; + sound { compatible = nvidia,tegra-audio-wm8903-ventana, nvidia,tegra-audio-wm8903; diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi index 405d167..67a6cd9 100644 --- a/arch/arm/boot/dts/tegra20.dtsi +++ b/arch/arm/boot/dts/tegra20.dtsi @@ -123,7 +123,7 @@ status = disabled; }; - pwm { + pwm: pwm { compatible = nvidia,tegra20-pwm; reg = 0x7000a000 0x100; #pwm-cells = 2; -- 1.7.11.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH V2 1/3] power sequences interpreter for device tree
Some device drivers (panel backlights especially) need to follow precise sequences for powering on and off, involving gpios, regulators, PWMs with a precise powering order and delays to respect between each steps. These sequences are board-specific, and do not belong to a particular driver - therefore they have been performed by board-specific hook functions to far. With the advent of the device tree, we cannot rely of board-specific hooks anymore, but still need a way to implement these sequences in a portable manner. This patch introduces a simple interpreter that can execute such power sequences encoded either as platform data or within the device tree. Signed-off-by: Alexandre Courbot acour...@nvidia.com --- drivers/video/backlight/Makefile| 2 +- drivers/video/backlight/power_seq.c | 298 drivers/video/backlight/pwm_bl.c| 3 +- include/linux/power_seq.h | 96 4 files changed, 397 insertions(+), 2 deletions(-) create mode 100644 drivers/video/backlight/power_seq.c create mode 100644 include/linux/power_seq.h diff --git a/drivers/video/backlight/Makefile b/drivers/video/backlight/Makefile index a2ac9cf..6bff124 100644 --- a/drivers/video/backlight/Makefile +++ b/drivers/video/backlight/Makefile @@ -28,7 +28,7 @@ obj-$(CONFIG_BACKLIGHT_OMAP1) += omap1_bl.o obj-$(CONFIG_BACKLIGHT_PANDORA)+= pandora_bl.o obj-$(CONFIG_BACKLIGHT_PROGEAR) += progear_bl.o obj-$(CONFIG_BACKLIGHT_CARILLO_RANCH) += cr_bllcd.o -obj-$(CONFIG_BACKLIGHT_PWM)+= pwm_bl.o +obj-$(CONFIG_BACKLIGHT_PWM)+= pwm_bl.o power_seq.o obj-$(CONFIG_BACKLIGHT_DA903X) += da903x_bl.o obj-$(CONFIG_BACKLIGHT_DA9052) += da9052_bl.o obj-$(CONFIG_BACKLIGHT_MAX8925)+= max8925_bl.o diff --git a/drivers/video/backlight/power_seq.c b/drivers/video/backlight/power_seq.c new file mode 100644 index 000..f54cb7d --- /dev/null +++ b/drivers/video/backlight/power_seq.c @@ -0,0 +1,298 @@ +#include linux/err.h +#include linux/of_gpio.h +#include linux/device.h +#include linux/slab.h +#include linux/power_seq.h +#include linux/delay.h +#include linux/pwm.h +#include linux/regulator/consumer.h + +#define PWM_SEQ_TYPE(type) [POWER_SEQ_ ## type] = #type +static const char *pwm_seq_types[] = { + PWM_SEQ_TYPE(STOP), + PWM_SEQ_TYPE(DELAY), + PWM_SEQ_TYPE(REGULATOR), + PWM_SEQ_TYPE(PWM), + PWM_SEQ_TYPE(GPIO), +}; +#undef PWM_SEQ_TYPE + +static bool power_seq_step_run(struct power_seq_step *step) +{ + switch (step-type) { + case POWER_SEQ_DELAY: + msleep(step-parameter); + break; + case POWER_SEQ_REGULATOR: + if (step-parameter) + regulator_enable(step-resource-regulator); + else + regulator_disable(step-resource-regulator); + break; + case POWER_SEQ_PWM: + if (step-parameter) + pwm_enable(step-resource-pwm); + else + pwm_disable(step-resource-pwm); + break; + case POWER_SEQ_GPIO: + gpio_set_value_cansleep(step-resource-gpio, step-parameter); + break; + /* should never happen since we verify the data when building it */ + default: + return -EINVAL; + } + + return 0; +} + +int power_seq_run(power_seq *seq) +{ + int err; + + if (!seq) return 0; + + while (seq-type != POWER_SEQ_STOP) { + if ((err = power_seq_step_run(seq++))) { + return err; + } + } + + return 0; +} + +static int of_parse_power_seq_step(struct device *dev, struct property *prop, + struct platform_power_seq_step *seq, + int max_steps) +{ + void *value = prop-value; + void *end = prop-value + prop-length; + int slen, smax, cpt = 0, i, ret; + char tmp_buf[32]; + + while (value end cpt max_steps) { + smax = value - end; + slen = strnlen(value, end - value); + + /* Unterminated string / not a string? */ + if (slen = end - value) + goto invalid_seq; + + /* Find a matching sequence step type */ + for (i = 0; i POWER_SEQ_MAX; i++) + if (!strcmp(value, pwm_seq_types[i])) + break; + + if (i = POWER_SEQ_MAX) + goto unknown_step; + + value += slen + 1; + + seq[cpt].type = i; + switch (seq[cpt].type) { + case POWER_SEQ_DELAY: + /* integer parameter */ + seq[cpt].parameter = be32_to_cpup(value); + value += sizeof(__be32); + break; + case POWER_SEQ_REGULATOR: +
Re: [PATCH] pwm-backlight: add regulator and GPIO support
On 07/09/2012 02:19 PM, Jingoo Han wrote: I couldn't agree with Stephen Warren more. Could you support DT and non-DT case for backwards compatibility? Both cases are handled in the new version I just sent. I hope all other concerns have also been addressed properly. If I forgot something please ping me. Alex. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC 1/2] kvm vcpu: Note down pause loop exit
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Noting pause loop exited vcpu helps in filtering right candidate to yield. Yielding to same vcpu may result in more wastage of cpu. From: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- arch/x86/include/asm/kvm_host.h |7 +++ arch/x86/kvm/svm.c |1 + arch/x86/kvm/vmx.c |1 + arch/x86/kvm/x86.c |4 +++- 4 files changed, 12 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index db7c1f2..857ca68 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -484,6 +484,13 @@ struct kvm_vcpu_arch { u64 length; u64 status; } osvw; + + /* Pause loop exit optimization */ + struct { + bool pause_loop_exited; + bool dy_eligible; + } plo; + }; struct kvm_lpage_info { diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f75af40..a492f5d 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm) static int pause_interception(struct vcpu_svm *svm) { + svm-vcpu.arch.plo.pause_loop_exited = true; kvm_vcpu_on_spin((svm-vcpu)); return 1; } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 32eb588..600fb3c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4945,6 +4945,7 @@ out: static int handle_pause(struct kvm_vcpu *vcpu) { skip_emulated_instruction(vcpu); + vcpu-arch.plo.pause_loop_exited = true; kvm_vcpu_on_spin(vcpu); return 1; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index be6d549..07dbd14 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (req_immediate_exit) smp_send_reschedule(vcpu-cpu); - + vcpu-arch.plo.pause_loop_exited = false; kvm_guest_enter(); if (unlikely(vcpu-arch.switch_db_regs)) { @@ -6168,6 +6168,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) BUG_ON(vcpu-kvm == NULL); kvm = vcpu-kvm; + vcpu-arch.plo.pause_loop_exited = false; + vcpu-arch.plo.dy_eligible = true; vcpu-arch.emulate_ctxt.ops = emulate_ops; if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu)) vcpu-arch.mp_state = KVM_MP_STATE_RUNNABLE; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC 0/2] kvm: Improving directed yield in PLE handler
Currently Pause Looop Exit (PLE) handler is doing directed yield to a random VCPU on PL exit. Though we already have filtering while choosing the candidate to yield_to, we can do better. Problem is, for large vcpu guests, we have more probability of yielding to a bad vcpu. We are not able to prevent directed yield to same guy who has done PL exit recently, who perhaps spins again and wastes CPU. Fix that by keeping track of who has done PL exit. So The Algorithm in series give chance to a VCPU which has: (a) Not done PLE exit at all (probably he is preempted lock-holder) (b) VCPU skipped in last iteration because it did PL exit, and probably has become eligible now (next eligible lock holder) Future enhancemnets: (1) Currently we have a boolean to decide on eligibility of vcpu. It would be nice if I get feedback on guest (32 vcpu) whether we can improve better with integer counter. (with counter = say f(log n )). (2) We have not considered system load during iteration of vcpu. With that information we can limit the scan and also decide whether schedule() is better. [ I am able to use #kicked vcpus to decide on this But may be there are better ideas like information from global loadavg.] (3) We can exploit this further with PV patches since it also knows about next eligible lock-holder. Summary: There is a huge improvement for moderate / no overcommit scenario for kvm based guest on PLE machine (which is difficult ;) ). Result: Base : kernel 3.5.0-rc5 with Rik's Ple handler fix Machine : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz, 4 numa node, 256GB RAM, 32 core machine Host: enterprise linux gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) with test kernels Guest: fedora 16 with 32 vcpus 8GB memory. Benchmarks: 1) kernbench: kernbench-0.5 (kernbench -f -H -M -o 2*vcpu) Very first run in kernbench is omitted. 2) sysbench: 0.4.12 sysbench --test=oltp --db-driver=pgsql prepare sysbench --num-threads=2*vcpu --max-requests=10 --test=oltp --oltp-table-size=50 --db-driver=pgsql --oltp-read-only run Note that driver for this pgsql. 3) ebizzy: release 0.3 cmd: ebizzy -S 120 1) kernbench (time in sec lesser is better) +---+---+---++---+ base_rikstdev patched stdev %improve +---+---+---++---+ 1x 49.2300 1.0171 38.3792 1.3659 28.27261% 2x 91.9358 1.7768 85.8842 1.6654 7.04623% +---+---+---++---+ 2) sysbench (time in sec lesser is better) +---+---+---++---+ base_rikstdev patched stdev %improve +---+---+---++---+ 1x 12.1623 0.0942 12.1674 0.3126-0.04192% 2x 14.3069 0.8520 14.1879 0.6811 0.83874% +---+---+---++---+ Note that 1x scenario differs in only third decimal place and degradation/improvemnet for sysbench will not be seen even with higher confidence interval. 3) ebizzy (records/sec more is better) +---+---+---++---+ base_rikstdev patched stdev %improve +---+---+---++---+ 1x 1129.2500 28.67932316.625053.0066 105.14722% 2x 1892.3750 75.11122386.5000 168.8033 26.11137% +---+---+---++---+ kernbench 1x: 4 fast runs = 12 runs avg kernbench 2x: 4 fast runs = 12 runs avg sysbench 1x: 8runs avg sysbench 2x: 8runs avg ebizzy 1x: 8runs avg ebizzy 2x: 8runs avg Thanks Vatsa and Srikar for brainstorming discussions regarding optimizations. Raghavendra K T (2): kvm vcpu: Note down pause loop exit kvm PLE handler: Choose better candidate for directed yield arch/s390/include/asm/kvm_host.h |5 + arch/x86/include/asm/kvm_host.h |9 - arch/x86/kvm/svm.c |1 + arch/x86/kvm/vmx.c |1 + arch/x86/kvm/x86.c | 18 +- virt/kvm/kvm_main.c |3 +++ 6 files changed, 35 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC 2/2] kvm PLE handler: Choose better candidate for directed yield
From: Raghavendra K T raghavendra...@linux.vnet.ibm.com Currently PLE handler can repeatedly do a directed yield to same vcpu that has recently done PL exit. This can degrade the performance Try to yield to most eligible guy instead, by alternate yielding. Precisely, give chance to a VCPU which has: (a) Not done PLE exit at all (probably he is preempted lock-holder) (b) VCPU skipped in last iteration because it did PL exit, and probably has become eligible now (next eligible lock holder) Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- arch/s390/include/asm/kvm_host.h |5 + arch/x86/include/asm/kvm_host.h |2 +- arch/x86/kvm/x86.c | 14 ++ virt/kvm/kvm_main.c |3 +++ 4 files changed, 23 insertions(+), 1 deletions(-) diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index dd17537..884f2c4 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -256,5 +256,10 @@ struct kvm_arch{ struct gmap *gmap; }; +static inline bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *v) +{ + return true; +} + extern int sie64a(struct kvm_s390_sie_block *, u64 *); #endif diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 857ca68..ce01db3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -962,7 +962,7 @@ extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn); void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err); int kvm_is_in_guest(void); - +bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu); void kvm_pmu_init(struct kvm_vcpu *vcpu); void kvm_pmu_destroy(struct kvm_vcpu *vcpu); void kvm_pmu_reset(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 07dbd14..24ceae8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6623,6 +6623,20 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu) kvm_x86_ops-interrupt_allowed(vcpu); } +bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu) +{ + bool eligible; + + eligible = !vcpu-arch.plo.pause_loop_exited || + (vcpu-arch.plo.pause_loop_exited +vcpu-arch.plo.dy_eligible); + + if (vcpu-arch.plo.pause_loop_exited) + vcpu-arch.plo.dy_eligible = !vcpu-arch.plo.dy_eligible; + + return eligible; +} + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7e14068..519321a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1595,6 +1595,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me) continue; if (waitqueue_active(vcpu-wq)) continue; + if (!kvm_arch_vcpu_check_and_update_eligible(vcpu)) { + continue; + } if (kvm_vcpu_yield_to(vcpu)) { kvm-last_boosted_vcpu = i; yielded = 1; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Antw: Re: /sys and access(2): Correctly implemented?
Ryan Mallon rmal...@gmail.com schrieb am 09.07.2012 um 01:24 in Nachricht 4ffa16b6.9050...@gmail.com: On 06/07/12 16:27, Ulrich Windl wrote: Hi! Recently I found a problem with the command (kernel 3.0.34-0.7-default from SLES 11 SP2, run as root): test -r $file cat $file emitting Permission denied Investigating, I found that test actually uses access() to check for permissions. Unfortunately there are some files in /sys that have write-only permission bits set (e.g. /sys/devices/system/cpu/probe). ~ # ll /sys/devices/system/cpu/probe --w--- 1 root root 4096 Jun 29 12:43 /sys/devices/system/cpu/probe ~ # F=/sys/devices/system/cpu/probe ~ # test $F cat $F cat: /sys/devices/system/cpu/probe: Permission denied Looks like you have a typo here, I think you wanted test -r $F, not test $F, the latter will just evaluate $F as an expression which will be true, and so you get the permission denied error running cat. Hi! You are right: It's a typo, but only in the message; the actual test was done correctly, and the outcome is quite the same. Using test -r $F on a write-only sysfs file correctly returns false on my machine (Ubuntu 10.04.4 LTS/2.6.32-41-generic). Not here, unfortunately: # ll /sys/devices/system/cpu/probe --w--- 1 root root 4096 Jul 2 11:52 /sys/devices/system/cpu/probe # F=/sys/devices/system/cpu/probe # test -r $F cat $F cat: /sys/devices/system/cpu/probe: Permission denied # uname -a Linux h07 2.6.32.59-0.3-default #1 SMP 2012-04-27 11:14:44 +0200 x86_64 x86_64 x86_64 GNU/Linux Regards, Ulrich -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] pwm: Use pr_* functions in pwm-samsung.c file
On Fri, Jul 06, 2012 at 02:43:50PM +0530, Sachin Kamat wrote: Replace printk with pr_* functions to avoid checkpatch warnings. Signed-off-by: Sachin Kamat sachin.ka...@linaro.org --- drivers/pwm/pwm-samsung.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) Applied, thanks, Thierry diff --git a/drivers/pwm/pwm-samsung.c b/drivers/pwm/pwm-samsung.c index 35fa0e8..d103865 100644 --- a/drivers/pwm/pwm-samsung.c +++ b/drivers/pwm/pwm-samsung.c @@ -11,6 +11,8 @@ * the Free Software Foundation; either version 2 of the License. */ +#define pr_fmt(fmt) pwm-samsung: fmt + #include linux/export.h #include linux/kernel.h #include linux/platform_device.h @@ -340,13 +342,13 @@ static int __init pwm_init(void) clk_scaler[1] = clk_get(NULL, pwm-scaler1); if (IS_ERR(clk_scaler[0]) || IS_ERR(clk_scaler[1])) { - printk(KERN_ERR %s: failed to get scaler clocks\n, __func__); + pr_err(failed to get scaler clocks\n); return -EINVAL; } ret = platform_driver_register(s3c_pwm_driver); if (ret) - printk(KERN_ERR %s: failed to add pwm driver\n, __func__); + pr_err(failed to add pwm driver\n); return ret; } -- 1.7.4.1 pgpsYSC8a6JMF.pgp Description: PGP signature
Re: [PATCHv3] pwm_backlight: pass correct brightness to callback
On Mon, Jul 09, 2012 at 03:04:23PM +0900, Alexandre Courbot wrote: pwm_backlight_update_status calls the notify() and notify_after() callbacks before and after applying the new PWM settings. However, if brightness levels are used, the brightness value will be changed from the index into the levels array to the PWM duty cycle length before being passed to notify_after(), which results in inconsistent behavior. Signed-off-by: Alexandre Courbot acour...@nvidia.com --- drivers/video/backlight/pwm_bl.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) Applied, with a minor stylistic fixup adding a blank line after the duty_cycle variable declaration. Thanks. Thierry diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c index 057389d..be48517 100644 --- a/drivers/video/backlight/pwm_bl.c +++ b/drivers/video/backlight/pwm_bl.c @@ -54,14 +54,17 @@ static int pwm_backlight_update_status(struct backlight_device *bl) pwm_config(pb-pwm, 0, pb-period); pwm_disable(pb-pwm); } else { + int duty_cycle; if (pb-levels) { - brightness = pb-levels[brightness]; + duty_cycle = pb-levels[brightness]; max = pb-levels[max]; + } else { + duty_cycle = brightness; } - brightness = pb-lth_brightness + - (brightness * (pb-period - pb-lth_brightness) / max); - pwm_config(pb-pwm, brightness, pb-period); + duty_cycle = pb-lth_brightness + + (duty_cycle * (pb-period - pb-lth_brightness) / max); + pwm_config(pb-pwm, duty_cycle, pb-period); pwm_enable(pb-pwm); } -- 1.7.11.1 pgpGgEqB91z2W.pgp Description: PGP signature
Re: [PATCH RFC 1/2] kvm vcpu: Note down pause loop exit
On 07/09/2012 11:50 AM, Raghavendra K T wrote: Signed-off-by: Raghavendra K Traghavendra...@linux.vnet.ibm.com Noting pause loop exited vcpu helps in filtering right candidate to yield. Yielding to same vcpu may result in more wastage of cpu. From: Raghavendra K Traghavendra...@linux.vnet.ibm.com --- Oops. Sorry some how sign-off and from interchanged.. interchanged -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] fat: Support fallocate on fat.
Hi. Ogawa. 2012/7/8, OGAWA Hirofumi hirof...@mail.parknet.co.jp: Namjae Jeon linkinj...@gmail.com writes: +/* + * preallocate space for a file. This implements fat's fallocate file + * operation, which gets called from sys_fallocate system call. User + * space requests len bytes at offset.If FALLOC_FL_KEEP_SIZE is set + * we just allocate clusters without zeroing them out.Otherwise we + * allocate and zero out clusters via an expanding truncate. + */ +static long fat_fallocate(struct file *file, int mode, +loff_t offset, loff_t len) +{ +int err = 0; +struct inode *inode = file-f_mapping-host; +int cluster, nr_cluster, fclus, dclus, free_bytes, nr_bytes; +struct super_block *sb = inode-i_sb; +struct msdos_sb_info *sbi = MSDOS_SB(sb); What happens if called for directory? And does this guarantee it never expose the uninitialized data userland? It cannot be called for directory because in do_fallocate (which calls fat_fallocate), there is check to open the file in write mode. If it is opened in read only mode, it returns bad file descriptor: - do_fallocate() { ... .. if (!(file-f_mode FMODE_WRITE)) return -EBADF; .. - We cannot open a directory in write mode. So fallocate can never be called for a directory. As long as user appends data to file (instead of seeking to an offset greater than inode-i_size and writing to it), it can guarantee. But if user use random offset, it can not.. +/* No support for hole punch or other fallocate flags. */ +if (mode ~FALLOC_FL_KEEP_SIZE) +return -EOPNOTSUPP; +if ((offset + len) = MSDOS_I(inode)-mmu_private) { +fat_msg(sb, KERN_ERR, +fat_fallocate():Blocks already allocated); +return -EINVAL; +} Please don't output any message by user error. And EINVAL is right behavior if (offset + len) allocated size? Sounds like strange design. Okay, I will remove message. and I will change return sucess instead of EINVAL. +if ((mode FALLOC_FL_KEEP_SIZE)) { +/* First compute the number of clusters to be allocated */ +if (inode-i_size 0) { +err = fat_get_cluster(inode, FAT_ENT_EOF, +fclus, dclus); +if (err 0) { +fat_msg(sb, KERN_ERR, + fat_fallocate():fat_get_cluster() error); Use %s and __func__. And looks like the error is normal (e.g. ENOSPC), so I don't see why it needs to report. okay, I will remove it. [...] +/* + * calculate i_blocks and mmu_private from the actual number of + * allocated clusters instead of doing it from file size.This ensures + * that the preallocated disk space using FALLOC_FL_KEEP_SIZE is + * persistent across remounts and writes go into the allocated clusters. + */ +fat_calc_dir_size(inode); Looks like the wrong. If you didn't initialize preallocated space, the data never be exposed to userland. It is security bug. As explained above, if we do append write instead of seeking into a random offset, there is no security risk. The main disadvantage with initializing the preallocated space (as is done in case of without FALLOC_FL_KEEP_SIZE ) is it takes long time for bigger allocation sizes. It took ~70 seconds to preallocate 2GB on our target if FALLOC_FL_KEEP_SIZE is not set. Thanks. inode-i_blocks = ((inode-i_size + (sbi-cluster_size - 1)) ~((loff_t)sbi-cluster_size - 1)) 9; +MSDOS_I(inode)-mmu_private = inode-i_size; +/* restore i_size */ +inode-i_size = le32_to_cpu(de-size); fat_time_fat2unix(sbi, inode-i_mtime, de-time, de-date, 0); if (sbi-options.isvfat) { -- OGAWA Hirofumi hirof...@mail.parknet.co.jp -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net-next 1/2] r8169: support RTL8106E
Francois, what would you like me to do with these two patches? I haven't seen full ACKs from you yet. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] ieee802154: verify packet size before trying to allocate it
From: Sasha Levin levinsasha...@gmail.com Date: Mon, 2 Jul 2012 13:29:55 +0200 Currently when sending data over datagram, the send function will attempt to allocate any size passed on from the userspace. We should make sure that this size is checked and limited. We'll limit it to the MTU of the device, which is checked later anyway. Signed-off-by: Sasha Levin levinsasha...@gmail.com Applied. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] x86, boot: Optimize the elf header handling.
H. Peter Anvin h...@zytor.com writes: On 07/01/2012 01:40 PM, Eric W. Biederman wrote: So I have tracked down part of the crazyness. CONFIG_RODATA actually uses 2MB alignment, making -z max_page_size=4096 a bit questionable. Questionable how? It's not really like it matters since we're not going to mmap the ELF. Questionable as in the current elf loader in misc.c relies on the fact that there is an almost a fixed offset between physical addresses and file offsets for all of the PT_LOAD segments in the Elf header. In fact CONFIG_RODATA CONFIG_X86_64 CONFIG_SMP in combination with -z max_page_size=4096 fails to boot. The Elf loader in misc.c starts coping from lower addresses to higher addresses, instead of higher addresses to lower and that fails miserably. But -z max_page_size=4096 is not the problem the ELF loader is. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Question] sched/rt_mutex: re-enqueue_task on rt_mutex_setprio()
On Mon, 2012-07-09 at 09:50 +0900, Namhyung Kim wrote: On Sat, 07 Jul 2012 21:29:19 -0400, Steven Rostedt wrote: On Sat, 2012-07-07 at 14:44 +0900, Namhyung Kim wrote: Hi, I have a question on the code below: void rt_mutex_setprio(struct task_struct *p, int prio) { ... if (on_rq) enqueue_task(rq, p, oldprio prio ? ENQUEUE_HEAD : 0); When enqueueing @p with new @prio, it seems put @p at the head of a rq if appropriate. I guess it's the case of boosting @p with higher priority, right? Actually, no. We put @p at the head of the queue when unboosting. If a task is going from a high priority into a lower priority, it is still treated as important for that priority, and is put to the front of the queue (it was just higher than everything else on that queue). But if we are boosting a task from a low priority, why put it to the head of other tasks of its new priority, when those tasks were just higher than this task, and this task is now just an equal. Thanks for the explanation. (Isn't it worth getting commented?) :) Possibly, note that this part is well spec'ed by POSIX, see http://pubs.opengroup.org/onlinepubs/009695299/functions/xsh_chap02_08.html SCHED_FIFO.8 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/4] x86 boot: Jump to the entry point address in the elf header.
Since we have the kernel's entry point stored in the ELF header use it, and stop hardcoding the value. Signed-off-by: Eric W. Biederman ebied...@xmission.com --- arch/x86/boot/compressed/head_32.S |2 +- arch/x86/boot/compressed/head_64.S |2 +- arch/x86/boot/compressed/misc.c| 16 +--- 3 files changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S index c85e3ac..1b15e2c 100644 --- a/arch/x86/boot/compressed/head_32.S +++ b/arch/x86/boot/compressed/head_32.S @@ -211,7 +211,7 @@ relocated: * Jump to the decompressed kernel. */ xorl%ebx, %ebx - jmp *%ebp + jmp *%eax /* * Stack and heap for uncompression diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index 87e03a1..9b8d782 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -337,7 +337,7 @@ relocated: /* * Jump to the decompressed kernel. */ - jmp *%rbp + jmp *%rax .data gdt: diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index 7116dcb..fc96c3e 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -273,7 +273,7 @@ static void error(char *x) asm(hlt); } -static void parse_elf(void *output) +static void *parse_elf(void *output) { #ifdef CONFIG_X86_64 Elf64_Ehdr ehdr; @@ -323,13 +323,15 @@ static void parse_elf(void *output) } free(phdrs); + return output + (ehdr.e_entry - LOAD_PHYSICAL_ADDR); } -asmlinkage void decompress_kernel(void *rmode, memptr heap, - unsigned char *input_data, - unsigned long input_len, - unsigned char *output) +asmlinkage void *decompress_kernel(void *rmode, memptr heap, + unsigned char *input_data, + unsigned long input_len, + unsigned char *output) { + void *entry; real_mode = rmode; if (cmdline_find_option_bool(quiet)) @@ -372,8 +374,8 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap, if (!quiet) putstr(\nDecompressing Linux... ); decompress(input_data, input_len, NULL, NULL, output, NULL, error); - parse_elf(output); + entry = parse_elf(output); if (!quiet) putstr(done.\nBooting the kernel.\n); - return; + return entry; } -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/4] x86 boot: Optimize the elf header handling.
Create a space for the elf headers at the begginng of the kernels image in memory. - Rework arch/x86/kernel/vmlinux.lds.S so that we allow room for the ELF header in the loaded image. This removes the need in the ELF executalbe to insert padding between the ELf headers and the data of the first program segment. This reduces the size of vmlinux by 2MB on x86_64. This removes an overlap of the ELF header and kernel text in arch/x86/boot/compressed that required code to moved. - Move the symbol _text outside of the .text section, and add the fixups in relocs.c to add relocations against _text. This allows the symbol _text to come before the ELF header and effectively including the ELF header in the text section. If this isn't done _text moves 344 bytes in memory on x86_64 and creates subtle breakage in routines like cleanup_highmap, which assume _text is at the beginning of the kernels memory and that _text is 4K+ aligned. The current usage of the symbol _text is already that _text specifies the beginning of the kernel's memory and that _stext specifies where the kernel's code actually starts. Signed-off-by: Eric W. Biederman ebied...@xmission.com --- arch/x86/kernel/vmlinux.lds.S |9 + arch/x86/tools/relocs.c |1 + 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 22a1530..d6e1a44 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -68,7 +68,7 @@ jiffies_64 = jiffies; #endif PHDRS { - text PT_LOAD FLAGS(5); /* R_E */ + text PT_LOAD FLAGS(5) FILEHDR; /* R_E */ data PT_LOAD FLAGS(6); /* RW_ */ #ifdef CONFIG_X86_64 #ifdef CONFIG_SMP @@ -82,16 +82,17 @@ PHDRS { SECTIONS { #ifdef CONFIG_X86_32 -. = LOAD_OFFSET + LOAD_PHYSICAL_ADDR; + _text = LOAD_OFFSET + LOAD_PHYSICAL_ADDR; +. = LOAD_OFFSET + LOAD_PHYSICAL_ADDR + SIZEOF_HEADERS; phys_startup_32 = startup_32 - LOAD_OFFSET; #else -. = __START_KERNEL; + _text = __START_KERNEL; +. = __START_KERNEL + SIZEOF_HEADERS; phys_startup_64 = startup_64 - LOAD_OFFSET; #endif /* Text and read-only data */ .text : AT(ADDR(.text) - LOAD_OFFSET) { - _text = .; /* bootstrapping code */ HEAD_TEXT #ifdef CONFIG_X86_32 diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c index 5a1847d..6f32b7b 100644 --- a/arch/x86/tools/relocs.c +++ b/arch/x86/tools/relocs.c @@ -72,6 +72,7 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = { __end_rodata| __initramfs_start| (jiffies|jiffies_64)| + _text| _end)$ }; -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
linux-next: Tree for July 9
Hi all, Changes since 20120706: I have not done the powerpc allyesconfig build today as it is too broken. Undropped tree: gpio-lw The jdelvare-hwmon tree lost its conflict. The v4l-dvb tree gained a build failure so I used the version from next-20120706. The infiniband tree lost its build failure but gained a conflict against Linus' tree. The l2-mtd tree lost its conflict. The input-mt tree lost its conflict. The mfd tree gained a build failure so I used the version from next-20120706. The dt-rh tree lost its conflict. The gpio-lw tree lost its build failure. The arm-soc tree gained a conflict against the gpio-lw tree. I have still reverted 3 commits from the signal tree at the request of the arm maintainer. The akpm tree lost a commit that turned up elsewhere. I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use git pull to do so as that will try to merge the new linux-next release with the old one. You should use git fetch as mentioned in the FAQ on the wiki (see below). You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the final fixups (if any), it is also built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc, sparc64 and arm defconfig. These builds also have CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and CONFIG_DEBUG_INFO disabled when necessary. Below is a summary of the state of the merge. We are up to 196 trees (counting Linus' and 26 trees of patches pending for Linus' tree), more are welcome (even if they are currently empty). Thanks to those who have contributed, and to those who haven't, please do. Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. There is a wiki covering stuff to do with linux-next at http://linux.f-seidel.de/linux-next/pmwiki/ . Thanks to Frank Seidel. -- Cheers, Stephen Rothwells...@canb.auug.org.au $ git checkout master $ git reset --hard stable Merging origin/master (8c84bf4 Merge branch 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup) Merging fixes/master (9023a40 Merge tag 'mmc-fixes-for-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc) Merging kbuild-current/rc-fixes (f8f5701 Linux 3.5-rc1) Merging arm-current/fixes (09b2ad1 ARM: fix warning caused by wrongly typed arm_dma_limit) Merging m68k-current/for-linus (d8ce726 m68k: Use generic strncpy_from_user(), strlen_user(), and strnlen_user()) Merging powerpc-merge/merge (2f584a1 powerpc/kvm: sldi should be sld) Merging sparc/master (6a8ead0 sparc32: Remove superfluous extern declarations for prom_*() functions) Merging net/master (9e85a6f Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux) Merging sound-current/for-linus (9e9b594 ALSA: usb-audio: Fix the first PCM interface assignment) Merging pci-current/for-linus (314489b Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc) Merging wireless/master (76cf5c7 iwlegacy: don't mess up the SCD when removing a key) Merging driver-core.current/driver-core-linus (68b6507 kmsg: make sure all messages reach a newly registered boot console) Merging tty.current/tty-linus (6b16351 Linux 3.5-rc4) Merging usb.current/usb-linus (b086b6b USB: cdc-wdm: fix lockup on error in wdm_read) Merging staging.current/staging-linus (6887a41 Linux 3.5-rc5) Merging char-misc.current/char-misc-linus (6b16351 Linux 3.5-rc4) Merging input-current/for-linus (9b7e31b Input: request threaded-only IRQs with IRQF_ONESHOT) Merging md-current/for-linus (1068411 md/raid10: fix careless build error) Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix) Merging crypto-current/master (c475c06 hwrng: atmel-rng - fix data valid check) Merging ide/master (39a50b4 Merge branch 'hfsplus') Merging dwmw2/master (244dc4e Merge git://git.infradead.org/users/dwmw2/random-2.6) Merging sh-current/sh-fixes-for-linus (64941d8 sh: Fix up se7721 GPIOLIB=y build warnings.) Merging irqdomain-current/irqdomain/merge (15e06bf irqdomain: Fix debugfs formatting) Merging devicetree-current/devicetree/merge (4e8383b of: release node fix for of_parse_phandle_with_args)
[PATCH 3/4] x86 boot: When building vmlinux.bin properly precompute the memory image
The ELF loader in arch/x86/boot/compressed/misc.c is extremely fragile, as it copies the ELF executable over itself to put the code and data in their proper place. Squeezing unneeded space out of vmlinux by passing -z max-page-size 4096 to ld was enough to render the kernel unbootable. I explored creating a flush function for our current crop of kernel decompressors. While that works it has the very unfortunate side effect of needing a much larger BOOT_HEAP_SIZE. A couple of our supported decompressors in that mode malloc 32MB for use during decompression. The other solution is to return to the original design where we created a file known as vmlinux.bin with exactly what we wanted in memory and compressed that. At this point in time there are complications to going back to the original design. - We need to preserve the ELF headers inside the compresed image file for Xen and other interesting bootloaders that open up the bzImage and boot the ELF executable contained inside. - ld will not uniformly produce a file where the file offsets have a constant offset from the in memory addresses. In particular combinations of CONFIG_RODATA and CLONFIG_x86_64 CONFIG_SMP play games with 2MB alignments and the virtual address of functions that cause ld to emit valid ELF executables that do not have a fixed differents betwen file offset and loaded physical address making the ELF executable something that must be procecessed to get an in memory image. - The old solution to creating a memory image objcopy -O binary comes very close but it always strips the ELF header even when the ELF header is explicitly made part of the ELF file. Since all of the prebuilt tools don't work I have written a small program mkelfbin, that generates a memory image by loading an ELF executable into an in memory array. Then the ELF program headers offset fields are adjusted to reflect where in the memory image each program header is referring to. By design this results in program headers with a fixes offset between the file offset and the physical memory address with the file be loaded in memory. With the compressed data being a proper memory image misc.c no longer needs an ELF loader or dangerous copies over itself so those are removed. The result is a simpler more robust boot process, that still retains all of the modern bells and whistles. Signed-off-by: Eric W. Biederman ebied...@xmission.com --- arch/x86/boot/compressed/Makefile | 10 +- arch/x86/boot/compressed/misc.c | 52 +- arch/x86/boot/compressed/misc.h |8 + arch/x86/boot/compressed/mkelfbin.c | 323 +++ 4 files changed, 343 insertions(+), 50 deletions(-) create mode 100644 arch/x86/boot/compressed/mkelfbin.c diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index e398bb5..67b9ae4 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -21,7 +21,7 @@ GCOV_PROFILE := n LDFLAGS := -m elf_$(UTS_MACHINE) LDFLAGS_vmlinux := -T -hostprogs-y:= mkpiggy +hostprogs-y:= mkpiggy mkelfbin HOST_EXTRACFLAGS += -I$(srctree)/tools/include VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \ @@ -36,9 +36,11 @@ $(obj)/vmlinux: $(VMLINUX_OBJS) FORCE $(call if_changed,ld) @: -OBJCOPYFLAGS_vmlinux.bin := -R .comment -S -$(obj)/vmlinux.bin: vmlinux FORCE - $(call if_changed,objcopy) +quiet_cmd_mkelfbin = MKELFBIN $@ + cmd_mkelfbin = $(obj)/mkelfbin $ $@ || ( rm -f $@ ; false ) + +$(obj)/vmlinux.bin: vmlinux $(obj)/mkelfbin FORCE + $(call if_changed,mkelfbin) targets += vmlinux.bin.all vmlinux.relocs diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index fc96c3e..cb374ff 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -275,55 +275,15 @@ static void error(char *x) static void *parse_elf(void *output) { -#ifdef CONFIG_X86_64 - Elf64_Ehdr ehdr; - Elf64_Phdr *phdrs, *phdr; -#else - Elf32_Ehdr ehdr; - Elf32_Phdr *phdrs, *phdr; -#endif - void *dest; - int i; + ehdr_t *ehdr = output; - memcpy(ehdr, output, sizeof(ehdr)); - if (ehdr.e_ident[EI_MAG0] != ELFMAG0 || - ehdr.e_ident[EI_MAG1] != ELFMAG1 || - ehdr.e_ident[EI_MAG2] != ELFMAG2 || - ehdr.e_ident[EI_MAG3] != ELFMAG3) { + if (ehdr-e_ident[EI_MAG0] != ELFMAG0 || + ehdr-e_ident[EI_MAG1] != ELFMAG1 || + ehdr-e_ident[EI_MAG2] != ELFMAG2 || + ehdr-e_ident[EI_MAG3] != ELFMAG3) error(Kernel is not a valid ELF file); - return; - } - - if (!quiet) - putstr(Parsing ELF... ); - - phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum); - if (!phdrs) - error(Failed to allocate space for phdrs); - - memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum); -
[PATCH 4/4] x86 boot: Tell ld the kernel doesn't want 2MB file offset alignment.
By default ld uses 2MB pages and aligns our 3 program segments in the file on 2MB boundaries, creating unnecessarily large uncompressed vmlinux files. Solve this by passing -z max-page-size 4096 to ld. In my test x86_64 SMP test configuration with CONFIG_DEBUG_RODATA enabled, this reduces the size of vmlinux by roughly 5MB from 15141772 bytes to 10210188 bytes. Signed-off-by: Eric W. Biederman ebied...@xmission.com --- arch/x86/Makefile |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 1f25214..b5b31c3 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -120,7 +120,7 @@ avx_instr := $(call as-instr,vxorps %ymm0$(comma)%ymm1$(comma)%ymm2,-DCONFIG_AS_ KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr) KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr) -LDFLAGS := -m elf_$(UTS_MACHINE) +LDFLAGS := -m elf_$(UTS_MACHINE) -z max-page-size=4096 # Speed up the build KBUILD_CFLAGS += -pipe -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5] ubi: Fix bad PEBs reserve caclulation
2012/7/7 Shmulik Ladkani shmulik.ladk...@gmail.com: Many thanks for testing. Could you please verify the crash only occurs with the patch? Can you provide the vmlinux matching this oops, so I may analyze the exact null dereferencing point? It seems to be somewhere in ubi_wl_init, however the patch seems not to affect these parts of ubi... Hi ! I can't reproduce it... Maybe the problem was between the chair and the keyboard. Anyway, if I ran into it again, I'll let you know. Richard. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] x86 boot: Jump to the entry point address in the elf header.
And Peter no rush on these. I have just finished testing and I am pushing the changes out before I forget them. Moving the Elf loader earlier to compile time makes the code a lot more robust. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out
From: cj...@qca.qualcomm.com Date: Wed, 4 Jul 2012 10:51:48 +0800 some people report atl1c could cause system hang with following kernel trace info: --- WARNING: at.../net/sched/sch_generic.c:258 dev_watchdog+0x1db/0x1d0() ... NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out ... --- This is caused by netif_stop_queue calling when cable Link is down. So remove netif_stop_queue, because link_watch will take it over. Signed-off-by: xiong xi...@qca.qualcomm.com Cc: stable sta...@vger.kernel.org Signed-off-by: Cloud Ren cj...@qca.qualcomm.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] x86, boot: Optimize the elf header handling.
Tejun Heo t...@kernel.org writes: Hello, guys. On Sun, Jul 01, 2012 at 11:37:22AM -0700, H. Peter Anvin wrote: If we don't need it, I think we can use -z max-page-size=4096, but we use the PMD alignment for percpu on x86-64; Tejun, does that apply to the .data..percpu section in the executable as well? I don't think the .data..percpu section needs 2M alignment. The percpu data section is only used as init template and actual percpu addresses always go through offsetting against __per_cpu_offset[] - no matter what the vaddrs in the vmlinux are, they get offsetted into 2M aligned linear address if necessary. I think the only alignment .data..percpu needs is cacheline alignment for separating its subsections. Thanks. My basic testing isn't showing any problems. Of course all that changed was where in the vmlinux file not where in physical memory the data was loaded, so problems would really surprise me. Eric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RESEND] Fix a dead loop in async_synchronize_full()
This patch tries to fix a dead loop in async_synchronize_full(), which could be seen when preemption is disabled on a single cpu machine. void async_synchronize_full(void) { do { async_synchronize_cookie(next_cookie); } while (!list_empty(async_running) || ! list_empty(async_pending)); } async_synchronize_cookie() calls async_synchronize_cookie_domain() with async_running as the default domain to synchronize. However, there might be some works in the async_pending list from other domains. On a single cpu system, without preemption, there is no chance for the other works to finish, so async_synchronize_full() enters a dead loop. It seems async_synchronize_full() wants to synchronize all entries in all running lists(domains), so maybe we could just check the entry_count to know whether all works are finished. Currently, async_synchronize_cookie_domain() expects a non-NULL running list ( if NULL, there would be NULL pointer dereference ), so maybe a NULL pointer could be used as an indication for the functions to synchronize all works in all domains. Reported-by: Paul E. McKenney paul...@linux.vnet.ibm.com Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com Tested-by: Paul E. McKenney paul...@linux.vnet.ibm.com Tested-by: Christian Kujau li...@nerdbynature.de --- kernel/async.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/kernel/async.c b/kernel/async.c index bd0c168..32d8dc9 100644 --- a/kernel/async.c +++ b/kernel/async.c @@ -86,6 +86,13 @@ static async_cookie_t __lowest_in_progress(struct list_head *running) { struct async_entry *entry; + if (!running) { /* just check the entry count */ + if (atomic_read(entry_count)) + return 0; /* smaller than any cookie */ + else + return next_cookie; + } + if (!list_empty(running)) { entry = list_first_entry(running, struct async_entry, list); @@ -236,9 +243,7 @@ EXPORT_SYMBOL_GPL(async_schedule_domain); */ void async_synchronize_full(void) { - do { - async_synchronize_cookie(next_cookie); - } while (!list_empty(async_running) || !list_empty(async_pending)); + async_synchronize_cookie_domain(next_cookie, NULL); } EXPORT_SYMBOL_GPL(async_synchronize_full); @@ -258,7 +263,7 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain); /** * async_synchronize_cookie_domain - synchronize asynchronous function calls within a certain domain with cookie checkpointing * @cookie: async_cookie_t to use as checkpoint - * @running: running list to synchronize on + * @running: running list to synchronize on, NULL indicates all lists * * This function waits until all asynchronous function calls for the * synchronization domain specified by the running list @list submitted -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Question] sched/rt_mutex: re-enqueue_task on rt_mutex_setprio()
On Mon, Jul 9, 2012 at 3:48 PM, Peter Zijlstra pet...@infradead.org wrote: On Mon, 2012-07-09 at 09:50 +0900, Namhyung Kim wrote: On Sat, 07 Jul 2012 21:29:19 -0400, Steven Rostedt wrote: On Sat, 2012-07-07 at 14:44 +0900, Namhyung Kim wrote: Hi, I have a question on the code below: void rt_mutex_setprio(struct task_struct *p, int prio) { ... if (on_rq) enqueue_task(rq, p, oldprio prio ? ENQUEUE_HEAD : 0); When enqueueing @p with new @prio, it seems put @p at the head of a rq if appropriate. I guess it's the case of boosting @p with higher priority, right? Actually, no. We put @p at the head of the queue when unboosting. If a task is going from a high priority into a lower priority, it is still treated as important for that priority, and is put to the front of the queue (it was just higher than everything else on that queue). But if we are boosting a task from a low priority, why put it to the head of other tasks of its new priority, when those tasks were just higher than this task, and this task is now just an equal. Thanks for the explanation. (Isn't it worth getting commented?) :) Possibly, note that this part is well spec'ed by POSIX, see http://pubs.opengroup.org/onlinepubs/009695299/functions/xsh_chap02_08.html SCHED_FIFO.8 Thanks for the pointer. I need to educate myself a lot more! Thanks, Namhyung -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] netdev/phy: Fixup lockdep warnings in mdio-mux.c
From: David Daney ddaney.c...@gmail.com Date: Wed, 4 Jul 2012 15:06:16 -0700 From: David Daney david.da...@cavium.com With lockdep enabled we get: ... This is a false positive, since we are indeed using 'nested' locking, we need to use mutex_lock_nested(). Now in theory we can stack multiple MDIO multiplexers, but that would require passing the nesting level (which is difficult to know) to mutex_lock_nested(). Instead we assume the simple case of a single level of nesting. Since these are only warning messages, it isn't so important to solve the general case. Signed-off-by: David Daney david.da...@cavium.com Applied to 'net', thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] cgroup: fix panic in netprio_cgroup
From: Gao feng gaof...@cn.fujitsu.com Date: Thu, 5 Jul 2012 17:28:40 +0800 we set max_prioidx to the first zero bit index of prioidx_map in function get_prioidx. So when we delete the low index netprio cgroup and adding a new netprio cgroup again,the max_prioidx will be set to the low index. when we set the high index cgroup's net_prio.ifpriomap,the function write_priomap will call update_netdev_tables to alloc memory which size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1), so the size of array that map-priomap point to is max_prioidx +1, which is low than what we actually need. fix this by adding check in get_prioidx,only set max_prioidx when max_prioidx low than the new prioidx. Signed-off-by: Gao feng gaof...@cn.fujitsu.com Applied. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Antw: Re: /sys and access(2): Correctly implemented?
On 09/07/12 16:23, Ulrich Windl wrote: Ryan Mallon rmal...@gmail.com schrieb am 09.07.2012 um 01:24 in Nachricht 4ffa16b6.9050...@gmail.com: On 06/07/12 16:27, Ulrich Windl wrote: Hi! Recently I found a problem with the command (kernel 3.0.34-0.7-default from SLES 11 SP2, run as root): test -r $file cat $file emitting Permission denied Investigating, I found that test actually uses access() to check for permissions. Unfortunately there are some files in /sys that have write-only permission bits set (e.g. /sys/devices/system/cpu/probe). ~ # ll /sys/devices/system/cpu/probe --w--- 1 root root 4096 Jun 29 12:43 /sys/devices/system/cpu/probe ~ # F=/sys/devices/system/cpu/probe ~ # test $F cat $F cat: /sys/devices/system/cpu/probe: Permission denied Looks like you have a typo here, I think you wanted test -r $F, not test $F, the latter will just evaluate $F as an expression which will be true, and so you get the permission denied error running cat. Hi! You are right: It's a typo, but only in the message; the actual test was done correctly, and the outcome is quite the same. Using test -r $F on a write-only sysfs file correctly returns false on my machine (Ubuntu 10.04.4 LTS/2.6.32-41-generic). Not here, unfortunately: Oops, I missed the bit about you running as root. I get the same results running as root on my machine as you, both for sysfs and regular files. It appears that access(2) as the super-user is might be implementation defined, see: http://pubs.opengroup.org/onlinepubs/95399/functions/access.html http://lists.gnu.org/archive/html/bug-bash/2010-07/msg00071.html However, I can't find any concrete information on it for Linux, and the manpage doesn't mention anything other the the X_OK bit. ~Ryan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] fs/ubifs/orphan.c: remove invalid reference to list iterator variable
From: Julia Lawall julia.law...@lip6.fr If list_for_each_entry, etc complete a traversal of the list, the iterator variable ends up pointing to an address at an offset from the list head, and not a meaningful structure. Thus this value should not be used after the end of the iterator. Replace a field access from orphan by NULL in two places. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // smpl @@ identifier c; expression E; iterator name list_for_each_entry; statement S; @@ list_for_each_entry(c,...) { ... when != break; when forall when strict } ... ( c = E | *c ) // /smpl Signed-off-by: Julia Lawall julia.law...@lip6.fr --- fs/ubifs/orphan.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c index b02734d..cebf17e 100644 --- a/fs/ubifs/orphan.c +++ b/fs/ubifs/orphan.c @@ -176,7 +176,7 @@ int ubifs_orphan_start_commit(struct ubifs_info *c) *last = orphan; last = orphan-cnext; } - *last = orphan-cnext; + *last = NULL; c-cmt_orphans = c-new_orphans; c-new_orphans = 0; dbg_cmt(%d orphans to commit, c-cmt_orphans); @@ -382,7 +382,7 @@ static int consolidate(struct ubifs_info *c) last = orphan-cnext; cnt += 1; } - *last = orphan-cnext; + *last = NULL; ubifs_assert(cnt == c-tot_orphans - c-new_orphans); c-cmt_orphans = cnt; c-ohead_lnum = c-orph_first; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pull request: wireless 2012-07-06
From: John W. Linville linvi...@tuxdriver.com Date: Fri, 6 Jul 2012 15:20:35 -0400 Please let me know if there are problems! This indentation is not correct: commit 01f9cb073c827c60c43f769763b49a2026f1a897 Author: Thomas Huehn tho...@net.t-labs.tu-berlin.de Date: Thu Jun 28 14:39:51 2012 -0700 mwl8k: fix possible race condition in info-control.sta use ... + sta = ieee80211_find_sta_by_ifaddr(hw, wh-addr1, + wh-addr2); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 11/11] MAINTAINERS: add fblog entry
On Sun, Jul 8, 2012 at 11:56 PM, David Herrmann dh.herrm...@googlemail.com wrote: Add myself as maintainer for the fblog driver to the MAINTAINERS file. Signed-off-by: David Herrmann dh.herrm...@googlemail.com --- MAINTAINERS | 6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index ae8fe46..249b02a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2854,6 +2854,12 @@ F: drivers/video/ F: include/video/ F: include/linux/fb.h +FRAMEBUFFER LOG DRIVER +M: David Herrmann dh.herrm...@googlemail.com +L: linux-ser...@vger.kernel.org Why linux-serial, and not linux-fbdev? +S: Maintained +F: drivers/video/console/fblog.c + FREESCALE DMA DRIVER M: Li Yang le...@freescale.com M: Zhang Wei z...@zh-kernel.org Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 04/11] fbdev: export get_fb_info()/put_fb_info()
On Sun, Jul 8, 2012 at 11:56 PM, David Herrmann dh.herrm...@googlemail.com wrote: --- a/drivers/video/fbmem.c +++ b/drivers/video/fbmem.c @@ -46,7 +46,7 @@ static DEFINE_MUTEX(registration_lock); struct fb_info *registered_fb[FB_MAX] __read_mostly; int num_registered_fb __read_mostly; -static struct fb_info *get_fb_info(unsigned int idx) +struct fb_info *get_fb_info(unsigned int idx) { struct fb_info *fb_info; @@ -61,14 +61,16 @@ static struct fb_info *get_fb_info(unsigned int idx) return fb_info; } +EXPORT_SYMBOL(get_fb_info); EXPORT_SYMBOL_GPL? -static void put_fb_info(struct fb_info *fb_info) +void put_fb_info(struct fb_info *fb_info) { if (!atomic_dec_and_test(fb_info-count)) return; if (fb_info-fbops-fb_destroy) fb_info-fbops-fb_destroy(fb_info); } +EXPORT_SYMBOL(put_fb_info); EXPORT_SYMBOL_GPL? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fwd: Hid over I2C and ACPI interaction
On 2012年07月09日 12:02, Moore, Robert wrote: These are already defined in acpica - in the file acrestyp.h ACPI_RESOURCE_FIXED_DMA FixedDma; ACPI_RESOURCE_GPIO Gpio; ACPI_RESOURCE_I2C_SERIALBUS I2cSerialBus; ACPI_RESOURCE_SPI_SERIALBUS SpiSerialBus; ACPI_RESOURCE_UART_SERIALBUSUartSerialBus; ACPI_RESOURCE_COMMON_SERIALBUS CommonSerialBus; Yeah. Thanks for Bob's reminder. We can reuse these macros. -Original Message- From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi- ow...@vger.kernel.org] On Behalf Of Lan Tianyu Sent: Sunday, July 08, 2012 8:25 PM To: Mika Westerberg Cc: Zhang, Rui; kh...@linux-fr.org; ben-li...@fluff.org; w.s...@pengutronix.de; l...@kernel.org; linux-a...@vger.kernel.org; linux- i...@vger.kernel.org; linux-kernel@vger.kernel.org; jkos...@suse.cz; cha...@enac.fr; jj_d...@emc.com.tw; bhelg...@google.com; abe...@mit.edu Subject: Re: Fwd: Hid over I2C and ACPI interaction On 2012年07月06日 13:52, Mika Westerberg wrote: On Thu, Jul 05, 2012 at 03:01:57PM +0800, Zhang Rui wrote: +Note that although these are ACPI devices, we prefer to use PnP drivers for them, +this is because: +1. all the non-ACPI-predefined Devices are exported as PnP devices as well +2. PnP bus is a well designed bus. Probing via PnP layer saves a lot of work + for the device driver, e.g. getting parsing ACPI resources. (Nice BKM, thanks for sharing) I have few questions about using PnP drivers instead of pure ACPI drivers. ACPI 5.0 defined some new resources, for example Fixed DMA descriptor that has information about the request line + channel for the device to use. Hovewer, PnP drivers pass resources as 'struct resource', which basically only has start and end - how do you represent all this new stuff using 'struct resource'? I think we can add new interface to get acpi specific resources. e.g struct acpi_resource pnp_get_acpi_resource(...). When the pnp acpi devices were initialized, put those acpi specific resources into a new resource list pnpdev-acpi_resources. What pnp_get_acpi_resource does is to get specified type acpi resources and return. We also need to define some acpi resource types. ACPI_RESOURCE_DMA ACPI_RESOURCE_I2C_SERIALBUS ACPI_RESOURCE_SPI_SERIALBUS ACPI_RESOURCE_UART_SERIALBUS ACPI_RESOURCE_COMMON_SERIALBUS ... How about this? welcome to comments. Or should we use acpi_walk_resources() where 'struct resource' is not suitable? -- Best Regards Tianyu Lan linux kernel enabling team -- To unsubscribe from this list: send the line unsubscribe linux-acpi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Best Regards Tianyu Lan linux kernel enabling team -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 04/11] mm: memcg: push down PageSwapCache check into uncharge entry functions
On Mon, Jul 09, 2012 at 11:42:12AM +0900, Kamezawa Hiroyuki wrote: (2012/07/05 9:44), Johannes Weiner wrote: @@ -3278,10 +3283,11 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg, unused = oldpage; } anon = PageAnon(used); - __mem_cgroup_uncharge_common(unused, - anon ? MEM_CGROUP_CHARGE_TYPE_ANON -: MEM_CGROUP_CHARGE_TYPE_CACHE, - true); + if (!PageSwapCache(page)) + __mem_cgroup_uncharge_common(unused, +anon ? MEM_CGROUP_CHARGE_TYPE_ANON +: MEM_CGROUP_CHARGE_TYPE_CACHE, +true); !PageSwapCache(unused) ? Argh, right. But I think unused page's PG_swapcache is always dropped. So, the check is not necessary. Oh, this is intentional: the check was in __mem_cgroup_uncharge_common before, which means it applied to this entry point as well. This is supposed to be a mechanical change that does not change any logic. The check is then removed in the next patch. --- Subject: mm: memcg: push down PageSwapCache check into uncharge entry functions fix Signed-off-by: Johannes Weiner han...@cmpxchg.org --- diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a3bf414..f4ff18a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3283,7 +3283,7 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg, unused = oldpage; } anon = PageAnon(used); - if (!PageSwapCache(page)) + if (!PageSwapCache(unused)) __mem_cgroup_uncharge_common(unused, anon ? MEM_CGROUP_CHARGE_TYPE_ANON : MEM_CGROUP_CHARGE_TYPE_CACHE, -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: UBI fastmap updates
Hi Richard, On Sun, 08 Jul 2012 14:07:41 +0200 Richard Weinberger rich...@nod.at wrote: + /* TODO: in the new locking scheme, produce_free_peb is +* called under wl_lock taken. +* so when returning, should reacquire the lock +*/ Which new locking scheme? I am diffing linux-ubi fastmap HEAD against its fork point (vanilla ubi), that's 6b16351..d41a140 on linux-ubi. Which gives the following diff in produce_free_pebs: @@ -261,7 +266,6 @@ static int produce_free_peb(struct ubi_device *ubi) { int err; - spin_lock(ubi-wl_lock); while (!ubi-free.rb_node) { spin_unlock(ubi-wl_lock); @@ -272,7 +276,6 @@ static int produce_free_peb(struct ubi_device *ubi) spin_lock(ubi-wl_lock); } - spin_unlock(ubi-wl_lock); return 0; } Which is probably okay, since you obtain the lock in the new 'ubi_refill_pools', which calls produce_free_peb: +void ubi_refill_pools(struct ubi_device *ubi) +{ + spin_lock(ubi-wl_lock); + refill_wl_pool(ubi); + refill_wl_user_pool(ubi); + spin_unlock(ubi-wl_lock); +} However if 'do_work' fails within 'produce_free_peb', you return the error but leave wl_lock unlocked - where it is expected to be locked (otherwise, ubi_refill_pools will unlock it again): static int produce_free_peb(struct ubi_device *ubi) { int err; while (!ubi-free.rb_node) { spin_unlock(ubi-wl_lock); dbg_wl(do one work synchronously); err = do_work(ubi); if (err) return err; spin_lock(ubi-wl_lock); } return 0; } Regards, Shmulik -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] hw_random: mxc-rnga: Adapt clocks to new i.mx clock framework
On Fri, Jul 06, 2012 at 05:20:19PM -0300, Fabio Estevam wrote: Cc: Theodore Ts'o ty...@mit.edu Cc: Herbert Xu herb...@gondor.apana.org.au Cc: linux-kernel@vger.kernel.org Signed-off-by: Fabio Estevam fabio.este...@freescale.com --- drivers/char/hw_random/mxc-rnga.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/char/hw_random/mxc-rnga.c b/drivers/char/hw_random/mxc-rnga.c index 85074de..c49c0b8 100644 --- a/drivers/char/hw_random/mxc-rnga.c +++ b/drivers/char/hw_random/mxc-rnga.c @@ -152,14 +152,14 @@ static int __init mxc_rnga_probe(struct platform_device *pdev) if (rng_dev) return -EBUSY; - clk = clk_get(pdev-dev, rng); + clk = clk_get(pdev-dev, NULL); if (IS_ERR(clk)) { dev_err(pdev-dev, Could not get rng_clk!\n); err = PTR_ERR(clk); goto out; } - clk_enable(clk); + clk_prepare_enable(clk); res = platform_get_resource(pdev, IORESOURCE_MEM, 0); if (!res) { @@ -201,7 +201,7 @@ err_ioremap: release_mem_region(res-start, resource_size(res)); err_region: - clk_disable(clk); + clk_disable_unprepare(clk); clk_put(clk); out: @@ -212,7 +212,7 @@ static int __exit mxc_rnga_remove(struct platform_device *pdev) { struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0); void __iomem *rng_base = (void __iomem *)mxc_rnga.priv; - struct clk *clk = clk_get(pdev-dev, rng); + struct clk *clk = clk_get(pdev-dev, NULL); Uhh, that's a driver bug that should be fixed. Although right now there is no reference counting for clocks, the driver should keep the clk internally instead of simply calling clk_get whenever it needs access to a clk. Sascha -- Pengutronix e.K. | | Industrial Linux Solutions | http://www.pengutronix.de/ | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0| Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- | -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] net: cgroup: fix out of bounds accesses
From: Eric Dumazet eduma...@google.com dev-priomap is allocated by extend_netdev_table() called from update_netdev_tables(). And this is only called if write_priomap() is called. But if write_priomap() is not called, it seems we can have out of bounds accesses in cgrp_destroy(), read_priomap() skb_update_prio() With help from Gao Feng Signed-off-by: Eric Dumazet eduma...@google.com Cc: Neil Horman nhor...@tuxdriver.com Cc: Gao feng gaof...@cn.fujitsu.com --- net/core/dev.c|8 ++-- net/core/netprio_cgroup.c |4 ++-- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 84f01ba..0f28a9e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2444,8 +2444,12 @@ static void skb_update_prio(struct sk_buff *skb) { struct netprio_map *map = rcu_dereference_bh(skb-dev-priomap); - if ((!skb-priority) (skb-sk) map) - skb-priority = map-priomap[skb-sk-sk_cgrp_prioidx]; + if (!skb-priority skb-sk map) { + unsigned int prioidx = skb-sk-sk_cgrp_prioidx; + + if (prioidx map-priomap_len) + skb-priority = map-priomap[prioidx]; + } } #else #define skb_update_prio(skb) diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c index aa907ed..3e953ea 100644 --- a/net/core/netprio_cgroup.c +++ b/net/core/netprio_cgroup.c @@ -142,7 +142,7 @@ static void cgrp_destroy(struct cgroup *cgrp) rtnl_lock(); for_each_netdev(init_net, dev) { map = rtnl_dereference(dev-priomap); - if (map) + if (map cs-prioidx map-priomap_len) map-priomap[cs-prioidx] = 0; } rtnl_unlock(); @@ -166,7 +166,7 @@ static int read_priomap(struct cgroup *cont, struct cftype *cft, rcu_read_lock(); for_each_netdev_rcu(init_net, dev) { map = rcu_dereference(dev-priomap); - priority = map ? map-priomap[prioidx] : 0; + priority = (map prioidx map-priomap_len) ? map-priomap[prioidx] : 0; cb-fill(cb, dev-name, priority); } rcu_read_unlock(); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH V2 2/3] pwm_backlight: use power sequences
Sorry, I just noticed a mistake in this patch I made while merging another one. The following also needs to be changed, otherwise the power-on sequence will never be executed: diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c index 1a38953..4546d23 100644 --- a/drivers/video/backlight/pwm_bl.c +++ b/drivers/video/backlight/pwm_bl.c @@ -65,7 +98,7 @@ static int pwm_backlight_update_status(struct backlight_device *bl) duty_cycle = pb-lth_brightness + (duty_cycle * (pb-period - pb-lth_brightness) / max); pwm_config(pb-pwm, duty_cycle, pb-period); - pwm_enable(pb-pwm); + pwm_backlight_on(bl); } Apologies for the inconvenience. Alex. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fwd: Hid over I2C and ACPI interaction
On Mon, Jul 09, 2012 at 11:24:45AM +0800, Lan Tianyu wrote: I think we can add new interface to get acpi specific resources. e.g struct acpi_resource pnp_get_acpi_resource(...). When the pnp acpi devices were initialized, put those acpi specific resources into a new resource list pnpdev-acpi_resources. What pnp_get_acpi_resource does is to get specified type acpi resources and return. We also need to define some acpi resource types. Yeah, that sounds good to me. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC 0/2] kvm: Improving directed yield in PLE handler
On 09/07/12 08:20, Raghavendra K T wrote: Currently Pause Looop Exit (PLE) handler is doing directed yield to a random VCPU on PL exit. Though we already have filtering while choosing the candidate to yield_to, we can do better. Problem is, for large vcpu guests, we have more probability of yielding to a bad vcpu. We are not able to prevent directed yield to same guy who has done PL exit recently, who perhaps spins again and wastes CPU. Fix that by keeping track of who has done PL exit. So The Algorithm in series give chance to a VCPU which has: We could do the same for s390. The appropriate exit would be diag44 (yield to hypervisor). Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for spinlocks, though. So there is no win here, but there are other cases were diag44 is used, e.g. cpu_relax. I have to double check with others, if these cases are critical, but for now, it seems that your dummy implementation for s390 is just fine. After all it is a no-op until we implement something. Thanks Christian -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] CONFIG_CC_STACKPROTECTOR is no longer experimental
Hi all, Le vendredi 06 juillet 2012 à 22:19 +0200, Paul Bolle a écrit : On Fri, 2012-07-06 at 10:58 -0700, Arjan van de Ven wrote: I rather just retire the whole concept of Experimental. it's really utterly meaningless in practice anyway. See Russell King's quick survey in https://lkml.org/lkml/2012/1/18/397 : almost all defconfigs had CONFIG_EXPERIMENTAL enabled. I didn't recheck since I'm sure little has changed. That macro and the related Kconfig symbol seem indeed meaningless. I admit I have CONFIG_EXPERIMENTAL enabled on all my systems as well, even the ones running an enterprise grade flavor of GNU/Linux. This isn't necessarily surprising. Having to make decisions at build time has always been an issue for distributions. The proper way for distributions to deal with experimental drivers is to package them separately and/or blacklist them by default. For experimental options, best is to make them tunable at run time, for example using module parameters. As for options still depending on EXPERIMENTAL when they no longer should, this can partly be explained when the EXPERIMENTAL dependency doesn't show up in the short description. This is the case of CONFIG_CC_STACKPROTECTOR. As everybody has CONFIG_EXPERIMENTAL enabled, nobody notices the dependency. The existence of CONFIG_EXPERIMENTAL may give developers the impression that depending on it is sufficient and the right thing to do for experimental drivers/features. That would be true if depending on CONFIG_EXPERIMENTAL would automatically add (EXPERIMENTAL) to the short description, as Randy and I were discussing previously, but this was never implemented. If we all agree that CONFIG_EXPERIMENTAL is no longer a good idea, then I'm fine dropping it. I'm always happy to see kernel configuration options go. Then options which used to depend on it and did not have (EXPERIMENTAL) in their short description should have it appended. These options should also default to N (but I think this is the default default if none is specified?) Maybe a task for kernel janitors? Back to my initial question, am I right to assume that CONFIG_CC_STACKPROTECTOR is no longer experimental and can be enabled in distribution kernels? Thanks, -- Jean Delvare Suse L3 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Antw: Re: /sys and access(2): Correctly implemented?
Hi! Still the problem seems to be related to the sysfs: # cd /tmp # touch testfile # chmod u=w,go= testfile # F=/tmp/testfile # test -r $F cat $F So it seems access(2) works correctly for root and normal filesystems. That's why I came up with the issue here. Regards, Ulrich Ryan Mallon rmal...@gmail.com schrieb am 09.07.2012 um 09:22 in Nachricht 4ffa86c5.7090...@gmail.com: On 09/07/12 16:23, Ulrich Windl wrote: Ryan Mallon rmal...@gmail.com schrieb am 09.07.2012 um 01:24 in Nachricht 4ffa16b6.9050...@gmail.com: On 06/07/12 16:27, Ulrich Windl wrote: Hi! Recently I found a problem with the command (kernel 3.0.34-0.7-default from SLES 11 SP2, run as root): test -r $file cat $file emitting Permission denied Investigating, I found that test actually uses access() to check for permissions. Unfortunately there are some files in /sys that have write-only permission bits set (e.g. /sys/devices/system/cpu/probe). ~ # ll /sys/devices/system/cpu/probe --w--- 1 root root 4096 Jun 29 12:43 /sys/devices/system/cpu/probe ~ # F=/sys/devices/system/cpu/probe ~ # test $F cat $F cat: /sys/devices/system/cpu/probe: Permission denied Looks like you have a typo here, I think you wanted test -r $F, not test $F, the latter will just evaluate $F as an expression which will be true, and so you get the permission denied error running cat. Hi! You are right: It's a typo, but only in the message; the actual test was done correctly, and the outcome is quite the same. Using test -r $F on a write-only sysfs file correctly returns false on my machine (Ubuntu 10.04.4 LTS/2.6.32-41-generic). Not here, unfortunately: Oops, I missed the bit about you running as root. I get the same results running as root on my machine as you, both for sysfs and regular files. It appears that access(2) as the super-user is might be implementation defined, see: http://pubs.opengroup.org/onlinepubs/95399/functions/access.html http://lists.gnu.org/archive/html/bug-bash/2010-07/msg00071.html However, I can't find any concrete information on it for Linux, and the manpage doesn't mention anything other the the X_OK bit. ~Ryan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net: cgroup: fix out of bounds accesses
于 2012年07月09日 15:45, Eric Dumazet 写道: From: Eric Dumazet eduma...@google.com dev-priomap is allocated by extend_netdev_table() called from update_netdev_tables(). And this is only called if write_priomap() is called. But if write_priomap() is not called, it seems we can have out of bounds accesses in cgrp_destroy(), read_priomap() skb_update_prio() With help from Gao Feng Signed-off-by: Eric Dumazet eduma...@google.com Cc: Neil Horman nhor...@tuxdriver.com Cc: Gao feng gaof...@cn.fujitsu.com --- net/core/dev.c|8 ++-- net/core/netprio_cgroup.c |4 ++-- 2 files changed, 8 insertions(+), 4 deletions(-) Acked-by: Gao feng gaof...@cn.fujitsu.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v2 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
Hi Wen, 2012/07/06 18:20, Wen Congyang wrote: At 07/06/2012 04:27 PM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/04 19:01, Wen Congyang wrote: At 07/04/2012 01:52 PM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/04 14:08, Wen Congyang wrote: At 07/04/2012 12:45 PM, Yasuaki Ishimatsu Wrote: Hi Wen, 2012/07/03 15:35, Wen Congyang wrote: At 07/03/2012 01:56 PM, Yasuaki Ishimatsu Wrote: When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type} sysfs files are created. But there is no code to remove these files. The patch implements the function to remove them. Note : The code does not free firmware_map_entry since there is no way to free memory which is allocated by bootmem. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- drivers/firmware/memmap.c| 70 +++ include/linux/firmware-map.h |6 +++ mm/memory_hotplug.c |6 +++ 3 files changed, 81 insertions(+), 1 deletion(-) Index: linux-3.5-rc4/mm/memory_hotplug.c === --- linux-3.5-rc4.orig/mm/memory_hotplug.c 2012-07-03 14:22:00.190240794 +0900 +++ linux-3.5-rc4/mm/memory_hotplug.c 2012-07-03 14:22:03.549198802 +0900 @@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory); int remove_memory(int nid, u64 start, u64 size) { - return -EBUSY; + lock_memory_hotplug(); + /* remove memmap entry */ + firmware_map_remove(start, start + size - 1, System RAM); + unlock_memory_hotplug(); + return 0; } EXPORT_SYMBOL_GPL(remove_memory); Index: linux-3.5-rc4/include/linux/firmware-map.h === --- linux-3.5-rc4.orig/include/linux/firmware-map.h2012-07-03 14:21:45.766421116 +0900 +++ linux-3.5-rc4/include/linux/firmware-map.h 2012-07-03 14:22:03.550198789 +0900 @@ -25,6 +25,7 @@ int firmware_map_add_early(u64 start, u64 end, const char *type); int firmware_map_add_hotplug(u64 start, u64 end, const char *type); +int firmware_map_remove(u64 start, u64 end, const char *type); #else /* CONFIG_FIRMWARE_MEMMAP */ @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl return 0; } +static inline int firmware_map_remove(u64 start, u64 end, const char *type) +{ + return 0; +} + #endif /* CONFIG_FIRMWARE_MEMMAP */ #endif /* _LINUX_FIRMWARE_MAP_H */ Index: linux-3.5-rc4/drivers/firmware/memmap.c === --- linux-3.5-rc4.orig/drivers/firmware/memmap.c 2012-07-03 14:21:45.761421180 +0900 +++ linux-3.5-rc4/drivers/firmware/memmap.c2012-07-03 14:22:03.569198549 +0900 @@ -79,7 +79,16 @@ static const struct sysfs_ops memmap_att .show = memmap_attr_show, }; +static void release_firmware_map_entry(struct kobject *kobj) +{ + /* + * FIXME : There is no idea. + * How to free the entry which allocated bootmem? + */ I find a function free_bootmem(), but I am not sure whether it can work here. It cannot work here. Another problem: how to check whether the entry uses bootmem? When firmware_map_entry is allocated by kzalloc(), the page has PG_slab. This is not true. In my test, I find the page does not have PG_slab sometimes. I think that it depends on the allocated size. firmware_map_entry size is smaller than PAGE_SIZE. So the page has PG_Slab. In my test, I add printk in the function firmware_map_add_hotplug() to display page's flags. And sometimes the page is not allocated by slab(I use PageSlab() to verify it). How did you check it? Could you send your debug patch? When the memory is not allocated from slab, the flags is 0x108000. Thank you for sending the patch. I think the page to not have PageSlab is a compound page. So we can check whether the entry is allocate from bootmem or not as follow: static void release_firmware_map_entry(struct kobject *kobj) { struct firmware_map_entry *entry = to_memmap_entry(kobj); struct page *head_page; head_page = virt_to_head_page(entry); if (PageSlab(head_page)) kfree(etnry); else /* the entry is allocated from bootmem */ } Thanks, Yasuaki Ishimatsu From 8dd51368d6c03edf7edc89cab17441e3741c39c7 Mon Sep 17 00:00:00 2001 From: Wen Congyang we...@cn.fujitsu.com Date: Wed, 4 Jul 2012 16:05:26 +0800 Subject: [PATCH] debug
Re: UBI fastmap updates
Am 09.07.2012 09:37, schrieb Shmulik Ladkani: Hi Richard, On Sun, 08 Jul 2012 14:07:41 +0200 Richard Weinberger rich...@nod.at wrote: + /* TODO: in the new locking scheme, produce_free_peb is +* called under wl_lock taken. +* so when returning, should reacquire the lock +*/ Which new locking scheme? I am diffing linux-ubi fastmap HEAD against its fork point (vanilla ubi), that's 6b16351..d41a140 on linux-ubi. Which gives the following diff in produce_free_pebs: Ahh. _my_ new locking scheme. I feared someone else changed it meanwhile in mainline. ;) Yes, the ubi-wl_lock in produce_free_peb() is no longer needed. Again, thanks for pointing this out! Thanks, //richard signature.asc Description: OpenPGP digital signature
Re: [PATCH] mm: Warn about costly page allocation
On Mon, Jul 09, 2012 at 11:38:20AM +0900, Minchan Kim wrote: Since lumpy reclaim was introduced at 2.6.23, it helped higher order allocation. Recently, we removed it at 3.4 and we didn't enable compaction forcingly[1]. The reason makes sense that compaction.o + migration.o isn't trivial for system doesn't use higher order allocation. But the problem is that we have to enable compaction explicitly while lumpy reclaim enabled unconditionally. Normally, admin doesn't know his system have used higher order allocation and even lumpy reclaim have helped it. Admin in embdded system have a tendency to minimise code size so that they can disable compaction. In this case, we can see page allocation failure we can never see in the past. It's critical on embedded side because... Let's think this scenario. There is QA team in embedded company and they have tested their product. In test scenario, they can allocate 100 high order allocation. (they don't matter how many high order allocations in kernel are needed during test. their concern is just only working well or fail of their middleware/application) High order allocation will be serviced well by natural buddy allocation without lumpy's help. So they released the product and sold out all over the world. Unfortunately, in real practice, sometime, 105 high order allocation was needed rarely and fortunately, lumpy reclaim could help it so the product doesn't have a problem until now. If they use latest kernel, they will see the new config CONFIG_COMPACTION which is very poor documentation, and they can't know it's replacement of lumpy reclaim(even, they don't know lumpy reclaim) so they simply disable Depending on lumpy reclaim or compaction for high-order kernel allocations is dangerous. Both depend on being able to move MIGRATE_MOVABLE allocations to satisy the high-order allocation. If used regularly for high-order kernel allocations and they are long-lived, the system will eventually be unable to grant these allocations, with or without compaction or lumpy reclaim. Be also aware that lumpy reclaim was very aggressive when reclaiming pages to satisfy an allocation. Compaction is not and compaction can be temporarily disabled if an allocation attempt fails. If lumpy reclaim was being depended upon to satisfy high-order allocations, there is no guarantee, particularly with 3.4, that compaction will succeed as it does not reclaim aggressively. that option for size optimization. Of course, QA team still test it but they can't find the problem if they don't do test stronger than old. It ends up release the product and sold out all over the world, again. But in this time, we don't have both lumpy and compaction so the problem would happen in real practice. A poor enginner from Korea have to flight to the USA for the fix a ton of products. Otherwise, should recall products from all over the world. Maybe he can lose a job. :( This patch adds warning for notice. If the system try to allocate PAGE_ALLOC_COSTLY_ORDER above page and system enters reclaim path, it emits the warning. At least, it gives a chance to look into their system before the relase. This patch avoids false positive by alloc_large_system_hash which allocates with GFP_ATOMIC and a fallback mechanism so it can make this warning useless. [1] c53919ad(mm: vmscan: remove lumpy reclaim) Signed-off-by: Minchan Kim minc...@kernel.org --- mm/page_alloc.c | 16 1 file changed, 16 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a4d3a19..1155e00 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2276,6 +2276,20 @@ gfp_to_alloc_flags(gfp_t gfp_mask) return alloc_flags; } +#if defined(CONFIG_DEBUG_VM) !defined(CONFIG_COMPACTION) +static inline void check_page_alloc_costly_order(unsigned int order) +{ + if (unlikely(order PAGE_ALLOC_COSTLY_ORDER)) { + printk_once(WARNING: You are tring to allocate %d-order page. + You might need to turn on CONFIG_COMPACTION\n, order); + } WARN_ON_ONCE would tell you what is trying to satisfy the allocation. It should further check if this is a GFP_MOVABLE allocation or not and if not, then it should either be documented that compaction may only delay allocation failures and that they may need to consider reserving the memory in advance or doing something like forcing MIGRATE_RESERVE to only be used for high-order allocations. +} +#else +static inline void check_page_alloc_costly_order(unsigned int order) +{ +} +#endif + static inline struct page * __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, enum zone_type high_zoneidx, @@ -2353,6 +2367,8 @@ rebalance: if (!wait) goto nopage; + check_page_alloc_costly_order(order); + /* Avoid recursion of direct reclaim */ if (current-flags PF_MEMALLOC) goto nopage;
Re: [PATCH 2/5] uprobes: suppress uprobe_munmap() from mmput()
On Sun, 2012-07-08 at 22:30 +0200, Oleg Nesterov wrote: uprobe_munmap() does get_user_pages() and it is also called from the final mmput()-exit_mmap() path. This slows down exit/mmput() for no reason, and I think it is simply dangerous/wrong to try to fault-in a page into the dying mm. If nothing else, this happens after the last sync_mm_rss(), afaics handle_mm_fault() can change the task-rss_stat and make the subsequent check_mm() unhappy. Change uprobe_munmap() to check mm-mm_users != 0. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/events/uprobes.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index a93b6df..47c4e24 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1082,6 +1082,9 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon if (!atomic_read(uprobe_events) || !valid_vma(vma, false)) return; + if (!atomic_read(vma-vm_mm-mm_users)) /* called by mmput() ? */ + return; + if (!atomic_read(vma-vm_mm-uprobes_state.count)) return; But won't you leak uprobe refcounts like this? Those aren't tied to the task (which is dying) but to the vma's mapping the appropriate hunk of the text. Not doing the munmap will then not put the uprobe-ref.. Or am I missing something here? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Infinite looping in omap2430.c USB driver
* NeilBrown ne...@suse.de [120706 15:44]: Hello `./scripts/get_maintainer.pl -f drivers/usb/musb/omap2430.c` omap2430_musb_set_vbus in omap2430.c contains: while (musb_readb(musb-mregs, MUSB_DEVCTL) 0x80) { cpu_relax(); if (time_after(jiffies, timeout)) { dev_err(musb-controller, configured as A device timeout); ret = -EINVAL; break; } } having set unsigned long timeout = jiffies + msecs_to_jiffies(1000); so it can busy-loop for up to 1 second. Probably not ideal, but if it works I wouldn't complain. The if (int_usb MUSB_INTR_SESSREQ) { branch of musb_stage0_irq() called from musb_interrupt (from generic_interrupt) calls this: if (musb-int_usb) retval |= musb_stage0_irq(musb, musb-int_usb, devctl, power); so the busy loop can happen in an interrupt handler (not a threaded interrupt handler), which is probably less ideal. However this can be called with interrupt disabled, as happens at least during resume when resume_irqs() calls: raw_spin_lock_irqsave(desc-lock, flags); __enable_irq(desc, irq, true); raw_spin_unlock_irqrestore(desc-lock, flags); and an interrupt is found to be IRQS_PENDING. In this case interrupts are disabled so 'jiffies' never changes so this loop can continue forever. This happens on my (GTA04) phone fairly regularly - between 1 in 10 and 1 in 30 resumes. The musb-hdrc interrupt is pending and reports [ 4957.624176] musb-hdrc musb-hdrc: ** IRQ peripheral usb0040 tx rx 'usb0040' is MUSB_INTR_SESSREQ. I think this is triggered by detecting a voltage change on the USB ID pin - is that right? A short-to-earth would be a request to switch to host mode, which is why it tries to enable VBUS. Maybe there is some electrical noise which is being picked up? I guess that could happen if the transceiver pins are floating during suspend? In any case I get the interrupt despite nothing being plugged in, and the 0x80 bit of MUSB_DEVCTL never gets cleared. As far as I remember, musb tries to be smart about changing to host mode, and tries to do the session and vbus detection on it's own.. AFAIK, there's nothing you can do until musb is done and detects the VBUS is not rising and gives up. There are all kind of interrupt flag combinations trying to deal with that mess, maybe you need to add yet another one? I've added a simple loop counter which aborts the loop after 1000 loops - this takes about 5 seconds, but includes some printks which probably slow it down. In 2 out of 2 cases, subsequent messages show that the hsmmc driver for the uSD card that holds my root filesystem is messed up. It seems to be waiting for a request that is never going to complete. So maybe the hsmmc is causing the noise that triggers the musb issue. I can send a patch which add a loop count if you like, but I suspect you can come up with a much better approach. Sounds like that loop should be fixed. Regards, Tony -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] uprobes: kill copy_vma()-uprobe_mmap()
On Sun, 2012-07-08 at 22:30 +0200, Oleg Nesterov wrote: And why this uprobe_mmap() was added? I believe the intent was wrong. Note that the caller is going to do move_page_tables(), all registered uprobes are already faulted in, we only change the virtual addresses. I think it was because of the copy_vma + do_munmap. Since do_munmap() should be doing a put on the uprobe, we need an extra get to balance. That said, I cannot actually find the uprobe_munmap() from do_munmap(), but that might be due to lack of wakefulness etc.. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND v7 1/2] block: ioctl support for sanitize in eMMC 4.5
On 28 June 2012 14:02, Yaniv Gardi yga...@codeaurora.org wrote: Adding a new ioctl to support sanitize operation in eMMC cards version 4.5. The sanitize ioctl support helps performing this operation via user application. Signed-off-by: Yaniv Gardi yga...@codeaurora.org --- block/blk-core.c | 15 ++-- block/blk-lib.c | 51 + block/blk-merge.c |4 +++ block/elevator.c |2 +- block/ioctl.c |9 include/linux/blk_types.h |5 +++- include/linux/blkdev.h|3 ++ include/linux/fs.h|1 + kernel/trace/blktrace.c |2 + 9 files changed, 87 insertions(+), 5 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 3c923a7..4a56102b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1641,7 +1641,7 @@ generic_make_request_checks(struct bio *bio) goto end_io; } - if (unlikely(!(bio-bi_rw REQ_DISCARD) + if (unlikely(!(bio-bi_rw (REQ_DISCARD | REQ_SANITIZE)) nr_sectors queue_max_hw_sectors(q))) { printk(KERN_ERR bio too big device %s (%u %u)\n, bdevname(bio-bi_bdev, b), @@ -1689,6 +1689,14 @@ generic_make_request_checks(struct bio *bio) goto end_io; } + if ((bio-bi_rw REQ_SANITIZE) + (!blk_queue_sanitize(q))) { + pr_info(%s - got a SANITIZE request but the queue + doesn't support sanitize requests, __func__); + err = -EOPNOTSUPP; + goto end_io; + } + if (blk_throtl_bio(q, bio)) return false; /* throttled, will be resubmitted later */ @@ -1794,7 +1802,8 @@ void submit_bio(int rw, struct bio *bio) * If it's a regular read/write or a barrier with data attached, * go through the normal accounting stuff before submission. */ - if (bio_has_data(bio) !(rw REQ_DISCARD)) { + if (bio_has_data(bio) + (!(rw (REQ_DISCARD | REQ_SANITIZE { if (rw WRITE) { count_vm_events(PGPGOUT, count); } else { @@ -1840,7 +1849,7 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (rq-cmd_flags REQ_DISCARD) + if (rq-cmd_flags (REQ_DISCARD | REQ_SANITIZE)) return 0; if (blk_rq_sectors(rq) queue_max_sectors(q) || diff --git a/block/blk-lib.c b/block/blk-lib.c index 2b461b4..280d63e 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -115,6 +115,57 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, EXPORT_SYMBOL(blkdev_issue_discard); /** + * blkdev_issue_sanitize - queue a sanitize request + * @bdev: blockdev to issue sanitize for + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + *Issue a sanitize request for the specified block device + */ +int blkdev_issue_sanitize(struct block_device *bdev, gfp_t gfp_mask) +{ + DECLARE_COMPLETION_ONSTACK(wait); + struct request_queue *q = bdev_get_queue(bdev); + int type = REQ_WRITE | REQ_SANITIZE; + struct bio_batch bb; + struct bio *bio; + int ret = 0; + + if (!q) + return -ENXIO; + + if (!blk_queue_sanitize(q)) { + pr_err(%s - card doesn't support sanitize, __func__); + return -EOPNOTSUPP; + } + + bio = bio_alloc(gfp_mask, 1); + if (!bio) + return -ENOMEM; + + atomic_set(bb.done, 1); + bb.flags = 1 BIO_UPTODATE; + bb.wait = wait; + + bio-bi_end_io = bio_batch_end_io; + bio-bi_bdev = bdev; + bio-bi_private = bb; + + atomic_inc(bb.done); + submit_bio(type, bio); + + /* Wait for bios in-flight */ + if (!atomic_dec_and_test(bb.done)) + wait_for_completion(wait); + + if (!test_bit(BIO_UPTODATE, bb.flags)) + ret = -EIO; + + return ret; +} +EXPORT_SYMBOL(blkdev_issue_sanitize); + +/** * blkdev_issue_zeroout - generate number of zero filed write bios * @bdev: blockdev to issue * @sector:start sector diff --git a/block/blk-merge.c b/block/blk-merge.c index 160035f..7e24772 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -477,6 +477,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) if (!rq_mergeable(rq)) return false; + /* don't merge file system requests and sanitize requests */ + if ((req-cmd_flags REQ_SANITIZE) != (next-cmd_flags REQ_SANITIZE)) + return false; + /* don't merge file system requests and discard requests */ if ((bio-bi_rw REQ_DISCARD) != (rq-bio-bi_rw
Re: linux-next: comment on pm tree commit
On Monday, July 09, 2012, Stephen Rothwell wrote: Hi Rafael, I noticed commit b8eec56cd8e5 (PM / cpuidle: System resume hang fix with cpuidle) in the pm tree needs some work (I noticed it because it was changed in a rebase ...). diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index a6b3f2e..b90ccb2 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -146,6 +146,8 @@ extern void cpuidle_unregister_device(struct cpuidle_device *dev); extern void cpuidle_pause_and_lock(void); extern void cpuidle_resume_and_unlock(void); +extern void cpuidle_pause(void); +extern void cpuidle_resume(void); extern int cpuidle_enable_device(struct cpuidle_device *dev); extern void cpuidle_disable_device(struct cpuidle_device *dev); extern int cpuidle_wrap_enter(struct cpuidle_device *dev, @@ -169,6 +171,8 @@ static inline void cpuidle_unregister_device(struct cpuidle_device *dev) { } static inline void cpuidle_pause_and_lock(void) { } static inline void cpuidle_resume_and_unlock(void) { } +static inline cpuidle_pause(void) { } +static inline cpuidle_resume(void) { } These need to be static inline void. I wonder what review and build testing this went through (the above should produce warnings since they are non void returning functions with no return statements). Thanks for reporting this, I tried to fix a build issue in the original patch hastily and failed miserably as you have noticed and then I build-tested a wrong tree. Sorry. It should be fixed now for real. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: Warn about costly page allocation
Hi Mel, On Mon, Jul 09, 2012 at 09:22:00AM +0100, Mel Gorman wrote: On Mon, Jul 09, 2012 at 11:38:20AM +0900, Minchan Kim wrote: Since lumpy reclaim was introduced at 2.6.23, it helped higher order allocation. Recently, we removed it at 3.4 and we didn't enable compaction forcingly[1]. The reason makes sense that compaction.o + migration.o isn't trivial for system doesn't use higher order allocation. But the problem is that we have to enable compaction explicitly while lumpy reclaim enabled unconditionally. Normally, admin doesn't know his system have used higher order allocation and even lumpy reclaim have helped it. Admin in embdded system have a tendency to minimise code size so that they can disable compaction. In this case, we can see page allocation failure we can never see in the past. It's critical on embedded side because... Let's think this scenario. There is QA team in embedded company and they have tested their product. In test scenario, they can allocate 100 high order allocation. (they don't matter how many high order allocations in kernel are needed during test. their concern is just only working well or fail of their middleware/application) High order allocation will be serviced well by natural buddy allocation without lumpy's help. So they released the product and sold out all over the world. Unfortunately, in real practice, sometime, 105 high order allocation was needed rarely and fortunately, lumpy reclaim could help it so the product doesn't have a problem until now. If they use latest kernel, they will see the new config CONFIG_COMPACTION which is very poor documentation, and they can't know it's replacement of lumpy reclaim(even, they don't know lumpy reclaim) so they simply disable Depending on lumpy reclaim or compaction for high-order kernel allocations is dangerous. Both depend on being able to move MIGRATE_MOVABLE allocations to satisy the high-order allocation. If used regularly for high-order kernel allocations and they are long-lived, the system will eventually be unable to grant these allocations, with or without compaction or lumpy reclaim. Indeed. Be also aware that lumpy reclaim was very aggressive when reclaiming pages to satisfy an allocation. Compaction is not and compaction can be temporarily disabled if an allocation attempt fails. If lumpy reclaim was being depended upon to satisfy high-order allocations, there is no guarantee, particularly with 3.4, that compaction will succeed as it does not reclaim aggressively. It's good explanation and let's add it in description. that option for size optimization. Of course, QA team still test it but they can't find the problem if they don't do test stronger than old. It ends up release the product and sold out all over the world, again. But in this time, we don't have both lumpy and compaction so the problem would happen in real practice. A poor enginner from Korea have to flight to the USA for the fix a ton of products. Otherwise, should recall products from all over the world. Maybe he can lose a job. :( This patch adds warning for notice. If the system try to allocate PAGE_ALLOC_COSTLY_ORDER above page and system enters reclaim path, it emits the warning. At least, it gives a chance to look into their system before the relase. This patch avoids false positive by alloc_large_system_hash which allocates with GFP_ATOMIC and a fallback mechanism so it can make this warning useless. [1] c53919ad(mm: vmscan: remove lumpy reclaim) Signed-off-by: Minchan Kim minc...@kernel.org --- mm/page_alloc.c | 16 1 file changed, 16 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a4d3a19..1155e00 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2276,6 +2276,20 @@ gfp_to_alloc_flags(gfp_t gfp_mask) return alloc_flags; } +#if defined(CONFIG_DEBUG_VM) !defined(CONFIG_COMPACTION) +static inline void check_page_alloc_costly_order(unsigned int order) +{ + if (unlikely(order PAGE_ALLOC_COSTLY_ORDER)) { + printk_once(WARNING: You are tring to allocate %d-order page. +You might need to turn on CONFIG_COMPACTION\n, order); + } WARN_ON_ONCE would tell you what is trying to satisfy the allocation. Do you mean that it would be better to use WARN_ON_ONCE rather than raw printk? If so, I would like to insist raw printk because WARN_ON_ONCE could be disabled by !CONFIG_BUG. If I miss something, could you elaborate it more? It should further check if this is a GFP_MOVABLE allocation or not and if not, then it should either be documented that compaction may only delay allocation failures and that they may need to consider reserving the memory in advance or doing something like forcing MIGRATE_RESERVE to only be used for high-order allocations. Okay. but I got confused you want to add above description
82571EB: Detected Hardware Unit Hang
Hi list, I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when doing scp test. this issue is easy do reproduced on SUN FIRE X2270 M2, just copy a big file (500M) from another server will hit it at once. Would you please help on this? device info: # lspci -s 05:00.0 05:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06) # lspci -s 05:00.0 -n 05:00.0 0200: 8086:10bc (rev 06) # ethtool -i eth0 driver: e1000e version: 2.0.0-NAPI firmware-version: 5.10-2 bus-info: :05:00.0 # ethtool -k eth0 Offload parameters for eth0: rx-checksumming: on tx-checksumming: on scatter-gather: on tcp segmentation offload: on udp fragmentation offload: off generic segmentation offload: on generic-receive-offload: on kernel log: --- e1000e :05:00.0: eth0: Detected Hardware Unit Hang: TDH 6c TDT 81 next_to_use 81 next_to_clean6b buffer_info[next_to_clean]: time_stamp fffc7a23 next_to_watch71 jiffies fffc8c0c next_to_watch.status 0 MAC Status 80387 PHY Status 792d PHY 1000BASE-T Status 3c00 PHY Extended Status3000 PCI Status 10 e1000e :05:00.0: eth0: Detected Hardware Unit Hang: TDH 6c TDT 81 next_to_use 81 next_to_clean6b buffer_info[next_to_clean]: time_stamp fffc7a23 next_to_watch71 jiffies fffc9bac next_to_watch.status 0 MAC Status 80387 PHY Status 792d PHY 1000BASE-T Status 3c00 PHY Extended Status3000 PCI Status 10 [ cut here ] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x225/0x230() Hardware name: SUN FIRE X2270 M2 NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out Modules linked in: autofs4 hidp rfcomm bluetooth rfkill lockd sunrpc cpufreq_ondemand acpi_cpufreq mperf be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi video sbs sbshc acpi_pad acpi_ipmi ipmi_msghandler parport_pc lp parport e1000e(U) snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device igb snd_pcm_oss serio_raw snd_mixer_oss snd_pcm tpm_infineon snd_timer snd soundcore snd_page_alloc i2c_i801 iTCO_wdt i2c_core pcspkr i7core_edac iTCO_vendor_support ioatdma ghes dca edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage sd_mod crc_t10dif sg ahci libahci ext3 jbd mbcache [last unloaded: microcode] Pid: 0, comm: swapper Not tainted 2.6.39-200.24.1.el5uek #1 Call Trace: [c07d9ac5] ? dev_watchdog+0x225/0x230 [c045ba61] warn_slowpath_common+0x81/0xa0 [c07d9ac5] ? dev_watchdog+0x225/0x230 [c045bb23] warn_slowpath_fmt+0x33/0x40 [c07d9ac5] dev_watchdog+0x225/0x230 [c07d98a0] ? dev_activate+0xb0/0xb0 [c0468e82] call_timer_fn+0x32/0xf0 [c04bceb0] ? rcu_check_callbacks+0x80/0x80 [c046a76d] run_timer_softirq+0xed/0x1b0 [c07d98a0] ? dev_activate+0xb0/0xb0 [c0461a81] __do_softirq+0x91/0x1a0 [c04619f0] ? local_bh_enable+0x80/0x80 IRQ [c0462295] ? irq_exit+0x95/0xa0 [c087f8b8] ? smp_apic_timer_interrupt+0x38/0x42 [c08784f5] ? apic_timer_interrupt+0x31/0x38 [c046007b] ? do_exit+0x11b/0x370 [c065eae4] ? intel_idle+0xa4/0x100 [c078d9b9] ? cpuidle_idle_call+0xb9/0x1e0 [c0411d77] ? cpu_idle+0x97/0xd0 [c085cbbd] ? rest_init+0x5d/0x70 [c0b07a7a] ? start_kernel+0x28a/0x340 [c0b074b0] ? obsolete_checksetup+0xb0/0xb0 [c0b070a4] ? i386_start_kernel+0x64/0xb0 ---[ end trace 5502b55cd4d4e5cb ]--- e1000e :05:00.0: eth0: Reset adapter e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx Thanks, Joe -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 00/36] AArch64 Linux kernel port
Hi Jon, On 9 July 2012 03:01, Jon Masters jonat...@jonmasters.org wrote: On 07/08/2012 06:24 PM, Dennis Gilmore wrote: I know that the architecture really is new but thats not really clear by adding AArch32 into the mix to represent 32 bit arm as ARM has done or by calling it armv8. There is enough way to confuse them already why confuse things more by adding yet another variable that is AArch64. - From my and most of the other Fedora developers that i've discussed it with its more like reluctant acceptance of AArch64 than thinking is a good idea. btw, for clarification of anyone who is confused by the names...the new architecture is ARMv8. The 64-bit state is AArch64, during which the processor executes A64 instructions. The 32-bit state is AArch32, during which the processor executes either A32 (ARM version 7+) or T32 (Thumb - I guess Thumb2+ really due to some of deprecation) instructions. I've noticed that there appears to be a clarification effort in which AArch64 is used as an architecture name when ARMv8 would be confusing, which is most of the time if you don't know whether you're referring to the new A64 instruction set or the older ones. Thanks for clarifying this. I deliberately try not to use ARMv8 name to avoid confusion. Indeed, the new architecture is ARMv8 (following the ARM architectures numbering scheme). It has an AArch64 mode (with new exception model, new ISA) and an *optional* AArch32 mode (pretty much the same as ARMv7). The key here is that AArch32 is *optional* - we can have it at all levels, only some (e.g. EL0 - user) or not at all. These two modes also share very little, from a software perspective it's pretty much some register banking to allow compat mode support (e.g. you can read the AArch32 R0 register from the lower half of AArch64 X0). The AArch32 mode cannot switch by itself to an AArch64 mode, this requires taking an exception (can be SVC) to a higher level that actually runs in AArch64 mode. On an ARMv8 system, if it supports AArch32 at EL1 (kernel) you can run an ARMv7 kernel, and that's good for virtualisation. But I have *absolutely* no plans to support an AArch32 kernel for ARMv8 SoCs. -- Catalin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2 3/3] watchdog: omap_wdt: add device tree support
* Wim Van Sebroeck w...@iguana.be [120707 05:11]: Hi Tony, Hi Wim, * jgq...@gmail.com jgq...@gmail.com [120531 20:56]: From: Xiao Jiang jgq...@gmail.com Add device table for omap_wdt to support dt. Care to ack this patch in the series? Yep. Acked-by: Wim Van Sebroeck w...@iguana.be Thanks, I'll apply all three into omap devel-dt branch. Regards, Tony -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] KVM: X86: remove read buffer for mmio read
After commit f78146b0f9230765c6315b2e14f56112513389ad: KVM: Fix page-crossing MMIO MMIO that are split across a page boundary are currently broken - the code does not expect to be aborted by the exit to userspace for the first MMIO fragment. This patch fixes the problem by generalizing the current code for handling 16-byte MMIOs to handle a number of fragments, and changes the MMIO code to create those fragments. Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Multiple MMIO reads can be merged into mmio_fragments, the read buffer is not needed anymore Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- arch/x86/include/asm/kvm_emulate.h |1 - arch/x86/kvm/emulate.c | 43 --- arch/x86/kvm/x86.c |2 - 3 files changed, 5 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 1ac46c22..339d7c6 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -286,7 +286,6 @@ struct x86_emulate_ctxt { struct operand *memopp; struct fetch_cache fetch; struct read_cache io_read; - struct read_cache mem_read; }; /* Repeat String Operation Prefix */ diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index f95d242..aa455da 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1128,33 +1128,6 @@ static void fetch_bit_operand(struct x86_emulate_ctxt *ctxt) ctxt-src.val = (ctxt-dst.bytes 3) - 1; } -static int read_emulated(struct x86_emulate_ctxt *ctxt, -unsigned long addr, void *dest, unsigned size) -{ - int rc; - struct read_cache *mc = ctxt-mem_read; - - while (size) { - int n = min(size, 8u); - size -= n; - if (mc-pos mc-end) - goto read_cached; - - rc = ctxt-ops-read_emulated(ctxt, addr, mc-data + mc-end, n, - ctxt-exception); - if (rc != X86EMUL_CONTINUE) - return rc; - mc-end += n; - - read_cached: - memcpy(dest, mc-data + mc-pos, n); - mc-pos += n; - dest += n; - addr += n; - } - return X86EMUL_CONTINUE; -} - static int segmented_read(struct x86_emulate_ctxt *ctxt, struct segmented_address addr, void *data, @@ -1166,7 +1139,9 @@ static int segmented_read(struct x86_emulate_ctxt *ctxt, rc = linearize(ctxt, addr, size, false, linear); if (rc != X86EMUL_CONTINUE) return rc; - return read_emulated(ctxt, linear, data, size); + + return ctxt-ops-read_emulated(ctxt, linear, data, size, + ctxt-exception); } static int segmented_write(struct x86_emulate_ctxt *ctxt, @@ -4122,8 +4097,6 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt) int rc = X86EMUL_CONTINUE; int saved_dst_type = ctxt-dst.type; - ctxt-mem_read.pos = 0; - if (ctxt-mode == X86EMUL_MODE_PROT64 (ctxt-d No64)) { rc = emulate_ud(ctxt); goto done; @@ -4364,15 +4337,9 @@ writeback: * or, if it is not used, after each 1024 iteration. */ if ((r-end != 0 || ctxt-regs[VCPU_REGS_RCX] 0x3ff) - (r-end == 0 || r-end != r-pos)) { - /* -* Reset read cache. Usually happens before -* decode, but since instruction is restarted -* we have to do it here. -*/ - ctxt-mem_read.end = 0; + (r-end == 0 || r-end != r-pos)) return EMULATION_RESTART; - } + goto done; /* skip rip writeback */ } } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..7445545 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4399,8 +4399,6 @@ static void init_decode_cache(struct x86_emulate_ctxt *ctxt, ctxt-fetch.end = 0; ctxt-io_read.pos = 0; ctxt-io_read.end = 0; - ctxt-mem_read.pos = 0; - ctxt-mem_read.end = 0; } static void init_emulate_ctxt(struct kvm_vcpu *vcpu) -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] KVM: X86: introduce set_mmio_exit_info
Introduce set_mmio_exit_info to cleanup the common code Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- arch/x86/kvm/x86.c | 33 + 1 files changed, 17 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7445545..7771f45 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3755,9 +3755,6 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa, static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa, void *val, int bytes) { - struct kvm_mmio_fragment *frag = vcpu-mmio_fragments[0]; - - memcpy(vcpu-run-mmio.data, frag-data, frag-len); return X86EMUL_CONTINUE; } @@ -3825,6 +3822,20 @@ mmio: return X86EMUL_CONTINUE; } +static void set_mmio_exit_info(struct kvm_vcpu *vcpu, + struct kvm_mmio_fragment *frag, bool write) +{ + struct kvm_run *run = vcpu-run; + + run-exit_reason = KVM_EXIT_MMIO; + run-mmio.phys_addr = frag-gpa; + run-mmio.len = frag-len; + run-mmio.is_write = vcpu-mmio_is_write = write; + + if (write) + memcpy(run-mmio.data, frag-data, frag-len); +} + int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr, void *val, unsigned int bytes, struct x86_exception *exception, @@ -3864,14 +3875,10 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr, return rc; gpa = vcpu-mmio_fragments[0].gpa; - vcpu-mmio_needed = 1; vcpu-mmio_cur_fragment = 0; - vcpu-run-mmio.len = vcpu-mmio_fragments[0].len; - vcpu-run-mmio.is_write = vcpu-mmio_is_write = ops-write; - vcpu-run-exit_reason = KVM_EXIT_MMIO; - vcpu-run-mmio.phys_addr = gpa; + set_mmio_exit_info(vcpu, vcpu-mmio_fragments[0], ops-write); return ops-read_write_exit_mmio(vcpu, gpa, val, bytes); } @@ -5490,7 +5497,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) */ static int complete_mmio(struct kvm_vcpu *vcpu) { - struct kvm_run *run = vcpu-run; struct kvm_mmio_fragment *frag; int r; @@ -5501,7 +5507,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu) /* Complete previous fragment */ frag = vcpu-mmio_fragments[vcpu-mmio_cur_fragment++]; if (!vcpu-mmio_is_write) - memcpy(frag-data, run-mmio.data, frag-len); + memcpy(frag-data, vcpu-run-mmio.data, frag-len); if (vcpu-mmio_cur_fragment == vcpu-mmio_nr_fragments) { vcpu-mmio_needed = 0; if (vcpu-mmio_is_write) @@ -5511,12 +5517,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu) } /* Initiate next fragment */ ++frag; - run-exit_reason = KVM_EXIT_MMIO; - run-mmio.phys_addr = frag-gpa; - if (vcpu-mmio_is_write) - memcpy(run-mmio.data, frag-data, frag-len); - run-mmio.len = frag-len; - run-mmio.is_write = vcpu-mmio_is_write; + set_mmio_exit_info(vcpu, frag, vcpu-mmio_is_write); return 0; } -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND v7 1/2] block: ioctl support for sanitize in eMMC 4.5
On 28 June 2012 14:02, Yaniv Gardi yga...@codeaurora.org wrote: Adding a new ioctl to support sanitize operation in eMMC cards version 4.5. The sanitize ioctl support helps performing this operation via user application. Signed-off-by: Yaniv Gardi yga...@codeaurora.org --- block/blk-core.c | 15 ++-- block/blk-lib.c | 51 + block/blk-merge.c |4 +++ block/elevator.c |2 +- block/ioctl.c |9 include/linux/blk_types.h |5 +++- include/linux/blkdev.h|3 ++ include/linux/fs.h|1 + kernel/trace/blktrace.c |2 + 9 files changed, 87 insertions(+), 5 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 3c923a7..4a56102b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1641,7 +1641,7 @@ generic_make_request_checks(struct bio *bio) goto end_io; } - if (unlikely(!(bio-bi_rw REQ_DISCARD) + if (unlikely(!(bio-bi_rw (REQ_DISCARD | REQ_SANITIZE)) nr_sectors queue_max_hw_sectors(q))) { printk(KERN_ERR bio too big device %s (%u %u)\n, bdevname(bio-bi_bdev, b), @@ -1689,6 +1689,14 @@ generic_make_request_checks(struct bio *bio) goto end_io; } + if ((bio-bi_rw REQ_SANITIZE) + (!blk_queue_sanitize(q))) { + pr_info(%s - got a SANITIZE request but the queue + doesn't support sanitize requests, __func__); + err = -EOPNOTSUPP; + goto end_io; + } + if (blk_throtl_bio(q, bio)) return false; /* throttled, will be resubmitted later */ @@ -1794,7 +1802,8 @@ void submit_bio(int rw, struct bio *bio) * If it's a regular read/write or a barrier with data attached, * go through the normal accounting stuff before submission. */ - if (bio_has_data(bio) !(rw REQ_DISCARD)) { + if (bio_has_data(bio) + (!(rw (REQ_DISCARD | REQ_SANITIZE { if (rw WRITE) { count_vm_events(PGPGOUT, count); } else { @@ -1840,7 +1849,7 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (rq-cmd_flags REQ_DISCARD) + if (rq-cmd_flags (REQ_DISCARD | REQ_SANITIZE)) return 0; if (blk_rq_sectors(rq) queue_max_sectors(q) || diff --git a/block/blk-lib.c b/block/blk-lib.c index 2b461b4..280d63e 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -115,6 +115,57 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, EXPORT_SYMBOL(blkdev_issue_discard); /** + * blkdev_issue_sanitize - queue a sanitize request + * @bdev: blockdev to issue sanitize for + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + *Issue a sanitize request for the specified block device + */ +int blkdev_issue_sanitize(struct block_device *bdev, gfp_t gfp_mask) +{ + DECLARE_COMPLETION_ONSTACK(wait); + struct request_queue *q = bdev_get_queue(bdev); + int type = REQ_WRITE | REQ_SANITIZE; + struct bio_batch bb; + struct bio *bio; + int ret = 0; + + if (!q) + return -ENXIO; + + if (!blk_queue_sanitize(q)) { + pr_err(%s - card doesn't support sanitize, __func__); + return -EOPNOTSUPP; + } + + bio = bio_alloc(gfp_mask, 1); + if (!bio) + return -ENOMEM; + + atomic_set(bb.done, 1); + bb.flags = 1 BIO_UPTODATE; + bb.wait = wait; + + bio-bi_end_io = bio_batch_end_io; + bio-bi_bdev = bdev; + bio-bi_private = bb; + + atomic_inc(bb.done); + submit_bio(type, bio); + + /* Wait for bios in-flight */ + if (!atomic_dec_and_test(bb.done)) + wait_for_completion(wait); + + if (!test_bit(BIO_UPTODATE, bb.flags)) + ret = -EIO; + + return ret; +} +EXPORT_SYMBOL(blkdev_issue_sanitize); + +/** * blkdev_issue_zeroout - generate number of zero filed write bios * @bdev: blockdev to issue * @sector:start sector diff --git a/block/blk-merge.c b/block/blk-merge.c index 160035f..7e24772 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -477,6 +477,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) if (!rq_mergeable(rq)) return false; + /* don't merge file system requests and sanitize requests */ + if ((req-cmd_flags REQ_SANITIZE) != (next-cmd_flags REQ_SANITIZE)) this will not compile. gives compile error. Either change the function parameter to req or change in the above condition as rq + return false; +
Re: [PATCH 3/7] mm/slub.c: remove invalid reference to list iterator variable
On Sun, 8 Jul 2012, Julia Lawall wrote: From: Julia Lawall julia.law...@lip6.fr If list_for_each_entry, etc complete a traversal of the list, the iterator variable ends up pointing to an address at an offset from the list head, and not a meaningful structure. Thus this value should not be used after the end of the iterator. The patch replaces s-name by al-name, which is referenced nearby. This problem was found using Coccinelle (http://coccinelle.lip6.fr/). Signed-off-by: Julia Lawall julia.law...@lip6.fr --- mm/slub.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slub.c b/mm/slub.c index cc4ed03..ef9bf01 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5395,7 +5395,7 @@ static int __init slab_sysfs_init(void) err = sysfs_slab_alias(al-s, al-name); if (err) printk(KERN_ERR SLUB: Unable to add boot slab alias - %s to sysfs\n, s-name); + %s to sysfs\n, al-name); kfree(al); } Applied, thanks! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: WARNING: __GFP_FS allocations with IRQs disabled (kmemcheck_alloc_shadow)
On Sun, 8 Jul 2012, David Rientjes wrote: The correct fix is what I proposed at http://marc.info/?l=linux-kernelm=133754837703630 and was awaiting testing. If Rus, Steven, or Fengguang could test this then we could add it as a stable backport as well. Looks good to me. Care to send it my way at penb...@kernel.org? It looks like people CC'd me as penb...@cs.helsinki.fi which is why I missed it. Pekka -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: Warn about costly page allocation
On Mon, Jul 09, 2012 at 05:46:57PM +0900, Minchan Kim wrote: SNIP +#if defined(CONFIG_DEBUG_VM) !defined(CONFIG_COMPACTION) +static inline void check_page_alloc_costly_order(unsigned int order) +{ + if (unlikely(order PAGE_ALLOC_COSTLY_ORDER)) { + printk_once(WARNING: You are tring to allocate %d-order page. + You might need to turn on CONFIG_COMPACTION\n, order); + } WARN_ON_ONCE would tell you what is trying to satisfy the allocation. Do you mean that it would be better to use WARN_ON_ONCE rather than raw printk? Yes. If so, I would like to insist raw printk because WARN_ON_ONCE could be disabled by !CONFIG_BUG. If I miss something, could you elaborate it more? Ok, but all this will tell you is that *something* tried a high-order allocation. It will not tell you who and because it's a printk_once, it will also not tell you how often it's happening. You could add a dump_stack to capture that information. It should further check if this is a GFP_MOVABLE allocation or not and if not, then it should either be documented that compaction may only delay allocation failures and that they may need to consider reserving the memory in advance or doing something like forcing MIGRATE_RESERVE to only be used for high-order allocations. Okay. but I got confused you want to add above description in code directly like below or write it down in comment of check_page_alloc_costly_order? You're aiming this at embedded QA people according to your changelog so do whatever you think is going to be the most effective. It's already known that high-order kernel allocations are meant to be unreliable and apparently this is being ignored. The in-code warning could look something like if (unlikely(order PAGE_ALLOC_COSTLY_ORDER)) { printk_once(%s: page allocation high-order stupidity: order:%d, mode:0x%x\n, current-comm, order, gfp_mask); if (gfp_flags __GFP_MOVABLE) { printk_once(Enable compaction or whatever\n); dump_stack(); } else { printk_once(Regular high-order kernel allocations like this will eventually start failing.); dump_stack(); } } There should be a comment above it giving more information if you think the embedded people will actually read it. Of course, if this warning triggers during driver initialisation then it might be a completely useless. You could rate limit the warning (printk_ratelimit()) instead to be more effective. As I don't know what sort of device drivers you are seeing this problem with I can't judge what the best style of warning would be. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 0/5] uprobes: write_opcode() cleanups
* Oleg Nesterov o...@redhat.com wrote: On 07/06, Oleg Nesterov wrote: On 07/06, Ingo Molnar wrote: * Oleg Nesterov o...@redhat.com wrote: Hello, write_opcode() cleanups resend + new minor fix. Changes: - document the new argument in 2/5. - drop the buggy 5/5, thanks Anton for your quick nack. Probably I'll return to this later, I have another reason for this change. - so this 5/5 is new. Srikar, please add your acks unless you have some objections. Just wondering what's the review status of patches #1-#4? I hope Srikar will ack 1-4 soon. He observed the testing failures, but it turns out this series is innocent. I'll send more fixes soon. I'll skip #5 based on Oleg's request, Yes, thanks. I still think 5/5 makes sense, but we need to do something with uprobe.s:vma_address() first. Argh. I wrote this email because I wanted to say that I updated the changelog for 2/5 a little bit, but forgot to mention this. I'll send the updated patch in reply to 2/5 (once again, only the changelog was changed). But please let me know if you want me to resend 1-4. Once Srikar's ack is in then it would be nice to update the patches with the ack and resend #1-#4, to make sure I have all the intended patches and nothing more or less than that. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 03/16] sched: aggregate load contributed by task entities on parenting cfs_rq
* Peter Zijlstra pet...@infradead.org wrote: On Wed, 2012-07-04 at 17:28 +0200, Peter Zijlstra wrote: On Wed, 2012-06-27 at 19:24 -0700, Paul Turner wrote: For a given task t, we can compute its contribution to load as: task_load(t) = runnable_avg(t) * weight(t) On a parenting cfs_rq we can then aggregate runnable_load(cfs_rq) = \Sum task_load(t), for all runnable children t Maintain this bottom up, with task entities adding their contributed load to the parenting cfs_rq sum. When a task entities load changes we add the same delta to the maintained sum. Signed-off-by: Paul Turner p...@google.com Signed-off-by: Ben Segall bseg...@google.com A lot of patches have this funny sob trail.. Ben never send me these patches, so uhm. ? Should that be Reviewed-by, or what is the deal with those? Ben could you clarify what your exact contribution was? Ingo, supposing Ben is co-author and wrote a significant part of the patch, what are we supposed to do with these tags? There can be only one author SOB - additional contributions should be reflected via credits in the changelog, copyright lines and/or Originally-From: tags, or so. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: linux-next: comment on pm tree commit
On 07/09/2012 01:54 PM, Rafael J. Wysocki wrote: On Monday, July 09, 2012, Stephen Rothwell wrote: Hi Rafael, I noticed commit b8eec56cd8e5 (PM / cpuidle: System resume hang fix with cpuidle) in the pm tree needs some work (I noticed it because it was changed in a rebase ...). diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index a6b3f2e..b90ccb2 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -146,6 +146,8 @@ extern void cpuidle_unregister_device(struct cpuidle_device *dev); extern void cpuidle_pause_and_lock(void); extern void cpuidle_resume_and_unlock(void); +extern void cpuidle_pause(void); +extern void cpuidle_resume(void); extern int cpuidle_enable_device(struct cpuidle_device *dev); extern void cpuidle_disable_device(struct cpuidle_device *dev); extern int cpuidle_wrap_enter(struct cpuidle_device *dev, @@ -169,6 +171,8 @@ static inline void cpuidle_unregister_device(struct cpuidle_device *dev) { } static inline void cpuidle_pause_and_lock(void) { } static inline void cpuidle_resume_and_unlock(void) { } +static inline cpuidle_pause(void) { } +static inline cpuidle_resume(void) { } These need to be static inline void. I wonder what review and build testing this went through (the above should produce warnings since they are non void returning functions with no return statements). Thanks for reporting this, I tried to fix a build issue in the original patch I apologise for not having taken care of the above build scenario. hastily and failed miserably as you have noticed and then I build-tested a wrong tree. Sorry. It should be fixed now for real. Thanks, Rafael Regards Preeti -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] staging: vt6655: Cleanup in usage of macros
2012/7/9 Joe Perches j...@perches.com On Sun, 2012-07-08 at 23:51 -0300, Marcos Paulo de Souza wrote: Hi kernel guys! This patchset aims to clean all unused and commented macros. For this challenge, forgotten-macros tool helped us. Perhaps there may be false positives in your code. Not in this case. After each changes in a file, I compiled all driver again. Many times, macros like the below are used: #define SUBSYSTEM_PREFIX_FOO1 #define SUBSYSTEM_PREFIX_BAR2 #define SUBSYSTEM_PREFIX_BAZ3 #define USE_TYPE(type) SUBSYSTEM_PREFIX_##type It doesn't seem your code knows that style. True! The tool is under development. A more robust method will be implemented, but for now, the tool can find the most easy dead macros. Also, the tool might be more flexible if it was written using perl or python. Yeah, I believe it's true! Thanks for the comments! But, for the changes, do I have your ack? Thanks Joe! -- Att, Marcos Paulo de Souza Acadêmico de Ciencia da Computação - FURB - SC Uma vida sem desafios é uma vida sem razão A life without challenges, is a non reason life -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 82571EB: Detected Hardware Unit Hang
On Mon, 2012-07-09 at 16:51 +0800, Joe Jin wrote: Hi list, I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when doing scp test. this issue is easy do reproduced on SUN FIRE X2270 M2, just copy a big file (500M) from another server will hit it at once. Would you please help on this? Its a known problem. But apparently Intel guys are not very responsive, as they have another patch than the following : http://permalink.gmane.org/gmane.linux.network/232669 We only have to wait they push their alternative patch, eventually. In the mean time, you can use Hiroaki SHIMODA patch, it works. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] apic: fix kvm build on UP without IOAPIC
* Michael S. Tsirkin m...@redhat.com wrote: On Fri, Jul 06, 2012 at 04:12:23PM +0200, Ingo Molnar wrote: * Marcelo Tosatti mtosa...@redhat.com wrote: On Fri, Jul 06, 2012 at 01:13:14PM +0200, Ingo Molnar wrote: * H. Peter Anvin h...@zytor.com wrote: On 07/01/2012 08:05 AM, Michael S. Tsirkin wrote: On UP i386, when APIC is disabled # CONFIG_X86_UP_APIC is not set # CONFIG_PCI_IOAPIC is not set code looking at apicdrivers never has any effect but it still gets compiled in. In particular, this causes build failures with kvm, but it generally bloats the kernel unnecessarily. Fix by defining both __apicdrivers and __apicdrivers_end to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified that as the result any loop scanning __apicdrivers gets optimized out by the compiler. Warning: a .config with apic disabled doesn't seem to boot for me (even without this patch). Still verifying why, meanwhile this patch is compile-tested only. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Note: if this patch makes sense, can x86 maintainers please ACK applying it through the kvm tree, since that is where we see the issue that it addresses? Avi, Marcelo, maybe you can carry this in kvm/linux-next as a temporary measure so that linux-next builds? I'm not happy about that as a workflow, but since you guys have an immediate problem I guess we can do that. I'm rather unhappy about this workflow - we've got quite a few apic bits in the x86 tree this cycle as well and need extra external interaction, not. Which KVM tree commit caused this, could someone please give a lkml link or quote it here? It's not referenced in the fix patch either. Thanks, Ingo This tree (kvm.git next): http://git.kernel.org/?p=virt/kvm/kvm.git;a=shortlog;h=refs/heads/next Introduced by this commit: http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=ab9cf4996bb989983e73da894b8dd0239aa2c3c2 This bit: + if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) { + struct apic **drv; + + for (drv = __apicdrivers; drv __apicdrivers_end; drv++) { + /* Should happen once for each apic */ + WARN_ON((*drv)-eoi_write == kvm_guest_apic_eoi_write); + (*drv)-eoi_write = kvm_guest_apic_eoi_write; + } + } + is rather disgusting I have to say. WTH is the KVM code meddling with core x86 apic driver data structures directly? At minimum factor this out and create a proper apic.c function which is EXPORT_SYMBOL_GPL() exported or so... Thanks, Ingo OK, so apic_set_eoi_write()? Yes, with a changelog comment analyzing the design decisions and locking here - what happens if actual APIC driver use races with this update on SMP, why is it all safe, etc? Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 0/5] ARM: topology: set the capacity of each cores for big.LITTLE
This patchset creates an arch_scale_freq_power function for ARM, which is used to set the relative capacity of each core of a big.LITTLE system. It also removes the broken power estimation of x86. Modification since v3: - Add comments - Add optimization for SMP system - Ensure that capacity of a CPU will be at most 1 Modification since v2: - set_power_scale function becomes static - Rework loop in update_siblings_masks - Remove useless code in parse_dt_topology Modification since v1: - Add and update explanation about the use of the table and the range of the value - Remove the use of NR_CPUS and use nr_cpu_ids instead - Remove broken power estimation of x86 Peter Zijlstra (1): sched, x86: Remove broken power estimation Vincent Guittot (4): ARM: topology: Add arch_scale_freq_power function ARM: topology: factorize the update of sibling masks ARM: topology: Update cpu_power according to DT information sched: cpu_power: enable ARCH_POWER arch/arm/kernel/topology.c | 239 ++ arch/x86/kernel/cpu/Makefile |2 +- arch/x86/kernel/cpu/sched.c | 55 -- kernel/sched/features.h |2 +- 4 files changed, 219 insertions(+), 79 deletions(-) delete mode 100644 arch/x86/kernel/cpu/sched.c -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 1/5] ARM: topology: Add arch_scale_freq_power function
Add infrastructure to be able to modify the cpu_power of each core Signed-off-by: Vincent Guittot vincent.guit...@linaro.org Reviewed-by: Namhyung Kim namhy...@kernel.org --- arch/arm/kernel/topology.c | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index 8200dea..51f23b3 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c @@ -22,6 +22,37 @@ #include asm/cputype.h #include asm/topology.h +/* + * cpu power scale management + */ + +/* + * cpu power table + * This per cpu data structure describes the relative capacity of each core. + * On a heteregenous system, cores don't have the same computation capacity + * and we reflect that difference in the cpu_power field so the scheduler can + * take this difference into account during load balance. A per cpu structure + * is preferred because each CPU updates its own cpu_power field during the + * load balance except for idle cores. One idle core is selected to run the + * rebalance_domains for all idle cores and the cpu_power can be updated + * during this sequence. + */ +static DEFINE_PER_CPU(unsigned long, cpu_scale); + +unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu) +{ + return per_cpu(cpu_scale, cpu); +} + +static void set_power_scale(unsigned int cpu, unsigned long power) +{ + per_cpu(cpu_scale, cpu) = power; +} + +/* + * cpu topology management + */ + #define MPIDR_SMP_BITMASK (0x3 30) #define MPIDR_SMP_VALUE (0x2 30) @@ -41,6 +72,9 @@ #define MPIDR_LEVEL2_MASK 0xFF #define MPIDR_LEVEL2_SHIFT 16 +/* + * cpu topology table + */ struct cputopo_arm cpu_topology[NR_CPUS]; const struct cpumask *cpu_coregroup_mask(int cpu) @@ -134,7 +168,7 @@ void init_cpu_topology(void) { unsigned int cpu; - /* init core mask */ + /* init core mask and power*/ for_each_possible_cpu(cpu) { struct cputopo_arm *cpu_topo = (cpu_topology[cpu]); @@ -143,6 +177,8 @@ void init_cpu_topology(void) cpu_topo-socket_id = -1; cpumask_clear(cpu_topo-core_sibling); cpumask_clear(cpu_topo-thread_sibling); + + set_power_scale(cpu, SCHED_POWER_SCALE); } smp_wmb(); } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 2/5] ARM: topology: factorize the update of sibling masks
This factorization has also been proposed in another patch that has not been merged yet: http://lists.infradead.org/pipermail/linux-arm-kernel/2012-January/080873.html So, this patch could be dropped depending of the state of the other one. Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com Signed-off-by: Vincent Guittot vincent.guit...@linaro.org Reviewed-by: Namhyung Kim namhy...@kernel.org --- arch/arm/kernel/topology.c | 48 +--- 1 file changed, 27 insertions(+), 21 deletions(-) diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index 51f23b3..eb5fc81 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c @@ -82,6 +82,32 @@ const struct cpumask *cpu_coregroup_mask(int cpu) return cpu_topology[cpu].core_sibling; } +void update_siblings_masks(unsigned int cpuid) +{ + struct cputopo_arm *cpu_topo, *cpuid_topo = cpu_topology[cpuid]; + int cpu; + + /* update core and thread sibling masks */ + for_each_possible_cpu(cpu) { + cpu_topo = cpu_topology[cpu]; + + if (cpuid_topo-socket_id != cpu_topo-socket_id) + continue; + + cpumask_set_cpu(cpuid, cpu_topo-core_sibling); + if (cpu != cpuid) + cpumask_set_cpu(cpu, cpuid_topo-core_sibling); + + if (cpuid_topo-core_id != cpu_topo-core_id) + continue; + + cpumask_set_cpu(cpuid, cpu_topo-thread_sibling); + if (cpu != cpuid) + cpumask_set_cpu(cpu, cpuid_topo-thread_sibling); + } + smp_wmb(); +} + /* * store_cpu_topology is called at boot when only one cpu is running * and with the mutex cpu_hotplug.lock locked, when several cpus have booted, @@ -91,7 +117,6 @@ void store_cpu_topology(unsigned int cpuid) { struct cputopo_arm *cpuid_topo = cpu_topology[cpuid]; unsigned int mpidr; - unsigned int cpu; /* If the cpu topology has been already set, just return */ if (cpuid_topo-core_id != -1) @@ -133,26 +158,7 @@ void store_cpu_topology(unsigned int cpuid) cpuid_topo-socket_id = -1; } - /* update core and thread sibling masks */ - for_each_possible_cpu(cpu) { - struct cputopo_arm *cpu_topo = cpu_topology[cpu]; - - if (cpuid_topo-socket_id == cpu_topo-socket_id) { - cpumask_set_cpu(cpuid, cpu_topo-core_sibling); - if (cpu != cpuid) - cpumask_set_cpu(cpu, - cpuid_topo-core_sibling); - - if (cpuid_topo-core_id == cpu_topo-core_id) { - cpumask_set_cpu(cpuid, - cpu_topo-thread_sibling); - if (cpu != cpuid) - cpumask_set_cpu(cpu, - cpuid_topo-thread_sibling); - } - } - } - smp_wmb(); + update_siblings_masks(cpuid); printk(KERN_INFO CPU%u: thread %d, cpu %d, socket %d, mpidr %x\n, cpuid, cpu_topology[cpuid].thread_id, -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 3/5] ARM: topology: Update cpu_power according to DT information
Use cpu compatibility field and clock-frequency field of DT to estimate the capacity of each core of the system and to update the cpu_power field accordingly. This patch enables to put more running tasks on big cores than on LITTLE ones. But this patch doesn't ensure that long running tasks will run on big cores and short ones on LITTLE cores. Signed-off-by: Vincent Guittot vincent.guit...@linaro.org Reviewed-by: Namhyung Kim namhy...@kernel.org --- arch/arm/kernel/topology.c | 153 1 file changed, 153 insertions(+) diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c index eb5fc81..198b084 100644 --- a/arch/arm/kernel/topology.c +++ b/arch/arm/kernel/topology.c @@ -17,7 +17,9 @@ #include linux/percpu.h #include linux/node.h #include linux/nodemask.h +#include linux/of.h #include linux/sched.h +#include linux/slab.h #include asm/cputype.h #include asm/topology.h @@ -49,6 +51,152 @@ static void set_power_scale(unsigned int cpu, unsigned long power) per_cpu(cpu_scale, cpu) = power; } +#ifdef CONFIG_OF +struct cpu_efficiency { + const char *compatible; + unsigned long efficiency; +}; + +/* + * Table of relative efficiency of each processors + * The efficiency value must fit in 20bit and the final + * cpu_scale value must be in the range + * 0 cpu_scale 3*SCHED_POWER_SCALE/2 + * in order to return at most 1 when DIV_ROUND_CLOSEST + * is used to compute the capacity of a CPU. + * Processors that are not defined in the table, + * use the default SCHED_POWER_SCALE value for cpu_scale. + */ +struct cpu_efficiency table_efficiency[] = { + {arm,cortex-a15, 3891}, + {arm,cortex-a7, 2048}, + {NULL, }, +}; + +struct cpu_capacity { + unsigned long hwid; + unsigned long capacity; +}; + +struct cpu_capacity *cpu_capacity; + +unsigned long middle_capacity = 1; + +/* + * Iterate all CPUs' descriptor in DT and compute the efficiency + * (as per table_efficiency). Also calculate a middle efficiency + * as close as possible to (max{eff_i} - min{eff_i}) / 2 + * This is later used to scale the cpu_power field such that an + * 'average' CPU is of middle power. Also see the comments near + * table_efficiency[] and update_cpu_power(). + */ +static void __init parse_dt_topology(void) +{ + struct cpu_efficiency *cpu_eff; + struct device_node *cn = NULL; + unsigned long min_capacity = (unsigned long)(-1); + unsigned long max_capacity = 0; + unsigned long capacity = 0; + int alloc_size, cpu = 0; + + alloc_size = nr_cpu_ids * sizeof(struct cpu_capacity); + cpu_capacity = (struct cpu_capacity *)kzalloc(alloc_size, GFP_NOWAIT); + + while ((cn = of_find_node_by_type(cn, cpu))) { + const u32 *rate, *reg; + int len; + + if (cpu = num_possible_cpus()) + break; + + for (cpu_eff = table_efficiency; cpu_eff-compatible; cpu_eff++) + if (of_device_is_compatible(cn, cpu_eff-compatible)) + break; + + if (cpu_eff-compatible == NULL) + continue; + + rate = of_get_property(cn, clock-frequency, len); + if (!rate || len != 4) { + pr_err(%s missing clock-frequency property\n, + cn-full_name); + continue; + } + + reg = of_get_property(cn, reg, len); + if (!reg || len != 4) { + pr_err(%s missing reg property\n, cn-full_name); + continue; + } + + capacity = ((be32_to_cpup(rate)) 20) * cpu_eff-efficiency; + + /* Save min capacity of the system */ + if (capacity min_capacity) + min_capacity = capacity; + + /* Save max capacity of the system */ + if (capacity max_capacity) + max_capacity = capacity; + + cpu_capacity[cpu].capacity = capacity; + cpu_capacity[cpu++].hwid = be32_to_cpup(reg); + } + + if (cpu num_possible_cpus()) + cpu_capacity[cpu].hwid = (unsigned long)(-1); + + /* If min and max capacities are equals, we bypass the update of the +* cpu_scale because all CPUs have the same capacity. Otherwise, we +* compute a middle_capacity factor that will ensure that the capacity +* of an 'average' CPU of the system will be as close as possible to +* SCHED_POWER_SCALE, which is the default value, but with the +* constraint explained near table_efficiency[]. +*/ + if (min_capacity == max_capacity) + cpu_capacity[0].hwid = (unsigned long)(-1); + else if (4*max_capacity (3*(max_capacity + min_capacity))) + middle_capacity = (min_capacity + max_capacity) +
[PATCH v4 4/5] sched, x86: Remove broken power estimation
From: Peter Zijlstra a.p.zijls...@chello.nl The x86 sched power implementation has been broken forever and gets in the way of other stuff, remove it. For archaeological interest, fixing this code would require dealing with the cross-cpu calling of these functions and more importantly, we need to filter idle time out of the a/m-perf stuff because the ratio will go down to 0 when idle, giving a 0 capacity which is not what we'd want. Signed-off-by: Peter Zijlstra a.p.zijls...@chello.nl Link: http://lkml.kernel.org/n/tip-wjjwelpti8f8k7i1pdnzm...@git.kernel.org --- arch/x86/kernel/cpu/Makefile |2 +- arch/x86/kernel/cpu/sched.c | 55 -- 2 files changed, 1 insertion(+), 56 deletions(-) delete mode 100644 arch/x86/kernel/cpu/sched.c diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index 6ab6aa2..c598126 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -14,7 +14,7 @@ CFLAGS_common.o := $(nostackp) obj-y := intel_cacheinfo.o scattered.o topology.o obj-y += proc.o capflags.o powerflags.o common.o -obj-y += vmware.o hypervisor.o sched.o mshyperv.o +obj-y += vmware.o hypervisor.o mshyperv.o obj-y += rdrand.o obj-y += match.o diff --git a/arch/x86/kernel/cpu/sched.c b/arch/x86/kernel/cpu/sched.c deleted file mode 100644 index a640ae5..000 --- a/arch/x86/kernel/cpu/sched.c +++ /dev/null @@ -1,55 +0,0 @@ -#include linux/sched.h -#include linux/math64.h -#include linux/percpu.h -#include linux/irqflags.h - -#include asm/cpufeature.h -#include asm/processor.h - -#ifdef CONFIG_SMP - -static DEFINE_PER_CPU(struct aperfmperf, old_perf_sched); - -static unsigned long scale_aperfmperf(void) -{ - struct aperfmperf val, *old = __get_cpu_var(old_perf_sched); - unsigned long ratio, flags; - - local_irq_save(flags); - get_aperfmperf(val); - local_irq_restore(flags); - - ratio = calc_aperfmperf_ratio(old, val); - *old = val; - - return ratio; -} - -unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu) -{ - /* -* do aperf/mperf on the cpu level because it includes things -* like turbo mode, which are relevant to full cores. -*/ - if (boot_cpu_has(X86_FEATURE_APERFMPERF)) - return scale_aperfmperf(); - - /* -* maybe have something cpufreq here -*/ - - return default_scale_freq_power(sd, cpu); -} - -unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu) -{ - /* -* aperf/mperf already includes the smt gain -*/ - if (boot_cpu_has(X86_FEATURE_APERFMPERF)) - return SCHED_LOAD_SCALE; - - return default_scale_smt_power(sd, cpu); -} - -#endif -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 5/5] sched: cpu_power: enable ARCH_POWER
Heteregeneous ARM platform uses arch_scale_freq_power function to reflect the relative capacity of each core Signed-off-by: Vincent Guittot vincent.guit...@linaro.org --- kernel/sched/features.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/features.h b/kernel/sched/features.h index de00a48..d98ae90 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -42,7 +42,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, true) /* * Use arch dependent cpu power functions */ -SCHED_FEAT(ARCH_POWER, false) +SCHED_FEAT(ARCH_POWER, true) SCHED_FEAT(HRTICK, false) SCHED_FEAT(DOUBLE_TICK, false) -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ext4: fix double quota opts show
Currently quota options showed twice, first time in _ext4_show_options() (inside main loop), and second time inside ext4_show_quota_options(). In my case it looks like follows: /dev/sdc /mnt ext4 rw,relatime,quota,usrquota,grpquota,data=ordered,usrquota,grpquota 0 0 Let's do not show quota's opts inside main loop, and let ext4_show_quota_options() does it primary job. Signed-off-by: Dmitry Monakhov dmonak...@openvz.org --- fs/ext4/super.c |8 +--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index eb7aa3e..84c7ba4 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1404,13 +1404,13 @@ static int clear_qf_name(struct super_block *sb, int qtype) #define MOPT_CLEAR_ERR 0x0010 #define MOPT_GTE0 0x0020 #ifdef CONFIG_QUOTA -#define MOPT_Q 0 -#define MOPT_QFMT 0x0040 +#define MOPT_Q 0x0040 +#define MOPT_QFMT 0x0080 #else #define MOPT_Q MOPT_NOSUPPORT #define MOPT_QFMT MOPT_NOSUPPORT #endif -#define MOPT_DATAJ 0x0080 +#define MOPT_DATAJ 0x0100 static const struct mount_opts { int token; @@ -1786,6 +1786,8 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb, (sbi-s_mount_opt m-mount_opt) != m-mount_opt) || (!want_set (sbi-s_mount_opt m-mount_opt))) continue; /* select Opt_noFoo vs Opt_Foo */ + if (m-flags (MOPT_Q | MOPT_QFMT)) + continue; /* will be handled in show_quota_options */ SEQ_OPTS_PRINT(%s, token2str(m-token)); } -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ext4: fix hole punch failure when depth is greater than 0
On Thu, 5 Jul 2012, Ashish Sangwan wrote: Date: Thu, 5 Jul 2012 15:22:04 +0530 From: Ashish Sangwan ashishsangw...@gmail.com To: Lukáš Czerner lczer...@redhat.com Cc: sand...@redhat.com, Ted Tso ty...@mit.edu, linux-kernel@vger.kernel.org, linux-e...@vger.kernel.org, Namjae Jeon linkinj...@gmail.com Subject: Re: [PATCH v2] ext4: fix hole punch failure when depth is greater than 0 On Wed, Jul 4, 2012 at 11:03 PM, Lukáš Czerner lczer...@redhat.com wrote: So I've finally has some time to look at the patch and reproduce the problem. Thanks for noticing the problem, the patch seems good, though I have one question. Is the p_block setting really necessary ? I do not think so, but I might be missing something. Here is updated patch I've tested, bellow. AFAICS, p_block setting is necessary. As mentioned in the patch description, whether to continue removing extents or not is decided by the return value of function ext4_ext_more_to_rm() which checks 2 conditions: a) if there are no more indexes to process. b) if the number of entries are decreased in the header of depth -1. The p_block setting is important for condition b). In function ext4_ext_more_to_rm, there is this second check: if (le16_to_cpu(path-p_hdr-eh_entries) == path-p_block) return 0; If the value of p_block would not be correct, the above mentioned condition could become true while there are still blocks left to be removed. Ok, the code is not very clear, but now I can see it. p_block is actually misused here to store the number of indexes in the current node while diving into the tree. Then on the way up, we are checking that to see if the eh_entries changed or not (which is indicating that something has been freed deeper in the tree). That said, it makes sense to set it before the loop itself because we are actually skipping the path construction while diving into the tree since patch is already initialized and we're starting walking back from 'depth' up in this case. So the patch seems fine. Thanks for catching it and fixing it. You can add Reviewed-by: Lukas Czerner lczer...@redhat.com Note: there are some indent problems in your patch, like for example this: + path[k].p_block = + le16_to_cpu(path[k].p_hdr-eh_entries)+1; Before submitting the patch, I run checkpatch.pl with --strict option. It did'nt show any error or warning. Should I re-submit the patch with an extra tab before the second line? The call is yours. checkpatch.pl does not catch everything. Just look at how wrapping of long lines is done in the code, there are plenty of examples. Anyway, what do you think about the modification ? Also there is 1 modification missing from your patch. ext4_ext_drop_refs(path); kfree(path); + path = NULL; if (err == -EAGAIN) goto again; If path is not set to NULL, it will crash in xfstest #13. Ted has already reported it. Right, I've probably used the old patch as an example. Thanks! -Lukas Thanks, Ashish Thanks! -Lukas diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index b3300eb..94bc1bd 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2570,7 +2570,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start, { struct super_block *sb = inode-i_sb; int depth = ext_depth(inode); - struct ext4_ext_path *path; + struct ext4_ext_path *path = NULL; ext4_fsblk_t partial_cluster = 0; handle_t *handle; int i, err; @@ -2606,8 +2606,12 @@ again: } depth = ext_depth(inode); ex = path[depth].p_ext; - if (!ex) + if (!ex) { + ext4_ext_drop_refs(path); + kfree(path); + path = NULL; goto cont; + } ee_block = le32_to_cpu(ex-ee_block); @@ -2637,8 +2641,6 @@ again: if (err 0) goto out; } - ext4_ext_drop_refs(path); - kfree(path); } cont: @@ -2646,20 +2648,26 @@ cont: * We start scanning from right side, freeing all the blocks * after i_size and walking into the tree depth-wise. */ - depth = ext_depth(inode); - path = kzalloc(sizeof(struct ext4_ext_path) * (depth + 1), GFP_NOFS); - if (path == NULL) { - ext4_journal_stop(handle); - return -ENOMEM; - } - path[0].p_depth = depth; - path[0].p_hdr = ext_inode_hdr(inode); + if (path) + i = depth; + else { + depth = ext_depth(inode); + path = kzalloc(sizeof(struct ext4_ext_path) * (depth + 1), +
[tip:timers/urgent] hrtimer: Fix clock_was_set so it is safe to call from irq context
Commit-ID: adf374cf61394dd41c027c74a81f9bef90f7a640 Gitweb: http://git.kernel.org/tip/adf374cf61394dd41c027c74a81f9bef90f7a640 Author: John Stultz johns...@us.ibm.com AuthorDate: Thu, 5 Jul 2012 15:12:16 -0400 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Mon, 9 Jul 2012 11:35:38 +0200 hrtimer: Fix clock_was_set so it is safe to call from irq context NOTE: This is a prerequisite patch that's required to address the widely observed leap-second related futex/hrtimer issues. Currently clock_was_set() is unsafe to be called from irq context, as it calls on_each_cpu(). This causes problems when we need to adjust the time from update_wall_time(). To fix this, if clock_was_set is called when irqs are disabled, we schedule a timer to fire for immedately after we're out of interrupt context to then notify the hrtimer subsystem. Reported-by: Jan Engelhardt jeng...@inai.de Signed-off-by: John Stultz johns...@us.ibm.com Acked-by: Prarit Bhargava pra...@redhat.com CC: sta...@vger.kernel.org Link: http://lkml.kernel.org/r/1341515538-5100-2-git-send-email-johns...@us.ibm.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- kernel/hrtimer.c | 17 - 1 files changed, 16 insertions(+), 1 deletions(-) diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index ae34bf5..d730678 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -746,7 +746,7 @@ static inline void retrigger_next_event(void *arg) { } * resolution timer interrupts. On UP we just disable interrupts and * call the high resolution interrupt code. */ -void clock_was_set(void) +static void do_clock_was_set(unsigned long data) { #ifdef CONFIG_HIGH_RES_TIMERS /* Retrigger the CPU local events everywhere */ @@ -755,6 +755,21 @@ void clock_was_set(void) timerfd_clock_was_set(); } +static DEFINE_TIMER(clock_was_set_timer, do_clock_was_set , 0, 0); + +void clock_was_set(void) +{ + /* +* We can't call on_each_cpu() from irq context, +* so if irqs are disabled , schedule the clock_was_set +* via a timer_list timer for right after. +*/ + if (irqs_disabled()) + mod_timer(clock_was_set_timer, jiffies); + else + do_clock_was_set(0); +} + /* * During resume we might have to reprogram the high resolution timer * interrupt (on the local CPU): -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:timers/urgent] time: Fix leapsecond triggered hrtimer/ futex load spike issue
Commit-ID: bb88e92477def647976cd3d6964af98beceba900 Gitweb: http://git.kernel.org/tip/bb88e92477def647976cd3d6964af98beceba900 Author: John Stultz johns...@us.ibm.com AuthorDate: Thu, 5 Jul 2012 15:12:17 -0400 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Mon, 9 Jul 2012 11:35:38 +0200 time: Fix leapsecond triggered hrtimer/futex load spike issue As widely reported on the internet, some Linux systems after the leapsecond was inserted are experiencing futex related load spikes (usually connected to MySQL, Firefox, Thunderbird, Java, etc). An apparent for this issue workaround is running: $ date -s `date` Credit: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix I this issue is due to the leapsecond being added without calling clock_was_set() to notify the hrtimer subsystem of the change. The workaround functions as it forces a clock_was_set() call from settimeofday(). This fix adds the required clock_was_set() calls to where we adjust for leapseconds. NOTE: This fix *depends* on the previous fix, which allows clock_was_set to be called from atomic context. Do not try to apply just this patch. Reported-by: Jan Engelhardt jeng...@inai.de Signed-off-by: John Stultz johns...@us.ibm.com Acked-by: Prarit Bhargava pra...@redhat.com CC: sta...@vger.kernel.org Link: http://lkml.kernel.org/r/1341515538-5100-3-git-send-email-johns...@us.ibm.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- kernel/time/timekeeping.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 6f46a00..cc2991d 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -963,6 +963,8 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift) leap = second_overflow(timekeeper.xtime.tv_sec); timekeeper.xtime.tv_sec += leap; timekeeper.wall_to_monotonic.tv_sec -= leap; + if (leap) + clock_was_set(); } /* Accumulate raw time */ @@ -1079,6 +1081,8 @@ static void update_wall_time(void) leap = second_overflow(timekeeper.xtime.tv_sec); timekeeper.xtime.tv_sec += leap; timekeeper.wall_to_monotonic.tv_sec -= leap; + if (leap) + clock_was_set(); } timekeeping_update(false); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:timers/urgent] hrtimer: Update hrtimer base offsets each hrtimer_interrupt
Commit-ID: d7e2e7fef1f0f4e1e614353a6c5eef4e18d98d2d Gitweb: http://git.kernel.org/tip/d7e2e7fef1f0f4e1e614353a6c5eef4e18d98d2d Author: John Stultz johns...@us.ibm.com AuthorDate: Thu, 5 Jul 2012 15:12:18 -0400 Committer: Thomas Gleixner t...@linutronix.de CommitDate: Mon, 9 Jul 2012 11:35:38 +0200 hrtimer: Update hrtimer base offsets each hrtimer_interrupt This patch introduces a new funciton which captures the CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and CLOCK_BOOTTIME offsets at the same moment. This new function is then used in place of ktime_get() when hrtimer_interrupt() is expiring timers. This ensures that any changes to realtime or boottime offsets are noticed and stored into the per-cpu hrtimer base structures, prior to doing any hrtimer expiration. This should ensure that timers are not expired early if the offsets changes under us. This is useful in the case where clock_was_set() is called from atomic context and have to schedule the hrtimer base offset update via a timer, as it provides extra robustness in the face of any possible timer delay. Signed-off-by: John Stultz johns...@us.ibm.com Acked-by: Prarit Bhargava pra...@redhat.com CC: sta...@vger.kernel.org Link: http://lkml.kernel.org/r/1341515538-5100-4-git-send-email-johns...@us.ibm.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- include/linux/hrtimer.h |3 +++ kernel/hrtimer.c | 14 +++--- kernel/time/timekeeping.c | 34 ++ 3 files changed, 48 insertions(+), 3 deletions(-) diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index fd0dc30..f6b2a74 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -320,6 +320,9 @@ extern ktime_t ktime_get(void); extern ktime_t ktime_get_real(void); extern ktime_t ktime_get_boottime(void); extern ktime_t ktime_get_monotonic_offset(void); +extern void ktime_get_and_real_and_sleep_offset(ktime_t *monotonic, + ktime_t *real_offset, + ktime_t *sleep_offset); DECLARE_PER_CPU(struct tick_device, tick_cpu_device); diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index d730678..56600c4 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -1258,18 +1258,26 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now) void hrtimer_interrupt(struct clock_event_device *dev) { struct hrtimer_cpu_base *cpu_base = __get_cpu_var(hrtimer_bases); - ktime_t expires_next, now, entry_time, delta; + ktime_t expires_next, now, entry_time, delta, real_offset, sleep_offset; int i, retries = 0; BUG_ON(!cpu_base-hres_active); cpu_base-nr_events++; dev-next_event.tv64 = KTIME_MAX; - entry_time = now = ktime_get(); + + ktime_get_and_real_and_sleep_offset(now, real_offset, sleep_offset); + + entry_time = now; retry: expires_next.tv64 = KTIME_MAX; raw_spin_lock(cpu_base-lock); + + /* Update base offsets, to avoid early wakeups */ + cpu_base-clock_base[HRTIMER_BASE_REALTIME].offset = real_offset; + cpu_base-clock_base[HRTIMER_BASE_BOOTTIME].offset = sleep_offset; + /* * We set expires_next to KTIME_MAX here with cpu_base-lock * held to prevent that a timer is enqueued in our queue via @@ -1346,7 +1354,7 @@ retry: * interrupt routine. We give it 3 attempts to avoid * overreacting on some spurious event. */ - now = ktime_get(); + ktime_get_and_real_and_sleep_offset(now, real_offset, sleep_offset); cpu_base-nr_retries++; if (++retries 3) goto retry; diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index cc2991d..b3404cf 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1251,6 +1251,40 @@ void get_xtime_and_monotonic_and_sleep_offset(struct timespec *xtim, } /** + * ktime_get_and_real_and_sleep_offset() - hrtimer helper, gets monotonic ktime, + * realtime offset, and sleep offsets. + */ +void ktime_get_and_real_and_sleep_offset(ktime_t *monotonic, + ktime_t *real_offset, + ktime_t *sleep_offset) +{ + unsigned long seq; + struct timespec wtom, sleep; + u64 secs, nsecs; + + do { + seq = read_seqbegin(timekeeper.lock); + + secs = timekeeper.xtime.tv_sec + + timekeeper.wall_to_monotonic.tv_sec; + nsecs = timekeeper.xtime.tv_nsec + + timekeeper.wall_to_monotonic.tv_nsec; + nsecs += timekeeping_get_ns(); + /* If arch requires, add in gettimeoffset() */ + nsecs += arch_gettimeoffset(); + + wtom = timekeeper.wall_to_monotonic; + sleep = timekeeper.total_sleep_time; + }
[PATCH v2] usb: host: Fix possible kernel crash
In functions itd_complete sitd_complete, a pointer by name stream may get dereferenced after freeing it, when iso_stream_put is called with stream-refcount = 2. Hence fixing it. Signed-off-by: Venu Byravarasu vbyravar...@nvidia.com --- In Patchset 1, modified parameter of iso_stream_put() to handle the crash. However the crash can be handled without modifying the function parameter, by just adding a local variable in the functions that call iso_stream_put(). Hence implemented it in the current patch. drivers/usb/host/ehci-sched.c | 16 ++-- 1 files changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/usb/host/ehci-sched.c b/drivers/usb/host/ehci-sched.c index 33182c6..20d0c38 100644 --- a/drivers/usb/host/ehci-sched.c +++ b/drivers/usb/host/ehci-sched.c @@ -1715,6 +1715,7 @@ itd_complete ( struct ehci_iso_stream *stream = itd-stream; struct usb_device *dev; unsignedretval = false; + u32 stream_ref_count = 0; /* for each uframe with a packet */ for (uframe = 0; uframe 8; uframe++) { @@ -1783,7 +1784,8 @@ itd_complete ( dev-devpath, stream-bEndpointAddress 0x0f, (stream-bEndpointAddress USB_DIR_IN) ? in : out); } - iso_stream_put (ehci, stream); + stream_ref_count = stream-refcount; + iso_stream_put(ehci, stream); done: itd-urb = NULL; @@ -1797,7 +1799,7 @@ done: * Move it to a safe place until a new frame starts. */ list_move(itd-itd_list, ehci-cached_itd_list); - if (stream-refcount == 2) { + if (stream_ref_count == 3) { /* If iso_stream_put() were called here, stream * would be freed. Instead, just prevent reuse. */ @@ -1866,7 +1868,7 @@ done_not_linked: done: if (unlikely (status 0)) - iso_stream_put (ehci, stream); + iso_stream_put(ehci, stream); return status; } @@ -2127,6 +2129,7 @@ sitd_complete ( struct ehci_iso_stream *stream = sitd-stream; struct usb_device *dev; unsignedretval = false; + u32 stream_ref_count = 0; urb_index = sitd-index; desc = urb-iso_frame_desc [urb_index]; @@ -2179,7 +2182,8 @@ sitd_complete ( dev-devpath, stream-bEndpointAddress 0x0f, (stream-bEndpointAddress USB_DIR_IN) ? in : out); } - iso_stream_put (ehci, stream); + stream_ref_count = stream-refcount; + iso_stream_put(ehci, stream); done: sitd-urb = NULL; @@ -2193,7 +2197,7 @@ done: * Move it to a safe place until a new frame starts. */ list_move(sitd-sitd_list, ehci-cached_sitd_list); - if (stream-refcount == 2) { + if (stream_ref_count == 3) { /* If iso_stream_put() were called here, stream * would be freed. Instead, just prevent reuse. */ @@ -2259,7 +2263,7 @@ done_not_linked: done: if (status 0) - iso_stream_put (ehci, stream); + iso_stream_put(ehci, stream); return status; } -- 1.7.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V3 0/3] i2c-nomadik changes
On Mon, Jun 11, 2012 at 11:37:15PM +0200, Linus Walleij wrote: On Mon, Jun 11, 2012 at 10:56 PM, Alessandro Rubini rub...@gnudd.com wrote: V3: - fixed according to Linusw feedback (merged the patch he posted) - added Tested-by: Linusw on his permission I'm happy with this version, Wolfram if it looks OK to you, can you please merge this into the I2C tree? Done now, thanks. Is the latest version of Lee's patches for DT sent on 20/06 suitable to go on top of that? Regards, Wolfram -- Pengutronix e.K. | Wolfram Sang| Industrial Linux Solutions | http://www.pengutronix.de/ | signature.asc Description: Digital signature
Re: [PATCH RESEND v7 1/2] block: ioctl support for sanitize in eMMC 4.5
On 28 June 2012 14:02, Yaniv Gardi yga...@codeaurora.org wrote: Adding a new ioctl to support sanitize operation in eMMC cards version 4.5. The sanitize ioctl support helps performing this operation via user application. Signed-off-by: Yaniv Gardi yga...@codeaurora.org --- block/blk-core.c | 15 ++-- block/blk-lib.c | 51 + block/blk-merge.c |4 +++ block/elevator.c |2 +- block/ioctl.c |9 include/linux/blk_types.h |5 +++- include/linux/blkdev.h|3 ++ include/linux/fs.h|1 + kernel/trace/blktrace.c |2 + 9 files changed, 87 insertions(+), 5 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 3c923a7..4a56102b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1641,7 +1641,7 @@ generic_make_request_checks(struct bio *bio) goto end_io; } - if (unlikely(!(bio-bi_rw REQ_DISCARD) + if (unlikely(!(bio-bi_rw (REQ_DISCARD | REQ_SANITIZE)) nr_sectors queue_max_hw_sectors(q))) { printk(KERN_ERR bio too big device %s (%u %u)\n, bdevname(bio-bi_bdev, b), @@ -1689,6 +1689,14 @@ generic_make_request_checks(struct bio *bio) goto end_io; } + if ((bio-bi_rw REQ_SANITIZE) + (!blk_queue_sanitize(q))) { + pr_info(%s - got a SANITIZE request but the queue + doesn't support sanitize requests, __func__); + err = -EOPNOTSUPP; + goto end_io; + } + if (blk_throtl_bio(q, bio)) return false; /* throttled, will be resubmitted later */ @@ -1794,7 +1802,8 @@ void submit_bio(int rw, struct bio *bio) * If it's a regular read/write or a barrier with data attached, * go through the normal accounting stuff before submission. */ - if (bio_has_data(bio) !(rw REQ_DISCARD)) { + if (bio_has_data(bio) + (!(rw (REQ_DISCARD | REQ_SANITIZE { if (rw WRITE) { count_vm_events(PGPGOUT, count); } else { @@ -1840,7 +1849,7 @@ EXPORT_SYMBOL(submit_bio); */ int blk_rq_check_limits(struct request_queue *q, struct request *rq) { - if (rq-cmd_flags REQ_DISCARD) + if (rq-cmd_flags (REQ_DISCARD | REQ_SANITIZE)) return 0; if (blk_rq_sectors(rq) queue_max_sectors(q) || diff --git a/block/blk-lib.c b/block/blk-lib.c index 2b461b4..280d63e 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -115,6 +115,57 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, EXPORT_SYMBOL(blkdev_issue_discard); /** + * blkdev_issue_sanitize - queue a sanitize request + * @bdev: blockdev to issue sanitize for + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + *Issue a sanitize request for the specified block device + */ +int blkdev_issue_sanitize(struct block_device *bdev, gfp_t gfp_mask) +{ + DECLARE_COMPLETION_ONSTACK(wait); + struct request_queue *q = bdev_get_queue(bdev); + int type = REQ_WRITE | REQ_SANITIZE; + struct bio_batch bb; + struct bio *bio; + int ret = 0; + + if (!q) + return -ENXIO; + + if (!blk_queue_sanitize(q)) { + pr_err(%s - card doesn't support sanitize, __func__); + return -EOPNOTSUPP; + } + + bio = bio_alloc(gfp_mask, 1); + if (!bio) + return -ENOMEM; + + atomic_set(bb.done, 1); + bb.flags = 1 BIO_UPTODATE; + bb.wait = wait; + + bio-bi_end_io = bio_batch_end_io; + bio-bi_bdev = bdev; + bio-bi_private = bb; + + atomic_inc(bb.done); + submit_bio(type, bio); + + /* Wait for bios in-flight */ + if (!atomic_dec_and_test(bb.done)) + wait_for_completion(wait); + + if (!test_bit(BIO_UPTODATE, bb.flags)) + ret = -EIO; + + return ret; +} +EXPORT_SYMBOL(blkdev_issue_sanitize); + +/** * blkdev_issue_zeroout - generate number of zero filed write bios * @bdev: blockdev to issue * @sector:start sector diff --git a/block/blk-merge.c b/block/blk-merge.c index 160035f..7e24772 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -477,6 +477,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) if (!rq_mergeable(rq)) return false; + /* don't merge file system requests and sanitize requests */ + if ((req-cmd_flags REQ_SANITIZE) != (next-cmd_flags REQ_SANITIZE)) + return false; also the next data structure is not available in this function. + /* don't merge file system requests and discard requests */
Re: [PATCH 04/16] mm: allow PF_MEMALLOC from softirq context
On Sun, Jul 08, 2012 at 08:12:11PM +0200, Sebastian Andrzej Siewior wrote: On Wed, Jun 27, 2012 at 09:26:14AM +0100, Mel Gorman wrote: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b6c0727..5c6d9c6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2265,7 +2265,11 @@ gfp_to_alloc_flags(gfp_t gfp_mask) if (likely(!(gfp_mask __GFP_NOMEMALLOC))) { if (gfp_mask __GFP_MEMALLOC) alloc_flags |= ALLOC_NO_WATERMARKS; - else if (likely(!(gfp_mask __GFP_NOMEMALLOC)) !in_interrupt()) + else if (in_serving_softirq() (current-flags PF_MEMALLOC)) + alloc_flags |= ALLOC_NO_WATERMARKS; + else if (!in_interrupt() + ((current-flags PF_MEMALLOC) || + unlikely(test_thread_flag(TIF_MEMDIE alloc_flags |= ALLOC_NO_WATERMARKS; } You allocate in RX path with __GFP_MEMALLOC and your sk-sk_allocation has also __GFP_MEMALLOC set. That means you should get ALLOC_NO_WATERMARKS in alloc_flags. In the cases where they are annotated correctly, yes. It is recordeed if the page gets allocated from the PFMEMALLOC reserves. If the received packet is not SOCK_MEMALLOC and the page was allocated from PFMEMALLOC reserves it is then discarded and the packet must be retransmitted. Let me try again: - lets assume your allocation happens with alloc_page(), without __GFP_MEMALLOC in GFP_FLAGS and with PF_MEMALLOC in current-flags. Now you may get memory which you wouldn't receive otherwise (without PF_MEMALLOC). Okay, understood. So you don't have to annotate each page allocation in your receive path for instance as long as the process has the flag set. Yes. - lets assume your allocation happens with kmalloc() without __GFP_MEMALLOC and current-flags has PF_MEMALLOC ORed and your SLAB pool is empty. This forces SLAB to allocate more pages from the buddy allocator with it will receive more likely (due to -current-flags + PF_MEMALLOC) but SLAB will drop this extra memory because the page has -pf_memory (or something like that) set and the GFP_FLAGS do not have __GFP_MEMALLOC set. It's recorded if the slab page was allocated from PFMEMALLOC reserves (see patch 2 from the swap over NBD series). slab will use this page for objects but only allocate them to callers that pass a gfp_pfmemalloc_allowed() check. kmalloc() users with either __GFP_MEMALLOC or PF_MEMALLOC will get the pages they need but they will not leak to !_GFP_MEMALLOC users as that would potentially deadlock. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Resend with Ack][PATCH v1] PCI: allow acpiphp to handle PCIe ports without native PCIe hotplug capability
Hi Bjorn and Yinghai, What's the policy to export a symbol by EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL()? I know the legal difference, but don't know when I should mark a symbol as GPL. Thanks! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/