date:20120709

From: Raghavendra K T 

Currently the PLE handler can repeatedly do a directed yield to the same
vcpu that has recently done a PL exit. This can degrade performance.
Try to yield to the most eligible vcpu instead, by alternating yields.

Precisely, give a chance to a VCPU which has:
 (a) Not done a PL exit at all (it is probably a preempted lock holder)
 (b) Been skipped in the last iteration because it did a PL exit, and has
 probably become eligible now (next eligible lock holder)

Signed-off-by: Raghavendra K T 
---
 arch/s390/include/asm/kvm_host.h |  5 +++++
 arch/x86/include/asm/kvm_host.h  |  2 +-
 arch/x86/kvm/x86.c               | 14 ++++++++++++++
 virt/kvm/kvm_main.c              |  3 +++
 4 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index dd17537..884f2c4 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -256,5 +256,10 @@ struct kvm_arch{
struct gmap *gmap;
 };
 
+static inline bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *v)
+{
+   return true;
+}
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 857ca68..ce01db3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -962,7 +962,7 @@ extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
 
 int kvm_is_in_guest(void);
-
+bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu);
 void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 void kvm_pmu_reset(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 07dbd14..24ceae8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6623,6 +6623,20 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
+bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu)
+{
+   bool eligible;
+
+   eligible = !vcpu->arch.plo.pause_loop_exited ||
+              (vcpu->arch.plo.pause_loop_exited &&
+               vcpu->arch.plo.dy_eligible);
+
+   if (vcpu->arch.plo.pause_loop_exited)
+           vcpu->arch.plo.dy_eligible = !vcpu->arch.plo.dy_eligible;
+
+   return eligible;
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..519321a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1595,6 +1595,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
if (waitqueue_active(&vcpu->wq))
continue;
+   if (!kvm_arch_vcpu_check_and_update_eligible(vcpu)) {
+   continue;
+   }
if (kvm_vcpu_yield_to(vcpu)) {
kvm->last_boosted_vcpu = i;
yielded = 1;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RFC 0/2] kvm: Improving directed yield in PLE handler


Currently the Pause Loop Exit (PLE) handler does a directed yield to a
random VCPU on PL exit. Though we already filter when choosing the
candidate to yield_to, we can do better.

The problem is that for large vcpu guests there is a higher probability of
yielding to a bad vcpu. We are not able to prevent a directed yield to the
same vcpu that has done a PL exit recently, which perhaps spins again and
wastes CPU.

Fix that by keeping track of who has done a PL exit. The algorithm in this
series gives a chance to a VCPU which has:

 (a) Not done a PL exit at all (it is probably a preempted lock holder)

 (b) Been skipped in the last iteration because it did a PL exit, and has
 probably become eligible now (next eligible lock holder)
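
As a rough illustration of the (a)/(b) rule, here is a standalone userspace
sketch of the two per-vcpu flags. The names vcpu_model,
check_and_update_eligible and eligibility_example are made up for this
example; only the flag names, their initial values and the boolean logic are
taken from the patches in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the two per-vcpu flags added by this series. */
struct vcpu_model {
	bool pause_loop_exited;	/* set on PL exit, cleared on guest entry */
	bool dy_eligible;	/* directed-yield eligibility, toggled below */
};

/*
 * Same decision as the x86 helper in patch 2/2. Note that
 * !a || (a && b) reduces to !a || b.
 */
bool check_and_update_eligible(struct vcpu_model *v)
{
	bool eligible = !v->pause_loop_exited || v->dy_eligible;

	if (v->pause_loop_exited)
		v->dy_eligible = !v->dy_eligible;

	return eligible;
}

/* Walk one vcpu through the states described above. */
void eligibility_example(void)
{
	/* Initial values as set in kvm_arch_vcpu_init() by patch 1/2. */
	struct vcpu_model v = { .pause_loop_exited = false, .dy_eligible = true };

	assert(check_and_update_eligible(&v));	/* (a): never PL-exited */

	v.pause_loop_exited = true;		/* vcpu takes a PL exit */
	assert(check_and_update_eligible(&v));	/* passes once, flag flips */
	assert(!check_and_update_eligible(&v));	/* skipped this time */
	assert(check_and_update_eligible(&v));	/* (b): eligible again */
}
```

One thing the model makes visible: because dy_eligible starts out true, the
first check after a PL exit still passes, and the vcpu is skipped only on the
following check.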

Future enhancements:
  (1) Currently we have a boolean to decide on the eligibility of a vcpu. It
   would be nice to get feedback on large guests (>32 vcpus) on whether an
   integer counter would do better (with counter = say f(log n)).

  (2) We have not considered system load during the iteration over vcpus.
   With that information we can limit the scan and also decide whether
   schedule() is better. [ I am able to use #kicked vcpus to decide on
   this, but maybe there are better ideas, like information from the
   global loadavg. ]

  (3) We can exploit this further with PV patches, since they also know
   about the next eligible lock-holder.

Summary: There is a huge improvement in the moderate / no overcommit scenario
 for a KVM-based guest on a PLE machine (which is difficult ;) ).

Result:
Base : kernel 3.5.0-rc5 with Rik's Ple handler fix

Machine : Intel(R) Xeon(R) CPU X7560  @ 2.27GHz, 4 numa node, 256GB RAM,
  32 core machine

Host: enterprise linux  gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
  with test kernels 

Guest: fedora 16 with 32 vcpus 8GB memory. 

Benchmarks:
1) kernbench: kernbench-0.5 (kernbench -f -H -M -o 2*vcpu)
Very first run in kernbench is omitted.

2) sysbench: 0.4.12
sysbench --test=oltp --db-driver=pgsql prepare
sysbench --num-threads=2*vcpu --max-requests=10 --test=oltp 
--oltp-table-size=50 --db-driver=pgsql --oltp-read-only run
Note that the database driver used for this is pgsql.

3) ebizzy: release 0.3
cmd: ebizzy -S 120 

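  1) kernbench (time in sec, lower is better)
+----+----------+--------+----------+--------+-----------+
|    | base_rik |  stdev |  patched |  stdev |  %improve |
+----+----------+--------+----------+--------+-----------+
| 1x |  49.2300 | 1.0171 |  38.3792 | 1.3659 | 28.27261% |
| 2x |  91.9358 | 1.7768 |  85.8842 | 1.6654 |  7.04623% |
+----+----------+--------+----------+--------+-----------+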

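  2) sysbench (time in sec, lower is better)
+----+----------+--------+----------+--------+-----------+
|    | base_rik |  stdev |  patched |  stdev |  %improve |
+----+----------+--------+----------+--------+-----------+
| 1x |  12.1623 | 0.0942 |  12.1674 | 0.3126 | -0.04192% |
| 2x |  14.3069 | 0.8520 |  14.1879 | 0.6811 |  0.83874% |
+----+----------+--------+----------+--------+-----------+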

Note that the 1x scenario differs only in the third decimal place, so no
degradation/improvement for sysbench would be seen even with a higher
confidence interval.


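  3) ebizzy (records/sec, higher is better)
+----+-----------+---------+-----------+----------+------------+
|    |  base_rik |   stdev |   patched |    stdev |   %improve |
+----+-----------+---------+-----------+----------+------------+
| 1x | 1129.2500 | 28.6793 | 2316.6250 |  53.0066 | 105.14722% |
| 2x | 1892.3750 | 75.1112 | 2386.5000 | 168.8033 |  26.11137% |
+----+-----------+---------+-----------+----------+------------+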

kernbench 1x: 4 fast runs = 12 runs avg
kernbench 2x: 4 fast runs = 12 runs avg

sysbench 1x: 8runs avg
sysbench 2x: 8runs avg

ebizzy 1x: 8runs avg
ebizzy 2x: 8runs avg
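
The %improve columns are not defined in the text. Working backwards from the
posted averages, they appear to be (base - patched) / patched for the
time-based benchmarks and (patched - base) / base for ebizzy, i.e. speedup
minus one in both cases. The sketch below is that inference only, not a
formula stated in this mail; the function names are hypothetical.

```c
#include <assert.h>

/* Time-based benchmark (lower is better): gain relative to the patched time. */
double improve_time(double base, double patched)
{
	return (base - patched) / patched * 100.0;
}

/* Throughput benchmark (higher is better): gain relative to the base rate. */
double improve_rate(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* True when a and b agree to within 0.001 percentage points. */
int close_to(double a, double b)
{
	double d = a - b;

	return d < 1e-3 && d > -1e-3;
}

/* Reproduce the 1x kernbench and 1x ebizzy entries from the tables. */
void check_reported_numbers(void)
{
	assert(close_to(improve_time(49.2300, 38.3792), 28.27261));
	assert(close_to(improve_rate(1129.2500, 2316.6250), 105.14722));
}
```

Both conventions use the slower configuration as the denominator, which is
why a halving of run time and a doubling of throughput would each read as
100%.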

Thanks Vatsa and Srikar for brainstorming discussions regarding
optimizations.

 Raghavendra K T (2):
   kvm vcpu: Note down pause loop exit
   kvm PLE handler: Choose better candidate for directed yield

 arch/s390/include/asm/kvm_host.h |  5 +++++
 arch/x86/include/asm/kvm_host.h  |  9 ++++++++-
 arch/x86/kvm/svm.c               |  1 +
 arch/x86/kvm/vmx.c               |  1 +
 arch/x86/kvm/x86.c               | 18 +++++++++++++++++-
 virt/kvm/kvm_main.c              |  3 +++
 6 files changed, 35 insertions(+), 2 deletions(-)


[PATCH RFC 1/2] kvm vcpu: Note down pause loop exit

Signed-off-by: Raghavendra K T 

Noting which vcpus have pause-loop exited helps in filtering the right
candidate to yield to. Yielding to the same vcpu again may waste more CPU.

From: Raghavendra K T 
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/svm.c              |  1 +
 arch/x86/kvm/vmx.c              |  1 +
 arch/x86/kvm/x86.c              |  4 +++-
 4 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..857ca68 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -484,6 +484,13 @@ struct kvm_vcpu_arch {
u64 length;
u64 status;
} osvw;
+
+   /* Pause loop exit optimization */
+   struct {
+   bool pause_loop_exited;
+   bool dy_eligible;
+   } plo;
+
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f75af40..a492f5d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
 
 static int pause_interception(struct vcpu_svm *svm)
 {
+   svm->vcpu.arch.plo.pause_loop_exited = true;
kvm_vcpu_on_spin(&(svm->vcpu));
return 1;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 32eb588..600fb3c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4945,6 +4945,7 @@ out:
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
skip_emulated_instruction(vcpu);
+   vcpu->arch.plo.pause_loop_exited = true;
kvm_vcpu_on_spin(vcpu);
 
return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..07dbd14 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
if (req_immediate_exit)
smp_send_reschedule(vcpu->cpu);
-
+   vcpu->arch.plo.pause_loop_exited = false;
kvm_guest_enter();
 
if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -6168,6 +6168,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;
 
+   vcpu->arch.plo.pause_loop_exited = false;
+   vcpu->arch.plo.dy_eligible = true;
vcpu->arch.emulate_ctxt.ops = &emulate_ops;
if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;


Re: [PATCH] pwm-backlight: add regulator and GPIO support

2012-07-09 Thread Alex Courbot


On 07/09/2012 02:19 PM, Jingoo Han wrote:

> I couldn't agree with Stephen Warren more.
> Could you support DT and non-DT case for backwards compatibility?


Both cases are handled in the new version I just sent. I hope all other 
concerns have also been addressed properly. If I forgot something please 
ping me.


Alex.

[RFC][PATCH V2 1/3] power sequences interpreter for device tree

Some device drivers (panel backlights especially) need to follow precise
sequences for powering on and off, involving GPIOs, regulators and PWMs,
with a precise powering order and delays to respect between steps.
These sequences are board-specific, and do not belong to a particular
driver - therefore they have been performed by board-specific hook
functions so far.

With the advent of the device tree, we cannot rely on board-specific
hooks anymore, but still need a way to implement these sequences in a
portable manner. This patch introduces a simple interpreter that can
execute such power sequences encoded either as platform data or within
the device tree.

Signed-off-by: Alexandre Courbot 
---
 drivers/video/backlight/Makefile|   2 +-
 drivers/video/backlight/power_seq.c | 298 
 drivers/video/backlight/pwm_bl.c|   3 +-
 include/linux/power_seq.h   |  96 
 4 files changed, 397 insertions(+), 2 deletions(-)
 create mode 100644 drivers/video/backlight/power_seq.c
 create mode 100644 include/linux/power_seq.h

diff --git a/drivers/video/backlight/Makefile b/drivers/video/backlight/Makefile
index a2ac9cf..6bff124 100644
--- a/drivers/video/backlight/Makefile
+++ b/drivers/video/backlight/Makefile
@@ -28,7 +28,7 @@ obj-$(CONFIG_BACKLIGHT_OMAP1) += omap1_bl.o
 obj-$(CONFIG_BACKLIGHT_PANDORA)+= pandora_bl.o
 obj-$(CONFIG_BACKLIGHT_PROGEAR) += progear_bl.o
 obj-$(CONFIG_BACKLIGHT_CARILLO_RANCH) += cr_bllcd.o
-obj-$(CONFIG_BACKLIGHT_PWM)+= pwm_bl.o
+obj-$(CONFIG_BACKLIGHT_PWM)+= pwm_bl.o power_seq.o
 obj-$(CONFIG_BACKLIGHT_DA903X) += da903x_bl.o
 obj-$(CONFIG_BACKLIGHT_DA9052) += da9052_bl.o
 obj-$(CONFIG_BACKLIGHT_MAX8925)+= max8925_bl.o
diff --git a/drivers/video/backlight/power_seq.c b/drivers/video/backlight/power_seq.c
new file mode 100644
index 000..f54cb7d
--- /dev/null
+++ b/drivers/video/backlight/power_seq.c
@@ -0,0 +1,298 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PWM_SEQ_TYPE(type) [POWER_SEQ_ ## type] = #type
+static const char *pwm_seq_types[] = {
+   PWM_SEQ_TYPE(STOP),
+   PWM_SEQ_TYPE(DELAY),
+   PWM_SEQ_TYPE(REGULATOR),
+   PWM_SEQ_TYPE(PWM),
+   PWM_SEQ_TYPE(GPIO),
+};
+#undef PWM_SEQ_TYPE
+
+static int power_seq_step_run(struct power_seq_step *step)
+{
+   switch (step->type) {
+   case POWER_SEQ_DELAY:
+   msleep(step->parameter);
+   break;
+   case POWER_SEQ_REGULATOR:
+   if (step->parameter)
+   regulator_enable(step->resource->regulator);
+   else
+   regulator_disable(step->resource->regulator);
+   break;
+   case POWER_SEQ_PWM:
+   if (step->parameter)
+   pwm_enable(step->resource->pwm);
+   else
+   pwm_disable(step->resource->pwm);
+   break;
+   case POWER_SEQ_GPIO:
+   gpio_set_value_cansleep(step->resource->gpio, step->parameter);
+   break;
+   /* should never happen since we verify the data when building it */
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+int power_seq_run(power_seq *seq)
+{
+   int err;
+
+   if (!seq) return 0;
+
+   while (seq->type != POWER_SEQ_STOP) {
+   if ((err = power_seq_step_run(seq++))) {
+   return err;
+   }
+   }
+
+   return 0;
+}
+
+static int of_parse_power_seq_step(struct device *dev, struct property *prop,
+  struct platform_power_seq_step *seq,
+  int max_steps)
+{
+   void *value = prop->value;
+   void *end = prop->value + prop->length;
+   int slen, smax, cpt = 0, i, ret;
+   char tmp_buf[32];
+
+   while (value < end && cpt < max_steps) {
+   smax = value - end;
+   slen = strnlen(value, end - value);
+
+   /* Unterminated string / not a string? */
+   if (slen >= end - value)
+   goto invalid_seq;
+
+   /* Find a matching sequence step type */
+   for (i = 0; i < POWER_SEQ_MAX; i++)
+   if (!strcmp(value, pwm_seq_types[i]))
+   break;
+
+   if (i >= POWER_SEQ_MAX)
+   goto unknown_step;
+
+   value += slen + 1;
+
+   seq[cpt].type = i;
+   switch (seq[cpt].type) {
+   case POWER_SEQ_DELAY:
+   /* integer parameter */
+   seq[cpt].parameter = be32_to_cpup(value);
+   value += sizeof(__be32);
+   break;
+   case POWER_SEQ_REGULATOR:
+   case POWER_SEQ_PWM:
+   case POWER_SEQ_GPIO:
+   /* consumer string */
+

[RFC][PATCH V2 3/3] tegra: add pwm backlight device tree nodes

Signed-off-by: Alexandre Courbot 
---
 arch/arm/boot/dts/tegra20-ventana.dts | 31 +++
 arch/arm/boot/dts/tegra20.dtsi|  2 +-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/tegra20-ventana.dts b/arch/arm/boot/dts/tegra20-ventana.dts
index be90544..c67d9e1 100644
--- a/arch/arm/boot/dts/tegra20-ventana.dts
+++ b/arch/arm/boot/dts/tegra20-ventana.dts
@@ -317,6 +317,37 @@
bus-width = <8>;
};
 
+   backlight {
+   compatible = "pwm-backlight";
+   brightness-levels = <0 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 255>;
+   default-brightness-level = <12>;
+
+   pwms = <&pwm 2 500>;
+   pwm-names = "backlight";
+   power-supply = <&backlight_reg>;
+   enable-gpios = < 28 0>;
+
+   power-on-sequence = "REGULATOR", "power", <1>,
+   "DELAY", <10>,
+   "PWM", "backlight", <1>,
+   "GPIO", "enable", <1>;
+   power-off-sequence = "GPIO", "enable", <0>,
+"PWM", "backlight", <0>,
+"DELAY", <10>,
+"REGULATOR", "power", <0>;
+   };
+
+   backlight_reg: fixedregulator@176 {
+   compatible = "regulator-fixed";
+   regulator-name = "backlight_regulator";
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   gpio = < 176 0>;
+   startup-delay-us = <0>;
+   enable-active-high;
+   regulator-boot-off;
+   };
+
sound {
compatible = "nvidia,tegra-audio-wm8903-ventana",
 "nvidia,tegra-audio-wm8903";
diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi
index 405d167..67a6cd9 100644
--- a/arch/arm/boot/dts/tegra20.dtsi
+++ b/arch/arm/boot/dts/tegra20.dtsi
@@ -123,7 +123,7 @@
status = "disabled";
};
 
-   pwm {
+   pwm: pwm {
compatible = "nvidia,tegra20-pwm";
reg = <0x7000a000 0x100>;
#pwm-cells = <2>;
-- 
1.7.11.1
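
The power-on-sequence and power-off-sequence properties in the node above mix
strings and cells in a single property. In the flattened tree such a value is
just the concatenation of NUL-terminated strings and 32-bit big-endian cells,
which is what a parser has to walk. The following userspace sketch shows one
way to do that. All names in it (step_type, step, parse_power_seq and the
helpers) are hypothetical, and it assumes, as the example node suggests, that
DELAY takes one cell while the other step types take a consumer name followed
by one cell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum step_type { STEP_DELAY, STEP_REGULATOR, STEP_PWM, STEP_GPIO, STEP_MAX };

static const char *const step_names[STEP_MAX] = {
	[STEP_DELAY] = "DELAY",
	[STEP_REGULATOR] = "REGULATOR",
	[STEP_PWM] = "PWM",
	[STEP_GPIO] = "GPIO",
};

struct step {
	enum step_type type;
	const char *id;		/* consumer name, NULL for DELAY */
	uint32_t value;		/* delay, or 0/1 for off/on */
};

/* Read one big-endian 32-bit cell; returns 0 on success, -1 if truncated. */
static int read_cell(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
	if (end - *p < 4)
		return -1;
	*out = (uint32_t)(*p)[0] << 24 | (uint32_t)(*p)[1] << 16 |
	       (uint32_t)(*p)[2] << 8 | (uint32_t)(*p)[3];
	*p += 4;
	return 0;
}

/* Read one NUL-terminated string; returns NULL if it runs past the end. */
static const char *read_string(const uint8_t **p, const uint8_t *end)
{
	const char *s = (const char *)*p;
	const uint8_t *nul = memchr(*p, '\0', (size_t)(end - *p));

	if (!nul)
		return NULL;
	*p = nul + 1;
	return s;
}

/* Parse up to max steps; returns the step count, or -1 on malformed data. */
int parse_power_seq(const uint8_t *buf, size_t len, struct step *out, int max)
{
	const uint8_t *p = buf, *end = buf + len;
	int n = 0;

	while (p < end && n < max) {
		const char *name = read_string(&p, end);
		int t;

		if (!name)
			return -1;
		for (t = 0; t < STEP_MAX; t++)
			if (!strcmp(name, step_names[t]))
				break;
		if (t == STEP_MAX)
			return -1;

		out[n].type = (enum step_type)t;
		out[n].id = NULL;
		if (t != STEP_DELAY) {
			out[n].id = read_string(&p, end);
			if (!out[n].id)
				return -1;
		}
		if (read_cell(&p, end, &out[n].value))
			return -1;
		n++;
	}
	/* Leftover bytes mean the caller's array was too small. */
	return p == end ? n : -1;
}

/* "REGULATOR", "power", <1>, "DELAY", <10> laid out as raw property bytes. */
void parse_example(void)
{
	static const uint8_t prop[] = "REGULATOR\0power\0\0\0\0\1DELAY\0\0\0\0\12";
	struct step steps[4];

	/* sizeof - 1 drops the extra NUL the string literal appends. */
	assert(parse_power_seq(prop, sizeof(prop) - 1, steps, 4) == 2);
	assert(steps[0].type == STEP_REGULATOR && steps[0].value == 1);
	assert(strcmp(steps[0].id, "power") == 0);
	assert(steps[1].type == STEP_DELAY && steps[1].value == 10);
}
```

The sketch rejects unknown step names and truncated values instead of guessing,
since a half-applied power sequence is worse than a probe failure.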


[RFC][PATCH V2 2/3] pwm_backlight: use power sequences

Make use of the power sequences specified in the device tree or platform
data, if any.

Signed-off-by: Alexandre Courbot 
---
 .../bindings/video/backlight/pwm-backlight.txt |  28 ++-
 drivers/video/backlight/power_seq.c|  44 ++---
 drivers/video/backlight/pwm_bl.c   | 210 +++--
 include/linux/pwm_backlight.h  |  37 +++-
 4 files changed, 239 insertions(+), 80 deletions(-)

diff --git a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
index 1e4fc72..86c9253 100644
--- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
+++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
@@ -2,7 +2,10 @@ pwm-backlight bindings
 
 Required properties:
   - compatible: "pwm-backlight"
-  - pwms: OF device-tree PWM specification (see PWM binding[0])
+  - pwms: OF device-tree PWM specification (see PWM binding[0]). Exactly one PWM
+  must be specified
+  - pwm-names: a list of names for the PWM devices specified in the
+  "pwms" property (see PWM binding[0])
   - brightness-levels: Array of distinct brightness levels. Typically these
   are in the range from 0 to 255, but any range starting at 0 will do.
   The actual brightness level (PWM duty cycle) will be interpolated
@@ -10,10 +13,18 @@ Required properties:
   last value in the array represents a 100% duty cycle (brightest).
   - default-brightness-level: the default brightness level (index into the
   array defined by the "brightness-levels" property)
+  - power-on-sequence: Power sequence that will bring the backlight on. This
+  sequence must reference the PWM specified in the pwms property by its
+  name. It can also reference extra GPIOs or regulators, and introduce
+  delays between sequence steps
+  - power-off-sequence: Power sequence that will bring the backlight off. This
+  sequence must reference the PWM specified in the pwms property by its
+  name. It can also reference extra GPIOs or regulators, and introduce
+  delays between sequence steps
 
 Optional properties:
-  - pwm-names: a list of names for the PWM devices specified in the
-   "pwms" property (see PWM binding[0])
+  - *-supply: a reference to a regulator used within a power sequence
+  - *-gpios: a reference to a GPIO used within a power sequence.
 
 [0]: Documentation/devicetree/bindings/pwm/pwm.txt
 
@@ -22,7 +33,18 @@ Example:
backlight {
compatible = "pwm-backlight";
pwms = < 0 500>;
+   pwm-names = "backlight";
 
brightness-levels = <0 4 8 16 32 64 128 255>;
default-brightness-level = <6>;
+   power-supply = <_reg>;
+   enable-gpios = < 6 0>;
+   power-on-sequence = "REGULATOR", "power", <1>,
+   "DELAY", <10>,
+   "PWM", "backlight", <1>,
+   "GPIO", "enable", <1>;
+   power-off-sequence = "GPIO", "enable", <0>,
+"PWM", "backlight", <0>,
+"DELAY", <10>,
+"REGULATOR", "power", <0>;
};
diff --git a/drivers/video/backlight/power_seq.c b/drivers/video/backlight/power_seq.c
index f54cb7d..f8737db 100644
--- a/drivers/video/backlight/power_seq.c
+++ b/drivers/video/backlight/power_seq.c
@@ -118,9 +118,9 @@ static int of_parse_power_seq_step(struct device *dev, struct property *prop,
tmp_buf[sizeof(tmp_buf) - 6] = 0;
strcat(tmp_buf, "-gpios");
ret = of_get_named_gpio(dev->of_node, tmp_buf, 0);
-   if (ret >= 0)
+   if (ret >= 0) {
seq[cpt].value = ret;
-   else {
+   } else {
if (ret != -EPROBE_DEFER)
dev_err(dev, "cannot get gpio \"%s\"\n",
seq[cpt].id);
@@ -218,26 +218,26 @@ power_seq *power_seq_build(struct device *dev, power_seq_resources *ress,
seq->type = pseq->type;
 
switch (pseq->type) {
-   case POWER_SEQ_REGULATOR:
-   case POWER_SEQ_GPIO:
-   case POWER_SEQ_PWM:
-   if (!(res = power_seq_find_resource(ress, pseq))) {
-   /* create resource node */
-   res = devm_kzalloc(dev, sizeof(*res),
-  GFP_KERNEL);
-   if (!res)
-   return ERR_PTR(-ENOMEM);
-

[RFC][PATCHv2 0/3] Power sequences interpreter for pwm_backlight

This is an RFC since this patch has drifted well beyond its original goal
of supporting one GPIO and one regulator for the pwm_backlight driver.

The issue to address is that backlight power sequences, which have so far
been implemented using board-specific callbacks, cannot be used with the
device tree. This series of patches adds a small power sequence
interpreter that can acquire and control regulators, GPIOs, and PWMs
during sequences defined in the device tree. It is easy to use,
low-footprint, and takes care of managing the resources that it acquires.
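
To make the idea of a power sequence interpreter concrete, here is a minimal
userspace sketch: a sequence is a table of steps terminated by a STOP entry,
and the interpreter applies them in order and stops at the first failure. The
types and function names (seq_step, apply_step, run_sequence) are invented for
this illustration and printf stands in for the regulator, PWM and GPIO calls;
it is not the code from this series.

```c
#include <assert.h>
#include <stdio.h>

enum seq_type { SEQ_STOP, SEQ_DELAY, SEQ_REGULATOR, SEQ_PWM, SEQ_GPIO };

struct seq_step {
	enum seq_type type;
	const char *id;		/* consumer name, unused for delays */
	unsigned int value;	/* delay length, or 0/1 for off/on */
};

/* Stand-in backend: a real driver would call the resource APIs here. */
static int apply_step(const struct seq_step *s)
{
	switch (s->type) {
	case SEQ_DELAY:
		printf("delay %u\n", s->value);
		return 0;
	case SEQ_REGULATOR:
	case SEQ_PWM:
	case SEQ_GPIO:
		printf("%s -> %u\n", s->id, s->value);
		return 0;
	default:
		return -1;	/* unknown step type */
	}
}

/* Run steps until SEQ_STOP; returns the number of steps run, or -1. */
int run_sequence(const struct seq_step *seq)
{
	int n = 0;

	for (; seq->type != SEQ_STOP; seq++, n++)
		if (apply_step(seq))
			return -1;
	return n;
}

/* The power-on order used by the tegra example node in this series. */
const struct seq_step power_on[] = {
	{ SEQ_REGULATOR, "power", 1 },
	{ SEQ_DELAY, NULL, 10 },
	{ SEQ_PWM, "backlight", 1 },
	{ SEQ_GPIO, "enable", 1 },
	{ SEQ_STOP, NULL, 0 },
};

void run_example(void)
{
	assert(run_sequence(power_on) == 4);
}
```

Keeping the sequence as plain data is what lets the same table come either
from platform data or from a device tree property.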

The implementation is working and should be complete, but documentation is
lacking. Also since the interpreter could be used by other drivers (which
ones?), it may make sense to have it in a better place than
drivers/video/backlight/.

The tegra device tree nodes are just here as an example usage.

Alexandre Courbot (3):
  Power sequences interpreter for device tree
  pwm-backlight: use power sequences
  tegra: add pwm backlight device tree nodes

 .../bindings/video/backlight/pwm-backlight.txt |  28 +-
 arch/arm/boot/dts/tegra20-ventana.dts  |  31 +++
 arch/arm/boot/dts/tegra20.dtsi |   2 +-
 drivers/video/backlight/Makefile   |   2 +-
 drivers/video/backlight/power_seq.c| 298 +
 drivers/video/backlight/pwm_bl.c   | 212 +++
 include/linux/power_seq.h  |  96 +++
 include/linux/pwm_backlight.h  |  37 ++-
 8 files changed, 645 insertions(+), 61 deletions(-)
 create mode 100644 drivers/video/backlight/power_seq.c
 create mode 100644 include/linux/power_seq.h

-- 
1.7.11.1


linux-next: manual merge of the arm-soc tree with the gpio-lw tree

2012-07-09 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
drivers/gpio/gpio-mxc.c between commit fef2bca203e9 ("gpio/mxc: use the
edge_sel feature if available") from the gpio-lw tree and commit
1ab7ef158dfb ("gpio/mxc: move irq_domain_add_legacy call into gpio
driver") from the arm-soc tree.

I fixed it up (I think - see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/gpio/gpio-mxc.c
index f45bb54,e5db670..000
--- a/drivers/gpio/gpio-mxc.c
+++ b/drivers/gpio/gpio-mxc.c
@@@ -184,19 -160,15 +184,19 @@@ static int gpio_set_irq_type(struct irq
edge = GPIO_INT_FALL_EDGE;
break;
case IRQ_TYPE_EDGE_BOTH:
 -  val = gpio_get_value(gpio);
 -  if (val) {
 -  edge = GPIO_INT_LOW_LEV;
 -  pr_debug("mxc: set GPIO %d to low trigger\n", gpio);
 +  if (GPIO_EDGE_SEL >= 0) {
 +  edge = GPIO_INT_BOTH_EDGES;
} else {
 -  edge = GPIO_INT_HIGH_LEV;
 -  pr_debug("mxc: set GPIO %d to high trigger\n", gpio);
 +  val = gpio_get_value(gpio);
 +  if (val) {
 +  edge = GPIO_INT_LOW_LEV;
 +  pr_debug("mxc: set GPIO %d to low trigger\n", gpio);
 +  } else {
 +  edge = GPIO_INT_HIGH_LEV;
 +  pr_debug("mxc: set GPIO %d to high trigger\n", gpio);
 +  }
-   port->both_edges |= 1 << (gpio & 31);
++  port->both_edges |= 1 << gpio_idx;
}
 -  port->both_edges |= 1 << gpio_idx;
break;
case IRQ_TYPE_LEVEL_LOW:
edge = GPIO_INT_LOW_LEV;
@@@ -208,24 -180,11 +208,24 @@@
return -EINVAL;
}
  
 -  reg += GPIO_ICR1 + ((gpio_idx & 0x10) >> 2); /* ICR1 or ICR2 */
 -  bit = gpio_idx & 0xf;
 -  val = readl(reg) & ~(0x3 << (bit << 1));
 -  writel(val | (edge << (bit << 1)), reg);
 +  if (GPIO_EDGE_SEL >= 0) {
 +  val = readl(port->base + GPIO_EDGE_SEL);
 +  if (edge == GPIO_INT_BOTH_EDGES)
-   writel(val | (1 << (gpio & 0x1f)),
++  writel(val | (1 << gpio_idx),
 +  port->base + GPIO_EDGE_SEL);
 +  else
-   writel(val & ~(1 << (gpio & 0x1f)),
++  writel(val & ~(1 << gpio_idx),
 +  port->base + GPIO_EDGE_SEL);
 +  }
 +
 +  if (edge != GPIO_INT_BOTH_EDGES) {
-   reg += GPIO_ICR1 + ((gpio & 0x10) >> 2); /* lower or upper register */
-   bit = gpio & 0xf;
++  reg += GPIO_ICR1 + ((gpio_idx & 0x10) >> 2); /* ICR1 or ICR2 */
++  bit = gpio_idx & 0xf;
 +  val = readl(reg) & ~(0x3 << (bit << 1));
 +  writel(val | (edge << (bit << 1)), reg);
 +  }
 +
-   writel(1 << (gpio & 0x1f), port->base + GPIO_ISR);
+   writel(1 << gpio_idx, port->base + GPIO_ISR);
  
return 0;
  }
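
For readers following the resolution, the ICR arithmetic in the second hunk
can be checked in isolation. Each of ICR1 and ICR2 holds a 2-bit
interrupt-configuration field for 16 lines, so bit 4 of the line index picks
the register and the low four bits pick the field. The helper names below are
hypothetical and the sketch only mirrors the expressions visible in the diff;
it does not touch hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset from ICR1 of the register holding this line: 0 or 4. */
unsigned int icr_reg_offset(unsigned int gpio_idx)
{
	return (gpio_idx & 0x10) >> 2;
}

/* New register value with the 2-bit field for gpio_idx set to edge. */
uint32_t icr_update(uint32_t old, unsigned int gpio_idx, uint32_t edge)
{
	unsigned int bit = gpio_idx & 0xf;	/* field index within the register */

	return (old & ~(UINT32_C(0x3) << (bit << 1))) | (edge << (bit << 1));
}

void icr_example(void)
{
	/* Line 17 lives in the second register, field 1 (bits 3:2). */
	assert(icr_reg_offset(17) == 4);
	assert(icr_update(UINT32_C(0xffffffff), 17, 0x2) == UINT32_C(0xfffffffb));
}
```

This is also why the resolution keeps gpio_idx rather than (gpio & 0x1f): the
index is already relative to the port, so the masking is done once.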



[PATCHv3] pwm_backlight: pass correct brightness to callback

pwm_backlight_update_status calls the notify() and notify_after()
callbacks before and after applying the new PWM settings. However, if
brightness levels are used, the brightness value will be changed from
the index into the levels array to the PWM duty cycle length before
being passed to notify_after(), which results in inconsistent behavior.
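
The effect of the fix can be sketched outside the driver: compute the duty
cycle into its own variable and leave the brightness index untouched for the
callbacks. The function below follows the arithmetic in the diff, but its
name, its parameter list and the sample numbers are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Map a brightness value to a PWM duty cycle. With a levels table,
 * brightness and max are indexes into it; without one they are used
 * directly. The caller's brightness is never modified.
 */
int duty_cycle_for(int brightness, int max, const unsigned int *levels,
		   int period, int lth_brightness)
{
	int duty_cycle;

	if (levels) {
		duty_cycle = (int)levels[brightness];
		max = (int)levels[max];
	} else {
		duty_cycle = brightness;
	}

	return lth_brightness + (duty_cycle * (period - lth_brightness) / max);
}

void duty_cycle_example(void)
{
	static const unsigned int levels[] = { 0, 4, 8, 16, 32, 64, 128, 255 };
	int brightness = 6;	/* index into levels[], i.e. level 128 */
	int duty = duty_cycle_for(brightness, 7, levels, 1000, 0);

	assert(duty == 501);		/* 128 * 1000 / 255, truncated */
	assert(brightness == 6);	/* still the index for notify_after() */
}
```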

Signed-off-by: Alexandre Courbot 
---
 drivers/video/backlight/pwm_bl.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c
index 057389d..be48517 100644
--- a/drivers/video/backlight/pwm_bl.c
+++ b/drivers/video/backlight/pwm_bl.c
@@ -54,14 +54,17 @@ static int pwm_backlight_update_status(struct backlight_device *bl)
pwm_config(pb->pwm, 0, pb->period);
pwm_disable(pb->pwm);
} else {
+   int duty_cycle;
if (pb->levels) {
-   brightness = pb->levels[brightness];
+   duty_cycle = pb->levels[brightness];
max = pb->levels[max];
+   } else {
+   duty_cycle = brightness;
}
 
-   brightness = pb->lth_brightness +
-   (brightness * (pb->period - pb->lth_brightness) / max);
-   pwm_config(pb->pwm, brightness, pb->period);
+   duty_cycle = pb->lth_brightness +
+(duty_cycle * (pb->period - pb->lth_brightness) / max);
+   pwm_config(pb->pwm, duty_cycle, pb->period);
pwm_enable(pb->pwm);
}
 
-- 
1.7.11.1





[RFC][PATCH V2 2/3] pwm_backlight: use power sequences

Make use of the power sequences specified in the device tree or platform
data, if any.

Signed-off-by: Alexandre Courbot acour...@nvidia.com
---
 .../bindings/video/backlight/pwm-backlight.txt |  28 ++-
 drivers/video/backlight/power_seq.c|  44 ++---
 drivers/video/backlight/pwm_bl.c   | 210 +++--
 include/linux/pwm_backlight.h  |  37 +++-
 4 files changed, 239 insertions(+), 80 deletions(-)

diff --git a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
index 1e4fc72..86c9253 100644
--- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
+++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
@@ -2,7 +2,10 @@ pwm-backlight bindings
 
 Required properties:
   - compatible: pwm-backlight
-  - pwms: OF device-tree PWM specification (see PWM binding[0])
+  - pwms: OF device-tree PWM specification (see PWM binding[0]). Exactly one PWM
+  must be specified
+  - pwm-names: a list of names for the PWM devices specified in the
+  pwms property (see PWM binding[0])
   - brightness-levels: Array of distinct brightness levels. Typically these
   are in the range from 0 to 255, but any range starting at 0 will do.
   The actual brightness level (PWM duty cycle) will be interpolated
@@ -10,10 +13,18 @@ Required properties:
   last value in the array represents a 100% duty cycle (brightest).
   - default-brightness-level: the default brightness level (index into the
   array defined by the brightness-levels property)
+  - power-on-sequence: Power sequence that will bring the backlight on. This
+  sequence must reference the PWM specified in the pwms property by its
+  name. It can also reference extra GPIOs or regulators, and introduce
+  delays between sequence steps
+  - power-off-sequence: Power sequence that will bring the backlight off. This
+  sequence must reference the PWM specified in the pwms property by its
+  name. It can also reference extra GPIOs or regulators, and introduce
+  delays between sequence steps
 
 Optional properties:
-  - pwm-names: a list of names for the PWM devices specified in the
-   pwms property (see PWM binding[0])
+  - *-supply: a reference to a regulator used within a power sequence
+  - *-gpios: a reference to a GPIO used within a power sequence.
 
 [0]: Documentation/devicetree/bindings/pwm/pwm.txt
 
@@ -22,7 +33,18 @@ Example:
    backlight {
        compatible = "pwm-backlight";
        pwms = <&pwm 0 500>;
+       pwm-names = "backlight";
 
        brightness-levels = <0 4 8 16 32 64 128 255>;
        default-brightness-level = <6>;
+       power-supply = <&backlight_reg>;
+       enable-gpios = <&gpio 6 0>;
+       power-on-sequence = "REGULATOR", "power", <1>,
+                   "DELAY", <10>,
+                   "PWM", "backlight", <1>,
+                   "GPIO", "enable", <1>;
+       power-off-sequence = "GPIO", "enable", <0>,
+                    "PWM", "backlight", <0>,
+                    "DELAY", <10>,
+                    "REGULATOR", "power", <0>;
    };
diff --git a/drivers/video/backlight/power_seq.c b/drivers/video/backlight/power_seq.c
index f54cb7d..f8737db 100644
--- a/drivers/video/backlight/power_seq.c
+++ b/drivers/video/backlight/power_seq.c
@@ -118,9 +118,9 @@ static int of_parse_power_seq_step(struct device *dev, struct property *prop,
            tmp_buf[sizeof(tmp_buf) - 6] = 0;
            strcat(tmp_buf, "-gpios");
            ret = of_get_named_gpio(dev->of_node, tmp_buf, 0);
-           if (ret >= 0)
+           if (ret >= 0) {
                seq[cpt].value = ret;
-           else {
+           } else {
                if (ret != -EPROBE_DEFER)
                    dev_err(dev, "cannot get gpio \"%s\"\n",
                        seq[cpt].id);
@@ -218,26 +218,26 @@ power_seq *power_seq_build(struct device *dev, power_seq_resources *ress,
        seq->type = pseq->type;
 
        switch (pseq->type) {
-       case POWER_SEQ_REGULATOR:
-       case POWER_SEQ_GPIO:
-       case POWER_SEQ_PWM:
-           if (!(res = power_seq_find_resource(ress, pseq))) {
-               /* create resource node */
-               res = devm_kzalloc(dev, sizeof(*res),
-                          GFP_KERNEL);
-               if (!res)
-                   return ERR_PTR(-ENOMEM);
-               memcpy(&res->plat,

[RFC][PATCH V2 3/3] tegra: add pwm backlight device tree nodes

Signed-off-by: Alexandre Courbot acour...@nvidia.com
---
 arch/arm/boot/dts/tegra20-ventana.dts | 31 +++
 arch/arm/boot/dts/tegra20.dtsi|  2 +-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/tegra20-ventana.dts b/arch/arm/boot/dts/tegra20-ventana.dts
index be90544..c67d9e1 100644
--- a/arch/arm/boot/dts/tegra20-ventana.dts
+++ b/arch/arm/boot/dts/tegra20-ventana.dts
@@ -317,6 +317,37 @@
        bus-width = <8>;
    };
 
+   backlight {
+       compatible = "pwm-backlight";
+       brightness-levels = <0 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 255>;
+       default-brightness-level = <12>;
+
+       pwms = <&pwm 2 500>;
+       pwm-names = "backlight";
+       power-supply = <&backlight_reg>;
+       enable-gpios = <&gpio 28 0>;
+
+       power-on-sequence = "REGULATOR", "power", <1>,
+                   "DELAY", <10>,
+                   "PWM", "backlight", <1>,
+                   "GPIO", "enable", <1>;
+       power-off-sequence = "GPIO", "enable", <0>,
+                    "PWM", "backlight", <0>,
+                    "DELAY", <10>,
+                    "REGULATOR", "power", <0>;
+   };
+
+   backlight_reg: fixedregulator@176 {
+       compatible = "regulator-fixed";
+       regulator-name = "backlight_regulator";
+       regulator-min-microvolt = <180>;
+       regulator-max-microvolt = <180>;
+       gpio = <&gpio 176 0>;
+       startup-delay-us = <0>;
+       enable-active-high;
+       regulator-boot-off;
+   };
+
    sound {
        compatible = "nvidia,tegra-audio-wm8903-ventana",
                 "nvidia,tegra-audio-wm8903";
diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi
index 405d167..67a6cd9 100644
--- a/arch/arm/boot/dts/tegra20.dtsi
+++ b/arch/arm/boot/dts/tegra20.dtsi
@@ -123,7 +123,7 @@
        status = "disabled";
    };
 
-   pwm {
+   pwm: pwm {
        compatible = "nvidia,tegra20-pwm";
        reg = <0x7000a000 0x100>;
        #pwm-cells = <2>;
-- 
1.7.11.1


[RFC][PATCH V2 1/3] power sequences interpreter for device tree

Some device drivers (panel backlights especially) need to follow precise
sequences for powering on and off, involving GPIOs, regulators, and PWMs,
with a precise powering order and delays to respect between steps.
These sequences are board-specific and do not belong to a particular
driver; therefore they have been performed by board-specific hook
functions so far.

With the advent of the device tree, we cannot rely on board-specific
hooks anymore, but still need a way to implement these sequences in a
portable manner. This patch introduces a simple interpreter that can
execute such power sequences encoded either as platform data or within
the device tree.
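
As a rough illustration of the idea (not the code from this patch), a sequence can be modelled as an array of steps terminated by a STOP entry, which the interpreter walks in order. The sketch below is user-space C under stated assumptions: the names `seq_step`, `board_state` and `seq_run` are hypothetical, resources are plain array indices rather than real regulators, PWMs or GPIOs, and delays are only accumulated instead of slept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step types, loosely modelled on the POWER_SEQ_* values. */
enum seq_type { SEQ_STOP, SEQ_DELAY, SEQ_REGULATOR, SEQ_PWM, SEQ_GPIO };

struct seq_step {
	enum seq_type type;
	int resource;   /* hypothetical index into board_state.enabled */
	int parameter;  /* delay in ms, or 1/0 for enable/disable */
};

#define NUM_RESOURCES 4

struct board_state {
	int enabled[NUM_RESOURCES]; /* on/off state per resource */
	int total_delay_ms;         /* delays are summed, not slept */
};

/* Run steps in order until SEQ_STOP; returns 0 on success, -1 on bad data. */
static int seq_run(const struct seq_step *seq, struct board_state *b)
{
	assert(b != NULL);

	if (!seq)
		return 0; /* an absent sequence is treated as a no-op */

	for (; seq->type != SEQ_STOP; seq++) {
		switch (seq->type) {
		case SEQ_DELAY:
			b->total_delay_ms += seq->parameter;
			break;
		case SEQ_REGULATOR:
		case SEQ_PWM:
		case SEQ_GPIO:
			if (seq->resource < 0 || seq->resource >= NUM_RESOURCES)
				return -1;
			b->enabled[seq->resource] = (seq->parameter != 0);
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

A power-on sequence would then be an array such as regulator on, delay 10, PWM on, GPIO on, followed by a STOP entry, and the power-off sequence would list the same resources in reverse with parameter 0.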

Signed-off-by: Alexandre Courbot acour...@nvidia.com
---
 drivers/video/backlight/Makefile|   2 +-
 drivers/video/backlight/power_seq.c | 298 
 drivers/video/backlight/pwm_bl.c|   3 +-
 include/linux/power_seq.h   |  96 
 4 files changed, 397 insertions(+), 2 deletions(-)
 create mode 100644 drivers/video/backlight/power_seq.c
 create mode 100644 include/linux/power_seq.h

diff --git a/drivers/video/backlight/Makefile b/drivers/video/backlight/Makefile
index a2ac9cf..6bff124 100644
--- a/drivers/video/backlight/Makefile
+++ b/drivers/video/backlight/Makefile
@@ -28,7 +28,7 @@ obj-$(CONFIG_BACKLIGHT_OMAP1) += omap1_bl.o
 obj-$(CONFIG_BACKLIGHT_PANDORA)+= pandora_bl.o
 obj-$(CONFIG_BACKLIGHT_PROGEAR) += progear_bl.o
 obj-$(CONFIG_BACKLIGHT_CARILLO_RANCH) += cr_bllcd.o
-obj-$(CONFIG_BACKLIGHT_PWM)+= pwm_bl.o
+obj-$(CONFIG_BACKLIGHT_PWM)+= pwm_bl.o power_seq.o
 obj-$(CONFIG_BACKLIGHT_DA903X) += da903x_bl.o
 obj-$(CONFIG_BACKLIGHT_DA9052) += da9052_bl.o
 obj-$(CONFIG_BACKLIGHT_MAX8925)+= max8925_bl.o
diff --git a/drivers/video/backlight/power_seq.c b/drivers/video/backlight/power_seq.c
new file mode 100644
index 000..f54cb7d
--- /dev/null
+++ b/drivers/video/backlight/power_seq.c
@@ -0,0 +1,298 @@
+#include <linux/err.h>
+#include <linux/of_gpio.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/power_seq.h>
+#include <linux/delay.h>
+#include <linux/pwm.h>
+#include <linux/regulator/consumer.h>
+
+#define PWM_SEQ_TYPE(type) [POWER_SEQ_ ## type] = #type
+static const char *pwm_seq_types[] = {
+   PWM_SEQ_TYPE(STOP),
+   PWM_SEQ_TYPE(DELAY),
+   PWM_SEQ_TYPE(REGULATOR),
+   PWM_SEQ_TYPE(PWM),
+   PWM_SEQ_TYPE(GPIO),
+};
+#undef PWM_SEQ_TYPE
+
+static bool power_seq_step_run(struct power_seq_step *step)
+{
+   switch (step->type) {
+   case POWER_SEQ_DELAY:
+       msleep(step->parameter);
+       break;
+   case POWER_SEQ_REGULATOR:
+       if (step->parameter)
+           regulator_enable(step->resource->regulator);
+       else
+           regulator_disable(step->resource->regulator);
+       break;
+   case POWER_SEQ_PWM:
+       if (step->parameter)
+           pwm_enable(step->resource->pwm);
+       else
+           pwm_disable(step->resource->pwm);
+       break;
+   case POWER_SEQ_GPIO:
+       gpio_set_value_cansleep(step->resource->gpio, step->parameter);
+       break;
+   /* should never happen since we verify the data when building it */
+   default:
+       return -EINVAL;
+   }
+
+   return 0;
+}
+
+int power_seq_run(power_seq *seq)
+{
+   int err;
+
+   if (!seq) return 0;
+
+   while (seq->type != POWER_SEQ_STOP) {
+       if ((err = power_seq_step_run(seq++))) {
+           return err;
+       }
+   }
+
+   return 0;
+}
+
+static int of_parse_power_seq_step(struct device *dev, struct property *prop,
+                  struct platform_power_seq_step *seq,
+                  int max_steps)
+{
+   void *value = prop->value;
+   void *end = prop->value + prop->length;
+   int slen, smax, cpt = 0, i, ret;
+   char tmp_buf[32];
+
+   while (value < end && cpt < max_steps) {
+       smax = value - end;
+       slen = strnlen(value, end - value);
+
+       /* Unterminated string / not a string? */
+       if (slen >= end - value)
+           goto invalid_seq;
+
+       /* Find a matching sequence step type */
+       for (i = 0; i < POWER_SEQ_MAX; i++)
+           if (!strcmp(value, pwm_seq_types[i]))
+               break;
+
+       if (i >= POWER_SEQ_MAX)
+           goto unknown_step;
+
+       value += slen + 1;
+
+       seq[cpt].type = i;
+       switch (seq[cpt].type) {
+       case POWER_SEQ_DELAY:
+           /* integer parameter */
+           seq[cpt].parameter = be32_to_cpup(value);
+           value += sizeof(__be32);
+           break;
+   case POWER_SEQ_REGULATOR:
+

Re: [PATCH] pwm-backlight: add regulator and GPIO support

2012-07-09 Thread Alex Courbot


On 07/09/2012 02:19 PM, Jingoo Han wrote:

I couldn't agree with Stephen Warren more.
Could you support DT and non-DT case for backwards compatibility?


Both cases are handled in the new version I just sent. I hope all other 
concerns have also been addressed properly. If I forgot something please 
ping me.


Alex.

[PATCH RFC 1/2] kvm vcpu: Note down pause loop exit

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Noting pause loop exited vcpu helps in filtering right candidate to yield.
Yielding to same vcpu may result in more wastage of cpu.

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/kvm_host.h |7 +++
 arch/x86/kvm/svm.c  |1 +
 arch/x86/kvm/vmx.c  |1 +
 arch/x86/kvm/x86.c  |4 +++-
 4 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..857ca68 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -484,6 +484,13 @@ struct kvm_vcpu_arch {
u64 length;
u64 status;
} osvw;
+
+   /* Pause loop exit optimization */
+   struct {
+   bool pause_loop_exited;
+   bool dy_eligible;
+   } plo;
+
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f75af40..a492f5d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
 
 static int pause_interception(struct vcpu_svm *svm)
 {
+   svm->vcpu.arch.plo.pause_loop_exited = true;
    kvm_vcpu_on_spin(&(svm->vcpu));
    return 1;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 32eb588..600fb3c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4945,6 +4945,7 @@ out:
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
    skip_emulated_instruction(vcpu);
+   vcpu->arch.plo.pause_loop_exited = true;
    kvm_vcpu_on_spin(vcpu);
 
    return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..07dbd14 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5331,7 +5331,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
    if (req_immediate_exit)
        smp_send_reschedule(vcpu->cpu);
-
+   vcpu->arch.plo.pause_loop_exited = false;
    kvm_guest_enter();
 
    if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -6168,6 +6168,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
    BUG_ON(vcpu->kvm == NULL);
    kvm = vcpu->kvm;
 
+   vcpu->arch.plo.pause_loop_exited = false;
+   vcpu->arch.plo.dy_eligible = true;
    vcpu->arch.emulate_ctxt.ops = &emulate_ops;
    if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
        vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;


[PATCH RFC 0/2] kvm: Improving directed yield in PLE handler


Currently the Pause Loop Exit (PLE) handler does a directed yield to a
random VCPU on PL exit. Though we already filter while choosing
the candidate to yield_to, we can do better.

The problem is that for guests with many vcpus, there is a higher probability
of yielding to a bad vcpu. We are not able to prevent a directed yield to the
same vcpu that has done a PL exit recently, which perhaps spins again and
wastes CPU.

Fix that by keeping track of which vcpus have done a PL exit. The algorithm in
this series gives a chance to a VCPU which has:

 (a) Not done PLE exit at all (probably he is preempted lock-holder)

 (b) VCPU skipped in last iteration because it did PL exit, and probably
 has become eligible now (next eligible lock holder)
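
The two rules above can be sketched in user-space C as follows. This is only an illustration of the alternating behaviour, assuming the two booleans introduced by patch 1/2; the struct name `plo_state` and the function name `check_and_update_eligible` are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vcpu "plo" state from patch 1/2. */
struct plo_state {
	bool pause_loop_exited; /* set when the vcpu takes a PL exit */
	bool dy_eligible;       /* flipped each time a spinning vcpu is considered */
};

/*
 * Returns true if this vcpu is a good directed-yield target:
 *  (a) it has not done a PL exit, or
 *  (b) it has, but it was skipped the last time it was considered.
 */
static bool check_and_update_eligible(struct plo_state *s)
{
	bool eligible;

	assert(s != NULL);
	eligible = !s->pause_loop_exited || s->dy_eligible;

	/* A spinning vcpu alternates between eligible and skipped. */
	if (s->pause_loop_exited)
		s->dy_eligible = !s->dy_eligible;

	return eligible;
}
```

With `dy_eligible` initialised to true, a vcpu that keeps taking PL exits is accepted on one pass and skipped on the next, while a vcpu that has not taken a PL exit is always accepted.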

Future enhancements:
  (1) Currently we have a boolean to decide on the eligibility of a vcpu. It
    would be nice to get feedback on guests (32 vcpu) on whether we can
    do better with an integer counter (with counter = say f(log n)).
  
  (2) We have not considered system load during iteration over vcpus. With
   that information we can limit the scan and also decide whether schedule()
   is better. [ I am able to use the number of kicked vcpus to decide on this,
   but maybe there are better ideas, like information from the global loadavg. ]

  (3) We can exploit this further with the PV patches, since they also know
   about the next eligible lock holder.

Summary: There is a huge improvement for the moderate / no overcommit scenario
 for a kvm based guest on a PLE machine (which is difficult ;) ).

Result:
Base : kernel 3.5.0-rc5 with Rik's Ple handler fix

Machine : Intel(R) Xeon(R) CPU X7560  @ 2.27GHz, 4 numa node, 256GB RAM,
  32 core machine

Host: enterprise linux  gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
  with test kernels 

Guest: fedora 16 with 32 vcpus 8GB memory. 

Benchmarks:
1) kernbench: kernbench-0.5 (kernbench -f -H -M -o 2*vcpu)
Very first run in kernbench is omitted.

2) sysbench: 0.4.12
sysbench --test=oltp --db-driver=pgsql prepare
sysbench --num-threads=2*vcpu --max-requests=10 --test=oltp 
--oltp-table-size=50 --db-driver=pgsql --oltp-read-only run
Note that the driver used for this is pgsql.

3) ebizzy: release 0.3
cmd: ebizzy -S 120 

  1) kernbench (time in sec, lower is better)
+------------+------------+------------+------------+------------+
    base_rik     stdev       patched      stdev       %improve
+------------+------------+------------+------------+------------+
1x    49.2300      1.0171     38.3792      1.3659     28.27261%
2x    91.9358      1.7768     85.8842      1.6654      7.04623%
+------------+------------+------------+------------+------------+

  2) sysbench (time in sec, lower is better)
+------------+------------+------------+------------+------------+
    base_rik     stdev       patched      stdev       %improve
+------------+------------+------------+------------+------------+
1x    12.1623      0.0942     12.1674      0.3126     -0.04192%
2x    14.3069      0.8520     14.1879      0.6811      0.83874%
+------------+------------+------------+------------+------------+

Note that the 1x scenario differs only in the third decimal place, and
no degradation/improvement for sysbench will be seen even with a
higher confidence interval.


  3) ebizzy (records/sec, higher is better)
+------------+------------+------------+------------+------------+
    base_rik     stdev       patched      stdev       %improve
+------------+------------+------------+------------+------------+
1x  1129.2500     28.6793   2316.6250     53.0066    105.14722%
2x  1892.3750     75.1112   2386.5000    168.8033     26.11137%
+------------+------------+------------+------------+------------+

kernbench 1x: 4 fast runs = 12 runs avg
kernbench 2x: 4 fast runs = 12 runs avg

sysbench 1x: 8 runs avg
sysbench 2x: 8 runs avg

ebizzy 1x: 8 runs avg
ebizzy 2x: 8 runs avg

Thanks Vatsa and Srikar for brainstorming discussions regarding
optimizations.

 Raghavendra K T (2):
   kvm vcpu: Note down pause loop exit
   kvm PLE handler: Choose better candidate for directed yield

 arch/s390/include/asm/kvm_host.h |5 +
 arch/x86/include/asm/kvm_host.h  |9 -
 arch/x86/kvm/svm.c   |1 +
 arch/x86/kvm/vmx.c   |1 +
 arch/x86/kvm/x86.c   |   18 +-
 virt/kvm/kvm_main.c  |3 +++
 6 files changed, 35 insertions(+), 2 deletions(-)


[PATCH RFC 2/2] kvm PLE handler: Choose better candidate for directed yield

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Currently the PLE handler can repeatedly do a directed yield to the same vcpu
that has recently done a PL exit. This can degrade performance.
Try to yield to the most eligible vcpu instead, by alternate yielding.

Precisely, give chance to a VCPU which has:
 (a) Not done PLE exit at all (probably he is preempted lock-holder)
 (b) VCPU skipped in last iteration because it did PL exit, and probably
 has become eligible now (next eligible lock holder)

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/s390/include/asm/kvm_host.h |5 +
 arch/x86/include/asm/kvm_host.h  |2 +-
 arch/x86/kvm/x86.c   |   14 ++
 virt/kvm/kvm_main.c  |3 +++
 4 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index dd17537..884f2c4 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -256,5 +256,10 @@ struct kvm_arch{
struct gmap *gmap;
 };
 
+static inline bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *v)
+{
+   return true;
+}
+
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 857ca68..ce01db3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -962,7 +962,7 @@ extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
 
 int kvm_is_in_guest(void);
-
+bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu);
 void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 void kvm_pmu_reset(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 07dbd14..24ceae8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6623,6 +6623,20 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
        kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
+bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu)
+{
+   bool eligible;
+
+   eligible = !vcpu->arch.plo.pause_loop_exited ||
+           (vcpu->arch.plo.pause_loop_exited &&
+            vcpu->arch.plo.dy_eligible);
+
+   if (vcpu->arch.plo.pause_loop_exited)
+       vcpu->arch.plo.dy_eligible = !vcpu->arch.plo.dy_eligible;
+
+   return eligible;
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..519321a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1595,6 +1595,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
                continue;
            if (waitqueue_active(&vcpu->wq))
                continue;
+           if (!kvm_arch_vcpu_check_and_update_eligible(vcpu)) {
+               continue;
+           }
            if (kvm_vcpu_yield_to(vcpu)) {
                kvm->last_boosted_vcpu = i;
                yielded = 1;


Antw: Re: /sys and access(2): Correctly implemented?

2012-07-09 Thread Ulrich Windl

 Ryan Mallon rmal...@gmail.com wrote on 09.07.2012 at 01:24 in message
4ffa16b6.9050...@gmail.com:
 On 06/07/12 16:27, Ulrich Windl wrote:
  Hi!
  
  Recently I found a problem with the command (kernel 3.0.34-0.7-default from 
 SLES 11 SP2, run as root):
  test -r $file && cat $file
  emitting "Permission denied"
  
  Investigating, I found that test actually uses access() to check for 
 permissions. Unfortunately there are some files in /sys that have 
 write-only 
 permission bits set (e.g. /sys/devices/system/cpu/probe).
  
  ~ # ll /sys/devices/system/cpu/probe
  --w--- 1 root root 4096 Jun 29 12:43 /sys/devices/system/cpu/probe
  ~ # F=/sys/devices/system/cpu/probe
  ~ # test $F && cat $F
  cat: /sys/devices/system/cpu/probe: Permission denied
 
 Looks like you have a typo here, I think you wanted "test -r $F", not
 "test $F"; the latter will just evaluate $F as an expression which
 will be true, and so you get the permission denied error running cat.

Hi!

You are right: It's a typo, but only in the message; the actual test was done 
correctly, and the outcome is quite the same.

 
 Using test -r $F on a write-only sysfs file correctly returns false on
 my machine (Ubuntu 10.04.4 LTS/2.6.32-41-generic).

Not here, unfortunately:
# ll /sys/devices/system/cpu/probe
--w--- 1 root root 4096 Jul  2 11:52 /sys/devices/system/cpu/probe
# F=/sys/devices/system/cpu/probe
# test -r $F && cat $F
cat: /sys/devices/system/cpu/probe: Permission denied
# uname -a
Linux h07 2.6.32.59-0.3-default #1 SMP 2012-04-27 11:14:44 +0200 x86_64 x86_64 
x86_64 GNU/Linux
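
For anyone wanting to reproduce the comparison outside the shell, the following is a minimal sketch that puts the answer from access(2) next to the result of an actual read-only open(2) for the same path. The struct `read_check` and the function `check_readable` are hypothetical names chosen for this example. It is only a diagnostic aid: it does not establish why the two calls might disagree for a given sysfs file, though it is worth keeping in mind that access() checks made by a privileged process are not bound by the read permission bits in the same way as for ordinary users.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result pair: what access() claims vs. what open() does. */
struct read_check {
	int access_ok; /* 1 if access(path, R_OK) returned 0 */
	int open_ok;   /* 1 if open(path, O_RDONLY) succeeded */
};

static struct read_check check_readable(const char *path)
{
	struct read_check r = { 0, 0 };
	int fd;

	assert(path != NULL);

	/* This is the kind of check that "test -r" relies on. */
	r.access_ok = (access(path, R_OK) == 0);

	/* This is what "cat" effectively has to do first. */
	fd = open(path, O_RDONLY);
	if (fd >= 0) {
		r.open_ok = 1;
		close(fd);
	}
	return r;
}
```

If `access_ok` is 1 while `open_ok` is 0 for a path such as /sys/devices/system/cpu/probe, that matches the behaviour reported above.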

Regards,
Ulrich



Re: [PATCH 1/1] pwm: Use pr_* functions in pwm-samsung.c file

2012-07-09 Thread Thierry Reding

On Fri, Jul 06, 2012 at 02:43:50PM +0530, Sachin Kamat wrote:
 Replace printk with pr_* functions to avoid checkpatch warnings.
 
 Signed-off-by: Sachin Kamat sachin.ka...@linaro.org
 ---
  drivers/pwm/pwm-samsung.c |6 --
  1 files changed, 4 insertions(+), 2 deletions(-)

Applied, thanks,

Thierry

 
 diff --git a/drivers/pwm/pwm-samsung.c b/drivers/pwm/pwm-samsung.c
 index 35fa0e8..d103865 100644
 --- a/drivers/pwm/pwm-samsung.c
 +++ b/drivers/pwm/pwm-samsung.c
 @@ -11,6 +11,8 @@
   * the Free Software Foundation; either version 2 of the License.
  */
  
 +#define pr_fmt(fmt) "pwm-samsung: " fmt
 +
  #include <linux/export.h>
  #include <linux/kernel.h>
  #include <linux/platform_device.h>
 @@ -340,13 +342,13 @@ static int __init pwm_init(void)
   clk_scaler[1] = clk_get(NULL, "pwm-scaler1");
  
   if (IS_ERR(clk_scaler[0]) || IS_ERR(clk_scaler[1])) {
 - printk(KERN_ERR "%s: failed to get scaler clocks\n", __func__);
 + pr_err("failed to get scaler clocks\n");
   return -EINVAL;
   }
  
   ret = platform_driver_register(&s3c_pwm_driver);
   if (ret)
 - printk(KERN_ERR "%s: failed to add pwm driver\n", __func__);
 + pr_err("failed to add pwm driver\n");
  
   return ret;
  }
 -- 
 1.7.4.1
 
 
 



Re: [PATCHv3] pwm_backlight: pass correct brightness to callback

2012-07-09 Thread Thierry Reding

On Mon, Jul 09, 2012 at 03:04:23PM +0900, Alexandre Courbot wrote:
 pwm_backlight_update_status calls the notify() and notify_after()
 callbacks before and after applying the new PWM settings. However, if
 brightness levels are used, the brightness value will be changed from
 the index into the levels array to the PWM duty cycle length before
 being passed to notify_after(), which results in inconsistent behavior.
 
 Signed-off-by: Alexandre Courbot acour...@nvidia.com
 ---
  drivers/video/backlight/pwm_bl.c | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

Applied, with a minor stylistic fixup adding a blank line after the
duty_cycle variable declaration. Thanks.

Thierry

 diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c
 index 057389d..be48517 100644
 --- a/drivers/video/backlight/pwm_bl.c
 +++ b/drivers/video/backlight/pwm_bl.c
 @@ -54,14 +54,17 @@ static int pwm_backlight_update_status(struct backlight_device *bl)
   pwm_config(pb->pwm, 0, pb->period);
   pwm_disable(pb->pwm);
   } else {
 + int duty_cycle;
   if (pb->levels) {
 - brightness = pb->levels[brightness];
 + duty_cycle = pb->levels[brightness];
   max = pb->levels[max];
 + } else {
 + duty_cycle = brightness;
   }
  
 - brightness = pb->lth_brightness +
 - (brightness * (pb->period - pb->lth_brightness) / max);
 - pwm_config(pb->pwm, brightness, pb->period);
 + duty_cycle = pb->lth_brightness +
 +  (duty_cycle * (pb->period - pb->lth_brightness) / max);
 + pwm_config(pb->pwm, duty_cycle, pb->period);
   pwm_enable(pb->pwm);
   }
   }
  
 -- 
 1.7.11.1
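
To make the arithmetic in the patch above easier to follow, here is a stand-alone sketch of the duty-cycle computation with the brightness index kept separate from the computed duty cycle. The helper name `backlight_duty_cycle` and its parameter list are hypothetical; the field names in the comments follow the quoted diff. Plain int arithmetic is assumed, so very large period values could overflow in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the computation in the quoted patch.
 * brightness:     requested level (an index if levels is non-NULL)
 * levels:         optional brightness-levels table, or NULL
 * max:            maximum brightness (an index if levels is non-NULL)
 * period:         PWM period
 * lth_brightness: low-threshold offset added to every duty cycle
 */
static int backlight_duty_cycle(int brightness, const int *levels, int max,
				int period, int lth_brightness)
{
	int duty_cycle;

	if (levels) {
		/* Translate indices into table values; brightness itself is untouched. */
		duty_cycle = levels[brightness];
		max = levels[max];
	} else {
		duty_cycle = brightness;
	}

	assert(max > 0);
	return lth_brightness + (duty_cycle * (period - lth_brightness) / max);
}
```

Because the caller's `brightness` is never overwritten, the same index can be handed to both the notify() and notify_after() callbacks.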
 
 
 



Re: [PATCH RFC 1/2] kvm vcpu: Note down pause loop exit


On 07/09/2012 11:50 AM, Raghavendra K T wrote:

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Noting pause loop exited vcpu helps in filtering right candidate to yield.
Yielding to same vcpu may result in more wastage of cpu.

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---


Oops, sorry. Somehow the Signed-off-by and From lines got interchanged.


Re: [PATCH] fat: Support fallocate on fat.

2012-07-09 Thread Namjae Jeon

Hi, Ogawa.
2012/7/8, OGAWA Hirofumi hirof...@mail.parknet.co.jp:
 Namjae Jeon linkinj...@gmail.com writes:

 +/*
 + * preallocate space for a file. This implements fat's fallocate file
 + * operation, which gets called from sys_fallocate system call. User
 + * space requests len bytes at offset.If FALLOC_FL_KEEP_SIZE is set
 + * we just allocate clusters without zeroing them out.Otherwise we
 + * allocate and zero out clusters via an expanding truncate.
 + */
 +static long fat_fallocate(struct file *file, int mode,
 +loff_t offset, loff_t len)
 +{
 +int err = 0;
 +struct inode *inode = file->f_mapping->host;
 +int cluster, nr_cluster, fclus, dclus, free_bytes, nr_bytes;
 +struct super_block *sb = inode-i_sb;
 +struct msdos_sb_info *sbi = MSDOS_SB(sb);

 What happens if this is called for a directory? And does this guarantee that
 it never exposes uninitialized data to userland?
It cannot be called for a directory, because do_fallocate (which calls
fat_fallocate) checks that the file is open in write mode.
If it is opened in read-only mode, it returns a bad file descriptor error:
-
do_fallocate()
{
...
..
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
 
 ..
-
We cannot open a directory in write mode, so fallocate can never be
called for a directory.
As long as the user appends data to the file (instead of seeking to an offset
greater than inode->i_size and writing to it), this can be guaranteed.
But if the user writes at a random offset, it cannot.

 +/* No support for hole punch or other fallocate flags. */
 +if (mode & ~FALLOC_FL_KEEP_SIZE)
 +return -EOPNOTSUPP;

 +if ((offset + len) <= MSDOS_I(inode)->mmu_private) {
 +fat_msg(sb, KERN_ERR,
 +"fat_fallocate():Blocks already allocated");
 +return -EINVAL;
 +}

 Please don't output any message for a user error. And is EINVAL the right
 behavior if (offset + len) < allocated size? That sounds like a strange design.
Okay, I will remove the message,
and I will return success instead of EINVAL.

 +if ((mode & FALLOC_FL_KEEP_SIZE)) {
 +/* First compute the number of clusters to be allocated */
 +if (inode->i_size > 0) {
 +err = fat_get_cluster(inode, FAT_ENT_EOF,
 +&fclus, &dclus);
 +if (err < 0) {
 +fat_msg(sb, KERN_ERR,
 +"fat_fallocate():fat_get_cluster() error");

 Use %s and __func__. And it looks like the error is normal
 (e.g. ENOSPC), so I don't see why it needs to be reported.
Okay, I will remove it.

 [...]

 +/*
 + * calculate i_blocks and mmu_private from the actual number of
 + * allocated clusters instead of doing it from file size.This ensures
 + * that the preallocated disk space using FALLOC_FL_KEEP_SIZE is
 + * persistent across remounts and writes go into the allocated
 clusters.
 + */
 +fat_calc_dir_size(inode);

 This looks wrong. If you don't initialize the preallocated space, the
 data must never be exposed to userland. It is a security bug.
As explained above, if we do an append write instead of seeking to a
random offset, there is no security risk. The main disadvantage of
initializing the preallocated space (as is done without FALLOC_FL_KEEP_SIZE)
is that it takes a long time for bigger allocation sizes. It took ~70
seconds to preallocate 2GB on our target when FALLOC_FL_KEEP_SIZE was
not set.

Thanks.

  inode->i_blocks = ((inode->i_size + (sbi->cluster_size - 1))
  & ~((loff_t)sbi->cluster_size - 1)) >> 9;
 +MSDOS_I(inode)->mmu_private = inode->i_size;
 +/* restore i_size */
 +inode->i_size = le32_to_cpu(de->size);

  fat_time_fat2unix(sbi, &inode->i_mtime, de->time, de->date, 0);
  if (sbi->options.isvfat) {

 --
 OGAWA Hirofumi hirof...@mail.parknet.co.jp


Re: [PATCH net-next 1/2] r8169: support RTL8106E


Francois, what would you like me to do with these two patches?  I
haven't seen full ACKs from you yet.

Thanks.

Re: [PATCH v3] ieee802154: verify packet size before trying to allocate it

From: Sasha Levin levinsasha...@gmail.com
Date: Mon,  2 Jul 2012 13:29:55 +0200

 Currently when sending data over datagram, the send function will attempt to
 allocate any size passed on from the userspace.

 We should make sure that this size is checked and limited. We'll limit it
 to the MTU of the device, which is checked later anyway.

 Signed-off-by: Sasha Levin levinsasha...@gmail.com

Applied.

Re: [PATCH 2/2] x86, boot: Optimize the elf header handling.

H. Peter Anvin h...@zytor.com writes:

 On 07/01/2012 01:40 PM, Eric W. Biederman wrote:
 
 So I have tracked down part of the crazyness.
 CONFIG_RODATA actually uses 2MB alignment, making
 -z max_page_size=4096 a bit questionable.
 

 Questionable how?  It's not really like it matters since we're not going
 to mmap the ELF.

Questionable as in the current ELF loader in misc.c relies on the fact
that there is an almost fixed offset between physical addresses and
file offsets for all of the PT_LOAD segments in the ELF header.

In fact CONFIG_RODATA && CONFIG_X86_64 && CONFIG_SMP in combination with
-z max_page_size=4096 fails to boot.  The ELF loader in misc.c starts
copying from lower addresses to higher addresses, instead of higher
addresses to lower, and that fails miserably.

But -z max_page_size=4096 is not the problem; the ELF loader is.

Eric
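
The overlap hazard described above can be sketched at byte granularity. This is only an illustration of why copy direction matters when the destination overlaps the source from above; it is not the code in misc.c, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy low-to-high; corrupts data when dst overlaps src from above. */
void copy_ascending(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy high-to-low; safe when dst >= src and the ranges overlap. */
void copy_descending(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst >= src);
    while (n--)
        dst[n] = src[n];
}
```

Moving six bytes up by two within one buffer, the ascending copy overwrites source bytes before it has read them, while the descending copy preserves them.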

Re: [Question] sched/rt_mutex: re-enqueue_task on rt_mutex_setprio()

2012-07-09 Thread Peter Zijlstra

On Mon, 2012-07-09 at 09:50 +0900, Namhyung Kim wrote:
 On Sat, 07 Jul 2012 21:29:19 -0400, Steven Rostedt wrote:
  On Sat, 2012-07-07 at 14:44 +0900, Namhyung Kim wrote:
  Hi,
  
  I have a question on the code below:
  
  void rt_mutex_setprio(struct task_struct *p, int prio)
  {
  ...
 if (on_rq)
 enqueue_task(rq, p, oldprio < prio ? ENQUEUE_HEAD : 0);
  
  When enqueueing @p with new @prio, it seems to put @p at the head of a
  rq if appropriate. I guess that's the case of boosting @p to a higher
  priority, right?
 
  Actually, no. We put @p at the head of the queue when unboosting. If a
  task is going from a high priority into a lower priority, it is still
  treated as important for that priority, and is put to the front of the
  queue (it was just higher than everything else on that queue). But if we
  are boosting a task from a low priority, why put it to the head of other
  tasks of its new priority, when those tasks were just higher than this
  task, and this task is now just an equal.
 
 Thanks for the explanation. (Isn't it worth getting commented?) :)

Possibly, note that this part is well spec'ed by POSIX, see 

http://pubs.opengroup.org/onlinepubs/009695299/functions/xsh_chap02_08.html

SCHED_FIFO.8
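
The head-versus-tail rule discussed above can be modelled with a toy queue. This is a sketch only: it reads the quoted condition as oldprio < prio, assumes (as in the kernel's internal numbering) that a smaller prio value means a higher priority, and uses a hypothetical struct prio_queue in place of the real run queue.

```c
#include <assert.h>
#include <string.h>

#define QLEN 8

/* Hypothetical stand-in for one priority level of a run queue. */
struct prio_queue {
    int tasks[QLEN];   /* task ids, index 0 is the head */
    int len;
};

/* Insert at the head (next to run) or at the tail of the queue. */
void queue_insert(struct prio_queue *q, int task, int at_head)
{
    assert(q->len < QLEN);
    if (at_head) {
        memmove(&q->tasks[1], &q->tasks[0], q->len * sizeof(q->tasks[0]));
        q->tasks[0] = task;
    } else {
        q->tasks[q->len] = task;
    }
    q->len++;
}

/*
 * A numerically larger prio is a lower priority, so oldprio < newprio is
 * the unboosting case and the task goes to the head of its new queue.
 */
void requeue_on_prio_change(struct prio_queue *q, int task,
                            int oldprio, int newprio)
{
    queue_insert(q, task, oldprio < newprio);
}
```

An unboosted task lands in front of its new peers, while a boosted task queues behind tasks that already held that priority.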

[PATCH 1/4] x86 boot: Jump to the entry point address in the elf header.


Since we have the kernel's entry point stored in the ELF header use it,
and stop hardcoding the value.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 arch/x86/boot/compressed/head_32.S |2 +-
 arch/x86/boot/compressed/head_64.S |2 +-
 arch/x86/boot/compressed/misc.c|   16 +---
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index c85e3ac..1b15e2c 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -211,7 +211,7 @@ relocated:
  * Jump to the decompressed kernel.
  */
xorl%ebx, %ebx
-   jmp *%ebp
+   jmp *%eax
 
 /*
  * Stack and heap for uncompression
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 87e03a1..9b8d782 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -337,7 +337,7 @@ relocated:
 /*
  * Jump to the decompressed kernel.
  */
-   jmp *%rbp
+   jmp *%rax
 
.data
 gdt:
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 7116dcb..fc96c3e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -273,7 +273,7 @@ static void error(char *x)
asm("hlt");
 }
 
-static void parse_elf(void *output)
+static void *parse_elf(void *output)
 {
 #ifdef CONFIG_X86_64
Elf64_Ehdr ehdr;
@@ -323,13 +323,15 @@ static void parse_elf(void *output)
}
 
free(phdrs);
+   return output + (ehdr.e_entry - LOAD_PHYSICAL_ADDR);
 }
 
-asmlinkage void decompress_kernel(void *rmode, memptr heap,
- unsigned char *input_data,
- unsigned long input_len,
- unsigned char *output)
+asmlinkage void *decompress_kernel(void *rmode, memptr heap,
+  unsigned char *input_data,
+  unsigned long input_len,
+  unsigned char *output)
 {
+   void *entry;
real_mode = rmode;
 
if (cmdline_find_option_bool("quiet"))
@@ -372,8 +374,8 @@ asmlinkage void decompress_kernel(void *rmode, memptr heap,
if (!quiet)
putstr("\nDecompressing Linux... ");
decompress(input_data, input_len, NULL, NULL, output, NULL, error);
-   parse_elf(output);
+   entry = parse_elf(output);
if (!quiet)
putstr("done.\nBooting the kernel.\n");
-   return;
+   return entry;
 }
-- 
1.7.5.4
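
The address arithmetic behind the new return value of parse_elf() can be sketched in isolation. This is an illustration under stated assumptions, not the kernel code: link_base plays the role of LOAD_PHYSICAL_ADDR, load_addr is where the image actually sits, and runtime_entry is a name invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * link_base is the address the image was linked to run at; load_addr is
 * where the decompressor placed it. The entry point keeps the same
 * distance from the start of the image in both views.
 */
uintptr_t runtime_entry(uintptr_t load_addr, uint64_t e_entry,
                        uint64_t link_base)
{
    assert(e_entry >= link_base);   /* entry must lie inside the image */
    return load_addr + (uintptr_t)(e_entry - link_base);
}
```

With an image linked at 0x1000000, an entry of 0x1000200 and a load address of 0x2000000, the jump target is 0x2000200.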


[PATCH 2/4] x86 boot: Optimize the elf header handling.


Create a space for the ELF headers at the beginning of the kernel's
image in memory.

- Rework arch/x86/kernel/vmlinux.lds.S so that we allow room for
  the ELF header in the loaded image.  This removes the need in
  the ELF executable to insert padding between the ELF headers
  and the data of the first program segment.  This reduces the
  size of vmlinux by 2MB on x86_64.  This removes an overlap
  of the ELF header and kernel text in arch/x86/boot/compressed
  that required code to be moved.

- Move the symbol _text outside of the .text section, and add the
  fixups in relocs.c to add relocations against _text.  This allows
  the symbol _text to come before the ELF header, effectively
  including the ELF header in the text section.

  If this isn't done _text moves 344 bytes in memory on x86_64 and
  creates subtle breakage in routines like cleanup_highmap, which
  assume _text is at the beginning of the kernel's memory and that
  _text is 4K+ aligned.

  The current usage of the symbol _text is already that _text
  specifies the beginning of the kernel's memory and that _stext
  specifies where the kernel's code actually starts.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 arch/x86/kernel/vmlinux.lds.S |9 +
 arch/x86/tools/relocs.c   |1 +
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 22a1530..d6e1a44 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -68,7 +68,7 @@ jiffies_64 = jiffies;
 #endif
 
 PHDRS {
-   text PT_LOAD FLAGS(5);  /* R_E */
+   text PT_LOAD FLAGS(5) FILEHDR;  /* R_E */
data PT_LOAD FLAGS(6);  /* RW_ */
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_SMP
@@ -82,16 +82,17 @@ PHDRS {
 SECTIONS
 {
 #ifdef CONFIG_X86_32
-. = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
+   _text = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
+. = LOAD_OFFSET + LOAD_PHYSICAL_ADDR + SIZEOF_HEADERS;
 phys_startup_32 = startup_32 - LOAD_OFFSET;
 #else
-. = __START_KERNEL;
+   _text = __START_KERNEL;
+. = __START_KERNEL + SIZEOF_HEADERS;
 phys_startup_64 = startup_64 - LOAD_OFFSET;
 #endif
 
/* Text and read-only data */
.text :  AT(ADDR(.text) - LOAD_OFFSET) {
-   _text = .;
/* bootstrapping code */
HEAD_TEXT
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 5a1847d..6f32b7b 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -72,6 +72,7 @@ static const char * const sym_regex_kernel[S_NSYMTYPES] = {
__end_rodata|
__initramfs_start|
(jiffies|jiffies_64)|
+   _text|
_end)$
 };
 
-- 
1.7.5.4
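
The 344-byte figure in the description is consistent with the size of a 64-bit ELF file header plus five program headers. The count of five is an assumption about the x86_64 SMP PHDRS block, not something the patch states; the helper name is invented for this sketch, which relies on the glibc <elf.h> definitions.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Size of the ELF file header plus phnum program headers (64-bit ELF). */
size_t elf64_headers_size(unsigned int phnum)
{
    assert(phnum > 0);
    return sizeof(Elf64_Ehdr) + (size_t)phnum * sizeof(Elf64_Phdr);
}
```

With 64 bytes for Elf64_Ehdr and 56 bytes per Elf64_Phdr, five program headers give 64 + 5 * 56 = 344 bytes, which is the kind of value SIZEOF_HEADERS would contribute in the linker script.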


linux-next: Tree for July 9

2012-07-09 Thread Stephen Rothwell

Hi all,

Changes since 20120706:

I have not done the powerpc allyesconfig build today as it is too broken.

Undropped tree: gpio-lw

The jdelvare-hwmon tree lost its conflict.

The v4l-dvb tree gained a build failure so I used the version from
next-20120706.

The infiniband tree lost its build failure but gained a conflict against
Linus' tree.

The l2-mtd tree lost its conflict.

The input-mt tree lost its conflict.

The mfd tree gained a build failure so I used the version from
next-20120706.

The dt-rh tree lost its conflict.

The gpio-lw tree lost its build failure.

The arm-soc tree gained a conflict against the gpio-lw tree.

I have still reverted 3 commits from the signal tree at the request of the
arm maintainer.

The akpm tree lost a commit that turned up elsewhere.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use git pull
to do so as that will try to merge the new linux-next release with the
old one.  You should use git fetch as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 196 trees (counting Linus' and 26 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (8c84bf4 Merge branch 'for-3.5-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup)
Merging fixes/master (9023a40 Merge tag 'mmc-fixes-for-3.5-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc)
Merging kbuild-current/rc-fixes (f8f5701 Linux 3.5-rc1)
Merging arm-current/fixes (09b2ad1 ARM: fix warning caused by wrongly typed 
arm_dma_limit)
Merging m68k-current/for-linus (d8ce726 m68k: Use generic strncpy_from_user(), 
strlen_user(), and strnlen_user())
Merging powerpc-merge/merge (2f584a1 powerpc/kvm: sldi should be sld)
Merging sparc/master (6a8ead0 sparc32: Remove superfluous extern declarations 
for prom_*() functions)
Merging net/master (9e85a6f Merge tag 'clk-fixes-for-linus' of 
git://git.linaro.org/people/mturquette/linux)
Merging sound-current/for-linus (9e9b594 ALSA: usb-audio: Fix the first PCM 
interface assignment)
Merging pci-current/for-linus (314489b Merge tag 'fixes-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging wireless/master (76cf5c7 iwlegacy: don't mess up the SCD when removing 
a key)
Merging driver-core.current/driver-core-linus (68b6507 kmsg: make sure all 
messages reach a newly registered boot console)
Merging tty.current/tty-linus (6b16351 Linux 3.5-rc4)
Merging usb.current/usb-linus (b086b6b USB: cdc-wdm: fix lockup on error in 
wdm_read)
Merging staging.current/staging-linus (6887a41 Linux 3.5-rc5)
Merging char-misc.current/char-misc-linus (6b16351 Linux 3.5-rc4)
Merging input-current/for-linus (9b7e31b Input: request threaded-only IRQs with 
IRQF_ONESHOT)
Merging md-current/for-linus (1068411 md/raid10: fix careless build error)
Merging audit-current/for-linus (c158a35 audit: no leading space in 
audit_log_d_path prefix)
Merging crypto-current/master (c475c06 hwrng: atmel-rng - fix data valid check)
Merging ide/master (39a50b4 Merge branch 'hfsplus')
Merging dwmw2/master (244dc4e Merge 
git://git.infradead.org/users/dwmw2/random-2.6)
Merging sh-current/sh-fixes-for-linus (64941d8 sh: Fix up se7721 GPIOLIB=y 
build warnings.)
Merging irqdomain-current/irqdomain/merge (15e06bf irqdomain: Fix debugfs 
formatting)
Merging devicetree-current/devicetree/merge (4e8383b of: release node fix for 
of_parse_phandle_with_args)

[PATCH 3/4] x86 boot: When building vmlinux.bin properly precompute the memory image


The ELF loader in arch/x86/boot/compressed/misc.c is extremely
fragile, as it copies the ELF executable over itself to put the
code and data in their proper place.  Squeezing unneeded space
out of vmlinux by passing -z max-page-size 4096 to ld was enough
to render the kernel unbootable.

I explored creating a flush function for our current crop of kernel
decompressors.  While that works it has the very unfortunate side
effect of needing a much larger BOOT_HEAP_SIZE.  A couple of our
supported decompressors in that mode malloc 32MB for use during
decompression.

The other solution is to return to the original design where we
created a file known as vmlinux.bin with exactly what we wanted in
memory and compressed that.

At this point in time there are complications to going back to the
original design.

- We need to preserve the ELF headers inside the compressed image file
  for Xen and other interesting bootloaders that open up the bzImage
  and boot the ELF executable contained inside.

- ld will not uniformly produce a file where the file offsets have a
  constant offset from the in-memory addresses.  In particular,
  combinations of CONFIG_RODATA and CONFIG_X86_64 && CONFIG_SMP
  play games with 2MB alignments and the virtual address of functions
  that cause ld to emit valid ELF executables that do not have
  a fixed difference between file offset and loaded physical address,
  making the ELF executable something that must be processed to
  get an in-memory image.

- The old solution to creating a memory image, objcopy -O binary, comes
  very close, but it always strips the ELF header even when the ELF header
  is explicitly made part of the ELF file.

Since none of the prebuilt tools work, I have written a small
program, mkelfbin, that generates a memory image by loading an ELF
executable into an in-memory array.  Then the ELF program headers'
offset fields are adjusted to reflect where in the memory image each
program header is referring to.  By design this results in program
headers with a fixed offset between the file offset and the physical
memory address where the file will be loaded.

With the compressed data being a proper memory image misc.c no longer
needs an ELF loader or dangerous copies over itself so those are
removed.

The result is a simpler more robust boot process, that still retains
all of the modern bells and whistles.
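
The idea of precomputing a memory image can be sketched as follows. This is not the mkelfbin.c from the patch, only a minimal model of the approach described above: each PT_LOAD segment is placed at its physical address relative to a base, and p_offset is rewritten to match. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Lay out every PT_LOAD segment of `file` in `image` at (p_paddr - base)
 * and rewrite p_offset to match. Returns 0 on success, -1 if a segment
 * does not fit or is malformed.
 */
int build_memory_image(const unsigned char *file, size_t file_size,
                       Elf64_Phdr *phdrs, unsigned int phnum,
                       uint64_t base, unsigned char *image,
                       size_t image_size)
{
    assert(file && phdrs && image);
    memset(image, 0, image_size);       /* zero-fills bss-style tails */

    for (unsigned int i = 0; i < phnum; i++) {
        Elf64_Phdr *ph = &phdrs[i];

        if (ph->p_type != PT_LOAD)
            continue;
        if (ph->p_paddr < base || ph->p_filesz > ph->p_memsz)
            return -1;

        uint64_t dst = ph->p_paddr - base;
        if (dst > image_size || ph->p_memsz > image_size - dst)
            return -1;
        if (ph->p_offset > file_size ||
            ph->p_filesz > file_size - ph->p_offset)
            return -1;

        memcpy(image + dst, file + ph->p_offset, ph->p_filesz);
        ph->p_offset = dst;             /* file offset now tracks paddr */
    }
    return 0;
}
```

After the call, the difference between any two segments' p_offset values equals the difference between their p_paddr values, which is the fixed-offset property the description relies on.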

Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 arch/x86/boot/compressed/Makefile   |   10 +-
 arch/x86/boot/compressed/misc.c |   52 +-
 arch/x86/boot/compressed/misc.h |8 +
 arch/x86/boot/compressed/mkelfbin.c |  323 +++
 4 files changed, 343 insertions(+), 50 deletions(-)
 create mode 100644 arch/x86/boot/compressed/mkelfbin.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e398bb5..67b9ae4 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -21,7 +21,7 @@ GCOV_PROFILE := n
 LDFLAGS := -m elf_$(UTS_MACHINE)
 LDFLAGS_vmlinux := -T
 
-hostprogs-y:= mkpiggy
+hostprogs-y:= mkpiggy mkelfbin
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
 VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
@@ -36,9 +36,11 @@ $(obj)/vmlinux: $(VMLINUX_OBJS) FORCE
$(call if_changed,ld)
@:
 
-OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
-$(obj)/vmlinux.bin: vmlinux FORCE
-   $(call if_changed,objcopy)
+quiet_cmd_mkelfbin = MKELFBIN $@
+  cmd_mkelfbin = $(obj)/mkelfbin $< > $@ || ( rm -f $@ ; false )
+
+$(obj)/vmlinux.bin: vmlinux $(obj)/mkelfbin FORCE
+   $(call if_changed,mkelfbin)
 
 targets += vmlinux.bin.all vmlinux.relocs
 
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index fc96c3e..cb374ff 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -275,55 +275,15 @@ static void error(char *x)
 
 static void *parse_elf(void *output)
 {
-#ifdef CONFIG_X86_64
-   Elf64_Ehdr ehdr;
-   Elf64_Phdr *phdrs, *phdr;
-#else
-   Elf32_Ehdr ehdr;
-   Elf32_Phdr *phdrs, *phdr;
-#endif
-   void *dest;
-   int i;
+   ehdr_t *ehdr = output;
 
-   memcpy(&ehdr, output, sizeof(ehdr));
-   if (ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
-  ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
-  ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
-  ehdr.e_ident[EI_MAG3] != ELFMAG3) {
+   if (ehdr->e_ident[EI_MAG0] != ELFMAG0 ||
+   ehdr->e_ident[EI_MAG1] != ELFMAG1 ||
+   ehdr->e_ident[EI_MAG2] != ELFMAG2 ||
+   ehdr->e_ident[EI_MAG3] != ELFMAG3)
error("Kernel is not a valid ELF file");
-   return;
-   }
-
-   if (!quiet)
-   putstr("Parsing ELF... ");
-
-   phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
-   if (!phdrs)
-   error("Failed to allocate space for phdrs");
-
-   memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
-

[PATCH 4/4] x86 boot: Tell ld the kernel doesn't want 2MB file offset alignment.


By default ld uses 2MB pages and aligns our 3 program segments in the
file on 2MB boundaries, creating unnecessarily large uncompressed
vmlinux files.

Solve this by passing -z max-page-size 4096 to ld.

In my x86_64 SMP test configuration with CONFIG_DEBUG_RODATA
enabled, this reduces the size of vmlinux by roughly 5MB, from 15141772
bytes to 10210188 bytes.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 arch/x86/Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1f25214..b5b31c3 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -120,7 +120,7 @@ avx_instr := $(call as-instr,vxorps 
%ymm0$(comma)%ymm1$(comma)%ymm2,-DCONFIG_AS_
 KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr)
 KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr)
 
-LDFLAGS := -m elf_$(UTS_MACHINE)
+LDFLAGS := -m elf_$(UTS_MACHINE) -z max-page-size=4096
 
 # Speed up the build
 KBUILD_CFLAGS += -pipe
-- 
1.7.5.4


Re: [PATCH 0/5] ubi: Fix bad PEBs reserve caclulation

2012-07-09 Thread Richard Genoud

2012/7/7 Shmulik Ladkani shmulik.ladk...@gmail.com:
 Many thanks for testing.

 Could you please verify the crash only occurs with the patch?

 Can you provide the vmlinux matching this oops, so I may analyze the
 exact null dereferencing point?
 It seems to be somewhere in ubi_wl_init, however the patch seems not to
 affect these parts of ubi...

Hi !
I can't reproduce it...
Maybe the problem was between the chair and the keyboard.
Anyway, if I run into it again, I'll let you know.

Richard.

Re: [PATCH 1/4] x86 boot: Jump to the entry point address in the elf header.


And Peter no rush on these.  I have just finished testing and I am
pushing the changes out before I forget them.

Moving the Elf loader earlier to compile time makes the code a lot
more robust.

Eric

Re: [PATCH 1/1] atl1c: fix issue of transmit queue 0 timed out

From: cj...@qca.qualcomm.com
Date: Wed, 4 Jul 2012 10:51:48 +0800

 Some people report that atl1c can cause a system hang with the following
 kernel trace info:
 ---
 WARNING: at.../net/sched/sch_generic.c:258 dev_watchdog+0x1db/0x1d0()
 ...
 NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
 ...
 ---
 This is caused by calling netif_stop_queue when the cable link is down.
 So remove netif_stop_queue, because link_watch will take it over.
 
 Signed-off-by: xiong xi...@qca.qualcomm.com
 Cc: stable sta...@vger.kernel.org
 Signed-off-by: Cloud Ren cj...@qca.qualcomm.com

Applied, thanks.

Re: [PATCH 2/2] x86, boot: Optimize the elf header handling.

Tejun Heo t...@kernel.org writes:

 Hello, guys.

 On Sun, Jul 01, 2012 at 11:37:22AM -0700, H. Peter Anvin wrote:
 If we don't need it, I think we can use -z max-page-size=4096, but we
 use the PMD alignment for percpu on x86-64; Tejun, does that apply to
 the .data..percpu section in the executable as well?

 I don't think the .data..percpu section needs 2M alignment.  The
 percpu data section is only used as init template and actual percpu
 addresses always go through offsetting against __per_cpu_offset[] - no
 matter what the vaddrs in the vmlinux are, they get offsetted into 2M
 aligned linear address if necessary.  I think the only alignment
 .data..percpu needs is cacheline alignment for separating its
 subsections.

Thanks.  My basic testing isn't showing any problems.

Of course, all that changed was where in the vmlinux file the data
lives, not where in physical memory it is loaded, so problems would
really surprise me.

Eric


[PATCH RESEND] Fix a dead loop in async_synchronize_full()

2012-07-09 Thread Li Zhong

This patch tries to fix a dead loop in async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine.

void async_synchronize_full(void)
{
	do {
		async_synchronize_cookie(next_cookie);
	} while (!list_empty(&async_running) ||
		 !list_empty(&async_pending));
}

async_synchronize_cookie() calls async_synchronize_cookie_domain() with
async_running as the default domain to synchronize. 

However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop. 

It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished. 

Currently, async_synchronize_cookie_domain() expects a non-NULL running
list (if NULL, there would be a NULL pointer dereference), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains.

Reported-by: Paul E. McKenney paul...@linux.vnet.ibm.com
Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
Tested-by: Paul E. McKenney paul...@linux.vnet.ibm.com
Tested-by: Christian Kujau li...@nerdbynature.de
---
 kernel/async.c |   13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/async.c b/kernel/async.c
index bd0c168..32d8dc9 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -86,6 +86,13 @@ static async_cookie_t  __lowest_in_progress(struct list_head *running)
 {
struct async_entry *entry;
 
+   if (!running) { /* just check the entry count */
+   if (atomic_read(&entry_count))
+   return 0; /* smaller than any cookie */
+   else
+   return next_cookie;
+   }
+
if (!list_empty(running)) {
entry = list_first_entry(running,
struct async_entry, list);
@@ -236,9 +243,7 @@ EXPORT_SYMBOL_GPL(async_schedule_domain);
  */
 void async_synchronize_full(void)
 {
-   do {
-   async_synchronize_cookie(next_cookie);
-   } while (!list_empty(&async_running) || !list_empty(&async_pending));
+   async_synchronize_cookie_domain(next_cookie, NULL);
 }
 EXPORT_SYMBOL_GPL(async_synchronize_full);
 
@@ -258,7 +263,7 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
 /**
  * async_synchronize_cookie_domain - synchronize asynchronous function
calls within a certain domain with cookie checkpointing
  * @cookie: async_cookie_t to use as checkpoint
- * @running: running list to synchronize on
+ * @running: running list to synchronize on, NULL indicates all lists
  *
  * This function waits until all asynchronous function calls for the
  * synchronization domain specified by the running list @list submitted
-- 
1.7.9.5
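
The NULL-means-all-domains convention from the patch above can be modelled in userspace. This is a simplified sketch, not kernel code: struct async_state, the array-based running list and sync_done() are hypothetical stand-ins, and the per-domain pending list that the real function also consults is ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long cookie_t;

/* Hypothetical userspace stand-ins for the kernel's global async state. */
struct async_state {
    int entry_count;        /* entries pending or running in any domain */
    cookie_t next_cookie;   /* cookie the next scheduled entry would get */
};

/*
 * running == NULL means "all domains": anything outstanding reports
 * cookie 0, which is smaller than every real cookie. Otherwise running
 * is this domain's list of cookies, oldest first.
 */
cookie_t lowest_in_progress(const struct async_state *st,
                            const cookie_t *running, size_t nrunning)
{
    assert(st != NULL);
    if (running == NULL)
        return st->entry_count ? 0 : st->next_cookie;
    if (nrunning > 0)
        return running[0];
    return st->next_cookie;     /* nothing running in this domain */
}

/* A waiter may proceed once the lowest in-progress cookie reaches it. */
int sync_done(const struct async_state *st, const cookie_t *running,
              size_t nrunning, cookie_t cookie)
{
    return lowest_in_progress(st, running, nrunning) >= cookie;
}
```

The design point is that the waiter's condition depends only on the global entry count when no domain is given, so work queued in another domain can no longer leave the caller spinning on lists it never waits on.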


Re: [Question] sched/rt_mutex: re-enqueue_task on rt_mutex_setprio()

2012-07-09 Thread Namhyung Kim

On Mon, Jul 9, 2012 at 3:48 PM, Peter Zijlstra pet...@infradead.org wrote:
 On Mon, 2012-07-09 at 09:50 +0900, Namhyung Kim wrote:
 On Sat, 07 Jul 2012 21:29:19 -0400, Steven Rostedt wrote:
  On Sat, 2012-07-07 at 14:44 +0900, Namhyung Kim wrote:
  Hi,
 
  I have a question on the code below:
 
  void rt_mutex_setprio(struct task_struct *p, int prio)
  {
  ...
 if (on_rq)
 enqueue_task(rq, p, oldprio < prio ? ENQUEUE_HEAD : 0);
 
  When enqueueing @p with new @prio, it seems to put @p at the head of a
  rq if appropriate. I guess that's the case of boosting @p to a higher
  priority, right?
 
  Actually, no. We put @p at the head of the queue when unboosting. If a
  task is going from a high priority into a lower priority, it is still
  treated as important for that priority, and is put to the front of the
  queue (it was just higher than everything else on that queue). But if we
  are boosting a task from a low priority, why put it to the head of other
  tasks of its new priority, when those tasks were just higher than this
  task, and this task is now just an equal.

 Thanks for the explanation. (Isn't it worth getting commented?) :)

 Possibly, note that this part is well spec'ed by POSIX, see

 http://pubs.opengroup.org/onlinepubs/009695299/functions/xsh_chap02_08.html

 SCHED_FIFO.8

Thanks for the pointer. I need to educate myself a lot more!

Thanks,
Namhyung

Re: [PATCH] netdev/phy: Fixup lockdep warnings in mdio-mux.c

From: David Daney ddaney.c...@gmail.com
Date: Wed,  4 Jul 2012 15:06:16 -0700

 From: David Daney david.da...@cavium.com

 With lockdep enabled we get:
 ...
 This is a false positive, since we are indeed using 'nested' locking,
 we need to use mutex_lock_nested().

 Now in theory we can stack multiple MDIO multiplexers, but that would
 require passing the nesting level (which is difficult to know) to
 mutex_lock_nested().  Instead we assume the simple case of a single
 level of nesting.  Since these are only warning messages, it isn't so
 important to solve the general case.

 Signed-off-by: David Daney david.da...@cavium.com

Applied to 'net', thanks.

Re: [PATCH v2] cgroup: fix panic in netprio_cgroup

From: Gao feng gaof...@cn.fujitsu.com
Date: Thu, 5 Jul 2012 17:28:40 +0800

 We set max_prioidx to the first zero bit index of prioidx_map in
 function get_prioidx.

 So when we delete the low index netprio cgroup and add a new
 netprio cgroup again, max_prioidx will be set to the low index.

 When we set the high index cgroup's net_prio.ifpriomap, the function
 write_priomap will call update_netdev_tables to allocate memory whose
 size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
 so the size of the array that map->priomap points to is max_prioidx + 1,
 which is lower than what we actually need.

 Fix this by adding a check in get_prioidx: only set max_prioidx when
 max_prioidx is lower than the new prioidx.

 Signed-off-by: Gao feng gaof...@cn.fujitsu.com

Applied.

Re: Antw: Re: /sys and access(2): Correctly implemented?

2012-07-09 Thread Ryan Mallon

On 09/07/12 16:23, Ulrich Windl wrote:
 Ryan Mallon rmal...@gmail.com wrote on 09.07.2012 at 01:24 in message
 4ffa16b6.9050...@gmail.com:
 On 06/07/12 16:27, Ulrich Windl wrote:
 Hi!

 Recently I found a problem with the command (kernel 3.0.34-0.7-default from 
 SLES 11 SP2, run as root):
 test -r $file && cat $file
 emitting "Permission denied"

 Investigating, I found that test actually uses access() to check for 
 permissions. Unfortunately there are some files in /sys that have 
 write-only 
 permission bits set (e.g. /sys/devices/system/cpu/probe).

 ~ # ll /sys/devices/system/cpu/probe
 --w------- 1 root root 4096 Jun 29 12:43 /sys/devices/system/cpu/probe
 ~ # F=/sys/devices/system/cpu/probe
 ~ # test $F && cat $F
 cat: /sys/devices/system/cpu/probe: Permission denied

 Looks like you have a typo here, I think you wanted test -r $F, not
 test $F, the latter will just evaluate $F as an expression which
 will be true, and so you get the permission denied error running cat.
 
 Hi!
 
 You are right: It's a typo, but only in the message; the actual test was done 
 correctly, and the outcome is quite the same.
 

 Using test -r $F on a write-only sysfs file correctly returns false on
 my machine (Ubuntu 10.04.4 LTS/2.6.32-41-generic).
 
 Not here, unfortunately:

Oops, I missed the bit about you running as root. I get the same results
running as root on my machine as you, both for sysfs and regular files.

It appears that the behaviour of access(2) as the super-user might be
implementation-defined, see:

  http://pubs.opengroup.org/onlinepubs/95399/functions/access.html
  http://lists.gnu.org/archive/html/bug-bash/2010-07/msg00071.html

However, I can't find any concrete information on it for Linux, and the
manpage doesn't mention anything other than the X_OK bit.

~Ryan
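
A small probe along these lines can show what access(2) reports for a write-only regular file on a given system. It is only a sketch: the helper name and the /tmp path template are assumptions, and the outcome depends on the privileges of the caller, which is exactly the implementation-dependent part, so no expected output is given.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file with mode 0200 (write-only for the owner) and
 * report what access(R_OK) and a real read-only open() say about it.
 * Returns 0 on success, -1 if the temporary file could not be set up.
 */
int probe_write_only(int *access_ok, int *open_ok)
{
    char path[] = "/tmp/access-probe-XXXXXX";
    assert(access_ok && open_ok);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (fchmod(fd, S_IWUSR) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    *access_ok = (access(path, R_OK) == 0);

    fd = open(path, O_RDONLY);
    *open_ok = (fd >= 0);
    if (fd >= 0)
        close(fd);

    unlink(path);
    return 0;
}
```

Running it once as an ordinary user and once as root, and comparing the two flags, gives the regular-file counterpart of the sysfs experiment above.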

[PATCH] fs/ubifs/orphan.c: remove invalid reference to list iterator variable

2012-07-09 Thread Julia Lawall

From: Julia Lawall julia.law...@lip6.fr

If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure.  Thus this value should not be used after
the end of the iterator.  Replace a field access from orphan by NULL in two
places.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier c;
expression E;
iterator name list_for_each_entry;
statement S;
@@

list_for_each_entry(c,...) { ... when != break;
 when forall
 when strict
}
...
(
c = E
|
*c
)
// </smpl>

Signed-off-by: Julia Lawall julia.law...@lip6.fr

---
 fs/ubifs/orphan.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index b02734d..cebf17e 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -176,7 +176,7 @@ int ubifs_orphan_start_commit(struct ubifs_info *c)
 		*last = orphan;
 		last = &orphan->cnext;
 	}
-	*last = orphan->cnext;
+	*last = NULL;
 	c->cmt_orphans = c->new_orphans;
 	c->new_orphans = 0;
 	dbg_cmt("%d orphans to commit", c->cmt_orphans);
@@ -382,7 +382,7 @@ static int consolidate(struct ubifs_info *c)
 			last = &orphan->cnext;
 			cnt += 1;
 		}
-		*last = orphan->cnext;
+		*last = NULL;
 		ubifs_assert(cnt == c->tot_orphans - c->new_orphans);
 		c->cmt_orphans = cnt;
 		c->ohead_lnum = c->orph_first;


Re: pull request: wireless 2012-07-06

From: John W. Linville linvi...@tuxdriver.com
Date: Fri, 6 Jul 2012 15:20:35 -0400

 Please let me know if there are problems!

This indentation is not correct:

commit 01f9cb073c827c60c43f769763b49a2026f1a897
Author: Thomas Huehn tho...@net.t-labs.tu-berlin.de
Date:   Thu Jun 28 14:39:51 2012 -0700

mwl8k: fix possible race condition in info->control.sta use
 ...
+   sta = ieee80211_find_sta_by_ifaddr(hw, wh->addr1,
+   wh->addr2);

Re: [PATCH v2 11/11] MAINTAINERS: add fblog entry

2012-07-09 Thread Geert Uytterhoeven

On Sun, Jul 8, 2012 at 11:56 PM, David Herrmann
dh.herrm...@googlemail.com wrote:
 Add myself as maintainer for the fblog driver to the MAINTAINERS file.

 Signed-off-by: David Herrmann dh.herrm...@googlemail.com
 ---
  MAINTAINERS | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/MAINTAINERS b/MAINTAINERS
 index ae8fe46..249b02a 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -2854,6 +2854,12 @@ F:   drivers/video/
  F: include/video/
  F: include/linux/fb.h

 +FRAMEBUFFER LOG DRIVER
 +M: David Herrmann dh.herrm...@googlemail.com
 +L: linux-ser...@vger.kernel.org

Why linux-serial, and not linux-fbdev?

 +S: Maintained
 +F: drivers/video/console/fblog.c
 +
  FREESCALE DMA DRIVER
  M: Li Yang le...@freescale.com
  M: Zhang Wei z...@zh-kernel.org

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds

Re: [PATCH v2 04/11] fbdev: export get_fb_info()/put_fb_info()

2012-07-09 Thread Geert Uytterhoeven

On Sun, Jul 8, 2012 at 11:56 PM, David Herrmann
dh.herrm...@googlemail.com wrote:
 --- a/drivers/video/fbmem.c
 +++ b/drivers/video/fbmem.c
 @@ -46,7 +46,7 @@ static DEFINE_MUTEX(registration_lock);
  struct fb_info *registered_fb[FB_MAX] __read_mostly;
  int num_registered_fb __read_mostly;

 -static struct fb_info *get_fb_info(unsigned int idx)
 +struct fb_info *get_fb_info(unsigned int idx)
  {
 struct fb_info *fb_info;

 @@ -61,14 +61,16 @@ static struct fb_info *get_fb_info(unsigned int idx)

 return fb_info;
  }
 +EXPORT_SYMBOL(get_fb_info);

EXPORT_SYMBOL_GPL?

 -static void put_fb_info(struct fb_info *fb_info)
 +void put_fb_info(struct fb_info *fb_info)
  {
 	if (!atomic_dec_and_test(&fb_info->count))
 		return;
 	if (fb_info->fbops->fb_destroy)
 		fb_info->fbops->fb_destroy(fb_info);
  }
 +EXPORT_SYMBOL(put_fb_info);

EXPORT_SYMBOL_GPL?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds

Re: Fwd: Hid over I2C and ACPI interaction

2012-07-09 Thread Lan Tianyu

On 2012-07-09 12:02, Moore, Robert wrote:
 These are already defined in acpica - in the file acrestyp.h
 
  ACPI_RESOURCE_FIXED_DMA FixedDma;
 
  ACPI_RESOURCE_GPIO  Gpio;
  ACPI_RESOURCE_I2C_SERIALBUS I2cSerialBus;
  ACPI_RESOURCE_SPI_SERIALBUS SpiSerialBus;
  ACPI_RESOURCE_UART_SERIALBUS    UartSerialBus;
  ACPI_RESOURCE_COMMON_SERIALBUS  CommonSerialBus;
 
Yeah. Thanks for Bob's reminder. We can reuse these macros.

 
 
 -Original Message-
 From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-
 ow...@vger.kernel.org] On Behalf Of Lan Tianyu
 Sent: Sunday, July 08, 2012 8:25 PM
 To: Mika Westerberg
 Cc: Zhang, Rui; kh...@linux-fr.org; ben-li...@fluff.org;
 w.s...@pengutronix.de; l...@kernel.org; linux-a...@vger.kernel.org; linux-
 i...@vger.kernel.org; linux-kernel@vger.kernel.org; jkos...@suse.cz;
 cha...@enac.fr; jj_d...@emc.com.tw; bhelg...@google.com; abe...@mit.edu
 Subject: Re: Fwd: Hid over I2C and ACPI interaction

 On 2012-07-06 13:52, Mika Westerberg wrote:
 On Thu, Jul 05, 2012 at 03:01:57PM +0800, Zhang Rui wrote:
 +Note that although these are ACPI devices, we prefer to use PnP drivers
 for them,
 +this is because:
 +1. all the non-ACPI-predefined Devices are exported as PnP devices as
 well
 +2. PnP bus is a well designed bus. Probing via PnP layer saves a lot of
 work
 +   for the device driver, e.g. getting & parsing ACPI resources.

 (Nice BKM, thanks for sharing)

 I have few questions about using PnP drivers instead of pure ACPI
 drivers.

 ACPI 5.0 defined some new resources, for example Fixed DMA descriptor
 that has information about the request line + channel for the device to
 use. However, PnP drivers pass resources as 'struct resource', which
 basically only has start and end - how do you represent all this new
 stuff
 using 'struct resource'?

 I think we can add new interface to get acpi specific resources. e.g
 struct acpi_resource pnp_get_acpi_resource(...). When the pnp acpi devices
 were initialized, put those acpi specific resources into a new resource
 list
 pnpdev->acpi_resources. What pnp_get_acpi_resource does is to get specified
 type acpi resources and return. We also need to define some acpi resource
 types.

 ACPI_RESOURCE_DMA
 ACPI_RESOURCE_I2C_SERIALBUS
 ACPI_RESOURCE_SPI_SERIALBUS
 ACPI_RESOURCE_UART_SERIALBUS
 ACPI_RESOURCE_COMMON_SERIALBUS
 ...

 How about this? welcome to comments.

 Or should we use acpi_walk_resources() where 'struct resource' is not
 suitable?


 --
 Best Regards
 Tianyu Lan
 linux kernel enabling team

-- 
Best Regards
Tianyu Lan
linux kernel enabling team

Re: [patch 04/11] mm: memcg: push down PageSwapCache check into uncharge entry functions

2012-07-09 Thread Johannes Weiner

On Mon, Jul 09, 2012 at 11:42:12AM +0900, Kamezawa Hiroyuki wrote:
 (2012/07/05 9:44), Johannes Weiner wrote:
  @@ -3278,10 +3283,11 @@ void mem_cgroup_end_migration(struct mem_cgroup 
  *memcg,
  unused = oldpage;
  }
  anon = PageAnon(used);
  -   __mem_cgroup_uncharge_common(unused,
  -   anon ? MEM_CGROUP_CHARGE_TYPE_ANON
  -: MEM_CGROUP_CHARGE_TYPE_CACHE,
  -   true);
  +   if (!PageSwapCache(page))
  +   __mem_cgroup_uncharge_common(unused,
  +anon ? MEM_CGROUP_CHARGE_TYPE_ANON
  +: MEM_CGROUP_CHARGE_TYPE_CACHE,
  +true);
 
 !PageSwapCache(unused) ?

Argh, right.

 But I think unused page's PG_swapcache is always dropped. So, the check is
 not necessary.

Oh, this is intentional: the check was in __mem_cgroup_uncharge_common
before, which means it applied to this entry point as well.  This is
supposed to be a mechanical change that does not change any logic.
The check is then removed in the next patch.

---
Subject: mm: memcg: push down PageSwapCache check into uncharge entry functions fix

Signed-off-by: Johannes Weiner han...@cmpxchg.org
---

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a3bf414..f4ff18a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3283,7 +3283,7 @@ void mem_cgroup_end_migration(struct mem_cgroup *memcg,
unused = oldpage;
}
anon = PageAnon(used);
-   if (!PageSwapCache(page))
+   if (!PageSwapCache(unused))
__mem_cgroup_uncharge_common(unused,
 anon ? MEM_CGROUP_CHARGE_TYPE_ANON
 : MEM_CGROUP_CHARGE_TYPE_CACHE,


Re: UBI fastmap updates

2012-07-09 Thread Shmulik Ladkani

Hi Richard,

On Sun, 08 Jul 2012 14:07:41 +0200 Richard Weinberger rich...@nod.at wrote:
  +   /* TODO: in the new locking scheme, produce_free_peb is
  +* called under wl_lock taken.
  +* so when returning, should reacquire the lock
  +*/
 
 Which new locking scheme?

I am diffing linux-ubi fastmap HEAD against its fork point (vanilla
ubi), that's 6b16351..d41a140 on linux-ubi.

Which gives the following diff in produce_free_peb:

@@ -261,7 +266,6 @@ static int produce_free_peb(struct ubi_device *ubi)
 {
 	int err;
 
-	spin_lock(&ubi->wl_lock);
 	while (!ubi->free.rb_node) {
 		spin_unlock(&ubi->wl_lock);
 
@@ -272,7 +276,6 @@ static int produce_free_peb(struct ubi_device *ubi)
 
 		spin_lock(&ubi->wl_lock);
 	}
-	spin_unlock(&ubi->wl_lock);
 
 	return 0;
 }

Which is probably okay, since you obtain the lock in the new
'ubi_refill_pools', which calls produce_free_peb:

+void ubi_refill_pools(struct ubi_device *ubi)
+{
+	spin_lock(&ubi->wl_lock);
+	refill_wl_pool(ubi);
+	refill_wl_user_pool(ubi);
+	spin_unlock(&ubi->wl_lock);
+}

However if 'do_work' fails within 'produce_free_peb', you return the
error but leave wl_lock unlocked - where it is expected to be locked
(otherwise, ubi_refill_pools will unlock it again):

static int produce_free_peb(struct ubi_device *ubi)
{
	int err;

	while (!ubi->free.rb_node) {
		spin_unlock(&ubi->wl_lock);

		dbg_wl("do one work synchronously");
		err = do_work(ubi);
		if (err)
			return err;

		spin_lock(&ubi->wl_lock);
	}

	return 0;
}
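
One way to keep the lock balanced is to reacquire it before looking at the error code, so that every return path leaves with the lock held. The following is only a userspace sketch of that idea, not a proposed UBI patch: `struct fake_lock`, `struct fake_ubi`, `fake_do_work()` and the other names are hypothetical stand-ins, and the "lock" is a plain counter so the imbalance can be observed.

```c
/* Counter standing in for a spinlock (assumption, for illustration only). */
struct fake_lock {
	int held;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	l->held++;
}

static void fake_lock_release(struct fake_lock *l)
{
	l->held--;
}

/* Hypothetical device state: a lock, a free counter and a canned result. */
struct fake_ubi {
	struct fake_lock wl_lock;
	int free_pebs;
	int work_result;	/* 0 on success, negative error otherwise */
};

/* Stand-in for do_work(): either fails or produces one free PEB. */
static int fake_do_work(struct fake_ubi *ubi)
{
	if (ubi->work_result)
		return ubi->work_result;
	ubi->free_pebs++;
	return 0;
}

/*
 * Called with wl_lock held. The lock is dropped around the work and
 * reacquired before the error check, so it is held on every return path.
 */
static int produce_free_peb_balanced(struct fake_ubi *ubi)
{
	int err;

	while (!ubi->free_pebs) {
		fake_lock_release(&ubi->wl_lock);
		err = fake_do_work(ubi);
		fake_lock_acquire(&ubi->wl_lock);
		if (err)
			return err;
	}
	return 0;
}

/* Caller that takes and releases the lock exactly once. */
static int refill_pools(struct fake_ubi *ubi)
{
	int err;

	fake_lock_acquire(&ubi->wl_lock);
	err = produce_free_peb_balanced(ubi);
	fake_lock_release(&ubi->wl_lock);
	return err;
}
```

With this ordering the counter returns to zero after `refill_pools()` whether or not the work succeeded, which is the invariant the caller relies on.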

Regards,
Shmulik

Re: [PATCH 1/2] hw_random: mxc-rnga: Adapt clocks to new i.mx clock framework

2012-07-09 Thread Sascha Hauer

On Fri, Jul 06, 2012 at 05:20:19PM -0300, Fabio Estevam wrote:
 Cc: Theodore Ts'o ty...@mit.edu
 Cc: Herbert Xu herb...@gondor.apana.org.au 
 Cc: linux-kernel@vger.kernel.org
 Signed-off-by: Fabio Estevam fabio.este...@freescale.com
 ---
  drivers/char/hw_random/mxc-rnga.c |8 
  1 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/char/hw_random/mxc-rnga.c 
 b/drivers/char/hw_random/mxc-rnga.c
 index 85074de..c49c0b8 100644
 --- a/drivers/char/hw_random/mxc-rnga.c
 +++ b/drivers/char/hw_random/mxc-rnga.c
 @@ -152,14 +152,14 @@ static int __init mxc_rnga_probe(struct platform_device 
 *pdev)
   if (rng_dev)
   return -EBUSY;
  
 - clk = clk_get(&pdev->dev, "rng");
 + clk = clk_get(&pdev->dev, NULL);
   if (IS_ERR(clk)) {
   dev_err(&pdev->dev, "Could not get rng_clk!\n");
   err = PTR_ERR(clk);
   goto out;
   }
  
 - clk_enable(clk);
 + clk_prepare_enable(clk);
  
   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
   if (!res) {
 @@ -201,7 +201,7 @@ err_ioremap:
   release_mem_region(res->start, resource_size(res));
  
  err_region:
 - clk_disable(clk);
 + clk_disable_unprepare(clk);
   clk_put(clk);
  
  out:
 @@ -212,7 +212,7 @@ static int __exit mxc_rnga_remove(struct platform_device 
 *pdev)
  {
   struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
   void __iomem *rng_base = (void __iomem *)mxc_rnga.priv;
 - struct clk *clk = clk_get(&pdev->dev, "rng");
 + struct clk *clk = clk_get(&pdev->dev, NULL);

Uhh, that's a driver bug that should be fixed. Although right now there
is no reference counting for clocks, the driver should keep the clk
internally instead of simply calling clk_get whenever it needs access to
a clk.

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

[PATCH] net: cgroup: fix out of bounds accesses

2012-07-09 Thread Eric Dumazet

From: Eric Dumazet eduma...@google.com

dev->priomap is allocated by extend_netdev_table() called from
update_netdev_tables().
And this is only called if write_priomap() is called.

But if write_priomap() is not called, it seems we can have out of bounds
accesses in cgrp_destroy(), read_priomap() & skb_update_prio()
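
The guard that closes this hole is a length check before every index into the map. As a rough userspace illustration (the struct below is only loosely modelled on the kernel's `struct netprio_map`; `netprio_map_alloc()` and `priomap_lookup()` are hypothetical helpers, not kernel functions):

```c
#include <stdlib.h>

/* Loose model of a priority map: a length plus a flexible array. */
struct netprio_map {
	unsigned int priomap_len;
	unsigned int priomap[];
};

/* Allocate a zeroed map with 'len' entries; returns NULL on failure. */
static struct netprio_map *netprio_map_alloc(unsigned int len)
{
	struct netprio_map *map;

	map = calloc(1, sizeof(*map) + len * sizeof(map->priomap[0]));
	if (map)
		map->priomap_len = len;
	return map;
}

/*
 * Return the priority for 'prioidx', or 0 when there is no map or the
 * index lies beyond the entries that were actually allocated.
 */
static unsigned int priomap_lookup(const struct netprio_map *map,
				   unsigned int prioidx)
{
	if (map && prioidx < map->priomap_len)
		return map->priomap[prioidx];
	return 0;
}
```

Falling back to 0 for an index that was never allocated matches the behaviour the callers already have when no map exists at all.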

With help from Gao Feng

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Neil Horman nhor...@tuxdriver.com
Cc: Gao feng gaof...@cn.fujitsu.com
---
net/core/dev.c|8 ++--
net/core/netprio_cgroup.c |4 ++--
2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 84f01ba..0f28a9e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2444,8 +2444,12 @@ static void skb_update_prio(struct sk_buff *skb)
 {
 	struct netprio_map *map = rcu_dereference_bh(skb->dev->priomap);
 
-	if ((!skb->priority) && (skb->sk) && map)
-		skb->priority = map->priomap[skb->sk->sk_cgrp_prioidx];
+	if (!skb->priority && skb->sk && map) {
+		unsigned int prioidx = skb->sk->sk_cgrp_prioidx;
+
+		if (prioidx < map->priomap_len)
+			skb->priority = map->priomap[prioidx];
+	}
 }
 #else
 #define skb_update_prio(skb)
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index aa907ed..3e953ea 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -142,7 +142,7 @@ static void cgrp_destroy(struct cgroup *cgrp)
 	rtnl_lock();
 	for_each_netdev(&init_net, dev) {
 		map = rtnl_dereference(dev->priomap);
-		if (map)
+		if (map && cs->prioidx < map->priomap_len)
 			map->priomap[cs->prioidx] = 0;
 	}
 	rtnl_unlock();
@@ -166,7 +166,7 @@ static int read_priomap(struct cgroup *cont, struct cftype *cft,
 	rcu_read_lock();
 	for_each_netdev_rcu(&init_net, dev) {
 		map = rcu_dereference(dev->priomap);
-		priority = map ? map->priomap[prioidx] : 0;
+		priority = (map && prioidx < map->priomap_len) ? map->priomap[prioidx] : 0;
 		cb->fill(cb, dev->name, priority);
 	}
 	rcu_read_unlock();



Re: [RFC][PATCH V2 2/3] pwm_backlight: use power sequences

2012-07-09 Thread Alex Courbot

Sorry, I just noticed a mistake in this patch I made while merging 
another one. The following also needs to be changed, otherwise the 
power-on sequence will never be executed:


diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c
index 1a38953..4546d23 100644
--- a/drivers/video/backlight/pwm_bl.c
+++ b/drivers/video/backlight/pwm_bl.c
@@ -65,7 +98,7 @@ static int pwm_backlight_update_status(struct backlight_device *bl)
 		duty_cycle = pb->lth_brightness +
 		     (duty_cycle * (pb->period - pb->lth_brightness) / max);
 		pwm_config(pb->pwm, duty_cycle, pb->period);
-		pwm_enable(pb->pwm);
+		pwm_backlight_on(bl);
 	}


Apologies for the inconvenience.

Alex.


Re: Fwd: Hid over I2C and ACPI interaction

2012-07-09 Thread Mika Westerberg

On Mon, Jul 09, 2012 at 11:24:45AM +0800, Lan Tianyu wrote:
 I think we can add new interface to get acpi specific resources. e.g
 struct acpi_resource pnp_get_acpi_resource(...). When the pnp acpi devices
 were initialized, put those acpi specific resources into a new resource list
 pnpdev->acpi_resources. What pnp_get_acpi_resource does is to get specified
 type acpi resources and return. We also need to define some acpi resource 
 types.

Yeah, that sounds good to me.

Re: [PATCH RFC 0/2] kvm: Improving directed yield in PLE handler

2012-07-09 Thread Christian Borntraeger

On 09/07/12 08:20, Raghavendra K T wrote:
 Currently Pause Loop Exit (PLE) handler is doing directed yield to a
 random VCPU on PL exit. Though we already have filtering while choosing
 the candidate to yield_to, we can do better.
 
 Problem is, for large vcpu guests, we have more probability of yielding
 to a bad vcpu. We are not able to prevent directed yield to same guy who
 has done PL exit recently, who perhaps spins again and wastes CPU.
 
 Fix that by keeping track of who has done PL exit. So The Algorithm in series
 give chance to a VCPU which has:


We could do the same for s390. The appropriate exit would be diag44 (yield to
hypervisor).

Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for
spinlocks, though.
So there is no win here, but there are other cases where diag44 is used, e.g.
cpu_relax.
I have to double check with others whether these cases are critical, but for
now it seems that your dummy implementation for s390 is just fine. After all,
it is a no-op until we implement something.

Thanks

Christian


Re: [PATCH] CONFIG_CC_STACKPROTECTOR is no longer experimental

2012-07-09 Thread Jean Delvare

Hi all,

On Friday, 06 July 2012 at 22:19 +0200, Paul Bolle wrote:
 On Fri, 2012-07-06 at 10:58 -0700, Arjan van de Ven wrote:
  I rather just retire the whole concept of Experimental.
  
  it's really utterly meaningless in practice anyway.
 
 See Russell King's quick survey in https://lkml.org/lkml/2012/1/18/397 :
 almost all defconfigs had CONFIG_EXPERIMENTAL enabled. I didn't recheck
 since I'm sure little has changed. That macro and the related Kconfig
 symbol seem indeed meaningless.

I admit I have CONFIG_EXPERIMENTAL enabled on all my systems as well,
even the ones running an enterprise grade flavor of GNU/Linux.

This isn't necessarily surprising. Having to make decisions at build
time has always been an issue for distributions. The proper way for
distributions to deal with experimental drivers is to package them
separately and/or blacklist them by default. For experimental options,
best is to make them tunable at run time, for example using module
parameters.

As for options still depending on EXPERIMENTAL when they no longer
should, this can partly be explained when the EXPERIMENTAL dependency
doesn't show up in the short description. This is the case of
CONFIG_CC_STACKPROTECTOR. As everybody has CONFIG_EXPERIMENTAL enabled,
nobody notices the dependency.

The existence of CONFIG_EXPERIMENTAL may give developers the impression
that depending on it is sufficient and the right thing to do for
experimental drivers/features. That would be true if depending on
CONFIG_EXPERIMENTAL would automatically add (EXPERIMENTAL) to the
short description, as Randy and I were discussing previously, but this
was never implemented.

If we all agree that CONFIG_EXPERIMENTAL is no longer a good idea, then
I'm fine dropping it. I'm always happy to see kernel configuration
options go. Then options which used to depend on it and did not have
(EXPERIMENTAL) in their short description should have it appended.
These options should also default to N (but I think this is the default
default if none is specified?) Maybe a task for kernel janitors?

Back to my initial question, am I right to assume that
CONFIG_CC_STACKPROTECTOR is no longer experimental and can be enabled in
distribution kernels?

Thanks,
-- 
Jean Delvare
Suse L3


Re: Antw: Re: /sys and access(2): Correctly implemented?

2012-07-09 Thread Ulrich Windl

Hi!

Still the problem seems to be related to the sysfs:
# cd /tmp
# touch testfile
# chmod u=w,go= testfile
# F=/tmp/testfile
# test -r $F && cat $F

So it seems access(2) works correctly for root and normal filesystems. That's
why I came up with the issue here.
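
To compare the two system calls directly rather than through test(1) and cat(1), something along these lines could be used. This is only a sketch: `struct read_probe`, `probe_read()` and `probe_agrees()` are made-up names, and the result on any particular filesystem is not asserted here. Note that access(2) checks against the real UID while open(2) uses the effective UID, which is one more reason the two can differ.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Outcome of asking access(2) and open(2) about the same path. */
struct read_probe {
	int access_errno;	/* 0 if access(path, R_OK) succeeded */
	int open_errno;		/* 0 if open(path, O_RDONLY) succeeded */
};

/* Query both calls for 'path' and record errno for each failure. */
static struct read_probe probe_read(const char *path)
{
	struct read_probe p = { 0, 0 };
	int fd;

	if (access(path, R_OK) != 0)
		p.access_errno = errno;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		p.open_errno = errno;
	else
		close(fd);

	return p;
}

/* 1 when both calls agree on readability, 0 when they disagree. */
static int probe_agrees(struct read_probe p)
{
	return (p.access_errno == 0) == (p.open_errno == 0);
}
```

Running `probe_read()` as root on /sys/devices/system/cpu/probe and on /tmp/testfile would show whether the disagreement is specific to sysfs on a given kernel.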

Regards,
Ulrich

 Ryan Mallon rmal...@gmail.com wrote on 09.07.2012 at 09:22 in message
4ffa86c5.7090...@gmail.com:
 On 09/07/12 16:23, Ulrich Windl wrote:
 Ryan Mallon rmal...@gmail.com wrote on 09.07.2012 at 01:24 in message
  4ffa16b6.9050...@gmail.com:
  On 06/07/12 16:27, Ulrich Windl wrote:
  Hi!
 
  Recently I found a problem with the command (kernel 3.0.34-0.7-default
  from SLES 11 SP2, run as root):
  test -r $file && cat $file
  emitting "Permission denied"
 
  Investigating, I found that test actually uses access() to check for
  permissions. Unfortunately there are some files in /sys that have
  write-only permission bits set (e.g. /sys/devices/system/cpu/probe).
 
  ~ # ll /sys/devices/system/cpu/probe
  --w------- 1 root root 4096 Jun 29 12:43 /sys/devices/system/cpu/probe
  ~ # F=/sys/devices/system/cpu/probe
  ~ # test $F && cat $F
  cat: /sys/devices/system/cpu/probe: Permission denied
 
  Looks like you have a typo here, I think you wanted test -r $F, not
  test $F, the latter will just evaluate $F as an expression which
  will be true, and so you get the permission denied error running cat.
  
  Hi!
  
  You are right: It's a typo, but only in the message; the actual test was 
 done correctly, and the outcome is quite the same.
  
 
  Using test -r $F on a write-only sysfs file correctly returns false on
  my machine (Ubuntu 10.04.4 LTS/2.6.32-41-generic).
  
  Not here, unfortunately:
 
 Oops, I missed the bit about you running as root. I get the same results
 running as root on my machine as you, both for sysfs and regular files.
 
 It appears that the behaviour of access(2) for the super-user might be
 implementation-defined, see:
 
   http://pubs.opengroup.org/onlinepubs/95399/functions/access.html 
   http://lists.gnu.org/archive/html/bug-bash/2010-07/msg00071.html 
 
 However, I can't find any concrete information on it for Linux, and the
 manpage doesn't mention anything other than the X_OK bit.
 
 ~Ryan
 

 
 


Re: [PATCH] net: cgroup: fix out of bounds accesses

2012-07-09 Thread Gao feng

On 2012-07-09 15:45, Eric Dumazet wrote:
 From: Eric Dumazet eduma...@google.com
 
 dev->priomap is allocated by extend_netdev_table() called from
 update_netdev_tables().
 And this is only called if write_priomap() is called.
 
 But if write_priomap() is not called, it seems we can have out of bounds
 accesses in cgrp_destroy(), read_priomap() & skb_update_prio()
 
 With help from Gao Feng
 
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Neil Horman nhor...@tuxdriver.com
 Cc: Gao feng gaof...@cn.fujitsu.com
 ---
 net/core/dev.c|8 ++--
 net/core/netprio_cgroup.c |4 ++--
 2 files changed, 8 insertions(+), 4 deletions(-)

Acked-by: Gao feng gaof...@cn.fujitsu.com

Re: [RFC PATCH v2 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-07-09 Thread Yasuaki Ishimatsu

Hi Wen,

2012/07/06 18:20, Wen Congyang wrote:
 At 07/06/2012 04:27 PM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/04 19:01, Wen Congyang wrote:
 At 07/04/2012 01:52 PM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/04 14:08, Wen Congyang wrote:
 At 07/04/2012 12:45 PM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/03 15:35, Wen Congyang wrote:
 At 07/03/2012 01:56 PM, Yasuaki Ishimatsu Wrote:
 When (hot)adding memory into system, /sys/firmware/memmap/X/{end, 
 start, type}
 sysfs files are created. But there is no code to remove these files. 
 The patch
 implements the function to remove them.

 Note : The code does not free firmware_map_entry since there is no way 
 to free
memory which is allocated by bootmem.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
  drivers/firmware/memmap.c|   70 
 +++
  include/linux/firmware-map.h |6 +++
  mm/memory_hotplug.c  |6 +++
  3 files changed, 81 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc4/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc4.orig/mm/memory_hotplug.c 2012-07-03 
 14:22:00.190240794 +0900
 +++ linux-3.5-rc4/mm/memory_hotplug.c  2012-07-03 14:22:03.549198802 
 +0900
 @@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory);

  int remove_memory(int nid, u64 start, u64 size)
  {
 -  return -EBUSY;
 +  lock_memory_hotplug();
 +  /* remove memmap entry */
 +  firmware_map_remove(start, start + size - 1, "System RAM");
 +  unlock_memory_hotplug();
 +  return 0;

  }
  EXPORT_SYMBOL_GPL(remove_memory);
 Index: linux-3.5-rc4/include/linux/firmware-map.h
 ===
 --- linux-3.5-rc4.orig/include/linux/firmware-map.h2012-07-03 
 14:21:45.766421116 +0900
 +++ linux-3.5-rc4/include/linux/firmware-map.h 2012-07-03 
 14:22:03.550198789 +0900
 @@ -25,6 +25,7 @@

  int firmware_map_add_early(u64 start, u64 end, const char *type);
  int firmware_map_add_hotplug(u64 start, u64 end, const char 
 *type);
 +int firmware_map_remove(u64 start, u64 end, const char *type);

  #else /* CONFIG_FIRMWARE_MEMMAP */

 @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
return 0;
  }

 +static inline int firmware_map_remove(u64 start, u64 end, const char 
 *type)
 +{
 +  return 0;
 +}
 +
  #endif /* CONFIG_FIRMWARE_MEMMAP */

  #endif /* _LINUX_FIRMWARE_MAP_H */
 Index: linux-3.5-rc4/drivers/firmware/memmap.c
 ===
 --- linux-3.5-rc4.orig/drivers/firmware/memmap.c   2012-07-03 
 14:21:45.761421180 +0900
 +++ linux-3.5-rc4/drivers/firmware/memmap.c2012-07-03 
 14:22:03.569198549 +0900
 @@ -79,7 +79,16 @@ static const struct sysfs_ops memmap_att
.show = memmap_attr_show,
  };

 +static void release_firmware_map_entry(struct kobject *kobj)
 +{
 +  /*
 +   * FIXME : There is no idea.
 +   * How to free the entry which allocated bootmem?
 +   */

 I find a function free_bootmem(), but I am not sure whether it can work 
 here.

 It cannot work here.

 Another problem: how to check whether the entry uses bootmem?

 When firmware_map_entry is allocated by kzalloc(), the page has PG_slab.

 This is not true. In my test, I find the page does not have PG_slab 
 sometimes.

 I think that it depends on the allocated size. firmware_map_entry size is
 smaller than PAGE_SIZE. So the page has PG_Slab.

 In my test, I add printk in the function firmware_map_add_hotplug() to 
 display
 page's flags. And sometimes the page is not allocated by slab(I use 
 PageSlab()
 to verify it).

 How did you check it? Could you send your debug patch?
 
 When the memory is not allocated from slab, the flags is 0x108000.

Thank you for sending the patch.
I think a page that does not have PageSlab set is a compound page. So we can
check whether the entry was allocated from bootmem or not as follows:

static void release_firmware_map_entry(struct kobject *kobj)
{
	struct firmware_map_entry *entry = to_memmap_entry(kobj);
	struct page *head_page;

	head_page = virt_to_head_page(entry);
	if (PageSlab(head_page))
		kfree(entry);
	/* else: the entry is allocated from bootmem */
}

Thanks,
Yasuaki Ishimatsu

 
  From 8dd51368d6c03edf7edc89cab17441e3741c39c7 Mon Sep 17 00:00:00 2001
 From: Wen Congyang we...@cn.fujitsu.com
 Date: Wed, 4 Jul 2012 16:05:26 +0800
 Subject: [PATCH] debug

Re: UBI fastmap updates

2012-07-09 Thread Richard Weinberger

On 09.07.2012 09:37, Shmulik Ladkani wrote:
 Hi Richard,
 
 On Sun, 08 Jul 2012 14:07:41 +0200 Richard Weinberger rich...@nod.at wrote:
 +   /* TODO: in the new locking scheme, produce_free_peb is
 +* called under wl_lock taken.
 +* so when returning, should reacquire the lock
 +*/

 Which new locking scheme?
 
 I am diffing linux-ubi fastmap HEAD against its fork point (vanilla
 ubi), that's 6b16351..d41a140 on linux-ubi.
 
 Which gives the following diff in produce_free_pebs:

Ahh. _my_ new locking scheme. I feared someone else had changed it meanwhile in
mainline. ;)
Yes, the ubi->wl_lock in produce_free_peb() is no longer needed.
Again, thanks for pointing this out!

Thanks,
//richard




Re: [PATCH] mm: Warn about costly page allocation

2012-07-09 Thread Mel Gorman

On Mon, Jul 09, 2012 at 11:38:20AM +0900, Minchan Kim wrote:
 Since lumpy reclaim was introduced at 2.6.23, it helped higher
 order allocation.
 Recently, we removed it at 3.4 and we didn't enable compaction
 forcingly[1]. The reason makes sense that compaction.o + migration.o
 isn't trivial for system doesn't use higher order allocation.
 But the problem is that we have to enable compaction explicitly
 while lumpy reclaim enabled unconditionally.
 
 Normally, admin doesn't know his system have used higher order
 allocation and even lumpy reclaim have helped it.
 Admin in embdded system have a tendency to minimise code size so that
 they can disable compaction. In this case, we can see page allocation
 failure we can never see in the past. It's critical on embedded side
 because...
 
 Let's think this scenario.
 
 There is QA team in embedded company and they have tested their product.
 In test scenario, they can allocate 100 high order allocation.
 (they don't matter how many high order allocations in kernel are needed
 during test. their concern is just only working well or fail of their
 middleware/application) High order allocation will be serviced well
 by natural buddy allocation without lumpy's help. So they released
 the product and sold out all over the world.
 Unfortunately, in real practice, sometime, 105 high order allocation was
 needed rarely and fortunately, lumpy reclaim could help it so the product
 doesn't have a problem until now.
 
 If they use latest kernel, they will see the new config CONFIG_COMPACTION
 which is very poor documentation, and they can't know it's replacement of
 lumpy reclaim(even, they don't know lumpy reclaim) so they simply disable

Depending on lumpy reclaim or compaction for high-order kernel allocations
is dangerous. Both depend on being able to move MIGRATE_MOVABLE allocations
to satisfy the high-order allocation. If used regularly for high-order kernel
allocations and they are long-lived, the system will eventually be unable
to grant these allocations, with or without compaction or lumpy reclaim.

Be also aware that lumpy reclaim was very aggressive when reclaiming pages
to satisfy an allocation. Compaction is not and compaction can be temporarily
disabled if an allocation attempt fails. If lumpy reclaim was being depended
upon to satisfy high-order allocations, there is no guarantee, particularly
with 3.4, that compaction will succeed as it does not reclaim aggressively.

 that option for size optimization. Of course, QA team still test it but they
 can't find the problem if they don't do test stronger than old.
 It ends up release the product and sold out all over the world, again.
 But in this time, we don't have both lumpy and compaction so the problem
 would happen in real practice. A poor enginner from Korea have to flight
 to the USA for the fix a ton of products. Otherwise, should recall products
 from all over the world. Maybe he can lose a job. :(
 
 This patch adds warning for notice. If the system try to allocate
 PAGE_ALLOC_COSTLY_ORDER above page and system enters reclaim path,
 it emits the warning. At least, it gives a chance to look into their
 system before the relase.
 
 This patch avoids false positive by alloc_large_system_hash which
 allocates with GFP_ATOMIC and a fallback mechanism so it can make
 this warning useless.
 
 [1] c53919ad(mm: vmscan: remove lumpy reclaim)
 
 Signed-off-by: Minchan Kim minc...@kernel.org
 ---
  mm/page_alloc.c |   16 
  1 file changed, 16 insertions(+)
 
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index a4d3a19..1155e00 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -2276,6 +2276,20 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
   return alloc_flags;
  }
  
 +#if defined(CONFIG_DEBUG_VM) && !defined(CONFIG_COMPACTION)
 +static inline void check_page_alloc_costly_order(unsigned int order)
 +{
 + if (unlikely(order > PAGE_ALLOC_COSTLY_ORDER)) {
 + printk_once("WARNING: You are tring to allocate %d-order page."
 + " You might need to turn on CONFIG_COMPACTION\n", order);
 + }

WARN_ON_ONCE would tell you what is trying to satisfy the allocation.

It should further check if this is a GFP_MOVABLE allocation or not and if
not, then it should either be documented that compaction may only delay
allocation failures and that they may need to consider reserving the memory
in advance or doing something like forcing MIGRATE_RESERVE to only be used
for high-order allocations.
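
A rough userspace sketch of that extra check, not kernel code. Everything prefixed with sketch_ or SKETCH_ is a hypothetical stand-in (the real GFP flags, PAGE_ALLOC_COSTLY_ORDER and WARN_ON_ONCE live in the kernel headers and are not reproduced here). It only shows the shape of the idea: classify a costly-order request by whether it is movable, and report once.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; the kernel's real values and flag names differ. */
#define SKETCH_COSTLY_ORDER 3u
#define SKETCH_MOVABLE      0x1u

enum sketch_advice {
	SKETCH_OK,                 /* order is not costly, nothing to report */
	SKETCH_SUGGEST_COMPACTION, /* movable request: compaction can help */
	SKETCH_MAY_ONLY_DELAY      /* unmovable: compaction may only delay failure */
};

/* Pure classification step: no side effects, easy to test. */
static enum sketch_advice sketch_classify(unsigned int order, unsigned int flags)
{
	if (order <= SKETCH_COSTLY_ORDER)
		return SKETCH_OK;
	return (flags & SKETCH_MOVABLE) ? SKETCH_SUGGEST_COMPACTION
					: SKETCH_MAY_ONLY_DELAY;
}

/* Report at most once per process; returns true only when it printed. */
static bool sketch_warn_once(unsigned int order, unsigned int flags)
{
	static bool warned;
	enum sketch_advice advice = sketch_classify(order, flags);

	if (advice == SKETCH_OK || warned)
		return false;
	warned = true;
	fprintf(stderr, "order-%u allocation: %s\n", order,
		advice == SKETCH_SUGGEST_COMPACTION ?
		"consider enabling compaction" :
		"compaction may only delay failure; consider reserving memory");
	return true;
}
```

Keeping the classification separate from the once-only reporting means the message text can differ for the movable and unmovable cases without touching the rate limiting.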

 +}
 +#else
 +static inline void check_page_alloc_costly_order(unsigned int order)
 +{
 +}
 +#endif
 +
  static inline struct page *
  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
   struct zonelist *zonelist, enum zone_type high_zoneidx,
 @@ -2353,6 +2367,8 @@ rebalance:
   if (!wait)
   goto nopage;
  
 + check_page_alloc_costly_order(order);
 +
   /* Avoid recursion of direct reclaim */
   if (current->flags & PF_MEMALLOC)
   goto nopage;

Re: [PATCH 2/5] uprobes: suppress uprobe_munmap() from mmput()

2012-07-09 Thread Peter Zijlstra

On Sun, 2012-07-08 at 22:30 +0200, Oleg Nesterov wrote:
 uprobe_munmap() does get_user_pages() and it is also called from
 the final mmput()->exit_mmap() path. This slows down exit/mmput()
 for no reason, and I think it is simply dangerous/wrong to try to
 fault-in a page into the dying mm. If nothing else, this happens
 after the last sync_mm_rss(), afaics handle_mm_fault() can change
 the task->rss_stat and make the subsequent check_mm() unhappy.
 
 Change uprobe_munmap() to check mm->mm_users != 0.
 
 Signed-off-by: Oleg Nesterov o...@redhat.com
 ---
  kernel/events/uprobes.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
 index a93b6df..47c4e24 100644
 --- a/kernel/events/uprobes.c
 +++ b/kernel/events/uprobes.c
 @@ -1082,6 +1082,9 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned 
 long start, unsigned lon
   if (!atomic_read(&uprobe_events) || !valid_vma(vma, false))
   return;
  
 + if (!atomic_read(&vma->vm_mm->mm_users)) /* called by mmput() ? */
 + return;
 +
   if (!atomic_read(&vma->vm_mm->uprobes_state.count))
   return;
  

But won't you leak uprobe refcounts like this? Those aren't tied to the
task (which is dying) but to the vma's mapping the appropriate hunk of
the text. Not doing the munmap will then not put the uprobe->ref..

Or am I missing something here?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Infinite looping in omap2430.c USB driver

2012-07-09 Thread Tony Lindgren

* NeilBrown ne...@suse.de [120706 15:44]:
 
 Hello `./scripts/get_maintainer.pl -f drivers/usb/musb/omap2430.c`
 
 omap2430_musb_set_vbus in omap2430.c contains:
 
   while (musb_readb(musb->mregs, MUSB_DEVCTL) & 0x80) {
 
   cpu_relax();
 
   if (time_after(jiffies, timeout)) {
   dev_err(musb->controller,
   "configured as A device timeout");
   ret = -EINVAL;
   break;
   }
   }
 
 having set
   unsigned long timeout = jiffies + msecs_to_jiffies(1000);
 
 so it can busy-loop for up to 1 second.  Probably not ideal, but if it works
 I wouldn't complain.
 
 The
   if (int_usb & MUSB_INTR_SESSREQ) {
 branch of musb_stage0_irq() called from musb_interrupt (from
 generic_interrupt) calls this:
 
   if (musb->int_usb)
   retval |= musb_stage0_irq(musb, musb->int_usb,
   devctl, power);
 
 so the busy loop can happen in an interrupt handler (not a threaded interrupt
 handler), which is probably less ideal.
 
 However this can be called with interrupt disabled, as happens at least
 during resume when resume_irqs() calls:
 
   raw_spin_lock_irqsave(&desc->lock, flags);
   __enable_irq(desc, irq, true);
   raw_spin_unlock_irqrestore(&desc->lock, flags);
 
 and an interrupt is found to be IRQS_PENDING.
 
 In this case interrupts are disabled so 'jiffies' never changes so this loop
 can continue forever.
 
 This happens on my (GTA04) phone fairly regularly - between 1 in 10 and 1 in
 30 resumes. The musb-hdrc interrupt is pending and reports
 
 [ 4957.624176] musb-hdrc musb-hdrc: ** IRQ peripheral usb0040 tx rx
 
 'usb0040' is MUSB_INTR_SESSREQ.  I think this is triggered by detecting a
 voltage change on the USB ID pin - is that right?  A short-to-earth would be
 a request to switch to host mode, which is why it tries to enable VBUS.
 Maybe there is some electrical noise which is being picked up?

I guess that could happen if the transceiver pins are floating during suspend?
 
 In any case I get the interrupt despite nothing being plugged in, and the 0x80
 bit of MUSB_DEVCTL never gets cleared.

As far as I remember, musb tries to be smart about changing to host mode,
and tries to do the session and vbus detection on its own.. AFAIK, there's
nothing you can do until musb is done and detects the VBUS is not rising and
gives up. There are all kind of interrupt flag combinations trying to deal
with that mess, maybe you need to add yet another one?
 
 I've added a simple loop counter which aborts the loop after 1000 loops -
 this takes about 5 seconds, but includes some printks which probably slow it
 down.
 
 In 2 out of 2 cases, subsequent messages show that the hsmmc driver for the
 uSD card that holds my root filesystem is messed up.  It seems to be waiting
 for a request that is never going to complete.
 So maybe the hsmmc is causing the noise that triggers the musb issue.
 
 I can send a patch which add a loop count if you like, but I suspect you can
 come up with a much better approach.

Sounds like that loop should be fixed.
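
One possible shape for such a fix, as a plain userspace sketch rather than a driver patch: bound the wait by a poll count so it still terminates when the clock cannot advance. The names sketch_read_fn, sketch_poll_until_clear and sketch_fake_read are hypothetical; the reader callback merely stands in for the musb_readb() call quoted above, and the right poll limit for real hardware is not known here.

```c
#include <stdint.h>

/* Hypothetical register reader; stands in for musb_readb(...) in this sketch. */
typedef uint8_t (*sketch_read_fn)(void *ctx);

/*
 * Poll until (reg & mask) reads as zero, giving up after max_polls reads.
 * The bound is a read count, not a timestamp, so the loop ends even if
 * the time source is frozen (as jiffies is with interrupts disabled).
 * Returns 0 on success, -1 if the bit never cleared.
 */
static int sketch_poll_until_clear(sketch_read_fn read_reg, void *ctx,
				   uint8_t mask, unsigned long max_polls)
{
	unsigned long i;

	for (i = 0; i < max_polls; i++) {
		if (!(read_reg(ctx) & mask))
			return 0;
	}
	return -1;
}

/* Fake register for trying the helper: bit 0x80 stays set for N reads. */
struct sketch_fake_reg {
	unsigned long reads_until_clear;
};

static uint8_t sketch_fake_read(void *ctx)
{
	struct sketch_fake_reg *r = ctx;

	if (r->reads_until_clear == 0)
		return 0x00;
	r->reads_until_clear--;
	return 0x80;
}
```

A count-based bound gives up after a fixed number of register reads rather than a fixed wall-clock time, so a real driver would probably pair it with a short delay per iteration.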

Regards,

Tony

Re: [PATCH 4/5] uprobes: kill copy_vma()->uprobe_mmap()

2012-07-09 Thread Peter Zijlstra

On Sun, 2012-07-08 at 22:30 +0200, Oleg Nesterov wrote:
 And why this uprobe_mmap() was added? I believe the intent was wrong.
 Note that the caller is going to do move_page_tables(), all registered
 uprobes are already faulted in, we only change the virtual addresses.
 

I think it was because of the copy_vma + do_munmap. Since do_munmap()
should be doing a put on the uprobe, we need an extra get to balance.

That said, I cannot actually find the uprobe_munmap() from do_munmap(),
but that might be due to lack of wakefulness etc..

Re: [PATCH RESEND v7 1/2] block: ioctl support for sanitize in eMMC 4.5

2012-07-09 Thread Girish K S

On 28 June 2012 14:02, Yaniv Gardi yga...@codeaurora.org wrote:
 Adding a new ioctl to support sanitize operation in eMMC
 cards version 4.5.
 The sanitize ioctl support helps performing this operation
 via user application.

 Signed-off-by: Yaniv Gardi yga...@codeaurora.org

 ---
  block/blk-core.c  |   15 ++--
  block/blk-lib.c   |   51 
 +
  block/blk-merge.c |4 +++
  block/elevator.c  |2 +-
  block/ioctl.c |9 
  include/linux/blk_types.h |5 +++-
  include/linux/blkdev.h|3 ++
  include/linux/fs.h|1 +
  kernel/trace/blktrace.c   |2 +
  9 files changed, 87 insertions(+), 5 deletions(-)

 diff --git a/block/blk-core.c b/block/blk-core.c
 index 3c923a7..4a56102b 100644
 --- a/block/blk-core.c
 +++ b/block/blk-core.c
 @@ -1641,7 +1641,7 @@ generic_make_request_checks(struct bio *bio)
 goto end_io;
 }

 -   if (unlikely(!(bio->bi_rw & REQ_DISCARD) &&
 +   if (unlikely(!(bio->bi_rw & (REQ_DISCARD | REQ_SANITIZE)) &&
  nr_sectors > queue_max_hw_sectors(q))) {
 printk(KERN_ERR "bio too big device %s (%u > %u)\n",
bdevname(bio->bi_bdev, b),
 @@ -1689,6 +1689,14 @@ generic_make_request_checks(struct bio *bio)
 goto end_io;
 }

 +   if ((bio->bi_rw & REQ_SANITIZE) &&
 +   (!blk_queue_sanitize(q))) {
 +   pr_info("%s - got a SANITIZE request but the queue "
 +  "doesn't support sanitize requests", __func__);
 +   err = -EOPNOTSUPP;
 +   goto end_io;
 +   }
 +
 if (blk_throtl_bio(q, bio))
 return false;   /* throttled, will be resubmitted later */

 @@ -1794,7 +1802,8 @@ void submit_bio(int rw, struct bio *bio)
  * If it's a regular read/write or a barrier with data attached,
  * go through the normal accounting stuff before submission.
  */
 -   if (bio_has_data(bio) && !(rw & REQ_DISCARD)) {
 +   if (bio_has_data(bio) &&
 +   (!(rw & (REQ_DISCARD | REQ_SANITIZE)))) {
 if (rw & WRITE) {
 count_vm_events(PGPGOUT, count);
 } else {
 @@ -1840,7 +1849,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
 -   if (rq->cmd_flags & REQ_DISCARD)
 +   if (rq->cmd_flags & (REQ_DISCARD | REQ_SANITIZE))
 return 0;

 if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
 diff --git a/block/blk-lib.c b/block/blk-lib.c
 index 2b461b4..280d63e 100644
 --- a/block/blk-lib.c
 +++ b/block/blk-lib.c
 @@ -115,6 +115,57 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
  EXPORT_SYMBOL(blkdev_issue_discard);

  /**
 + * blkdev_issue_sanitize - queue a sanitize request
 + * @bdev:  blockdev to issue sanitize for
 + * @gfp_mask:  memory allocation flags (for bio_alloc)
 + *
 + * Description:
 + *Issue a sanitize request for the specified block device
 + */
 +int blkdev_issue_sanitize(struct block_device *bdev, gfp_t gfp_mask)
 +{
 +   DECLARE_COMPLETION_ONSTACK(wait);
 +   struct request_queue *q = bdev_get_queue(bdev);
 +   int type = REQ_WRITE | REQ_SANITIZE;
 +   struct bio_batch bb;
 +   struct bio *bio;
 +   int ret = 0;
 +
 +   if (!q)
 +   return -ENXIO;
 +
 +   if (!blk_queue_sanitize(q)) {
 +   pr_err("%s - card doesn't support sanitize", __func__);
 +   return -EOPNOTSUPP;
 +   }
 +
 +   bio = bio_alloc(gfp_mask, 1);
 +   if (!bio)
 +   return -ENOMEM;
 +
 +   atomic_set(&bb.done, 1);
 +   bb.flags = 1 << BIO_UPTODATE;
 +   bb.wait = &wait;
 +
 +   bio->bi_end_io = bio_batch_end_io;
 +   bio->bi_bdev = bdev;
 +   bio->bi_private = &bb;
 +
 +   atomic_inc(&bb.done);
 +   submit_bio(type, bio);
 +
 +   /* Wait for bios in-flight */
 +   if (!atomic_dec_and_test(&bb.done))
 +   wait_for_completion(&wait);
 +
 +   if (!test_bit(BIO_UPTODATE, &bb.flags))
 +   ret = -EIO;
 +
 +   return ret;
 +}
 +EXPORT_SYMBOL(blkdev_issue_sanitize);
 +
 +/**
   * blkdev_issue_zeroout - generate number of zero filed write bios
   * @bdev:  blockdev to issue
   * @sector:start sector
 diff --git a/block/blk-merge.c b/block/blk-merge.c
 index 160035f..7e24772 100644
 --- a/block/blk-merge.c
 +++ b/block/blk-merge.c
 @@ -477,6 +477,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 if (!rq_mergeable(rq))
 return false;

 +   /* don't merge file system requests and sanitize requests */
 +   if ((req->cmd_flags & REQ_SANITIZE) != (next->cmd_flags & 
 REQ_SANITIZE))
 +   return false;
 +
 /* don't merge file system requests and discard requests */
 if ((bio->bi_rw & REQ_DISCARD) != (rq->bio->bi_rw

Re: linux-next: comment on pm tree commit

2012-07-09 Thread Rafael J. Wysocki

On Monday, July 09, 2012, Stephen Rothwell wrote:
 Hi Rafael,
 
 I noticed commit b8eec56cd8e5 (PM / cpuidle: System resume hang fix with
 cpuidle) in the pm tree needs some work (I noticed it because it was
 changed in a rebase ...).
 
 diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
 index a6b3f2e..b90ccb2 100644
 --- a/include/linux/cpuidle.h
 +++ b/include/linux/cpuidle.h
 @@ -146,6 +146,8 @@ extern void cpuidle_unregister_device(struct 
 cpuidle_device *dev);
  
  extern void cpuidle_pause_and_lock(void);
  extern void cpuidle_resume_and_unlock(void);
 +extern void cpuidle_pause(void);
 +extern void cpuidle_resume(void);
  extern int cpuidle_enable_device(struct cpuidle_device *dev);
  extern void cpuidle_disable_device(struct cpuidle_device *dev);
  extern int cpuidle_wrap_enter(struct cpuidle_device *dev,
 @@ -169,6 +171,8 @@ static inline void cpuidle_unregister_device(struct 
 cpuidle_device *dev) { }
  
  static inline void cpuidle_pause_and_lock(void) { }
  static inline void cpuidle_resume_and_unlock(void) { }
 +static inline cpuidle_pause(void) { }
 +static inline cpuidle_resume(void) { }
 
 These need to be static inline void.  I wonder what review and build
 testing this went through (the above should produce warnings since they
 are non void returning functions with no return statements).
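
For illustration only, a standalone sketch of the stub pattern with the return type spelled out. SKETCH_HAVE_CPUIDLE and the sketch_ names are hypothetical stand-ins, not the identifiers in include/linux/cpuidle.h; the point is just that both branches of the conditional must declare the same signature, including void.

```c
/* SKETCH_HAVE_CPUIDLE is a hypothetical switch standing in for the config option. */
#ifdef SKETCH_HAVE_CPUIDLE
extern void sketch_cpuidle_pause(void);
extern void sketch_cpuidle_resume(void);
#else
/*
 * Explicit void on the stubs. Leaving the return type out makes them
 * implicit-int functions with no return statement, which compilers
 * warn about (and which newer C standards reject).
 */
static inline void sketch_cpuidle_pause(void) { }
static inline void sketch_cpuidle_resume(void) { }
#endif

/* The caller compiles the same way whichever branch was selected. */
static inline int sketch_suspend_step(void)
{
	sketch_cpuidle_pause();
	sketch_cpuidle_resume();
	return 0;
}
```

Building a config with the option disabled is what exercises the stub branch, which is presumably why a build test of the wrong tree did not catch the problem.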

Thanks for reporting this, I tried to fix a build issue in the original patch
hastily and failed miserably as you have noticed and then I build-tested a
wrong tree.  Sorry.

It should be fixed now for real.

Thanks,
Rafael

Re: [PATCH] mm: Warn about costly page allocation

2012-07-09 Thread Minchan Kim

Hi Mel,

On Mon, Jul 09, 2012 at 09:22:00AM +0100, Mel Gorman wrote:
 On Mon, Jul 09, 2012 at 11:38:20AM +0900, Minchan Kim wrote:
  Since lumpy reclaim was introduced at 2.6.23, it helped higher
  order allocation.
  Recently, we removed it at 3.4 and we didn't enable compaction
  forcingly[1]. The reason makes sense that compaction.o + migration.o
  isn't trivial for system doesn't use higher order allocation.
  But the problem is that we have to enable compaction explicitly
  while lumpy reclaim enabled unconditionally.
  
  Normally, admin doesn't know his system have used higher order
  allocation and even lumpy reclaim have helped it.
  Admin in embdded system have a tendency to minimise code size so that
  they can disable compaction. In this case, we can see page allocation
  failure we can never see in the past. It's critical on embedded side
  because...
  
  Let's think this scenario.
  
  There is QA team in embedded company and they have tested their product.
  In test scenario, they can allocate 100 high order allocation.
  (they don't matter how many high order allocations in kernel are needed
  during test. their concern is just only working well or fail of their
  middleware/application) High order allocation will be serviced well
  by natural buddy allocation without lumpy's help. So they released
  the product and sold out all over the world.
  Unfortunately, in real practice, sometime, 105 high order allocation was
  needed rarely and fortunately, lumpy reclaim could help it so the product
  doesn't have a problem until now.
  
  If they use latest kernel, they will see the new config CONFIG_COMPACTION
  which is very poor documentation, and they can't know it's replacement of
  lumpy reclaim(even, they don't know lumpy reclaim) so they simply disable
 
 Depending on lumpy reclaim or compaction for high-order kernel allocations
 is dangerous. Both depend on being able to move MIGRATE_MOVABLE allocations
 to satisfy the high-order allocation. If used regularly for high-order kernel
 allocations and they are long-lived, the system will eventually be unable
 to grant these allocations, with or without compaction or lumpy reclaim.

Indeed.

 
 Be also aware that lumpy reclaim was very aggressive when reclaiming pages
 to satisfy an allocation. Compaction is not and compaction can be temporarily
 disabled if an allocation attempt fails. If lumpy reclaim was being depended
 upon to satisfy high-order allocations, there is no guarantee, particularly
 with 3.4, that compaction will succeed as it does not reclaim aggressively.

That's a good explanation; let's add it to the description.

 
  that option for size optimization. Of course, QA team still test it but they
  can't find the problem if they don't do test stronger than old.
  It ends up release the product and sold out all over the world, again.
  But in this time, we don't have both lumpy and compaction so the problem
  would happen in real practice. A poor enginner from Korea have to flight
  to the USA for the fix a ton of products. Otherwise, should recall products
  from all over the world. Maybe he can lose a job. :(
  
  This patch adds warning for notice. If the system try to allocate
  PAGE_ALLOC_COSTLY_ORDER above page and system enters reclaim path,
  it emits the warning. At least, it gives a chance to look into their
  system before the relase.
  
  This patch avoids false positive by alloc_large_system_hash which
  allocates with GFP_ATOMIC and a fallback mechanism so it can make
  this warning useless.
  
  [1] c53919ad(mm: vmscan: remove lumpy reclaim)
  
  Signed-off-by: Minchan Kim minc...@kernel.org
  ---
   mm/page_alloc.c |   16 
   1 file changed, 16 insertions(+)
  
  diff --git a/mm/page_alloc.c b/mm/page_alloc.c
  index a4d3a19..1155e00 100644
  --- a/mm/page_alloc.c
  +++ b/mm/page_alloc.c
  @@ -2276,6 +2276,20 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
  return alloc_flags;
   }
   
  +#if defined(CONFIG_DEBUG_VM) && !defined(CONFIG_COMPACTION)
  +static inline void check_page_alloc_costly_order(unsigned int order)
  +{
  +   if (unlikely(order > PAGE_ALLOC_COSTLY_ORDER)) {
  +   printk_once("WARNING: You are tring to allocate %d-order page."
  +" You might need to turn on CONFIG_COMPACTION\n", order);
  +   }
 
 WARN_ON_ONCE would tell you what is trying to satisfy the allocation.

Do you mean that it would be better to use WARN_ON_ONCE rather than a raw printk?
If so, I would prefer to keep the raw printk, because WARN_ON_ONCE can be disabled
by !CONFIG_BUG.
If I am missing something, could you elaborate?

 
 It should further check if this is a GFP_MOVABLE allocation or not and if
 not, then it should either be documented that compaction may only delay
 allocation failures and that they may need to consider reserving the memory
 in advance or doing something like forcing MIGRATE_RESERVE to only be used
 for high-order allocations.

Okay. but I got confused you want to add above description

82571EB: Detected Hardware Unit Hang

2012-07-09 Thread Joe Jin

Hi list,

I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when doing
an scp test. The issue is easy to reproduce on a SUN FIRE X2270 M2: copying
a big file (500M) from another server triggers it at once.

Would you please help on this?

device info:
# lspci -s 05:00.0 
05:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet 
Controller (Copper) (rev 06)

# lspci -s 05:00.0 -n
05:00.0 0200: 8086:10bc (rev 06)

# ethtool -i eth0
driver: e1000e
version: 2.0.0-NAPI
firmware-version: 5.10-2
bus-info: :05:00.0

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
generic-receive-offload: on

kernel log:
---
e1000e :05:00.0: eth0: Detected Hardware Unit Hang:
  TDH  6c
  TDT  81
  next_to_use  81
  next_to_clean6b
buffer_info[next_to_clean]:
  time_stamp   fffc7a23
  next_to_watch71
  jiffies  fffc8c0c
  next_to_watch.status 0
MAC Status 80387
PHY Status 792d
PHY 1000BASE-T Status  3c00
PHY Extended Status3000
PCI Status 10
e1000e :05:00.0: eth0: Detected Hardware Unit Hang:
  TDH  6c
  TDT  81
  next_to_use  81
  next_to_clean6b
buffer_info[next_to_clean]:
  time_stamp   fffc7a23
  next_to_watch71
  jiffies  fffc9bac
  next_to_watch.status 0
MAC Status 80387
PHY Status 792d
PHY 1000BASE-T Status  3c00
PHY Extended Status3000
PCI Status 10
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x225/0x230()
Hardware name: SUN FIRE X2270 M2
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: autofs4 hidp rfcomm bluetooth rfkill lockd sunrpc 
cpufreq_ondemand acpi_cpufreq mperf be2iscsi iscsi_boot_sysfs ib_iser rdma_cm 
ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i 
libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi video sbs sbshc 
acpi_pad acpi_ipmi ipmi_msghandler parport_pc lp parport e1000e(U) 
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device igb 
snd_pcm_oss serio_raw snd_mixer_oss snd_pcm tpm_infineon snd_timer snd 
soundcore snd_page_alloc i2c_i801 iTCO_wdt i2c_core pcspkr i7core_edac 
iTCO_vendor_support ioatdma ghes dca edac_core hed dm_snapshot dm_zero 
dm_mirror dm_region_hash dm_log dm_mod usb_storage sd_mod crc_t10dif sg ahci 
libahci ext3 jbd mbcache [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.39-200.24.1.el5uek #1
Call Trace:
 [<c07d9ac5>] ? dev_watchdog+0x225/0x230
 [<c045ba61>] warn_slowpath_common+0x81/0xa0
 [<c07d9ac5>] ? dev_watchdog+0x225/0x230
 [<c045bb23>] warn_slowpath_fmt+0x33/0x40
 [<c07d9ac5>] dev_watchdog+0x225/0x230
 [<c07d98a0>] ? dev_activate+0xb0/0xb0
 [<c0468e82>] call_timer_fn+0x32/0xf0
 [<c04bceb0>] ? rcu_check_callbacks+0x80/0x80
 [<c046a76d>] run_timer_softirq+0xed/0x1b0
 [<c07d98a0>] ? dev_activate+0xb0/0xb0
 [<c0461a81>] __do_softirq+0x91/0x1a0
 [<c04619f0>] ? local_bh_enable+0x80/0x80
 <IRQ>  [<c0462295>] ? irq_exit+0x95/0xa0
 [<c087f8b8>] ? smp_apic_timer_interrupt+0x38/0x42
 [<c08784f5>] ? apic_timer_interrupt+0x31/0x38
 [<c046007b>] ? do_exit+0x11b/0x370
 [<c065eae4>] ? intel_idle+0xa4/0x100
 [<c078d9b9>] ? cpuidle_idle_call+0xb9/0x1e0
 [<c0411d77>] ? cpu_idle+0x97/0xd0
 [<c085cbbd>] ? rest_init+0x5d/0x70
 [<c0b07a7a>] ? start_kernel+0x28a/0x340
 [<c0b074b0>] ? obsolete_checksetup+0xb0/0xb0
 [<c0b070a4>] ? i386_start_kernel+0x64/0xb0
---[ end trace 5502b55cd4d4e5cb ]---
e1000e :05:00.0: eth0: Reset adapter
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Thanks,
Joe

Re: [PATCH 00/36] AArch64 Linux kernel port

2012-07-09 Thread Catalin Marinas

Hi Jon,

On 9 July 2012 03:01, Jon Masters jonat...@jonmasters.org wrote:
 On 07/08/2012 06:24 PM, Dennis Gilmore wrote:
 I know that the architecture really is new but thats not really clear
 by adding AArch32 into the mix to represent 32 bit arm as ARM has done
 or by calling it armv8. There is enough way to confuse them already why
 confuse things more by adding yet another variable that is AArch64.
 - From my and most of the other Fedora developers that i've discussed it
 with its more like reluctant acceptance of AArch64 than thinking is a
 good idea.

 btw, for clarification of anyone who is confused by the names...the new
 architecture is ARMv8. The 64-bit state is AArch64, during which the
 processor executes A64 instructions. The 32-bit state is AArch32, during
 which the processor executes either A32 (ARM version 7+) or T32
 (Thumb - I guess Thumb2+ really due to some of deprecation)
 instructions. I've noticed that there appears to be a clarification
 effort in which AArch64 is used as an architecture name when ARMv8 would
 be confusing, which is most of the time if you don't know whether you're
 referring to the new A64 instruction set or the older ones.

Thanks for clarifying this. I deliberately try not to use ARMv8 name
to avoid confusion. Indeed, the new architecture is ARMv8 (following
the ARM architectures numbering scheme). It has an AArch64 mode (with
new exception model, new ISA) and an *optional* AArch32 mode (pretty
much the same as ARMv7). The key here is that AArch32 is *optional* -
we can have it at all levels, only some (e.g. EL0 - user) or not at
all.

These two modes also share very little, from a software perspective
it's pretty much some register banking to allow compat mode support
(e.g. you can read the AArch32 R0 register from the lower half of
AArch64 X0). The AArch32 mode cannot switch by itself to an AArch64
mode, this requires taking an exception (can be SVC) to a higher level
that actually runs in AArch64 mode.
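
To make the register-banking remark concrete, here is a trivial C model of it. It is only an illustration of "lower half of X0": sketch_aarch32_view is a hypothetical helper name, and nothing here touches real registers.

```c
#include <stdint.h>

/*
 * Model of how a 64-bit register value appears to the 32-bit state:
 * the AArch32 view of R0 is the low 32 bits of AArch64 X0.
 * Sketch only; plain integers stand in for registers.
 */
static inline uint32_t sketch_aarch32_view(uint64_t x_reg)
{
	return (uint32_t)(x_reg & 0xffffffffu);
}
```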

On an ARMv8 system, if it supports AArch32 at EL1 (kernel) you can run
an ARMv7 kernel, and that's good for virtualisation. But I have
*absolutely* no plans to support an AArch32 kernel for ARMv8 SoCs.

-- 
Catalin

Re: [PATCHv2 3/3] watchdog: omap_wdt: add device tree support

2012-07-09 Thread Tony Lindgren

* Wim Van Sebroeck w...@iguana.be [120707 05:11]:
 Hi Tony,
 
  Hi Wim,
  
  * jgq...@gmail.com jgq...@gmail.com [120531 20:56]:
   From: Xiao Jiang jgq...@gmail.com
   
   Add device table for omap_wdt to support dt.
  
  Care to ack this patch in the series?
 
 Yep.
 Acked-by: Wim Van Sebroeck w...@iguana.be

Thanks, I'll apply all three into omap devel-dt branch.

Regards,

Tony

[PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Xiao Guangrong

After commit f78146b0f9230765c6315b2e14f56112513389ad:

 KVM: Fix page-crossing MMIO

MMIO that are split across a page boundary are currently broken - the
code does not expect to be aborted by the exit to userspace for the
first MMIO fragment.

This patch fixes the problem by generalizing the current code for handling
16-byte MMIOs to handle a number of fragments, and changes the MMIO
code to create those fragments.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Multiple MMIO reads can be merged into mmio_fragments, so the read buffer is
no longer needed.

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/include/asm/kvm_emulate.h |1 -
 arch/x86/kvm/emulate.c |   43 ---
 arch/x86/kvm/x86.c |2 -
 3 files changed, 5 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 1ac46c22..339d7c6 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -286,7 +286,6 @@ struct x86_emulate_ctxt {
struct operand *memopp;
struct fetch_cache fetch;
struct read_cache io_read;
-   struct read_cache mem_read;
 };

 /* Repeat String Operation Prefix */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f95d242..aa455da 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1128,33 +1128,6 @@ static void fetch_bit_operand(struct x86_emulate_ctxt 
*ctxt)
ctxt->src.val &= (ctxt->dst.bytes << 3) - 1;
 }

-static int read_emulated(struct x86_emulate_ctxt *ctxt,
-unsigned long addr, void *dest, unsigned size)
-{
-   int rc;
-   struct read_cache *mc = &ctxt->mem_read;
-
-   while (size) {
-   int n = min(size, 8u);
-   size -= n;
-   if (mc->pos < mc->end)
-   goto read_cached;
-
-   rc = ctxt->ops->read_emulated(ctxt, addr, mc->data + mc->end, n,
- &ctxt->exception);
-   if (rc != X86EMUL_CONTINUE)
-   return rc;
-   mc->end += n;
-
-   read_cached:
-   memcpy(dest, mc->data + mc->pos, n);
-   mc->pos += n;
-   dest += n;
-   addr += n;
-   }
-   return X86EMUL_CONTINUE;
-}
-
 static int segmented_read(struct x86_emulate_ctxt *ctxt,
  struct segmented_address addr,
  void *data,
@@ -1166,7 +1139,9 @@ static int segmented_read(struct x86_emulate_ctxt *ctxt,
rc = linearize(ctxt, addr, size, false, &linear);
if (rc != X86EMUL_CONTINUE)
return rc;
-   return read_emulated(ctxt, linear, data, size);
+
+   return ctxt->ops->read_emulated(ctxt, linear, data, size,
+   &ctxt->exception);
 }

 static int segmented_write(struct x86_emulate_ctxt *ctxt,
@@ -4122,8 +4097,6 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
int rc = X86EMUL_CONTINUE;
int saved_dst_type = ctxt->dst.type;

-   ctxt->mem_read.pos = 0;
-
if (ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) {
rc = emulate_ud(ctxt);
goto done;
@@ -4364,15 +4337,9 @@ writeback:
 * or, if it is not used, after each 1024 iteration.
 */
if ((r->end != 0 || ctxt->regs[VCPU_REGS_RCX] & 0x3ff) &&

-   (r->end == 0 || r->end != r->pos)) {
-   /*
-* Reset read cache. Usually happens before
-* decode, but since instruction is restarted
-* we have to do it here.
-*/
-   ctxt->mem_read.end = 0;
+   (r->end == 0 || r->end != r->pos))
return EMULATION_RESTART;
-   }
+
goto done; /* skip rip writeback */
}
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a01a424..7445545 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4399,8 +4399,6 @@ static void init_decode_cache(struct x86_emulate_ctxt 
*ctxt,
ctxt->fetch.end = 0;
ctxt->io_read.pos = 0;
ctxt->io_read.end = 0;
-   ctxt->mem_read.pos = 0;
-   ctxt->mem_read.end = 0;
 }

 static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
-- 
1.7.7.6
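
For readers unfamiliar with the fragment idea the commit message relies on, a minimal standalone sketch follows. It is not the KVM code: sketch_frag, sketch_split_mmio and the constants are hypothetical, and it assumes the address is already a guest physical address, whereas the real code translates each page separately. It only shows how one access that crosses a page boundary becomes several per-page pieces.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_FRAGS 2

/* One per-page piece of an access (hypothetical layout). */
struct sketch_frag {
	uint64_t gpa;
	unsigned int len;
};

/*
 * Split [addr, addr + bytes) at page boundaries.
 * Returns the number of fragments written, or -1 if more than
 * max fragments would be needed.
 */
static int sketch_split_mmio(uint64_t addr, unsigned int bytes,
			     struct sketch_frag *frags, int max)
{
	int n = 0;

	while (bytes) {
		/* Bytes left before the next page boundary. */
		unsigned int room = SKETCH_PAGE_SIZE -
			(unsigned int)(addr & (SKETCH_PAGE_SIZE - 1));
		unsigned int now = bytes < room ? bytes : room;

		if (n == max)
			return -1;
		frags[n].gpa = addr;
		frags[n].len = now;
		n++;
		addr += now;
		bytes -= now;
	}
	return n;
}
```

An 8-byte read at 0x1ffc, for example, splits into 4 bytes at 0x1ffc and 4 bytes at 0x2000, each of which can then be completed by its own exit to userspace.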


[PATCH 2/2] KVM: X86: introduce set_mmio_exit_info

2012-07-09 Thread Xiao Guangrong

Introduce set_mmio_exit_info to clean up the common code

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/x86.c |   33 +
 1 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7445545..7771f45 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3755,9 +3755,6 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t 
gpa,
 static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
   void *val, int bytes)
 {
-   struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
-
-   memcpy(vcpu->run->mmio.data, frag->data, frag->len);
return X86EMUL_CONTINUE;
 }

@@ -3825,6 +3822,20 @@ mmio:
return X86EMUL_CONTINUE;
 }

+static void set_mmio_exit_info(struct kvm_vcpu *vcpu,
+  struct kvm_mmio_fragment *frag, bool write)
+{
+   struct kvm_run *run = vcpu->run;
+
+   run->exit_reason = KVM_EXIT_MMIO;
+   run->mmio.phys_addr = frag->gpa;
+   run->mmio.len = frag->len;
+   run->mmio.is_write = vcpu->mmio_is_write = write;
+
+   if (write)
+   memcpy(run->mmio.data, frag->data, frag->len);
+}
+
 int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
void *val, unsigned int bytes,
struct x86_exception *exception,
@@ -3864,14 +3875,10 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, 
unsigned long addr,
return rc;

gpa = vcpu->mmio_fragments[0].gpa;
-
vcpu->mmio_needed = 1;
vcpu->mmio_cur_fragment = 0;

-   vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
-   vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
-   vcpu->run->mmio.phys_addr = gpa;
+   set_mmio_exit_info(vcpu, &vcpu->mmio_fragments[0], ops->write);

return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
 }
@@ -5490,7 +5497,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
  */
 static int complete_mmio(struct kvm_vcpu *vcpu)
 {
-   struct kvm_run *run = vcpu->run;
struct kvm_mmio_fragment *frag;
int r;

@@ -5501,7 +5507,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
/* Complete previous fragment */
frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
if (!vcpu->mmio_is_write)
-   memcpy(frag->data, run->mmio.data, frag->len);
+   memcpy(frag->data, vcpu->run->mmio.data, frag->len);
if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
vcpu->mmio_needed = 0;
if (vcpu->mmio_is_write)
@@ -5511,12 +5517,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
}
/* Initiate next fragment */
++frag;
-   run->exit_reason = KVM_EXIT_MMIO;
-   run->mmio.phys_addr = frag->gpa;
-   if (vcpu->mmio_is_write)
-   memcpy(run->mmio.data, frag->data, frag->len);
-   run->mmio.len = frag->len;
-   run->mmio.is_write = vcpu->mmio_is_write;
+   set_mmio_exit_info(vcpu, frag, vcpu->mmio_is_write);
return 0;

}
-- 
1.7.7.6
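
For readers following the refactoring outside the kernel tree, the idea of the patch can be sketched in user space. This is only an illustration: the `mock_` types, the `MOCK_EXIT_MMIO` value and the 8-byte data buffer are simplified stand-ins assumed for the example, not the kernel's definitions. It shows one helper filling in the exit record for both the first and the following fragments, copying data only for writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MOCK_EXIT_MMIO 6	/* stand-in value, for illustration only */

/* Simplified stand-ins for kvm_mmio_fragment and kvm_run (assumptions). */
struct mock_fragment {
	uint64_t gpa;
	unsigned char data[8];
	unsigned int len;
};

struct mock_run {
	int exit_reason;
	struct {
		uint64_t phys_addr;
		unsigned char data[8];
		unsigned int len;
		bool is_write;
	} mmio;
};

/* One helper fills in the exit record; data is copied only for writes. */
static void mock_set_mmio_exit_info(struct mock_run *run,
				    const struct mock_fragment *frag,
				    bool write)
{
	assert(frag->len <= sizeof(run->mmio.data));

	run->exit_reason = MOCK_EXIT_MMIO;
	run->mmio.phys_addr = frag->gpa;
	run->mmio.len = frag->len;
	run->mmio.is_write = write;

	if (write)
		memcpy(run->mmio.data, frag->data, frag->len);
}
```

Both call sites in the patch (the initial exit and the "initiate next fragment" path) then reduce to a single call with a different fragment pointer.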


Re: [PATCH RESEND v7 1/2] block: ioctl support for sanitize in eMMC 4.5

2012-07-09 Thread Girish K S

On 28 June 2012 14:02, Yaniv Gardi yga...@codeaurora.org wrote:
 Adding a new ioctl to support sanitize operation in eMMC
 cards version 4.5.
 The sanitize ioctl support helps performing this operation
 via user application.

 Signed-off-by: Yaniv Gardi yga...@codeaurora.org

 ---
  block/blk-core.c  |   15 ++--
  block/blk-lib.c   |   51 
 +
  block/blk-merge.c |4 +++
  block/elevator.c  |2 +-
  block/ioctl.c |9 
  include/linux/blk_types.h |5 +++-
  include/linux/blkdev.h|3 ++
  include/linux/fs.h|1 +
  kernel/trace/blktrace.c   |2 +
  9 files changed, 87 insertions(+), 5 deletions(-)

 diff --git a/block/blk-core.c b/block/blk-core.c
 index 3c923a7..4a56102b 100644
 --- a/block/blk-core.c
 +++ b/block/blk-core.c
 @@ -1641,7 +1641,7 @@ generic_make_request_checks(struct bio *bio)
 goto end_io;
 }

 -   if (unlikely(!(bio->bi_rw & REQ_DISCARD) &&
 +   if (unlikely(!(bio->bi_rw & (REQ_DISCARD | REQ_SANITIZE)) &&
  nr_sectors > queue_max_hw_sectors(q))) {
 printk(KERN_ERR "bio too big device %s (%u > %u)\n",
bdevname(bio->bi_bdev, b),
 @@ -1689,6 +1689,14 @@ generic_make_request_checks(struct bio *bio)
 goto end_io;
 }

 +   if ((bio->bi_rw & REQ_SANITIZE) &&
 +   (!blk_queue_sanitize(q))) {
 +   pr_info("%s - got a SANITIZE request but the queue "
 +  "doesn't support sanitize requests", __func__);
 +   err = -EOPNOTSUPP;
 +   goto end_io;
 +   }
 +
 if (blk_throtl_bio(q, bio))
 return false;   /* throttled, will be resubmitted later */

 @@ -1794,7 +1802,8 @@ void submit_bio(int rw, struct bio *bio)
  * If it's a regular read/write or a barrier with data attached,
  * go through the normal accounting stuff before submission.
  */
 -   if (bio_has_data(bio) && !(rw & REQ_DISCARD)) {
 +   if (bio_has_data(bio) &&
 +   (!(rw & (REQ_DISCARD | REQ_SANITIZE)))) {
 if (rw & WRITE) {
 count_vm_events(PGPGOUT, count);
 } else {
 @@ -1840,7 +1849,7 @@ EXPORT_SYMBOL(submit_bio);
   */
  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
  {
 -   if (rq->cmd_flags & REQ_DISCARD)
 +   if (rq->cmd_flags & (REQ_DISCARD | REQ_SANITIZE))
 return 0;

 if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
 diff --git a/block/blk-lib.c b/block/blk-lib.c
 index 2b461b4..280d63e 100644
 --- a/block/blk-lib.c
 +++ b/block/blk-lib.c
 @@ -115,6 +115,57 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
  EXPORT_SYMBOL(blkdev_issue_discard);

  /**
 + * blkdev_issue_sanitize - queue a sanitize request
 + * @bdev:  blockdev to issue sanitize for
 + * @gfp_mask:  memory allocation flags (for bio_alloc)
 + *
 + * Description:
 + *Issue a sanitize request for the specified block device
 + */
 +int blkdev_issue_sanitize(struct block_device *bdev, gfp_t gfp_mask)
 +{
 +   DECLARE_COMPLETION_ONSTACK(wait);
 +   struct request_queue *q = bdev_get_queue(bdev);
 +   int type = REQ_WRITE | REQ_SANITIZE;
 +   struct bio_batch bb;
 +   struct bio *bio;
 +   int ret = 0;
 +
 +   if (!q)
 +   return -ENXIO;
 +
 +   if (!blk_queue_sanitize(q)) {
 +   pr_err("%s - card doesn't support sanitize", __func__);
 +   return -EOPNOTSUPP;
 +   }
 +
 +   bio = bio_alloc(gfp_mask, 1);
 +   if (!bio)
 +   return -ENOMEM;
 +
 +   atomic_set(&bb.done, 1);
 +   bb.flags = 1 << BIO_UPTODATE;
 +   bb.wait = &wait;
 +
 +   bio->bi_end_io = bio_batch_end_io;
 +   bio->bi_bdev = bdev;
 +   bio->bi_private = &bb;
 +
 +   atomic_inc(&bb.done);
 +   submit_bio(type, bio);
 +
 +   /* Wait for bios in-flight */
 +   if (!atomic_dec_and_test(&bb.done))
 +   wait_for_completion(&wait);
 +
 +   if (!test_bit(BIO_UPTODATE, &bb.flags))
 +   ret = -EIO;
 +
 +   return ret;
 +}
 +EXPORT_SYMBOL(blkdev_issue_sanitize);
 +
 +/**
   * blkdev_issue_zeroout - generate number of zero filed write bios
   * @bdev:  blockdev to issue
   * @sector:start sector
 diff --git a/block/blk-merge.c b/block/blk-merge.c
 index 160035f..7e24772 100644
 --- a/block/blk-merge.c
 +++ b/block/blk-merge.c
 @@ -477,6 +477,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 if (!rq_mergeable(rq))
 return false;

 +   /* don't merge file system requests and sanitize requests */
 +   if ((req->cmd_flags & REQ_SANITIZE) != (next->cmd_flags & REQ_SANITIZE))
This will not compile: the function parameter is named rq, not req. Either
rename the parameter to req or use rq in the condition above.
 +   return false;
 +

Re: [PATCH 3/7] mm/slub.c: remove invalid reference to list iterator variable

2012-07-09 Thread Pekka Enberg

On Sun, 8 Jul 2012, Julia Lawall wrote:

 From: Julia Lawall julia.law...@lip6.fr
 
 If list_for_each_entry, etc complete a traversal of the list, the iterator
 variable ends up pointing to an address at an offset from the list head,
 and not a meaningful structure.  Thus this value should not be used after
 the end of the iterator.  The patch replaces s->name by al->name, which is
 referenced nearby.
 
 This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
 
 Signed-off-by: Julia Lawall julia.law...@lip6.fr
 
 ---
  mm/slub.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/mm/slub.c b/mm/slub.c
 index cc4ed03..ef9bf01 100644
 --- a/mm/slub.c
 +++ b/mm/slub.c
 @@ -5395,7 +5395,7 @@ static int __init slab_sysfs_init(void)
   err = sysfs_slab_alias(al->s, al->name);
   if (err)
   printk(KERN_ERR "SLUB: Unable to add boot slab alias"
 -  " %s to sysfs\n", s->name);
 +  " %s to sysfs\n", al->name);
   kfree(al);
   }

Applied, thanks!
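
The point in the quoted changelog can be illustrated with a small user-space sketch. The list helpers below are simplified copies written for this example (they rely on the GCC/Clang `__typeof__` extension) and are an assumption, not the kernel headers; `struct alias` and `find_alias()` are hypothetical names. The loop cursor is never read after the traversal; a separate pointer carries the result out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel list helpers (illustration only). */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

struct alias {
	const char *name;
	struct list_head list;
};

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Return the first entry whose name starts with 'first', or NULL.
 * After a complete traversal 'pos' is only an offset from the list
 * head, not a real struct alias, so it is not used after the loop.
 */
static struct alias *find_alias(struct list_head *head, char first)
{
	struct alias *pos, *found = NULL;

	assert(head != NULL);
	list_for_each_entry(pos, head, list) {
		if (pos->name[0] == first) {
			found = pos;
			break;
		}
	}
	return found;
}
```

A caller that gets NULL back knows the list was exhausted and has nothing to dereference, which is the situation the slub fix avoids by printing al->name instead.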

Re: WARNING: __GFP_FS allocations with IRQs disabled (kmemcheck_alloc_shadow)

2012-07-09 Thread Pekka Enberg

On Sun, 8 Jul 2012, David Rientjes wrote:
 The correct fix is what I proposed at 
 http://marc.info/?l=linux-kernel&m=133754837703630 and was awaiting 
 testing.  If Rus, Steven, or Fengguang could test this then we could add 
 it as a stable backport as well.

Looks good to me. Care to send it my way at penb...@kernel.org? It looks 
like people CC'd me as penb...@cs.helsinki.fi which is why I missed it.

Pekka

Re: [PATCH] mm: Warn about costly page allocation

2012-07-09 Thread Mel Gorman

On Mon, Jul 09, 2012 at 05:46:57PM +0900, Minchan Kim wrote:
   SNIP
   +#if defined(CONFIG_DEBUG_VM) && !defined(CONFIG_COMPACTION)
   +static inline void check_page_alloc_costly_order(unsigned int order)
   +{
   + if (unlikely(order > PAGE_ALLOC_COSTLY_ORDER)) {
   + printk_once("WARNING: You are tring to allocate %d-order page."
   +  " You might need to turn on CONFIG_COMPACTION\n", order);
   + }
  
  WARN_ON_ONCE would tell you what is trying to satisfy the allocation.
 
 Do you mean that it would be better to use WARN_ON_ONCE rather than raw 
 printk?

Yes.

 If so, I would like to insist raw printk because WARN_ON_ONCE could be 
 disabled
 by !CONFIG_BUG.
 If I miss something, could you elaborate it more?
 

Ok, but all this will tell you is that *something* tried a high-order
allocation. It will not tell you who and because it's a printk_once, it
will also not tell you how often it's happening. You could add a
dump_stack to capture that information.

  
  It should further check if this is a GFP_MOVABLE allocation or not and if
  not, then it should either be documented that compaction may only delay
  allocation failures and that they may need to consider reserving the memory
  in advance or doing something like forcing MIGRATE_RESERVE to only be used
  for high-order allocations.
 
 Okay. but I got confused you want to add above description in code directly
 like below or write it down in comment of check_page_alloc_costly_order?
 

You're aiming this at embedded QA people according to your changelog so
do whatever you think is going to be the most effective. It's already
known that high-order kernel allocations are meant to be unreliable and
apparently this is being ignored. The in-code warning could look
something like

if (unlikely(order > PAGE_ALLOC_COSTLY_ORDER)) {
printk_once("%s: page allocation high-order stupidity: order:%d, mode:0x%x\n",
   current->comm, order, gfp_mask);
if (gfp_flags & __GFP_MOVABLE) {
printk_once("Enable compaction or whatever\n");
dump_stack();
} else {
printk_once("Regular high-order kernel allocations like this will eventually start failing.");
dump_stack();
}
}

There should be a comment above it giving more information if you think
the embedded people will actually read it. Of course, if this warning
triggers during driver initialisation then it might be completely useless.
You could rate limit the warning (printk_ratelimit()) instead to be more
effective. As I don't know what sort of device drivers you are seeing this
problem with I can't judge what the best style of warning would be.

-- 
Mel Gorman
SUSE Labs

Re: [PATCH v2 0/5] uprobes: write_opcode() cleanups

2012-07-09 Thread Ingo Molnar


* Oleg Nesterov o...@redhat.com wrote:

 On 07/06, Oleg Nesterov wrote:
 
  On 07/06, Ingo Molnar wrote:
  
   * Oleg Nesterov o...@redhat.com wrote:
  
Hello,
   
write_opcode() cleanups resend + new minor fix.
   
Changes:
   
- document the new argument in 2/5.
   
- drop the buggy 5/5, thanks Anton for your quick nack.
  Probably I'll return to this later, I have another reason
  for this change.
   
- so this 5/5 is new.
   
Srikar, please add your acks unless you have some objections.
  
   Just wondering what's the review status of patches #1-#4?
 
  I hope Srikar will ack 1-4 soon.
 
  He observed the testing failures, but it turns out this series
  is innocent.
 
  I'll send more fixes soon.
 
   I'll skip #5 based on Oleg's request,
 
  Yes, thanks. I still think 5/5 makes sense, but we need to do
  something with uprobe.s:vma_address() first.
 
 Argh.
 
 I wrote this email because I wanted to say that I updated the
 changelog for 2/5 a little bit, but forgot to mention this.
 I'll send the updated patch in reply to 2/5 (once again, only
 the changelog was changed). But please let me know if you want
 me to resend 1-4.

Once Srikar's ack is in then it would be nice to update the 
patches with the ack and resend #1-#4, to make sure I have all 
the intended patches and nothing more or less than that.

Thanks,

Ingo

Re: [PATCH 03/16] sched: aggregate load contributed by task entities on parenting cfs_rq

2012-07-09 Thread Ingo Molnar


* Peter Zijlstra pet...@infradead.org wrote:

 On Wed, 2012-07-04 at 17:28 +0200, Peter Zijlstra wrote:
  On Wed, 2012-06-27 at 19:24 -0700, Paul Turner wrote:
   For a given task t, we can compute its contribution to load as:
 task_load(t) = runnable_avg(t) * weight(t)
   
   On a parenting cfs_rq we can then aggregate
 runnable_load(cfs_rq) = \Sum task_load(t), for all runnable children t
   
   Maintain this bottom up, with task entities adding their contributed load 
   to
   the parenting cfs_rq sum.  When a task entities load changes we add the 
   same
   delta to the maintained sum.
   
   Signed-off-by: Paul Turner p...@google.com
   Signed-off-by: Ben Segall bseg...@google.com 
  
  A lot of patches have this funny sob trail.. Ben never send me these
  patches, so uhm. ?
  
  Should that be Reviewed-by, or what is the deal with those?
 
 Ben could you clarify what your exact contribution was?
 
 Ingo, supposing Ben is co-author and wrote a significant part 
 of the patch, what are we supposed to do with these tags?

There can be only one author SOB - additional contributions 
should be reflected via credits in the changelog, copyright 
lines and/or Originally-From: tags, or so.

Thanks,

Ingo
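
As a side note on the changelog quoted above, the bottom-up aggregation it describes (task_load(t) = runnable_avg(t) * weight(t), summed on the parent) can be sketched as follows. This is a user-space illustration under stated assumptions: the fixed-point scale of 1024, the struct layouts and the function names are hypothetical and not taken from the patch series.

```c
#include <assert.h>

#define AVG_SCALE 1024UL	/* assumed scale: 1024 means always runnable */

struct task_ent {
	unsigned long runnable_avg;	/* 0..AVG_SCALE */
	unsigned long weight;
	unsigned long load_contrib;	/* value last added to the parent sum */
};

struct rq_sum {
	unsigned long runnable_load;	/* sum of children's contributions */
};

static unsigned long task_load(const struct task_ent *t)
{
	return t->runnable_avg * t->weight / AVG_SCALE;
}

/*
 * Recompute the task's contribution and apply only the delta to the
 * parent's running sum, so the sum never has to be rebuilt.
 */
static void update_contrib(struct rq_sum *rq, struct task_ent *t)
{
	unsigned long new_contrib;

	assert(t->runnable_avg <= AVG_SCALE);
	new_contrib = task_load(t);
	rq->runnable_load += new_contrib;
	rq->runnable_load -= t->load_contrib;
	t->load_contrib = new_contrib;
}
```

Keeping the last contribution in the task entity is what lets a load change be applied as a single add and subtract on the parent.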

Re: linux-next: comment on pm tree commit

2012-07-09 Thread preeti

On 07/09/2012 01:54 PM, Rafael J. Wysocki wrote:
 On Monday, July 09, 2012, Stephen Rothwell wrote:
 Hi Rafael,

 I noticed commit b8eec56cd8e5 (PM / cpuidle: System resume hang fix with
 cpuidle) in the pm tree needs some work (I noticed it because it was
 changed in a rebase ...).

 diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
 index a6b3f2e..b90ccb2 100644
 --- a/include/linux/cpuidle.h
 +++ b/include/linux/cpuidle.h
 @@ -146,6 +146,8 @@ extern void cpuidle_unregister_device(struct 
 cpuidle_device *dev);
  
  extern void cpuidle_pause_and_lock(void);
  extern void cpuidle_resume_and_unlock(void);
 +extern void cpuidle_pause(void);
 +extern void cpuidle_resume(void);
  extern int cpuidle_enable_device(struct cpuidle_device *dev);
  extern void cpuidle_disable_device(struct cpuidle_device *dev);
  extern int cpuidle_wrap_enter(struct cpuidle_device *dev,
 @@ -169,6 +171,8 @@ static inline void cpuidle_unregister_device(struct 
 cpuidle_device *dev) { }
  
  static inline void cpuidle_pause_and_lock(void) { }
  static inline void cpuidle_resume_and_unlock(void) { }
 +static inline cpuidle_pause(void) { }
 +static inline cpuidle_resume(void) { }

 These need to be static inline void.  I wonder what review and build
 testing this went through (the above should produce warnings since they
 are non void returning functions with no return statements).
 
 Thanks for reporting this, I tried to fix a build issue in the original patch

I apologise for not having taken care of the above build scenario.

 hastily and failed miserably as you have noticed and then I build-tested a
 wrong tree.  Sorry.
 
 It should be fixed now for real.
 
 Thanks,
 Rafael
 

Regards
Preeti
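
For reference, the stub pattern Stephen describes can be sketched like this. The `CONFIG_DEMO_IDLE` switch and the `demo_` names are hypothetical and exist only for this example; the point is that the stubs for the disabled configuration carry an explicit void return type.

```c
#include <assert.h>

/* Hypothetical feature switch; left undefined to exercise the stubs. */
/* #define CONFIG_DEMO_IDLE 1 */

static int demo_idle_paused;

#ifdef CONFIG_DEMO_IDLE
static inline void demo_idle_pause(void)  { demo_idle_paused = 1; }
static inline void demo_idle_resume(void) { demo_idle_paused = 0; }
#else
/*
 * Empty stubs for the disabled case. Without the explicit "void" the
 * return type would default to int (or be rejected outright by newer
 * C standards), which is what triggers the warnings mentioned above.
 */
static inline void demo_idle_pause(void) { }
static inline void demo_idle_resume(void) { }
#endif

/* Callers use the same code whether or not the feature is built in. */
static int demo_suspend_cycle(void)
{
	assert(demo_idle_paused == 0);	/* must not be paused on entry */
	demo_idle_pause();
	demo_idle_resume();
	return demo_idle_paused;
}
```

Building with the feature switch both defined and undefined is the quickest way to catch this class of mistake.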


Re: [PATCH 0/6] staging: vt6655: Cleanup in usage of macros

2012-07-09 Thread Marcos Souza

2012/7/9 Joe Perches j...@perches.com

 On Sun, 2012-07-08 at 23:51 -0300, Marcos Paulo de Souza wrote:
  Hi kernel guys!
 
  This patchset aims to clean all unused and commented macros.
 
  For this challenge, forgotten-macros tool helped us.

 Perhaps there may be false positives in your code.

Not in this case. After each change to a file, I compiled the whole driver again.

 Many times, macros like the below are used:

 #define SUBSYSTEM_PREFIX_FOO 1
 #define SUBSYSTEM_PREFIX_BAR 2
 #define SUBSYSTEM_PREFIX_BAZ 3

 #define USE_TYPE(type)  SUBSYSTEM_PREFIX_##type

 It doesn't seem your code knows that style.

True! The tool is under development. A more robust method will be
 implemented, but for now, the tool can find the most easy dead macros.
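
To make Joe's point concrete, here is a small compilable sketch of the token-pasting style he quotes. The macro names are his; `type_sum()` is a hypothetical function added for illustration. A tool that only searches for the literal macro name would report all three definitions as unused.

```c
#include <assert.h>

#define SUBSYSTEM_PREFIX_FOO 1
#define SUBSYSTEM_PREFIX_BAR 2
#define SUBSYSTEM_PREFIX_BAZ 3

/* ## pastes the argument onto the prefix before the result is expanded. */
#define USE_TYPE(type) SUBSYSTEM_PREFIX_##type

/*
 * No literal "SUBSYSTEM_PREFIX_BAR" or "SUBSYSTEM_PREFIX_BAZ" appears in
 * this function, yet both macros are expanded through USE_TYPE().
 */
static int type_sum(void)
{
	assert(USE_TYPE(FOO) == SUBSYSTEM_PREFIX_FOO);
	return USE_TYPE(FOO) + USE_TYPE(BAR) + USE_TYPE(BAZ);
}
```

Detecting such uses reliably generally means running the preprocessor, or at least tracking `##` operators, rather than matching text.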

 Also, the tool might be more flexible if it was
 written using perl or python.

Yeah, I believe it's true!

Thanks for the comments!

But, for the changes, do I have your ack?

Thanks Joe!

--
Att,

Marcos Paulo de Souza
Acadêmico de Ciencia da Computação - FURB - SC
Uma vida sem desafios é uma vida sem razão
A life without challenges, is a non reason life

Re: 82571EB: Detected Hardware Unit Hang

2012-07-09 Thread Eric Dumazet

On Mon, 2012-07-09 at 16:51 +0800, Joe Jin wrote:
 Hi list,
 
 I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when doing
 scp test. this issue is easy do reproduced on SUN FIRE X2270 M2, just copy
 a big file (500M) from another server will hit it at once. 
 
 Would you please help on this?
 

It's a known problem.

But apparently Intel guys are not very responsive, as they have another
patch than the following :

http://permalink.gmane.org/gmane.linux.network/232669


We only have to wait they push their alternative patch, eventually.

In the mean time, you can use Hiroaki SHIMODA patch, it works.




Re: [PATCH] apic: fix kvm build on UP without IOAPIC

2012-07-09 Thread Ingo Molnar

* Michael S. Tsirkin m...@redhat.com wrote:

On Fri, Jul 06, 2012 at 04:12:23PM +0200, Ingo Molnar wrote:

* Marcelo Tosatti mtosa...@redhat.com wrote:

On Fri, Jul 06, 2012 at 01:13:14PM +0200, Ingo Molnar wrote:

* H. Peter Anvin h...@zytor.com wrote:

On 07/01/2012 08:05 AM, Michael S. Tsirkin wrote:
On UP i386, when APIC is disabled
# CONFIG_X86_UP_APIC is not set
# CONFIG_PCI_IOAPIC is not set

code looking at apicdrivers never has any effect but it
still gets compiled in. In particular, this causes
build failures with kvm, but it generally bloats the kernel
unnecessarily.

Fix by defining both __apicdrivers and __apicdrivers_end
to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified
that as the result any loop scanning __apicdrivers gets optimized
out by
the compiler.

Warning: a .config with apic disabled doesn't seem to boot
for me (even without this patch). Still verifying why,
meanwhile this patch is compile-tested only.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Note: if this patch makes sense, can x86 maintainers
please ACK applying it through the kvm tree, since that is
where we see the issue that it addresses?
Avi, Marcelo, maybe you can carry this in kvm/linux-next as a
temporary
measure so that linux-next builds?

I'm not happy about that as a workflow, but since you guys have an
immediate problem I guess we can do that.

I'm rather unhappy about this workflow - we've got quite a few
apic bits in the x86 tree this cycle as well and need extra
external interaction, not.

Which KVM tree commit caused this, could someone please give a
lkml link or quote it here? It's not referenced in the fix patch
either.

Thanks,

Ingo

This tree (kvm.git next):

http://git.kernel.org/?p=virt/kvm/kvm.git;a=shortlog;h=refs/heads/next

Introduced by this commit:

http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=ab9cf4996bb989983e73da894b8dd0239aa2c3c2

This bit:

+ if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
+ struct apic **drv;
+
+ for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
+ /* Should happen once for each apic */
+ WARN_ON((*drv)->eoi_write == kvm_guest_apic_eoi_write);
+ (*drv)->eoi_write = kvm_guest_apic_eoi_write;
+ }
+ }
+

is rather disgusting I have to say.

WTH is the KVM code meddling with core x86 apic driver data
structures directly? At minimum factor this out and create a
proper apic.c function which is EXPORT_SYMBOL_GPL() exported or
so...

Thanks,

Ingo

OK, so apic_set_eoi_write()?

Yes, with a changelog comment analyzing the design decisions and
locking here - what happens if actual APIC driver use races with
this update on SMP, why is it all safe, etc?

Thanks,

Ingo
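
The factoring asked for above could look roughly like the following user-space model. It is a sketch under stated assumptions: `struct mock_apic`, the two-entry driver table and `mock_apic_set_eoi_write()` are hypothetical stand-ins, and the locking and SMP race questions raised in the thread are deliberately not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock driver type and table; stand-ins for the real apic structures. */
struct mock_apic {
	const char *name;
	void (*eoi_write)(uint32_t reg, uint32_t v);
};

static void native_eoi(uint32_t reg, uint32_t v) { (void)reg; (void)v; }
static void pv_eoi(uint32_t reg, uint32_t v)     { (void)reg; (void)v; }

static struct mock_apic drv_a = { "a", native_eoi };
static struct mock_apic drv_b = { "b", native_eoi };
static struct mock_apic *mock_drivers[] = { &drv_a, &drv_b };

/*
 * The only function that walks the driver table. Callers pass the new
 * callback and never touch the table themselves. Returns the number of
 * drivers updated.
 */
static size_t mock_apic_set_eoi_write(void (*eoi_write)(uint32_t, uint32_t))
{
	size_t i, n = sizeof(mock_drivers) / sizeof(mock_drivers[0]);

	assert(eoi_write != NULL);
	for (i = 0; i < n; i++)
		mock_drivers[i]->eoi_write = eoi_write;
	return n;
}
```

The guest-side code would then make one call, for example `mock_apic_set_eoi_write(pv_eoi)`, instead of iterating over core data structures directly.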

[PATCH v4 0/5] ARM: topology: set the capacity of each cores for big.LITTLE

This patchset creates an arch_scale_freq_power function for ARM, which is used 
to set the relative capacity of each core of a big.LITTLE system. It also 
removes
the broken power estimation of x86.

Modification since v3:
 - Add comments
 - Add optimization for SMP system
 - Ensure that capacity of a CPU will be at most 1

Modification since v2:
 - set_power_scale function becomes static
 - Rework loop in update_siblings_masks
 - Remove useless code in parse_dt_topology

Modification since v1:
 - Add and update explanation about the use of the table and the range of the 
value 
 - Remove the use of NR_CPUS and use nr_cpu_ids instead
 - Remove broken power estimation of x86

Peter Zijlstra (1):
  sched, x86: Remove broken power estimation

Vincent Guittot (4):
  ARM: topology: Add arch_scale_freq_power function
  ARM: topology: factorize the update of sibling masks
  ARM: topology: Update cpu_power according to DT information
  sched: cpu_power: enable ARCH_POWER

 arch/arm/kernel/topology.c   |  239 ++
 arch/x86/kernel/cpu/Makefile |2 +-
 arch/x86/kernel/cpu/sched.c  |   55 --
 kernel/sched/features.h  |2 +-
 4 files changed, 219 insertions(+), 79 deletions(-)
 delete mode 100644 arch/x86/kernel/cpu/sched.c

-- 
1.7.9.5


[PATCH v4 1/5] ARM: topology: Add arch_scale_freq_power function

Add infrastructure to be able to modify the cpu_power of each core

Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
Reviewed-by: Namhyung Kim namhy...@kernel.org
---
 arch/arm/kernel/topology.c |   38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 8200dea..51f23b3 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -22,6 +22,37 @@
 #include <asm/cputype.h>
 #include <asm/topology.h>
 
+/*
+ * cpu power scale management
+ */
+
+/*
+ * cpu power table
+ * This per cpu data structure describes the relative capacity of each core.
+ * On a heteregenous system, cores don't have the same computation capacity
+ * and we reflect that difference in the cpu_power field so the scheduler can
+ * take this difference into account during load balance. A per cpu structure
+ * is preferred because each CPU updates its own cpu_power field during the
+ * load balance except for idle cores. One idle core is selected to run the
+ * rebalance_domains for all idle cores and the cpu_power can be updated
+ * during this sequence.
+ */
+static DEFINE_PER_CPU(unsigned long, cpu_scale);
+
+unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu)
+{
+   return per_cpu(cpu_scale, cpu);
+}
+
+static void set_power_scale(unsigned int cpu, unsigned long power)
+{
+   per_cpu(cpu_scale, cpu) = power;
+}
+
+/*
+ * cpu topology management
+ */
+
 #define MPIDR_SMP_BITMASK (0x3 << 30)
 #define MPIDR_SMP_VALUE (0x2 << 30)
 
@@ -41,6 +72,9 @@
 #define MPIDR_LEVEL2_MASK 0xFF
 #define MPIDR_LEVEL2_SHIFT 16
 
+/*
+ * cpu topology table
+ */
 struct cputopo_arm cpu_topology[NR_CPUS];
 
 const struct cpumask *cpu_coregroup_mask(int cpu)
@@ -134,7 +168,7 @@ void init_cpu_topology(void)
 {
unsigned int cpu;
 
-   /* init core mask */
+   /* init core mask and power*/
for_each_possible_cpu(cpu) {
struct cputopo_arm *cpu_topo = &(cpu_topology[cpu]);
 
@@ -143,6 +177,8 @@ void init_cpu_topology(void)
cpu_topo->socket_id = -1;
cpumask_clear(&cpu_topo->core_sibling);
cpumask_clear(&cpu_topo->thread_sibling);
+
+   set_power_scale(cpu, SCHED_POWER_SCALE);
}
smp_wmb();
 }
-- 
1.7.9.5


[PATCH v4 2/5] ARM: topology: factorize the update of sibling masks

This factorization has also been proposed in another patch that has not been
merged yet:
http://lists.infradead.org/pipermail/linux-arm-kernel/2012-January/080873.html
So, this patch could be dropped depending of the state of the other one.

Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
Reviewed-by: Namhyung Kim namhy...@kernel.org
---
 arch/arm/kernel/topology.c |   48 +---
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 51f23b3..eb5fc81 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -82,6 +82,32 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
return &cpu_topology[cpu].core_sibling;
 }
 
+void update_siblings_masks(unsigned int cpuid)
+{
+   struct cputopo_arm *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
+   int cpu;
+
+   /* update core and thread sibling masks */
+   for_each_possible_cpu(cpu) {
+   cpu_topo = &cpu_topology[cpu];
+
+   if (cpuid_topo->socket_id != cpu_topo->socket_id)
+   continue;
+
+   cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
+   if (cpu != cpuid)
+   cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
+
+   if (cpuid_topo->core_id != cpu_topo->core_id)
+   continue;
+
+   cpumask_set_cpu(cpuid, &cpu_topo->thread_sibling);
+   if (cpu != cpuid)
+   cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);
+   }
+   smp_wmb();
+}
+
 /*
  * store_cpu_topology is called at boot when only one cpu is running
  * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
@@ -91,7 +117,6 @@ void store_cpu_topology(unsigned int cpuid)
 {
struct cputopo_arm *cpuid_topo = &cpu_topology[cpuid];
unsigned int mpidr;
-   unsigned int cpu;
 
/* If the cpu topology has been already set, just return */
if (cpuid_topo->core_id != -1)
@@ -133,26 +158,7 @@ void store_cpu_topology(unsigned int cpuid)
cpuid_topo->socket_id = -1;
}
 
-   /* update core and thread sibling masks */
-   for_each_possible_cpu(cpu) {
-   struct cputopo_arm *cpu_topo = &cpu_topology[cpu];
-
-   if (cpuid_topo->socket_id == cpu_topo->socket_id) {
-   cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
-   if (cpu != cpuid)
-   cpumask_set_cpu(cpu,
-   &cpuid_topo->core_sibling);
-
-   if (cpuid_topo->core_id == cpu_topo->core_id) {
-   cpumask_set_cpu(cpuid,
-   &cpu_topo->thread_sibling);
-   if (cpu != cpuid)
-   cpumask_set_cpu(cpu,
-   &cpuid_topo->thread_sibling);
-   }
-   }
-   }
-   smp_wmb();
+   update_siblings_masks(cpuid);
 
printk(KERN_INFO "CPU%u: thread %d, cpu %d, socket %d, mpidr %x\n",
cpuid, cpu_topology[cpuid].thread_id,
-- 
1.7.9.5


[PATCH v4 3/5] ARM: topology: Update cpu_power according to DT information

Use cpu compatibility field and clock-frequency field of DT to
estimate the capacity of each core of the system and to update
the cpu_power field accordingly.
This patch enables to put more running tasks on big cores than
on LITTLE ones. But this patch doesn't ensure that long running
tasks will run on big cores and short ones on LITTLE cores.

Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
Reviewed-by: Namhyung Kim namhy...@kernel.org
---
 arch/arm/kernel/topology.c |  153 
 1 file changed, 153 insertions(+)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index eb5fc81..198b084 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -17,7 +17,9 @@
 #include <linux/percpu.h>
 #include <linux/node.h>
 #include <linux/nodemask.h>
+#include <linux/of.h>
 #include <linux/sched.h>
+#include <linux/slab.h>
 
 #include <asm/cputype.h>
 #include <asm/topology.h>
@@ -49,6 +51,152 @@ static void set_power_scale(unsigned int cpu, unsigned long 
power)
per_cpu(cpu_scale, cpu) = power;
 }
 
+#ifdef CONFIG_OF
+struct cpu_efficiency {
+   const char *compatible;
+   unsigned long efficiency;
+};
+
+/*
+ * Table of relative efficiency of each processors
+ * The efficiency value must fit in 20bit and the final
+ * cpu_scale value must be in the range
+ *   0 < cpu_scale < 3*SCHED_POWER_SCALE/2
+ * in order to return at most 1 when DIV_ROUND_CLOSEST
+ * is used to compute the capacity of a CPU.
+ * Processors that are not defined in the table,
+ * use the default SCHED_POWER_SCALE value for cpu_scale.
+ */
+struct cpu_efficiency table_efficiency[] = {
+   {"arm,cortex-a15", 3891},
+   {"arm,cortex-a7",  2048},
+   {NULL, },
+};
+
+struct cpu_capacity {
+   unsigned long hwid;
+   unsigned long capacity;
+};
+
+struct cpu_capacity *cpu_capacity;
+
+unsigned long middle_capacity = 1;
+
+/*
+ * Iterate all CPUs' descriptor in DT and compute the efficiency
+ * (as per table_efficiency). Also calculate a middle efficiency
+ * as close as possible to  (max{eff_i} - min{eff_i}) / 2
+ * This is later used to scale the cpu_power field such that an
+ * 'average' CPU is of middle power. Also see the comments near
+ * table_efficiency[] and update_cpu_power().
+ */
+static void __init parse_dt_topology(void)
+{
+	struct cpu_efficiency *cpu_eff;
+	struct device_node *cn = NULL;
+	unsigned long min_capacity = (unsigned long)(-1);
+	unsigned long max_capacity = 0;
+	unsigned long capacity = 0;
+	int alloc_size, cpu = 0;
+
+	alloc_size = nr_cpu_ids * sizeof(struct cpu_capacity);
+	cpu_capacity = (struct cpu_capacity *)kzalloc(alloc_size, GFP_NOWAIT);
+
+	while ((cn = of_find_node_by_type(cn, "cpu"))) {
+		const u32 *rate, *reg;
+		int len;
+
+		if (cpu >= num_possible_cpus())
+			break;
+
+		for (cpu_eff = table_efficiency; cpu_eff->compatible; cpu_eff++)
+			if (of_device_is_compatible(cn, cpu_eff->compatible))
+				break;
+
+		if (cpu_eff->compatible == NULL)
+			continue;
+
+		rate = of_get_property(cn, "clock-frequency", &len);
+		if (!rate || len != 4) {
+			pr_err("%s missing clock-frequency property\n",
+				cn->full_name);
+			continue;
+		}
+
+		reg = of_get_property(cn, "reg", &len);
+		if (!reg || len != 4) {
+			pr_err("%s missing reg property\n", cn->full_name);
+			continue;
+		}
+
+		capacity = ((be32_to_cpup(rate)) >> 20) * cpu_eff->efficiency;
+
+		/* Save min capacity of the system */
+		if (capacity < min_capacity)
+			min_capacity = capacity;
+
+		/* Save max capacity of the system */
+		if (capacity > max_capacity)
+			max_capacity = capacity;
+
+		cpu_capacity[cpu].capacity = capacity;
+		cpu_capacity[cpu++].hwid = be32_to_cpup(reg);
+	}
+
+	if (cpu < num_possible_cpus())
+		cpu_capacity[cpu].hwid = (unsigned long)(-1);
+
+	/* If min and max capacities are equals, we bypass the update of the
+	 * cpu_scale because all CPUs have the same capacity. Otherwise, we
+	 * compute a middle_capacity factor that will ensure that the capacity
+	 * of an 'average' CPU of the system will be as close as possible to
+	 * SCHED_POWER_SCALE, which is the default value, but with the
+	 * constraint explained near table_efficiency[].
+	 */
+	if (min_capacity == max_capacity)
+		cpu_capacity[0].hwid = (unsigned long)(-1);
+	else if (4*max_capacity < (3*(max_capacity + min_capacity)))
+		middle_capacity = (min_capacity + max_capacity)
+
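
As a reading aid for the capacity estimate in parse_dt_topology(), here is a
minimal user-space sketch of the table lookup and of the
(rate >> 20) * efficiency step. The names sketch_table, sketch_efficiency and
sketch_capacity are hypothetical and not part of the patch, and an exact
strcmp() match only stands in for of_device_is_compatible(). The efficiency
values are the ones from table_efficiency[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct cpu_efficiency {
	const char *compatible;
	unsigned long efficiency;
};

/* Same values as table_efficiency[] in the patch; a NULL entry ends the table. */
static const struct cpu_efficiency sketch_table[] = {
	{"arm,cortex-a15", 3891},
	{"arm,cortex-a7",  2048},
	{NULL, 0},
};

/*
 * Hypothetical helper: walk the sentinel-terminated table.
 * An exact string match is an assumption standing in for
 * of_device_is_compatible().
 */
static unsigned long sketch_efficiency(const char *compatible)
{
	const struct cpu_efficiency *e;

	for (e = sketch_table; e->compatible; e++)
		if (strcmp(e->compatible, compatible) == 0)
			return e->efficiency;

	return 0; /* not in the table: the patch skips such CPU nodes */
}

/* Hypothetical helper mirroring capacity = (rate >> 20) * efficiency. */
static unsigned long sketch_capacity(uint32_t rate_hz, unsigned long efficiency)
{
	/* The patch comment requires the efficiency to fit in 20 bits. */
	assert(efficiency < (1UL << 20));

	/*
	 * A 32-bit rate shifted right by 20 is at most 4095 (12 bits),
	 * so the product with a 20-bit efficiency fits in 32 bits.
	 */
	return (unsigned long)(rate_hz >> 20) * efficiency;
}
```

With an assumed clock-frequency of 1000000000 Hz on both core types,
1000000000 >> 20 is 953, so the sketch yields 953 * 3891 = 3708123 for a
Cortex-A15 and 953 * 2048 = 1951744 for a Cortex-A7. Shifting the rate before
multiplying appears to be what keeps the product within an unsigned long on
32-bit ARM.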

[PATCH v4 4/5] sched, x86: Remove broken power estimation

From: Peter Zijlstra <a.p.zijls...@chello.nl>

The x86 sched power implementation has been broken forever and gets in
the way of other stuff; remove it.

For archaeological interest: fixing this code would require dealing with
the cross-cpu calling of these functions and, more importantly, filtering
idle time out of the a/m-perf stuff, because the ratio will go down to 0
when idle, giving a 0 capacity, which is not what we'd want.

Signed-off-by: Peter Zijlstra <a.p.zijls...@chello.nl>
Link: http://lkml.kernel.org/n/tip-wjjwelpti8f8k7i1pdnzm...@git.kernel.org
---
 arch/x86/kernel/cpu/Makefile |2 +-
 arch/x86/kernel/cpu/sched.c  |   55 --
 2 files changed, 1 insertion(+), 56 deletions(-)
 delete mode 100644 arch/x86/kernel/cpu/sched.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 6ab6aa2..c598126 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -14,7 +14,7 @@ CFLAGS_common.o   := $(nostackp)
 
 obj-y  := intel_cacheinfo.o scattered.o topology.o
 obj-y  += proc.o capflags.o powerflags.o common.o
-obj-y  += vmware.o hypervisor.o sched.o mshyperv.o
+obj-y  += vmware.o hypervisor.o mshyperv.o
 obj-y  += rdrand.o
 obj-y  += match.o
 
diff --git a/arch/x86/kernel/cpu/sched.c b/arch/x86/kernel/cpu/sched.c
deleted file mode 100644
index a640ae5..000
--- a/arch/x86/kernel/cpu/sched.c
+++ /dev/null
@@ -1,55 +0,0 @@
-#include <linux/sched.h>
-#include <linux/math64.h>
-#include <linux/percpu.h>
-#include <linux/irqflags.h>
-
-#include <asm/cpufeature.h>
-#include <asm/processor.h>
-
-#ifdef CONFIG_SMP
-
-static DEFINE_PER_CPU(struct aperfmperf, old_perf_sched);
-
-static unsigned long scale_aperfmperf(void)
-{
-	struct aperfmperf val, *old = &__get_cpu_var(old_perf_sched);
-	unsigned long ratio, flags;
-
-	local_irq_save(flags);
-	get_aperfmperf(&val);
-	local_irq_restore(flags);
-
-	ratio = calc_aperfmperf_ratio(old, &val);
-	*old = val;
-
-	return ratio;
-}
-
-unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu)
-{
-	/*
-	 * do aperf/mperf on the cpu level because it includes things
-	 * like turbo mode, which are relevant to full cores.
-	 */
-	if (boot_cpu_has(X86_FEATURE_APERFMPERF))
-		return scale_aperfmperf();
-
-	/*
-	 * maybe have something cpufreq here
-	 */
-
-	return default_scale_freq_power(sd, cpu);
-}
-
-unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
-{
-	/*
-	 * aperf/mperf already includes the smt gain
-	 */
-	if (boot_cpu_has(X86_FEATURE_APERFMPERF))
-		return SCHED_LOAD_SCALE;
-
-	return default_scale_smt_power(sd, cpu);
-}
-
-#endif
-- 
1.7.9.5
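
To illustrate the idle-time problem named in the commit message, here is a
small user-space sketch of a delta-based aperf/mperf ratio. It is not the
removed kernel helper: struct perf_sample, SKETCH_SHIFT and sketch_ratio are
hypothetical names, and the fixed-point shift of 10 (so that 1024 means a
ratio of 1.0) is an assumption. It also assumes that neither counter advances
over a fully idle interval.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point shift: a result of 1024 stands for a ratio of 1.0. */
#define SKETCH_SHIFT 10

/* Hypothetical snapshot of the two counters. */
struct perf_sample {
	uint64_t aperf;
	uint64_t mperf;
};

/* Ratio of the aperf delta to the mperf delta between two snapshots. */
static unsigned long sketch_ratio(const struct perf_sample *old,
				  const struct perf_sample *cur)
{
	uint64_t aperf = cur->aperf - old->aperf;
	uint64_t mperf = (cur->mperf - old->mperf) >> SKETCH_SHIFT;

	/* Avoid dividing by zero; with no aperf progress this yields 0. */
	if (mperf == 0)
		return (unsigned long)aperf;

	return (unsigned long)(aperf / mperf);
}
```

If both snapshots are identical, as assumed for an idle interval, the sketch
returns 0. A caller that used the value directly as a CPU capacity would then
see a capacity of 0, which is the outcome the commit message says is not
wanted.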


[PATCH v4 5/5] sched: cpu_power: enable ARCH_POWER