date:20170706

[PATCH] mtd: nand: vf610_nfc: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return -ENODEV

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/mtd/nand/vf610_nfc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/mtd/nand/vf610_nfc.c b/drivers/mtd/nand/vf610_nfc.c
index 744ab10..ca0ab96 100644
--- a/drivers/mtd/nand/vf610_nfc.c
+++ b/drivers/mtd/nand/vf610_nfc.c
@@ -674,6 +674,11 @@ static int vf610_nfc_probe(struct platform_device *pdev)
}
 
of_id = of_match_device(vf610_nfc_dt_ids, &pdev->dev);
+   if (!of_id) {
+   dev_err(&pdev->dev, "Failed to match device!\n");
+   return -ENODEV;
+   }
+
nfc->variant = (enum vf610_nfc_variant)of_id->data;
 
for_each_available_child_of_node(nfc->dev->of_node, child) {
-- 
2.5.0

Re: [PATCH v12 02/10] powerpc/powernv: Autoload IMC device driver module

2017-07-06 Thread Michael Ellerman

Hi Maddy/Anju,

Comments below :)

Anju T Sudhakar  writes:
> Code to create platform device for the IMC counters.
> Paltform devices are created based on the IMC compatibility
> string.
>
> New Config flag "CONFIG_HV_PERF_IMC_CTRS" add to contain the
> IMC counter changes.

I don't think we need a separate config, it can just use
CONFIG_PPC_POWERNV.

I don't think we'll ever want to turn it off for powernv, unless we're
trying to build a small kernel, in which case we'll turn of PERF
entirely.

> diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
> b/arch/powerpc/platforms/powernv/opal-imc.c
> new file mode 100644
> index ..5b1045c81af4
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-imc.c
> @@ -0,0 +1,73 @@
> +/*
> + * OPAL IMC interface detection driver
> + * Supported on POWERNV platform
> + *
> + * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
> + *   (C) 2017 Anju T Sudhakar, IBM Corporation.
> + *   (C) 2017 Hemant K Shaw, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.

We usually don't include that part in every file.

> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static int opal_imc_counters_probe(struct platform_device *pdev)
> +{
> + struct device_node *imc_dev = NULL;
> +
> + if (!pdev || !pdev->dev.of_node)
> + return -ENODEV;

We don't need that level of paranoia :)

> + /*
> +  * Check whether this is kdump kernel. If yes, just return.
> +  */
> + if (is_kdump_kernel())
> + return -ENODEV;

Hmm, that's a bit unusual. Is there any particular reason to do that for
this driver?

> + imc_dev = pdev->dev.of_node;
> + if (!imc_dev)
> + return -ENODEV;
> +
> + return 0;
> +}
> +
> +static const struct of_device_id opal_imc_match[] = {
> + { .compatible = IMC_DTB_COMPAT },
> + {},
> +};
> +
> +static struct platform_driver opal_imc_driver = {
> + .driver = {
> + .name = "opal-imc-counters",
> + .of_match_table = opal_imc_match,
> + },
> + .probe = opal_imc_counters_probe,
> +};
> +

This can't be built as a module, so it should not be using MODULE macros.

> +MODULE_DEVICE_TABLE(of, opal_imc_match);

Drop that.

> +module_platform_driver(opal_imc_driver);

Use builtin_platform_driver().

> +MODULE_DESCRIPTION("PowerNV OPAL IMC driver");
> +MODULE_LICENSE("GPL");

Drop those.

> diff --git a/arch/powerpc/platforms/powernv/opal.c 
> b/arch/powerpc/platforms/powernv/opal.c
> index 59684b4af4d1..fbdca259ea76 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -705,6 +707,17 @@ static void opal_pdev_init(const char *compatible)
>   of_platform_device_create(np, NULL, NULL);
>  }
>  
> +#ifdef CONFIG_HV_PERF_IMC_CTRS
> +static void __init opal_imc_init_dev(void)
> +{
> + struct device_node *np;
> +
> + np = of_find_compatible_node(NULL, NULL, IMC_DTB_COMPAT);
> + if (np)
> + of_platform_device_create(np, NULL, NULL);
> +}
> +#endif

That doesn't need the #ifdef.

>  static int kopald(void *unused)
>  {
>   unsigned long timeout = msecs_to_jiffies(opal_heartbeat) + 1;
> @@ -778,6 +791,11 @@ static int __init opal_init(void)
>   /* Setup a heatbeat thread if requested by OPAL */
>   opal_init_heartbeat();
>  
> +#ifdef CONFIG_HV_PERF_IMC_CTRS
> + /* Detect IMC pmu counters support and create PMUs */
> + opal_imc_init_dev();
> +#endif
> +

Neither here.

>   /* Create leds platform devices */
>   leds = of_find_node_by_path("/ibm,opal/leds");
>   if (leds) {
> -- 
> 2.11.0


cheers

[PATCH] iio: adc: rockchip_saradc: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return -ENODEV

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/iio/adc/rockchip_saradc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iio/adc/rockchip_saradc.c 
b/drivers/iio/adc/rockchip_saradc.c
index ae6d332..2bf2ed1 100644
--- a/drivers/iio/adc/rockchip_saradc.c
+++ b/drivers/iio/adc/rockchip_saradc.c
@@ -224,6 +224,11 @@ static int rockchip_saradc_probe(struct platform_device 
*pdev)
info = iio_priv(indio_dev);
 
match = of_match_device(rockchip_saradc_match, &pdev->dev);
+   if (!match) {
+   dev_err(&pdev->dev, "failed to match device\n");
+   return -ENODEV;
+   }
+
info->data = match->data;
 
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-- 
2.5.0

Re: [PATCH 21/21] x86/intel_rdt/mbm: Handle counter overflow

2017-07-06 Thread Thomas Gleixner

On Thu, 6 Jul 2017, Shivappa Vikas wrote:
> On Sun, 2 Jul 2017, Thomas Gleixner wrote:
> > On Mon, 26 Jun 2017, Vikas Shivappa wrote:
> > > +static void mbm_update(struct rdt_domain *d, int rmid)
> > > +{
> > > + struct rmid_read rr;
> > > +
> > > + rr.first = false;
> > > + rr.d = d;
> > > +
> > > + if (is_mbm_total_enabled()) {
> > > + rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
> > > + __mon_event_count(rmid, &rr);
> > 
> > This is broken as it is not protected against a concurrent read from user
> > space which comes in via a smp function call.
> 
> The read from user also has the rdtgroup_mutex.

Which is again, completely non obvious and undocumented in the code.

Aside of that, are you really serious about serializing the world and
everything on a single global mutex?

Thanks,

tglx

[PATCH v2] net: axienet: add support for standard phy-mode binding

2017-07-06 Thread Alvaro Gamez Machado

Keep supporting proprietary "xlnx,phy-type" attribute and add support for
MII connectivity to the PHY.

Signed-off-by: Alvaro Gamez Machado 
---

Changes since v1:

* Renamed phy_type to phy_mode.  No other instances of this struct
  member were found except for those we wanted to change, so there's
  no other hidden meaning behind this.

* Added Device Tree Binding document specifying required and
optional properties, an example.  Also, make a explicit note of why
this driver is incompatible with AXI DMA driver

 .../devicetree/bindings/net/xilinx_axienet.txt | 55 ++
 drivers/net/ethernet/xilinx/xilinx_axienet.h   |  4 +-
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c  | 48 ++-
 3 files changed, 93 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/xilinx_axienet.txt

diff --git a/Documentation/devicetree/bindings/net/xilinx_axienet.txt 
b/Documentation/devicetree/bindings/net/xilinx_axienet.txt
new file mode 100644
index ..38f9ec076743
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/xilinx_axienet.txt
@@ -0,0 +1,55 @@
+XILINX AXI ETHERNET Device Tree Bindings
+
+
+Also called  AXI 1G/2.5G Ethernet Subsystem, the xilinx axi ethernet IP core
+provides connectivity to an external ethernet PHY supporting different
+interfaces: MII, GMII, RGMII, SGMII, 1000BaseX. It also includes two
+segments of memory for buffering TX and RX, as well as the capability of
+offloading TX/RX checksum calculation off the processor.
+
+Management configuration is done through the AXI interface, while payload is
+sent and received through means of an AXI DMA controller. This driver
+includes the DMA driver code, so this driver is incompatible with AXI DMA
+driver.
+
+For more details about mdio please refer phy.txt file in the same directory.
+
+Required properties:
+- compatible   : Must be one of "xlnx,axi-ethernet-1.00.a",
+ "xlnx,axi-ethernet-1.01.a", "xlnx,axi-ethernet-2.01.a"
+- reg  : Address and length of the IO space.
+- interrupts   : Should be a list of two interrupt, TX and RX.
+- phy-handle   : Should point to the external phy device.
+ See ethernet.txt file in the same directory.
+- xlnx,rxmem   : Set to allocated memory buffer for Rx/Tx in the hardware
+
+Optional properties:
+- phy-mode : See ethernet.txt
+- xlnx,phy-type: Deprecated, do not use, but still accepted in 
preference
+ to phy-mode.
+- xlnx,txcsum  : 0 or empty for disabling TX checksum offload,
+ 1 to enable partial TX checksum offload,
+ 2 to enable full TX checksum offload
+- xlnx,rxcsum  : Same values as xlnx,txcsum but for RX checksum offload
+
+Example:
+   axi_ethernet_eth: ethernet@40c0 {
+   compatible = "xlnx,axi-ethernet-1.00.a";
+   device_type = "network";
+   interrupt-parent = <µblaze_0_axi_intc>;
+   interrupts = <2 0>;
+   phy-mode = "mii";
+   reg = <0x40c0 0x4>;
+   xlnx,rxcsum = <0x2>;
+   xlnx,rxmem = <0x800>;
+   xlnx,txcsum = <0x2>;
+   phy-handle = <&phy0>;
+   axi_ethernetlite_0_mdio: mdio {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   phy0: phy@0 {
+   device_type = "ethernet-phy";
+   reg = <1>;
+   };
+   };
+   };
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h 
b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index af27f7d1cbf3..48e939a5fa31 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -389,7 +389,7 @@ struct axidma_bd {
  * @dma_err_tasklet: Tasklet structure to process Axi DMA errors
  * @tx_irq:Axidma TX IRQ number
  * @rx_irq:Axidma RX IRQ number
- * @phy_type:  Phy type to identify between MII/GMII/RGMII/SGMII/1000 Base-X
+ * @phy_mode:  Phy type to identify between MII/GMII/RGMII/SGMII/1000 Base-X
  * @options:   AxiEthernet option word
  * @last_link: Phy link state in which the PHY was negotiated earlier
  * @features:  Stores the extended features supported by the axienet hw
@@ -432,7 +432,7 @@ struct axienet_local {
 
int tx_irq;
int rx_irq;
-   u32 phy_type;
+   u32 phy_mode;
 
u32 options;/* Current options word */
u32 last_link;
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c 
b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index 33c595f4691d..2f759d448383 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -531,11 +531,11 @@ static void axienet_adjust_link(struct net_device

[PATCH] include: platform_device: add pdev_info(), pdev_warn, ... convencience macros

2017-07-06 Thread Enrico Weigelt, metux IT consult

---
 include/linux/platform_device.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 98c2a7c7108e..723c209d3760 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -368,4 +368,11 @@ extern int platform_pm_restore(struct device *dev);
 #define USE_PLATFORM_PM_SLEEP_OPS
 #endif
 
+#define pdev_crit(pdev, fmt, args...)  dev_crit(&pdev->dev, fmt, ##args)
+#define pdev_err(pdev, fmt, args...)   dev_err(&pdev->dev, fmt, ##args)
+#define pdev_warn(pdev, fmt, args...)  dev_warn(&pdev->dev, fmt, ##args)
+#define pdev_notice(pdev, fmt, args...)dev_notice(&pdev->dev, fmt, 
##args)
+#define pdev_info(pdev, fmt, args...)  dev_info(&pdev->dev, fmt, ##args)
+#define pdev_debug(pdev, fmt, args...) dev_debug(&pdev->dev, fmt, ##args)
+
 #endif /* _PLATFORM_DEVICE_H_ */
-- 
2.11.0.rc0.7.gbe5a750

Re: [PATCH 19/21] x86/intel_rdt/mbm: Basic counting of MBM events (total and local)

2017-07-06 Thread Thomas Gleixner

On Thu, 6 Jul 2017, Shivappa Vikas wrote:
> On Sun, 2 Jul 2017, Thomas Gleixner wrote:
> > >   INIT_LIST_HEAD(&r->evt_list);
> > > 
> > >   if (rdt_mon_features & (1 << QOS_L3_OCCUP_EVENT_ID))
> > >   list_add_tail(&llc_occupancy_event.list, &r->evt_list);
> > > + if (is_mbm_total_enabled())
> > > + list_add_tail(&mbm_total_event.list, &r->evt_list);
> > > + if (is_mbm_local_enabled())
> > > + list_add_tail(&mbm_local_event.list, &r->evt_list);
> > 
> > Confused. This hooks all monitoring features to RDT_RESOURCE_L3. Why?
> 
> They are really L3 resource events as per the spec.
> CPUID.(EAX=0FH, ECX=0):EDX.L3[bit 1] = 1 if L3 monitoring and we query for all
> the llc_occupancy, l3 total and local b/w with the same resource id 1.

Then this should be documented somewhere in the code

[PATCH] iio: adc: meson-saradc: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return -ENODEV

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/iio/adc/meson_saradc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iio/adc/meson_saradc.c b/drivers/iio/adc/meson_saradc.c
index 83da50e..07dbcc3 100644
--- a/drivers/iio/adc/meson_saradc.c
+++ b/drivers/iio/adc/meson_saradc.c
@@ -915,6 +915,11 @@ static int meson_sar_adc_probe(struct platform_device 
*pdev)
init_completion(&priv->done);
 
match = of_match_device(meson_sar_adc_of_match, &pdev->dev);
+   if (!match) {
+   dev_err(&pdev->dev, "failed to match device\n");
+   return -ENODEV;
+   }
+
priv->data = match->data;
 
indio_dev->name = priv->data->name;
-- 
2.5.0

Re: [PATCH 13/21] x86/intel_rdt/cqm: Add cpus file support

2017-07-06 Thread Thomas Gleixner

On Thu, 6 Jul 2017, Shivappa Vikas wrote:
> On Sun, 2 Jul 2017, Thomas Gleixner wrote:
> > > + /* Check whether cpus belong to parent ctrl group */
> > > + cpumask_andnot(tmpmask, newmask, &pr->cpu_mask);
> > > + if (cpumask_weight(tmpmask)) {
> > > + ret = -EINVAL;
> > > + goto out;
> > > + }
> > > +
> > > + /* Check whether cpus are dropped from this group */
> > > + cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
> > > + if (cpumask_weight(tmpmask)) {
> > > + /* Give any dropped cpus to parent rdtgroup */
> > > + cpumask_or(&pr->cpu_mask, &pr->cpu_mask, tmpmask);
> > 
> > This does not make any sense. The check above verifies that all cpus in
> > newmask belong to the parent->cpu_mask. If they don't then you return
> > -EINVAL, but here you give them back to parent->cpu_mask. How is that
> > supposed to work? You never get into this code path!
> 
> The parent->cpu_mask always is the parent->cpus_valid_mask if i understand
> right. With monitor group, the cpu is present is always present in "one"
> ctrl_mon group and one mon_group. And the mon group can have only cpus in its
> parent. May be it needs a comment? (its explaind in the documentation patch).

Sigh, the code needs to be written in a way that it is halfways obvious
what's going on.

> # mkdir /sys/fs/resctrl/p1
> # mkdir /sys/fs/resctrl/p1/mon_groups/m1
> # echo 5-10 > /sys/fs/resctr/p1/cpus_list
> Say p1 has RMID 2
> cpus 5-10 have RMID 2

So what you say, is that parent is always the resource control group
itself.

Can we please have a proper distinction in the code? I tripped over that
ambigiousities several times.

The normal meaning of parent->child relations is that both have the same
type. While this is the case at the implementation detail level (both are
type struct rdtgroup), from a conceptual level they are different:

  parent is a resource group and child is a monitoring group

That should be expressed in the code, at the very least by variable naming,
so it becomes immediately clear that this operates on two different
entities.

The proper solution is to have different data types or at least embedd the
monitoring bits in a seperate entity inside of struct rdtgroup.

struct mongroup {
monitoring stuff;
};

struct rdtgroup {
common stuff;
struct mongroup mon;
};

So the code can operate on r->mon.foo or mon->foo which makes it entirely
clear what kind of operation this is.

Sigh, cramming everything into a single struct without distinction is the
same as operating on a pile of global variables, which is the most common
pattern used by people learning C. You certainly belong not to that group,
so dammit, get your act together and structure the code so it's obvious and
maintainable.

Thanks,

tglx

Re: [PATCH v2 4/6] cpufreq: schedutil: update CFS util only if used

2017-07-06 Thread Vikram Mulukutla


On 2017-07-04 10:34, Patrick Bellasi wrote:

Currently the utilization of the FAIR class is collected before locking
the policy. Although that should not be a big issue for most cases, we
also don't really know how much latency there can be between the
utilization reading and its usage.

Let's get the FAIR utilization right before its usage to be better in
sync with the current status of a CPU.

Signed-off-by: Patrick Bellasi 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: Viresh Kumar 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org
---
 kernel/sched/cpufreq_schedutil.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c 
b/kernel/sched/cpufreq_schedutil.c

index 98704d8..df433f1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -308,10 +308,9 @@ static void sugov_update_shared(struct
update_util_data *hook, u64 time,
if (unlikely(current == sg_policy->thread))
return;

-   sugov_get_util(&util, &max);
-
raw_spin_lock(&sg_policy->update_lock);

+   sugov_get_util(&util, &max);
sg_cpu->util = util;
sg_cpu->max = max;


Given that the utilization update hooks are called with the per-cpu rq 
lock held (for all classes), I don't think PELT utilization can change 
throughout the lifetime of the cpufreq_update_{util,this_cpu} call? Even 
with Viresh's remote cpu callback series we'd still have to hold the rq 
lock across cpufreq_update_util..  what can change today is 'max' 
(arch_scale_cpu_capacity) when a cpufreq policy is shared, so the patch 
is still needed for that reason I think?


Thanks,
Vikram

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

[PATCH] pwm: sun4i: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return -ENODEV

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/pwm/pwm-sun4i.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/pwm/pwm-sun4i.c b/drivers/pwm/pwm-sun4i.c
index 1284ffa..295ca19 100644
--- a/drivers/pwm/pwm-sun4i.c
+++ b/drivers/pwm/pwm-sun4i.c
@@ -321,6 +321,10 @@ static int sun4i_pwm_probe(struct platform_device *pdev)
const struct of_device_id *match;
 
match = of_match_device(sun4i_pwm_dt_ids, &pdev->dev);
+   if (!match) {
+   dev_err(&pdev->dev, "failed to match device\n");
+   return -ENODEV;
+   }
 
pwm = devm_kzalloc(&pdev->dev, sizeof(*pwm), GFP_KERNEL);
if (!pwm)
-- 
2.5.0

Re: [next-20170609] Oops while running CPU off-on (cpuset.c/cpuset_can_attach)

2017-07-06 Thread Abdul Haleem

On Wed, 2017-07-05 at 11:28 -0400, Tejun Heo wrote:
> Hello, Abdul.
> 
> Thanks for the debug info.  Can you please see whether the following
> patch fixes the issue?  

It is my pleasure and yes the patch fixes the problem.

> If the problem is too difficult to reproduce

The problem was reproducible all the time. 

With the patch fix, I tried multiple times and long runs of cpu off-on
cycles but no Oops is seen.

Thank you for spending your valuable time on fixing this issue.

Reported-and-tested-by : Abdul Haleem 

> to confirm the fix by seeing whether it no longer triggers, please let
> me know.  We can instead apply a patch which triggers WARN on the
> failing condition to confirm the diagnosis.
> 
> Thanks.
> 
> diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
> index 793565c05742..8b4c3c2f2509 100644
> --- a/kernel/cgroup/cgroup-internal.h
> +++ b/kernel/cgroup/cgroup-internal.h
> @@ -33,6 +33,9 @@ struct cgroup_taskset {
>   struct list_headsrc_csets;
>   struct list_headdst_csets;
> 
> + /* the number of tasks in the set */
> + int nr_tasks;
> +
>   /* the subsys currently being processed */
>   int ssid;
> 
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index dbfd7028b1c6..e3c4152741a3 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -1954,6 +1954,8 @@ static void cgroup_migrate_add_task(struct task_struct 
> *task,
>   if (!cset->mg_src_cgrp)
>   return;
> 
> + mgctx->tset.nr_tasks++;
> +
>   list_move_tail(&task->cg_list, &cset->mg_tasks);
>   if (list_empty(&cset->mg_node))
>   list_add_tail(&cset->mg_node,
> @@ -2047,16 +2049,18 @@ static int cgroup_migrate_execute(struct cgroup_mgctx 
> *mgctx)
>   return 0;
> 
>   /* check that we can legitimately attach to the cgroup */
> - do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
> - if (ss->can_attach) {
> - tset->ssid = ssid;
> - ret = ss->can_attach(tset);
> - if (ret) {
> - failed_ssid = ssid;
> - goto out_cancel_attach;
> + if (tset->nr_tasks) {
> + do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
> + if (ss->can_attach) {
> + tset->ssid = ssid;
> + ret = ss->can_attach(tset);
> + if (ret) {
> + failed_ssid = ssid;
> + goto out_cancel_attach;
> + }
>   }
> - }
> - } while_each_subsys_mask();
> + } while_each_subsys_mask();
> + }
> 
>   /*
>* Now that we're guaranteed success, proceed to move all tasks to
> @@ -2085,25 +2089,29 @@ static int cgroup_migrate_execute(struct cgroup_mgctx 
> *mgctx)
>*/
>   tset->csets = &tset->dst_csets;
> 
> - do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
> - if (ss->attach) {
> - tset->ssid = ssid;
> - ss->attach(tset);
> - }
> - } while_each_subsys_mask();
> + if (tset->nr_tasks) {
> + do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
> + if (ss->attach) {
> + tset->ssid = ssid;
> + ss->attach(tset);
> + }
> + } while_each_subsys_mask();
> + }
> 
>   ret = 0;
>   goto out_release_tset;
> 
>  out_cancel_attach:
> - do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
> - if (ssid == failed_ssid)
> - break;
> - if (ss->cancel_attach) {
> - tset->ssid = ssid;
> - ss->cancel_attach(tset);
> - }
> - } while_each_subsys_mask();
> + if (tset->nr_tasks) {
> + do_each_subsys_mask(ss, ssid, mgctx->ss_mask) {
> + if (ssid == failed_ssid)
> + break;
> + if (ss->cancel_attach) {
> + tset->ssid = ssid;
> + ss->cancel_attach(tset);
> + }
> + } while_each_subsys_mask();
> + }
>  out_release_tset:
>   spin_lock_irq(&css_set_lock);
>   list_splice_init(&tset->dst_csets, &tset->src_csets);
> 


-- 
Regard's

Abdul Haleem
IBM Linux Technology Centre

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Kees Cook

On Thu, Jul 6, 2017 at 10:49 PM, Kees Cook  wrote:
> On Thu, Jul 6, 2017 at 10:39 PM, Linus Torvalds
>> And I think the credentials switch (which is the point of no return
>> anyway) happens before we start mmap'ing the executable etc. We used
>> to have some odd code there and do it in the completely wrong order
>> (checking that the binary was executable for the *old* user, which
>> makes no sense, iirc)
>
> Yeah, it all happens in setup_new_exec(). The first thing is layout
> selection, then switching credentials. It could be made to take a hint
> from GNU_STACK (which was parsed before setup_new_exec() is called),
> check security_bprm_secureexec() and then make the rlimit changes, all
> before the layout selection.

At Andy's suggestion I'm using security_bprm_secureexec() to test for
setuid-ness. However, this seems to expect the credentials to have
already been installed. And yet ... the following patch still works
correctly when I call it "early". I'm going to look again in the
morning.

diff --git a/fs/exec.c b/fs/exec.c
index b60804216b59..a4d2433a44ec 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1334,9 +1334,20 @@ EXPORT_SYMBOL(would_dump);

 void setup_new_exec(struct linux_binprm * bprm)
 {
+   /* This is the point of no return */
+
+   /*
+* If this is a setuid execution, reset the stack limit to
+* a sane default to avoid bad behavior from the prior rlimits.
+*/
+   if (security_bprm_secureexec(bprm)) {
+   struct rlimit default_stack = { _STK_LIM, RLIM_INFINITY };
+
+   current->signal->rlim[RLIMIT_STACK] = default_stack;
+   }
+
arch_pick_mmap_layout(current->mm);

-   /* This is the point of no return */
current->sas_ss_sp = current->sas_ss_size = 0;

if (uid_eq(current_euid(), current_uid()) &&
gid_eq(current_egid(), current_gid()))



-- 
Kees Cook
Pixel Security

[PATCH] crypto: brcm: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return -ENODEV

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/crypto/bcm/cipher.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/crypto/bcm/cipher.c b/drivers/crypto/bcm/cipher.c
index 9cfd36c..f1a826f 100644
--- a/drivers/crypto/bcm/cipher.c
+++ b/drivers/crypto/bcm/cipher.c
@@ -4813,6 +4813,11 @@ static int spu_dt_read(struct platform_device *pdev)
int err;
 
match = of_match_device(of_match_ptr(bcm_spu_dt_ids), dev);
+   if (!match) {
+   dev_err(&pdev->dev, "Failed to match device\n");
+   return -ENODEV;
+   }
+
matched_spu_type = match->data;
 
if (iproc_priv.spu.num_spu > 1) {
-- 
2.5.0

Re: [RFC v2 0/5] surface heterogeneous memory performance information

2017-07-06 Thread Balbir Singh

On Thu, 2017-07-06 at 15:52 -0600, Ross Zwisler wrote:
>  Quick Summary 
> 
> Platforms in the very near future will have multiple types of memory
> attached to a single CPU.  These disparate memory ranges will have some
> characteristics in common, such as CPU cache coherence, but they can have
> wide ranges of performance both in terms of latency and bandwidth.
> 
> For example, consider a system that contains persistent memory, standard
> DDR memory and High Bandwidth Memory (HBM), all attached to the same CPU.
> There could potentially be an order of magnitude or more difference in
> performance between the slowest and fastest memory attached to that CPU.
> 
> With the current Linux code NUMA nodes are CPU-centric, so all the memory
> attached to a given CPU will be lumped into the same NUMA node.  This makes
> it very difficult for userspace applications to understand the performance
> of different memory ranges on a given CPU.
> 
> We solve this issue by providing userspace with performance information on
> individual memory ranges.  This performance information is exposed via
> sysfs:
> 
>   # grep . mem_tgt2/* mem_tgt2/local_init/* 2>/dev/null
>   mem_tgt2/firmware_id:1
>   mem_tgt2/is_cached:0
>   mem_tgt2/is_enabled:1
>   mem_tgt2/is_isolated:0

Could you please explain these charactersitics, are they in the patches
to follow?

>   mem_tgt2/phys_addr_base:0x0
>   mem_tgt2/phys_length_bytes:0x8
>   mem_tgt2/local_init/read_bw_MBps:30720
>   mem_tgt2/local_init/read_lat_nsec:100
>   mem_tgt2/local_init/write_bw_MBps:30720
>   mem_tgt2/local_init/write_lat_nsec:100
>

How to these numbers compare to normal system memory?
 
> This allows applications to easily find the memory that they want to use.
> We expect that the existing NUMA APIs will be enhanced to use this new
> information so that applications can continue to use them to select their
> desired memory.
> 
> This series is built upon acpica-1705:
> 
> https://github.com/zetalog/linux/commits/acpica-1705
> 
> And you can find a working tree here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/zwisler/linux.git/log/?h=hmem_sysfs
> 
>  Lots of Details 
> 
> This patch set is only concerned with CPU-addressable memory types, not
> on-device memory like what we have with Jerome Glisse's HMM series:
> 
> https://lwn.net/Articles/726691/
> 
> This patch set works by enabling the new Heterogeneous Memory Attribute
> Table (HMAT) table, newly defined in ACPI 6.2. One major conceptual change
> in ACPI 6.2 related to this work is that proximity domains no longer need
> to contain a processor.  We can now have memory-only proximity domains,
> which means that we can now have memory-only Linux NUMA nodes.
> 
> Here is an example configuration where we have a single processor, one
> range of regular memory and one range of HBM:
> 
>   +---+   ++
>   | Processor |   | Memory |
>   | prox domain 0 +---+ prox domain 1  |
>   | NUMA node 1   |   | NUMA node 2|
>   +---+---+   ++
>   |
>   +---+--+
>   | HBM  |
>   | prox domain 2|
>   | NUMA node 0  |
>   +--+
> 
> This gives us one initiator (the processor) and two targets (the two memory
> ranges).  Each of these three has its own ACPI proximity domain and
> associated Linux NUMA node.  Note also that while there is a 1:1 mapping
> from each proximity domain to each NUMA node, the numbers don't necessarily
> match up.  Additionally we can have extra NUMA nodes that don't map back to
> ACPI proximity domains.

Could you expand on proximity domains, are they the same as node distance
or is this ACPI terminology for something more?

> 
> The above configuration could also have the processor and one of the two
> memory ranges sharing a proximity domain and NUMA node, but for the
> purposes of the HMAT the two memory ranges will always need to be
> separated.
> 
> The overall goal of this series and of the HMAT is to allow users to
> identify memory using its performance characteristics.  This can broadly be
> done in one of two ways:
> 
> Option 1: Provide the user with a way to map between proximity domains and
> NUMA nodes and a way to access the HMAT directly (probably via
> /sys/firmware/acpi/tables).  Then, through possibly a library and a daemon,
> provide an API so that applications can either request information about
> memory ranges, or request memory allocations that meet a given set of
> performance characteristics.
> 
> Option 2: Provide the user with HMAT performance data directly in sysfs,
> allowing applications to directly access it without the need for the
> library and daemon.
> 
> The kernel work for option 1 is started by patches 1-3.  These just surface
> the minimal amount of information in sysfs to allow userspace to map
> between proximity domains and NUMA nodes so that the raw data in the HMAT
> table can be understood.
> 
> Patches 4

[PATCH] input: tegra-kbc: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return -ENODEV

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/input/keyboard/tegra-kbc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/input/keyboard/tegra-kbc.c 
b/drivers/input/keyboard/tegra-kbc.c
index 0c07e10..742c5ac 100644
--- a/drivers/input/keyboard/tegra-kbc.c
+++ b/drivers/input/keyboard/tegra-kbc.c
@@ -617,6 +617,10 @@ static int tegra_kbc_probe(struct platform_device *pdev)
const struct of_device_id *match;
 
match = of_match_device(tegra_kbc_of_match, &pdev->dev);
+   if (!match) {
+   dev_err(&pdev->dev, "failed to match device\n");
+   return -ENODEV;
+   }
 
kbc = devm_kzalloc(&pdev->dev, sizeof(*kbc), GFP_KERNEL);
if (!kbc) {
-- 
2.5.0

[RFC PATCH v4] acpi: indicate to platform when hot remove returns busy

2017-07-06 Thread Lee, Chun-Yi

In hotplug logic, it always indicates non-specific failure to
platform through _OST when handing acpi hot-remove event failed. Then
platform terminates the hot-remove process but it can not identify
the reason.

Base on current hot-remove code, there have two situations that it
returns busy:
 - OSPM try to offline an individual device, but the device offline
   function returns busy.
 - When the ejection event is applied to an "not offlined yet" container.
   OSPM send kobject change event to userspace and returns busy.

Both of them will returns -EBUSY to acpi device hotplug function then
hotplug function indicates non-specific failure to platform just like
any other error, e.g. -ENODEV or -EIO.

The benefit to platform for identifying the OS busy state is that
platform can be applied different approach to handle the busy but
not just terminate the hot-remove process by unknown reason. For
example, platform can wait for a while then triggers hot-remove
again.

This RFC patch adds one more parameter to the handler function of
acpi generic hotplug event to give the function a chance to propose
the return code of _OST. In this case, it sets ost return code to
ACPI_OST_SC_DEVICE_BUSY when the acpi hot remove function returns
-EBUSY.

v4:
Use switch-case statements to simplify code. (Andy Shevchenko, Rafael J. 
Wysocki)

v3:
Removed redundant 'else' in acpi_ost_status_code(). (Andy Shevchenko)

v2:
Do not overwrite ost code in acpi_generic_hotplug_event(). Move
the "error code to ost code" logic to a help function. (Andy Shevchenko)

Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Michal Hocko 
Reviewed-by: Andy Shevchenko 
Signed-off-by: "Lee, Chun-Yi" 
---
 drivers/acpi/scan.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index d531629..ce88175 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -404,10 +404,6 @@ void acpi_device_hotplug(struct acpi_device *adev, u32 src)
error = dock_notify(adev, src);
} else if (adev->flags.hotplug_notify) {
error = acpi_generic_hotplug_event(adev, src);
-   if (error == -EPERM) {
-   ost_code = ACPI_OST_SC_EJECT_NOT_SUPPORTED;
-   goto err_out;
-   }
} else {
int (*notify)(struct acpi_device *, u32);
 
@@ -423,8 +419,20 @@ void acpi_device_hotplug(struct acpi_device *adev, u32 src)
else
goto out;
}
-   if (!error)
+   switch (error) {
+   case 0:
ost_code = ACPI_OST_SC_SUCCESS;
+   break;
+   case -EPERM:
+   ost_code = ACPI_OST_SC_EJECT_NOT_SUPPORTED;
+   break;
+   case -EBUSY:
+   ost_code = ACPI_OST_SC_DEVICE_BUSY;
+   break;
+   default:
+   ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE;
+   break;
+   }
 
  err_out:
acpi_evaluate_ost(adev->handle, src, ost_code, NULL);
-- 
2.10.2

[PATCH v3] acpi: handle the acpi hotplug schedule error

2017-07-06 Thread Lee, Chun-Yi

Kernel should decrements the reference count of acpi device
when the scheduling of acpi hotplug work failed, and evaluates
_OST to notify BIOS the failure.

v3:
More simplify the code. (Rafael J. Wysocki)

v2:
To simplify the code. (Andy Shevchenko)

Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Michal Hocko 
Reviewed-by: Andy Shevchenko 
Signed-off-by: "Lee, Chun-Yi" 
---
 drivers/acpi/bus.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 784bda6..9d4fea6 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -432,11 +432,15 @@ static void acpi_bus_notify(acpi_handle handle, u32 type, 
void *data)
(driver->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS))
driver->ops.notify(adev, type);
 
-   if (hotplug_event && ACPI_SUCCESS(acpi_hotplug_schedule(adev, type)))
+   if (!hotplug_event) {
+   acpi_bus_put_acpi_device(adev);
+   return;
+   }
+
+   if (ACPI_SUCCESS(acpi_hotplug_schedule(adev, type)))
return;
 
acpi_bus_put_acpi_device(adev);
-   return;
 
  err:
acpi_evaluate_ost(handle, type, ost_code, NULL);
-- 
2.10.2

Re: [PATCH 14/21] x86/intel_rdt/cqm: Add mon_data

2017-07-06 Thread Thomas Gleixner

On Thu, 6 Jul 2017, Shivappa Vikas wrote:
> On Sun, 2 Jul 2017, Thomas Gleixner wrote:
> > > +static bool __mon_event_count(u32 rmid, struct rmid_read *rr)
> > > +{
> > > + u64 tval;
> > > +
> > > + tval = __rmid_read(rmid, rr->evtid);
> > > + if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
> > > + rr->val = tval;
> > > + return false;
> > > + }
> > > + switch (rr->evtid) {
> > > + case QOS_L3_OCCUP_EVENT_ID:
> > > + rr->val += tval;
> > > + return true;
> > > + default:
> > > + return false;
> > 
> > I have no idea what that return code means.
> 
> false for the invalid event id and all errors for __rmid_read. (IOW all errors
> for __mon_event-read)

Sure, but why bool? What's wrong with proper error return codes, so issues
can be distinguished and potentially propagated in the callchain?

Thanks,

tglx

[PATCH] usb: dwc3: omap: remove IRQ_NOAUTOEN used with shared irq

2017-07-06 Thread Vignesh R

IRQ_NOAUTOEN cannot be used with shared IRQs, since commit 04c848d39879
("genirq: Warn when IRQ_NOAUTOEN is used with shared interrupts") and
kernel now throws a warn dump. But OMAP DWC3 driver uses this flag. As
per commit 12a7f17fac5b ("usb: dwc3: omap: fix race of pm runtime with
irq handler in probe") that introduced this flag, PM runtime can race
with IRQ handler when deferred probing happens due to extcon,
therefore IRQ_NOAUTOEN needs to be set so that irq is not enabled until
extcon is registered.

Remove setting of IRQ_NOAUTOEN and move the registration of
shared irq to a point after dwc3_omap_extcon_register() and
of_platform_populate(). This avoids possibility of probe deferring and
above said race condition.

Signed-off-by: Vignesh R 
---
 drivers/usb/dwc3/dwc3-omap.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/dwc3/dwc3-omap.c b/drivers/usb/dwc3/dwc3-omap.c
index 98926504b55b..f5aaa0cf3873 100644
--- a/drivers/usb/dwc3/dwc3-omap.c
+++ b/drivers/usb/dwc3/dwc3-omap.c
@@ -512,15 +512,6 @@ static int dwc3_omap_probe(struct platform_device *pdev)
 
/* check the DMA Status */
reg = dwc3_omap_readl(omap->base, USBOTGSS_SYSCONFIG);
-   irq_set_status_flags(omap->irq, IRQ_NOAUTOEN);
-   ret = devm_request_threaded_irq(dev, omap->irq, dwc3_omap_interrupt,
-   dwc3_omap_interrupt_thread, IRQF_SHARED,
-   "dwc3-omap", omap);
-   if (ret) {
-   dev_err(dev, "failed to request IRQ #%d --> %d\n",
-   omap->irq, ret);
-   goto err1;
-   }
 
ret = dwc3_omap_extcon_register(omap);
if (ret < 0)
@@ -532,8 +523,15 @@ static int dwc3_omap_probe(struct platform_device *pdev)
goto err1;
}
 
+   ret = devm_request_threaded_irq(dev, omap->irq, dwc3_omap_interrupt,
+   dwc3_omap_interrupt_thread, IRQF_SHARED,
+   "dwc3-omap", omap);
+   if (ret) {
+   dev_err(dev, "failed to request IRQ #%d --> %d\n",
+   omap->irq, ret);
+   goto err1;
+   }
dwc3_omap_enable_irqs(omap);
-   enable_irq(omap->irq);
return 0;
 
 err1:
-- 
2.13.0

[PATCH] fpga: altera-hps2fpga: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return -ENODEV

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/fpga/altera-hps2fpga.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/fpga/altera-hps2fpga.c b/drivers/fpga/altera-hps2fpga.c
index 3066b80..ca8212c 100644
--- a/drivers/fpga/altera-hps2fpga.c
+++ b/drivers/fpga/altera-hps2fpga.c
@@ -143,6 +143,11 @@ static int alt_fpga_bridge_probe(struct platform_device 
*pdev)
int ret;
 
of_id = of_match_device(altera_fpga_of_match, dev);
+   if (!of_id) {
+   dev_err(dev, "failed to match device\n");
+   return -ENODEV;
+   }
+
priv = (struct altera_hps2fpga_data *)of_id->data;
 
priv->bridge_reset = of_reset_control_get_by_index(dev->of_node, 0);
-- 
2.5.0

[PATCH] drm/tegra: vic: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/tegra/vic.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 47cb1aa..7e11825 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -283,6 +283,11 @@ static int vic_probe(struct platform_device *pdev)
int err;
 
match = of_match_device(vic_match, dev);
+   if (!match) {
+   dev_err(&pdev->dev, "failed to match device\n");
+   return -ENODEV;
+   }
+
vic_config = (struct vic_config *)match->data;
 
vic = devm_kzalloc(dev, sizeof(*vic), GFP_KERNEL);
-- 
2.5.0

Re: [PATCH] x86/boot/KASLR: exclude EFI_BOOT_SERVICES_{CODE|DATA} from KASLR's choice

2017-07-06 Thread Naoya Horiguchi

On Fri, Jul 07, 2017 at 11:07:59AM +0800, Baoquan He wrote:
> On 07/06/17 at 03:57pm, Matt Fleming wrote:
> > On Thu, 06 Jul, at 08:31:07AM, Naoya Horiguchi wrote:
> > > + for (i = 0; i < nr_desc; i++) {
> > > + md = (efi_memory_desc_t *)(pmap + (i * e->efi_memdesc_size));
> > > +
> > > + /*
> > > +  * EFI_BOOT_SERVICES_{CODE|DATA} are avoided because boot
> > > +  * services regions could be accessed after ExitBootServices()
> > > +  * due to the workaround for buggy firmware.
> > > +  */
> > > + if (!(md->type == EFI_LOADER_CODE ||
> > > +   md->type == EFI_LOADER_DATA ||
> > > +   md->type == EFI_CONVENTIONAL_MEMORY))
> > > + continue;
> > 
> > Wouldn't it make more sense to *only* use EFI_CONVENTIONAL_MEMORY?
> > 
> > You can't re-use EFI_LOADER_* regions because the kaslr code is run so
> > early in boot that you've no idea if data the kernel will need is in
> > those EFI_LOADER_* regions.
> > 
> > For example, we pass struct setup_data objects inside of
> > EFI_LOADER_DATA regions.
> 
> It doesn't matter because we have tried to avoid those memory setup_data
> resides in in mem_avoid_overlap(). Here discarding EFI_LOADER_* could
> discard the whole regions while setup_data could occupy small part of
> them.

Hi Matt, Baoquan,

I added these three checks to accept any regions corresponding to
E820_TYPE_RAM except EFI_BOOT_SERVICES_*, just thinking of that it's minimum
surprising.  Baoquan gave a good justification on that, so I'll leave it
as-is in next version.

Thanks,
Naoya Horiguchi

[PATCH v2] f2fs: use spin_{,un}lock_irq{save,restore}

2017-07-06 Thread Chao Yu

From: Chao Yu 

generic/361 reports below warning, this is because: once, there is
someone entering into critical region of sbi.cp_lock, if write_end_io.
f2fs_stop_checkpoint is invoked from an triggered IRQ, we will encounter
deadlock.

So this patch changes to use spin_{,un}lock_irq{save,restore} to create
critical region without IRQ enabled to avoid potential deadlock.

 irq event stamp: 83391573
 loop: Write error at byte offset 438729728, length 1024.
 hardirqs last  enabled at (83391573): [] restore_all+0xf/0x65
 hardirqs last disabled at (83391572): [] 
reschedule_interrupt+0x30/0x3c
 loop: Write error at byte offset 438860288, length 1536.
 softirqs last  enabled at (83389244): [] __do_softirq+0x1ae/0x476
 softirqs last disabled at (83389237): [] 
do_softirq_own_stack+0x2c/0x40
 loop: Write error at byte offset 438990848, length 2048.
 
 WARNING: inconsistent lock state
 4.12.0-rc2+ #30 Tainted: G   O
 
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (&(&sbi->cp_lock)->rlock){?.+...}, at: [] 
f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
 {HARDIRQ-ON-W} state was registered at:
   __lock_acquire+0x527/0x7b0
   lock_acquire+0xae/0x220
   _raw_spin_lock+0x42/0x50
   do_checkpoint+0x165/0x9e0 [f2fs]
   write_checkpoint+0x33f/0x740 [f2fs]
   __f2fs_sync_fs+0x92/0x1f0 [f2fs]
   f2fs_sync_fs+0x12/0x20 [f2fs]
   sync_filesystem+0x67/0x80
   generic_shutdown_super+0x27/0x100
   kill_block_super+0x22/0x50
   kill_f2fs_super+0x3a/0x40 [f2fs]
   deactivate_locked_super+0x3d/0x70
   deactivate_super+0x40/0x60
   cleanup_mnt+0x39/0x70
   __cleanup_mnt+0x10/0x20
   task_work_run+0x69/0x80
   exit_to_usermode_loop+0x57/0x85
   do_fast_syscall_32+0x18c/0x1b0
   entry_SYSENTER_32+0x4c/0x7b
 irq event stamp: 1957420
 hardirqs last  enabled at (1957419): [] 
_raw_spin_unlock_irq+0x27/0x50
 hardirqs last disabled at (1957420): [] 
call_function_single_interrupt+0x30/0x3c
 softirqs last  enabled at (1953784): [] __do_softirq+0x1ae/0x476
 softirqs last disabled at (1953773): [] 
do_softirq_own_stack+0x2c/0x40

 other info that might help us debug this:
  Possible unsafe locking scenario:

CPU0

   lock(&(&sbi->cp_lock)->rlock);
   
 lock(&(&sbi->cp_lock)->rlock);

  *** DEADLOCK ***

 2 locks held by xfs_io/7959:
  #0:  (sb_writers#13){.+.+.+}, at: [] vfs_write+0x16a/0x190
  #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [] 
f2fs_file_write_iter+0x25/0x140 [f2fs]

 stack backtrace:
 CPU: 2 PID: 7959 Comm: xfs_io Tainted: G   O4.12.0-rc2+ #30
 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 Call Trace:
  dump_stack+0x5f/0x92
  print_usage_bug+0x1d3/0x1dd
  ? check_usage_backwards+0xe0/0xe0
  mark_lock+0x23d/0x280
  __lock_acquire+0x699/0x7b0
  ? __this_cpu_preempt_check+0xf/0x20
  ? trace_hardirqs_off_caller+0x91/0xe0
  lock_acquire+0xae/0x220
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  _raw_spin_lock+0x42/0x50
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_write_end_io+0x147/0x150 [f2fs]
  bio_endio+0x7a/0x1e0
  blk_update_request+0xad/0x410
  blk_mq_end_request+0x16/0x60
  lo_complete_rq+0x3c/0x70
  __blk_mq_complete_request_remote+0x11/0x20
  flush_smp_call_function_queue+0x6d/0x120
  ? debug_smp_processor_id+0x12/0x20
  generic_smp_call_function_single_interrupt+0x12/0x30
  smp_call_function_single_interrupt+0x25/0x40
  call_function_single_interrupt+0x37/0x3c
 EIP: _raw_spin_unlock_irq+0x2d/0x50
 EFLAGS: 0296 CPU: 2
 EAX: 0001 EBX: d2ccc51c ECX: 0001 EDX: c1aacebd
 ESI:  EDI:  EBP: c96c9d1c ESP: c96c9d18
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  ? inherit_task_group.isra.98.part.99+0x6b/0xb0
  __add_to_page_cache_locked+0x1d4/0x290
  add_to_page_cache_lru+0x38/0xb0
  pagecache_get_page+0x8e/0x200
  f2fs_write_begin+0x96/0xf00 [f2fs]
  ? trace_hardirqs_on_caller+0xdd/0x1c0
  ? current_time+0x17/0x50
  ? trace_hardirqs_on+0xb/0x10
  generic_perform_write+0xa9/0x170
  __generic_file_write_iter+0x1a2/0x1f0
  ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
  f2fs_file_write_iter+0x6e/0x140 [f2fs]
  ? __lock_acquire+0x429/0x7b0
  __vfs_write+0xc1/0x140
  vfs_write+0x9b/0x190
  SyS_pwrite64+0x63/0xa0
  do_fast_syscall_32+0xa1/0x1b0
  entry_SYSENTER_32+0x4c/0x7b
 EIP: 0xb7786c61
 EFLAGS: 0293 CPU: 2
 EAX: ffda EBX: 0003 ECX: 08416000 EDX: 1000
 ESI: 18b24000 EDI:  EBP: 0003 ESP: bf9b36b0
  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b

Fixes: aaec2b1d1879 ("f2fs: introduce cp_lock to protect updating of 
ckpt_flags")
Cc: sta...@vger.kernel.org
Signed-off-by: Chao Yu 
---
v2: spread spin_{,un}lock_irq{save,restore}.
 fs/f2fs/checkpoint.c | 11 ++-
 fs/f2fs/f2fs.h   | 18 --
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 954917d582f8..56bbf592e487 10064

[PATCH] drm/tegra: sor: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/tegra/sor.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/tegra/sor.c b/drivers/gpu/drm/tegra/sor.c
index a8f5289..ca3d9b1 100644
--- a/drivers/gpu/drm/tegra/sor.c
+++ b/drivers/gpu/drm/tegra/sor.c
@@ -2538,6 +2538,10 @@ static int tegra_sor_probe(struct platform_device *pdev)
int err;
 
match = of_match_device(tegra_sor_of_match, &pdev->dev);
+   if (!match) {
+   dev_err(&pdev->dev, "failed to match device\n");
+   return -ENODEV;
+   }
 
sor = devm_kzalloc(&pdev->dev, sizeof(*sor), GFP_KERNEL);
if (!sor)
-- 
2.5.0

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Kees Cook

On Thu, Jul 6, 2017 at 11:02 PM, Linus Torvalds
 wrote:
> So 2+MB is still definitely something people can do (and probably *do* do).

With the default 8MB stack, most people are already limited to 2MB
here. I guess the question is, do people raise their stack rlimit to
gain more arguments? Should I pick a different value for the args?

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH] mm: make allocation counters per-order

2017-07-06 Thread kbuild test robot

Hi Roman,

[auto build test WARNING on mmotm/master]
[also build test WARNING on next-20170706]
[cannot apply to v4.12]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Roman-Gushchin/mm-make-allocation-counters-per-order/20170707-091630
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: s390-default_defconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=s390 

All warnings (new ones prefixed by >>):

   mm/memcontrol.c: In function 'memory_stat_show':
>> mm/memcontrol.c:5262:1: warning: the frame size of 1136 bytes is larger than 
>> 1024 bytes [-Wframe-larger-than=]
}
^

vim +5262 mm/memcontrol.c

f10eb7534 Roman Gushchin  2017-06-30  5246 
events[PGSCAN_DIRECT]);
f10eb7534 Roman Gushchin  2017-06-30  5247  seq_printf(m, "pgsteal %lu\n", 
events[PGSTEAL_KSWAPD] +
f10eb7534 Roman Gushchin  2017-06-30  5248 
events[PGSTEAL_DIRECT]);
f10eb7534 Roman Gushchin  2017-06-30  5249  seq_printf(m, "pgactivate 
%lu\n", events[PGACTIVATE]);
f10eb7534 Roman Gushchin  2017-06-30  5250  seq_printf(m, "pgdeactivate 
%lu\n", events[PGDEACTIVATE]);
f10eb7534 Roman Gushchin  2017-06-30  5251  seq_printf(m, "pglazyfree 
%lu\n", events[PGLAZYFREE]);
f10eb7534 Roman Gushchin  2017-06-30  5252  seq_printf(m, "pglazyfreed 
%lu\n", events[PGLAZYFREED]);
f10eb7534 Roman Gushchin  2017-06-30  5253  
2a2e48854 Johannes Weiner 2017-05-03  5254  seq_printf(m, 
"workingset_refault %lu\n",
71cd31135 Johannes Weiner 2017-05-03  5255 
stat[WORKINGSET_REFAULT]);
2a2e48854 Johannes Weiner 2017-05-03  5256  seq_printf(m, 
"workingset_activate %lu\n",
71cd31135 Johannes Weiner 2017-05-03  5257 
stat[WORKINGSET_ACTIVATE]);
2a2e48854 Johannes Weiner 2017-05-03  5258  seq_printf(m, 
"workingset_nodereclaim %lu\n",
71cd31135 Johannes Weiner 2017-05-03  5259 
stat[WORKINGSET_NODERECLAIM]);
2a2e48854 Johannes Weiner 2017-05-03  5260  
587d9f726 Johannes Weiner 2016-01-20  5261  return 0;
587d9f726 Johannes Weiner 2016-01-20 @5262  }
587d9f726 Johannes Weiner 2016-01-20  5263  
241994ed8 Johannes Weiner 2015-02-11  5264  static struct cftype memory_files[] 
= {
241994ed8 Johannes Weiner 2015-02-11  5265  {
241994ed8 Johannes Weiner 2015-02-11  5266  .name = "current",
f5fc3c5d8 Johannes Weiner 2015-11-05  5267  .flags = 
CFTYPE_NOT_ON_ROOT,
241994ed8 Johannes Weiner 2015-02-11  5268  .read_u64 = 
memory_current_read,
241994ed8 Johannes Weiner 2015-02-11  5269  },
241994ed8 Johannes Weiner 2015-02-11  5270  {

:: The code at line 5262 was first introduced by commit
:: 587d9f726aaec52157e4156e50363dbe6cb82bdb mm: memcontrol: basic memory 
statistics in cgroup2 memory controller

:: TO: Johannes Weiner 
:: CC: Linus Torvalds 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Linus Torvalds

On Thu, Jul 6, 2017 at 10:45 PM, Kees Cook  wrote:
> On Thu, Jul 6, 2017 at 10:36 PM, Andy Lutomirski  wrote:
>>
>> Aren't there real use cases that use many megs of arguments?
>
> They'd be relatively new since the args were pretty limited before.
> I'd be curious to see them.

"megs" yes. "many megs" no.

The traditional kernel limit was 32 pages (so 128kB on x86, explaining
our MAX_ARG value).

We moved to the much nider "two active VM's at the same time" model a
fairly long time ago, though - it was back in v2.6.23 or so. So about
10 years ago.

I would have expected lots of scripts to have been written since that
just end up going *far* over the old 128kB limit, because it's really
easy to do.

Things like big directories and the shell expanding "*" can easily be
a megabyte of arguments. I know I used to have scripts where I had to
use "xargs" in the past, and with the > 128kB change I just stopped,
because "a couple of megabytes" is enough for a lot of things where
128kB wasn't necessarily.

Oh, one example is actually the kernel source tree. I don't do it any
more (because "git grep" is much better), but I used to do things like

grep something $(find . -name '*.[ch]')

all the time.

And that actually currently *just* overflows the 2MB argument size,
but used to work (easily) ten years ago. Oh, how the kernel has
grown..

Yes, yes, *portably* you should always have done

  find . -print0 -name '*.[ch]' | xargs -0 grep

but be honest now: that first thing is what you actually write when
you do some throw-away one-liner.

So 2+MB is still definitely something people can do (and probably *do* do).

 Linus

Re: [PATCH 2/2] net: ethernet: fsl: add phy reset after clk enable option

2017-07-06 Thread Richard Leitner

On 07/07/2017 07:30 AM, Andy Duan wrote:

From: Richard Leitner  Sent: Thursday, July 06, 
2017 9:06 PM

To: Andy Duan ; robh...@kernel.org;
mark.rutl...@arm.com
Cc: net...@vger.kernel.org; devicet...@vger.kernel.org; linux-
ker...@vger.kernel.org; d...@g0hl1n.net; Richard Leitner

Subject: [PATCH 2/2] net: ethernet: fsl: add phy reset after clk enable option

Some PHYs (for example the LAN8710) doesn't allow turning the clocks off and
on again without reset (according to their datasheet). Exactly this behaviour
was introduced for power saving reasons by commit e8fcfcd5684a
("net: fec: optimize the clock management to save power") Therefore add a
devictree option to perform a PHY reset and configuration after every clock
enable.

For a better understanding here's a outline of the time response of the clock
and reset lines before and after this patch:

  v--fec_probe()  v--fec_enet_open()
  v   v
w/o patch eCLK:
___|
w/o patch nRST: __--
w/o patch CONF:
___XX___

w/  patch eCLK:
___|
w/  patch nRST: __--__--
w/  patch CONF:
___XX__XX___
  ^   ^
  ^--fec_probe()  ^--fec_enet_open()

In our case this problem does occur at about every 10th to 50th POR of an
LAN8710 connected to an i.MX6 SoC. The typical symptom of this problem is a
"swinging" ethernet link. Similar issues were experienced by users of the NXP
forum:
https://community.nxp.com/thread/389902
https://community.nxp.com/message/309354
With this patch applied the issue didn't occur for at least a few hundred PORs
of our board.

Fixes: e8fcfcd5684a ("net: fec: optimize the clock management to sa...")
Signed-off-by: Richard Leitner 

...

*ndev, bool enable)
ret = clk_prepare_enable(fep->clk_ref);
if (ret)
goto failed_clk_ref;
+
+   /* reset the phy after clk is enabled if it's configured */
+   if (fep->phy_reset_after_clk_enable) {
+   ret = fec_reset_phy(ndev);
+   if (ret)
+   goto failed_reset;
+   if (ndev->phydev) {
+   ret = phy_init_hw(ndev->phydev);
+   if (ret)
+   goto failed_reset;
+   }
+   }

Since it is common issue so long as using the PHY, can you move it into smsc 
phy driver like in .smsc_phy_reset() function ?
And get the reset pin from phy dts node.

During my first glance at this problem I had the same "feeling" that it 
should go into smsc.c. Therefore I've tried that already, but I haven't 
found a suitable "place" for that. My basic problem is: Where do I know 
(from smsc.c view) when the enetrefclk was disabled/enabled again 
without changing fec_main.c?

Some more points that come into my mind:
 - The smsc_phy_reset function is registered as "soft_reset". Would it 
be OK to use nRST in it?
 - Would it be OK to call the phy_init_hw function from within the 
smsc_phy_reset?
 - IMHO I'd have to move the reset gpio binding inside the phy node 
then. Isn't that a pretty big change doing that for all PHYs/FECs? Would 
it be "worth" it?

Sorry for these many (maybe noobish) questions, but I'm pretty new to 
the kernels fec/phy stuff ;-)

Andy

Thanks & regards,
Richard.L

[PATCH] mmc: mxcmmc: fix error return code in mxcmci_probe()

2017-07-06 Thread Gustavo A. R. Silva

platform_get_irq() returns an error code, but the mxcmmc driver
ignores it and always returns -EINVAL. This is not correct,
and prevents -EPROBE_DEFER from being propagated properly.

Print error message and propagate the return value of
platform_get_irq on failure.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/mmc/host/mxcmmc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/mxcmmc.c b/drivers/mmc/host/mxcmmc.c
index fb3ca82..f3c2832 100644
--- a/drivers/mmc/host/mxcmmc.c
+++ b/drivers/mmc/host/mxcmmc.c
@@ -1014,8 +1014,10 @@ static int mxcmci_probe(struct platform_device *pdev)
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
irq = platform_get_irq(pdev, 0);
-   if (irq < 0)
-   return -EINVAL;
+   if (irq < 0) {
+   dev_err(&pdev->dev, "failed to get IRQ: %d\n", irq);
+   return irq;
+   }
 
mmc = mmc_alloc_host(sizeof(*host), &pdev->dev);
if (!mmc)
-- 
2.5.0

Re: [PATCH v2 4/6] cpufreq: schedutil: update CFS util only if used

2017-07-06 Thread Joel Fernandes

On Tue, Jul 4, 2017 at 10:34 AM, Patrick Bellasi
 wrote:
> Currently the utilization of the FAIR class is collected before locking
> the policy. Although that should not be a big issue for most cases, we
> also don't really know how much latency there can be between the
> utilization reading and its usage.
>
> Let's get the FAIR utilization right before its usage to be better in
> sync with the current status of a CPU.
>
> Signed-off-by: Patrick Bellasi 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Rafael J. Wysocki 
> Cc: Viresh Kumar 
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@vger.kernel.org
> ---
>  kernel/sched/cpufreq_schedutil.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c 
> b/kernel/sched/cpufreq_schedutil.c
> index 98704d8..df433f1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -308,10 +308,9 @@ static void sugov_update_shared(struct update_util_data 
> *hook, u64 time,
> if (unlikely(current == sg_policy->thread))
> return;
>
> -   sugov_get_util(&util, &max);
> -
> raw_spin_lock(&sg_policy->update_lock);
>
> +   sugov_get_util(&util, &max);
> sg_cpu->util = util;
> sg_cpu->max = max;

Is your concern that there will we spinlock contention before calling
sugov_get_util?

If that's the case, with your patch it seems to me such contention
(and hence spinning) itself could cause the utilization to be inflated
- thus calling sugov_get_util after acquiring the lock will not be as
accurate as before. In that case it seems to me its better to let
sugov_get_util be called before acquiring the lock (as before).

thanks,

-Joel

Re: [RFC v2 3/5] hmem: add heterogeneous memory sysfs support

2017-07-06 Thread John Hubbard

On 07/06/2017 02:52 PM, Ross Zwisler wrote:
[...]
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index b1aacfc..31e3f20 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -72,6 +72,7 @@ obj-$(CONFIG_ACPI_PROCESSOR)+= processor.o
>  obj-$(CONFIG_ACPI)   += container.o
>  obj-$(CONFIG_ACPI_THERMAL)   += thermal.o
>  obj-$(CONFIG_ACPI_NFIT)  += nfit/
> +obj-$(CONFIG_ACPI_HMEM)  += hmem/
>  obj-$(CONFIG_ACPI)   += acpi_memhotplug.o
>  obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>  obj-$(CONFIG_ACPI_BATTERY)   += battery.o

Hi Ross,

Following are a series of suggestions, intended to clarify naming just
enough so that, when Jerome's HMM patchset lands, we'll be able to
tell the difference between the two types of Heterogeneous Memory.


> diff --git a/drivers/acpi/hmem/Kconfig b/drivers/acpi/hmem/Kconfig
> new file mode 100644
> index 000..09282be
> --- /dev/null
> +++ b/drivers/acpi/hmem/Kconfig
> @@ -0,0 +1,7 @@
> +config ACPI_HMEM
> + bool "ACPI Heterogeneous Memory Support"

How about:

   bool "ACPI Heterogeneous Memory Attribute Table Support"

The idea here, and throughout, is that this type of 
Heterogeneous Memory support is all about "the Heterogeneous Memory
that you found via ACPI's Heterogeneous Memory Attribute Table".

That's different from "the Heterogeneous Memory that you found
when you installed a PCIe device that supports HMM". Or, at least
it is different, until the day that someone decides to burn in
support for an HMM device, into the ACPI tables. Seems unlikely,
though. :) And even so, I think it would still work.


> + depends on ACPI_NUMA
> + depends on SYSFS
> + help
> +   Exports a sysfs representation of the ACPI Heterogeneous Memory
> +   Attributes Table (HMAT).
> diff --git a/drivers/acpi/hmem/Makefile b/drivers/acpi/hmem/Makefile
> new file mode 100644
> index 000..d2aa546
> --- /dev/null
> +++ b/drivers/acpi/hmem/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_ACPI_HMEM) := hmem.o
> +hmem-y := core.o initiator.o target.o
> diff --git a/drivers/acpi/hmem/core.c b/drivers/acpi/hmem/core.c
> new file mode 100644
> index 000..f7638db
> --- /dev/null
> +++ b/drivers/acpi/hmem/core.c
> @@ -0,0 +1,569 @@
> +/*
> + * Heterogeneous memory representation in sysfs

Heterogeneous memory, as discovered via ACPI's Heterogeneous Memory
Attribute Table: representation in sysfs

> + *
> + * Copyright (c) 2017, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#include 

[...]

> diff --git a/drivers/acpi/hmem/hmem.h b/drivers/acpi/hmem/hmem.h
> new file mode 100644
> index 000..38ff540
> --- /dev/null
> +++ b/drivers/acpi/hmem/hmem.h
> @@ -0,0 +1,47 @@
> +/*
> + * Heterogeneous memory representation in sysfs

Heterogeneous memory, as discovered via ACPI's Heterogeneous Memory
Attribute Table: representation in sysfs

> + *
> + * Copyright (c) 2017, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#ifndef _ACPI_HMEM_H_
> +#define _ACPI_HMEM_H_
> +
> +struct memory_initiator {
> + struct list_head list;
> + struct device dev;
> +
> + /* only one of the following three will be set */
> + struct acpi_srat_cpu_affinity *cpu;
> + struct acpi_srat_x2apic_cpu_affinity *x2apic;
> + struct acpi_srat_gicc_affinity *gicc;
> +
> + int pxm;
> + bool is_registered;
> +};
> +#define to_memory_initiator(d) container_of((d), struct memory_initiator, 
> dev)
> +
> +struct memory_target {
> + struct list_head list;
> + struct device dev;
> + struct acpi_srat_mem_affinity *ma;
> + struct acpi_hmat_address_range *spa;
> + struct memory_initiator *local_init;
> +
> + bool is_cached;
> + bool is_registered;
> +};
> +#define to_memory_target(d) container_of((d), struct memory_target, dev)
> +
> +extern const struct attribute_group *memory_initiator_attribute_groups[];
> +extern const struct attribute_group *memory_target_attribute_groups[];
> +#endif /* _ACPI_HMEM_H_ */
> diff --git a/drivers/acpi/hmem/initiator.c b/drivers/acpi/hmem/i

[PATCH] drm/udl: Make page_flip asynchronous

2017-07-06 Thread Dawid Kurek

In page_flip vblank is sent with no delay. Driver does not know when the
actual update is present on the display and has no means for getting
this information from a device. It is practically impossible to say
exactly *when* as there is also i.e. a usb delay.

When we are unable to determine when the vblank actually happens we may
assume it will behave accordingly, i.e. it will present frames with
proper timing. In the worst case scenario it should take up to duration
of one frame (we may get new frame in the device just after presenting
current one so we would need to wait for the whole frame).

Because of the asynchronous nature of the delay we need to synchronize:
 * read/write vrefresh/page_flip data when changing mode and
   preparing/executing vblank
 * USB requests to prevent interleaved access to URBs for two different
   frame buffers

All those changes are backports from ChromeOS:
  1. https://chromium-review.googlesource.com/250622
  2. https://chromium-review.googlesource.com/249450
  partially, only change in udl_modeset.c for 'udl_flip_queue'
  3. https://chromium-review.googlesource.com/321378
  4. https://chromium-review.googlesource.com/324119
+ fixes for checkpatch and latest drm changes

Cc: h...@chromium.org
Cc: marc...@chromium.org
Cc: za...@chromium.org
Cc: db...@google.com
Signed-off-by: Dawid Kurek 
---
 drivers/gpu/drm/udl/udl_drv.h |   4 +
 drivers/gpu/drm/udl/udl_fb.c  |  28 ---
 drivers/gpu/drm/udl/udl_main.c|   3 +
 drivers/gpu/drm/udl/udl_modeset.c | 160 ++
 4 files changed, 171 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/udl/udl_drv.h b/drivers/gpu/drm/udl/udl_drv.h
index 2a75ab8..b93fb8d 100644
--- a/drivers/gpu/drm/udl/udl_drv.h
+++ b/drivers/gpu/drm/udl/udl_drv.h
@@ -47,6 +47,7 @@ struct urb_list {
 };
 
 struct udl_fbdev;
+struct udl_flip_queue;
 
 struct udl_device {
struct device *dev;
@@ -66,6 +67,9 @@ struct udl_device {
atomic_t bytes_identical; /* saved effort with backbuffer comparison */
atomic_t bytes_sent; /* to usb, after compression including overhead */
atomic_t cpu_kcycles_used; /* transpired during pixel processing */
+
+   struct udl_flip_queue *flip_queue;
+   struct mutex transfer_lock; /* to serialize transfers */
 };
 
 struct udl_gem_object {
diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c
index 4a65003..6dd9a49 100644
--- a/drivers/gpu/drm/udl/udl_fb.c
+++ b/drivers/gpu/drm/udl/udl_fb.c
@@ -82,7 +82,7 @@ int udl_handle_damage(struct udl_framebuffer *fb, int x, int 
y,
 {
struct drm_device *dev = fb->base.dev;
struct udl_device *udl = dev->dev_private;
-   int i, ret;
+   int i, ret = 0;
char *cmd;
cycles_t start_cycles, end_cycles;
int bytes_sent = 0;
@@ -91,18 +91,22 @@ int udl_handle_damage(struct udl_framebuffer *fb, int x, 
int y,
int aligned_x;
int bpp = fb->base.format->cpp[0];
 
+   mutex_lock(&udl->transfer_lock);
+
if (!fb->active_16)
-   return 0;
+   goto out;
 
if (!fb->obj->vmapping) {
ret = udl_gem_vmap(fb->obj);
if (ret == -ENOMEM) {
DRM_ERROR("failed to vmap fb\n");
-   return 0;
+   ret = 0;
+   goto out;
}
if (!fb->obj->vmapping) {
DRM_ERROR("failed to vmapping\n");
-   return 0;
+   ret = 0;
+   goto out;
}
}
 
@@ -112,14 +116,18 @@ int udl_handle_damage(struct udl_framebuffer *fb, int x, 
int y,
 
if ((width <= 0) ||
(x + width > fb->base.width) ||
-   (y + height > fb->base.height))
-   return -EINVAL;
+   (y + height > fb->base.height)) {
+   ret = -EINVAL;
+   goto out;
+   }
 
start_cycles = get_cycles();
 
urb = udl_get_urb(dev);
-   if (!urb)
-   return 0;
+   if (!urb) {
+   ret = 0;
+   goto out;
+   }
cmd = urb->transfer_buffer;
 
for (i = y; i < y + height ; i++) {
@@ -151,7 +159,9 @@ int udl_handle_damage(struct udl_framebuffer *fb, int x, 
int y,
>> 10)), /* Kcycles */
   &udl->cpu_kcycles_used);
 
-   return 0;
+out:
+   mutex_unlock(&udl->transfer_lock);
+   return ret;
 }
 
 static int udl_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
diff --git a/drivers/gpu/drm/udl/udl_main.c b/drivers/gpu/drm/udl/udl_main.c
index a9d93b8..d376233 100644
--- a/drivers/gpu/drm/udl/udl_main.c
+++ b/drivers/gpu/drm/udl/udl_main.c
@@ -319,6 +319,8 @@ int udl_driver_load(struct drm_device *dev, unsigned long 
flags)
if (!udl)
return -ENOMEM;
 
+   mutex_init(&udl->transfer_lock);
+
udl->udev = ud

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Kees Cook

On Thu, Jul 6, 2017 at 10:39 PM, Linus Torvalds
 wrote:
> On Thu, Jul 6, 2017 at 10:15 PM, Kees Cook  wrote:
>>
>> I always say this backwards. :P Default is top-down (allocate at high
>> addresses and work down toward low). With unlimited stack, allocations
>> start at low addresses and work up. Here's the results (shown with
>> randomize_va_space sysctl set to 0):
>
> But this doesn't affect the stack layout itself.
>
> So we could do the stack copying without much caring, because that
> happens first, right?
>
> So I think we can do all the envp/argv copying first, and then - as we
> change the credentials, change the rlimit. And the string copies
> wouldn't need to care much - although I guess they are also fine
> checking against a possible *smaller* stack rlimit, which is actually
> what we'd want.

Yup, agreed.

> And I think the credentials switch (which is the point of no return
> anyway) happens before we start mmap'ing the executable etc. We used
> to have some odd code there and do it in the completely wrong order
> (checking that the binary was executable for the *old* user, which
> makes no sense, iirc)

Yeah, it all happens in setup_new_exec(). The first thing is layout
selection, then switching credentials. It could be made to take a hint
from GNU_STACK (which was parsed before setup_new_exec() is called),
check security_bprm_secureexec() and then make the rlimit changes, all
before the layout selection.

> So I'm getting the sense that none of this should be a problem.
>
> But it's entirely possible that I missed something, and am just full
> of shit. Our execve() path has traditionally been very hard to read.
> It's actually gotten a bit better, but the whole "jump back and forth
> between the generic fs/exec.c code and the binfmt code" is certainly
> still there.

Yeah, there might be something else lurking here, but so far, I'm
satisfied this will work too.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v2] ASoC: samsung: i2s: Support more resolution rates

2017-07-06 Thread Krzysztof Kozlowski

On Fri, Jul 07, 2017 at 10:31:10AM +0900, Jaechul Lee wrote:
> This driver can support more frequencies over 96KHz. There are no reasons
> to limit the frequency range below 96KHz. If codecs/amps or something else
> can't support higher resolution rates, the constraints would be set rates
> properly because each drivers have its own limits.
> 
> I added the 'pcm_rates' field to the dai_data to be set rates by the
> compatibilities. As a result, rates will be set each devices respectively.
> For example of exynos5433, rates will be set from 8KHz to 192KHz.

Code looks okay but commtit message could be still improved. Simply, the
goal of this change is to add 192 kHz ferquency to Exynos5433 and
Exynos7 DAI.

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof


> 
> Signed-off-by: Jaechul Lee 
> ---
> v2:
>  - changed the name of variable to pcm_rates. 
>  
>  - removed duplicated code.
>  - modified commit message.
> ---
>  sound/soc/samsung/i2s.c | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/sound/soc/samsung/i2s.c b/sound/soc/samsung/i2s.c
> index af3ba4d4ccc5..c9f87f7bae99 100644
> --- a/sound/soc/samsung/i2s.c
> +++ b/sound/soc/samsung/i2s.c
> @@ -50,6 +50,7 @@ struct samsung_i2s_variant_regs {
>  
>  struct samsung_i2s_dai_data {
>   u32 quirks;
> + unsigned int pcm_rates;
>   const struct samsung_i2s_variant_regs *i2s_variant_regs;
>  };
>  
> @@ -1076,13 +1077,13 @@ static const struct snd_soc_component_driver 
> samsung_i2s_component = {
>   .name   = "samsung-i2s",
>  };
>  
> -#define SAMSUNG_I2S_RATESSNDRV_PCM_RATE_8000_96000
> -
>  #define SAMSUNG_I2S_FMTS (SNDRV_PCM_FMTBIT_S8 | \
>   SNDRV_PCM_FMTBIT_S16_LE | \
>   SNDRV_PCM_FMTBIT_S24_LE)
>  
> -static struct i2s_dai *i2s_alloc_dai(struct platform_device *pdev, bool sec)
> +static struct i2s_dai *i2s_alloc_dai(struct platform_device *pdev,
> + const struct samsung_i2s_dai_data *i2s_dai_data,
> + bool sec)
>  {
>   struct i2s_dai *i2s;
>  
> @@ -1101,13 +1102,13 @@ static struct i2s_dai *i2s_alloc_dai(struct 
> platform_device *pdev, bool sec)
>   i2s->i2s_dai_drv.resume = i2s_resume;
>   i2s->i2s_dai_drv.playback.channels_min = 1;
>   i2s->i2s_dai_drv.playback.channels_max = 2;
> - i2s->i2s_dai_drv.playback.rates = SAMSUNG_I2S_RATES;
> + i2s->i2s_dai_drv.playback.rates = i2s_dai_data->pcm_rates;
>   i2s->i2s_dai_drv.playback.formats = SAMSUNG_I2S_FMTS;
>  
>   if (!sec) {
>   i2s->i2s_dai_drv.capture.channels_min = 1;
>   i2s->i2s_dai_drv.capture.channels_max = 2;
> - i2s->i2s_dai_drv.capture.rates = SAMSUNG_I2S_RATES;
> + i2s->i2s_dai_drv.capture.rates = i2s_dai_data->pcm_rates;
>   i2s->i2s_dai_drv.capture.formats = SAMSUNG_I2S_FMTS;
>   }
>   return i2s;
> @@ -1242,7 +1243,7 @@ static int samsung_i2s_probe(struct platform_device 
> *pdev)
>   i2s_dai_data = (struct samsung_i2s_dai_data *)
>   platform_get_device_id(pdev)->driver_data;
>  
> - pri_dai = i2s_alloc_dai(pdev, false);
> + pri_dai = i2s_alloc_dai(pdev, i2s_dai_data, false);
>   if (!pri_dai) {
>   dev_err(&pdev->dev, "Unable to alloc I2S_pri\n");
>   return -ENOMEM;
> @@ -1316,7 +1317,7 @@ static int samsung_i2s_probe(struct platform_device 
> *pdev)
>   goto err_disable_clk;
>  
>   if (quirks & QUIRK_SEC_DAI) {
> - sec_dai = i2s_alloc_dai(pdev, true);
> + sec_dai = i2s_alloc_dai(pdev, i2s_dai_data, true);
>   if (!sec_dai) {
>   dev_err(&pdev->dev, "Unable to alloc I2S_sec\n");
>   ret = -ENOMEM;
> @@ -1452,29 +1453,34 @@ static const struct samsung_i2s_variant_regs 
> i2sv5_i2s1_regs = {
>  
>  static const struct samsung_i2s_dai_data i2sv3_dai_type = {
>   .quirks = QUIRK_NO_MUXPSR,
> + .pcm_rates = SNDRV_PCM_RATE_8000_96000,
>   .i2s_variant_regs = &i2sv3_regs,
>  };
>  
>  static const struct samsung_i2s_dai_data i2sv5_dai_type = {
>   .quirks = QUIRK_PRI_6CHAN | QUIRK_SEC_DAI | QUIRK_NEED_RSTCLR |
>   QUIRK_SUPPORTS_IDMA,
> + .pcm_rates = SNDRV_PCM_RATE_8000_96000,
>   .i2s_variant_regs = &i2sv3_regs,
>  };
>  
>  static const struct samsung_i2s_dai_data i2sv6_dai_type = {
>   .quirks = QUIRK_PRI_6CHAN | QUIRK_SEC_DAI | QUIRK_NEED_RSTCLR |
>   QUIRK_SUPPORTS_TDM | QUIRK_SUPPORTS_IDMA,
> + .pcm_rates = SNDRV_PCM_RATE_8000_96000,
>   .i2s_variant_regs = &i2sv6_regs,
>  };
>  
>  static const struct samsung_i2s_dai_data i2sv7_dai_type = {
>   .quirks = QUIRK_PRI_6CHAN | QUIRK_SEC_DAI | QUIRK_NEED_RSTCLR |
>   QUIRK_SUPPORTS_TDM,
> + .pcm_rates = SNDRV_PCM_RATE_8000_1

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Kees Cook

On Thu, Jul 6, 2017 at 10:36 PM, Andy Lutomirski  wrote:
> On Thu, Jul 6, 2017 at 10:15 PM, Kees Cook  wrote:
>> On Thu, Jul 6, 2017 at 10:10 PM, Kees Cook  wrote:
>>> On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski  wrote:
 How about a much simpler solution: don't read rlimit at all in
 copy_strings(), let alone try to enforce it.  Instead, just before the
 point of no return, check how much stack space is already used and, if
 it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
 abort.  Sure, this adds overhead if we're going to abort, but does
 that really matter?
>>>
>>> We should avoid using up tons of memory and then failing. Better to
>>> cap it as we use it. Plumbing a sane value into this shouldn't be hard
>>> at all. Just making this a hardcoded 2MB seems sane (1/4 of 8MB).
>
> Aren't there real use cases that use many megs of arguments?

They'd be relatively new since the args were pretty limited before.
I'd be curious to see them.

> We could probably get away with saying max(rlimit(RLIMIT_STACK), 2MB)
> as long as we make sure later on that we don't screw up if we've
> overallocated?

min, not max, but yeah. Here's part of what I have for get_arg_page():

rlim = current->signal->rlim;
-   if (size > READ_ONCE(rlim[RLIMIT_STACK].rlim_cur) / 4)
+   arg_stack = READ_ONCE(rlim[RLIMIT_STACK].rlim_cur);
+   arg_stack = min_t(unsigned long, arg_stack, _STK_LIM) / 4;
+   if (size > arg_stack)
goto fail;

>>> IIUC, this is a big deal on 32-bit. Unlimited stack triggers top-down
>>> mmap instead of bottom-up. I mean, I'd be delighted to get rid of
>>> this, but I thought it was relied on by userspace.
>>
>> I always say this backwards. :P Default is top-down (allocate at high
>> addresses and work down toward low). With unlimited stack, allocations
>> start at low addresses and work up. Here's the results (shown with
>> randomize_va_space sysctl set to 0):
>
> Uhh, crikey!  Where's the code that does that?

That was the call path I quoted earlier:

> The stack rlimit defines the mmap layout too:
>
> do_execveat_common() ->
> exec_binprm() ->
> search_binary_handler() ->
> fmt->load_binary (load_elf_binary()) ->
> setup_new_exec() ->
> arch_pick_mmap_layout() ->
> mmap_is_legacy() ->
> rlimit(RLIMIT_STACK) == RLIM_INFINITY

i.e. arch_pick_mmap_layout().

-Kees

-- 
Kees Cook
Pixel Security

[PATCH] staging: rtl8712: fix "Alignment match open parenthesis"

2017-07-06 Thread Arushi Singhal

Fix checkpatch issues: "CHECK: Alignment should match open parenthesis".

Signed-off-by: Arushi Singhal 
---
 drivers/staging/rtl8712/mlme_linux.c| 4 ++--
 drivers/staging/rtl8712/rtl8712_cmd.c   | 2 +-
 drivers/staging/rtl8712/rtl8712_efuse.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8712/mlme_linux.c 
b/drivers/staging/rtl8712/mlme_linux.c
index 2037265..a077069 100644
--- a/drivers/staging/rtl8712/mlme_linux.c
+++ b/drivers/staging/rtl8712/mlme_linux.c
@@ -111,8 +111,8 @@ void r8712_os_indicate_disconnect(struct _adapter *adapter)
 */
 
memcpy(&backupPMKIDList[0],
-   &adapter->securitypriv.PMKIDList[0],
-   sizeof(struct RT_PMKID_LIST) * NUM_PMKID_CACHE);
+  &adapter->securitypriv.PMKIDList[0],
+  sizeof(struct RT_PMKID_LIST) * NUM_PMKID_CACHE);
backupPMKIDIndex = adapter->securitypriv.PMKIDIndex;
backupTKIPCountermeasure =
adapter->securitypriv.btkip_countermeasure;
diff --git a/drivers/staging/rtl8712/rtl8712_cmd.c 
b/drivers/staging/rtl8712/rtl8712_cmd.c
index 5346c65..0104ace 100644
--- a/drivers/staging/rtl8712/rtl8712_cmd.c
+++ b/drivers/staging/rtl8712/rtl8712_cmd.c
@@ -385,7 +385,7 @@ int r8712_cmd_thread(void *context)
if (blnPending)
wr_sz += 8;   /* Append 8 bytes */
r8712_write_mem(padapter, RTL8712_DMA_H2CCMD, wr_sz,
-  (u8 *)pdesc);
+   (u8 *)pdesc);
pcmdpriv->cmd_seq++;
if (pcmd->cmdcode == GEN_CMD_CODE(_CreateBss)) {
pcmd->res = H2C_SUCCESS;
diff --git a/drivers/staging/rtl8712/rtl8712_efuse.c 
b/drivers/staging/rtl8712/rtl8712_efuse.c
index 205298e..d90213e 100644
--- a/drivers/staging/rtl8712/rtl8712_efuse.c
+++ b/drivers/staging/rtl8712/rtl8712_efuse.c
@@ -347,7 +347,7 @@ static u8 fix_header(struct _adapter *padapter, u8 header, 
u16 header_addr)
ret = false;
if (value == 0xFF) /* write again */
efuse_one_byte_write(padapter, addr,
-   pkt.data[i * 2]);
+pkt.data[i * 2]);
}
if (!efuse_one_byte_read(padapter, addr + 1, &value)) {
ret = false;
-- 
2.7.4

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Linus Torvalds

On Thu, Jul 6, 2017 at 10:15 PM, Kees Cook  wrote:
>
> I always say this backwards. :P Default is top-down (allocate at high
> addresses and work down toward low). With unlimited stack, allocations
> start at low addresses and work up. Here's the results (shown with
> randomize_va_space sysctl set to 0):

But this doesn't affect the stack layout itself.

So we could do the stack copying without much caring, because that
happens first, right?

So I think we can do all the envp/argv copying first, and then - as we
change the credentials, change the rlimit. And the string copies
wouldn't need to care much - although I guess they are also fine
checking against a possible *smaller* stack rlimit, which is actually
what we'd want.

And I think the credentials switch (which is the point of no return
anyway) happens before we start mmap'ing the executable etc. We used
to have some odd code there and do it in the completely wrong order
(checking that the binary was executable for the *old* user, which
makes no sense, iirc)

So I'm getting the sense that none of this should be a problem.

But it's entirely possible that I missed something, and am just full
of shit. Our execve() path has traditionally been very hard to read.
It's actually gotten a bit better, but the whole "jump back and forth
between the generic fs/exec.c code and the binfmt code" is certainly
still there.

 Linus

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Andy Lutomirski

On Thu, Jul 6, 2017 at 10:15 PM, Kees Cook  wrote:
> On Thu, Jul 6, 2017 at 10:10 PM, Kees Cook  wrote:
>> On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski  wrote:
>>> How about a much simpler solution: don't read rlimit at all in
>>> copy_strings(), let alone try to enforce it.  Instead, just before the
>>> point of no return, check how much stack space is already used and, if
>>> it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
>>> abort.  Sure, this adds overhead if we're going to abort, but does
>>> that really matter?
>>
>> We should avoid using up tons of memory and then failing. Better to
>> cap it as we use it. Plumbing a sane value into this shouldn't be hard
>> at all. Just making this a hardcoded 2MB seems sane (1/4 of 8MB).

Aren't there real use cases that use many megs of arguments?

We could probably get away with saying max(rlimit(RLIMIT_STACK), 2MB)
as long as we make sure later on that we don't screw up if we've
overallocated?

>>
>>> I don't see why using rlimit for layout control makes any sense
>>> whatsoever.  Is there some historical reason we need that?  As far as
>>> I can see (on insufficient inspection) is that the kernel is trying to
>>> guarantee that, if we have so much arg crap that our remaining stack
>>> is less than 128k, then we don't exceed our limit by a little bit.
>>
>> IIUC, this is a big deal on 32-bit. Unlimited stack triggers top-down
>> mmap instead of bottom-up. I mean, I'd be delighted to get rid of
>> this, but I thought it was relied on by userspace.
>
> I always say this backwards. :P Default is top-down (allocate at high
> addresses and work down toward low). With unlimited stack, allocations
> start at low addresses and work up. Here's the results (shown with
> randomize_va_space sysctl set to 0):

Uhh, crikey!  Where's the code that does that?

[PATCH] regulator: qcom_smd: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/regulator/qcom_smd-regulator.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/regulator/qcom_smd-regulator.c 
b/drivers/regulator/qcom_smd-regulator.c
index f35994a..940fe1b 100644
--- a/drivers/regulator/qcom_smd-regulator.c
+++ b/drivers/regulator/qcom_smd-regulator.c
@@ -570,6 +570,11 @@ static int rpm_reg_probe(struct platform_device *pdev)
}
 
match = of_match_device(rpm_of_match, &pdev->dev);
+   if (!match) {
+   dev_err(&pdev->dev, "failed to match device\n");
+   return -ENODEV;
+   }
+
for (reg = match->data; reg->name; reg++) {
vreg = devm_kzalloc(&pdev->dev, sizeof(*vreg), GFP_KERNEL);
if (!vreg)
-- 
2.5.0

Re: [RFC v2 0/5] surface heterogeneous memory performance information

2017-07-06 Thread John Hubbard

On 07/06/2017 02:52 PM, Ross Zwisler wrote:
[...]
> 
> The naming collision between Jerome's "Heterogeneous Memory Management
> (HMM)" and this "Heterogeneous Memory (HMEM)" series is unfortunate, but I
> was trying to stick with the word "Heterogeneous" because of the naming of
> the ACPI 6.2 Heterogeneous Memory Attribute Table table.  Suggestions for
> better naming are welcome.
> 

Hi Ross,

Say, most of the places (file names, function and variable names, and even
print statements) where this patchset uses hmem or HMEM, it really seems to
mean, the Heterogeneous Memory Attribute Table. That's not *always* true, but
given that it's a pretty severe naming conflict, how about just changing:

hmem --> hmat
HMEM --> HMAT

...everywhere? Then you still have Heterogeneous Memory in the name, but
there is enough lexical distance (is that a thing? haha) between HMM and HMAT
to keep us all sane. :)

With or without the above suggestion, there are a few places (Kconfig, comments,
prints) where we can more easily make it clear that HMM != HMEM (or HMAT), 
so for those I can just comment on them separately in the individual patches.

thanks,
john h

[PATCH] regulator: qcom_rpm-regulator: add NULL check on of_match_device() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to of_match_device()
in order to prevent a NULL pointer dereference.

In case of NULL print error message and return.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/regulator/qcom_rpm-regulator.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/regulator/qcom_rpm-regulator.c 
b/drivers/regulator/qcom_rpm-regulator.c
index 1b2acc4..88dc0b0 100644
--- a/drivers/regulator/qcom_rpm-regulator.c
+++ b/drivers/regulator/qcom_rpm-regulator.c
@@ -959,6 +959,11 @@ static int rpm_reg_probe(struct platform_device *pdev)
}
 
match = of_match_device(rpm_of_match, &pdev->dev);
+   if (!match) {
+   dev_err(&pdev->dev, "failed to match device\n");
+   return -ENODEV;
+   }
+
for (reg = match->data; reg->name; reg++) {
vreg = devm_kmalloc(&pdev->dev, sizeof(*vreg), GFP_KERNEL);
if (!vreg)
-- 
2.5.0

RE: [PATCH 2/2] net: ethernet: fsl: add phy reset after clk enable option

2017-07-06 Thread Andy Duan

From: Richard Leitner  Sent: Thursday, July 06, 
2017 9:06 PM
>To: Andy Duan ; robh...@kernel.org;
>mark.rutl...@arm.com
>Cc: net...@vger.kernel.org; devicet...@vger.kernel.org; linux-
>ker...@vger.kernel.org; d...@g0hl1n.net; Richard Leitner
>
>Subject: [PATCH 2/2] net: ethernet: fsl: add phy reset after clk enable option
>
>Some PHYs (for example the LAN8710) doesn't allow turning the clocks off and
>on again without reset (according to their datasheet). Exactly this behaviour
>was introduced for power saving reasons by commit e8fcfcd5684a
>("net: fec: optimize the clock management to save power") Therefore add a
>devictree option to perform a PHY reset and configuration after every clock
>enable.
>
>For a better understanding here's a outline of the time response of the clock
>and reset lines before and after this patch:
>
> v--fec_probe()  v--fec_enet_open()
> v   v
>   w/o patch eCLK:
>___|
>   w/o patch nRST: __--
>   w/o patch CONF:
>___XX___
>
>   w/  patch eCLK:
>___|
>   w/  patch nRST: __--__--
>   w/  patch CONF:
>___XX__XX___
> ^   ^
> ^--fec_probe()  ^--fec_enet_open()
>
>In our case this problem does occur at about every 10th to 50th POR of an
>LAN8710 connected to an i.MX6 SoC. The typical symptom of this problem is a
>"swinging" ethernet link. Similar issues were experienced by users of the NXP
>forum:
>   https://community.nxp.com/thread/389902
>   https://community.nxp.com/message/309354
>With this patch applied the issue didn't occur for at least a few hundred PORs
>of our board.
>
>Fixes: e8fcfcd5684a ("net: fec: optimize the clock management to sa...")
>Signed-off-by: Richard Leitner 
>---
> Documentation/devicetree/bindings/net/fsl-fec.txt |  3 +++
> drivers/net/ethernet/freescale/fec.h  |  1 +
> drivers/net/ethernet/freescale/fec_main.c | 16 
> 3 files changed, 20 insertions(+)
>
>diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt
>b/Documentation/devicetree/bindings/net/fsl-fec.txt
>index 6f55bdd..1766579 100644
>--- a/Documentation/devicetree/bindings/net/fsl-fec.txt
>+++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
>@@ -23,6 +23,9 @@ Optional properties:
> - phy-handle : phandle to the PHY device connected to this device.
> - fixed-link : Assume a fixed link. See fixed-link.txt in the same directory.
>   Use instead of phy-handle.
>+- phy-reset-after-clk-enable : If present then a phy reset and
>+configuration
>+  will be performed everytime after the clocks are enabled. This is
>+required
>+  for some PHYs to work properly.
> - fsl,num-tx-queues : The property is valid for enet-avb IP, which supports
>   hw multi queues. Should specify the tx queue number, otherwise set tx
>queue
>   number to 1.
>diff --git a/drivers/net/ethernet/freescale/fec.h
>b/drivers/net/ethernet/freescale/fec.h
>index 3da8084..d4f658a 100644
>--- a/drivers/net/ethernet/freescale/fec.h
>+++ b/drivers/net/ethernet/freescale/fec.h
>@@ -538,6 +538,7 @@ struct fec_enet_private {
>   int phy_reset_duration;
>   int phy_reset_post_delay;
>   boolphy_reset_active_high;
>+  boolphy_reset_after_clk_enable;
>
>   struct  napi_struct napi;
>   int csum_flags;
>diff --git a/drivers/net/ethernet/freescale/fec_main.c
>b/drivers/net/ethernet/freescale/fec_main.c
>index cbbf7b8..4e744f3 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -68,6 +68,7 @@
>
> static void set_multicast_list(struct net_device *ndev);  static void
>fec_enet_itr_coal_init(struct net_device *ndev);
>+static int fec_reset_phy(struct net_device *ndev);
>
> #define DRIVER_NAME   "fec"
>
>@@ -1862,6 +1863,18 @@ static int fec_enet_clk_enable(struct net_device
>*ndev, bool enable)
>   ret = clk_prepare_enable(fep->clk_ref);
>   if (ret)
>   goto failed_clk_ref;
>+
>+  /* reset the phy after clk is enabled if it's configured */
>+  if (fep->phy_reset_after_clk_enable) {
>+  ret = fec_reset_phy(ndev);
>+  if (ret)
>+  goto failed_reset;
>+  if (ndev->phydev) {
>+  ret = phy_init_hw(ndev->phydev);
>+  if (ret)
>+  goto failed_reset;
>+  }
>+  }

Since it is common issue so long as using the PHY, can you move it into smsc 
phy driver like in .smsc_phy_reset() function ?
And get the reset

Re: [PATCH v2 3/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks

2017-07-06 Thread Joel Fernandes

Hi,

On Wed, Jul 5, 2017 at 6:41 AM, Patrick Bellasi  wrote:
[..]
>
>> But what about clearing the sched-class's flag from .pick_next_task() 
>> callback
>> when they return NULL ?
>>
>> What about something like this instead (completely untested), with which we
>> don't need the 2/3 patch as well:
>
> Just had a fast check but I think something like that can work.
> We had an internal discussion with Brendan (in CC now) which had a
> similar proposal.
>
> Main counter arguments for me was:
> 1. we wanna to reduce the pollution of scheduling classes code with
>schedutil specific code, unless strictly necessary
> 2. we never had a "clear bit" semantics for flags updates
>
> Thus this proposal seemed to me less of a discontinuity wrt the
> current interface. However, something similar to what you propose
> below should also work. Let's collect some more feedback...
>

I was going to say something similar. I feel its much cleaner if the
scheduler clears the flags that it set, thus taking "ownership" of the
flag. I feel that will avoid complication like this where the governor
has to peek into what's currently running and such (and also help with
the suggestion I made for patch 2/3.

Maybe the interface needs an extension for "clear flag" semantics?

thanks,

-Joel

Re: [git pull] vfs.git pile 10

2017-07-06 Thread Linus Torvalds

On Thu, Jul 6, 2017 at 2:11 AM, Al Viro  wrote:
> uaccess str...() dead code removals.

Side note: you left a couple of references to strlen_user() still in the tree.

None of them *matter* (two comments and one declaration for the
function that no longer exists), but it just strikes me as a bit of an
odd oversight.

Are those removals perhaps in other trees (and you didn't want to
clash) or was it literally just "oops, I caught almost everything
except for those three cases"?

 Linus

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Kees Cook

On Thu, Jul 6, 2017 at 10:10 PM, Kees Cook  wrote:
> On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski  wrote:
>> How about a much simpler solution: don't read rlimit at all in
>> copy_strings(), let alone try to enforce it.  Instead, just before the
>> point of no return, check how much stack space is already used and, if
>> it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
>> abort.  Sure, this adds overhead if we're going to abort, but does
>> that really matter?
>
> We should avoid using up tons of memory and then failing. Better to
> cap it as we use it. Plumbing a sane value into this shouldn't be hard
> at all. Just making this a hardcoded 2MB seems sane (1/4 of 8MB).
>
>> I don't see why using rlimit for layout control makes any sense
>> whatsoever.  Is there some historical reason we need that?  As far as
>> I can see (on insufficient inspection) is that the kernel is trying to
>> guarantee that, if we have so much arg crap that our remaining stack
>> is less than 128k, then we don't exceed our limit by a little bit.
>
> IIUC, this is a big deal on 32-bit. Unlimited stack triggers top-down
> mmap instead of bottom-up. I mean, I'd be delighted to get rid of
> this, but I thought it was relied on by userspace.

I always say this backwards. :P Default is top-down (allocate at high
addresses and work down toward low). With unlimited stack, allocations
start at low addresses and work up. Here's the results (shown with
randomize_va_space sysctl set to 0):

$ ulimit -s
8192
$ cat /proc/self/maps
08048000-0805 r-xp  fc:01 843/bin/cat
0805-08051000 r--p 7000 fc:01 843/bin/cat
08051000-08052000 rw-p 8000 fc:01 843/bin/cat
08052000-08073000 rw-p  00:00 0  [heap]
b7be7000-b7de7000 r--p  fc:01 403307 /usr/lib/locale/locale-archive
b7de7000-b7f9a000 r-xp  fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
b7f9a000-b7f9b000 ---p 001b3000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
b7f9b000-b7f9d000 r--p 001b3000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
b7f9d000-b7f9e000 rw-p 001b5000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
b7f9e000-b7fa1000 rw-p  00:00 0
b7fb2000-b7fd7000 rw-p  00:00 0
b7fd7000-b7fd9000 r--p  00:00 0  [vvar]
b7fd9000-b7fdb000 r-xp  00:00 0  [vdso]
b7fdb000-b7ffd000 r-xp  fc:01 276959 /lib/i386-linux-gnu/ld-2.24.so
b7ffd000-b7ffe000 r--p 002d7000 fc:01 403307 /usr/lib/locale/locale-archive
b7ffe000-b7fff000 r--p 00022000 fc:01 276959 /lib/i386-linux-gnu/ld-2.24.so
b7fff000-b800 rw-p 00023000 fc:01 276959 /lib/i386-linux-gnu/ld-2.24.so
bffdf000-c000 rw-p  00:00 0  [stack]
$ ulimit -s unlimited
$ cat /proc/self/maps
08048000-0805 r-xp  fc:01 843/bin/cat
0805-08051000 r--p 7000 fc:01 843/bin/cat
08051000-08052000 rw-p 8000 fc:01 843/bin/cat
08052000-08073000 rw-p  00:00 0  [heap]
4000-40022000 r-xp  fc:01 276959 /lib/i386-linux-gnu/ld-2.24.so
40022000-40023000 r--p 002d7000 fc:01 403307 /usr/lib/locale/locale-archive
40023000-40024000 r--p 00022000 fc:01 276959 /lib/i386-linux-gnu/ld-2.24.so
40024000-40025000 rw-p 00023000 fc:01 276959 /lib/i386-linux-gnu/ld-2.24.so
40025000-40027000 r--p  00:00 0  [vvar]
40027000-40029000 r-xp  00:00 0  [vdso]
40029000-4004e000 rw-p  00:00 0
4005f000-40212000 r-xp  fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
40212000-40213000 ---p 001b3000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
40213000-40215000 r--p 001b3000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
40215000-40216000 rw-p 001b5000 fc:01 276980
/lib/i386-linux-gnu/libc-2.24.so
40216000-40219000 rw-p  00:00 0
40219000-40419000 r--p  fc:01 403307 /usr/lib/locale/locale-archive
bffdf000-c000 rw-p  00:00 0  [stack]


-- 
Kees Cook
Pixel Security

[PATCH] HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value

2017-07-06 Thread Gustavo A. R. Silva

Check return value from call to devm_kmemdup()
in order to prevent a NULL pointer dereference.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/hid/hid-logitech-hidpp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
index 41b3946..501e16a 100644
--- a/drivers/hid/hid-logitech-hidpp.c
+++ b/drivers/hid/hid-logitech-hidpp.c
@@ -2732,6 +2732,9 @@ static int hidpp_initialize_battery(struct hidpp_device 
*hidpp)
 hidpp_battery_props,
 sizeof(hidpp_battery_props),
 GFP_KERNEL);
+   if (!battery_props)
+   return -ENOMEM;
+
num_battery_props = ARRAY_SIZE(hidpp_battery_props) - 2;
 
if (hidpp->capabilities & HIDPP_CAPABILITY_BATTERY_MILEAGE)
-- 
2.5.0

Re: [PATCH] perf report: Fix broken arrow at row 0 in annotate view

2017-07-06 Thread Jin, Yao


Hi Arnaldo,

Could this patch be merged?

Otherwise the jump arrow is broken when it's displayed at the row 0 in 
annotate view.


Thanks

Jin Yao

On 6/8/2017 2:01 PM, Jin Yao wrote:

When the jump instruction is displayed at the row 0 in annotate view,
the arrow is broken. An example:

  16.86 │   ┌──je 82
   0.01 │  movsd  (%rsp),%xmm0
│  movsd  0x8(%rsp),%xmm4
│  movsd  0x8(%rsp),%xmm1
│  movsd  (%rsp),%xmm3
│  divsd  %xmm4,%xmm0
│  divsd  %xmm3,%xmm1
│  movsd  (%rsp),%xmm2
│  addsd  %xmm1,%xmm0
│  addsd  %xmm2,%xmm0
│  movsd  %xmm0,(%rsp)
│82:   sub$0x1,%ebx
  83.03 │↑ jne38
│  add$0x10,%rsp
│  xor%eax,%eax
│  pop%rbx
│← retq

The patch increments the row number before checking with 0.

Signed-off-by: Jin Yao 
---
  tools/perf/ui/browser.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index a4d3762..83874b0 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -704,7 +704,7 @@ static void __ui_browser__line_arrow_down(struct ui_browser 
*browser,
ui_browser__gotorc(browser, row, column + 1);
SLsmg_draw_hline(2);
  
-		if (row++ == 0)

+   if (++row == 0)
goto out;
} else
row = 0;

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Kees Cook

On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski  wrote:
> How about a much simpler solution: don't read rlimit at all in
> copy_strings(), let alone try to enforce it.  Instead, just before the
> point of no return, check how much stack space is already used and, if
> it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
> abort.  Sure, this adds overhead if we're going to abort, but does
> that really matter?

We should avoid using up tons of memory and then failing. Better to
cap it as we use it. Plumbing a sane value into this shouldn't be hard
at all. Just making this a hardcoded 2MB seems sane (1/4 of 8MB).

> I don't see why using rlimit for layout control makes any sense
> whatsoever.  Is there some historical reason we need that?  As far as
> I can see (on insufficient inspection) is that the kernel is trying to
> guarantee that, if we have so much arg crap that our remaining stack
> is less than 128k, then we don't exceed our limit by a little bit.

IIUC, this is a big deal on 32-bit. Unlimited stack triggers top-down
mmap instead of bottom-up. I mean, I'd be delighted to get rid of
this, but I thought it was relied on by userspace.

-Kees

-- 
Kees Cook
Pixel Security

Re: [git pull] vfs.git pile 11

2017-07-06 Thread Linus Torvalds

On Thu, Jul 6, 2017 at 2:20 PM, Al Viro  wrote:
>
> Linus, could you hold that one back until tomorrow?  I want to tweak the
> last commit in there a bit, but I want to give it a local beating first...

Ok, dropping this one. All your other branches are merged now.

Linus

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Linus Torvalds

On Thu, Jul 6, 2017 at 9:48 PM, Andy Lutomirski  wrote:
>
> How about a much simpler solution: don't read rlimit at all in
> copy_strings(), let alone try to enforce it.

People have historically relied on E2BIG and then splitting things
into multiple chunks (ie do the whole 'xargs' thing).

But I agree that *if* we abort cleanly with E2BIG, then it's fine if
we copy a bit too much first. It's just that we need to check early
enough that an E2BIG error is possible (ie not after the "point of no
return").

And yes, I think we can get rid of the layout differences based on
rlimit. After all, if I read that right the thing currently clamps
that "gap" thing to 128MB anyway (and that's on the *low* side).

 Linus

Re: [PATCH] PCI: Do not enable extended tags on pre-dated (v1.x) systems

2017-07-06 Thread Jike Song

On Wed, Jul 5, 2017 at 9:19 PM, Sinan Kaya  wrote:
> According to extended tags ECN document, all PCIe receivers are expected
> to support extended tags support. It should be safe to enable extended
> tags on endpoints without checking compatibility.
>
> This assumption seems to be working fine except for the legacy systems.
> The ECN has been written against PCIE spec version 2.0. Therefore, we need
> to exclude all version 1.0 devices from this change as there is HW out
> there that can't handle extended tags.
>
> Note that the default value of Extended Tags Enable bit is implementation
> specific. Therefore, we are clearing the bit by default when incompatible
> HW is found without assuming that value is zero.
>
> Reported-by: Wim ten Have 
> Link: 
> https://pcisig.com/sites/default/files/specification_documents/ECN_Extended_Tag_Enable_Default_05Sept2008_final.pdf
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1467674
> Fixes: 60db3a4d8cc9 ("PCI: Enable PCIe Extended Tags if supported")
> Tested-by: Wim ten Have 
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/pci/probe.c | 52 +---
>  1 file changed, 45 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 19c8950..5e39013 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1684,21 +1684,58 @@ static void program_hpp_type2(struct pci_dev *dev, 
> struct hpp_type2 *hpp)
>  */
>  }
>
> -static void pci_configure_extended_tags(struct pci_dev *dev)
> +static bool pcie_bus_exttags_supported(struct pci_bus *bus)
> +{
> +   bool exttags_supported = true;
> +   struct pci_dev *bridge;
> +   int rc;
> +   u16 flags;
> +
> +   bridge = bus->self;
> +   while (bridge) {
> +   if (pci_is_pcie(bridge)) {
> +   rc = pcie_capability_read_word(bridge, PCI_EXP_FLAGS,
> +  &flags);
> +   if (!rc && ((flags & PCI_EXP_FLAGS_VERS) < 2)) {
> +   exttags_supported = false;
> +   break;
> +   }
> +   }
> +   if (!bridge->bus->parent)
> +   break;
> +   bridge = bridge->bus->parent->self;
> +   }
> +
> +   return exttags_supported;
> +}
> +
> +static int pcie_bus_configure_exttags(struct pci_dev *dev, void *data)
>  {
> u32 dev_cap;
> int ret;
> +   bool supported;
>
> if (!pci_is_pcie(dev))
> -   return;
> +   return 0;
>
> ret = pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &dev_cap);
> if (ret)
> -   return;
> +   return 0;
>
> -   if (dev_cap & PCI_EXP_DEVCAP_EXT_TAG)
> -   pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
> -PCI_EXP_DEVCTL_EXT_TAG);
> +   if (dev_cap & PCI_EXP_DEVCAP_EXT_TAG) {
> +   supported = pcie_bus_exttags_supported(dev->bus);
> +

Maybe checking the version of this endpoint at first? Do you expect a
v1 endpoint
to be working under v2+ ports?

-- 
Thanks,
Jike

[PATCH v4 2/2] perf report: Implement visual marker for macro fusion in annotate

2017-07-06 Thread Jin Yao

For marking the fused instructions clearly, This patch adds a
line before the first instruction of pair and joins it with the
arrow of the jump.

For example, when je is selected in annotate view, the line
before cmpl is displayed and joins the arrow of je.

   │   ┌──cmpl   $0x0,argp_program_version_hook
 81.93 │   ├──je 20
   │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
   │   │↓ jne29
   │   │↓ jmp43
 11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

That means the cmpl+je is fused instruction pair and they should
be considered together.

Change-log:
---
v4: Since the first patch in patch set is changed, this is mainly
changed for compilation.

v3: Use Arnaldo's fix to let the display be better.
To get the evsel->evlist->env->cpuid, save the evsel in
annotate_browser.

v2: No more changes, just uses a new function "ins__is_fused"
to check if the instructions are fused.

v1: Initial post

Signed-off-by: Jin Yao 
---
 tools/perf/ui/browser.c   | 29 +
 tools/perf/ui/browser.h   |  2 ++
 tools/perf/ui/browsers/annotate.c | 26 ++
 tools/perf/util/annotate.c|  5 +
 tools/perf/util/annotate.h|  1 +
 5 files changed, 63 insertions(+)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index a4d3762..9ef7677 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -738,6 +738,35 @@ void __ui_browser__line_arrow(struct ui_browser *browser, 
unsigned int column,
__ui_browser__line_arrow_down(browser, column, start, end);
 }
 
+void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column,
+   unsigned int row, bool arrow_down)
+{
+   unsigned int end_row;
+
+   if (row >= browser->top_idx)
+   end_row = row - browser->top_idx;
+   else
+   return;
+
+   SLsmg_set_char_set(1);
+
+   if (arrow_down) {
+   ui_browser__gotorc(browser, end_row, column - 1);
+   SLsmg_write_char(SLSMG_ULCORN_CHAR);
+   ui_browser__gotorc(browser, end_row, column);
+   SLsmg_draw_hline(2);
+   ui_browser__gotorc(browser, end_row + 1, column - 1);
+   SLsmg_write_char(SLSMG_LTEE_CHAR);
+   } else {
+   ui_browser__gotorc(browser, end_row, column - 1);
+   SLsmg_write_char(SLSMG_LTEE_CHAR);
+   ui_browser__gotorc(browser, end_row, column);
+   SLsmg_draw_hline(2);
+   }
+
+   SLsmg_set_char_set(0);
+}
+
 void ui_browser__init(void)
 {
int i = 0;
diff --git a/tools/perf/ui/browser.h b/tools/perf/ui/browser.h
index be3b70e..a12eff7 100644
--- a/tools/perf/ui/browser.h
+++ b/tools/perf/ui/browser.h
@@ -43,6 +43,8 @@ void ui_browser__printf(struct ui_browser *browser, const 
char *fmt, ...);
 void ui_browser__write_graph(struct ui_browser *browser, int graph);
 void __ui_browser__line_arrow(struct ui_browser *browser, unsigned int column,
  u64 start, u64 end);
+void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column,
+   unsigned int row, bool arrow_down);
 void __ui_browser__show_title(struct ui_browser *browser, const char *title);
 void ui_browser__show_title(struct ui_browser *browser, const char *title);
 int ui_browser__show(struct ui_browser *browser, const char *title,
diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index ae05ebb..b376637 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -273,6 +273,25 @@ static bool disasm_line__is_valid_jump(struct disasm_line 
*dl, struct symbol *sy
return true;
 }
 
+static bool is_fused(struct annotate_browser *ab, struct disasm_line *cursor)
+{
+   struct disasm_line *pos = list_prev_entry(cursor, node);
+   const char *name;
+
+   if (!pos)
+   return false;
+
+   if (ins__is_lock(&pos->ins))
+   name = pos->ops.locked.ins.name;
+   else
+   name = pos->ins.name;
+
+   if (!name || !cursor->ins.name)
+   return false;
+
+   return ins__is_fused(ab->arch, name, cursor->ins.name);
+}
+
 static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 {
struct annotate_browser *ab = container_of(browser, struct 
annotate_browser, b);
@@ -308,6 +327,13 @@ static void annotate_browser__draw_current_jump(struct 
ui_browser *browser)
ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS);
__ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width,
 from, to);
+
+   if (is_fused(ab, cursor)) {
+   ui_browser__mark_fused(browser,
+  pcnt_width + 3 + ab->addr_width,
+  from - 1,
+  to > from ? tru

[PATCH v4 1/2] perf util: Check for fused instruction

2017-07-06 Thread Jin Yao

Macro fusion merges two instructions to a single micro-op. Intel
core platform performs this hardware optimization under limited
circumstances.

For example, CMP + JCC can be "fused" and executed /retired
together. While with sampling this can result in the sample
sometimes being on the JCC and sometimes on the CMP.
So for the fused instruction pair, they could be considered
together.

On Nehalem, fused instruction pairs:
cmp/test + jcc.

On other new CPU:
cmp/test/add/sub/and/inc/dec + jcc.

This patch adds an x86-specific function which checks if 2
instructions are in a "fused" pair. For non-x86 arch, the
function is just NULL.

Change-log:
---
v4: Move the CPU model checking to symbol__disassemble and
save the CPU family/model in arch structure.

It avoids checking every time when jump arrow printed.

v3: Add checking for Nehalem (CMP, TEST). For other newer
Intel CPUs just check it by default (CMP, TEST, ADD,
SUB, AND, INC, DEC).

v2: Remove the origial weak function. Arnaldo points out
that doing it as a weak function that will be overridden
by the host arch doesn't work. So now it's implemented
as an arch-specific function.

v1: Initial post

Signed-off-by: Jin Yao 
---
 tools/perf/arch/x86/annotate/instructions.c | 46 +
 tools/perf/builtin-top.c|  2 +-
 tools/perf/ui/browsers/annotate.c   |  4 ++-
 tools/perf/ui/gtk/annotate.c|  2 +-
 tools/perf/util/annotate.c  | 22 --
 tools/perf/util/annotate.h  |  3 +-
 6 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/tools/perf/arch/x86/annotate/instructions.c 
b/tools/perf/arch/x86/annotate/instructions.c
index c1625f2..d84b720 100644
--- a/tools/perf/arch/x86/annotate/instructions.c
+++ b/tools/perf/arch/x86/annotate/instructions.c
@@ -76,3 +76,49 @@ static struct ins x86__instructions[] = {
{ .name = "xbeginq",.ops = &jump_ops, },
{ .name = "retq",   .ops = &ret_ops,  },
 };
+
+static bool x86__ins_is_fused(struct arch *arch, const char *ins1,
+ const char *ins2)
+{
+   if (arch->family != 6 || arch->model < 0x1e || strstr(ins2, "jmp"))
+   return false;
+
+   if (arch->model == 0x1e) {
+   /* Nehalem */
+   if ((strstr(ins1, "cmp") && !strstr(ins1, "xchg")) ||
+strstr(ins1, "test")) {
+   return true;
+   }
+   } else {
+   /* Newer platform */
+   if ((strstr(ins1, "cmp") && !strstr(ins1, "xchg")) ||
+strstr(ins1, "test") ||
+strstr(ins1, "add") ||
+strstr(ins1, "sub") ||
+strstr(ins1, "and") ||
+strstr(ins1, "inc") ||
+strstr(ins1, "dec")) {
+   return true;
+   }
+   }
+
+   return false;
+}
+
+static int x86__cpuid_parse(struct arch *arch, char *cpuid)
+{
+   unsigned int family, model, stepping;
+   int ret;
+
+   /*
+* cpuid = "GenuineIntel,family,model,stepping"
+*/
+   ret = sscanf(cpuid, "%*[^,],%u,%u,%u", &family, &model, &stepping);
+   if (ret == 3) {
+   arch->family = family;
+   arch->model = model;
+   return 0;
+   }
+
+   return -1;
+}
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 6052376..022486d 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -134,7 +134,7 @@ static int perf_top__parse_source(struct perf_top *top, 
struct hist_entry *he)
return err;
}
 
-   err = symbol__disassemble(sym, map, NULL, 0, NULL);
+   err = symbol__disassemble(sym, map, NULL, 0, NULL, NULL);
if (err == 0) {
 out_assign:
top->sym_filter_entry = he;
diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 27f41f2..ae05ebb 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -9,6 +9,7 @@
 #include "../../util/symbol.h"
 #include "../../util/evsel.h"
 #include "../../util/config.h"
+#include "../../util/evlist.h"
 #include 
 #include 
 #include 
@@ -1074,7 +1075,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map 
*map,
}
 
err = symbol__disassemble(sym, map, perf_evsel__env_arch(evsel),
- sizeof_bdl, &browser.arch);
+ sizeof_bdl, &browser.arch,
+ evsel->evlist->env->cpuid);
if (err) {
char msg[BUFSIZ];
symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index d903fd4..87e3760 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -169,7 +169,7 @@ static i

[PATCH v4 0/2] perf report: Implement visual marker for macro fusion in annotate

2017-07-06 Thread Jin Yao

Macro fusion merges two instructions to a single micro-op. Intel
core platform performs this hardware optimization under limited
circumstances. For example, CMP + JCC can be "fused" and executed
/retired together. While with sampling this can result in the
sample sometimes being on the JCC and sometimes on the CMP.
So for the fused instruction pair, they could be considered
together.

On Nehalem, fused instruction pairs:
cmp/test + jcc.

On other new CPU:
cmp/test/add/sub/and/inc/dec + jcc.

This patch series marks the case clearly by joining the fused
instruction pair in the arrow of the jump.

For example:

   │   ┌──cmpl   $0x0,argp_program_version_hook
 81.93 │   ├──je 20
   │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
   │   │↓ jne29
   │   │↓ jmp43
 11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

Change-log:
---
v4: Move the CPU model checking to symbol__disassemble and save
the family/model in arch structure. It avoids checking everytime
when display the jump arrows.

The patch set is still performing the fused checking when user
moves the cursor on the jump instruction.

v3: 1.  Add checking for Nehalem (CMP, TEST). For other newer
Intel CPUs just check it by default (CMP, TEST, ADD,
SUB, AND, INC, DEC).

2.  Use Arnaldo's fix to let the display be better

v2: According to Arnaldo's comments, remove the weak function and
use an arch-specific function instead to check fused instruction
pair.

v1: Inital post

Jin Yao (2):
  perf util: Check for fused instruction
  perf report: Implement visual marker for macro fusion in annotate

 tools/perf/arch/x86/annotate/instructions.c | 46 +
 tools/perf/builtin-top.c|  2 +-
 tools/perf/ui/browser.c | 29 ++
 tools/perf/ui/browser.h |  2 ++
 tools/perf/ui/browsers/annotate.c   | 30 ++-
 tools/perf/ui/gtk/annotate.c|  2 +-
 tools/perf/util/annotate.c  | 27 +++--
 tools/perf/util/annotate.h  |  4 ++-
 8 files changed, 136 insertions(+), 6 deletions(-)

-- 
2.7.4

Re: [PATCH] iio: multiplexer: add NULL check on devm_kzalloc() and devm_kmemdup() return values

2017-07-06 Thread Peter Rosin

On 2017-07-07 06:53, Gustavo A. R. Silva wrote:
> Check return values from call to devm_kzalloc() and devm_kmemup()

If someone cares enough: s/devm_kmemup/evm_kmemdup/


> in order to prevent a NULL pointer dereference.
> 
> This issue was detected using Coccinelle and the following semantic patch:
> 
> @@
> expression x;
> identifier fld;
> @@
> 
> * x = devm_kzalloc(...);
>... when != x == NULL
>x->fld
> 
> Cc: Peter Rosin 
> Signed-off-by: Gustavo A. R. Silva 

Either way,

Reviewed-by: Peter Rosin 

Thanks!

linux-next: Tree for Jul 7

2017-07-06 Thread Stephen Rothwell

Hi all,

Please do not add any v4.14 material to you linux-next included branches
until after v4.13-rc1 has been released.

Changes since 20170706:

The f2fs tree gained a conflict against Linus' tree.

The akpm tree lost a patch that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 4006
 3387 files changed, 456543 insertions(+), 62457 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And
finally, a simple boot test of the powerpc pseries_le_defconfig kernel
in qemu.

Below is a summary of the state of the merge.

I am currently merging 266 trees (counting Linus' and 41 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (90311148415a Merge tag 'scsi-misc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging fixes/master (b4b8cbf679c4 Cavium CNN55XX: fix broken default Kconfig 
entry)
Merging kbuild-current/fixes (ad8181060788 kconfig: fix sparse warnings in 
nconfig)
Merging arc-current/for-curr (c0bc126f97fb Linux 4.12-rc7)
Merging arm-current/fixes (9e25ebfe56ec ARM: 8685/1: ensure memblock-limit is 
pmd-aligned)
Merging m68k-current/for-linus (204a2be30a7a m68k: Remove ptrace_signal_deliver)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (d6bd8194e286 powerpc/32: Avoid miscompilation 
w/GCC 4.6.3 - don't inline copy_to/from_user())
Merging sparc/master (dbd2667a4fb9 sparc64: Fix gup_huge_pmd)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (f630c38ef0d7 vrf: fix bug_on triggered by rx when 
destroying a vrf)
Merging ipsec/master (ca3a1b856636 esp6_offload: Fix IP6CB(skb)->nhoff for ESP 
GRO)
Merging netfilter/master (c644bd79c0a7 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf)
Merging ipvs/master (3c5ab3f395d6 ipvs: SNAT packet replies only for NATed 
connections)
Merging wireless-drivers/master (35abcd4f9f30 brcmfmac: fix uninitialized 
warning in brcmf_usb_probe_phase2())
Merging mac80211/master (4b153ca989a9 Merge tag 'mac80211-for-davem-2017-06-16' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211)
Merging sound-current/for-linus (6ede2b7df92f ALSA: opl4: Move inline before 
return type)
Merging pci-current/for-linus (98dbf5af4fdd PCI: endpoint: Select CRC32 to fix 
test build error)
Merging driver-core.current/driver-core-linus (650fc870a2ef Merge tag 
'docs-4.13' of git://git.lwn.net/linux)
Merging tty.current/tty-linus (650fc870a2ef Merge tag 'docs-4.13' of 
git://git.lwn.net/linux)
Merging usb.current/usb-linus (dec08194ffec xhci: Limit USB2 port wake support 
for AMD Promontory hosts)
Merging usb-gadget-fixes/fixes (f50b878fed33 USB: gadget: fix GPF in gadgetfs)
Merging usb-serial-fixes/usb-linus (996fab55d864 USB: serial: qcserial: new 
Sierra Wireless EM7305 device ID)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: 
check before accessing ci_role in ci_role_show)
Merging phy/fixes (9605bc46433d phy: qualcomm: phy-qcom-qmp: fix application of 
sizeof to pointer)
Merging staging.current/staging-linus (650fc870a2ef Merge tag 'docs-4.13' of 
git://git.lwn.net/linux)
Merging char-misc.current/char-misc-linus (650fc870a2ef Merge tag 'docs-4.13' 
of git://git.lwn.net/linux)
Merging input-current/for-linus (ede2e7cdc58e Merge

[PATCH] arcnet: com20020-pci: Fix an error handling path in 'com20020pci_probe()'

2017-07-06 Thread Christophe JAILLET

If this memory allocation fails, we should go through the error handling
path as done everywhere else in this function before returning.

Signed-off-by: Christophe JAILLET 
---
 drivers/net/arcnet/com20020-pci.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/arcnet/com20020-pci.c 
b/drivers/net/arcnet/com20020-pci.c
index 239de38fbd6a..8e89a2ac071e 100644
--- a/drivers/net/arcnet/com20020-pci.c
+++ b/drivers/net/arcnet/com20020-pci.c
@@ -196,8 +196,10 @@ static int com20020pci_probe(struct pci_dev *pdev,
 
card = devm_kzalloc(&pdev->dev, sizeof(struct com20020_dev),
GFP_KERNEL);
-   if (!card)
-   return -ENOMEM;
+   if (!card) {
+   ret = -ENOMEM;
+   goto out_port;
+   }
 
card->index = i;
card->pci_priv = priv;
-- 
2.11.0

[PATCH] iio: multiplexer: add NULL check on devm_kzalloc() and devm_kmemdup() return values

2017-07-06 Thread Gustavo A. R. Silva

Check return values from call to devm_kzalloc() and devm_kmemup()
in order to prevent a NULL pointer dereference.

This issue was detected using Coccinelle and the following semantic patch:

@@
expression x;
identifier fld;
@@

* x = devm_kzalloc(...);
   ... when != x == NULL
   x->fld

Cc: Peter Rosin 
Signed-off-by: Gustavo A. R. Silva 
---
Changes in v2:
 Add NULL check on devm_kmemup() return value.

 drivers/iio/multiplexer/iio-mux.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iio/multiplexer/iio-mux.c 
b/drivers/iio/multiplexer/iio-mux.c
index 37ba007..74831fc 100644
--- a/drivers/iio/multiplexer/iio-mux.c
+++ b/drivers/iio/multiplexer/iio-mux.c
@@ -285,6 +285,9 @@ static int mux_configure_channel(struct device *dev, struct 
mux *mux,
child->ext_info_cache = devm_kzalloc(dev,
 sizeof(*child->ext_info_cache) *
 num_ext_info, GFP_KERNEL);
+   if (!child->ext_info_cache)
+   return -ENOMEM;
+
for (i = 0; i < num_ext_info; ++i) {
child->ext_info_cache[i].size = -1;
 
@@ -309,6 +312,9 @@ static int mux_configure_channel(struct device *dev, struct 
mux *mux,
 
child->ext_info_cache[i].data = devm_kmemdup(dev, page, ret + 1,
 GFP_KERNEL);
+   if (!child->ext_info_cache[i].data)
+   return -ENOMEM;
+
child->ext_info_cache[i].data[ret] = 0;
child->ext_info_cache[i].size = ret;
}
-- 
2.5.0

Re: [PATCH] iio: multiplexer: add NULL check on devm_kzalloc() return value

2017-07-06 Thread Gustavo A. R. Silva



Quoting Peter Rosin :


On 2017-07-07 06:35, Gustavo A. R. Silva wrote:

Hi Peter,

Quoting Peter Rosin :


On 2017-07-07 00:08, Gustavo A. R. Silva wrote:

Check return value from call to devm_kzalloc()
in order to prevent a NULL pointer dereference.


Right, thanks for finding that one! There's another one inside the
for loop that is just starting in the context of this patch. Care
to fix checking the return value of that devm_kmemdup as well?



Sure, I'll send a new patch shortly.


And someone should perhaps teach Coccinelle about devm_kmemdup...



Good catch, I just implemented that script.


This issue was detected using Coccinelle and the following semantic patch:

@@
expression x;
identifier fld;
@@

* x = devm_kzalloc(...);
  ... when != x == NULL
  x->fld




One of these blank lines should perhaps be a "Fixes:" tag?



mmm, I don't get this...


If you add a Fixes-tag, like below, you help the stable kernel maintainers
decide what to look at. In this case it might be overkill since the thing
you fix is so fresh and does not apply to any old kernel. But I think it
is a good habit...

Fixes: 7ba9df54b091 ("iio: multiplexer: new iio category and iio-mux driver")

(and it is a bit unusual to see two blank lines before the SoB-tag)

Sorry for not spelling it out the first time.



Oh, I get it now. Thank you very much for clarifying. I will  
definitely keep that in mind for future fixes when that could apply.


Thanks!
--
Gustavo A. R. Silva

Re: [RFC][PATCH] exec: Use init rlimits for setuid exec

2017-07-06 Thread Andy Lutomirski

On Thu, Jul 6, 2017 at 12:12 PM, Kees Cook  wrote:
> On Thu, Jul 6, 2017 at 10:52 AM, Linus Torvalds
>  wrote:
>> On Thu, Jul 6, 2017 at 10:29 AM, Kees Cook  wrote:

  (a) minimal: just use our existing default stack (and stack _only_)
 limit value for suid binaries that actually get extra permissions: {
 _STK_LIM, RLIM_INFINITY }.
>>>
>>> This would look a lot like the existing patch; it'd just not copy the
>>> init process rlimits.
>>
>> Can't we just do the final rlimit setting so late in execve that we
>> don't need that whole "saved_rlimit" thing?
>
> The stack rlimit defines the mmap layout too:
>
> do_execveat_common() ->
> exec_binprm() ->
> search_binary_handler() ->
> fmt->load_binary (load_elf_binary()) ->
> setup_new_exec() ->
> arch_pick_mmap_layout() ->
> mmap_is_legacy() ->
> rlimit(RLIMIT_STACK) == RLIM_INFINITY
>
> exec_binprm() happens after the other stack setup (copy_strings()), so
> if we wanted to avoid saved_rlimit, we'd have to replumb how
> arch_pick_mmap_layout() works and how copy_strings() performs its
> calculations (neither looks too terrible).

How about a much simpler solution: don't read rlimit at all in
copy_strings(), let alone try to enforce it.  Instead, just before the
point of no return, check how much stack space is already used and, if
it's more than an appropriate threshold (e.g. 1/4 of the rlimit),
abort.  Sure, this adds overhead if we're going to abort, but does
that really matter?

I don't see why using rlimit for layout control makes any sense
whatsoever.  Is there some historical reason we need that?  As far as
I can see (on insufficient inspection) is that the kernel is trying to
guarantee that, if we have so much arg crap that our remaining stack
is less than 128k, then we don't exceed our limit by a little bit.

Re: [PATCH v2] arm: eBPF JIT compiler

2017-07-06 Thread Shubham Bansal

Okay Kees. I will take a look at it.
Best,
Shubham Bansal


On Fri, Jul 7, 2017 at 10:12 AM, Kees Cook  wrote:
> On Wed, Jul 5, 2017 at 8:49 PM, Shubham Bansal
>  wrote:
>> Hi Kees,
>>
>> Problem is my ARM machine don't have clang and iproute2 which is
>> keeping me from testing the bpf tail calls.
>>
>> You should do the following to test it,.
>>
>> 1. tools/testing/selftests/bpf/
>> 2. make
>> 3. sudo ./test_progs
>>
>> And, before testing, you have to do "make headers_install".
>> These tests are for tail calls with the attached patch. If its too
>> much work, Can you please upload your arm image so that I can test it?
>> I just need a good machine.
>
> I've got all this set up now, and it faults during the test:
>
> Unable to handle kernel NULL pointer dereference at virtual address 0008
> ...
> CPU: 0 PID: 1922 Comm: test_progs Not tainted 4.12.0+ #60
> ...
> PC is at __htab_map_lookup_elem+0x54/0x1f4
>
> I'll see if I can send you this disk image...
>
> -Kees
>
>
> --
> Kees Cook
> Pixel Security

Re: [PATCH 2/2] serial: earlycon: Make early_con as __initdata

2017-07-06 Thread Sergey Senozhatsky

On (07/06/17 11:38), Matt Redfearn wrote:
> All early console drivers that may be registered as the earlycon are
> marked __init to be placed in the init section. The drivers' code and
> data are freed during free_initmem_default() but the early console is
> not unregistered in printk_late_init() as the init_section_intersects()
> test fails. This leads to the earlycon still being active, potentially
> with dangling pointers into the freed init section, which may have been
> poisoned by the slab debugger. Attempting to use the boot console after
> this will likely lead to weird behaviour and kernel crashes.

people want to use early-con as a panic() console fallback.

-ss

> Fix this by marking the early_con struct __initdata so that the
> init_section_intersects() test succeeds and the console is unregistered
> in printk_late_init() before the init section is freed.
> 
> The 8250 earlycon, on which the generic earlycon was based, had these
> attributes on it's console struct. The switch to the generic
> implementation in commit d2fd6810a823 ("tty/serial: convert 8250 to
> generic earlycon") appears to have broken unregistraton of the boot
> console when its code and data are in __init.
> 
> Fixes: d2fd6810a823 ("tty/serial: convert 8250 to generic earlycon")
> Signed-off-by: Matt Redfearn 
> 
> ---
> 
>  drivers/tty/serial/earlycon.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c
> index c3651540e1ba..388aaafcca2b 100644
> --- a/drivers/tty/serial/earlycon.c
> +++ b/drivers/tty/serial/earlycon.c
> @@ -29,13 +29,13 @@
>  
>  #include 
>  
> -static struct console early_con = {
> +static struct console early_con __initdata = {
>   .name = "uart", /* fixed up at earlycon registration */
>   .flags =CON_PRINTBUFFER | CON_BOOT,
>   .index =0,
>  };
>  
> -static struct earlycon_device early_console_dev = {
> +static struct earlycon_device early_console_dev __initdata = {
>   .con = &early_con,
>  };
>  
> -- 
> 2.7.4
>

Re: [PATCH] iio: multiplexer: add NULL check on devm_kzalloc() return value

2017-07-06 Thread Peter Rosin

On 2017-07-07 06:35, Gustavo A. R. Silva wrote:
> Hi Peter,
> 
> Quoting Peter Rosin :
> 
>> On 2017-07-07 00:08, Gustavo A. R. Silva wrote:
>>> Check return value from call to devm_kzalloc()
>>> in order to prevent a NULL pointer dereference.
>>
>> Right, thanks for finding that one! There's another one inside the
>> for loop that is just starting in the context of this patch. Care
>> to fix checking the return value of that devm_kmemdup as well?
>>
> 
> Sure, I'll send a new patch shortly.
> 
>> And someone should perhaps teach Coccinelle about devm_kmemdup...
>>
> 
> Good catch, I just implemented that script.
> 
>>> This issue was detected using Coccinelle and the following semantic patch:
>>>
>>> @@
>>> expression x;
>>> identifier fld;
>>> @@
>>>
>>> * x = devm_kzalloc(...);
>>>   ... when != x == NULL
>>>   x->fld
>>>
>>>
>>
>> One of these blank lines should perhaps be a "Fixes:" tag?
>>
> 
> mmm, I don't get this...

If you add a Fixes-tag, like below, you help the stable kernel maintainers
decide what to look at. In this case it might be overkill since the thing
you fix is so fresh and does not apply to any old kernel. But I think it
is a good habit...

Fixes: 7ba9df54b091 ("iio: multiplexer: new iio category and iio-mux driver")

(and it is a bit unusual to see two blank lines before the SoB-tag)

Sorry for not spelling it out the first time.

Cheers,
peda

Re: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

2017-07-06 Thread Dou Liyang


Hi Thomas,

At 07/07/2017 11:04 AM, Ye Xiaolong wrote:

On 07/07, Dou Liyang wrote:

Hi xiaolong,

Really thanks for your testing.

At 07/07/2017 09:54 AM, Ye Xiaolong wrote:

On 07/06, Thomas Gleixner wrote:

On Thu, 6 Jul 2017, kernel test robot wrote:


commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize interrupt 
mode behind timer init")



++++
|| 43436935b7 | 03fa63cc96 |
++++
| boot_successes | 0  | 4  |
++++


So 03fa63cc96 makes the box boot again. I'm confused as usual by the
output of this tool.,


kern  :info  : [0.005000] tsc: Fast TSC calibration using PIT
kern  :info  : [0.006000] tsc: Detected 2195.020 MHz processor
kern  :info  : [0.007000] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
kern  :info  : [0.008001] pid_max: default: 90112 minimum: 704
kern  :info  : [0.009037] ACPI: Core revision 20170303
kern  :err   : [0.010002] ACPI Error: Table [DMAR] is not invalidated 
during early boot stage (20170303/tbxface-193)


Sure we have a error message here, but compared to what? Compared to
something which does not boot at all?


Sorry for the confusion, here commit 43436935b7 boot failed due to OOM which
happened at the late stage of kernel boot while the ACPI error showed at the
early boot stage for commit 03fa63cc96 and it didn't appear in 43436935b7's
dmesg.



let's make the problem clearly firstly:

1) Commit 43436935b7 ("x86/xen: Bypass intr mode setup in enlighten_pv
system") made kernel boot failed, which caused by OOM.



It is so interesting!

This patch only work for XEN in pv mode, if we boot a kernel w/o XEN,
it will not work in fact.



2) Commit 03fa63cc96 ("x86/time: Initialize interrupt mode behind
timer init") can make the kernel boot success again, but with an ACPI
error happened.


Indeed, the ACPI Error appeared with this patch.

If x2apic is enabled in x86-64, initializing interrupt mode contains
irq remapping setup and the ACPI DMAR table initialization.

default_setup_apic_routing()
  enable_IR_x2apic()
irq_remapping_prepare()
  intel_prepare_irq_remapping()
dmar_table_init()

this patch make interrupt mode setup before acpi_early_init() which has
the check of ACPI table state in acpi_reallocate_root_table() where the
ACPI Error generated.

I have got a machine which supports x2apic. I am tracking the code,
try to find out more.

Thanks,

dou.


And both *1* and *2* used the same configuration showed in the
attachment.

Does anything I missed?


Yes, this is exactly what I meant.

Thanks,
Xiaolong


Thanks,

dou.


Thanks,
Xiaolong



Thanks,

tglx

Re: [PATCH 1/2] printk: Unconditionally unregister boot consoles if in init section

2017-07-06 Thread Sergey Senozhatsky

On (07/06/17 11:38), Matt Redfearn wrote:
> Commit 4c30c6f566c0 ("kernel/printk: do not turn off bootconsole in
> printk_late_init() if keep_bootcon") added a check on keep_bootcon to
> ensure that boot consoles were kept around until the real console is
> registered.
> This can lead to problems if the boot console data and code are in the
> init section, since it can be freed before the boot console is
> deregistered. This was fixed by commit 81cc26f2bd11 ("printk: only
> unregister boot consoles when necessary").
> The keep_bootcon flag prevents the unregistration of a boot console,
> even if it's data and code reside in the init section and are about to
> be freed. This can lead to crashes and weird panics when the bootconsole
> is accessed after free, especially if page poisoning is in use and the
> code / data have been overwritten with a poison value.
> To prevent this, always free the boot console if it is within the init
> section.
> 
> Fixes 81cc26f2bd11 ("printk: only unregister boot consoles when necessary").
> Signed-off-by: Matt Redfearn 

well... hm. don't specify `keep_bootcon' if your bootcon access initdata?
keeping `early_printk' sometimes can be helpful and people even want to
use `early_printk' as a panic() console fallback (because of those nasty
deadlocks that spoil printk->call_console_drivers()).

if someone asked to `keep_bootcon' but we actually don't keep it, then
what's the purpose of the flag and
pr_info("debug: skip boot console de-registration.\n")?


> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index a1db38abac5b..b5411f44eed4 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2660,7 +2660,7 @@ static int __init printk_late_init(void)
>   int ret;
>  
>   for_each_console(con) {
> - if (!keep_bootcon && con->flags & CON_BOOT) {
> + if (con->flags & CON_BOOT) {
>   /*
>* Make sure to unregister boot consoles whose data
>* resides in the init section before the init section

what about bootconsoles that are still not getting de-registered due
to keep_bootcon in register_console()?

-ss

Re: [PATCH v2 2/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter

2017-07-06 Thread Joel Fernandes

On Tue, Jul 4, 2017 at 10:34 AM, Patrick Bellasi
 wrote:
> Currently, sg_cpu's flags are set to the value defined by the last call of
> the cpufreq_update_util()/cpufreq_update_this_cpu(); for RT/DL classes
> this corresponds to the SCHED_CPUFREQ_{RT/DL} flags always being set.
>
> When multiple CPU shares the same frequency domain it might happen that a
> CPU which executed a RT task, right before entering IDLE, has one of the
> SCHED_CPUFREQ_RT_DL flags set, permanently, until it exits IDLE.
>
> Although such an idle CPU is _going to be_ ignored by the
> sugov_next_freq_shared():
>   1. this kind of "useless RT requests" are ignored only if more then
>  TICK_NSEC have elapsed since the last update
>   2. we can still potentially trigger an already too late switch to
>  MAX, which starts also a new throttling interval
>   3. the internal state machine is not consistent with what the
>  scheduler knows, i.e. the CPU is now actually idle
>
> Thus, in sugov_next_freq_shared(), where utilisation and flags are
> aggregated across all the CPUs of a frequency domain, it can turn out
> that all the CPUs of that domain can run unnecessary at the maximum OPP
> until another event happens in the idle CPU, which eventually clear the
> SCHED_CPUFREQ_{RT/DL} flag, or the IDLE CPUs gets ignored after
> TICK_NSEC since the CPU entering IDLE.
>
> Such a behaviour can harm the energy efficiency of systems where RT
> workloads are not so frequent and other CPUs in the same frequency
> domain are running small utilisation workloads, which is a quite common
> scenario in mobile embedded systems.
>
> This patch proposes a solution which is aligned with the current principle
> to update the flags each time a scheduling event happens. The scheduling
> of the idle_task on a CPU is considered one of such meaningful events.
> That's why when the idle_task is selected for execution we poke the
> schedutil policy to reset the flags for that CPU.
>
> No frequency transitions are activated at that point, which is fair in
> case the RT workload should come back in the future. However, this still
> allows other CPUs in the same frequency domain to scale down the
> frequency in case that should be possible.
>
> Signed-off-by: Patrick Bellasi 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Rafael J. Wysocki 
> Cc: Viresh Kumar 
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@vger.kernel.org
>
> ---
> Changes from v1:
> - added "unlikely()" around the statement (SteveR)
> ---
>  include/linux/sched/cpufreq.h| 1 +
>  kernel/sched/cpufreq_schedutil.c | 7 +++
>  kernel/sched/idle_task.c | 4 
>  3 files changed, 12 insertions(+)
>
> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> index d2be2cc..36ac8d2 100644
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -10,6 +10,7 @@
>  #define SCHED_CPUFREQ_RT   (1U << 0)
>  #define SCHED_CPUFREQ_DL   (1U << 1)
>  #define SCHED_CPUFREQ_IOWAIT   (1U << 2)
> +#define SCHED_CPUFREQ_IDLE (1U << 3)
>
>  #define SCHED_CPUFREQ_RT_DL(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c 
> b/kernel/sched/cpufreq_schedutil.c
> index eaba6d6..004ae18 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -304,6 +304,12 @@ static void sugov_update_shared(struct update_util_data 
> *hook, u64 time,
>
> sg_cpu->util = util;
> sg_cpu->max = max;
> +
> +   /* CPU is entering IDLE, reset flags without triggering an update */
> +   if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> +   sg_cpu->flags = 0;
> +   goto done;
> +   }

Instead of defining a new flag for idle, wouldn't another way be to
just clear the flag from the RT scheduling class with an extra call to
cpufreq_update_util with flags = 0 during dequeue_rt_entity? That
seems to me to be also the right place to clear the flag since the
flag is set in the corresponding class to begin with.

thanks,

-Joel

Re: [PATCH v2] arm: eBPF JIT compiler

2017-07-06 Thread Kees Cook

On Wed, Jul 5, 2017 at 8:49 PM, Shubham Bansal
 wrote:
> Hi Kees,
>
> Problem is my ARM machine don't have clang and iproute2 which is
> keeping me from testing the bpf tail calls.
>
> You should do the following to test it,.
>
> 1. tools/testing/selftests/bpf/
> 2. make
> 3. sudo ./test_progs
>
> And, before testing, you have to do "make headers_install".
> These tests are for tail calls with the attached patch. If its too
> much work, Can you please upload your arm image so that I can test it?
> I just need a good machine.

I've got all this set up now, and it faults during the test:

Unable to handle kernel NULL pointer dereference at virtual address 0008
...
CPU: 0 PID: 1922 Comm: test_progs Not tainted 4.12.0+ #60
...
PC is at __htab_map_lookup_elem+0x54/0x1f4

I'll see if I can send you this disk image...

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH] iio: multiplexer: add NULL check on devm_kzalloc() return value

2017-07-06 Thread Gustavo A. R. Silva


Hi Peter,

Quoting Peter Rosin :


On 2017-07-07 00:08, Gustavo A. R. Silva wrote:

Check return value from call to devm_kzalloc()
in order to prevent a NULL pointer dereference.


Right, thanks for finding that one! There's another one inside the
for loop that is just starting in the context of this patch. Care
to fix checking the return value of that devm_kmemdup as well?



Sure, I'll send a new patch shortly.


And someone should perhaps teach Coccinelle about devm_kmemdup...



Good catch, I just implemented that script.


This issue was detected using Coccinelle and the following semantic patch:

@@
expression x;
identifier fld;
@@

* x = devm_kzalloc(...);
  ... when != x == NULL
  x->fld




One of these blank lines should perhaps be a "Fixes:" tag?



mmm, I don't get this...


Cheers,
peda


Signed-off-by: Gustavo A. R. Silva 
---
 drivers/iio/multiplexer/iio-mux.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iio/multiplexer/iio-mux.c  
b/drivers/iio/multiplexer/iio-mux.c

index 37ba007..a8d672b 100644
--- a/drivers/iio/multiplexer/iio-mux.c
+++ b/drivers/iio/multiplexer/iio-mux.c
@@ -285,6 +285,9 @@ static int mux_configure_channel(struct device  
*dev, struct mux *mux,

child->ext_info_cache = devm_kzalloc(dev,
 sizeof(*child->ext_info_cache) *
 num_ext_info, GFP_KERNEL);
+   if (!child->ext_info_cache)
+   return -ENOMEM;
+
for (i = 0; i < num_ext_info; ++i) {
child->ext_info_cache[i].size = -1;




Thanks!
--
Gustavo A. R. Silva

Re: [PATCH] target: make device_mutex and device_list static

2017-07-06 Thread Nicholas A. Bellinger

On Wed, 2017-07-05 at 13:15 -0500, Mike Christie wrote:
> On 07/04/2017 03:44 AM, Colin King wrote:
> > From: Colin Ian King 
> > 
> > Variables device_mutex and device_list static are local to the source,
> > so make them static.
> > 
> > Cleans up sparse warnings:
> > "symbol 'device_list' was not declared. Should it be static?"
> > "symbol 'device_mutex' was not declared. Should it be static?"
> > 
> > Signed-off-by: Colin Ian King 
> > ---
> >  drivers/target/target_core_device.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/target/target_core_device.c 
> > b/drivers/target/target_core_device.c
> > index 3ae8fbf01bdf..bbcef3bc66c8 100644
> > --- a/drivers/target/target_core_device.c
> > +++ b/drivers/target/target_core_device.c
> > @@ -49,8 +49,8 @@
> >  #include "target_core_pr.h"
> >  #include "target_core_ua.h"
> >  
> > -DEFINE_MUTEX(device_mutex);
> > -LIST_HEAD(device_list);
> > +static DEFINE_MUTEX(device_mutex);
> > +static LIST_HEAD(device_list);
> >  static DEFINE_IDR(devices_idr);
> >  
> >  static struct se_hba *lun0_hba;
> > 
> 
> My fault. Thanks.
> 
> Reviewed-by: Mike Christie 

Applied.

Thanks Colin + MNC.

Re: [PATCH] extcon: int3496: constify acpi_device_id.

2017-07-06 Thread Chanwoo Choi

On 2017년 07월 07일 01:55, Arvind Yadav wrote:
> acpi_device_id are not supposed to change at runtime. All functions
> working with acpi_device_id provided by  work with
> const acpi_device_id. So mark the non-const structs as const.
> 
> File size before:
>text  data bss dec hex filename
>1733   352   02085 825 
> drivers/extcon/extcon-intel-int3496.o
> 
> File size After adding 'const':
>text  data bss dec hex filename
>1797   272   02069 815 
> drivers/extcon/extcon-intel-int3496.o
> 
> Signed-off-by: Arvind Yadav 
> ---
>  drivers/extcon/extcon-intel-int3496.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/extcon/extcon-intel-int3496.c 
> b/drivers/extcon/extcon-intel-int3496.c
> index 9d17984..49bf9c6 100644
> --- a/drivers/extcon/extcon-intel-int3496.c
> +++ b/drivers/extcon/extcon-intel-int3496.c
> @@ -174,7 +174,7 @@ static int int3496_remove(struct platform_device *pdev)
>   return 0;
>  }
>  
> -static struct acpi_device_id int3496_acpi_match[] = {
> +static const struct acpi_device_id int3496_acpi_match[] = {
>   { "INT3496" },
>   { }
>  };
> 

Applied it.

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

Re: [PATCH] iio: multiplexer: add NULL check on devm_kzalloc() return value

2017-07-06 Thread Peter Rosin

On 2017-07-07 00:08, Gustavo A. R. Silva wrote:
> Check return value from call to devm_kzalloc()
> in order to prevent a NULL pointer dereference.

Right, thanks for finding that one! There's another one inside the
for loop that is just starting in the context of this patch. Care
to fix checking the return value of that devm_kmemdup as well?

And someone should perhaps teach Coccinelle about devm_kmemdup...

> This issue was detected using Coccinelle and the following semantic patch:
> 
> @@
> expression x;
> identifier fld;
> @@
> 
> * x = devm_kzalloc(...);
>   ... when != x == NULL
>   x->fld
> 
> 

One of these blank lines should perhaps be a "Fixes:" tag?

Cheers,
peda

> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/iio/multiplexer/iio-mux.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/iio/multiplexer/iio-mux.c 
> b/drivers/iio/multiplexer/iio-mux.c
> index 37ba007..a8d672b 100644
> --- a/drivers/iio/multiplexer/iio-mux.c
> +++ b/drivers/iio/multiplexer/iio-mux.c
> @@ -285,6 +285,9 @@ static int mux_configure_channel(struct device *dev, 
> struct mux *mux,
>   child->ext_info_cache = devm_kzalloc(dev,
>sizeof(*child->ext_info_cache) *
>num_ext_info, GFP_KERNEL);
> + if (!child->ext_info_cache)
> + return -ENOMEM;
> +
>   for (i = 0; i < num_ext_info; ++i) {
>   child->ext_info_cache[i].size = -1;
>  
>

[PATCH v3 2/3] dt-bindings: cpufreq: move MediaTek cpufreq dt-bindings document to proper place

2017-07-06 Thread sean.wang

From: Sean Wang 

The old place is Documentation/devicetree/bindings/clock/ that would
let people hard to find how to use MediaTek cpufreq driver, so moving
it to the appropriate place as other cpufreq drivers done would be
better.

Signed-off-by: Sean Wang 
Acked-by: Viresh Kumar 
---
 .../bindings/{clock/mt8173-cpu-dvfs.txt => cpufreq/cpufreq-mediatek.txt}  | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename Documentation/devicetree/bindings/{clock/mt8173-cpu-dvfs.txt => 
cpufreq/cpufreq-mediatek.txt} (100%)

diff --git a/Documentation/devicetree/bindings/clock/mt8173-cpu-dvfs.txt 
b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt
similarity index 100%
rename from Documentation/devicetree/bindings/clock/mt8173-cpu-dvfs.txt
rename to Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt
-- 
2.7.4

[PATCH v3 0/3] some fixups for MediaTek cpufreq driver

2017-07-06 Thread sean.wang

From: Sean Wang 

Changes since v2:
- correct the typo in the binding document

Changes since v1:
- drop those patches already accepted
- refine the commit messages and Kconfig dependency
- Kconfig menu entry and file name itself are updated with more
generic name to drop "MT8173" since this driver actually supports
all MediaTek SoCs.
- generate patchset again with git format-patch -C -M --thread=shallow
- fix binding examples by replacing '@' with '-' as the OPP nodes will
never have a "reg" property.

Hi,

The purpose of the series is
- (patch 1 to 3) to fix up current Mediatek cpufreq driver
can't work with the latest tree since one required CPU clock muxer missing
would cause the driver getting the resource fails when driver probe gets
called.
- (patch 4) to enable cpufreq feature on MT2701/MT7623 platform.
- (patch 5 to 6) to update the binding document to reflect
latest driver logic and add more examples guiding people how to apply for
Mediatek cpufreq driver.

Sean Wang (3):
  cpufreq: mediatek: Add support of cpufreq to MT2701/MT7623 SoC
  dt-bindings: cpufreq: move MediaTek cpufreq dt-bindings document to
proper place
  dt-bindings: cpufreq: enhance MediaTek cpufreq dt-binding document

 .../devicetree/bindings/clock/mt8173-cpu-dvfs.txt  |  83 ---
 .../bindings/cpufreq/cpufreq-mediatek.txt  | 247 +
 drivers/cpufreq/Kconfig.arm|   8 +-
 drivers/cpufreq/Makefile   |   2 +-
 .../cpufreq/{mt8173-cpufreq.c => mtk-cpufreq.c}|   2 +
 5 files changed, 254 insertions(+), 88 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/clock/mt8173-cpu-dvfs.txt
 create mode 100644 
Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt
 rename drivers/cpufreq/{mt8173-cpufreq.c => mtk-cpufreq.c} (99%)

-- 
2.7.4

[PATCH v3 3/3] dt-bindings: cpufreq: enhance MediaTek cpufreq dt-binding document

2017-07-06 Thread sean.wang

From: Sean Wang 

Update binding document with adding operating-points-v2 as the required
property and the cooling level as the optional properties and adding more
examples guiding people how to use MediaTek cpufreq driver for MediaTek
SoCs.

Signed-off-by: Sean Wang 
Acked-by: Viresh Kumar 
---
 .../bindings/cpufreq/cpufreq-mediatek.txt  | 170 -
 1 file changed, 167 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt 
b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt
index 52b457c..f640308 100644
--- a/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt
+++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt
@@ -1,4 +1,5 @@
-Device Tree Clock bindins for CPU DVFS of Mediatek MT8173 SoC
+Binding for MediaTek's CPUFreq driver
+=
 
 Required properties:
 - clocks: A list of phandle + clock-specifier pairs for the clocks listed in 
clock names.
@@ -9,6 +10,8 @@ Required properties:
  transition and not stable yet.
Please refer to 
Documentation/devicetree/bindings/clk/clock-bindings.txt for
generic clock consumer properties.
+- operating-points-v2: Please refer to 
Documentation/devicetree/bindings/opp/opp.txt
+   for detail.
 - proc-supply: Regulator for Vproc of CPU cluster.
 
 Optional properties:
@@ -17,9 +20,166 @@ Optional properties:
   Vsram to fit SoC specific needs. When absent, the voltage scaling
   flow is handled by hardware, hence no software "voltage 
tracking" is
   needed.
+- #cooling-cells:
+- cooling-min-level:
+- cooling-max-level:
+   Please refer to Documentation/devicetree/bindings/thermal/thermal.txt
+   for detail.
+
+Example 1 (MT7623 SoC):
+
+   cpu_opp_table: opp_table {
+   compatible = "operating-points-v2";
+   opp-shared;
+
+   opp-59800 {
+   opp-hz = /bits/ 64 <59800>;
+   opp-microvolt = <105>;
+   };
+
+   opp-74750 {
+   opp-hz = /bits/ 64 <74750>;
+   opp-microvolt = <105>;
+   };
+
+   opp-104000 {
+   opp-hz = /bits/ 64 <104000>;
+   opp-microvolt = <115>;
+   };
+
+   opp-119600 {
+   opp-hz = /bits/ 64 <119600>;
+   opp-microvolt = <120>;
+   };
+
+   opp-13 {
+   opp-hz = /bits/ 64 <13>;
+   opp-microvolt = <130>;
+   };
+   };
+
+   cpu0: cpu@0 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a7";
+   reg = <0x0>;
+   clocks = <&infracfg CLK_INFRA_CPUSEL>,
+<&apmixedsys CLK_APMIXED_MAINPLL>;
+   clock-names = "cpu", "intermediate";
+   operating-points-v2 = <&cpu_opp_table>;
+   #cooling-cells = <2>;
+   cooling-min-level = <0>;
+   cooling-max-level = <7>;
+   };
+   cpu@1 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a7";
+   reg = <0x1>;
+   operating-points-v2 = <&cpu_opp_table>;
+   };
+   cpu@2 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a7";
+   reg = <0x2>;
+   operating-points-v2 = <&cpu_opp_table>;
+   };
+   cpu@3 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a7";
+   reg = <0x3>;
+   operating-points-v2 = <&cpu_opp_table>;
+   };
+
+Example 2 (MT8173 SoC):
+   cpu_opp_table_a: opp_table_a {
+   compatible = "operating-points-v2";
+   opp-shared;
+
+   opp-50700 {
+   opp-hz = /bits/ 64 <50700>;
+   opp-microvolt = <859000>;
+   };
+
+   opp-70200 {
+   opp-hz = /bits/ 64 <70200>;
+   opp-microvolt = <908000>;
+   };
+
+   opp-100100 {
+   opp-hz = /bits/ 64 <100100>;
+   opp-microvolt = <983000>;
+   };
+
+   opp-110500 {
+   opp-hz = /bits/ 64 <110500>;
+   opp-microvolt = <1009000>;
+   };
+
+   opp-118300 {
+   opp-hz = /bits/ 64 <118300>;
+   opp-microvolt = <1028000>;
+   };
+
+   opp-140400 {
+   opp-hz = /bits/ 64 <140400>;
+   opp-microvolt = <1083000>;
+   };
+
+   opp-150800 {
+

[PATCH v3 1/3] cpufreq: mediatek: Add support of cpufreq to MT2701/MT7623 SoC

2017-07-06 Thread sean.wang

From: Sean Wang 

MT2701/MT7623 is a 32-bit ARMv7 based quad-core (4 * Cortex-A7) with
single cluster and this hardware is also compatible with the existing
driver through enabling CPU frequency feature with operating-points-v2
bindings. Also, this driver actually supports all MediaTek SoCs, the
Kconfig menu entry and file name itself should be updated with more
generic name to drop "MT8173"

Signed-off-by: Sean Wang 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/Kconfig.arm | 8 
 drivers/cpufreq/Makefile| 2 +-
 drivers/cpufreq/{mt8173-cpufreq.c => mtk-cpufreq.c} | 2 ++
 3 files changed, 7 insertions(+), 5 deletions(-)
 rename drivers/cpufreq/{mt8173-cpufreq.c => mtk-cpufreq.c} (99%)

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index 74ed7e9..79aece7 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -87,14 +87,14 @@ config ARM_KIRKWOOD_CPUFREQ
  This adds the CPUFreq driver for Marvell Kirkwood
  SoCs.
 
-config ARM_MT8173_CPUFREQ
-   tristate "Mediatek MT8173 CPUFreq support"
+config ARM_MEDIATEK_CPUFREQ
+   tristate "CPU Frequency scaling support for MediaTek SoCs"
depends on ARCH_MEDIATEK && REGULATOR
-   depends on ARM64 || (ARM_CPU_TOPOLOGY && COMPILE_TEST)
+   depends on ARM || ARM64 || COMPILE_TEST
depends on !CPU_THERMAL || THERMAL
select PM_OPP
help
- This adds the CPUFreq driver support for Mediatek MT8173 SoC.
+ This adds the CPUFreq driver support for MediaTek SoCs.
 
 config ARM_OMAP2PLUS_CPUFREQ
bool "TI OMAP2+"
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index b7e78f0..4956f5d 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -58,7 +58,7 @@ obj-$(CONFIG_ARM_EXYNOS5440_CPUFREQ)  += exynos5440-cpufreq.o
 obj-$(CONFIG_ARM_HIGHBANK_CPUFREQ) += highbank-cpufreq.o
 obj-$(CONFIG_ARM_IMX6Q_CPUFREQ)+= imx6q-cpufreq.o
 obj-$(CONFIG_ARM_KIRKWOOD_CPUFREQ) += kirkwood-cpufreq.o
-obj-$(CONFIG_ARM_MT8173_CPUFREQ)   += mt8173-cpufreq.o
+obj-$(CONFIG_ARM_MEDIATEK_CPUFREQ) += mtk-cpufreq.o
 obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ)+= omap-cpufreq.o
 obj-$(CONFIG_ARM_PXA2xx_CPUFREQ)   += pxa2xx-cpufreq.o
 obj-$(CONFIG_PXA3xx)   += pxa3xx-cpufreq.o
diff --git a/drivers/cpufreq/mt8173-cpufreq.c b/drivers/cpufreq/mtk-cpufreq.c
similarity index 99%
rename from drivers/cpufreq/mt8173-cpufreq.c
rename to drivers/cpufreq/mtk-cpufreq.c
index fd1886f..481ec77 100644
--- a/drivers/cpufreq/mt8173-cpufreq.c
+++ b/drivers/cpufreq/mtk-cpufreq.c
@@ -575,6 +575,8 @@ static struct platform_driver mt8173_cpufreq_platdrv = {
 
 /* List of machines supported by this driver */
 static const struct of_device_id mt8173_cpufreq_machines[] __initconst = {
+   { .compatible = "mediatek,mt2701", },
+   { .compatible = "mediatek,mt7623", },
{ .compatible = "mediatek,mt817x", },
{ .compatible = "mediatek,mt8173", },
{ .compatible = "mediatek,mt8176", },
-- 
2.7.4

Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

2017-07-06 Thread Joel Fernandes

Hi Juri,

On Wed, Jul 5, 2017 at 1:59 AM, Juri Lelli  wrote:
> Worker kthread needs to be able to change frequency for all other
> threads.
>
> Make it special, just under STOP class.
>
> Signed-off-by: Juri Lelli 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Rafael J. Wysocki 
> Cc: Viresh Kumar 
> Cc: Luca Abeni 
> Cc: Claudio Scordino 
> ---
> Changes from RFCv0:
>
>  - use a high bit of sched_flags and don't expose the new flag to
>userspace (also add derisory comments) (Peter)
> ---
>  include/linux/sched.h|  1 +
>  kernel/sched/core.c  | 15 +--
>  kernel/sched/cpufreq_schedutil.c | 15 ---
>  kernel/sched/deadline.c  | 13 +
>  kernel/sched/sched.h | 20 +++-
>  5 files changed, 58 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1f0f427e0292..0ba767fdedae 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1359,6 +1359,7 @@ extern int idle_cpu(int cpu);
>  extern int sched_setscheduler(struct task_struct *, int, const struct 
> sched_param *);
>  extern int sched_setscheduler_nocheck(struct task_struct *, int, const 
> struct sched_param *);
>  extern int sched_setattr(struct task_struct *, const struct sched_attr *);
> +extern int sched_setattr_nocheck(struct task_struct *, const struct 
> sched_attr *);
>  extern struct task_struct *idle_task(int cpu);
>
>  /**
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5186797908dc..0e40c7ff2bf4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3998,7 +3998,9 @@ static int __sched_setscheduler(struct task_struct *p,
> }
>
> if (attr->sched_flags &
> -   ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
> +   ~(SCHED_FLAG_RESET_ON_FORK |
> + SCHED_FLAG_RECLAIM |
> + SCHED_FLAG_SPECIAL))
> return -EINVAL;

I was thinking if its better to name SCHED_FLAG_SPECIAL something more
descriptive than "special", since it will not be clear looking at
other parts of the code that this is used for the frequency scaling
thread or that its a DL task. I was thinking since this flag is
specifically setup for the sugov thread frequency scaling purpose, a
better flag name than SPECIAL might be SCHED_FLAG_DL_FREQ_SCALE or
SCHED_FLAG_FREQ_SCALE. But I am Ok with whatever makes sense to do
here. Thanks,

Regards,

-Joel

Re: [PATCH 1/2] genirq: Get the fwnode back for irqchips being probed via ACPI namespace

2017-07-06 Thread Hanjun Guo

On 2017/7/6 21:05, Marc Zyngier wrote:
> On 06/07/17 10:01, Hanjun Guo wrote:
>> Hi Marc,
>>
>> On 2017/7/6 15:43, Marc Zyngier wrote:
>>> On 06/07/17 05:35, Hanjun Guo wrote:
 From: Hanjun Guo 

 commit d59f6617eef0 (genirq: Allow fwnode to carry name information only)
 forgot to do "domain->fwnode = fwnode;" for irqchips being probed via
 ACPI namesapce (DSDT/SSDT), that will break platforms with irqchip such
 as mbigen or qcom irq combiner, set the fwnode back to fix the issue.

 Reported-by: John Garry 
 Signed-off-by: Hanjun Guo 
 ---
  kernel/irq/irqdomain.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

 diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
 index 14fe862..1bc38fa 100644
 --- a/kernel/irq/irqdomain.c
 +++ b/kernel/irq/irqdomain.c
 @@ -151,7 +151,6 @@ struct irq_domain *__irq_domain_add(struct 
 fwnode_handle *fwnode, int size,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
break;
default:
 -  domain->fwnode = fwnode;
domain->name = fwid->name;
break;
}
 @@ -172,7 +171,6 @@ struct irq_domain *__irq_domain_add(struct 
 fwnode_handle *fwnode, int size,
strreplace(name, '/', ':');
  
domain->name = name;
 -  domain->fwnode = fwnode;
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
  
 @@ -196,6 +194,7 @@ struct irq_domain *__irq_domain_add(struct 
 fwnode_handle *fwnode, int size,
INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
domain->ops = ops;
domain->host_data = host_data;
 +  domain->fwnode = fwnode;
domain->hwirq_max = hwirq_max;
domain->revmap_size = size;
domain->revmap_direct_max_irq = direct_max;

>>> This doesn't seem right.
>>>
>>> Why is is_fwnode_irqchip() returning false when presented with an
>>> irqchip probed via the ACPI namespace? That's what you should consider
>>> fixing instead of moving that code around.
>> irqchips probed via the ACPI namespace will have a fwnode handler
>> with the fwnode type FWNODE_ACPI, similar to irqchips probed
>> via DT having a fwnode handler with type FWNODE_OF, in this
>> function with DT case, fwnode is set for irqdomain's fwnode,
>> ACPI just reuse that code because they are similar.
>>
>> is_fwnode_irqchip() just checks if it's FWNODE_IRQCHIP, which is only
>> available for irqchips probing via ACPI static tables such as ITS, GIC
>> on ARM, or via DMAR on x86, those cases don't have a fwnode handler then
>> need to create one via __irq_domain_alloc_fwnode(), which is different
>> from DT/ACPI namesapce scan mechanism. So the patch above it's the best
>> solution I got so far which just resume the code as before, correct me
>> if you have something new :)
> This violates the irqdomain API that takes two kind of fwnodes: 
> FWNODE_OF or FWNODE_IRQCHIP, and no others. You got away with it so 
> far, but it is over.
>
> You have two choices here: either you allocate a FWNODE_IRQCHIP in your
> irqchip driver, and use this as a handle for your IRQ domain, or you 
> make the irqdomain code able to deal with any FWNODE_ACPI fwnode.

Later one is better to me as allocate a FWNODE_IRQCHIP in the irqchip
driver will override the FWNODE_ACPI handler.

>
> Does the following hack work for you?

Yes, it works. shall we go this way with a proper fix?

>
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 14fe862aa2e3..37f4aa3985b3 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -155,6 +155,8 @@ struct irq_domain *__irq_domain_add(struct fwnode_handle 
> *fwnode, int size,
>   domain->name = fwid->name;
>   break;
>   }
> + } else if (is_acpi_device_node(fwnode)) {
> + domain->fwnode = fwnode;
>   } else if (of_node) {
>   char *name;
>  
> If it does, we can then find weird and wonderful ways to give the
> domain a shiny name in debugfs...

How about the weird way below (using dev_name() which shows ACPI HID + number) ?

+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -172,6 +174,15 @@ struct irq_domain *__irq_domain_add(struct fwnode_handle 
*fwnode, int size,
 
domain->name = name;
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
+ } else if (is_acpi_device_node(fwnode)) {
+ struct acpi_device *adev = to_acpi_device_node(fwnode);
+
+ domain->name = kstrdup(dev_name(&adev->dev), GFP_KERNEL);
+ if (!domain->name)
+ return NULL;
+
+ domain->fwnode = fwnode;
+ domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}

But my hack code retrieving the ACPI data in irq domain core is really
weird :)

Thanks
Hanjun

Re: Make HWSPINLOCK a menuconfig to ease disabling

2017-07-06 Thread Dave Jones

On Thu, Jul 06, 2017 at 11:23:50PM +, Linux Kernel wrote:

 > Make HWSPINLOCK a menuconfig to ease disabling
 > 
 > So that there's no need to get into the submenu to disable all related 
 > config
 > entries.

Here's how that looks on x86...

*
* Hardware Spinlock drivers
*
Hardware Spinlock drivers (HWSPINLOCK) [N/m/y] (NEW) ?

There is no help available for this option.
Symbol: HWSPINLOCK [=n]
Type  : tristate
Prompt: Hardware Spinlock drivers
  Location:
-> Device Drivers
  Defined at drivers/hwspinlock/Kconfig:5



No help ? Ok whatever, let's see what happens..

Hardware Spinlock drivers (HWSPINLOCK) [N/m/y] (NEW) m

And then nothing, because none of the drivers that depend on this are available 
for x86
because of stuff like this..


 >  config HWSPINLOCK_OMAP
 >  tristate "OMAP Hardware Spinlock device"
 >  depends on ARCH_OMAP4 || SOC_OMAP5 || SOC_DRA7XX || SOC_AM33XX || 
 > SOC_AM43XX

 >  config HWSPINLOCK_QCOM
 >  tristate "Qualcomm Hardware Spinlock device"
 >  depends on ARCH_QCOM

 >  config HWSPINLOCK_SIRF
 >  tristate "SIRF Hardware Spinlock device"
 >  depends on ARCH_SIRF

 >  config HSEM_U8500
 >  tristate "STE Hardware Semaphore functionality"
 >  depends on ARCH_U8500

HWSPINLOCK needs all those depends too.

Dave

Re: [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals

2017-07-06 Thread Joel Fernandes

Hi Juri,

On Wed, Jul 5, 2017 at 1:59 AM, Juri Lelli  wrote:
> To be able to treat utilization signals of different scheduling classes
> in different ways (e.g., CFS signal might be stale while DEADLINE signal
> is never stale by design) we need to split sugov_cpu::util signal in two:
> util_cfs and util_dl.
>
> This patch does that by also changing sugov_get_util() parameter list.
> After this change, aggregation of the different signals has to be performed
> by sugov_get_util() users (so that they can decide what to do with the
> different signals).
>
> Suggested-by: Rafael J. Wysocki 
> Signed-off-by: Juri Lelli 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Rafael J. Wysocki 
> Cc: Viresh Kumar 
> Cc: Luca Abeni 
> Cc: Claudio Scordino 
> ---
> Changes from RFCv0:
>
>  - refactor aggregation of utilization in sugov_aggregate_util()
> ---
>  kernel/sched/cpufreq_schedutil.c | 28 
>  1 file changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c 
> b/kernel/sched/cpufreq_schedutil.c
> index ba6227625f24..e835fa886225 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -58,7 +58,8 @@ struct sugov_cpu {
> u64 last_update;
>
> /* The fields below are only needed when sharing a policy. */
> -   unsigned long util;
> +   unsigned long util_cfs;
> +   unsigned long util_dl;
> unsigned long max;
> unsigned int flags;
>
> @@ -154,20 +155,24 @@ static unsigned int get_next_freq(struct sugov_policy 
> *sg_policy,
> return cpufreq_driver_resolve_freq(policy, freq);
>  }
>
> -static void sugov_get_util(unsigned long *util, unsigned long *max)
> +static void sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
> struct rq *rq = this_rq();
> -   unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
> -   >> BW_SHIFT;
>
> -   *max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +   sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +   sg_cpu->util_cfs = rq->cfs.avg.util_avg;
> +   sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
> + >> BW_SHIFT;
> +}
>
> +static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> +{
> /*
>  * Ideally we would like to set util_dl as min/guaranteed freq and
>  * util_cfs + util_dl as requested freq. However, cpufreq is not yet
>  * ready for such an interface. So, we only do the latter for now.
>  */
> -   *util = min(rq->cfs.avg.util_avg + dl_util, *max);
> +   return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);
>  }

I am wondering why the need for a separate aggregation API. To me, it
looks like using sugov_get_util to set the sg_cpu util elements and
then do the aggregation at the same time would have the same effect
(without changing the existing parameter list). Is this to handle a
future usecase where aggregation may need to be done differently? For
all the user's of sugov_get_util, aggregation is done in the same way.
Anyway if I missed something, sorry for the noise.

thanks,

-Joel

[PATCH] sched/cputime: Fix using smp_processor_id() in preemptible

2017-07-06 Thread Wanpeng Li

From: Wanpeng Li 

 BUG: using smp_processor_id() in preemptible [] code: 99-trinity/181
 caller is debug_smp_processor_id+0x17/0x19
 CPU: 0 PID: 181 Comm: 99-trinity Not tainted 4.12.0-01059-g2a42eb9 #1
 Call Trace:
  dump_stack+0x82/0xb8
  check_preemption_disabled+0xd1/0xe3
  debug_smp_processor_id+0x17/0x19
  vtime_delta+0xd/0x2c
  task_cputime+0x89/0xdb
  thread_group_cputime+0x11b/0x1ed
  thread_group_cputime_adjusted+0x1f/0x47
  wait_consider_task+0x2a9/0xaf9
  ? lock_acquire+0x97/0xa4
  do_wait+0xdf/0x1f4
  SYSC_wait4+0x8e/0xb5
  ? list_add+0x34/0x34
  SyS_wait4+0x9/0xb
  do_syscall_64+0x70/0x82
  entry_SYSCALL64_slow_path+0x25/0x25

As Frederic pointed out:

| Although those sched_clock_cpu() things seem to only matter when the
| sched_clock() is unstable. And that stability is a condition for nohz_full
| to work anyway. So probably sched_clock() alone would be enough.

This patch fixes it by replacing sched_clock_cpu() in vtime_delta() by 
sched_clock() to avoid to call smp_processor_id() in preemptible context.

Reported-by: Xiaolong Ye 
Cc: Thomas Gleixner 
Cc: Luiz Capitulino 
Cc: Frederic Weisbecker 
Cc: Rik van Riel 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Signed-off-by: Wanpeng Li 
---
 kernel/sched/cputime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6e3ea4a..d86b94e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -683,7 +683,7 @@ static u64 vtime_delta(struct vtime *vtime)
 {
unsigned long long clock;
 
-   clock = sched_clock_cpu(smp_processor_id());
+   clock = sched_clock();
if (clock < vtime->starttime)
return 0;
 
-- 
2.7.4

[PATCH] module: fix ddebug_remove_module()

2017-07-06 Thread Zhou Chengming

ddebug_remove_module() use mod->name to find the ddebug_table of the
module and remove it. But dynamic_debug_setup() use the first
_ddebug->modname to create ddebug_table for the module. It's ok when
the _ddebug->modname is the same with the mod->name.

But livepatch module is special, it may contain _ddebugs of other
modules, the modname of which is different from the name of livepatch
module. So ddebug_remove_module() can't use mod->name to find the
right ddebug_table and remove it. It can cause kernel crash when we cat
the file /dynamic_debug/control.

Signed-off-by: Zhou Chengming 
---
 kernel/module.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 4a3665f..dac9805 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2703,21 +2703,21 @@ static void add_kallsyms(struct module *mod, const 
struct load_info *info)
 }
 #endif /* CONFIG_KALLSYMS */
 
-static void dynamic_debug_setup(struct _ddebug *debug, unsigned int num)
+static void dynamic_debug_setup(struct module *mod, struct _ddebug *debug, 
unsigned int num)
 {
if (!debug)
return;
 #ifdef CONFIG_DYNAMIC_DEBUG
-   if (ddebug_add_module(debug, num, debug->modname))
+   if (ddebug_add_module(debug, num, mod->name))
pr_err("dynamic debug error adding module: %s\n",
debug->modname);
 #endif
 }
 
-static void dynamic_debug_remove(struct _ddebug *debug)
+static void dynamic_debug_remove(struct module *mod, struct _ddebug *debug)
 {
if (debug)
-   ddebug_remove_module(debug->modname);
+   ddebug_remove_module(mod->name);
 }
 
 void * __weak module_alloc(unsigned long size)
@@ -3697,7 +3697,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
goto free_arch_cleanup;
}
 
-   dynamic_debug_setup(info->debug, info->num_debug);
+   dynamic_debug_setup(mod, info->debug, info->num_debug);
 
/* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
ftrace_module_init(mod);
@@ -3761,7 +3761,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
module_disable_nx(mod);
 
  ddebug_cleanup:
-   dynamic_debug_remove(info->debug);
+   dynamic_debug_remove(mod, info->debug);
synchronize_sched();
kfree(mod->args);
  free_arch_cleanup:
-- 
1.8.3.1

Re: [PATCH] x86/boot/KASLR: exclude EFI_BOOT_SERVICES_{CODE|DATA} from KASLR's choice

2017-07-06 Thread Baoquan He

On 07/06/17 at 03:57pm, Matt Fleming wrote:
> On Thu, 06 Jul, at 08:31:07AM, Naoya Horiguchi wrote:
> > +   for (i = 0; i < nr_desc; i++) {
> > +   md = (efi_memory_desc_t *)(pmap + (i * e->efi_memdesc_size));
> > +
> > +   /*
> > +* EFI_BOOT_SERVICES_{CODE|DATA} are avoided because boot
> > +* services regions could be accessed after ExitBootServices()
> > +* due to the workaround for buggy firmware.
> > +*/
> > +   if (!(md->type == EFI_LOADER_CODE ||
> > + md->type == EFI_LOADER_DATA ||
> > + md->type == EFI_CONVENTIONAL_MEMORY))
> > +   continue;
> 
> Wouldn't it make more sense to *only* use EFI_CONVENTIONAL_MEMORY?
> 
> You can't re-use EFI_LOADER_* regions because the kaslr code is run so
> early in boot that you've no idea if data the kernel will need is in
> those EFI_LOADER_* regions.
> 
> For example, we pass struct setup_data objects inside of
> EFI_LOADER_DATA regions.

It doesn't matter because we have tried to avoid those memory setup_data
resides in in mem_avoid_overlap(). Here discarding EFI_LOADER_* could
discard the whole regions while setup_data could occupy small part of
them.

Re: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

2017-07-06 Thread Ye Xiaolong

On 07/07, Dou Liyang wrote:
>Hi xiaolong,
>
>Really thanks for your testing.
>
>At 07/07/2017 09:54 AM, Ye Xiaolong wrote:
>>On 07/06, Thomas Gleixner wrote:
>>>On Thu, 6 Jul 2017, kernel test robot wrote:
>>>
commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize 
interrupt mode behind timer init")
>>>
++++
|| 43436935b7 | 03fa63cc96 |
++++
| boot_successes | 0  | 4  |
++++
>>>
>>>So 03fa63cc96 makes the box boot again. I'm confused as usual by the
>>>output of this tool.,
>>>
kern  :info  : [0.005000] tsc: Fast TSC calibration using PIT
kern  :info  : [0.006000] tsc: Detected 2195.020 MHz processor
kern  :info  : [0.007000] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
kern  :info  : [0.008001] pid_max: default: 90112 minimum: 704
kern  :info  : [0.009037] ACPI: Core revision 20170303
kern  :err   : [0.010002] ACPI Error: Table [DMAR] is not invalidated 
during early boot stage (20170303/tbxface-193)
>>>
>>>Sure we have a error message here, but compared to what? Compared to
>>>something which does not boot at all?
>>
>>Sorry for the confusion, here commit 43436935b7 boot failed due to OOM which
>>happened at the late stage of kernel boot while the ACPI error showed at the
>>early boot stage for commit 03fa63cc96 and it didn't appear in 43436935b7's
>>dmesg.
>>
>
>let's make the problem clearly firstly:
>
>1) Commit 43436935b7 ("x86/xen: Bypass intr mode setup in enlighten_pv
>system") made kernel boot failed, which caused by OOM.
>
>2) Commit 03fa63cc96 ("x86/time: Initialize interrupt mode behind
>timer init") can make the kernel boot success again, but with an ACPI
>error happened.
>
>And both *1* and *2* used the same configuration showed in the
>attachment.
>
>Does anything I missed?

Yes, this is exactly what I meant.

Thanks,
Xiaolong
>
>Thanks,
>
>   dou.
>
>>Thanks,
>>Xiaolong
>>
>>>
>>>Thanks,
>>>
>>> tglx
>>>
>>
>>
>>
>
>

Re: [lkp-robot] [sched/cputime] 2a42eb9594: BUG:using_smp_processor_id()in_preemptible

2017-07-06 Thread Wanpeng Li

2017-07-07 10:09 GMT+08:00 kernel test robot :
>
> FYI, we noticed the following commit:
>
> commit: 2a42eb9594a1480b4ead9e036e06ee1290e5fa6d ("sched/cputime: Accumulate 
> vtime on top of nsec clocksource")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/urgent
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap -smp 
> 2 -m 512M
>
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
>
>
> ++++
> || bac5b6b6b1 | 2a42eb9594 |
> ++++
> | boot_successes | 8  | 2  |
> | boot_failures  | 0  | 6  |
> | BUG:kernel_hang_in_test_stage  | 0  | 2  |
> | BUG:using_smp_processor_id()in_preemptible | 0  | 4  |
> ++++
>
>
>
> [   55.289944] BUG: using smp_processor_id() in preemptible [] code: 
> 99-trinity/181
> [   55.302094] caller is debug_smp_processor_id+0x17/0x19
> [   55.313831] CPU: 0 PID: 181 Comm: 99-trinity Not tainted 
> 4.12.0-01059-g2a42eb9 #1
> [   55.324812] Call Trace:
> [   55.328601]  dump_stack+0x82/0xb8
> [   55.333467]  check_preemption_disabled+0xd1/0xe3
> [   55.340370]  debug_smp_processor_id+0x17/0x19
> [   55.346815]  vtime_delta+0xd/0x2c
> [   55.352843]  task_cputime+0x89/0xdb
> [   55.358026]  thread_group_cputime+0x11b/0x1ed
> [   55.364517]  thread_group_cputime_adjusted+0x1f/0x47
> [   55.371895]  wait_consider_task+0x2a9/0xaf9
> [   55.378025]  ? lock_acquire+0x97/0xa4
> [   55.383463]  do_wait+0xdf/0x1f4
> [   55.388199]  SYSC_wait4+0x8e/0xb5
> [   55.393219]  ? list_add+0x34/0x34
> [   55.398317]  SyS_wait4+0x9/0xb
> [   55.403401]  do_syscall_64+0x70/0x82
> [   55.408977]  entry_SYSCALL64_slow_path+0x25/0x25
> [   55.409006] RIP: 0033:0x7ff2f15c3c3e
> [   55.409032] RSP: 002b:7ffd3e91eab0 EFLAGS: 0246 ORIG_RAX: 
> 003d
> [   55.409064] RAX: ffda RBX: 7ff2f1f0b6a0 RCX: 
> 7ff2f15c3c3e
> [   55.409070] RDX: 0001 RSI: 7ffd3e91eb18 RDI: 
> 
> [   55.409075] RBP: 0001 R08:  R09: 
> 
> [   55.409101] R10:  R11: 0246 R12: 
> 
> [   55.409106] R13:  R14: 0001 R15: 
> 
> [   55.550819] random: trinity: uninitialized urandom read (4 bytes read)
> mountall: Event failed
> [   55.947396] random: mountall: uninitialized urandom read (12 bytes read)
> [   56.044974] BUG: using smp_processor_id() in preemptible [] code: 
> mountall/194
> [   56.057287] caller is debug_smp_processor_id+0x17/0x19
> [   56.066939] CPU: 1 PID: 194 Comm: mountall Not tainted 
> 4.12.0-01059-g2a42eb9 #1
> [   56.078248] Call Trace:
> [   56.082297]  dump_stack+0x82/0xb8
> [   56.087256]  check_preemption_disabled+0xd1/0xe3
> [   56.093512]  debug_smp_processor_id+0x17/0x19
> [   56.099800]  vtime_delta+0xd/0x2c
> [   56.105862]  task_cputime+0x89/0xdb
> [   56.111237]  thread_group_cputime+0x11b/0x1ed
> [   56.117693]  thread_group_cputime_adjusted+0x1f/0x47
> [   56.125298]  wait_consider_task+0x2a9/0xaf9
> [   56.131983]  ? do_raw_read_lock+0x39/0x3c
> [   56.138293]  do_wait+0xdf/0x1f4
> [   56.143000]  SyS_waitid+0xac/0x1e3
> [   56.148566]  ? list_add+0x34/0x34
> [   56.153694]  do_syscall_64+0x70/0x82
> [   56.159267]  entry_SYSCALL64_slow_path+0x25/0x25
> [   56.166324] RIP: 0033:0x7fdc9c642d2f
> [   56.171770] RSP: 002b:7ffe6f4dda30 EFLAGS: 0246 ORIG_RAX: 
> 00f7
> [   56.181888] RAX: ffda RBX: 560f25433d70 RCX: 
> 7fdc9c642d2f
> [   56.192353] RDX: 7ffe6f4dda78 RSI: 00c7 RDI: 
> 0001
> [   56.203076] RBP: 560f2543bf20 R08:  R09: 
> 0020
> [   56.212903] R10: 0004 R11: 0246 R12: 
> 560f25439be0
> [   56.223108] R13: 00c7 R14: 00c7 R15: 
> 560f25222c1f
> [   56.522906] init: Failed to create pty - disabling logging for job
> [   56.535738] init: Temporary process spawn error: No such file or directory
> [   56.769705] init: Failed to create pty - disabling logging for job
> [   56.781761] init: Temporary process spawn error: No such file or directory
> [   57.258638] init: Failed to create pty - disabling logging for job
> [   57.271867] init: Temporary process spawn error: No such file or directory
> [   57.318928] init: Failed to create pty - disabling logging for job
> [   57.340639] init: Temporary process spawn error: No such file or directory
> [   57.499535] BUG: using smp_processor_id() in preemptible [] code: 
> sh/217
> [   57.529507] caller is debug_smp_

[PATCH v2] clk: fractional-divider: fix up the fractional clk's jitter

2017-07-06 Thread Elaine Zhang

add clk_fractional_divider_special_ops for rockchip specific requirements,
fractional divider must set that denominator is 20 times larger than
numerator to generate precise clock frequency.
Otherwise the CLK jitter is very big, poor quality of the clock signal.

RK document description:
3.1.9  Fractional divider usage
To get specific frequency, clocks of I2S, SPDIF, UARTcan be generated by
fractional divider. Generally you must set that denominator is 20 times
larger than numerator to generate precise clock frequency. So the
fractional divider applies only to generate low frequency clock like
I2S, UART.igned-off-by: Elaine Zhang 

Signed-off-by: Elaine Zhang 
---
 drivers/clk/clk-fractional-divider.c | 32 
 drivers/clk/rockchip/clk.c   |  2 +-
 include/linux/clk-provider.h |  1 +
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/clk-fractional-divider.c 
b/drivers/clk/clk-fractional-divider.c
index aab904618eb6..3107b33327f9 100644
--- a/drivers/clk/clk-fractional-divider.c
+++ b/drivers/clk/clk-fractional-divider.c
@@ -158,6 +158,38 @@ struct clk_hw *clk_hw_register_fractional_divider(struct 
device *dev,
 }
 EXPORT_SYMBOL_GPL(clk_hw_register_fractional_divider);
 
+static long clk_fd_round_rate_special(struct clk_hw *hw, unsigned long rate,
+ unsigned long *parent_rate)
+{
+   struct clk_hw *p_parent;
+   unsigned long p_rate, p_parent_rate;
+
+   if (!rate || rate >= *parent_rate)
+   return *parent_rate;
+
+   /*
+* Get rate closer to *parent_rate to guarantee there is no overflow
+* for m and n. In the result it will be the nearest rate left shifted
+* by (scale - fd->nwidth) bits.
+* fractional divider must set that denominator is 20 times larger than
+* numerator to generate precise clock frequency.
+*/
+   p_rate = clk_hw_get_rate(clk_hw_get_parent(hw));
+   if ((rate * 20 > p_rate) && (p_rate % rate != 0)) {
+   p_parent = clk_hw_get_parent(clk_hw_get_parent(hw));
+   p_parent_rate = clk_hw_get_rate(p_parent);
+   *parent_rate = p_parent_rate;
+   }
+   return clk_fd_round_rate(hw, rate, parent_rate);
+}
+
+const struct clk_ops clk_fractional_divider_special_ops = {
+   .recalc_rate = clk_fd_recalc_rate,
+   .round_rate = clk_fd_round_rate_special,
+   .set_rate = clk_fd_set_rate,
+};
+EXPORT_SYMBOL_GPL(clk_fractional_divider_special_ops);
+
 struct clk *clk_register_fractional_divider(struct device *dev,
const char *name, const char *parent_name, unsigned long flags,
void __iomem *reg, u8 mshift, u8 mwidth, u8 nshift, u8 nwidth,
diff --git a/drivers/clk/rockchip/clk.c b/drivers/clk/rockchip/clk.c
index fe1d393cf678..9eac9a579d18 100644
--- a/drivers/clk/rockchip/clk.c
+++ b/drivers/clk/rockchip/clk.c
@@ -210,7 +210,7 @@ static struct clk *rockchip_clk_register_frac_branch(
div->nwidth = 16;
div->nmask = GENMASK(div->nwidth - 1, 0) << div->nshift;
div->lock = lock;
-   div_ops = &clk_fractional_divider_ops;
+   div_ops = &clk_fractional_divider_special_ops;
 
clk = clk_register_composite(NULL, name, parent_names, num_parents,
 NULL, NULL,
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index a428aec36ace..3f5f36973df6 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -570,6 +570,7 @@ struct clk_fractional_divider {
 #define to_clk_fd(_hw) container_of(_hw, struct clk_fractional_divider, hw)
 
 extern const struct clk_ops clk_fractional_divider_ops;
+extern const struct clk_ops clk_fractional_divider_special_ops;
 struct clk *clk_register_fractional_divider(struct device *dev,
const char *name, const char *parent_name, unsigned long flags,
void __iomem *reg, u8 mshift, u8 mwidth, u8 nshift, u8 nwidth,
-- 
1.9.1

[PATCH 3/3] rtmutex: remove unnecessary adjust prio

2017-07-06 Thread Alex Shi

We don't need to adjust prio before new pi_waiter adding. The prio
only need update after pi_waiter change or task priority change.

Signed-off-by: Alex Shi 
Cc: Steven Rostedt 
Cc: Sebastian Siewior 
Cc: Mathieu Poirier 
Cc: Juri Lelli 
Cc: Thomas Gleixner 
To: linux-kernel@vger.kernel.org
To: Ingo Molnar 
To: Peter Zijlstra 
---
 kernel/locking/rtmutex.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 28cd09e..d1fe41f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -963,7 +963,6 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
return -EDEADLK;
 
raw_spin_lock(&task->pi_lock);
-   rt_mutex_adjust_prio(task);
waiter->task = task;
waiter->lock = lock;
waiter->prio = task->prio;
-- 
2.7.4

[PATCH v2 2/3] rtmutex: update rt-mutex

2017-07-06 Thread Alex Shi

The rtmutex remove a pending owner bit in in rt_mutex::owner, in
commit 8161239a8bcc ("rtmutex: Simplify PI algorithm and make highest prio task 
get lock")
But the document was changed accordingly. Updating it to a meaningful
state.

BTW, as 'Steven Rostedt' mentioned:
There is still technically a "Pending Owner", it's just not called
that anymore. The pending owner happens to be the top_waiter of a lock
that has no owner and has been woken up to grab the lock.

Signed-off-by: Alex Shi 
Cc: Steven Rostedt 
Cc: Sebastian Siewior 
Cc: Mathieu Poirier 
Cc: Juri Lelli 
Cc: Thomas Gleixner 
To: linux-...@vger.kernel.org
To: linux-kernel@vger.kernel.org
To: Jonathan Corbet 
To: Ingo Molnar 
To: Peter Zijlstra 
---
 Documentation/locking/rt-mutex.txt | 58 +-
 1 file changed, 26 insertions(+), 32 deletions(-)

diff --git a/Documentation/locking/rt-mutex.txt 
b/Documentation/locking/rt-mutex.txt
index 243393d..35793e0 100644
--- a/Documentation/locking/rt-mutex.txt
+++ b/Documentation/locking/rt-mutex.txt
@@ -28,14 +28,13 @@ magic bullet for poorly designed applications, but it allows
 well-designed applications to use userspace locks in critical parts of
 an high priority thread, without losing determinism.
 
-The enqueueing of the waiters into the rtmutex waiter list is done in
+The enqueueing of the waiters into the rtmutex waiter tree is done in
 priority order. For same priorities FIFO order is chosen. For each
 rtmutex, only the top priority waiter is enqueued into the owner's
-priority waiters list. This list too queues in priority order. Whenever
+priority waiters tree. This tree too queues in priority order. Whenever
 the top priority waiter of a task changes (for example it timed out or
-got a signal), the priority of the owner task is readjusted. [The
-priority enqueueing is handled by "plists", see include/linux/plist.h
-for more details.]
+got a signal), the priority of the owner task is readjusted. The
+priority enqueueing is handled by "pi_waiters".
 
 RT-mutexes are optimized for fastpath operations and have no internal
 locking overhead when locking an uncontended mutex or unlocking a mutex
@@ -46,34 +45,29 @@ is used]
 The state of the rt-mutex is tracked via the owner field of the rt-mutex
 structure:
 
-rt_mutex->owner holds the task_struct pointer of the owner. Bit 0 and 1
-are used to keep track of the "owner is pending" and "rtmutex has
-waiters" state.
+lock->owner holds the task_struct pointer of the owner. Bit 0 is used to
+keep track of the "lock has waiters" state.
 
- owner bit1bit0
- NULL  0   0   mutex is free (fast acquire possible)
- NULL  0   1   invalid state
- NULL  1   0   Transitional state*
- NULL  1   1   invalid state
- taskpointer   0   0   mutex is held (fast release possible)
- taskpointer   0   1   task is pending owner
- taskpointer   1   0   mutex is held and has waiters
- taskpointer   1   1   task is pending owner and mutex has waiters
+ ownerbit0
+ NULL 0   lock is free (fast acquire possible)
+ NULL 1   lock is free and has waiters and the top waiter
+   is going to take the lock*
+ taskpointer  0   lock is held (fast release possible)
+ taskpointer  1   lock is held and has waiters**
 
-Pending-ownership handling is a performance optimization:
-pending-ownership is assigned to the first (highest priority) waiter of
-the mutex, when the mutex is released. The thread is woken up and once
-it starts executing it can acquire the mutex. Until the mutex is taken
-by it (bit 0 is cleared) a competing higher priority thread can "steal"
-the mutex which puts the woken up thread back on the waiters list.
+The fast atomic compare exchange based acquire and release is only
+possible when bit 0 of lock->owner is 0.
 
-The pending-ownership optimization is especially important for the
-uninterrupted workflow of high-prio tasks which repeatedly
-takes/releases locks that have lower-prio waiters. Without this
-optimization the higher-prio thread would ping-pong to the lower-prio
-task [because at unlock time we always assign a new owner].
+(*) It also can be a transitional state when grabbing the lock
+with ->wait_lock is held. To prevent any fast path cmpxchg to the lock,
+we need to set the bit0 before looking at the lock, and the owner may be
+NULL in this small time, hence this can be a transitional state.
 
-(*) The "mutex has waiters" bit gets set to take the lock. If the lock
-doesn't already have an owner, this bit is quickly cleared if there are
-no waiters.  So this is a transitional state to synchronize with looking
-at the owner field of the mutex and the mutex owner releasing the lock.
+(**) There is a small time when bit 0 is set but there are no
+waiters. This can happen when grabbing the lock in the slow path.
+To prevent a cmpxchg of the owner releasing the lock

[PATCH v4 1/3] rtmutex: update rt-mutex-design

2017-07-06 Thread Alex Shi

The rt-mutex-design documents didn't gotten meaningful update from its
first version. Even after owner's pending bit was removed in commit 8161239a8bcc
("rtmutex: Simplify PI algorithm and make highest prio task get lock")
and priority list 'plist' changed to rbtree. And Peter Zijlstra did some
clean up and fix for deadline task changes on tip tree.

So update it to latest code and make it meaningful.
Steven Rostedt and Sebastian Siewior gave much of comments and input
in this doc. Thanks!

Signed-off-by: Alex Shi 
Cc: Steven Rostedt 
Cc: Sebastian Siewior 
Cc: Mathieu Poirier 
Cc: Juri Lelli 
Cc: Thomas Gleixner 
To: linux-...@vger.kernel.org
To: linux-kernel@vger.kernel.org
To: Jonathan Corbet 
To: Ingo Molnar 
To: Peter Zijlstra 
---
 Documentation/locking/rt-mutex-design.txt | 432 --
 1 file changed, 105 insertions(+), 327 deletions(-)

diff --git a/Documentation/locking/rt-mutex-design.txt 
b/Documentation/locking/rt-mutex-design.txt
index 8666070..6c6e8c2 100644
--- a/Documentation/locking/rt-mutex-design.txt
+++ b/Documentation/locking/rt-mutex-design.txt
@@ -97,9 +97,9 @@ waiter   - A waiter is a struct that is stored on the stack 
of a blocked
a process being blocked on the mutex, it is fine to allocate
the waiter on the process's stack (local variable).  This
structure holds a pointer to the task, as well as the mutex that
-   the task is blocked on.  It also has the plist node structures to
-   place the task in the waiter_list of a mutex as well as the
-   pi_list of a mutex owner task (described below).
+   the task is blocked on.  It also has rbtree node structures to
+   place the task in the waiters rbtree of a mutex as well as the
+   pi_waiters rbtree of a mutex owner task (described below).
 
waiter is sometimes used in reference to the task that is waiting
on a mutex. This is the same as waiter->task.
@@ -179,53 +179,34 @@ again.
  |
F->L5-+
 
+If process G has the highest priority in the chain, then all the tasks up
+the chain (A and B in this example), must have their priorities increased
+to that of G.
 
-Plist
--
-
-Before I go further and talk about how the PI chain is stored through lists
-on both mutexes and processes, I'll explain the plist.  This is similar to
-the struct list_head functionality that is already in the kernel.
-The implementation of plist is out of scope for this document, but it is
-very important to understand what it does.
-
-There are a few differences between plist and list, the most important one
-being that plist is a priority sorted linked list.  This means that the
-priorities of the plist are sorted, such that it takes O(1) to retrieve the
-highest priority item in the list.  Obviously this is useful to store processes
-based on their priorities.
-
-Another difference, which is important for implementation, is that, unlike
-list, the head of the list is a different element than the nodes of a list.
-So the head of the list is declared as struct plist_head and nodes that will
-be added to the list are declared as struct plist_node.
-
-
-Mutex Waiter List
+Mutex Waiters Tree
 -
 
-Every mutex keeps track of all the waiters that are blocked on itself. The 
mutex
-has a plist to store these waiters by priority.  This list is protected by
-a spin lock that is located in the struct of the mutex. This lock is called
-wait_lock.  Since the modification of the waiter list is never done in
-interrupt context, the wait_lock can be taken without disabling interrupts.
+Every mutex keeps track of all the waiters that are blocked on itself. The
+mutex has a rbtree to store these waiters by priority.  This tree is protected
+by a spin lock that is located in the struct of the mutex. This lock is called
+wait_lock.
 
 
-Task PI List
+Task PI Tree
 
 
-To keep track of the PI chains, each process has its own PI list.  This is
-a list of all top waiters of the mutexes that are owned by the process.
-Note that this list only holds the top waiters and not all waiters that are
+To keep track of the PI chains, each process has its own PI rbtree.  This is
+a tree of all top waiters of the mutexes that are owned by the process.
+Note that this tree only holds the top waiters and not all waiters that are
 blocked on mutexes owned by the process.
 
-The top of the task's PI list is always the highest priority task that
+The top of the task's PI tree is always the highest priority task that
 is waiting on a mutex that is owned by the task.  So if the task has
 inherited a priority, it will always be the priority of the task that is
-at the top of this list.
+at the top of this tree.
 
-This list is stored in the task structure of a process as a plist called
-pi_list.  This list is protected by a spin lock also in the task structure,
+This tree is stored in the task struct

Re: [PATCH v3 1/3] rtmutex: update rt-mutex-design

2017-07-06 Thread Alex Shi



On 07/06/2017 09:25 PM, Steven Rostedt wrote:

> 
> This looks fine.
> 

Thanks a lot!

>>
>> 
>>
>> -Reviewers:  Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and Randy Dunlap
>> +Reviewers:  Ingo Molnar, Thomas Gleixner, Thomas Duetsch, Randy Dunlap
>> +   and Sebastian Siewior
> 
> Since the updated was not reviewed by all of the above, you should
> change this to:
> 
>  Original Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Deutsch, and
>  Randy Dunlap
> 
>  Update (7/6/2017) Reviewers: Steven Rostedt and Sebastian Siewior 
> 

This looks better. Thanks! :)
I will send out the new patch ASAP.

> Did Randy make comments?

No.

Regards
Alex

Re: [x86/time] 03fa63cc96: ACPI_Error:Table[DMAR]is_not_invalidated_during_early_boot_stage(#/tbxface -#)

2017-07-06 Thread Dou Liyang


Hi xiaolong,

Really thanks for your testing.

At 07/07/2017 09:54 AM, Ye Xiaolong wrote:

On 07/06, Thomas Gleixner wrote:

On Thu, 6 Jul 2017, kernel test robot wrote:


commit: 03fa63cc96ab35592e0a7d522b8edbc1e6b02d22 ("x86/time: Initialize interrupt 
mode behind timer init")



++++
|| 43436935b7 | 03fa63cc96 |
++++
| boot_successes | 0  | 4  |
++++


So 03fa63cc96 makes the box boot again. I'm confused as usual by the
output of this tool.,


kern  :info  : [0.005000] tsc: Fast TSC calibration using PIT
kern  :info  : [0.006000] tsc: Detected 2195.020 MHz processor
kern  :info  : [0.007000] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 4390.04 BogoMIPS (lpj=2195020)
kern  :info  : [0.008001] pid_max: default: 90112 minimum: 704
kern  :info  : [0.009037] ACPI: Core revision 20170303
kern  :err   : [0.010002] ACPI Error: Table [DMAR] is not invalidated 
during early boot stage (20170303/tbxface-193)


Sure we have a error message here, but compared to what? Compared to
something which does not boot at all?


Sorry for the confusion, here commit 43436935b7 boot failed due to OOM which
happened at the late stage of kernel boot while the ACPI error showed at the
early boot stage for commit 03fa63cc96 and it didn't appear in 43436935b7's
dmesg.



let's make the problem clearly firstly:

1) Commit 43436935b7 ("x86/xen: Bypass intr mode setup in enlighten_pv
system") made kernel boot failed, which caused by OOM.

2) Commit 03fa63cc96 ("x86/time: Initialize interrupt mode behind timer 
init") can make the kernel boot success again, but with an ACPI error 
happened.


And both *1* and *2* used the same configuration showed in the
attachment.

Does anything I missed?

Thanks,

dou.


Thanks,
Xiaolong



Thanks,

tglx

Re: [PATCH 0/5 v4] Fix sp5100_tco watchdog driver regression

2017-07-06 Thread Marcelo "Marc" Ranolfi

(Reposting without html subpart, second attempt)

On Thu, Jul 6, 2017 at 11:09 PM, Marcelo "Marc" Ranolfi
 wrote:
Hi Zoltán and all,

I _was_ testing the patch series (the latest version), system worked
OK for two days, but then I got a system crash (probably hardware
related) and lost my files in the main partition. Posted about it in
the F2FS mailing list
(https://sourceforge.net/p/linux-f2fs/mailman/message/35928135/).

While I don't believe that the freeze/crash was caused by the watchdog
driver (many other things to blame: amdgpu driver on Southern Islands,
CPU voltage lower than recommended...), it's worth mentioning that the
computer was completely frozen for a little over 4 minutes, and did
not reset as I'd expect the watchdog to. Just mentioning because I'm
not sure of the implications.

Once I get my system back online this weekend I'll have a good look at
the code. Anyway, thanks again for the patches and I'm also expecting
someone more qualified (than me, at least) to review them.

Regards
Marcelo "Marc" Ranolfi

Re: 答复: [f2fs-dev] [PATCH] f2fs: relax permission for atomic/volatile ioctls

2017-07-06 Thread gaoxiang (P)


-邮件原件-
发件人: Jaegeuk Kim [mailto:jaeg...@kernel.org] 
发送时间: 2017年7月7日 10:12
收件人: gaoxiang (P)
抄送: Chao Yu; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
linux-f2fs-de...@lists.sourceforge.net
主题: Re: 答复: [f2fs-dev] [PATCH] f2fs: relax permission for atomic/volatile ioctls

On 07/07, gaoxiang (P) wrote:
> Hi,
> I think Sdcardfs should override task_struct cred fsuid/fsgid before calling 
> the underlay fs operation.
> And it seems sdcardfs implementations misses override cred fsuid/fsgid before 
> the ioctl operation.

Oh, good catch!

Thank you. ;)


;) We are fixing it internally. And the AOSP sdcardfs patch would be
https://android-review.googlesource.com/#/c/431259/
It is none of f2fs business. :) :)
Thanks.

> 
> Thanks.
> 
> -邮件原件-
> 发件人: Chao Yu [mailto:c...@kernel.org]
> 发送时间: 2017年7月7日 8:58
> 收件人: Jaegeuk Kim
> 抄送: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> linux-f2fs-de...@lists.sourceforge.net
> 主题: Re: [f2fs-dev] [PATCH] f2fs: relax permission for atomic/volatile 
> ioctls
> 
> On 2017/7/7 8:16, Jaegeuk Kim wrote:
> > On 07/07, Chao Yu wrote:
> >> On 2017/7/6 10:23, Jaegeuk Kim wrote:
> >>> This patch allows atomic/volatile ioctls for sqlite under sdcardfs.
> >>
> >> Out of curiosity, we will lose some capable when passing through sdcardfs?
> > 
> > I don't think so. But, it seems a test applicaion tries to access 
> > database from difference uid.
> 
> Oh, is that really allowed? if the sqlite database is public in sdcard 
> directory, application needs to apply WRITE_EXTERNAL_STORAGE in order 
> to add itself to sdcard_rw group, then it can access the database. Right?
> 
> Thanks,
> 
> > 
> > Thanks,
> > 
> >>
> >> Thanks,
> >>
> >>>
> >>> Signed-off-by: Jaegeuk Kim > ---  
> >>> fs/f2fs/file.c | 15 ---
> >>>  1 file changed, 15 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 
> >>> f5d6357e8360..dd8f5d2caa48 100644
> >>> --- a/fs/f2fs/file.c
> >>> +++ b/fs/f2fs/file.c
> >>> @@ -1587,9 +1587,6 @@ static int f2fs_ioc_start_atomic_write(struct file 
> >>> *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   if (!S_ISREG(inode->i_mode))
> >>>   return -EINVAL;
> >>>  
> >>> @@ -1636,9 +1633,6 @@ static int f2fs_ioc_commit_atomic_write(struct file 
> >>> *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   ret = mnt_want_write_file(filp);
> >>>   if (ret)
> >>>   return ret;
> >>> @@ -1672,9 +1666,6 @@ static int f2fs_ioc_start_volatile_write(struct 
> >>> file *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   if (!S_ISREG(inode->i_mode))
> >>>   return -EINVAL;
> >>>  
> >>> @@ -1707,9 +1698,6 @@ static int f2fs_ioc_release_volatile_write(struct 
> >>> file *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   ret = mnt_want_write_file(filp);
> >>>   if (ret)
> >>>   return ret;
> >>> @@ -1736,9 +1724,6 @@ static int f2fs_ioc_abort_volatile_write(struct 
> >>> file *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   ret = mnt_want_write_file(filp);
> >>>   if (ret)
> >>>   return ret;
> >>>
> 
> --
>  Check out the vibrant tech community on one of the world's 
> most engaging tech sites, Slashdot.org! http://sdm.link/slashdot 
> ___
> Linux-f2fs-devel mailing list
> linux-f2fs-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Re: 答复: [f2fs-dev] [PATCH] f2fs: relax permission for atomic/volatile ioctls

2017-07-06 Thread Jaegeuk Kim

On 07/07, gaoxiang (P) wrote:
> Hi,
> I think Sdcardfs should override task_struct cred fsuid/fsgid before calling 
> the underlay fs operation.
> And it seems sdcardfs implementations misses override cred fsuid/fsgid before 
> the ioctl operation.

Oh, good catch!

Thank you. ;)

> 
> Thanks.
> 
> -邮件原件-
> 发件人: Chao Yu [mailto:c...@kernel.org] 
> 发送时间: 2017年7月7日 8:58
> 收件人: Jaegeuk Kim
> 抄送: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> linux-f2fs-de...@lists.sourceforge.net
> 主题: Re: [f2fs-dev] [PATCH] f2fs: relax permission for atomic/volatile ioctls
> 
> On 2017/7/7 8:16, Jaegeuk Kim wrote:
> > On 07/07, Chao Yu wrote:
> >> On 2017/7/6 10:23, Jaegeuk Kim wrote:
> >>> This patch allows atomic/volatile ioctls for sqlite under sdcardfs.
> >>
> >> Out of curiosity, we will lose some capable when passing through sdcardfs?
> > 
> > I don't think so. But, it seems a test applicaion tries to access database 
> > from
> > difference uid.
> 
> Oh, is that really allowed? if the sqlite database is public in sdcard 
> directory,
> application needs to apply WRITE_EXTERNAL_STORAGE in order to add itself to
> sdcard_rw group, then it can access the database. Right?
> 
> Thanks,
> 
> > 
> > Thanks,
> > 
> >>
> >> Thanks,
> >>
> >>>
> >>> Signed-off-by: Jaegeuk Kim > ---
> >>>  fs/f2fs/file.c | 15 ---
> >>>  1 file changed, 15 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >>> index f5d6357e8360..dd8f5d2caa48 100644
> >>> --- a/fs/f2fs/file.c
> >>> +++ b/fs/f2fs/file.c
> >>> @@ -1587,9 +1587,6 @@ static int f2fs_ioc_start_atomic_write(struct file 
> >>> *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   if (!S_ISREG(inode->i_mode))
> >>>   return -EINVAL;
> >>>  
> >>> @@ -1636,9 +1633,6 @@ static int f2fs_ioc_commit_atomic_write(struct file 
> >>> *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   ret = mnt_want_write_file(filp);
> >>>   if (ret)
> >>>   return ret;
> >>> @@ -1672,9 +1666,6 @@ static int f2fs_ioc_start_volatile_write(struct 
> >>> file *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   if (!S_ISREG(inode->i_mode))
> >>>   return -EINVAL;
> >>>  
> >>> @@ -1707,9 +1698,6 @@ static int f2fs_ioc_release_volatile_write(struct 
> >>> file *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   ret = mnt_want_write_file(filp);
> >>>   if (ret)
> >>>   return ret;
> >>> @@ -1736,9 +1724,6 @@ static int f2fs_ioc_abort_volatile_write(struct 
> >>> file *filp)
> >>>   struct inode *inode = file_inode(filp);
> >>>   int ret;
> >>>  
> >>> - if (!inode_owner_or_capable(inode))
> >>> - return -EACCES;
> >>> -
> >>>   ret = mnt_want_write_file(filp);
> >>>   if (ret)
> >>>   return ret;
> >>>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Linux-f2fs-devel mailing list
> linux-f2fs-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

Re: [PATCH 2/9] RISC-V: Atomic and Locking Code

2017-07-06 Thread Boqun Feng

On Thu, Jul 06, 2017 at 06:04:13PM -0700, Palmer Dabbelt wrote:
[...]
> >> +#define __smp_load_acquire(p) 
> >> \
> >> +do {  
> >> \
> >> +  union { typeof(*p) __val; char __c[1]; } __u =  \
> >> +  { .__val = (__force typeof(*p)) (v) };  \
> >> +  compiletime_assert_atomic_type(*p); \
> >> +  switch (sizeof(*p)) {   \
> >> +  case 1: \
> >> +  case 2: \
> >> +  __u.__val = READ_ONCE(*p);  \
> >> +  smb_mb();   \
> >> +  break;  \
> >> +  case 4: \
> >> +  __asm__ __volatile__ (  \
> >> +  "amoor.w.aq %1, zero, %0"   \
> >> +  : "+A" (*p) \
> >> +  : "=r" (__u.__val)  \
> >> +  : "memory");\
> >> +  break;  \
> >> +  case 8: \
> >> +  __asm__ __volatile__ (  \
> >> +  "amoor.d.aq %1, zero, %0"   \
> >> +  : "+A" (*p) \
> >> +  : "=r" (__u.__val)  \
> >> +  : "memory");\
> >> +  break;  \
> >> +  }   \
> >> +  __u.__val;  \
> >> +} while (0)
> >
> > 'creative' use of amoswap and amoor :-)
> >
> > You should really look at a normal load with ordering instruction
> > though, that amoor.aq is a rmw and will promote the cacheline to
> > exclusive (and dirty it).
> 
> The thought here was that implementations could elide the MW by pattern
> matching the "zero" (x0, the architectural zero register) forms of AMOs where
> it's interesting.  I talked to one of our microarchitecture guys, and while he
> agrees that's easy he points out that eliding half the AMO may wreak havoc on
> the consistency model.  Since we're not sure what the memory model is actually
> going to look like, we thought it'd be best to just write the simplest code
> here
> 
>   /*
>* TODO_RISCV_MEMORY_MODEL: While we could emit AMOs for the W and D sized
>* accesses here, it's questionable if that actually helps or not: the lack 
> of
>* offsets in the AMOs means they're usually preceded by an addi, so they
>* probably won't save code space.  For now we'll just emit the fence.
>*/
>   #define __smp_store_release(p, v)   \
>   ({  \
>   compiletime_assert_atomic_type(*p); \
>   smp_mb();   \
>   WRITE_ONCE(*p, v);  \
>   })
> 
>   #define __smp_load_acquire(p)   \
>   ({  \
>   union{typeof(*p) __p; long __l;} __u;   \

AFAICT, there seems to be an endian issue if you do this. No?

Let us assume typeof(*p) is char and *p == 1, and on a big endian 32bit
platform:

>   compiletime_assert_atomic_type(*p); \
>   __u.__l = READ_ONCE(*p);\

READ_ONCE(*p) is 1 so
__u.__l is 0x00 00 00 01 now

>   smp_mb();   \
>   __u.__p;\

__u.__p is then 0x00.

Am I missing something here?

Even so why not use the simple definition as in include/asm-generic/barrier.h?

Regards,
Boqun



>   })
> 
[...]


signature.asc
Description: PGP signature

[PATCH v1] i2c: mediatek: send i2c master code at 400k

2017-07-06 Thread Jun Gao

From: Jun Gao 

The speed of sending i2c master code in high-speed mode depends on
source clock, clock-div and TIMING register. The source clock and
clock-div of different SoC are not all the same. In order to send
i2c master code at 400k in high-speed mode, a appropriate value
should be set to TIMING register for a certain source clock and
clock-div.

Signed-off-by: Jun Gao 
---
 drivers/i2c/busses/i2c-mt65xx.c | 65 +
 1 file changed, 46 insertions(+), 19 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
index 45d6171..9bedf0b 100644
--- a/drivers/i2c/busses/i2c-mt65xx.c
+++ b/drivers/i2c/busses/i2c-mt65xx.c
@@ -50,7 +50,6 @@
 #define I2C_FS_START_CON   0x1800
 #define I2C_TIME_CLR_VALUE 0x
 #define I2C_TIME_DEFAULT_VALUE 0x0003
-#define I2C_FS_TIME_INIT_VALUE 0x1303
 #define I2C_WRRD_TRANAC_VALUE  0x0002
 #define I2C_RD_TRANAC_VALUE0x0001
 
@@ -154,6 +153,7 @@ struct mtk_i2c {
bool use_push_pull; /* IO config push-pull mode */
 
u16 irq_stat;   /* interrupt status */
+   unsigned int clk_src_div;
unsigned int speed_hz;  /* The speed in transfer */
enum mtk_trans_op op;
u16 timing_reg;
@@ -285,23 +285,20 @@ static void mtk_i2c_init_hw(struct mtk_i2c *i2c)
  * less than or equal to i2c->speed_hz. The calculation try to get
  * sample_cnt and step_cn
  */
-static int mtk_i2c_set_speed(struct mtk_i2c *i2c, unsigned int parent_clk,
-unsigned int clock_div)
+static int mtk_i2c_calculate_speed(struct mtk_i2c *i2c, unsigned int clk_src,
+  unsigned int target_speed,
+  unsigned int *timing_step_cnt,
+  unsigned int *timing_sample_cnt)
 {
-   unsigned int clk_src;
unsigned int step_cnt;
unsigned int sample_cnt;
unsigned int max_step_cnt;
-   unsigned int target_speed;
unsigned int base_sample_cnt = MAX_SAMPLE_CNT_DIV;
unsigned int base_step_cnt;
unsigned int opt_div;
unsigned int best_mul;
unsigned int cnt_mul;
 
-   clk_src = parent_clk / clock_div;
-   target_speed = i2c->speed_hz;
-
if (target_speed > MAX_HS_MODE_SPEED)
target_speed = MAX_HS_MODE_SPEED;
 
@@ -347,16 +344,48 @@ static int mtk_i2c_set_speed(struct mtk_i2c *i2c, 
unsigned int parent_clk,
return -EINVAL;
}
 
-   step_cnt--;
-   sample_cnt--;
+   *timing_step_cnt = step_cnt - 1;
+   *timing_sample_cnt = sample_cnt - 1;
+
+   return 0;
+}
+
+static int mtk_i2c_set_speed(struct mtk_i2c *i2c, unsigned int parent_clk)
+{
+   unsigned int clk_src;
+   unsigned int step_cnt;
+   unsigned int sample_cnt;
+   unsigned int target_speed;
+   int ret;
+
+   clk_src = parent_clk / i2c->clk_src_div;
+   target_speed = i2c->speed_hz;
 
if (target_speed > MAX_FS_MODE_SPEED) {
+   /* Set master code speed register */
+   ret = mtk_i2c_calculate_speed(i2c, clk_src, MAX_FS_MODE_SPEED,
+ &step_cnt, &sample_cnt);
+   if (ret < 0)
+   return ret;
+
+   i2c->timing_reg = (sample_cnt << 8) | step_cnt;
+
/* Set the high speed mode register */
-   i2c->timing_reg = I2C_FS_TIME_INIT_VALUE;
+   ret = mtk_i2c_calculate_speed(i2c, clk_src, target_speed,
+ &step_cnt, &sample_cnt);
+   if (ret < 0)
+   return ret;
+
i2c->high_speed_reg = I2C_TIME_DEFAULT_VALUE |
(sample_cnt << 12) | (step_cnt << 8);
} else {
-   i2c->timing_reg = (sample_cnt << 8) | (step_cnt << 0);
+   ret = mtk_i2c_calculate_speed(i2c, clk_src, target_speed,
+ &step_cnt, &sample_cnt);
+   if (ret < 0)
+   return ret;
+
+   i2c->timing_reg = (sample_cnt << 8) | step_cnt;
+
/* Disable the high speed transaction */
i2c->high_speed_reg = I2C_TIME_CLR_VALUE;
}
@@ -647,8 +676,7 @@ static u32 mtk_i2c_functionality(struct i2c_adapter *adap)
.functionality = mtk_i2c_functionality,
 };
 
-static int mtk_i2c_parse_dt(struct device_node *np, struct mtk_i2c *i2c,
-   unsigned int *clk_src_div)
+static int mtk_i2c_parse_dt(struct device_node *np, struct mtk_i2c *i2c)
 {
int ret;
 
@@ -656,11 +684,11 @@ static int mtk_i2c_parse_dt(struct device_node *np, 
struct mtk_i2c *i2c,
if (ret < 0)
i2c->speed_hz = I2C_DEFAULT_SPEED;
 
-   ret = of_property_read_u32(np, "clock-div", clk_src_div);
+   ret = of_property_re

[lkp-robot] [sched/cputime] 2a42eb9594: BUG:using_smp_processor_id()in_preemptible

2017-07-06 Thread kernel test robot


FYI, we noticed the following commit:

commit: 2a42eb9594a1480b4ead9e036e06ee1290e5fa6d ("sched/cputime: Accumulate 
vtime on top of nsec clocksource")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/urgent

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap -smp 2 
-m 512M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


++++
|| bac5b6b6b1 | 2a42eb9594 |
++++
| boot_successes | 8  | 2  |
| boot_failures  | 0  | 6  |
| BUG:kernel_hang_in_test_stage  | 0  | 2  |
| BUG:using_smp_processor_id()in_preemptible | 0  | 4  |
++++



[   55.289944] BUG: using smp_processor_id() in preemptible [] code: 
99-trinity/181
[   55.302094] caller is debug_smp_processor_id+0x17/0x19
[   55.313831] CPU: 0 PID: 181 Comm: 99-trinity Not tainted 
4.12.0-01059-g2a42eb9 #1
[   55.324812] Call Trace:
[   55.328601]  dump_stack+0x82/0xb8
[   55.333467]  check_preemption_disabled+0xd1/0xe3
[   55.340370]  debug_smp_processor_id+0x17/0x19
[   55.346815]  vtime_delta+0xd/0x2c
[   55.352843]  task_cputime+0x89/0xdb
[   55.358026]  thread_group_cputime+0x11b/0x1ed
[   55.364517]  thread_group_cputime_adjusted+0x1f/0x47
[   55.371895]  wait_consider_task+0x2a9/0xaf9
[   55.378025]  ? lock_acquire+0x97/0xa4
[   55.383463]  do_wait+0xdf/0x1f4
[   55.388199]  SYSC_wait4+0x8e/0xb5
[   55.393219]  ? list_add+0x34/0x34
[   55.398317]  SyS_wait4+0x9/0xb
[   55.403401]  do_syscall_64+0x70/0x82
[   55.408977]  entry_SYSCALL64_slow_path+0x25/0x25
[   55.409006] RIP: 0033:0x7ff2f15c3c3e
[   55.409032] RSP: 002b:7ffd3e91eab0 EFLAGS: 0246 ORIG_RAX: 
003d
[   55.409064] RAX: ffda RBX: 7ff2f1f0b6a0 RCX: 7ff2f15c3c3e
[   55.409070] RDX: 0001 RSI: 7ffd3e91eb18 RDI: 
[   55.409075] RBP: 0001 R08:  R09: 
[   55.409101] R10:  R11: 0246 R12: 
[   55.409106] R13:  R14: 0001 R15: 
[   55.550819] random: trinity: uninitialized urandom read (4 bytes read)
mountall: Event failed
[   55.947396] random: mountall: uninitialized urandom read (12 bytes read)
[   56.044974] BUG: using smp_processor_id() in preemptible [] code: 
mountall/194
[   56.057287] caller is debug_smp_processor_id+0x17/0x19
[   56.066939] CPU: 1 PID: 194 Comm: mountall Not tainted 4.12.0-01059-g2a42eb9 
#1
[   56.078248] Call Trace:
[   56.082297]  dump_stack+0x82/0xb8
[   56.087256]  check_preemption_disabled+0xd1/0xe3
[   56.093512]  debug_smp_processor_id+0x17/0x19
[   56.099800]  vtime_delta+0xd/0x2c
[   56.105862]  task_cputime+0x89/0xdb
[   56.111237]  thread_group_cputime+0x11b/0x1ed
[   56.117693]  thread_group_cputime_adjusted+0x1f/0x47
[   56.125298]  wait_consider_task+0x2a9/0xaf9
[   56.131983]  ? do_raw_read_lock+0x39/0x3c
[   56.138293]  do_wait+0xdf/0x1f4
[   56.143000]  SyS_waitid+0xac/0x1e3
[   56.148566]  ? list_add+0x34/0x34
[   56.153694]  do_syscall_64+0x70/0x82
[   56.159267]  entry_SYSCALL64_slow_path+0x25/0x25
[   56.166324] RIP: 0033:0x7fdc9c642d2f
[   56.171770] RSP: 002b:7ffe6f4dda30 EFLAGS: 0246 ORIG_RAX: 
00f7
[   56.181888] RAX: ffda RBX: 560f25433d70 RCX: 7fdc9c642d2f
[   56.192353] RDX: 7ffe6f4dda78 RSI: 00c7 RDI: 0001
[   56.203076] RBP: 560f2543bf20 R08:  R09: 0020
[   56.212903] R10: 0004 R11: 0246 R12: 560f25439be0
[   56.223108] R13: 00c7 R14: 00c7 R15: 560f25222c1f
[   56.522906] init: Failed to create pty - disabling logging for job
[   56.535738] init: Temporary process spawn error: No such file or directory
[   56.769705] init: Failed to create pty - disabling logging for job
[   56.781761] init: Temporary process spawn error: No such file or directory
[   57.258638] init: Failed to create pty - disabling logging for job
[   57.271867] init: Temporary process spawn error: No such file or directory
[   57.318928] init: Failed to create pty - disabling logging for job
[   57.340639] init: Temporary process spawn error: No such file or directory
[   57.499535] BUG: using smp_processor_id() in preemptible [] code: 
sh/217
[   57.529507] caller is debug_smp_processor_id+0x17/0x19
[   57.537158] CPU: 1 PID: 217 Comm: sh Not tainted 4.12.0-01059-g2a42eb9 #1
[   57.547065] Call Trace:
[   57.550929]  dump_stack+0x82/0xb8
[   57.555948]  check_preemption_disabled+0xd1/0xe3
[   57.562849]  debug_smp_processor_id+0x17/0x1

Re: [PATCH] f2fs: relax permission for atomic/volatile ioctls

2017-07-06 Thread Eric Biggers

On Wed, Jul 05, 2017 at 07:23:26PM -0700, Jaegeuk Kim wrote:
> This patch allows atomic/volatile ioctls for sqlite under sdcardfs.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/file.c | 15 ---
>  1 file changed, 15 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index f5d6357e8360..dd8f5d2caa48 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1587,9 +1587,6 @@ static int f2fs_ioc_start_atomic_write(struct file 
> *filp)
>   struct inode *inode = file_inode(filp);
>   int ret;
>  
> - if (!inode_owner_or_capable(inode))
> - return -EACCES;
> -

Why not require that the file be open for writing (FMODE_WRITE)?

Re: [PATCH 1/3] drm: Widen vblank count to 64 bits. Change vblank time precision to ns

2017-07-06 Thread Michel Dänzer

On 07/07/17 10:34 AM, Michel Dänzer wrote:
> On 07/07/17 12:04 AM, Keith Packard wrote:
>> Michel Dänzer  writes:
>>
 @@ -317,6 +317,9 @@ int via_driver_irq_postinstall(struct drm_device *dev)
if (!dev_priv)
return -EINVAL;
  
 +  if (dev->driver->get_vblank_counter)
 +  dev->max_vblank_count = 0x;
>>>
>>> What's the purpose of this? All drivers providing get_vblank_counter
>>> should already initialize max_vblank_count correctly.
>>
>> Yeah, I couldn't prove that this driver did that,
> 
> Which driver?
> 
>> and as Daniel says, we haven't ever audited the drivers to make sure
>> they do.
> 
> I don't think that's what he meant, rather that with the change above,
> all drivers have to be audited to make sure the added assignment doesn't
> clobber an earlier assignment by the driver.

... and if there are any drivers that set
dev->driver->get_vblank_counter but don't set dev->max_vblank_count to a
non-0 value, that the hardware counter actually has 32 bits.


I'd say don't bother, just drop this hunk.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] f2fs: no need to create issue_discard_thread if it exists

2017-07-06 Thread Yunlong Song

It's fine unless create_discard_cmd_control is used in remount flow in the 
future.

On 2017/7/6 21:16, Chao Yu wrote:
> Hi Yunlong,
>
> It looks there is no way to create discard thread redundantly,
> so here we don't need to check this?
>
> Thanks,
>
> On 2017/7/6 19:05, Yunlong Song wrote:
>> Signed-off-by: Yunlong Song 
>> ---
>>  fs/f2fs/segment.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index 4c246e3..b48d004 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -1417,6 +1417,8 @@ static int create_discard_cmd_control(struct 
>> f2fs_sb_info *sbi)
>>  
>>  if (SM_I(sbi)->dcc_info) {
>>  dcc = SM_I(sbi)->dcc_info;
>> +if (dcc->f2fs_issue_discard)
>> +return err;
>>  goto init_thread;
>>  }
>>  
>>
> .
>


-- 
Thanks,
Yunlong Song

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 919 matches

Mail list logo