date:20190712

Re: [PATCH v3 3/3] powerpc/module64: Use symbolic instructions names.

2019-07-12 Thread Michael Ellerman

Christophe Leroy  writes:
> Le 08/07/2019 à 02:56, Michael Ellerman a écrit :
>> Christophe Leroy  writes:
>>> To increase readability/maintainability, replace hard coded
>>> instructions values by symbolic names.
>>>
>>> Signed-off-by: Christophe Leroy 
>>> ---
>>> v3: fixed warning by adding () in an 'if' around X | Y (unlike said in v2 
>>> history, this change was forgotten in v2)
>>> v2: rearranged comments
>>>
>>>   arch/powerpc/kernel/module_64.c | 53 
>>> +++--
>>>   1 file changed, 35 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/module_64.c 
>>> b/arch/powerpc/kernel/module_64.c
>>> index c2e1b06253b8..b33a5d5e2d35 100644
>>> --- a/arch/powerpc/kernel/module_64.c
>>> +++ b/arch/powerpc/kernel/module_64.c
>>> @@ -704,18 +711,21 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>> ...
>>> /*
>>>  * If found, replace it with:
>>>  *  addis r2, r12, (.TOC.-func)@ha
>>>  *  addi r2, r12, (.TOC.-func)@l
>>>  */
>>> -   ((uint32_t *)location)[0] = 0x3c4c + PPC_HA(value);
>>> -   ((uint32_t *)location)[1] = 0x3842 + PPC_LO(value);
>>> +   ((uint32_t *)location)[0] = PPC_INST_ADDIS | 
>>> __PPC_RT(R2) |
>>> +   __PPC_RA(R12) | 
>>> PPC_HA(value);
>>> +   ((uint32_t *)location)[1] = PPC_INST_ADDI | 
>>> __PPC_RT(R2) |
>>> +   __PPC_RA(R12) | 
>>> PPC_LO(value);
>>> break;
>> 
>> This was crashing and it's amazing how long you can stare at a
>> disassembly and not see the difference between `r2` and `r12` :)
>
> Argh, yes. I was misled by the comment I guess. Sorry for that and
> thanks for fixing.

No worries, yes the comment was the problem. I fixed that as well.
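
For anyone following along, here is a minimal userspace sketch of why the
two encodings differ. The macro and function names are hypothetical
stand-ins, not the kernel's own definitions; the bit positions assume the
Power ISA D-form layout (6-bit primary opcode, then RT, then RA, then a
16-bit immediate).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical local stand-ins for the kernel's opcode macros.
 * D-form layout: primary opcode in the top 6 bits, RT in the next 5,
 * RA in the next 5, and a 16-bit immediate in the low half.
 */
#define SKETCH_INST_ADDI   0x38000000u  /* primary opcode 14 */
#define SKETCH_INST_ADDIS  0x3c000000u  /* primary opcode 15 */
#define SKETCH_RT(r)       (((uint32_t)(r) & 0x1fu) << 21)
#define SKETCH_RA(r)       (((uint32_t)(r) & 0x1fu) << 16)

/* Encode "addis rt, ra, imm". */
static inline uint32_t encode_addis(unsigned rt, unsigned ra, uint16_t imm)
{
	return SKETCH_INST_ADDIS | SKETCH_RT(rt) | SKETCH_RA(ra) | imm;
}

/* Encode "addi rt, ra, imm". */
static inline uint32_t encode_addi(unsigned rt, unsigned ra, uint16_t imm)
{
	return SKETCH_INST_ADDI | SKETCH_RT(rt) | SKETCH_RA(ra) | imm;
}
```

With these definitions, encode_addi(2, 2, 0) and encode_addi(2, 12, 0)
differ only in the RA field (0x38420000 versus 0x384c0000), which is easy
to miss when reading a disassembly.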

cheers

Re: [PATCH v2 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA

2019-07-12 Thread Michal Hocko

On Thu 11-07-19 23:25:44, Hoan Tran OS wrote:
> In a NUMA layout where nodes have memory ranges that span across other
> nodes, the mm driver can detect the memory node id incorrectly.
> 
> For example, with layout below
> Node 0 address:    
> Node 1 address:    
> 
> Note:
>  - Memory from low to high
>  - 0/1: Node id
>  - x: Invalid memory of a node
> 
> When mm probes the memory map, without CONFIG_NODES_SPAN_OTHER_NODES
> config, mm only checks the memory validity but not the node id.
> Because of that, Node 1 also detects the memory from node 0 as below
> when it scans from the start address to the end address of node 1.
> 
> Node 0 address:    
> Node 1 address:    
> 
> This layout could occur on any architecture. This patch enables
> CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA to fix this issue.

Yes it can occur on any arch but most sane platforms simply do not
overlap physical ranges. So I do not really see any reason to
unconditionally enable the config for everybody. What is the advantage?
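
For reference, the misdetection described in the patch description can be
modelled in a few lines of userspace C. This is only a sketch under stated
assumptions: the layout array, the function name and the check_nid flag
are all made up for illustration and do not correspond to real mm code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: each array slot is one memory block, the value is
 * the id of the node that owns it, and -1 marks invalid memory (a hole).
 *
 * Count how many blocks node 'nid' would claim when it scans its
 * [start, end) span. With check_nid == 0 only validity is checked, so
 * blocks of other nodes inside the span are claimed as well.
 */
static inline size_t claim_span(const int *owner, size_t start, size_t end,
				int nid, int check_nid)
{
	size_t claimed = 0;

	for (size_t i = start; i < end; i++) {
		if (owner[i] < 0)
			continue;	/* invalid memory */
		if (check_nid && owner[i] != nid)
			continue;	/* belongs to another node */
		claimed++;
	}
	return claimed;
}
```

For an interleaved layout such as {0, 0, 1, 1, 0, 0, 1, 1}, node 1 spans
slots 2 to 7; scanning that span without the node id check claims six
blocks, while only four actually belong to node 1.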

-- 
Michal Hocko
SUSE Labs

Re: [PATCH] vfio: platform: reset: add support for XHCI reset hook

2019-07-12 Thread Auger Eric

Hi Gregory,

On 7/11/19 4:31 PM, Gregory CLEMENT wrote:
> The VFIO reset hook is called every time a platform device is passed
> to a guest or removed from a guest.
> 
> When the XHCI device is unbound from the host, the host driver
> disables the XHCI clocks/phys/regulators so when the device is passed
> to the guest it becomes dysfunctional.
> 
> This initial implementation uses the VFIO reset hook to enable the
> XHCI clocks/phys on behalf of the guest.

The platform reset module must also make sure that the device can no
longer send DMA requests or interrupts.
> 
> Ported from Marvell LSP code originally written by Yehuda Yitschak
> 
> Signed-off-by: Gregory CLEMENT 
> ---
>  drivers/vfio/platform/reset/Kconfig   |  8 +++
>  drivers/vfio/platform/reset/Makefile  |  2 +
>  .../vfio/platform/reset/vfio_platform_xhci.c  | 60 +++
>  3 files changed, 70 insertions(+)
>  create mode 100644 drivers/vfio/platform/reset/vfio_platform_xhci.c
> 
> diff --git a/drivers/vfio/platform/reset/Kconfig 
> b/drivers/vfio/platform/reset/Kconfig
> index 392e3c09def0..14f620fd250d 100644
> --- a/drivers/vfio/platform/reset/Kconfig
> +++ b/drivers/vfio/platform/reset/Kconfig
> @@ -22,3 +22,11 @@ config VFIO_PLATFORM_BCMFLEXRM_RESET
> Enables the VFIO platform driver to handle reset for Broadcom FlexRM
>  
> If you don't know what to do here, say N.
> +
> +config VFIO_PLATFORM_XHCI_RESET
> + tristate "VFIO support for USB XHCI reset"
> + depends on VFIO_PLATFORM
> + help
> +   Enables the VFIO platform driver to handle reset for USB XHCI
> +
> +   If you don't know what to do here, say N.
> diff --git a/drivers/vfio/platform/reset/Makefile 
> b/drivers/vfio/platform/reset/Makefile
> index 7294c5ea122e..d84c4d3dc041 100644
> --- a/drivers/vfio/platform/reset/Makefile
> +++ b/drivers/vfio/platform/reset/Makefile
> @@ -1,7 +1,9 @@
>  # SPDX-License-Identifier: GPL-2.0
>  vfio-platform-calxedaxgmac-y := vfio_platform_calxedaxgmac.o
>  vfio-platform-amdxgbe-y := vfio_platform_amdxgbe.o
> +vfio-platform-xhci-y := vfio_platform_xhci.o
>  
>  obj-$(CONFIG_VFIO_PLATFORM_CALXEDAXGMAC_RESET) += 
> vfio-platform-calxedaxgmac.o
>  obj-$(CONFIG_VFIO_PLATFORM_AMDXGBE_RESET) += vfio-platform-amdxgbe.o
>  obj-$(CONFIG_VFIO_PLATFORM_BCMFLEXRM_RESET) += vfio_platform_bcmflexrm.o
> +obj-$(CONFIG_VFIO_PLATFORM_XHCI_RESET) += vfio-platform-xhci.o
> diff --git a/drivers/vfio/platform/reset/vfio_platform_xhci.c 
> b/drivers/vfio/platform/reset/vfio_platform_xhci.c
> new file mode 100644
> index ..7b75a04402ee
> --- /dev/null
> +++ b/drivers/vfio/platform/reset/vfio_platform_xhci.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * VFIO platform driver specialized for XHCI reset
> + *
> + * Copyright 2016 Marvell Semiconductors, Inc.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
io, init, kernel should be removable (noticed init and kernel.h also are
in other reset modules though)
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "../vfio_platform_private.h"
> +
> +#define MAX_XHCI_CLOCKS  4
Where does this number come from?

From Documentation/devicetree/bindings/usb/usb-xhci.txt I understand
there are max 2 clocks, "core" and "reg" (I don't have any specific
knowledge on the device though).

> +#define MAX_XHCI_PHYS  2
not used
> +
> +int vfio_platform_xhci_reset(struct vfio_platform_device *vdev)
> +{
> + struct device *dev = vdev->device;
> + struct device_node *np = dev->of_node;
> + struct usb_phy *usb_phy;
> + struct clk *clk;
> + int ret, i;
> +
> + /*
> +  * Compared to the native driver, no need to handle the
> +  * deferred case, because the resources are already
> +  * there
> +  */
> + for (i = 0; i < MAX_XHCI_CLOCKS; i++) {
> + clk = of_clk_get(np, i);
> + if (!IS_ERR(clk)) {
> + ret = clk_prepare_enable(clk);
> + if (ret)
> + return -ENODEV;
return ret?
> + }
> + }
> +
> + usb_phy = devm_usb_get_phy_by_phandle(dev, "usb-phy", 0);
> + if (!IS_ERR(usb_phy)) {
> + ret = usb_phy_init(usb_phy);
> + if (ret)
> + return -ENODEV;
return ret?
> + }

> +
> + return 0;
> +}
> +
> +module_vfio_reset_handler("generic-xhci", vfio_platform_xhci_reset);
> +
> +MODULE_AUTHOR("Yehuda Yitschak");
> +MODULE_DESCRIPTION("Reset support for XHCI vfio platform device");
> +MODULE_LICENSE("GPL");
> 
Thanks

Eric

[PATCH v4 0/3] Forced-wakeup for stop states on Powernv

2019-07-12 Thread Abhishek Goel

Currently, the cpuidle governors determine what idle state an idling CPU
should enter into based on heuristics that depend on the idle history on
that CPU. Given that no predictive heuristic is perfect, there are cases
where the governor predicts a shallow idle state, hoping that the CPU will
be busy soon. However, if no new workload is scheduled on that CPU in the
near future, the CPU will end up in the shallow state.

Motivation
----------
On POWER, this is problematic when the predicted state in the
aforementioned scenario is a shallow stop state on a tickless system, as
we might get stuck in shallow states even for hours in the absence of
ticks or interrupts.

To address this, we forcefully wake up the CPU by setting the decrementer.
The decrementer is set to a value that corresponds to the residency of
the next available state, thus firing a timer that will forcefully
wake up the CPU. A few such iterations will essentially train the governor
to select a deeper state for that CPU, as the timer here corresponds to
the next available cpuidle state residency. Thus, the CPU will eventually
end up in the deepest possible state and we won't get stuck in a shallow
state for a long duration.

Experiment
----------
For earlier versions, when this feature was meant to be only for shallow
lite states, I performed experiments for three scenarios to collect some
data.

case 1 :
Without this patch and without the tick retained, i.e. in an upstream
kernel, it would spend more than a second to get out of stop0_lite.

case 2 : With the tick retained in an upstream kernel -

Generally, we have a sched tick at 4ms (CONFIG_HZ = 250). Ideally I expected
it to take 8 sched ticks to get out of stop0_lite. Experimentally, the
observation was

=========================================
sample  min     max     99percentile
20      4ms     12ms    4ms
=========================================

It would take at least one sched tick to get out of stop0_lite.

case 3 : With this patch (not stopping the tick, but explicitly queuing a
         timer)

=========================================
sample  min     max     99percentile
20      144us   192us   144us
=========================================

Description of current implementation
-------------------------------------

We calculate the timeout for the current idle state as the residency value
of the next available idle state. If the decrementer is set to expire
later than this timeout, we update the decrementer value with the
residency of the next available idle state, thus essentially training the
governor to select the next available deeper state until we reach the
deepest state. Hence, we won't get stuck unnecessarily in shallow states
for a long duration.
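
The logic described above can be sketched in userspace C roughly as
follows. This is a simplified model, not the driver code: the struct and
function names are hypothetical, residencies are treated as already being
in timebase ticks, and treating a zero timeout as "no forced wakeup" is an
assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a cpuidle state. */
struct sketch_state {
	uint64_t target_residency;	/* assumed to be in timebase ticks */
	bool disabled;
};

/*
 * Timeout for the state at 'index': the residency of the next enabled
 * deeper state, or 0 if there is none.
 */
static inline uint64_t
sketch_forced_wakeup_timeout(const struct sketch_state *states, int count,
			     int index)
{
	for (int i = index + 1; i < count; i++) {
		if (states[i].disabled)
			continue;
		return states[i].target_residency;
	}
	return 0;
}

/*
 * Decide whether the decrementer should be reprogrammed before idling.
 * 'next_event' models the time at which the decrementer is already due
 * to fire; it is only overridden when the forced wakeup comes earlier
 * or at the same time.
 */
static inline bool sketch_should_set_dec(uint64_t now, uint64_t timeout,
					 uint64_t next_event)
{
	if (timeout == 0)
		return false;	/* assumption: nothing deeper to promote to */
	return now + timeout <= next_event;
}
```

If the pending timer event is already due before the forced-wakeup point,
the decrementer is left alone, since the CPU will wake up soon enough
anyway.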


v1 of auto-promotion : https://lkml.org/lkml/2019/3/22/58 This patch was
implemented only for shallow lite state in generic cpuidle driver.

v2 : Removed timeout_needed and rebased to current
upstream kernel

Then,
v1 of forced-wakeup : Moved the code to cpuidle powernv driver and started
as forced wakeup instead of auto-promotion

v2 : Extended the forced wakeup logic for all states.
Setting the decrementer instead of queuing up a hrtimer to implement the
logic.

v3 : 1) Cleanly handle setting the decrementer after exiting out of stop
   states.
 2) Added a disable_callback feature to compute timeout whenever a
state is enabled or disabled instead of computing every time in fast
idle path.
 3) Use disable callback to recompute timeout whenever state usage
is changed for a state. Also, cleaned up the get_snooze_timeout
function.

v4 :Changed the type and name of set/reset decrementer function.
Handled irq work pending in try_set_dec_before_idle.
No change in patch 2 and 3.

Abhishek Goel (3):
  cpuidle-powernv : forced wakeup for stop states
  cpuidle : Add callback whenever a state usage is enabled/disabled
  cpuidle-powernv : Recompute the idle-state timeouts when state usage
is enabled/disabled

 arch/powerpc/include/asm/time.h   |  2 ++
 arch/powerpc/kernel/time.c| 43 
 drivers/cpuidle/cpuidle-powernv.c | 55 +++
 drivers/cpuidle/sysfs.c   | 15 -
 include/linux/cpuidle.h   |  5 +++
 5 files changed, 106 insertions(+), 14 deletions(-)

-- 
2.17.1

[RFC v4 2/3] cpuidle : Add callback whenever a state usage is enabled/disabled

2019-07-12 Thread Abhishek Goel

To force-wake a CPU, we currently have to compute the timeout in the fast
idle path, because a state may be enabled or disabled at any time and the
driver gets no feedback when that happens.
This patch adds a callback that is invoked whenever a state_usage records
a store to the disable attribute.

Signed-off-by: Abhishek Goel 
---
 drivers/cpuidle/sysfs.c | 15 ++-
 include/linux/cpuidle.h |  4 
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
index eb20adb5de23..141671a53967 100644
--- a/drivers/cpuidle/sysfs.c
+++ b/drivers/cpuidle/sysfs.c
@@ -415,8 +415,21 @@ static ssize_t cpuidle_state_store(struct kobject *kobj, 
struct attribute *attr,
struct cpuidle_state_usage *state_usage = kobj_to_state_usage(kobj);
struct cpuidle_state_attr *cattr = attr_to_stateattr(attr);
 
-   if (cattr->store)
+   if (cattr->store) {
ret = cattr->store(state, state_usage, buf, size);
+   if (ret == size &&
+   strncmp(cattr->attr.name, "disable",
+   strlen("disable"))) {
+   struct kobject *cpuidle_kobj = kobj->parent;
+   struct cpuidle_device *dev =
+   to_cpuidle_device(cpuidle_kobj);
+   struct cpuidle_driver *drv =
+   cpuidle_get_cpu_driver(dev);
+
+   if (drv->disable_callback)
+   drv->disable_callback(dev, drv);
+   }
+   }
 
return ret;
 }
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index bb9a0db89f1a..8a0e54bd0d5d 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -119,6 +119,10 @@ struct cpuidle_driver {
 
/* the driver handles the cpus in cpumask */
struct cpumask  *cpumask;
+
+   void (*disable_callback)(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv);
+
 };
 
 #ifdef CONFIG_CPU_IDLE
-- 
2.17.1
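
As a rough userspace illustration of the idea (not the posted hunk), the
store path can be modelled as below. All names here are hypothetical. Note
that strcmp() and strncmp() return 0 when the strings match, so a test for
"this is the disable attribute" has to compare the result against 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical driver with an optional hook, shaped like disable_callback. */
struct sketch_driver {
	void (*disable_callback)(struct sketch_driver *drv);
	int recompute_count;	/* how often the hook has run */
};

/* Example hook: just count invocations. */
static inline void sketch_recompute(struct sketch_driver *drv)
{
	drv->recompute_count++;
}

/* True only when the attribute is named exactly "disable". */
static inline bool sketch_is_disable_attr(const char *name)
{
	return strcmp(name, "disable") == 0;
}

/*
 * Model of the sysfs store path: after a successful store to the
 * "disable" attribute, notify the driver if it registered a hook.
 */
static inline void sketch_store(struct sketch_driver *drv,
				const char *attr_name, bool store_ok)
{
	if (store_ok && sketch_is_disable_attr(attr_name) &&
	    drv->disable_callback)
		drv->disable_callback(drv);
}
```

A store to any other attribute, or a failed store, leaves the driver
untouched in this model.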

[PATCH v4 1/3] cpuidle-powernv : forced wakeup for stop states

2019-07-12 Thread Abhishek Goel

Currently, the cpuidle governors determine what idle state an idling CPU
should enter into based on heuristics that depend on the idle history on
that CPU. Given that no predictive heuristic is perfect, there are cases
where the governor predicts a shallow idle state, hoping that the CPU will
be busy soon. However, if no new workload is scheduled on that CPU in the
near future, the CPU may end up in the shallow state.

This is problematic when the predicted state in the aforementioned
scenario is a shallow stop state on a tickless system, as we might get
stuck in shallow states for hours in the absence of ticks or interrupts.

To address this, we forcefully wake up the CPU by setting the
decrementer. The decrementer is set to a value that corresponds to the
residency of the next available state, thus firing a timer that will
forcefully wake up the CPU. A few such iterations will essentially train
the governor to select a deeper state for that CPU, as the timer here
corresponds to the next available cpuidle state residency. Thus, the CPU
will eventually end up in the deepest possible state.

Signed-off-by: Abhishek Goel 
---

Auto-promotion
  v1 : started as auto promotion logic for cpuidle states in generic
driver
  v2 : Removed timeout_needed and rebased the code to upstream kernel
Forced-wakeup
  v1 : New patch with name of forced wakeup started
  v2 : Extending the forced wakeup logic for all states. Setting the
decrementer instead of queuing up a hrtimer to implement the logic.
  v3 : Cleanly handle setting/resetting of decrementer so as to not break
irq work
  v4 : Changed type and name of set/reset decrementer function
   Handled irq_work_pending in try_set_dec_before_idle

 arch/powerpc/include/asm/time.h   |  2 ++
 arch/powerpc/kernel/time.c| 43 +++
 drivers/cpuidle/cpuidle-powernv.c | 40 
 3 files changed, 85 insertions(+)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 54f4ec1f9fab..294a472ce161 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -188,6 +188,8 @@ static inline unsigned long tb_ticks_since(unsigned long 
tstamp)
 extern u64 mulhdu(u64, u64);
 #endif
 
+extern bool try_set_dec_before_idle(u64 timeout);
+extern void try_reset_dec_after_idle(void);
 extern void div128_by_32(u64 dividend_high, u64 dividend_low,
 unsigned divisor, struct div_result *dr);
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 694522308cd5..d004c0d8e099 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -576,6 +576,49 @@ void arch_irq_work_raise(void)
 
 #endif /* CONFIG_IRQ_WORK */
 
+/*
+ * This function tries setting decrementer before entering into idle.
+ * Returns true if we have reprogrammed the decrementer for idle.
+ * Returns false if the decrementer is unchanged.
+ */
+bool try_set_dec_before_idle(u64 timeout)
+{
+   u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
+   u64 now = get_tb_or_rtc();
+
+   if (now + timeout > *next_tb)
+   return false;
+
+   set_dec(timeout);
+   if (test_irq_work_pending())
+   set_dec(1);
+
+   return true;
+}
+
+/*
+ * This function gets called if we have set decrementer before
+ * entering into idle. It tries to reset/restore the decrementer
+ * to its original value.
+ */
+void try_reset_dec_after_idle(void)
+{
+   u64 now;
+   u64 *next_tb;
+
+   if (test_irq_work_pending())
+   return;
+
+   now = get_tb_or_rtc();
+   next_tb = this_cpu_ptr(&decrementers_next_tb);
+   if (now >= *next_tb)
+   return;
+
+   set_dec(*next_tb - now);
+   if (test_irq_work_pending())
+   set_dec(1);
+}
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 84b1ebe212b3..17e20e408ffe 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Expose only those Hardware idle states via the cpuidle framework
@@ -46,6 +47,26 @@ static struct stop_psscr_table 
stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly
 static u64 default_snooze_timeout __read_mostly;
 static bool snooze_timeout_en __read_mostly;
 
+static u64 forced_wakeup_timeout(struct cpuidle_device *dev,
+struct cpuidle_driver *drv,
+int index)
+{
+   int i;
+
+   for (i = index + 1; i < drv->state_count; i++) {
+   struct cpuidle_state *s = &drv->states[i];
+   struct cpuidle_state_usage *su = &dev->states_usage[i];
+
+   if (s->disabled || su->disable)
+   continue;
+
+   return (s->target_residency + 2 * s->exit_latency) *
+

[RFC v4 3/3] cpuidle-powernv : Recompute the idle-state timeouts when state usage is enabled/disabled

2019-07-12 Thread Abhishek Goel

The disable callback can be used to compute the timeout for other states
whenever a state is enabled or disabled. We store the computed timeout
in the "timeout" field of the cpuidle state structure. So, we compute the
timeout only when some state is enabled or disabled and not every time in
the fast idle path.
We also use the computed timeout as the timeout for snooze, thus getting
rid of get_snooze_timeout for the snooze loop.

Signed-off-by: Abhishek Goel 
---
 drivers/cpuidle/cpuidle-powernv.c | 35 +++
 include/linux/cpuidle.h   |  1 +
 2 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 17e20e408ffe..29add322d0c4 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -45,7 +45,6 @@ struct stop_psscr_table {
 static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] 
__read_mostly;
 
 static u64 default_snooze_timeout __read_mostly;
-static bool snooze_timeout_en __read_mostly;
 
 static u64 forced_wakeup_timeout(struct cpuidle_device *dev,
 struct cpuidle_driver *drv,
@@ -67,26 +66,13 @@ static u64 forced_wakeup_timeout(struct cpuidle_device *dev,
return 0;
 }
 
-static u64 get_snooze_timeout(struct cpuidle_device *dev,
- struct cpuidle_driver *drv,
- int index)
+static void pnv_disable_callback(struct cpuidle_device *dev,
+struct cpuidle_driver *drv)
 {
int i;
 
-   if (unlikely(!snooze_timeout_en))
-   return default_snooze_timeout;
-
-   for (i = index + 1; i < drv->state_count; i++) {
-   struct cpuidle_state *s = &drv->states[i];
-   struct cpuidle_state_usage *su = &dev->states_usage[i];
-
-   if (s->disabled || su->disable)
-   continue;
-
-   return s->target_residency * tb_ticks_per_usec;
-   }
-
-   return default_snooze_timeout;
+   for (i = 0; i < drv->state_count; i++)
+   drv->states[i].timeout = forced_wakeup_timeout(dev, drv, i);
 }
 
 static int snooze_loop(struct cpuidle_device *dev,
@@ -94,16 +80,20 @@ static int snooze_loop(struct cpuidle_device *dev,
int index)
 {
u64 snooze_exit_time;
+   u64 snooze_timeout = drv->states[index].timeout;
+
+   if (!snooze_timeout)
+   snooze_timeout = default_snooze_timeout;
 
set_thread_flag(TIF_POLLING_NRFLAG);
 
local_irq_enable();
 
-   snooze_exit_time = get_tb() + get_snooze_timeout(dev, drv, index);
+   snooze_exit_time = get_tb() + snooze_timeout;
ppc64_runlatch_off();
HMT_very_low();
while (!need_resched()) {
-   if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
+   if (get_tb() > snooze_exit_time) {
/*
 * Task has not woken up but we are exiting the polling
 * loop anyway. Require a barrier after polling is
@@ -168,7 +158,7 @@ static int stop_loop(struct cpuidle_device *dev,
u64 timeout_tb;
bool forced_wakeup = false;
 
-   timeout_tb = forced_wakeup_timeout(dev, drv, index);
+   timeout_tb = drv->states[index].timeout;
 
/* Ensure that the timeout is at least one microsecond
 * greater than current decrement value. Else, we will
@@ -263,6 +253,7 @@ static int powernv_cpuidle_driver_init(void)
 */
 
drv->cpumask = (struct cpumask *)cpu_present_mask;
+   drv->disable_callback = pnv_disable_callback;
 
return 0;
 }
@@ -422,8 +413,6 @@ static int powernv_idle_probe(void)
/* Device tree can indicate more idle states */
max_idle_state = powernv_add_idle_states();
default_snooze_timeout = TICK_USEC * tb_ticks_per_usec;
-   if (max_idle_state > 1)
-   snooze_timeout_en = true;
} else
return -ENODEV;
 
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 8a0e54bd0d5d..31662b657b9c 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -50,6 +50,7 @@ struct cpuidle_state {
int power_usage; /* in mW */
unsigned inttarget_residency; /* in US */
booldisabled; /* disabled on all CPUs */
+   unsigned long long timeout; /* timeout for exiting out of a state */
 
int (*enter)(struct cpuidle_device *dev,
struct cpuidle_driver *drv,
-- 
2.17.1
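
The caching scheme in this patch can be modelled in userspace C roughly as
below. This is a sketch under assumptions: the struct, the function names
and the default snooze value are hypothetical, and the timeout is taken to
be simply the target residency of the next enabled state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary illustrative value, not the kernel's default. */
#define SKETCH_DEFAULT_SNOOZE_TIMEOUT 4000u

/* Hypothetical, simplified stand-in for a cpuidle state. */
struct sketch_state {
	uint64_t target_residency;
	bool disabled;
	uint64_t timeout;	/* cached; 0 means no deeper enabled state */
};

/* Slow path: refresh every cached timeout after an enable/disable change. */
static inline void sketch_disable_callback(struct sketch_state *states,
					   int count)
{
	for (int i = 0; i < count; i++) {
		states[i].timeout = 0;
		for (int j = i + 1; j < count; j++) {
			if (!states[j].disabled) {
				states[i].timeout = states[j].target_residency;
				break;
			}
		}
	}
}

/* Fast path: read the cached value, falling back to a default for snooze. */
static inline uint64_t sketch_snooze_timeout(const struct sketch_state *states,
					     int index)
{
	uint64_t t = states[index].timeout;

	return t ? t : SKETCH_DEFAULT_SNOOZE_TIMEOUT;
}
```

The point of the split is that the nested scan runs only when a state is
toggled, while the idle path is reduced to a single array read.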

Re: [PATCH] scatterlist: Allocate a contiguous array instead of chaining

2019-07-12 Thread Thomas Gleixner

On Fri, 12 Jul 2019, Ming Lei wrote:
> On Thu, Jul 11, 2019 at 11:36:56PM -0700, Sultan Alsawaf wrote:
> > From: Sultan Alsawaf 
> > 
> > Typically, drivers allocate sg lists of sizes up to a few MiB in size.
> > The current algorithm deals with large sg lists by splitting them into
> > several smaller arrays and chaining them together. But if the sg list
> > allocation is large, and we know the size ahead of time, sg chaining is
> > both inefficient and unnecessary.
> > 
> > Rather than calling kmalloc hundreds of times in a loop for chaining
> > tiny arrays, we can simply do it all at once with kvmalloc, which has
> > the proper tradeoff on when to stop using kmalloc and instead use
> > vmalloc.
> 
> vmalloc() may sleep, so it is impossible to be called in atomic context.

Allocations from atomic context should be avoided wherever possible and you
really have to have a very convincing argument why an atomic allocation is
absolutely necessary. I cleaned up quite some GFP_ATOMIC users over the
last couple of years and all of them were doing it for the very wrong
reasons and mostly just to silence the warning which is triggered with
GFP_KERNEL when called from a non-sleepable context.

So I suggest to audit all call sites first and figure out whether they
really must use GFP_ATOMIC and if possible clean them up, remove the GFP
argument and then do the vmalloc thing on top.

Thanks,

tglx

[PATCH] KVM: Boosting vCPUs that are delivering interrupts

2019-07-12 Thread Wanpeng Li

From: Wanpeng Li 

Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during vcpu wakeup
and interrupt delivery), in addition to the lock holder, we also want to boost
vCPUs that are delivering interrupts. Most smp_call_function_many calls are
synchronous IPI calls, so the IPI target vCPUs are also good yield candidates.
This patch sets the preempted flag during wakeup and interrupt delivery time.

Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M

        vanilla     boosting    improved
1VM     23000       21232       -9%
2VM     2800        8000        180%
3VM     1800        3100        72%

Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs, 
one running ebizzy -M, the other running 'stress --cpu 2':

w/ boosting + w/o pv sched yield(vanilla)   

vanilla boosting   improved 
 1570 4000   55%

w/ boosting + w/ pv sched yield(vanilla)

vanilla boosting   improved 
 1844 5157   79%   

w/o boosting, perf top in VM:

 72.33%  [kernel]   [k] smp_call_function_many
  4.22%  [kernel]   [k] call_function_i
  3.71%  [kernel]   [k] async_page_fault

w/ boosting, perf top in VM:

 38.43%  [kernel]   [k] smp_call_function_many
  6.31%  [kernel]   [k] async_page_fault
  6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
  4.88%  [kernel]   [k] call_function_interrupt

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Christian Borntraeger 
Signed-off-by: Wanpeng Li 
---
 virt/kvm/kvm_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b4ab59d..2c46705 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
int me;
int cpu = vcpu->cpu;
 
-   if (kvm_vcpu_wake_up(vcpu))
+   if (kvm_vcpu_wake_up(vcpu)) {
+   vcpu->preempted = true;
return;
+   }
 
me = get_cpu();
if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
-- 
2.7.4

Re: [PATCH] mm: vmscan: scan anonymous pages on file refaults

2019-07-12 Thread Michal Hocko

On Fri 05-07-19 20:45:05, Kuo-Hsin Yang wrote:
> With 4 processes accessing non-overlapping parts of a large file, 30316
> pages swapped out with this patch, 5152 pages swapped out without this
> patch. The swapout number is small comparing to pgpgin.

which is 5 times more swapout. This may be seen to be a lot for
workloads that prefer no swapping (e.g. large in memory databases) with
an occasional heavy IO (e.g. backup). And I am worried those would
regress. I do agree that the current behavior is far from optimal
because the thrashing is real. I believe that we really need a different
approach. Johannes brought this up a few years back (sorry I do not
have a link handy) but it was essentially about implementing refault
logic for anonymous memory and swapping out based on the refault price. If
there is effectively no swapin then it simply makes more sense to swap
out rather than refault a page cache.

That being said, I am not nacking the patch. Let's see whether something
regresses, as there is no clear-cut answer for the proper behavior. But I
am bringing that up because we really need a better and more robust plan
for the future.

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 2/3] DMA mapping: Move SME handling to x86-specific files

2019-07-12 Thread Christoph Hellwig

Honestly I think this code should go away without any replacement.
There is no reason why we should have a special debug printk just
for one specific reason why there is a requirement for a large DMA
mask.

[PATCH RESEND] KVM: Boosting vCPUs that are delivering interrupts

2019-07-12 Thread Wanpeng Li

From: Wanpeng Li 

Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during vcpu wakeup
and interrupt delivery), in addition to the lock holder, we also want to boost
vCPUs that are delivering interrupts. Most smp_call_function_many calls are
synchronous IPI calls, so the IPI target vCPUs are also good yield candidates.
This patch sets the preempted flag during wakeup and interrupt delivery time.

Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M

        vanilla     boosting    improved
1VM     23000       21232       -9%
2VM     2800        8000        180%
3VM     1800        3100        72%

Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs, 
one running ebizzy -M, the other running 'stress --cpu 2':

w/ boosting + w/o pv sched yield(vanilla)   

vanilla boosting   improved 
  1570 4000   55%

w/ boosting + w/ pv sched yield(vanilla)

vanilla boosting   improved 
  1844 5157   79%   

w/o boosting, perf top in VM:

 72.33%  [kernel]   [k] smp_call_function_many
  4.22%  [kernel]   [k] call_function_i
  3.71%  [kernel]   [k] async_page_fault

w/ boosting, perf top in VM:

 38.43%  [kernel]   [k] smp_call_function_many
  6.31%  [kernel]   [k] async_page_fault
  6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
  4.88%  [kernel]   [k] call_function_interrupt

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Christian Borntraeger 
Signed-off-by: Wanpeng Li 
---
 virt/kvm/kvm_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b4ab59d..2c46705 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
int me;
int cpu = vcpu->cpu;
 
-   if (kvm_vcpu_wake_up(vcpu))
+   if (kvm_vcpu_wake_up(vcpu)) {
+   vcpu->preempted = true;
return;
+   }
 
me = get_cpu();
if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
-- 
2.7.4

Re: [PATCH v1] Bluetooth: hci_qca: Send VS pre shutdown command.

2019-07-12 Thread Marcel Holtmann

Hi Harish,

> WCN399x chips are coex chips; they need a VS pre-shutdown
> command when turning off BT, so that the chip can inform
> other active clients that BT is off.
> 
> Signed-off-by: Harish Bandi 
> ---
> drivers/bluetooth/btqca.c   | 21 +
> drivers/bluetooth/btqca.h   |  7 +++
> drivers/bluetooth/hci_qca.c |  3 +++
> 3 files changed, 31 insertions(+)

patch has been applied to bluetooth-next tree.

Regards

Marcel

[PATCHv3] clk: add imx8 clk defines

2019-07-12 Thread Oliver Graute

From: Anson Huang 

Add header defines for the imx8qm clocks.

Signed-off-by: Anson Huang 
Signed-off-by: Oliver Graute 
Reviewed-by: Rob Herring 
---

- fixed authorship

 include/dt-bindings/clock/imx8qm-clock.h | 851 +++
 1 file changed, 851 insertions(+)
 create mode 100644 include/dt-bindings/clock/imx8qm-clock.h

diff --git a/include/dt-bindings/clock/imx8qm-clock.h 
b/include/dt-bindings/clock/imx8qm-clock.h
new file mode 100644
index ..47217e4eaa6b
--- /dev/null
+++ b/include/dt-bindings/clock/imx8qm-clock.h
@@ -0,0 +1,851 @@
+/* SPDX-License-Identifier: (GPL-2.0+ OR MIT) */
+/*
+ * Copyright (C) 2016 Freescale Semiconductor, Inc.
+ * Copyright 2017 NXP
+*/
+
+#ifndef __DT_BINDINGS_CLOCK_IMX8QM_H
+#define __DT_BINDINGS_CLOCK_IMX8QM_H
+
+#define IMX8QM_CLK_DUMMY   0
+
+#define IMX8QM_A53_DIV 1
+#define IMX8QM_A53_CLK 2
+#define IMX8QM_A72_DIV 3
+#define IMX8QM_A72_CLK 4
+
+/* SC Clocks. */
+#define IMX8QM_SC_I2C_DIV  5
+#define IMX8QM_SC_I2C_CLK  6
+#define IMX8QM_SC_PID0_DIV 7
+#define IMX8QM_SC_PID0_CLK 8
+#define IMX8QM_SC_PIT_DIV  9
+#define IMX8QM_SC_PIT_CLK  10
+#define IMX8QM_SC_TPM_DIV  11
+#define IMX8QM_SC_TPM_CLK  12
+#define IMX8QM_SC_UART_DIV 13
+#define IMX8QM_SC_UART_CLK 14
+
+/* LSIO */
+#define IMX8QM_PWM0_DIV    15
+#define IMX8QM_PWM0_CLK    16
+#define IMX8QM_PWM1_DIV    17
+#define IMX8QM_PWM1_CLK    18
+#define IMX8QM_PWM2_DIV    19
+#define IMX8QM_PWM2_CLK    20
+#define IMX8QM_PWM3_DIV    21
+#define IMX8QM_PWM3_CLK    22
+#define IMX8QM_PWM4_DIV    23
+#define IMX8QM_PWM4_CLK    24
+#define IMX8QM_PWM5_DIV    26
+#define IMX8QM_PWM5_CLK    27
+#define IMX8QM_PWM6_DIV    28
+#define IMX8QM_PWM6_CLK    29
+#define IMX8QM_PWM7_DIV    30
+#define IMX8QM_PWM7_CLK    31
+#define IMX8QM_FSPI0_DIV   32
+#define IMX8QM_FSPI0_CLK   33
+#define IMX8QM_FSPI1_DIV   34
+#define IMX8QM_FSPI1_CLK   35
+#define IMX8QM_GPT0_DIV    36
+//#define IMX8QM_GPT0_CLK  37
+#define IMX8QM_GPT1_DIV    38
+//#define IMX8QM_GPT1_CLK  39
+#define IMX8QM_GPT2_DIV    40
+#define IMX8QM_GPT2_CLK    41
+#define IMX8QM_GPT3_DIV    42
+#define IMX8QM_GPT3_CLK    43
+#define IMX8QM_GPT4_DIV    44
+#define IMX8QM_GPT4_CLK    45
+
+/* Connectivity */
+#define IMX8QM_APBHDMA_CLK 46
+#define IMX8QM_GPMI_APB_CLK    47
+#define IMX8QM_GPMI_APB_BCH_CLK    48
+#define IMX8QM_GPMI_BCH_IO_DIV 49
+#define IMX8QM_GPMI_BCH_IO_CLK 50
+#define IMX8QM_GPMI_BCH_DIV    51
+#define IMX8QM_GPMI_BCH_CLK    52
+#define IMX8QM_SDHC0_IPG_CLK   53
+#define IMX8QM_SDHC0_DIV   54
+#define IMX8QM_SDHC0_CLK   55
+#define IMX8QM_SDHC1_IPG_CLK   56
+#define IMX8QM_SDHC1_DIV   57
+#define IMX8QM_SDHC1_CLK   58
+#define IMX8QM_SDHC2_IPG_CLK   59
+#define IMX8QM_SDHC2_DIV   60
+#define IMX8QM_SDHC2_CLK   61
+#define IMX8QM_USB2_OH_AHB_CLK 62
+#define IMX8QM_USB2_OH_IPG_S_CL 63
+#define IMX8QM_USB2_OH_IPG_S_PL301_CLK 64
+#define IMX8QM_USB2_PHY_IPG_CLK 65
+#define IMX8QM_USB3_IPG_CLK 66
+#define IMX8QM_USB3_CORE_PCLK  67

Re: [PATCH] scatterlist: Allocate a contiguous array instead of chaining

2019-07-12 Thread Sultan Alsawaf

On Fri, Jul 12, 2019 at 09:06:40AM +0200, Thomas Gleixner wrote:
> On Fri, 12 Jul 2019, Ming Lei wrote:
> > vmalloc() may sleep, so it cannot be called in atomic context.
> 
> Allocations from atomic context should be avoided wherever possible and you
> really have to have a very convincing argument why an atomic allocation is
> absolutely necessary. I cleaned up quite some GFP_ATOMIC users over the
> last couple of years and all of them were doing it for the very wrong
> reasons and mostly just to silence the warning which is triggered with
> GFP_KERNEL when called from a non-sleepable context.
> 
> So I suggest to audit all call sites first and figure out whether they
> really must use GFP_ATOMIC and if possible clean them up, remove the GFP
> argument and then do the vmalloc thing on top.

Hello Thomas and Ming,

It looks like the following call sites are atomic:
drivers/crypto/qce/ablkcipher.c:92: ret = sg_alloc_table(&rctx->dst_tbl, rctx->dst_nents, gfp);
drivers/crypto/ccp/ccp-crypto-aes-cmac.c:110: ret = sg_alloc_table(&rctx->data_sg, sg_count, gfp);
drivers/crypto/ccp/ccp-crypto-sha.c:103: ret = sg_alloc_table(&rctx->data_sg, sg_count, gfp);
drivers/spi/spi-pl022.c:1035: ret = sg_alloc_table(&pl022->sgt_rx, pages, GFP_ATOMIC);
drivers/spi/spi-pl022.c:1039: ret = sg_alloc_table(&pl022->sgt_tx, pages, GFP_ATOMIC);

The crypto ones are conditionally made atomic depending on the presence of
CRYPTO_TFM_REQ_MAY_SLEEP.

Additionally, the following allocation could be problematic with kvmalloc:
net/ceph/crypto.c:180:  ret = sg_alloc_table(sgt, chunk_cnt, GFP_NOFS);

This is a snippet from kvmalloc:
    /*
     * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
     * so the given set of flags has to be compatible.
     */
    if ((flags & GFP_KERNEL) != GFP_KERNEL)
        return kmalloc_node(size, flags, node);

Use of GFP_NOFS in net/ceph/crypto.c would cause kvmalloc to fall back to
kmalloc_node, which could cause problems if the allocation size is too large for
kmalloc_node to reasonably accommodate.
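
To make the fallback concrete, here is a minimal userspace sketch of the flag
test from the kvmalloc snippet. The DEMO_ bit values and both helper functions
are made up for illustration and are not kernel code; only the way the three
composite masks are built is meant to mirror, as far as I understand it, their
namesakes in include/linux/gfp.h.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative bit values only; the real definitions live in
 * include/linux/gfp.h and differ from these.
 */
#define DEMO_GFP_HIGH            0x01u
#define DEMO_GFP_ATOMIC_BIT      0x02u
#define DEMO_GFP_IO              0x04u
#define DEMO_GFP_FS              0x08u
#define DEMO_GFP_DIRECT_RECLAIM  0x10u
#define DEMO_GFP_KSWAPD_RECLAIM  0x20u
#define DEMO_GFP_RECLAIM (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)

/* Composite masks, assumed to be composed like their kernel namesakes. */
#define DEMO_GFP_KERNEL (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS   (DEMO_GFP_RECLAIM | DEMO_GFP_IO)
#define DEMO_GFP_ATOMIC \
	(DEMO_GFP_HIGH | DEMO_GFP_ATOMIC_BIT | DEMO_GFP_KSWAPD_RECLAIM)

/*
 * Hypothetical helper: true when the kvmalloc check would NOT bail out
 * to kmalloc_node(), i.e. every GFP_KERNEL bit is present in flags.
 */
static bool demo_may_use_vmalloc(unsigned int flags)
{
	return (flags & DEMO_GFP_KERNEL) == DEMO_GFP_KERNEL;
}

/* Hypothetical self-check summarising the three cases discussed here. */
static void demo_check_flags(void)
{
	assert(demo_may_use_vmalloc(DEMO_GFP_KERNEL));
	assert(!demo_may_use_vmalloc(DEMO_GFP_NOFS));   /* FS bit missing */
	assert(!demo_may_use_vmalloc(DEMO_GFP_ATOMIC)); /* cannot sleep at all */
}
```

In this model any mask that lacks a single GFP_KERNEL bit, including GFP_NOFS,
takes the kmalloc_node path regardless of the requested size.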

Also, it looks like the vmalloc family doesn't have kvmalloc's GFP_KERNEL check.
Is this intentional, or does vmalloc really not require GFP_KERNEL context?

Thanks,
Sultan

Re: [PATCH v4 4/4] mm: introduce MADV_PAGEOUT

2019-07-12 Thread Michal Hocko

On Fri 12-07-19 14:18:28, Minchan Kim wrote:
[...]
> From 41592f23e876ec21e49dc3c76dc89538e2bb16be Mon Sep 17 00:00:00 2001
> From: Minchan Kim 
> Date: Fri, 12 Jul 2019 14:05:36 +0900
> Subject: [PATCH] mm: factor out common parts between MADV_COLD and
>  MADV_PAGEOUT
> 
> There are many common parts between MADV_COLD and MADV_PAGEOUT.
> This patch factors them out to reduce code duplication.

This looks better indeed. I still hope that this can get improved even
further but let's do that in a follow up patch.

> Signed-off-by: Minchan Kim 

Acked-by: Michal Hocko 
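
For readers skimming the diff, the core of the refactoring is that one
callback now serves both modes, with a small private struct carrying the mode
flag through the walker's opaque pointer. A minimal userspace sketch of that
pattern follows; every demo_ name is hypothetical and the counters merely
stand in for deactivate_page() and reclaim_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the walker; not the kernel's struct mm_walk. */
struct demo_walk {
	void *private;                 /* opaque pointer handed to callbacks */
};

struct demo_stats {
	int deactivated;               /* pages handled the "cold" way */
	int reclaimed;                 /* pages handled the "pageout" way */
};

/* Bundles everything the shared callback needs, including the mode. */
struct demo_walk_private {
	struct demo_stats *stats;
	bool pageout;
};

/* Shared callback: common work first, then the one mode-specific step. */
static void demo_visit_page(struct demo_walk *walk)
{
	struct demo_walk_private *priv = walk->private;

	assert(priv != NULL && priv->stats != NULL);
	if (priv->pageout)
		priv->stats->reclaimed++;
	else
		priv->stats->deactivated++;
}

/* Thin wrapper: the two entry points differ only in the flag they pass. */
static void demo_walk_range(int npages, bool pageout, struct demo_stats *stats)
{
	struct demo_walk_private priv = { .stats = stats, .pageout = pageout };
	struct demo_walk walk = { .private = &priv };
	int i;

	for (i = 0; i < npages; i++)
		demo_visit_page(&walk);
}
```

Calling demo_walk_range() once with pageout false and once with it true
exercises both branches of the same callback.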

> ---
>  mm/madvise.c | 201 +--
>  1 file changed, 52 insertions(+), 149 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index bc2f0138982e..3d3d14517cc8 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -30,6 +30,11 @@
>  
>  #include "internal.h"
>  
> +struct madvise_walk_private {
> + struct mmu_gather *tlb;
> + bool pageout;
> +};
> +
>  /*
>   * Any behaviour which results in changes to the vma->vm_flags needs to
>   * take mmap_sem for writing. Others, which simply traverse vmas, need
> @@ -310,16 +315,23 @@ static long madvise_willneed(struct vm_area_struct *vma,
>   return 0;
>  }
>  
> -static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
> - unsigned long end, struct mm_walk *walk)
> +static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> + unsigned long addr, unsigned long end,
> + struct mm_walk *walk)
>  {
> - struct mmu_gather *tlb = walk->private;
> + struct madvise_walk_private *private = walk->private;
> + struct mmu_gather *tlb = private->tlb;
> + bool pageout = private->pageout;
>   struct mm_struct *mm = tlb->mm;
>   struct vm_area_struct *vma = walk->vma;
>   pte_t *orig_pte, *pte, ptent;
>   spinlock_t *ptl;
> - struct page *page;
>   unsigned long next;
> + struct page *page = NULL;
> + LIST_HEAD(page_list);
> +
> + if (fatal_signal_pending(current))
> + return -EINTR;
>  
>   next = pmd_addr_end(addr, end);
>   if (pmd_trans_huge(*pmd)) {
> @@ -358,6 +370,12 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned 
> long addr,
>   return 0;
>   }
>  
> + if (pageout) {
> + if (isolate_lru_page(page))
> + goto huge_unlock;
> + list_add(&page->lru, &page_list);
> + }
> +
>   if (pmd_young(orig_pmd)) {
>   pmdp_invalidate(vma, addr, pmd);
>   orig_pmd = pmd_mkold(orig_pmd);
> @@ -366,10 +384,14 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned 
> long addr,
>   tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>   }
>  
> + ClearPageReferenced(page);
>   test_and_clear_page_young(page);
> - deactivate_page(page);
>  huge_unlock:
>   spin_unlock(ptl);
> + if (pageout)
> + reclaim_pages(&page_list);
> + else
> + deactivate_page(page);
>   return 0;
>   }
>  
> @@ -423,6 +445,12 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned 
> long addr,
>  
>   VM_BUG_ON_PAGE(PageTransCompound(page), page);
>  
> + if (pageout) {
> + if (isolate_lru_page(page))
> + continue;
> + list_add(&page->lru, &page_list);
> + }
> +
>   if (pte_young(ptent)) {
>   ptent = ptep_get_and_clear_full(mm, addr, pte,
>   tlb->fullmm);
> @@ -437,12 +465,16 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned 
> long addr,
>* As a side effect, it makes confuse idle-page tracking
>* because they will miss recent referenced history.
>*/
> + ClearPageReferenced(page);
>   test_and_clear_page_young(page);
> - deactivate_page(page);
> + if (!pageout)
> + deactivate_page(page);
>   }
>  
>   arch_enter_lazy_mmu_mode();
>   pte_unmap_unlock(orig_pte, ptl);
> + if (pageout)
> + reclaim_pages(&page_list);
>   cond_resched();
>  
>   return 0;
> @@ -452,10 +484,15 @@ static void madvise_cold_page_range(struct mmu_gather 
> *tlb,
>struct vm_area_struct *vma,
>unsigned long addr, unsigned long end)
>  {
> + struct madvise_walk_private walk_private = {
> + .tlb = tlb,
> + .pageout = false,
> + };
> +
>   struct mm_walk cold_walk = {
> - .pmd_entry = madvise_cold_pte_range,
> + .pmd_entry = madvise_cold_or_pageout_pte_range,
>

Re: [PATCH] phy: Change the configuration interface param to void* to make it more general

2019-07-12 Thread Maxime Ripard

On Fri, Jul 12, 2019 at 05:26:04PM +0800, Zeng Tao wrote:
> The phy framework now allows runtime configuration, but it is currently
> limited to MIPI, and it is not reasonable to add user-specified
> configurations to the union phy_configure_opts structure. A simple
> alternative is to replace it with a void *.
>
> Some phy drivers already introduce private phy APIs for runtime
> configuration; with this patch, they can switch to phy_configure
> as a replacement.
>
> Signed-off-by: Zeng Tao 

I still don't believe this is the right approach, for the reasons
explained in my first review of that patch.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



Re: [PATCH] [media] media: mtk-mdp: fix reference count on old device tree

2019-07-12 Thread houlong wei




On Mon, 2019-07-08 at 17:06 +0800, Matthias Brugger wrote:
> 
> On 21/06/2019 13:32, Matthias Brugger wrote:
> > of_get_next_child() increments the reference count of the returning
> > device_node. Decrement it in the check if we are using the old or the
> > new DTB.
> > 
> > Fixes: ba1f1f70c2c0 ("[media] media: mtk-mdp: Fix mdp device tree")
> > Signed-off-by: Matthias Brugger 
> 
> Any comments on that?
> 

Hi Matthias,
Thanks for fixing the bug, and sorry for the late reply.

Acked-by: Houlong Wei 
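
As a reminder of why the put is needed, here is a minimal userspace sketch of
the get/put imbalance. The demo_ types and functions are hypothetical
miniatures, not the OF API; demo_get_child() only imitates the documented
behaviour that of_get_next_child() returns the node with its reference count
raised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a refcounted node; not struct device_node. */
struct demo_node {
	int refcount;
	struct demo_node *child;
};

/* Returns the first child with its refcount raised, or NULL if none. */
static struct demo_node *demo_get_child(struct demo_node *parent)
{
	struct demo_node *child = parent->child;

	if (child)
		child->refcount++;
	return child;
}

/* Drops one reference taken by demo_get_child(). */
static void demo_node_put(struct demo_node *node)
{
	if (node) {
		assert(node->refcount > 0);
		node->refcount--;
	}
}

/* Leaky test: uses the lookup as a boolean and forgets the reference. */
static int demo_has_child_leaky(struct demo_node *parent)
{
	return demo_get_child(parent) != NULL;
}

/* Balanced test, same shape as the fix: keep the pointer, then put it. */
static int demo_has_child(struct demo_node *parent)
{
	struct demo_node *child = demo_get_child(parent);

	if (!child)
		return 0;
	demo_node_put(child);
	return 1;
}
```

Each call to the leaky variant leaves the child's count one higher than
before, while the balanced variant leaves it unchanged.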


> > ---
> >  drivers/media/platform/mtk-mdp/mtk_mdp_core.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c 
> > b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
> > index bbb24fb95b95..bafe53c5d54a 100644
> > --- a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
> > +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
> > @@ -118,7 +118,9 @@ static int mtk_mdp_probe(struct platform_device *pdev)
> > mutex_init(&mdp->vpulock);
> >  
> > /* Old dts had the components as child nodes */
> > -   if (of_get_next_child(dev->of_node, NULL)) {
> > +   parent = of_get_next_child(dev->of_node, NULL);
> > +   if (parent) {
> > +   of_node_put(parent);
> > parent = dev->of_node;
> > dev_warn(dev, "device tree is out of date\n");
> > } else {
> >

Re: [PATCH] input: API for Setting a Timestamp from a Driver

2019-07-12 Thread Benjamin Tissoires

On Fri, Jul 12, 2019 at 8:41 AM Dmitry Torokhov
 wrote:
>
> Hi Atif,
>
> On Wed, Jul 10, 2019 at 04:04:10PM -0700, Atif Niyaz wrote:
> > Currently, evdev stamps events with timestamps acquired in
> > evdev_events. However, these timestamps may not accurately reflect
> > when the actual event happened. This API allows any third-party
> > driver to call input_set_timestamp and supply a timestamp that
> > gives a more accurate sense of time for the event.
> >
> > Signed-off-by: Atif Niyaz 
>
> This looks OK to me. Benjamin, Peter, any concerns here?
>

No red flags from me (though Peter is the one using all of this).

Just curious, which drivers do you think will be using this new API?
I can see that we might want to use hid-multitouch for it, with the
Scan Time forwarded by the device, but what do you have in mind?

Cheers,
Benjamin
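
For context, a rough userspace sketch of the idea as I read the quoted hunks:
one monotonic reading is converted to all three clocks and stored per device,
and each client then picks the slot matching its clock type. All demo_ names
and the explicit offset parameters are assumptions for illustration; the
actual helpers in the patch are cut off in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

enum demo_clk {
	DEMO_CLK_REAL = 0,
	DEMO_CLK_MONO,
	DEMO_CLK_BOOT,
	DEMO_CLK_MAX
};

/* Hypothetical device: one timestamp slot (in ns) per supported clock. */
struct demo_dev {
	int64_t timestamp[DEMO_CLK_MAX];
};

/*
 * Hypothetical setter: derive all three clocks from a single monotonic
 * reading. The offsets stand in for the kernel's mono-to-real and
 * mono-to-boot conversions.
 */
static void demo_set_timestamp(struct demo_dev *dev, int64_t mono_ns,
			       int64_t real_offset_ns, int64_t boot_offset_ns)
{
	assert(dev);
	dev->timestamp[DEMO_CLK_MONO] = mono_ns;
	dev->timestamp[DEMO_CLK_REAL] = mono_ns + real_offset_ns;
	dev->timestamp[DEMO_CLK_BOOT] = mono_ns + boot_offset_ns;
}

/*
 * Pick the slot a client asked for. Unknown types fall back to BOOT,
 * like the default: branch in the quoted __evdev_queue_syn_dropped() hunk.
 */
static int64_t demo_client_time(const struct demo_dev *dev, enum demo_clk type)
{
	assert(dev);
	switch (type) {
	case DEMO_CLK_REAL:
	case DEMO_CLK_MONO:
		return dev->timestamp[type];
	default:
		return dev->timestamp[DEMO_CLK_BOOT];
	}
}
```

A driver that knows when the hardware sampled the event would call the setter
with that reading instead of the current time.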

>
> > ---
> >  drivers/input/evdev.c | 42 --
> >  drivers/input/input.c | 17 +
> >  include/linux/input.h | 38 ++
> >  3 files changed, 71 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
> > index 867c2cfd0038..a331efa0a3f6 100644
> > --- a/drivers/input/evdev.c
> > +++ b/drivers/input/evdev.c
> > @@ -25,13 +25,6 @@
> >  #include 
> >  #include "input-compat.h"
> >
> > -enum evdev_clock_type {
> > - EV_CLK_REAL = 0,
> > - EV_CLK_MONO,
> > - EV_CLK_BOOT,
> > - EV_CLK_MAX
> > -};
> > -
> >  struct evdev {
> >   int open;
> >   struct input_handle handle;
> > @@ -53,7 +46,7 @@ struct evdev_client {
> >   struct fasync_struct *fasync;
> >   struct evdev *evdev;
> >   struct list_head node;
> > - unsigned int clk_type;
> > + input_clk_t clk_type;
> >   bool revoked;
> >   unsigned long *evmasks[EV_CNT];
> >   unsigned int bufsize;
> > @@ -150,16 +143,18 @@ static void __evdev_flush_queue(struct evdev_client 
> > *client, unsigned int type)
> >  static void __evdev_queue_syn_dropped(struct evdev_client *client)
> >  {
> >   struct input_event ev;
> > - ktime_t time;
> >   struct timespec64 ts;
> > + ktime_t *time = input_get_timestamp(client->evdev->handle.dev);
> >
> > - time = client->clk_type == EV_CLK_REAL ?
> > - ktime_get_real() :
> > - client->clk_type == EV_CLK_MONO ?
> > - ktime_get() :
> > - ktime_get_boottime();
> > + switch (client->clk_type) {
> > + case INPUT_CLK_REAL:
> > + case INPUT_CLK_MONO:
> > + ts = ktime_to_timespec64(time[client->clk_type]);
> > + break;
> > + default:
> > + ts = ktime_to_timespec64(time[INPUT_CLK_BOOT]);
>
> Add "break" here please.
>
> > + }
> >
> > - ts = ktime_to_timespec64(time);
> >   ev.input_event_sec = ts.tv_sec;
> >   ev.input_event_usec = ts.tv_nsec / NSEC_PER_USEC;
> >   ev.type = EV_SYN;
> > @@ -185,21 +180,21 @@ static void evdev_queue_syn_dropped(struct 
> > evdev_client *client)
> >   spin_unlock_irqrestore(&client->buffer_lock, flags);
> >  }
> >
> > -static int evdev_set_clk_type(struct evdev_client *client, unsigned int 
> > clkid)
> > +static int evdev_set_clk_type(struct evdev_client *client, clockid_t clkid)
> >  {
> >   unsigned long flags;
> > - unsigned int clk_type;
> > + input_clk_t clk_type;
> >
> >   switch (clkid) {
> >
> >   case CLOCK_REALTIME:
> > - clk_type = EV_CLK_REAL;
> > + clk_type = INPUT_CLK_REAL;
> >   break;
> >   case CLOCK_MONOTONIC:
> > - clk_type = EV_CLK_MONO;
> > + clk_type = INPUT_CLK_MONO;
> >   break;
> >   case CLOCK_BOOTTIME:
> > - clk_type = EV_CLK_BOOT;
> > + clk_type = INPUT_CLK_BOOT;
> >   break;
> >   default:
> >   return -EINVAL;
> > @@ -307,12 +302,7 @@ static void evdev_events(struct input_handle *handle,
> >  {
> >   struct evdev *evdev = handle->private;
> >   struct evdev_client *client;
> > - ktime_t ev_time[EV_CLK_MAX];
> > -
> > - ev_time[EV_CLK_MONO] = ktime_get();
> > - ev_time[EV_CLK_REAL] = ktime_mono_to_real(ev_time[EV_CLK_MONO]);
> > - ev_time[EV_CLK_BOOT] = ktime_mono_to_any(ev_time[EV_CLK_MONO],
> > -  TK_OFFS_BOOT);
> > + ktime_t *ev_time = input_get_timestamp(handle->dev);
> >
> >   rcu_read_lock();
> >
> > diff --git a/drivers/input/input.c b/drivers/input/input.c
> > index 7f3c5fcb9ed6..ae8b0ee58120 100644
> > --- a/drivers/input/input.c
> > +++ b/drivers/input/input.c
> > @@ -1894,6 +1894,23 @@ void input_free_device(struct input_dev *dev)
> >  }
> >  EXPORT_SYMBOL(input_free_device);
> >
> > +/**
> > + * input_get_timestamp - get timestamp for input events
> > + * @dev: input device to get timestamp f

Re: [PATCH v2] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU

2019-07-12 Thread Daniel Drake

On Fri, Jul 5, 2019 at 10:27 AM Chris Chiu  wrote:
> Per the code before the REG_S0S1_PATH_SWITCH setting, the driver has
> already told the co-processor that the antenna is inverted.
> memset(&h2c, 0, sizeof(struct h2c_cmd));
> h2c.ant_sel_rsv.cmd = H2C_8723B_ANT_SEL_RSV;
> h2c.ant_sel_rsv.ant_inverse = 1;
> h2c.ant_sel_rsv.int_switch_type = 0;
> rtl8xxxu_gen2_h2c_cmd(priv, &h2c, sizeof(h2c.ant_sel_rsv));
>
> At least the current modification is consistent with the antenna
> inverse setting.
> I'll check the vendor driver to see when and how the inversion is
> determined.

I checked this out. The codepath hit hardcodes it to the AUX port,
i.e. "inverted" setup:

EXhalbtc8723b1ant_PowerOnSetting():
if(pBtCoexist->chipInterface == BTC_INTF_USB)
{
// fixed at S0 for USB interface
pBtCoexist->fBtcWrite4Byte(pBtCoexist, 0x948, 0x0);

u1Tmp |= 0x1;// antenna inverse
pBtCoexist->fBtcWriteLocalReg1Byte(pBtCoexist, 0xfe08, u1Tmp);

pBoardInfo->btdmAntPos = BTC_ANTENNA_AT_AUX_PORT;
  }

So I'm further convinced that these performance-enhancing changes are
increasing consistency with the vendor driver.

Daniel

Re: [PATCH RESEND] KVM: Boosting vCPUs that are delivering interrupts

2019-07-12 Thread Wanpeng Li

On Fri, 12 Jul 2019 at 15:15, Wanpeng Li  wrote:
>
> From: Wanpeng Li 
>
> Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during vcpu wakeup
> and interrupt delivery), we want to boost not only lock holders but also vCPUs
> that are delivering interrupts. Most smp_call_function_many calls are
> synchronous IPI calls, so the IPI target vCPUs are also good yield candidates.
> This patch sets the preempted flag during wakeup and interrupt delivery.
>

I forgot to mention that I disabled pv tlb shootdown during testing, since
function call interrupts are not easy to trigger directly from userspace
workloads. In addition, distro guest kernels without pv tlb shootdown
support can also benefit in both the tlb shootdown and function call
interrupt scenarios.

> Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
> ebizzy -M
>
>        vanilla   boosting   improved
> 1VM      23000      21232        -9%
> 2VM       2800       8000       180%
> 3VM       1800       3100        72%
>
> Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs,
> one running ebizzy -M, the other running 'stress --cpu 2':
>
> w/ boosting + w/o pv sched yield(vanilla)
>
> vanilla boosting   improved
>   1570 4000   55%
>
> w/ boosting + w/ pv sched yield(vanilla)
>
> vanilla boosting   improved
>   1844 5157   79%
>
> w/o boosting, perf top in VM:
>
>  72.33%  [kernel]   [k] smp_call_function_many
>   4.22%  [kernel]   [k] call_function_i
>   3.71%  [kernel]   [k] async_page_fault
>
> w/ boosting, perf top in VM:
>
>  38.43%  [kernel]   [k] smp_call_function_many
>   6.31%  [kernel]   [k] async_page_fault
>   6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
>   4.88%  [kernel]   [k] call_function_interrupt
>
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Christian Borntraeger 
> Signed-off-by: Wanpeng Li 
> ---
>  virt/kvm/kvm_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b4ab59d..2c46705 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2404,8 +2404,10 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> int me;
> int cpu = vcpu->cpu;
>
> -   if (kvm_vcpu_wake_up(vcpu))
> +   if (kvm_vcpu_wake_up(vcpu)) {
> +   vcpu->preempted = true;
> return;
> +   }
>
> me = get_cpu();
> if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> --
> 2.7.4
>

[PATCH 1/2] usb: dwc3: Use devres to get clocks

2019-07-12 Thread Andrey Smirnov

Use devres to get clocks and drop explicit clock freeing. No
functional change intended.

Signed-off-by: Andrey Smirnov 
Cc: Felipe Balbi 
Cc: Chris Healy 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/usb/dwc3/core.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index c9bb93a2c81e..768023a2553c 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1436,7 +1436,7 @@ static int dwc3_probe(struct platform_device *pdev)
if (dev->of_node) {
dwc->num_clks = ARRAY_SIZE(dwc3_core_clks);
 
-   ret = clk_bulk_get(dev, dwc->num_clks, dwc->clks);
+   ret = devm_clk_bulk_get(dev, dwc->num_clks, dwc->clks);
if (ret == -EPROBE_DEFER)
return ret;
/*
@@ -1449,7 +1449,7 @@ static int dwc3_probe(struct platform_device *pdev)
 
ret = reset_control_deassert(dwc->reset);
if (ret)
-   goto put_clks;
+   return ret;
 
ret = clk_bulk_prepare(dwc->num_clks, dwc->clks);
if (ret)
@@ -1536,8 +1536,6 @@ static int dwc3_probe(struct platform_device *pdev)
clk_bulk_unprepare(dwc->num_clks, dwc->clks);
 assert_reset:
reset_control_assert(dwc->reset);
-put_clks:
-   clk_bulk_put(dwc->num_clks, dwc->clks);
 
return ret;
 }
@@ -1560,7 +1558,6 @@ static int dwc3_remove(struct platform_device *pdev)
 
dwc3_free_event_buffers(dwc);
dwc3_free_scratch_buffers(dwc);
-   clk_bulk_put(dwc->num_clks, dwc->clks);
 
return 0;
 }
-- 
2.21.0
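
For anyone unfamiliar with the pattern, the reason the put_clks label and the
clk_bulk_put() calls can go away is that a managed ("devm") getter registers
its own cleanup with the device. A toy userspace model of that idea follows;
all demo_ names are hypothetical and this is not the kernel's devres
implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of a managed resource entry. */
struct demo_devres {
	void (*release)(void *data);
	void *data;
	struct demo_devres *next;
};

struct demo_device {
	struct demo_devres *res;       /* LIFO list of cleanup actions */
};

/* Register a cleanup action to run when the device goes away. */
static int demo_devm_add(struct demo_device *dev,
			 void (*release)(void *data), void *data)
{
	struct demo_devres *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->release = release;
	r->data = data;
	r->next = dev->res;
	dev->res = r;
	return 0;
}

/* Run on probe failure or removal: releases in reverse order. */
static void demo_devres_release_all(struct demo_device *dev)
{
	while (dev->res) {
		struct demo_devres *r = dev->res;

		dev->res = r->next;
		r->release(r->data);
		free(r);
	}
}

/* Cleanup action for the fake clock: drop one reference. */
static void demo_clk_put(void *data)
{
	int *held = data;

	assert(*held > 0);
	(*held)--;
}

/* Managed getter: take the clock and queue its put in one step. */
static int demo_devm_clk_get(struct demo_device *dev, int *held)
{
	(*held)++;
	if (demo_devm_add(dev, demo_clk_put, held)) {
		(*held)--;             /* undo the get if registration failed */
		return -1;
	}
	return 0;
}
```

With this shape, an error path after the getter can simply return, because
the release pass drops the clock on the caller's behalf.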

Re: [PATCH] waitqueue: fix clang -Wuninitialized warnings

2019-07-12 Thread Peter Zijlstra

On Tue, Jul 09, 2019 at 09:27:17PM +0200, Arnd Bergmann wrote:
> On Wed, Jul 3, 2019 at 7:58 PM Nathan Chancellor
>  wrote:
> > On Wed, Jul 03, 2019 at 10:10:55AM +0200, Arnd Bergmann wrote:
> > > When CONFIG_LOCKDEP is set, every use of DECLARE_WAIT_QUEUE_HEAD_ONSTACK()
> > > produces an annoying warning from clang, which is particularly annoying
> > > for allmodconfig builds:
> > >
> > > fs/namei.c:1646:34: error: variable 'wq' is uninitialized when used 
> > > within its own initialization [-Werror,-Wuninitialized]
> > > DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> > > ^~
> > > include/linux/wait.h:74:63: note: expanded from macro 
> > > 'DECLARE_WAIT_QUEUE_HEAD_ONSTACK'
> > > struct wait_queue_head name = __WAIT_QUEUE_HEAD_INIT_ONSTACK(name)
> > >  ^~~~
> > > include/linux/wait.h:72:33: note: expanded from macro 
> > > '__WAIT_QUEUE_HEAD_INIT_ONSTACK'
> > > ({ init_waitqueue_head(&name); name; })
> > >^~~~
> > >
> > > After playing with it for a while, I have found a way to rephrase the
> > > macro in a way that should work well with both gcc and clang and not
> > > produce this warning. The open-coded __WAIT_QUEUE_HEAD_INIT_ONSTACK
> > > is a little more verbose than the original version by Peter Zijlstra,
> > > but avoids the gcc-ism that suppresses warnings when assigning a
> > > variable to itself.
> > >
> > > Cc: Peter Zijlstra 
> > > Signed-off-by: Arnd Bergmann 
> >
> > Reviewed-by: Nathan Chancellor 
> > Tested-by: Nathan Chancellor 
> 
> Who would be the right person to pick this patch up for mainline?

That would be me; but like Andrew, I'm not a fan of this patch.
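
For readers without the macro in front of them: the warning fires because the
lockdep variant names the variable inside its own initializer and then reads
it back. A minimal userspace sketch of the contrast is below. It shows a plain
aggregate initializer that only takes the variable's address, which compilers
accept for list-head style structures. It is not the patch under discussion,
all demo_ names are made up, and it leaves out the runtime lock-class setup
that, as far as I understand, is the reason the lockdep variant calls
init_waitqueue_head() in the first place.

```c
#include <assert.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

struct demo_wait_queue_head {
	int lock;
	struct demo_list_head head;
};

/*
 * Aggregate initializer: uses the address of the variable being
 * declared, but never reads its (still indeterminate) value.
 */
#define DEMO_WAIT_QUEUE_HEAD_INITIALIZER(name) \
	{ .lock = 0, .head = { &(name).head, &(name).head } }

#define DEMO_DECLARE_WAIT_QUEUE_HEAD(name) \
	struct demo_wait_queue_head name = DEMO_WAIT_QUEUE_HEAD_INITIALIZER(name)

/* An empty queue is one whose list head points back at itself. */
static int demo_queue_is_empty(const struct demo_wait_queue_head *wq)
{
	return wq->head.next == &wq->head;
}

/* Hypothetical self-check: an on-stack queue starts out empty. */
static void demo_selfcheck(void)
{
	DEMO_DECLARE_WAIT_QUEUE_HEAD(wq);

	assert(demo_queue_is_empty(&wq));
	assert(wq.head.prev == &wq.head);
}
```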

[GIT PULL] Driver core patches for 5.3-rc1

2019-07-12 Thread Greg KH

The following changes since commit f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a:

  Linux 5.2-rc3 (2019-06-02 13:55:33 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 
tags/driver-core-5.3-rc1

for you to fetch changes up to c33d442328f556460b79aba6058adb37bb555389:

  debugfs: make error message a bit more verbose (2019-07-08 10:44:57 +0200)


Driver Core and debugfs changes for 5.3-rc1

Here are the "big" driver core and debugfs changes for 5.3-rc1.

It's a lot of different patches, all across the tree due to some api
changes and lots of debugfs cleanups.  Because of this, there are going
to be some merge issues with your tree at the moment; I'll follow up
with the expected resolutions to make it easier for you.

Other than the debugfs cleanups, in this set of changes we have:
- bus iteration function cleanups (will cause build warnings
  with s390 and coresight drivers in your tree)
- scripts/get_abi.pl tool to display and parse Documentation/ABI
  entries in a simple way
- cleanups to Documentation/ABI/ entries (typos and other minor
  things) to make them easier to parse
- default_attrs use for some ktype users
- driver model documentation file conversions to .rst
- compressed firmware file loading
- deferred probe fixes

All of these have been in linux-next for a while, with a bunch of merge
issues that Stephen has been patient with me for.  Other than the merge
issues, functionality is working properly in linux-next :)

Signed-off-by: Greg Kroah-Hartman 


Anders Roxell (1):
  mm/zsmalloc.c: remove unused variable

Arnd Bergmann (1):
  ARM: omap1: remove unused variable

Colin Ian King (1):
  lkdtm: remove redundant initialization of ret

Geert Uytterhoeven (2):
  tools/firmware: Add missing newline at end of file
  arch_topology: Remove error messages on out-of-memory conditions

Greg Kroah-Hartman (53):
  zswap: ignore debugfs_create_dir() return value
  trace: no need to check return value of debugfs_create functions
  blktrace: no need to check return value of debugfs_create functions
  zsmalloc: no need to check return value of debugfs_create functions
  mm: kmemleak: no need to check return value of debugfs_create functions
  hwpoison-inject: no need to check return value of debugfs_create functions
  sh: no need to check return value of debugfs_create functions
  fail_function: no need to check return value of debugfs_create functions
  kprobes: no need to check return value of debugfs_create functions
  mm: cleancache: no need to check return value of debugfs_create functions
  backing-dev: no need to check return value of debugfs_create functions
  x86: xen: no need to check return value of debugfs_create functions
  arm: omap1: no need to check return value of debugfs_create functions
  arm: omap2: no need to check return value of debugfs_create functions
  arm: dump: no need to check return value of debugfs_create functions
  x86: mm: no need to check return value of debugfs_create functions
  x86: platform: no need to check return value of debugfs_create functions
  x86: kdebugfs: no need to check return value of debugfs_create functions
  gcov: no need to check return value of debugfs_create functions
  mailbox: no need to check return value of debugfs_create functions
  btrfs: no need to check return value of debugfs_create functions
  debugfs: make debugfs_create_u32_array() return void
  vmw_balloon: no need to check return value of debugfs_create functions
  lkdtm: no need to check return value of debugfs_create functions
  ti-st: no need to check return value of debugfs_create functions
  thermal: intel: no need to check return value of debugfs_create functions
  thermal: intel_powerclamp: no need to check return value of 
debugfs_create functions
  thermal: tegra: no need to check return value of debugfs_create functions
  cxl: no need to check return value of debugfs_create functions
  lib: dynamic_debug: no need to check return value of debugfs_create 
functions
  fault-inject: clean up debugfs file creation logic
  mic: no need to check return value of debugfs_create functions
  genwq: no need to check return value of debugfs_create functions
  mei: no need to check return value of debugfs_create functions
  coresight: cpu-debug: no need to check return value of debugfs_create 
functions
  watchdog: mei_wdt: no need to check return value of debugfs_create 
functions
  watchdog: bcm_kona_wdt: no need to check return value of debugfs_create 
functions
  6lowpan: no need to check return value of debugfs_create functions
  power: avs: sma

Re: [GIT PULL] Driver core patches for 5.3-rc1

2019-07-12 Thread Greg KH

On Fri, Jul 12, 2019 at 09:36:23AM +0200, Greg KH wrote:
> The following changes since commit f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a:
> 
>   Linux 5.2-rc3 (2019-06-02 13:55:33 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 
> tags/driver-core-5.3-rc1
> 
> for you to fetch changes up to c33d442328f556460b79aba6058adb37bb555389:
> 
>   debugfs: make error message a bit more verbose (2019-07-08 10:44:57 +0200)
> 
> 
> Driver Core and debugfs changes for 5.3-rc1
> 
> Here is the "big" driver core and debugfs changes for 5.3-rc1
> 
> It's a lot of different patches, all across the tree due to some api
> changes and lots of debugfs cleanups.  Because of this, there is going
> to be some merge issues with your tree at the moment, I'll follow up
> with the expected resolutions to make it easier for you.

Here's the merge resolution patch that worked for me:


diff --cc drivers/acpi/sleep.c
index fcf4386ecc78,f0fe7c15d657..
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
diff --cc drivers/misc/mei/debugfs.c
index df6bf8b81936,47cfd5005e1b..
--- a/drivers/misc/mei/debugfs.c
+++ b/drivers/misc/mei/debugfs.c
@@@ -233,22 -154,46 +154,21 @@@ void mei_dbgfs_deregister(struct mei_de
   *
   * @dev: the mei device structure
   * @name: the mei device name
 - *
 - * Return: 0 on success, <0 on failure.
   */
 -int mei_dbgfs_register(struct mei_device *dev, const char *name)
 +void mei_dbgfs_register(struct mei_device *dev, const char *name)
  {
 -  struct dentry *dir, *f;
 +  struct dentry *dir;
  
dir = debugfs_create_dir(name, NULL);
 -  if (!dir)
 -  return -ENOMEM;
 -
dev->dbgfs_dir = dir;
  
 -  f = debugfs_create_file("meclients", S_IRUSR, dir,
 -  dev, &mei_dbgfs_meclients_fops);
 -  if (!f) {
 -  dev_err(dev->dev, "meclients: registration failed\n");
 -  goto err;
 -  }
 -  f = debugfs_create_file("active", S_IRUSR, dir,
 -  dev, &mei_dbgfs_active_fops);
 -  if (!f) {
 -  dev_err(dev->dev, "active: registration failed\n");
 -  goto err;
 -  }
 -  f = debugfs_create_file("devstate", S_IRUSR, dir,
 -  dev, &mei_dbgfs_devstate_fops);
 -  if (!f) {
 -  dev_err(dev->dev, "devstate: registration failed\n");
 -  goto err;
 -  }
 -  f = debugfs_create_file("allow_fixed_address", S_IRUSR | S_IWUSR, dir,
 -  &dev->allow_fixed_address,
 -  &mei_dbgfs_allow_fa_fops);
 -  if (!f) {
 -  dev_err(dev->dev, "allow_fixed_address: registration failed\n");
 -  goto err;
 -  }
 -  return 0;
 -err:
 -  mei_dbgfs_deregister(dev);
 -  return -ENODEV;
 +  debugfs_create_file("meclients", S_IRUSR, dir, dev,
-   &mei_dbgfs_fops_meclients);
++  &mei_dbgfs_meclients_fops);
 +  debugfs_create_file("active", S_IRUSR, dir, dev,
-   &mei_dbgfs_fops_active);
++  &mei_dbgfs_active_fops);
 +  debugfs_create_file("devstate", S_IRUSR, dir, dev,
-   &mei_dbgfs_fops_devstate);
++  &mei_dbgfs_devstate_fops);
 +  debugfs_create_file("allow_fixed_address", S_IRUSR | S_IWUSR, dir,
 +  &dev->allow_fixed_address,
-   &mei_dbgfs_fops_allow_fa);
++  &mei_dbgfs_allow_fa_fops);
  }
- 
diff --cc drivers/misc/vmw_balloon.c
index fdf5ad757226,043eed845246..
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@@ -1553,15 -1942,26 +1932,24 @@@ static int __init vmballoon_init(void
if (x86_hyper_type != X86_HYPER_VMWARE)
return -ENODEV;
  
-   for (page_size = VMW_BALLOON_4K_PAGE;
-page_size <= VMW_BALLOON_LAST_SIZE; page_size++)
-   INIT_LIST_HEAD(&balloon.page_sizes[page_size].pages);
- 
- 
INIT_DELAYED_WORK(&balloon.dwork, vmballoon_work);
  
+   error = vmballoon_register_shrinker(&balloon);
+   if (error)
+   goto fail;
+ 
 -  error = vmballoon_debugfs_init(&balloon);
 -  if (error)
 -  goto fail;
 +  vmballoon_debugfs_init(&balloon);
  
+   /*
+* Initialization of compaction must be done after the call to
+* balloon_devinfo_init() .
+*/
+   balloon_devinfo_init(&balloon.b_dev_info);
+   error = vmballoon_compaction_init(&balloon);
+   if (error)
+   goto fail;
+ 
+   INIT_LIST_HEAD(&balloon.huge_pages);
spin_lock_init(&balloon.comm_lock);
init_rwsem(&balloon.conf_sem);
balloon.vmci_doorbell = VMCI_INVALID_HANDLE;
* Unmerged pat

[PATCH RESEND 1/2] KVM: LAPIC: Add pv ipi tracepoint

2019-07-12 Thread Wanpeng Li

From: Wanpeng Li 

Add pv ipi tracepoint.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c |  2 ++
 arch/x86/kvm/trace.h | 25 +
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 42da7eb..403ae3f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -562,6 +562,8 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long 
ipi_bitmap_low,
irq.level = (icr & APIC_INT_ASSERT) != 0;
irq.trig_mode = icr & APIC_INT_LEVELTRIG;
 
+   trace_kvm_pv_send_ipi(irq.vector, min, ipi_bitmap_low, ipi_bitmap_high);
+
if (icr & APIC_DEST_MASK)
return -KVM_EINVAL;
if (icr & APIC_SHORT_MASK)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b5c831e..ce6ee34 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1462,6 +1462,31 @@ TRACE_EVENT(kvm_hv_send_ipi_ex,
  __entry->vector, __entry->format,
  __entry->valid_bank_mask)
 );
+
+/*
+ * Tracepoints for kvm_pv_send_ipi.
+ */
+TRACE_EVENT(kvm_pv_send_ipi,
+   TP_PROTO(u32 vector, u32 min, unsigned long ipi_bitmap_low, unsigned 
long ipi_bitmap_high),
+   TP_ARGS(vector, min, ipi_bitmap_low, ipi_bitmap_high),
+
+   TP_STRUCT__entry(
+   __field(u32, vector)
+   __field(u32, min)
+   __field(unsigned long, ipi_bitmap_low)
+   __field(unsigned long, ipi_bitmap_high)
+   ),
+
+   TP_fast_assign(
+   __entry->vector = vector;
+   __entry->min = min;
+   __entry->ipi_bitmap_low = ipi_bitmap_low;
+   __entry->ipi_bitmap_high = ipi_bitmap_high;
+   ),
+
+   TP_printk("vector %d min 0x%x ipi_bitmap_low 0x%lx ipi_bitmap_high 
0x%lx",
+ __entry->vector, __entry->min, __entry->ipi_bitmap_low, 
__entry->ipi_bitmap_high)
+);
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.7.4

[PATCH RESEND 2/2] KVM: X86: Add pv tlb shootdown tracepoint

2019-07-12 Thread Wanpeng Li

From: Wanpeng Li 

Add pv tlb shootdown tracepoint.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/trace.h | 19 +++
 arch/x86/kvm/x86.c   |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index ce6ee34..84f32d3 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1487,6 +1487,25 @@ TRACE_EVENT(kvm_pv_send_ipi,
TP_printk("vector %d min 0x%x ipi_bitmap_low 0x%lx ipi_bitmap_high 
0x%lx",
  __entry->vector, __entry->min, __entry->ipi_bitmap_low, 
__entry->ipi_bitmap_high)
 );
+
+TRACE_EVENT(kvm_pv_tlb_flush,
+   TP_PROTO(unsigned int vcpu_id, bool need_flush_tlb),
+   TP_ARGS(vcpu_id, need_flush_tlb),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   vcpu_id )
+   __field(bool,   need_flush_tlb  )
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id= vcpu_id;
+   __entry->need_flush_tlb = need_flush_tlb;
+   ),
+
+   TP_printk("vcpu %u need_flush_tlb %s", __entry->vcpu_id,
+   __entry->need_flush_tlb ? "true" : "false")
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c32311..f487c9a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2458,6 +2458,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 * Doing a TLB flush here, on the guest's behalf, can avoid
 * expensive IPIs.
 */
+   trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
+   vcpu->arch.st.steal.preempted & KVM_VCPU_FLUSH_TLB);
if (xchg(&vcpu->arch.st.steal.preempted, 0) & KVM_VCPU_FLUSH_TLB)
kvm_vcpu_flush_tlb(vcpu, false);
 
-- 
2.7.4

Re: [GIT PULL] Driver core patches for 5.3-rc1

2019-07-12 Thread Greg KH

On Fri, Jul 12, 2019 at 09:36:23AM +0200, Greg KH wrote:
> The following changes since commit f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a:
> 
>   Linux 5.2-rc3 (2019-06-02 13:55:33 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git tags/driver-core-5.3-rc1
> 
> for you to fetch changes up to c33d442328f556460b79aba6058adb37bb555389:
> 
>   debugfs: make error message a bit more verbose (2019-07-08 10:44:57 +0200)
> 
> 
> Driver Core and debugfs changes for 5.3-rc1
> 
> Here is the "big" driver core and debugfs changes for 5.3-rc1
> 
> It's a lot of different patches, all across the tree due to some api
> changes and lots of debugfs cleanups.  Because of this, there is going
> to be some merge issues with your tree at the moment, I'll follow up
> with the expected resolutions to make it easier for you.
> 
> Other than the debugfs cleanups, in this set of changes we have:
>   - bus iteration function cleanups (will cause build warnings
> with s390 and coresight drivers in your tree)

Here's the s390 patch that was sent previously to resolve this issue.



From: Christian Borntraeger 

commit 92ce7e83b4e5 ("driver_find_device: Unify the match function with
class_find_device()") changed the prototype of driver_find_device to
use a const void pointer.

Change match_apqn accordingly.

Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel")
Signed-off-by: Christian Borntraeger 
Signed-off-by: Vasily Gorbik 
---
 drivers/s390/crypto/vfio_ap_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 2c9fb1423a39..7e85ba7c6ef0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -26,7 +26,7 @@
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 
-static int match_apqn(struct device *dev, void *data)
+static int match_apqn(struct device *dev, const void *data)
 {
struct vfio_ap_queue *q = dev_get_drvdata(dev);
 
-- 
2.21.0
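
For readers who want to see why the one-word change above matters, here is a minimal user-space sketch of a finder that takes a match callback with a const void pointer. The names device_mock, match_fn, find_device and match_id are made up for illustration and are not the driver core API; the only point is that a callback declared with a plain `void *data` parameter would no longer be type-compatible with such a prototype.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct device_mock {
	int id;
};

/* Match callback type: the lookup key is passed as const void *. */
typedef int (*match_fn)(struct device_mock *dev, const void *data);

/* Return the first device for which match() is non-zero, or NULL. */
static struct device_mock *find_device(struct device_mock *devs, size_t n,
					const void *data, match_fn match)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (match(&devs[i], data))
			return &devs[i];
	return NULL;
}

/* The callback must take const void * as well, or the pointer types differ. */
static int match_id(struct device_mock *dev, const void *data)
{
	const int *wanted = data;

	return dev->id == *wanted;
}
```

Calling find_device(devs, n, &key, match_id) with an int key returns the matching entry or NULL; the key is never written through the const pointer.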

Re: [GIT PULL] Driver core patches for 5.3-rc1

2019-07-12 Thread Greg KH

On Fri, Jul 12, 2019 at 09:36:23AM +0200, Greg KH wrote:
> The following changes since commit f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a:
> 
>   Linux 5.2-rc3 (2019-06-02 13:55:33 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git tags/driver-core-5.3-rc1
> 
> for you to fetch changes up to c33d442328f556460b79aba6058adb37bb555389:
> 
>   debugfs: make error message a bit more verbose (2019-07-08 10:44:57 +0200)
> 
> 
> Driver Core and debugfs changes for 5.3-rc1
> 
> Here is the "big" driver core and debugfs changes for 5.3-rc1
> 
> It's a lot of different patches, all across the tree due to some api
> changes and lots of debugfs cleanups.  Because of this, there is going
> to be some merge issues with your tree at the moment, I'll follow up
> with the expected resolutions to make it easier for you.
> 
> Other than the debugfs cleanups, in this set of changes we have:
>   - bus iteration function cleanups (will cause build warnings
> with s390 and coresight drivers in your tree)

And here is the patch that should resolve the coresight build issue.

From: Nathan Chancellor 
Date: Mon, 1 Jul 2019 11:28:08 -0700
Subject: [PATCH] coresight: Make the coresight_device_fwnode_match declaration's fwnode parameter const

drivers/hwtracing/coresight/coresight.c:1051:11: error: incompatible pointer types passing 'int (struct device *, void *)' to parameter of type 'int (*)(struct device *, const void *)' [-Werror,-Wincompatible-pointer-types]
  coresight_device_fwnode_match);
  ^
include/linux/device.h:173:17: note: passing argument to parameter 'match' here
   int (*match)(struct device *dev, const void *data));
 ^
1 error generated.

Signed-off-by: Nathan Chancellor 
---
 drivers/hwtracing/coresight/coresight-priv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index 8b07fe55395a..7d401790dd7e 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -202,6 +202,6 @@ static inline void *coresight_get_uci_data(const struct amba_id *id)
 
 void coresight_release_platform_data(struct coresight_platform_data *pdata);
 
-int coresight_device_fwnode_match(struct device *dev, void *fwnode);
+int coresight_device_fwnode_match(struct device *dev, const void *fwnode);
 
 #endif
-- 
2.22.0

-- 
Cheers,
Stephen Rothwell

Re: [PATCH] xen/pv: Fix a boot up hang triggered by int3 self test

2019-07-12 Thread Zhenzhong Duan


Sorry for the noise, it looks like the description is wrong.

This is not a double pop, but xen pv taking the path with create_gap=0.
I'll send a v2.

Zhenzhong

On 2019/7/11 12:47, Zhenzhong Duan wrote:

Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
selftest") reveals a bug in XEN PV int3 assembly code. There is
a double pop of register R11 and RCX corrupting the exception
frame, one in xen_int3 and the other in xen_xenint3.

We see below hang at bootup:

general protection fault:  [#1] SMP NOPTI
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #6
RIP: e030:int3_magic+0x0/0x7
Call Trace:
  alternative_instructions+0x3d/0x12e
  check_bugs+0x7c9/0x887
  ?__get_locked_pte+0x178/0x1f0
  start_kernel+0x4ff/0x535
  ?set_init_arg+0x55/0x55
  xen_start_kernel+0x571/0x57a

Fix it by removing xen_xenint3.

Signed-off-by: Zhenzhong Duan 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
---
  arch/x86/include/asm/traps.h | 2 +-
  arch/x86/xen/enlighten_pv.c  | 2 +-
  arch/x86/xen/xen-asm_64.S| 1 -
  3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 7d6f3f3..f2bd284 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -40,7 +40,7 @@
  asmlinkage void xen_divide_error(void);
  asmlinkage void xen_xennmi(void);
  asmlinkage void xen_xendebug(void);
-asmlinkage void xen_xenint3(void);
+asmlinkage void xen_int3(void);
  asmlinkage void xen_overflow(void);
  asmlinkage void xen_bounds(void);
  asmlinkage void xen_invalid_op(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4722ba2..2138d69 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -596,7 +596,7 @@ struct trap_array_entry {
  
  static struct trap_array_entry trap_array[] = {

{ debug,   xen_xendebug,true },
-   { int3,xen_xenint3, true },
+   { int3,xen_int3,true },
{ double_fault,xen_double_fault,true },
  #ifdef CONFIG_X86_MCE
{ machine_check,   xen_machine_check,   true },
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 1e9ef0b..ebf610b 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -32,7 +32,6 @@ xen_pv_trap divide_error
  xen_pv_trap debug
  xen_pv_trap xendebug
  xen_pv_trap int3
-xen_pv_trap xenint3
  xen_pv_trap xennmi
  xen_pv_trap overflow
  xen_pv_trap bounds

Re: [RFC v2 01/26] mm/x86: Introduce kernel address space isolation

2019-07-12 Thread Alexandre Chartre




On 7/11/19 11:33 PM, Thomas Gleixner wrote:

On Thu, 11 Jul 2019, Alexandre Chartre wrote:

+/*
+ * When isolation is active, the address space doesn't necessarily map
+ * the percpu offset value (this_cpu_off) which is used to get pointers
+ * to percpu variables. So functions which can be invoked while isolation
+ * is active shouldn't be getting pointers to percpu variables (i.e. with
+ * get_cpu_var() or this_cpu_ptr()). Instead percpu variable should be
+ * directly read or written to (i.e. with this_cpu_read() or
+ * this_cpu_write()).
+ */
+
+int asi_enter(struct asi *asi)
+{
+   enum asi_session_state state;
+   struct asi *current_asi;
+   struct asi_session *asi_session;
+
+   state = this_cpu_read(cpu_asi_session.state);
+   /*
+* We can re-enter isolation, but only with the same ASI (we don't
+* support nesting isolation). Also, if isolation is still active,
+* then we should be re-entering with the same task.
+*/
+   if (state == ASI_SESSION_STATE_ACTIVE) {
+   current_asi = this_cpu_read(cpu_asi_session.asi);
+   if (current_asi != asi) {
+   WARN_ON(1);
+   return -EBUSY;
+   }
+   WARN_ON(this_cpu_read(cpu_asi_session.task) != current);
+   return 0;
+   }
+
+   /* isolation is not active so we can safely access the percpu pointer */
+   asi_session = &get_cpu_var(cpu_asi_session);


get_cpu_var()?? Where is the matching put_cpu_var() ? get_cpu_var()
contains a preempt_disable ...

What's wrong with a simple this_cpu_ptr() here?



Oops, my mistake, I should be using this_cpu_ptr(). I will replace all
get_cpu_var() with this_cpu_ptr().
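
To make the review comment concrete, here is a small user-space mock of the difference between the two accessors. Everything in it (asi_session_mock, preempt_depth, the mock_* helpers) is invented for illustration and only models the one property mentioned above: the get_cpu_var() style accessor disables preemption and therefore needs a matching put, while the this_cpu_ptr() style accessor does not touch the preemption state.

```c
/* Hypothetical stand-in for the per-cpu ASI session. */
struct asi_session_mock {
	int state;
};

static struct asi_session_mock cpu_session;	/* models one CPU's copy */
static int preempt_depth;			/* models the preempt count */

/* Models get_cpu_var(): disable preemption, then hand out the pointer. */
static struct asi_session_mock *mock_get_cpu_var(void)
{
	preempt_depth++;
	return &cpu_session;
}

/* Models put_cpu_var(): re-enable preemption. */
static void mock_put_cpu_var(void)
{
	preempt_depth--;
}

/* Models this_cpu_ptr(): plain pointer, preemption state untouched. */
static struct asi_session_mock *mock_this_cpu_ptr(void)
{
	return &cpu_session;
}

/* Unbalanced use, as in the reviewed code: no matching put. */
static void enter_unbalanced(void)
{
	mock_get_cpu_var()->state = 1;
}

/* The suggested replacement: nothing to balance afterwards. */
static void enter_with_ptr(void)
{
	mock_this_cpu_ptr()->state = 1;
}
```

After enter_unbalanced() the mock preempt count stays raised until someone calls mock_put_cpu_var(); after enter_with_ptr() it is unchanged.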



+void asi_exit(struct asi *asi)
+{
+   struct asi_session *asi_session;
+   enum asi_session_state asi_state;
+   unsigned long original_cr3;
+
+   asi_state = this_cpu_read(cpu_asi_session.state);
+   if (asi_state == ASI_SESSION_STATE_INACTIVE)
+   return;
+
+   /* TODO: Kick sibling hyperthread before switching to kernel cr3 */
+   original_cr3 = this_cpu_read(cpu_asi_session.original_cr3);
+   if (original_cr3)


Why would this be 0 if the session is active?



Correct, original_cr3 won't be 0. I think this is a remnant of a previous
version where original_cr3 was handled differently.



+   write_cr3(original_cr3);
+
+   /* page-table was switched, we can now access the percpu pointer */
+   asi_session = &get_cpu_var(cpu_asi_session);


See above.



Will fix that.


Thanks,

alex.


+   WARN_ON(asi_session->task != current);
+   asi_session->state = ASI_SESSION_STATE_INACTIVE;
+   asi_session->asi = NULL;
+   asi_session->task = NULL;
+   asi_session->original_cr3 = 0;
+}


Thanks,

tglx

Re: [PATCH] waitqueue: fix clang -Wuninitialized warnings

2019-07-12 Thread Arnd Bergmann

On Fri, Jul 12, 2019 at 2:49 AM Andrew Morton  wrote:
> On Wed,  3 Jul 2019 10:10:55 +0200 Arnd Bergmann  wrote:

> 
>
> Surely clang is being extraordinarily dumb here?
>
> DECLARE_WAIT_QUEUE_HEAD_ONSTACK() is effectively doing
>
> struct wait_queue_head name = ({ __init_waitqueue_head(&name) ; name; })
>
> which is perfectly legitimate!  clang has no business assuming that
> __init_waitqueue_head() will do any reads from the pointer which it was
> passed, nor can clang assume that __init_waitqueue_head() leaves any of
> *name uninitialized.
>
> Does it also warn if code does this?
>
> struct wait_queue_head name;
> __init_waitqueue_head(&name);
> name = name;
>
> which is equivalent, isn't it?

No, it does not warn for this.

I've tried a few more variants here: https://godbolt.org/z/ykSX0r

What I think is going on here is a result of clang and gcc fundamentally
treating -Wuninitialized warnings differently. gcc tries to make the warnings
as helpful as possible, but given the NP-complete nature of this problem
it won't always get it right, and it traditionally allowed this syntax as a
workaround.

int f(void)
{
int i = i; // tell gcc not to warn
return i;
}

clang apparently implements the warnings in a way that is
completely predictable (and won't warn in cases that it
doesn't completely understand), but decided as a result that the
gcc 'int i = i' syntax is bogus and it always warns about a variable
used in its own declaration that is later referenced, without looking
at whether the declaration does initialize it or not.

> The proposed solution is, effectively, to open-code
> __init_waitqueue_head() at each DECLARE_WAIT_QUEUE_HEAD_ONSTACK()
> callsite.  That's pretty unpleasant and calls for an explanatory
> comment at the __WAIT_QUEUE_HEAD_INIT_ONSTACK() definition site as well
> as a cautionary comment at the __init_waitqueue_head() definition so we
> can keep the two versions in sync as code evolves.

Yes, makes sense.
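
As a rough user-space illustration of what "open-coding" the initializer means, the sketch below contrasts a runtime init function with a static initializer macro. The types and names (list_node, wq_head, init_wq_head, WQ_HEAD_INIT_ONSTACK, DECLARE_WQ_HEAD_ONSTACK) are simplified stand-ins, not the kernel definitions. The initializer only takes the address of the variable being declared and never reads its value.

```c
/* Simplified stand-ins for the kernel's list head and wait queue head. */
struct list_node {
	struct list_node *next, *prev;
};

struct wq_head {
	int lock;
	struct list_node head;
};

/* Runtime initializer: empty list points back at itself. */
static void init_wq_head(struct wq_head *wq)
{
	wq->lock = 0;
	wq->head.next = &wq->head;
	wq->head.prev = &wq->head;
}

/*
 * Open-coded equivalent as an initializer. It has to be kept in sync with
 * init_wq_head() by hand, which is the maintenance cost raised above.
 */
#define WQ_HEAD_INIT_ONSTACK(name) \
	{ .lock = 0, .head = { &(name).head, &(name).head } }

#define DECLARE_WQ_HEAD_ONSTACK(name) \
	struct wq_head name = WQ_HEAD_INIT_ONSTACK(name)
```

A variable declared with DECLARE_WQ_HEAD_ONSTACK(wq) ends up in the same state as one that was declared plainly and then passed to init_wq_head().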

> Hopefully clang will soon be hit with the cluebat (yes?) and this
> change becomes obsolete in the quite short term.  Surely 6-12 months
> from now nobody will be using the uncluebatted version of clang on
> contemporary kernel sources so we get to remove this nastiness again.
> Which makes me wonder whether we should merge it at all.

Would it make you feel better to keep the current code but have an alternative
version guarded with e.g. "#if defined(__clang__) && (__clang_major__ <= 9)"?

While it is probably a good idea to fix clang here, this is one of the last
issues that causes a significant difference between gcc and clang in build
testing with kernelci:
https://kernelci.org/build/next/branch/master/kernel/next-20190709/
I'm trying to get all the warnings fixed there so we can spot build-time
regressions more easily.

  Arnd

Re: linux-next: manual merge of the char-misc tree with the driver-core tree

2019-07-12 Thread Greg KH

On Tue, Jul 09, 2019 at 09:20:03AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> On Thu, 13 Jun 2019 15:53:44 +1000 Stephen Rothwell wrote:
> >
> > Today's linux-next merge of the char-misc tree got a conflict in:
> > 
> >   drivers/misc/vmw_balloon.c
> > 
> > between commit:
> > 
> >   225afca60b8a ("vmw_balloon: no need to check return value of 
> > debugfs_create functions")
> > 
> > from the driver-core tree and commits:
> > 
> >   83a8afa72e9c ("vmw_balloon: Compaction support")
> >   5d1a86ecf328 ("vmw_balloon: Add memory shrinker")
> > 
> > from the char-misc tree.
> > 
> > I fixed it up (see below) and can carry the fix as necessary. This
> > is now fixed as far as linux-next is concerned, but any non trivial
> > conflicts should be mentioned to your upstream maintainer when your tree
> > is submitted for merging.  You may also want to consider cooperating
> > with the maintainer of the conflicting tree to minimise any particularly
> > complex conflicts.
> > 
> > -- 
> > Cheers,
> > Stephen Rothwell
> > 
> > diff --cc drivers/misc/vmw_balloon.c
> > index fdf5ad757226,043eed845246..
> > --- a/drivers/misc/vmw_balloon.c
> > +++ b/drivers/misc/vmw_balloon.c
> > @@@ -1553,15 -1942,26 +1932,24 @@@ static int __init vmballoon_init(void
> > if (x86_hyper_type != X86_HYPER_VMWARE)
> > return -ENODEV;
> >   
> > -   for (page_size = VMW_BALLOON_4K_PAGE;
> > -page_size <= VMW_BALLOON_LAST_SIZE; page_size++)
> > -   INIT_LIST_HEAD(&balloon.page_sizes[page_size].pages);
> > - 
> > - 
> > INIT_DELAYED_WORK(&balloon.dwork, vmballoon_work);
> >   
> > +   error = vmballoon_register_shrinker(&balloon);
> > +   if (error)
> > +   goto fail;
> > + 
> >  -  error = vmballoon_debugfs_init(&balloon);
> >  -  if (error)
> >  -  goto fail;
> >  +  vmballoon_debugfs_init(&balloon);
> >   
> > +   /*
> > +* Initialization of compaction must be done after the call to
> > +* balloon_devinfo_init() .
> > +*/
> > +   balloon_devinfo_init(&balloon.b_dev_info);
> > +   error = vmballoon_compaction_init(&balloon);
> > +   if (error)
> > +   goto fail;
> > + 
> > +   INIT_LIST_HEAD(&balloon.huge_pages);
> > spin_lock_init(&balloon.comm_lock);
> > init_rwsem(&balloon.conf_sem);
> > balloon.vmci_doorbell = VMCI_INVALID_HANDLE;
> 
> I am still getting this conflict (the commit ids may have changed).
> Just a reminder in case you think Linus may need to know.

Ok, I sent off the pull request for the driver core tree now.  I had all
of my other trees merged "first" so that all of the conflicts would
happen just once here.  Hopefully I've pointed out all of the potential
and real problems with this merge.

Ugh, this was a messy one, sorry about all of this, full-tree api
changes and cleanups are a pain at times.

thanks,

greg k-h

Re: [RFC v2 02/26] mm/asi: Abort isolation on interrupt, exception and context switch

2019-07-12 Thread Alexandre Chartre





On 7/12/19 2:05 AM, Andy Lutomirski wrote:



On Jul 11, 2019, at 8:25 AM, Alexandre Chartre wrote:

Address space isolation should be aborted if there is an interrupt,
an exception or a context switch. Interrupt/exception handlers and
context switch code need to run with the full kernel address space.
Address space isolation is aborted by restoring the original CR3
value used before entering address space isolation.



NAK to the entry changes. That code you’re changing is already known
to be a bit buggy, and it’s spaghetti. PeterZ and I are gradually
working on fixing some bugs and C-ifying it. ASI can go on top.



Agreed, this is spaghetti, and I will be happy to move ASI on top. I will keep
an eye out for your changes, and I will change the ASI code accordingly.

Thanks,

alex.

Re: objtool crashes on clang output (drivers/hwmon/pmbus/adm1275.o)

2019-07-12 Thread Arnd Bergmann

On Thu, Jul 11, 2019 at 11:29 PM Arnd Bergmann  wrote:
>
> On Thu, Jul 11, 2019 at 11:05 PM 'Jann Horn' via Clang Built Linux
>  wrote:
> > I was playing around with building the kernel with LLVM a few months
> > ago and used this local patch, but didn't get around to submitting
> > upstream because I couldn't reproduce the problem for some reason. I
> > think the warnings you're getting sound like what I saw back then:
> > https://gist.github.com/thejh/0434662728afb95d72455bf30ece5817
> >
> > Quoting the commit message from that patch:
> >
> > 
> > With clang from git master, code can be generated where a function contains
> > two indirect jump instructions that use the same switch table. To deal with
> > this case and similar ones properly, convert the switch table parsing to
> > use two passes:
> > 
> >
> > Does that sound like what you're seeing?
>
> Yes, that is exactly right, and your patch seems to address the problem
> for the cases I tried so far (will know more after a night of randconfig
> testing).

I no longer see any of the "can't find switch jump table" warnings in last
night's randconfig builds. I do see one other rare warning, see attached object file:

fs/reiserfs/do_balan.o: warning: objtool: replace_key()+0x158: stack state mismatch: cfa1=7+40 cfa2=7+56
fs/reiserfs/do_balan.o: warning: objtool: balance_leaf()+0x2791: stack state mismatch: cfa1=7+176 cfa2=7+192
fs/reiserfs/ibalance.o: warning: objtool: balance_internal()+0xe8f: stack state mismatch: cfa1=7+240 cfa2=7+248
fs/reiserfs/ibalance.o: warning: objtool: internal_move_pointers_items()+0x36f: stack state mismatch: cfa1=7+152 cfa2=7+144
fs/reiserfs/lbalance.o: warning: objtool: leaf_cut_from_buffer()+0x58b: stack state mismatch: cfa1=7+128 cfa2=7+112
fs/reiserfs/lbalance.o: warning: objtool: leaf_copy_boundary_item()+0x7a9: stack state mismatch: cfa1=7+104 cfa2=7+96
fs/reiserfs/lbalance.o: warning: objtool: leaf_copy_items_entirely()+0x3d2: stack state mismatch: cfa1=7+120 cfa2=7+128

I suspect this comes from the calls to the __reiserfs_panic() noreturn function,
but have not actually looked at the object file.

  Arnd


lbalance.o
Description: application/object

Re: [PATCH 4/4] numa: introduce numa cling feature

2019-07-12 Thread Peter Zijlstra

On Fri, Jul 12, 2019 at 11:10:08AM +0800, 王贇 wrote:
> On 2019/7/11 下午10:27, Peter Zijlstra wrote:

> >> Thus we introduce the numa cling, which try to prevent tasks leaving
> >> the preferred node on wakeup fast path.
> > 
> > 
> >> @@ -6195,6 +6447,13 @@ static int select_idle_sibling(struct task_struct 
> >> *p, int prev, int target)
> >>if ((unsigned)i < nr_cpumask_bits)
> >>return i;
> >>
> >> +  /*
> >> +   * Failed to find an idle cpu, wake affine may want to pull but
> >> +   * try stay on prev-cpu when the task cling to it.
> >> +   */
> >> +  if (task_numa_cling(p, cpu_to_node(prev), cpu_to_node(target)))
> >> +  return prev;
> >> +
> >>return target;
> >>  }
> > 
> > Select idle sibling should never cross node boundaries and is thus the
> > entirely wrong place to fix anything.
> 
> Hmm.. in our early testing the printk showed that both select_task_rq_fair() and
> task_numa_find_cpu() call select_idle_sibling() with prev and target on
> different nodes, so we picked this point to save a few lines.

But it will never return @prev if it is not in the same cache domain as
@target. See how everything is gated by:

  && cpus_share_cache(x, target)

> But if the semantics of select_idle_sibling() is to return a cpu on the same
> node as target, what about moving the logic after select_idle_sibling() for
> the two callers?

No, that's insane. You don't do select_idle_sibling() to then ignore the
result. You have to change @target before calling select_idle_sibling().
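
A toy user-space model of that point, under loudly stated assumptions: the topology table, cpu_idle[], shares_cache(), pick_idle_sibling() and select_cpu() are all hypothetical and far simpler than the scheduler code. It only shows the shape of the argument: the search never leaves the cache domain of its target, so a preference for the previous CPU has to be expressed by choosing the target before the search, not by overriding the result afterwards.

```c
enum { NR_CPUS_MOCK = 8 };

/* Hypothetical topology: CPUs 0-3 share one cache, CPUs 4-7 another. */
static const int llc_id[NR_CPUS_MOCK] = { 0, 0, 0, 0, 1, 1, 1, 1 };
static int cpu_idle[NR_CPUS_MOCK];

static int shares_cache(int a, int b)
{
	return llc_id[a] == llc_id[b];
}

/* Every candidate is gated on sharing a cache with target. */
static int pick_idle_sibling(int prev, int target)
{
	int cpu;

	if (cpu_idle[target])
		return target;
	if (prev != target && shares_cache(prev, target) && cpu_idle[prev])
		return prev;
	for (cpu = 0; cpu < NR_CPUS_MOCK; cpu++)
		if (shares_cache(cpu, target) && cpu_idle[cpu])
			return cpu;
	return target;
}

/* A caller-side preference is applied by adjusting target first. */
static int select_cpu(int prev, int target, int cling_to_prev)
{
	if (cling_to_prev)
		target = prev;
	return pick_idle_sibling(prev, target);
}
```

With only CPU 5 idle, select_cpu(1, 4, 0) stays in the cache domain of CPU 4, while select_cpu(1, 4, 1) stays in the cache domain of CPU 1.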

[PATCH] perf diff: Report noisy for cycles diff

2019-07-12 Thread Jin Yao

This patch prints the stddev and hist for the cycles diff of
program block. It can help us to understand if the cycles diff
is noisy or not.

This patch is inspired by Andi Kleen's patch
https://lwn.net/Articles/600471/

We create a new option, '-n' or '--noisy'.

Example:

perf record -b ./div
perf record -b ./div
perf diff -c cycles

 # Event 'cycles'
 #
 # Baseline  [Program Block Range]               Cycles Diff  Shared Object  Symbol
 # ........  ..................................  ...........  .............  ................
 #
     46.42%  [div.c:40 -> div.c:40]                        0  div            [.] main
     46.42%  [div.c:42 -> div.c:44]                        0  div            [.] main
     46.42%  [div.c:42 -> div.c:39]                        0  div            [.] main
     20.72%  [random_r.c:357 -> random_r.c:394]           -2  libc-2.27.so   [.] __random_r
     20.72%  [random_r.c:357 -> random_r.c:380]           -1  libc-2.27.so   [.] __random_r
     20.72%  [random_r.c:388 -> random_r.c:388]            0  libc-2.27.so   [.] __random_r
     20.72%  [random_r.c:388 -> random_r.c:391]            0  libc-2.27.so   [.] __random_r
     17.58%  [random.c:288 -> random.c:291]                0  libc-2.27.so   [.] __random
     17.58%  [random.c:291 -> random.c:291]                0  libc-2.27.so   [.] __random
     17.58%  [random.c:293 -> random.c:293]                0  libc-2.27.so   [.] __random
     17.58%  [random.c:295 -> random.c:295]                0  libc-2.27.so   [.] __random
     17.58%  [random.c:295 -> random.c:295]                0  libc-2.27.so   [.] __random
     17.58%  [random.c:298 -> random.c:298]                0  libc-2.27.so   [.] __random
      8.33%  [div.c:22 -> div.c:25]                        0  div            [.] compute_flag
      8.33%  [div.c:27 -> div.c:28]                        0  div            [.] compute_flag
      4.80%  [rand.c:26 -> rand.c:27]                      0  libc-2.27.so   [.] rand
      4.80%  [rand.c:28 -> rand.c:28]                      0  libc-2.27.so   [.] rand
      2.14%  [rand@plt+0 -> rand@plt+0]                    0  div            [.] rand@plt

When we enable the option '-n', the output is

perf diff -c cycles -n

 # Event 'cycles'
 #
 # Baseline  [Program Block Range]/Cycles Diff/stddev/Hist             Shared Object  Symbol
 # ........  ........................................................  .............  ................
 #
     46.42%  [div.c:40 -> div.c:40]               0  ± 40.2% ▂███▁▂▁▁  div            [.] main
     46.42%  [div.c:42 -> div.c:44]               0  ±100.0% █▁▁▁      div            [.] main
     46.42%  [div.c:42 -> div.c:39]               0  ± 15.3% ▃▃▂▆▃▂█▁  div            [.] main
     20.72%  [random_r.c:357 -> random_r.c:394]  -2  ± 20.1% ▁▄▄▅▂▅█▁  libc-2.27.so   [.] __random_r
     20.72%  [random_r.c:357 -> random_r.c:380]  -1  ± 20.9% ▁▆▇▁█▅▇█  libc-2.27.so   [.] __random_r
     20.72%  [random_r.c:388 -> random_r.c:388]   0  ±  0.0%           libc-2.27.so   [.] __random_r
     20.72%  [random_r.c:388 -> random_r.c:391]   0  ± 88.0% ▁▁▁█      libc-2.27.so   [.] __random_r
     17.58%  [random.c:288 -> random.c:291]       0  ± 29.3% ▁▁█▁      libc-2.27.so   [.] __random
     17.58%  [random.c:291 -> random.c:291]       0  ± 29.3% ▁▁▁█      libc-2.27.so   [.] __random
     17.58%  [random.c:293 -> random.c:293]       0  ± 29.3% ▁▁▁█      libc-2.27.so   [.] __random
     17.58%  [random.c:295 -> random.c:295]       0  ±  0.0%           libc-2.27.so   [.] __random
     17.58%  [random.c:295 -> random.c:295]       0  ±  0.0%           libc-2.27.so   [.] __random
     17.58%  [random.c:298 -> random.c:298]       0  ±  0.0%           libc-2.27.so   [.] __random
      8.33%  [div.c:22 -> div.c:25]               0  ± 29.3% ▁▁█▁      div            [.] compute_flag
      8.33%  [div.c:27 -> div.c:28]               0  ± 48.8% ▁██▁▁▁█▁  div

Re: [PATCH] waitqueue: fix clang -Wuninitialized warnings

2019-07-12 Thread Nathan Chancellor

On Fri, Jul 12, 2019 at 09:45:06AM +0200, Arnd Bergmann wrote:
> On Fri, Jul 12, 2019 at 2:49 AM Andrew Morton wrote:
> > On Wed,  3 Jul 2019 10:10:55 +0200 Arnd Bergmann  wrote:
> 
> > 
> >
> > Surely clang is being extraordinarily dumb here?
> >
> > DECLARE_WAIT_QUEUE_HEAD_ONSTACK() is effectively doing
> >
> > struct wait_queue_head name = ({ __init_waitqueue_head(&name) ; name; })
> >
> > which is perfectly legitimate!  clang has no business assuming that
> > __init_waitqueue_head() will do any reads from the pointer which it was
> > passed, nor can clang assume that __init_waitqueue_head() leaves any of
> > *name uninitialized.
> >
> > Does it also warn if code does this?
> >
> > struct wait_queue_head name;
> > __init_waitqueue_head(&name);
> > name = name;
> >
> > which is equivalent, isn't it?
> 
> No, it does not warn for this.
> 
> I've tried a few more variants here: https://godbolt.org/z/ykSX0r
> 
> What I think is going on here is a result of clang and gcc fundamentally
> treating -Wuninitialized warnings differently. gcc tries to make the warnings
> as helpful as possible, but given the NP-complete nature of this problem
> it won't always get it right, and it traditionally allowed this syntax as a
> workaround.
> 
> int f(void)
> {
> int i = i; // tell gcc not to warn
> return i;
> }
> 
> clang apparently implements the warnings in a way that is
> completely predictable (and won't warn in cases that it
> doesn't completely understand), but decided as a result that the
> gcc 'int i = i' syntax is bogus and it always warns about a variable
> used in its own declaration that is later referenced, without looking
> at whether the declaration does initialize it or not.
> 
> > The proposed solution is, effectively, to open-code
> > __init_waitqueue_head() at each DECLARE_WAIT_QUEUE_HEAD_ONSTACK()
> > callsite.  That's pretty unpleasant and calls for an explanatory
> > comment at the __WAIT_QUEUE_HEAD_INIT_ONSTACK() definition site as well
> > as a cautionary comment at the __init_waitqueue_head() definition so we
> > can keep the two versions in sync as code evolves.
> 
> Yes, makes sense.
> 
> > Hopefully clang will soon be hit with the cluebat (yes?) and this
> > change becomes obsolete in the quite short term.  Surely 6-12 months
> > from now nobody will be using the uncluebatted version of clang on
> > contemporary kernel sources so we get to remove this nastiness again.
> > Which makes me wonder whether we should merge it at all.
> 
> Would it make you feel better to keep the current code but have an alternative
> version guarded with e.g. "#if defined(__clang__) && (__clang_major__ <= 9)"?
> 
> While it is probably a good idea to fix clang here, this is one of the last
> issues that causes a significant difference between gcc and clang in build
> testing with kernelci:
> https://kernelci.org/build/next/branch/master/kernel/next-20190709/
> I'm trying to get all the warnings fixed there so we can spot build-time
> regressions more easily.
> 
>   Arnd

I'm just spitballing here since I am about to go to sleep but could we
do something like you did for bee20031772a ("disable -Wattribute-alias
warning for SYSCALL_DEFINEx()") and disable the warning in
DECLARE_WAIT_QUEUE_HEAD_ONSTACK only since we know it is not going to
be a problem? That way, if/when Clang is fixed, we can just have the
warning be disabled for older versions?

Cheers,
Nathan

Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality, statistic

2019-07-12 Thread Peter Zijlstra

On Fri, Jul 12, 2019 at 11:43:17AM +0800, 王贇 wrote:
> 
> 
> On 2019/7/11 下午9:47, Peter Zijlstra wrote:
> [snip]
> >> +  rcu_read_lock();
> >> +  memcg = mem_cgroup_from_task(p);
> >> +  if (idx != -1)
> >> +  this_cpu_inc(memcg->stat_numa->locality[idx]);
> > 
> > I thought cgroups were supposed to be hierarchical. That is, if we have:
> > 
> >   R
> >  / \
> >  A
> > /\
> >   B
> >   \
> >t1
> > 
> > Then our task t1 should be accounted to B (as you do), but also to A and
> > R.
> 
> I get the point but not quite sure about this...
> 
> Unlike pages, there is no hierarchical limitation on locality; also, tasks

You can use cpusets to affect that.

> running in a particular group have no influence on others, not to mention the
> extra overhead; is it really meaningful to account the stuff hierarchically?

AFAIU it's a requirement of cgroups to be hierarchical. All our other
cgroup accounting is like that.
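
A minimal sketch of what hierarchical accounting means for the R / A / B example above, written as plain user-space C. The struct group type, its locality_hits field and account_hierarchical() are hypothetical names, and real cgroup code uses per-cpu counters and its own iteration helpers; this only shows the walk up the parent chain.

```c
#include <stddef.h>

/* Hypothetical group node: each group knows its parent; the root has none. */
struct group {
	struct group *parent;
	unsigned long locality_hits;
};

/* Charge one event to the task's group and to every ancestor up to the root. */
static void account_hierarchical(struct group *g)
{
	for (; g; g = g->parent)
		g->locality_hits++;
}
```

With R as root, A below R and B below A, an event for task t1 in B is charged via account_hierarchical(&B) and shows up in B, A and R, while an event charged to A shows up in A and R only.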

[GIT PULL] 9p updates for 5.3

2019-07-12 Thread Dominique Martinet

Hi Linus,

Here is a 9p update for 5.3, just a couple of fixes that have been
sitting here for too long as I missed the 5.2 merge window.

I have two more patches that I didn't have time to test early enough for
this, but they are also plain detail fixes; please let me know if you would
prefer having me send a pull request for -rc2 after a week in -next or
if I should just wait until the next window.
There's little risk but I'm usually rather conservative on this.


The following changes since commit 5908e6b738e3357af42c10e1183753c70a0117a9:

  Linux 5.0-rc8 (2019-02-24 16:46:45 -0800)

are available in the git repository at:

  git://github.com/martinetd/linux tags/9p-for-5.3

for you to fetch changes up to 80a316ff16276b36d0392a8f8b2f63259857ae98:

  9p/xen: Add cleanup path in p9_trans_xen_init (2019-05-15 13:00:07
  +)


9p pull request for inclusion in 5.13

Two small fixes to properly clean up the 9p transports list if virtio/xen
module initialization fails.
9p might otherwise try to access memory from a module that failed to
register and got freed.


YueHaibing (2):
  9p/virtio: Add cleanup path in p9_virtio_init
  9p/xen: Add cleanup path in p9_trans_xen_init

 net/9p/trans_virtio.c |8 +++-
 net/9p/trans_xen.c|8 +++-
 2 files changed, 14 insertions(+), 2 deletions(-)

Re: [GIT PULL] 9p updates for 5.3

2019-07-12 Thread Dominique Martinet

Dominique Martinet wrote on Fri, Jul 12, 2019:
> 9p pull request for inclusion in 5.13

Just noticed this typo in version number here, should I make a new tag
with the correct text?

Sorry,
-- 
Dominique

Re: [PATCH v2] printk: Do not lose last line in kmsg buffer dump

2019-07-12 Thread Petr Mladek

On Thu 2019-07-11 16:29:37, Vincent Whitchurch wrote:
> kmsg_dump_get_buffer() is supposed to select all the youngest log
> messages which fit into the provided buffer.  It determines the correct
> start index by using msg_print_text() with a NULL buffer to calculate
> the size of each entry.  However, when performing the actual writes,
> msg_print_text() only writes the entry to the buffer if the written len
> is less than the size of the buffer.  So if the lengths of the
> selected youngest log messages happen to precisely fill up the provided
> buffer, the last log message is not included.
> 
> We don't want to modify msg_print_text() to fill up the buffer and start
> returning a length which is equal to the size of the buffer, since
> callers of its other users, such as kmsg_dump_get_line(), depend upon
> the current behaviour.
> 
> Instead, fix kmsg_dump_get_buffer() to compensate for this.
> 
> For example, with the following two final prints:
> 
> [6.427502] A
> [6.427769] 12345
> 
> A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
> patch:
> 
>  : 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37  <0>[6.522197
>  0010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] A.
>  0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>  0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> After this patch:
> 
>  : 3c 30 3e 5b 20 20 20 20 36 2e 34 35 36 36 37 38  <0>[6.456678
>  0010: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a  ] 12345.
>  0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>  0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> Signed-off-by: Vincent Whitchurch 
> ---
> v2: Move fix to kmsg_dump_get_buffer()
> 
>  kernel/printk/printk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1888f6a3b694..424abf802f02 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3274,7 +3274,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, 
> bool syslog,
>   /* move first record forward until length fits into the buffer */
>   seq = dumper->cur_seq;
>   idx = dumper->cur_idx;
> - while (l > size && seq < dumper->next_seq) {
> + while (l >= size && seq < dumper->next_seq) {

This cycle searches how many messages would fit into the buffer.

The patch looks like a hack that exploits a hole: the next cycle
no longer checks the number of characters that were really stored.

What would happen when msg_print_text() starts adding
the trailing '\0' as suggested by
https://lkml.kernel.org/r/20190710121049.rwhk7fknfzn3c...@pathway.suse.cz

I would much more appreciate if we make the code more secure instead
of stretching its weakness to the limits.

BTW: What is the motivation for this fix? Is it a bug report
or just some research into possible buffer overflows?

The commit message pretends that the problem is bigger than
it really is. It is about one byte and not one line.

Best Regards,
Petr
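
To make the boundary condition discussed above concrete, here is a minimal
userspace sketch. It is not the kernel code: emit_entry() is a hypothetical
stand-in for msg_print_text() that assumes only the behaviour described in
the commit message (an entry is copied only when its length is strictly less
than the remaining space), and dump_youngest() is a hypothetical model of the
two loops in kmsg_dump_get_buffer().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for msg_print_text(): with a NULL buffer it only
 * reports the entry length; otherwise it copies the entry only when the
 * length is strictly less than the space that is left.
 */
static size_t emit_entry(const char *entry, char *buf, size_t space)
{
	size_t len = strlen(entry);

	if (!buf)
		return len;
	if (len >= space)
		return 0;
	memcpy(buf, entry, len);
	return len;
}

/*
 * Copy the youngest entries that fit into buf and return the bytes written.
 * patched == 0 models the old "l > size" check, patched != 0 the proposed
 * "l >= size" check.
 */
static size_t dump_youngest(const char *const *entries, size_t n,
			    char *buf, size_t size, int patched)
{
	size_t first = 0, l = 0, written = 0, i;

	assert(entries && buf);

	/* first pass: total length of all entries */
	for (i = 0; i < n; i++)
		l += emit_entry(entries[i], NULL, 0);

	/* move the first entry forward until the length fits */
	while ((patched ? l >= size : l > size) && first < n)
		l -= emit_entry(entries[first++], NULL, 0);

	/* second pass: the actual writes */
	for (i = first; i < n; i++)
		written += emit_entry(entries[i], buf + written, size - written);

	return written;
}
```

With the entries "old\n", "A\n" and "12345\n" and an 8-byte buffer, the
selected entries fill the buffer exactly. The unpatched check then writes only
"A\n" and silently loses the last entry, while the patched check drops "A\n"
during selection and writes "12345\n". In this model the second loop still
does not verify what was really stored, which is the weakness pointed out
above.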

Re: [RFC v2 00/27] Kernel Address Space Isolation

2019-07-12 Thread Alexandre Chartre




On 7/12/19 12:38 AM, Dave Hansen wrote:
> On 7/11/19 7:25 AM, Alexandre Chartre wrote:
>> - Kernel code mapped to the ASI page-table has been reduced to:
>>   . the entire kernel (I still need to test with only the kernel text)
>>   . the cpu entry area (because we need the GDT to be mapped)
>>   . the cpu ASI session (for managing ASI)
>>   . the current stack
>>
>> - Optionally, an ASI can request the following kernel mapping to be added:
>>   . the stack canary
>>   . the cpu offsets (this_cpu_off)
>>   . the current task
>>   . RCU data (rcu_data)
>>   . CPU HW events (cpu_hw_events).
>
> I don't see the per-cpu areas in here.  But, the ASI macros in
> entry_64.S (and asi_start_abort()) use per-cpu data.


We don't map all per-cpu areas, but only the per-cpu variables we need. ASI
code uses the per-cpu cpu_asi_session variable which is mapped when an ASI
is created (see patch 15/26):

+   /*
+* Map the percpu ASI sessions. This is used by interrupt handlers
+* to figure out if we have entered isolation and switch back to
+* the kernel address space.
+*/
+   err = ASI_MAP_CPUVAR(asi, cpu_asi_session);
+   if (err)
+   return err;



> Also, this stuff seems to do naughty stuff (calling C code, touching
> per-cpu data) before the PTI CR3 writes have been done.  But, I don't
> see anything excluding PTI and this code from coexisting.


My understanding is that PTI CR3 writes only happen when switching to/from
userland, whereas ASI enter/exit/abort happens while we are already in the
kernel. So asi_start_abort() is not called when coming from userland and
does not interact with PTI.

For example, if ASI is used during a syscall (e.g. with KVM), we have:

 -> syscall
    - PTI CR3 write (kernel CR3)
    - syscall handler:
      ...
      asi_enter() -> write ASI CR3
      .. code run with ASI ..
      asi_exit() or asi abort -> restore original CR3
      ...
    - PTI CR3 write (userland CR3)
 <- syscall


Thanks,

alex.

[PATCH v4 0/5] hv: Remove dependencies on guest page size

2019-07-12 Thread Maya Nakamura

The Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86. Hyper-V
code mixes up the two, so this patchset begins to address that by
creating and using a set of Hyper-V specific page definitions.

A major benefit of those new definitions is that they support non-x86
architectures, such as ARM64, that use different page sizes. On ARM64,
the guest page size may not be 4096, and Hyper-V always runs with a page
size of 4096.

In this patchset, the first two patches lay the foundation for the
others, creating definitions and preparing for allocation of memory with
the size and alignment that Hyper-V expects as a page. Patch 3 applies
the page size definition where the guest VM and Hyper-V communicate, and
where the code intends to use the Hyper-V page size. The last two
patches set the ring buffer size to a fixed value, removing the
dependency on the guest page size.

This is the initial set of changes to the Hyper-V code, and future
patches will make additional changes using the same foundation, for
example, replace __vmalloc() and related functions when Hyper-V pages
are intended.

Changes in v4 (all apply to patch 2 only):
- Remove file name from the subject.
- Include prototypes of two new functions.
- Add another Link tag.

Changes in v3:
- Simplify expression for BUILD_BUG_ON() in patch 2.
- Add Link and Reviewed-by tags.

Change in v2:
- Replace patch 2 with a new one.

Maya Nakamura (5):
  x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
  x86: hv: Add functions to allocate/deallocate page for Hyper-V
  hv: vmbus: Replace page definition with Hyper-V specific one
  HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
  Input: hv: Remove dependencies on PAGE_SIZE for ring buffer

 arch/x86/hyperv/hv_init.c | 14 ++
 arch/x86/include/asm/hyperv-tlfs.h| 12 +++-
 arch/x86/include/asm/mshyperv.h   |  5 -
 drivers/hid/hid-hyperv.c  |  4 ++--
 drivers/hv/hyperv_vmbus.h |  8 
 drivers/input/serio/hyperv-keyboard.c |  4 ++--
 6 files changed, 37 insertions(+), 10 deletions(-)

-- 
2.17.1

Re: [PATCH v2 0/5] Add NUMA-awareness to qspinlock

2019-07-12 Thread Hanjun Guo

On 2019/7/3 19:58, Jan Glauber wrote:
> Hi Alex,
> I've tried this series on arm64 (ThunderX2 with up to SMT=4  and 224 CPUs)
> with the borderline testcase of accessing a single file from all
> threads. With that
> testcase the qspinlock slowpath is the top spot in the kernel.
> 
> The results look really promising:
> 
> CPUs    normal    numa-qspinlocks
> ---------------------------------
> 56      149.41    73.90
> 224     576.95    290.31
> 
> Also frontend-stalls are reduced to 50% and interconnect traffic is
> greatly reduced.
> Tested-by: Jan Glauber 

Tested this patchset on Kunpeng920 ARM64 server (96 cores,
4 NUMA nodes), and with the same test case from Jan, I can
see 150%+ boost! (Need to add a patch below [1].)

For the real workload such as Nginx I can see about 10%
performance improvement as well.

Tested-by: Hanjun Guo 

Please cc me for new versions and I'm willing to test it.

Thanks
Hanjun

[1]
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 657bbc5..72c1346 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -792,6 +792,20 @@ config NODES_SHIFT
  Specify the maximum number of NUMA Nodes available on the target
  system.  Increases memory reserved to accommodate various tables.

+config NUMA_AWARE_SPINLOCKS
+ bool "Numa-aware spinlocks"
+ depends on NUMA
+ default y
+ help
+   Introduce NUMA (Non Uniform Memory Access) awareness into
+   the slow path of spinlocks.
+
+   The kernel will try to keep the lock on the same node,
+   thus reducing the number of remote cache misses, while
+   trading some of the short term fairness for better performance.
+
+   Say N if you want absolute first come first serve fairness.
+
 config USE_PERCPU_NUMA_NODE_ID
def_bool y
depends on NUMA
diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index 2994167..be5dd44 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -4,7 +4,7 @@
 #endif

 #include 
-
+#include 
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
  *
@@ -170,7 +170,7 @@ static __always_inline void cna_init_node(struct 
mcs_spinlock *node, int cpuid,
  u32 tail)
 {
if (decode_numa_node(node->node_and_count) == -1)
-   store_numa_node(node, numa_cpu_node(cpuid));
+ store_numa_node(node, cpu_to_node(cpuid));
node->encoded_tail = tail;
 }

[PATCH v2] xen/pv: Fix a boot up hang revealed by int3 self test

2019-07-12 Thread Zhenzhong Duan

Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
selftest") is used to ensure that a gap is set up on the exception stack,
which can be used for inserting the call return address.

This gap is missing in the XEN PV int3 exception entry path, so the panic
below is triggered:

[0.772876] general protection fault:  [#1] SMP NOPTI
[0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
[0.772893] RIP: e030:int3_magic+0x0/0x7
[0.772905] RSP: 3507:82203e98 EFLAGS: 0246
[0.773334] Call Trace:
[0.773334]  alternative_instructions+0x3d/0x12e
[0.773334]  check_bugs+0x7c9/0x887
[0.773334]  ? __get_locked_pte+0x178/0x1f0
[0.773334]  start_kernel+0x4ff/0x535
[0.773334]  ? set_init_arg+0x55/0x55
[0.773334]  xen_start_kernel+0x571/0x57a

As the xenint3 and int3 entry code are the same except that xenint3 doesn't
generate a gap, we can fix it by using int3 and dropping the useless xenint3.

Signed-off-by: Zhenzhong Duan 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
---
 v2: fix up description.
---
 arch/x86/entry/entry_64.S| 1 -
 arch/x86/include/asm/traps.h | 2 +-
 arch/x86/xen/enlighten_pv.c  | 2 +-
 arch/x86/xen/xen-asm_64.S| 1 -
 4 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0ea4831..35a66fc 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1176,7 +1176,6 @@ idtentry stack_segmentdo_stack_segment
has_error_code=1
 #ifdef CONFIG_XEN_PV
 idtentry xennmido_nmi  has_error_code=0
 idtentry xendebug  do_debughas_error_code=0
-idtentry xenint3   do_int3 has_error_code=0
 #endif
 
 idtentry general_protectiondo_general_protection   has_error_code=1
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 7d6f3f3..f2bd284 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -40,7 +40,7 @@
 asmlinkage void xen_divide_error(void);
 asmlinkage void xen_xennmi(void);
 asmlinkage void xen_xendebug(void);
-asmlinkage void xen_xenint3(void);
+asmlinkage void xen_int3(void);
 asmlinkage void xen_overflow(void);
 asmlinkage void xen_bounds(void);
 asmlinkage void xen_invalid_op(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 4722ba2..2138d69 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -596,7 +596,7 @@ struct trap_array_entry {
 
 static struct trap_array_entry trap_array[] = {
{ debug,   xen_xendebug,true },
-   { int3,xen_xenint3, true },
+   { int3,xen_int3,true },
{ double_fault,xen_double_fault,true },
 #ifdef CONFIG_X86_MCE
{ machine_check,   xen_machine_check,   true },
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 1e9ef0b..ebf610b 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -32,7 +32,6 @@ xen_pv_trap divide_error
 xen_pv_trap debug
 xen_pv_trap xendebug
 xen_pv_trap int3
-xen_pv_trap xenint3
 xen_pv_trap xennmi
 xen_pv_trap overflow
 xen_pv_trap bounds
-- 
1.8.3.1

[PATCH v4 1/5] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions

2019-07-12 Thread Maya Nakamura

Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because
the Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86.

Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE.

Signed-off-by: Maya Nakamura 
Reviewed-by: Michael Kelley 
Reviewed-by: Vitaly Kuznetsov 
---
 arch/x86/include/asm/hyperv-tlfs.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index af78cd72b8f3..7a2705694f5b 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -12,6 +12,16 @@
 #include 
 #include 
 
+/*
+ * While not explicitly listed in the TLFS, Hyper-V always runs with a page 
size
+ * of 4096. These definitions are used when communicating with Hyper-V using
+ * guest physical pages and guest physical page addresses, since the guest page
+ * size may not be 4096 on all architectures.
+ */
+#define HV_HYP_PAGE_SHIFT  12
+#define HV_HYP_PAGE_SIZE   BIT(HV_HYP_PAGE_SHIFT)
+#define HV_HYP_PAGE_MASK   (~(HV_HYP_PAGE_SIZE - 1))
+
 /*
  * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
  * is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
@@ -847,7 +857,7 @@ union hv_gpa_page_range {
  * count is equal with how many entries of union hv_gpa_page_range can
  * be populated into the input parameter page.
  */
-#define HV_MAX_FLUSH_REP_COUNT ((PAGE_SIZE - 2 * sizeof(u64)) /\
+#define HV_MAX_FLUSH_REP_COUNT ((HV_HYP_PAGE_SIZE - 2 * sizeof(u64)) / \
sizeof(union hv_gpa_page_range))
 
 struct hv_guest_mapping_flush_list {
-- 
2.17.1
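
As a rough illustration of why separate definitions are useful, here is a
small userspace sketch. The three macros mirror the ones added by the patch
(with BIT() spelled out, since it is a kernel macro); the helper functions
and the 64K guest page example are hypothetical and only show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as in the patch; BIT() is replaced by an explicit shift. */
#define HV_HYP_PAGE_SHIFT	12
#define HV_HYP_PAGE_SIZE	(UINT64_C(1) << HV_HYP_PAGE_SHIFT)
#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))

/* Hypothetical helper: number of Hyper-V pages covered by one guest page. */
static uint64_t hv_pages_per_guest_page(unsigned int guest_page_shift)
{
	assert(guest_page_shift >= HV_HYP_PAGE_SHIFT);
	return UINT64_C(1) << (guest_page_shift - HV_HYP_PAGE_SHIFT);
}

/* Hypothetical helper: Hyper-V page frame number of a guest physical address. */
static uint64_t hv_gpa_to_hv_pfn(uint64_t gpa)
{
	return gpa >> HV_HYP_PAGE_SHIFT;
}

/* Hypothetical helper: round a guest physical address down to a Hyper-V page. */
static uint64_t hv_page_align_down(uint64_t gpa)
{
	return gpa & HV_HYP_PAGE_MASK;
}
```

On x86 the guest page shift is 12, so one guest page is exactly one Hyper-V
page. With a 64K guest page (shift 16), one guest page would span 16 Hyper-V
pages, which is the case where using PAGE_SIZE in place of HV_HYP_PAGE_SIZE
would give the wrong result.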

Re: [PATCH v3 0/9] i2c: add support for filters

2019-07-12 Thread Ludovic Desroches

On Tue, Jul 09, 2019 at 03:19:26PM +0200, Eugen Hristev - M18282 wrote:
> From: Eugen Hristev 
> 
> Hello,
> 
> This series adds support for analog and digital filters for i2c controllers
> 
> This series is based on the series:
> [PATCH v2 0/9] i2c: at91: filters support for at91 SoCs
> and enhanced to add the bindings for all controllers plus an extra binding
> for the width of the spikes in nanoseconds.
> 
> First, bindings are created for
> 'i2c-ana-filter'
> 'i2c-dig-filter'
> 'i2c-filter-width-ns'
> 
> The support is added in the i2c core to retrieve filter width and add it
> to the timings structure.
> Next, the at91 driver is enhanced for supporting digital filter, advanced
> digital filter (with selectable spike width) and the analog filter.
> 
> Finally the device tree for two boards are modified to make use of the
> new properties.
> 
> This series is the result of the comments on the ML in the direction
> requested: to make the bindings globally available for i2c drivers.
> 
> Changes in v3:
> - made bindings global for i2c controllers and modified accordingly
> - gave up PADFCDF bit because it's a lack in datasheet
> - the computation on the width of the spike is based on periph clock as it
> is done for hold time.
> 
> Changes in v2:
> - added device tree bindings and support for enable-ana-filt and
> enable-dig-filt
> - added the new properties to the DT for sama5d4_xplained/sama5d2_xplained
> 
> Eugen Hristev (9):
>   dt-bindings: i2c: at91: add new compatible
>   dt-bindings: i2c: add bindings for i2c analog and digital filter
>   i2c: add support for filter-width-ns optional property
>   i2c: at91: add new platform support for sam9x60
>   i2c: at91: add support for digital filtering
>   i2c: at91: add support for advanced digital filtering
>   i2c: at91: add support for analog filtering
>   ARM: dts: at91: sama5d2_xplained: add analog and digital filter for
> i2c
>   ARM: dts: at91: sama5d4_xplained: add analog filter for i2c
> 
>  Documentation/devicetree/bindings/i2c/i2c-at91.txt |  3 +-
>  Documentation/devicetree/bindings/i2c/i2c.txt  | 11 +
>  arch/arm/boot/dts/at91-sama5d2_xplained.dts|  6 +++
>  arch/arm/boot/dts/at91-sama5d4_xplained.dts|  1 +
>  drivers/i2c/busses/i2c-at91-core.c | 38 +
>  drivers/i2c/busses/i2c-at91-master.c   | 49 
> --
>  drivers/i2c/busses/i2c-at91.h  | 13 ++
>  drivers/i2c/i2c-core-base.c|  2 +
>  include/linux/i2c.h|  2 +
>  9 files changed, 121 insertions(+), 4 deletions(-)

Hi,

I don't know if it will fit other vendors' needs concerning the binding,
but for Microchip it sounds good.

Acked-by: Ludovic Desroches 
for the whole series.

Regards

Ludovic

[PATCH v4 2/5] x86: hv: Add functions to allocate/deallocate page for Hyper-V

2019-07-12 Thread Maya Nakamura

Introduce two new functions, hv_alloc_hyperv_page() and
hv_free_hyperv_page(), to allocate/deallocate memory with the size and
alignment that Hyper-V expects as a page. Although currently they are
not used, they are ready to be used to allocate/deallocate memory on x86
when their ARM64 counterparts are implemented, keeping symmetry between
architectures with potentially different guest page sizes.

Link: 
https://lore.kernel.org/lkml/alpine.deb.2.21.1906272334560.32...@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/lkml/87muindr9c@vitty.brq.redhat.com/
Signed-off-by: Maya Nakamura 
Reviewed-by: Michael Kelley 
Reviewed-by: Vitaly Kuznetsov 
---
 arch/x86/hyperv/hv_init.c   | 14 ++
 arch/x86/include/asm/mshyperv.h |  5 -
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0e033ef11a9f..e8960a83add7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -37,6 +37,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
 u32 hv_max_vp_index;
 EXPORT_SYMBOL_GPL(hv_max_vp_index);
 
+void *hv_alloc_hyperv_page(void)
+{
+   BUILD_BUG_ON(PAGE_SIZE != HV_HYP_PAGE_SIZE);
+
+   return (void *)__get_free_page(GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(hv_alloc_hyperv_page);
+
+void hv_free_hyperv_page(unsigned long addr)
+{
+   free_page(addr);
+}
+EXPORT_SYMBOL_GPL(hv_free_hyperv_page);
+
 static int hv_cpu_init(unsigned int cpu)
 {
u64 msr_vp_index;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 2a793bf6ebb0..32ec9df39a99 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -218,7 +218,8 @@ static inline struct hv_vp_assist_page 
*hv_get_vp_assist_page(unsigned int cpu)
 
 void __init hyperv_init(void);
 void hyperv_setup_mmu_ops(void);
-
+void *hv_alloc_hyperv_page(void);
+void hv_free_hyperv_page(unsigned long addr);
 void hyperv_reenlightenment_intr(struct pt_regs *regs);
 void set_hv_tscchange_cb(void (*cb)(void));
 void clear_hv_tscchange_cb(void);
@@ -241,6 +242,8 @@ static inline void hv_apic_init(void) {}
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
+static inline void *hv_alloc_hyperv_page(void) { return NULL; }
+static inline void hv_free_hyperv_page(unsigned long addr) {}
 static inline void set_hv_tscchange_cb(void (*cb)(void)) {}
 static inline void clear_hv_tscchange_cb(void) {}
 static inline void hyperv_stop_tsc_emulation(void) {};
-- 
2.17.1

[PATCH v4 3/5] hv: vmbus: Replace page definition with Hyper-V specific one

2019-07-12 Thread Maya Nakamura

Replace PAGE_SIZE with HV_HYP_PAGE_SIZE because the guest page size may
not be 4096 on all architectures and Hyper-V always runs with a page
size of 4096.

Signed-off-by: Maya Nakamura 
Reviewed-by: Michael Kelley 
Reviewed-by: Vitaly Kuznetsov 
---
 drivers/hv/hyperv_vmbus.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 362e70e9d145..019469c3cbca 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -192,11 +192,11 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
   u64 *requestid, bool raw);
 
 /*
- * Maximum channels is determined by the size of the interrupt page
- * which is PAGE_SIZE. 1/2 of PAGE_SIZE is for send endpoint interrupt
- * and the other is receive endpoint interrupt
+ * Maximum channels, 16348, is determined by the size of the interrupt page,
+ * which is HV_HYP_PAGE_SIZE. 1/2 of HV_HYP_PAGE_SIZE is to send endpoint
+ * interrupt, and the other is to receive endpoint interrupt.
  */
-#define MAX_NUM_CHANNELS   ((PAGE_SIZE >> 1) << 3) /* 16348 channels */
+#define MAX_NUM_CHANNELS   ((HV_HYP_PAGE_SIZE >> 1) << 3)
 
 /* The value here must be in multiple of 32 */
 /* TODO: Need to make this configurable */
-- 
2.17.1
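
For reference, the arithmetic behind MAX_NUM_CHANNELS can be checked with a
small userspace sketch. The macro is copied from the patch;
max_num_channels() is a hypothetical helper, and reading the "<< 3" as eight
bits per byte (apparently one bit per channel in each half of the interrupt
page) is an assumption based on the comment.

```c
#include <assert.h>

#define HV_HYP_PAGE_SIZE	4096u
#define MAX_NUM_CHANNELS	((HV_HYP_PAGE_SIZE >> 1) << 3)

/*
 * Hypothetical helper: half of the interrupt page, in bytes, times eight
 * bits per byte.
 */
static unsigned int max_num_channels(unsigned int page_size)
{
	/* page_size is expected to be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (page_size >> 1) << 3;
}
```

For a 4096-byte page this evaluates to 2048 * 8 = 16384, so the "16348" in
the comment looks like a transposition of digits. With PAGE_SIZE on a 64K
guest the same expression would have given 262144.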

Re: [PATCH -mm] autonuma: Fix scan period updating

2019-07-12 Thread Mel Gorman

On Thu, Jul 04, 2019 at 08:32:06AM +0800, Huang, Ying wrote:
> Mel Gorman  writes:
> 
> > On Tue, Jun 25, 2019 at 09:23:22PM +0800, huang ying wrote:
> >> On Mon, Jun 24, 2019 at 10:25 PM Mel Gorman  wrote:
> >> >
> >> > On Mon, Jun 24, 2019 at 10:56:04AM +0800, Huang Ying wrote:
> >> > > The autonuma scan period should be increased (scanning is slowed down)
> >> > > if the majority of the page accesses are shared with other processes.
> >> > > But in current code, the scan period will be decreased (scanning is
> >> > > speeded up) in that situation.
> >> > >
> >> > > This patch fixes the code.  And this has been tested via tracing the
> >> > > scan period changing and /proc/vmstat numa_pte_updates counter when
> >> > > running a multi-threaded memory accessing program (most memory
> >> > > areas are accessed by multiple threads).
> >> > >
> >> >
> >> > The patch somewhat flips the logic on whether shared or private is
> >> > considered and it's not immediately obvious why that was required. That
> >> > aside, other than the impact on numa_pte_updates, what actual
> >> > performance difference was measured and on what workloads?
> >> 
> >> The original scanning period updating logic doesn't match the original
> >> patch description and comments.  I think the original patch
> >> description and comments make more sense.  So I fix the code logic to
> >> make it match the original patch description and comments.
> >> 
> >> If my understanding to the original code logic and the original patch
> >> description and comments were correct, do you think the original patch
> >> description and comments are wrong so we need to fix the comments
> >> instead?  Or you think we should prove whether the original patch
> >> description and comments are correct?
> >> 
> >
> > I'm about to get knocked offline so cannot answer properly. The code may
> > indeed be wrong and I have observed higher than expected NUMA scanning
> > behaviour although not enough to cause problems. A comment
> > fix is fine but if you're changing the scanning behaviour, it should be
> > backed up with data justifying that the change both reduces the observed
> > scanning and that it has no adverse performance implications.
> 
> Got it!  Thanks for comments!  As for performance testing, do you have
> some candidate workloads?
> 

Ordinarily I would hope that the patch was motivated by observed
behaviour so you have a metric for goodness. However, for NUMA balancing
I would typically run basic workloads first -- dbench, tbench, netperf,
hackbench and pipetest. The objective would be to measure the degree to which
automatic NUMA balancing is interfering with a basic workload to see if
the patch reduces the number of minor faults incurred even though there
is no NUMA balancing to be worried about. This measures the general
overhead of a patch. If your reasoning is correct, you'd expect lower
overhead.

For balancing itself, I usually look at Andrea's original autonuma
benchmark, NAS Parallel Benchmark (D class usually although C class for
much older or smaller machines) and spec JBB 2005 and 2015. Of the JBB
benchmarks, 2005 is usually more reasonable for evaluating NUMA balancing
than 2015 is (which can be unstable for a variety of reasons). In this
case, I would be looking at whether the overhead is reduced, whether the
ratio of local hits is the same or improved and the primary metric of
each (time to completion for Andrea's and NAS, throughput for JBB).

Even if there is no change to locality and the primary metric but there
is less scanning and overhead overall, it would still be an improvement.

If you have trouble doing such an evaluation, I'll queue tests if they
are based on a patch that addresses the specific point of concern (scan
period not updated) as it's still not obvious why flipping the logic of
whether shared or private is considered was necessary.

-- 
Mel Gorman
SUSE Labs

[PATCH v4 4/5] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer

2019-07-12 Thread Maya Nakamura

Define the ring buffer size as a constant expression because it should
not depend on the guest page size.

Signed-off-by: Maya Nakamura 
Reviewed-by: Michael Kelley 
---
 drivers/hid/hid-hyperv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 7795831d37c2..cc5b09b87ab0 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -104,8 +104,8 @@ struct synthhid_input_report {
 
 #pragma pack(pop)
 
-#define INPUTVSC_SEND_RING_BUFFER_SIZE (10*PAGE_SIZE)
-#define INPUTVSC_RECV_RING_BUFFER_SIZE (10*PAGE_SIZE)
+#define INPUTVSC_SEND_RING_BUFFER_SIZE (40 * 1024)
+#define INPUTVSC_RECV_RING_BUFFER_SIZE (40 * 1024)
 
 
 enum pipe_prot_msg_type {
-- 
2.17.1

[PATCH v4 5/5] Input: hv: Remove dependencies on PAGE_SIZE for ring buffer

2019-07-12 Thread Maya Nakamura

Define the ring buffer size as a constant expression because it should
not depend on the guest page size.

Signed-off-by: Maya Nakamura 
Reviewed-by: Michael Kelley 
---
 drivers/input/serio/hyperv-keyboard.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/serio/hyperv-keyboard.c 
b/drivers/input/serio/hyperv-keyboard.c
index 8e457e50f837..88ae7c2ac3c8 100644
--- a/drivers/input/serio/hyperv-keyboard.c
+++ b/drivers/input/serio/hyperv-keyboard.c
@@ -75,8 +75,8 @@ struct synth_kbd_keystroke {
 
 #define HK_MAXIMUM_MESSAGE_SIZE 256
 
-#define KBD_VSC_SEND_RING_BUFFER_SIZE  (10 * PAGE_SIZE)
-#define KBD_VSC_RECV_RING_BUFFER_SIZE  (10 * PAGE_SIZE)
+#define KBD_VSC_SEND_RING_BUFFER_SIZE  (40 * 1024)
+#define KBD_VSC_RECV_RING_BUFFER_SIZE  (40 * 1024)
 
 #define XTKBD_EMUL0 0xe0
 #define XTKBD_EMUL1 0xe1
-- 
2.17.1

Re: [PATCH v1] drm/modes: Skip invalid cmdline mode

2019-07-12 Thread Dmitry Osipenko

12.07.2019 11:10, Maxime Ripard wrote:
> On Thu, Jul 11, 2019 at 06:55:03PM +0300, Dmitry Osipenko wrote:
>> 11.07.2019 12:03, Maxime Ripard wrote:
>>> On Wed, Jul 10, 2019 at 06:05:18PM +0300, Dmitry Osipenko wrote:
>>>> 10.07.2019 17:05, Maxime Ripard wrote:
>>>>> On Wed, Jul 10, 2019 at 04:29:19PM +0300, Dmitry Osipenko wrote:
>>>>>> This works:
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_client_modeset.c
>>>>>> b/drivers/gpu/drm/drm_client_modeset.c
>>>>>> index 56d36779d213..e5a2f9c8f404 100644
>>>>>> --- a/drivers/gpu/drm/drm_client_modeset.c
>>>>>> +++ b/drivers/gpu/drm/drm_client_modeset.c
>>>>>> @@ -182,6 +182,8 @@ drm_connector_pick_cmdline_mode(struct drm_connector
>>>>>> *connector)
>>>>>> mode = drm_mode_create_from_cmdline_mode(connector->dev,
>>>>>> cmdline_mode);
>>>>>> if (mode)
>>>>>> list_add(&mode->head, &connector->modes);
>>>>>> +   else
>>>>>> +   cmdline_mode->specified = false;
>>>>>
>>>>> Hmmm, it's not clear to me why that wouldn't be the case.
>>>>>
>>>>> If we come back to the beginning of that function, we retrieve the
>>>>> cmdline_mode buffer from the connector pointer, that will probably
>>>>> have been parsed a first time using drm_mode_create_from_cmdline_mode
>>>>> in drm_helper_probe_add_cmdline_mode.
>>>>>
>>>>> Now, I'm guessing that the issue is that in
>>>>> drm_mode_parse_command_line_for_connector, if we have a named mode, we
>>>>> just copy the mode over and set mode->specified.
>>>>>
>>>>> And we then move over to do other checks, and that's probably what
>>>>> fails and returns, but our drm_cmdline_mode will have been modified.
>>>>>
>>>>> I'm not entirely sure how to deal with that though.
>>>>>
>>>>> I guess we could allocate a drm_cmdline_mode structure on the stack,
>>>>> fill that, and if successful copy over its content to the one in
>>>>> drm_connector. That would allow us to only change the content on
>>>>> success, which is what I would expect from such a function?
>>>>>
>>>>> How does that sound?
>>>>
>>>> I now see that there is DRM_MODE_TYPE_USERDEF flag that is assigned only
>>>> for the "cmdline" mode and drm_client_rotation() is the only place in
>>>> DRM code that cares about whether mode is from cmdline, hence looks like
>>>> it will be more correct to do the following:
>>>
>>> I'm still under the impression that we're dealing with workarounds of
>>> a more central issue, which is that we shouldn't return a partially
>>> modified drm_cmdline_mode.
>>>
>>> You said it yourself, the breakage is in the commit changing the
>>> command line parsing logic, while you're fixing here some code that
>>> was introduced later on.
>>
>> The problem stems from assumption that *any* named mode is valid. It
>> looks to me that the ultimate solution would be to move the mode's name
>> comparison into the [1], if that's possible.
>>
>> [1] drm_mode_parse_command_line_for_connector()
> 
> Well, one could argue that video=tegrafb is invalid and should be
> rejected as well, but we haven't cleared that up.

video=tegrafb is an invalid mode, there is nothing to argue here. And
the problem is that invalid modes are not rejected from the very beginning.

>>> Can you try the following patch?
>>> http://code.bulix.org/8cwk4c-794565?raw
>>
>> This doesn't help because the problem with the rotation_reflection is
>> that it's 0 if "rotation" not present in the cmdline and then ilog2(0)
>> returns -1. So the patch "drm/modes: Don't apply cmdline's rotation if
>> it wasn't specified" should be correct in any case.
> 
> So we would have the same issue with rotate=0 then?

No, we won't. The rotation mode is parsed into the DRM_MODE bitmask and
rotate=0 corresponds to DRM_MODE_ROTATE_0, which is BIT(0) as you may
notice. Hence rotation_reflection=0 is always an invalid value, meaning
that the "rotate" option is not present in the cmdline. Please consult the
code, in particular see drm_mode_parse_cmdline_options() which was
written by yourself ;)
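
A small userspace sketch of the point above, for illustration only.
ilog2_sketch() is a stand-in for the kernel's ilog2() that assumes only the
behaviour mentioned in this thread (it yields -1 for 0). The rotate values
follow the DRM uapi header as I recall it, and ROTATE_MASK,
rotation_specified() and rotation_degrees() are hypothetical names.

```c
#include <assert.h>

#define BIT(n)			(1u << (n))
#define DRM_MODE_ROTATE_0	BIT(0)
#define DRM_MODE_ROTATE_90	BIT(1)
#define DRM_MODE_ROTATE_180	BIT(2)
#define DRM_MODE_ROTATE_270	BIT(3)
#define ROTATE_MASK		0xfu	/* hypothetical name for the rotate bits */

/* Stand-in for the kernel's ilog2(): index of the highest set bit, -1 for 0. */
static int ilog2_sketch(unsigned int v)
{
	int log = -1;

	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

/* Hypothetical helper: a zero bitmask means "rotate" was not on the cmdline. */
static int rotation_specified(unsigned int rotation_reflection)
{
	return (rotation_reflection & ROTATE_MASK) != 0;
}

/* Hypothetical helper: only meaningful when a rotation was specified. */
static int rotation_degrees(unsigned int rotation_reflection)
{
	unsigned int rot = rotation_reflection & ROTATE_MASK;

	assert(rot != 0);
	return ilog2_sketch(rot) * 90;
}
```

Here rotate=0 maps to DRM_MODE_ROTATE_0, whose log2 is 0, while an absent
option leaves the bitmask at 0, whose log2 is -1. So the two cases are
distinguishable, and the value must be checked before it is passed to ilog2().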

[PATCH] fdt: Properly handle "no-map" field in the memory region

2019-07-12 Thread KarimAllah Ahmed

Mark the memory region with NOMAP flag instead of completely removing it
from the memory blocks. That makes the FDT handling consistent with the EFI
memory map handling.

Cc: Rob Herring 
Cc: Frank Rowand 
Cc: devicet...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 drivers/of/fdt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index de893c9..77982ae 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1175,7 +1175,7 @@ int __init __weak 
early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
 {
if (nomap)
-   return memblock_remove(base, size);
+   return memblock_mark_nomap(base, size);
return memblock_reserve(base, size);
 }
 
-- 
2.7.4

[PATCH v7 1/3] KVM: x86: add support for user wait instructions

2019-07-12 Thread Tao Xu

UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.
This patch adds support for user wait instructions in KVM. Availability
of the user wait instructions is indicated by the presence of the CPUID
feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may
be executed at any privilege level, and use IA32_UMWAIT_CONTROL MSR to
set the maximum time.

The behavior of user wait instructions in VMX non-root operation is
determined first by the setting of the "enable user wait and pause"
secondary processor-based VM-execution control bit 26.
If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause
an invalid-opcode exception (#UD).
If the VM-execution control is 1, treatment is based on the
setting of the “RDTSC exiting” VM-execution control. Because KVM never
enables RDTSC exiting, if the instruction causes a delay, the amount of
time delayed is called here the physical delay. The physical delay is
first computed by determining the virtual delay. If
IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in
EDX:EAX minus the value that RDTSC would return; if
IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum
of that difference and AND(IA32_UMWAIT_CONTROL,FFFFFFFCH).

Because umwait and tpause can put a (physical) CPU into a power saving
state, by default we don't expose it to KVM and enable it only when
guest CPUID has it.

Detailed information about user wait instructions can be found in the
latest Intel 64 and IA-32 Architectures Software Developer's Manual.

Co-developed-by: Jingqi Liu 
Signed-off-by: Jingqi Liu 
Signed-off-by: Tao Xu 
---

Changes in v7:
- Add nested support for user wait instructions (Paolo)
---
 arch/x86/include/asm/vmx.h |  1 +
 arch/x86/kvm/cpuid.c   |  2 +-
 arch/x86/kvm/vmx/nested.c  |  1 +
 arch/x86/kvm/vmx/vmx.c | 20 
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index a39136b0d509..8f00882664d3 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -69,6 +69,7 @@
 #define SECONDARY_EXEC_PT_USE_GPA  0x01000000
 #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC 0x00400000
 #define SECONDARY_EXEC_TSC_SCALING  0x02000000
+#define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE   0x04000000
 
 #define PIN_BASED_EXT_INTR_MASK 0x00000001
 #define PIN_BASED_NMI_EXITING   0x00000008
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 4992e7c99588..7d2cd4066f64 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -402,7 +402,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
-   F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B);
+   F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/;
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 46af3a5e9209..a4d5da34b306 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2048,6 +2048,7 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, 
struct vmcs12 *vmcs12)
  SECONDARY_EXEC_ENABLE_INVPCID |
  SECONDARY_EXEC_RDTSCP |
  SECONDARY_EXEC_XSAVES |
+ SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
  SECONDARY_EXEC_APIC_REGISTER_VIRT |
  SECONDARY_EXEC_ENABLE_VMFUNC);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d98eac371c0a..f411c9ae5589 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2247,6 +2247,7 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf,
SECONDARY_EXEC_RDRAND_EXITING |
SECONDARY_EXEC_ENABLE_PML |
SECONDARY_EXEC_TSC_SCALING |
+   SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE |
SECONDARY_EXEC_PT_USE_GPA |
SECONDARY_EXEC_PT_CONCEAL_VMX |
SECONDARY_EXEC_ENABLE_VMFUNC |
@@ -3984,6 +3985,25 @@ static void vmx_compute_secondary_exec_control(struct 
vcpu_vmx *vmx)
}
}
 
+   if (vmcs_config.cpu_based_2nd_exec_ctrl &
+   SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE) {
+   /* Exposing WAITPKG only when WAITPKG is exposed */
+   bool waitpkg_enabled =
+   guest_cpuid_has(vcpu, X86_FEATURE_WAITPKG);
+
+   if (!waitpkg_enabled)
+

[PATCH v7 3/3] KVM: vmx: handle vm-exit for UMWAIT and TPAUSE

2019-07-12 Thread Tao Xu

According to the latest Intel 64 and IA-32 Architectures Software
Developer's Manual, the UMWAIT and TPAUSE instructions cause a VM exit
if the "RDTSC exiting" and "enable user wait and pause" VM-execution
controls are both 1.

Because KVM never enables RDTSC exiting, this VM exit should never
happen. This patch adds handlers for UMWAIT and TPAUSE that warn if it
does.

Co-developed-by: Jingqi Liu 
Signed-off-by: Jingqi Liu 
Signed-off-by: Tao Xu 
---

Changes in v7:
- Add nested exit reason for UMWAIT and TPAUSE (Paolo)
---
 arch/x86/include/uapi/asm/vmx.h |  6 +-
 arch/x86/kvm/vmx/nested.c   |  3 +++
 arch/x86/kvm/vmx/vmx.c  | 16 
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index d213ec5c3766..d88d7a68849b 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -85,6 +85,8 @@
 #define EXIT_REASON_PML_FULL62
 #define EXIT_REASON_XSAVES  63
 #define EXIT_REASON_XRSTORS 64
+#define EXIT_REASON_UMWAIT  67
+#define EXIT_REASON_TPAUSE  68
 
 #define VMX_EXIT_REASONS \
{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
@@ -142,7 +144,9 @@
{ EXIT_REASON_RDSEED,"RDSEED" }, \
{ EXIT_REASON_PML_FULL,  "PML_FULL" }, \
{ EXIT_REASON_XSAVES,"XSAVES" }, \
-   { EXIT_REASON_XRSTORS,   "XRSTORS" }
+   { EXIT_REASON_XRSTORS,   "XRSTORS" }, \
+   { EXIT_REASON_UMWAIT,"UMWAIT" }, \
+   { EXIT_REASON_TPAUSE,"TPAUSE" }
 
 #define VMX_ABORT_SAVE_GUEST_MSR_FAIL1
 #define VMX_ABORT_LOAD_HOST_PDPTE_FAIL   2
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a4d5da34b306..9f91f834ec43 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5213,6 +5213,9 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 
exit_reason)
case EXIT_REASON_ENCLS:
/* SGX is never exposed to L1 */
return false;
+   case EXIT_REASON_UMWAIT: case EXIT_REASON_TPAUSE:
+   return nested_cpu_has2(vmcs12,
+   SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE);
default:
return true;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0787f140d155..e026b1313dc3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5349,6 +5349,20 @@ static int handle_monitor(struct kvm_vcpu *vcpu)
return handle_nop(vcpu);
 }
 
+static int handle_umwait(struct kvm_vcpu *vcpu)
+{
+   kvm_skip_emulated_instruction(vcpu);
+   WARN(1, "this should never happen\n");
+   return 1;
+}
+
+static int handle_tpause(struct kvm_vcpu *vcpu)
+{
+   kvm_skip_emulated_instruction(vcpu);
+   WARN(1, "this should never happen\n");
+   return 1;
+}
+
 static int handle_invpcid(struct kvm_vcpu *vcpu)
 {
u32 vmx_instruction_info;
@@ -5559,6 +5573,8 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu 
*vcpu) = {
[EXIT_REASON_VMFUNC]  = handle_vmx_instruction,
[EXIT_REASON_PREEMPTION_TIMER]= handle_preemption_timer,
[EXIT_REASON_ENCLS]   = handle_encls,
+   [EXIT_REASON_UMWAIT]  = handle_umwait,
+   [EXIT_REASON_TPAUSE]  = handle_tpause,
 };
 
 static const int kvm_vmx_max_exit_handlers =
-- 
2.20.1

[PATCH v7 2/3] KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL

2019-07-12 Thread Tao Xu

UMWAIT and TPAUSE instructions use IA32_UMWAIT_CONTROL at MSR index E1H
to determine the maximum time in TSC-quanta that the processor can reside
in either C0.1 or C0.2.

This patch emulates MSR IA32_UMWAIT_CONTROL in the guest and
differentiates IA32_UMWAIT_CONTROL between host and guest. The variable
umwait_control_cached in arch/x86/kernel/cpu/umwait.c caches the MSR
value, so this patch uses it to avoid frequent rdmsr of IA32_UMWAIT_CONTROL.
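
As a rough user-space model of the emulation described above (not the
KVM code itself): a guest write is rejected when the reserved bit 1 is
set, and the MSR only needs to be switched on VM entry/exit when the
guest value differs from the cached host value. The struct, function
and macro names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UMWAIT_CTRL_C02_DISABLE (1ull << 0)   /* bit 0: C0.2 not allowed     */
#define UMWAIT_CTRL_RESERVED    (1ull << 1)   /* bit 1: reserved, must be 0  */
#define UMWAIT_CTRL_MAX_TIME    0xFFFFFFFCull /* bits 31:2: max time         */

/* Hypothetical per-vCPU shadow of the guest's IA32_UMWAIT_CONTROL. */
struct guest_umwait {
	uint64_t ctrl;
};

/* Returns true if the write is accepted, false if it should fault. */
static inline bool guest_umwait_write(struct guest_umwait *g, uint64_t data)
{
	assert(g != NULL);
	if (data & UMWAIT_CTRL_RESERVED)
		return false;
	g->ctrl = data;
	return true;
}

/* The MSR needs switching only when guest and host values differ. */
static inline bool needs_msr_switch(const struct guest_umwait *g,
				    uint32_t host_cached)
{
	assert(g != NULL);
	return g->ctrl != host_cached;
}
```

Comparing against a cached host value rather than reading the MSR keeps
the check cheap on every VM entry, which is the point made in the
paragraph above.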

Co-developed-by: Jingqi Liu 
Signed-off-by: Jingqi Liu 
Signed-off-by: Tao Xu 
---

Changes in v7:
- Use the test on vmx->secondary_exec_control to replace
  guest_cpuid_has (Paolo)
---
 arch/x86/kernel/cpu/umwait.c |  3 ++-
 arch/x86/kvm/vmx/vmx.c   | 33 +
 arch/x86/kvm/vmx/vmx.h   |  9 +
 arch/x86/kvm/x86.c   |  1 +
 4 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/umwait.c b/arch/x86/kernel/cpu/umwait.c
index 6a204e7336c1..631152a67c6e 100644
--- a/arch/x86/kernel/cpu/umwait.c
+++ b/arch/x86/kernel/cpu/umwait.c
@@ -15,7 +15,8 @@
  * Cache IA32_UMWAIT_CONTROL MSR. This is a systemwide control. By default,
  * umwait max time is 100000 in TSC-quanta and C0.2 is enabled
  */
-static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
+u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
+EXPORT_SYMBOL_GPL(umwait_control_cached);
 
 /*
  * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f411c9ae5589..0787f140d155 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1676,6 +1676,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
 #endif
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_info);
+   case MSR_IA32_UMWAIT_CONTROL:
+   if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
+   return 1;
+
+   msr_info->data = vmx->msr_ia32_umwait_control;
+   break;
case MSR_IA32_SPEC_CTRL:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
@@ -1838,6 +1844,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
return 1;
vmcs_write64(GUEST_BNDCFGS, data);
break;
+   case MSR_IA32_UMWAIT_CONTROL:
+   if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
+   return 1;
+
+   /* The reserved bit IA32_UMWAIT_CONTROL[1] should be zero */
+   if (data & BIT_ULL(1))
+   return 1;
+
+   vmx->msr_ia32_umwait_control = data;
+   break;
case MSR_IA32_SPEC_CTRL:
if (!msr_info->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
@@ -4139,6 +4155,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
vmx->rmode.vm86_active = 0;
vmx->spec_ctrl = 0;
 
+   vmx->msr_ia32_umwait_control = 0;
+
vcpu->arch.microcode_version = 0x100000000ULL;
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vcpu, 0);
@@ -6352,6 +6370,19 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
msrs[i].host, false);
 }
 
+static void atomic_switch_umwait_control_msr(struct vcpu_vmx *vmx)
+{
+   if (!vmx_has_waitpkg(vmx))
+   return;
+
+   if (vmx->msr_ia32_umwait_control != umwait_control_cached)
+   add_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL,
+   vmx->msr_ia32_umwait_control,
+   umwait_control_cached, false);
+   else
+   clear_atomic_switch_msr(vmx, MSR_IA32_UMWAIT_CONTROL);
+}
+
 static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
 {
vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
@@ -6460,6 +6491,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
atomic_switch_perf_msrs(vmx);
 
+   atomic_switch_umwait_control_msr(vmx);
+
vmx_update_hv_timer(vcpu);
 
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 61128b48c503..b4ca34f7a2da 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -14,6 +14,8 @@
 extern const u32 vmx_msr_index[];
 extern u64 host_efer;
 
+extern u32 umwait_control_cached;
+
 #define MSR_TYPE_R 1
 #define MSR_TYPE_W 2
 #define MSR_TYPE_RW3
@@ -194,6 +196,7 @@ struct vcpu_vmx {
 #endif
 
u64   spec_ctrl;
+   u64   msr_ia32_umwait_control;
 
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
@@ -523,6 +526,12 @@ static inline void decache_tsc_multiplier(struct vcpu_vmx 
*vmx)
vmcs_write64(TSC_MULTIPLIER, vmx->current_tsc_ratio);
 }
 
+static inline bool vmx

[PATCH v7 0/3] KVM: x86: Enable user wait instructions

2019-07-12 Thread Tao Xu

UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions.

UMONITOR arms address monitoring hardware using an address. A store
to an address within the specified address range triggers the
monitoring hardware to wake up the processor waiting in umwait.

UMWAIT instructs the processor to enter an implementation-dependent
optimized state while monitoring a range of addresses. The optimized
state may be either a light-weight power/performance optimized state
(c0.1 state) or an improved power/performance optimized state
(c0.2 state).

TPAUSE instructs the processor to enter an implementation-dependent
optimized state (c0.1 or c0.2) and to wake up when the time-stamp counter
reaches the specified timeout.

Availability of the user wait instructions is indicated by the presence
of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].

The patches enable the umonitor, umwait and tpause features in KVM.
Because umwait and tpause can put a (physical) CPU into a power saving
state, by default we don't expose it to KVM and enable it only when
guest CPUID has it. If the instruction causes a delay, the amount
of time delayed is called here the physical delay. The physical delay is
first computed by determining the virtual delay (the time to delay
relative to the VM’s timestamp counter).
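
The "enable it only when guest CPUID has it" policy can be illustrated
with a small stand-alone sketch. It is not the KVM implementation; the
struct and helper names are invented here, and only the bit position
(CPUID.0x07.0x0:ECX[5]) comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_7_0_ECX_WAITPKG (1u << 5)	/* CPUID.0x07.0x0:ECX[5] */

/* Hypothetical holder for the guest's CPUID leaf 7, subleaf 0, ECX. */
struct guest_cpuid {
	uint32_t leaf7_ecx;
};

static inline bool guest_has_waitpkg(const struct guest_cpuid *g)
{
	assert(g != NULL);
	return (g->leaf7_ecx & CPUID_7_0_ECX_WAITPKG) != 0;
}

/*
 * The "enable user wait and pause" control is turned on only if the
 * host CPU supports the control and the guest CPUID advertises WAITPKG.
 */
static inline bool usr_wait_pause_enabled(bool host_supports_ctrl,
					  const struct guest_cpuid *g)
{
	return host_supports_ctrl && guest_has_waitpkg(g);
}
```

With the control left off, the instructions raise #UD in the guest, so
a guest that was not given WAITPKG in CPUID cannot put the physical CPU
into C0.1 or C0.2.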

The instructions are documented in the
Intel 64 and IA-32 Architectures Software Developer's Manual:
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

Changelog:
v7:
Add nested support for user wait instructions (Paolo)
Use the test on vmx->secondary_exec_control to replace
guest_cpuid_has (Paolo)
v6:
add check msr_info->host_initiated in get/set msr(Xiaoyao)
restore the atomic_switch_umwait_control_msr()(Xiaoyao)

Tao Xu (3):
  KVM: x86: add support for user wait instructions
  KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL
  KVM: vmx: handle vm-exit for UMWAIT and TPAUSE

 arch/x86/include/asm/vmx.h  |  1 +
 arch/x86/include/uapi/asm/vmx.h |  6 ++-
 arch/x86/kernel/cpu/umwait.c|  3 +-
 arch/x86/kvm/cpuid.c|  2 +-
 arch/x86/kvm/vmx/nested.c   |  4 ++
 arch/x86/kvm/vmx/vmx.c  | 69 +
 arch/x86/kvm/vmx/vmx.h  |  9 +
 arch/x86/kvm/x86.c  |  1 +
 8 files changed, 92 insertions(+), 3 deletions(-)

-- 
2.20.1

Re: linux-next: build failure after merge of the char-misc tree

2019-07-12 Thread Greg KH

On Fri, Jul 12, 2019 at 10:44:30AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> On Mon, 8 Jul 2019 19:23:45 +1000 Stephen Rothwell  
> wrote:
> > 
> > After merging the char-misc tree, today's linux-next build (x86_64
> > allmodconfig) failed like this:
> > 
> > drivers/misc/vmw_balloon.c: In function 'vmballoon_mount':
> > drivers/misc/vmw_balloon.c:1736:14: error: 'simple_dname' undeclared (first 
> > use in this function); did you mean 'simple_rename'?
> >.d_dname = simple_dname,
> >   ^~~~
> >   simple_rename
> > drivers/misc/vmw_balloon.c:1736:14: note: each undeclared identifier is 
> > reported only once for each function it appears in
> > drivers/misc/vmw_balloon.c:1739:9: error: implicit declaration of function 
> > 'mount_pseudo'; did you mean 'mount_bdev'? 
> > [-Werror=implicit-function-declaration]
> >   return mount_pseudo(fs_type, "balloon-vmware:", NULL, &ops,
> >  ^~~~
> >  mount_bdev
> > drivers/misc/vmw_balloon.c:1739:9: warning: returning 'int' from a function 
> > with return type 'struct dentry *' makes pointer from integer without a 
> > cast [-Wint-conversion]
> >   return mount_pseudo(fs_type, "balloon-vmware:", NULL, &ops,
> >  ^~~~
> > BALLOON_VMW_MAGIC);
> > ~~
> > 
> > Caused by commit
> > 
> >   83a8afa72e9c ("vmw_balloon: Compaction support")
> > 
> > interacting with commits
> > 
> >   7e5f7bb08b8c ("unexport simple_dname()")
> >   8d9e46d80777 ("fold mount_pseudo_xattr() into pseudo_fs_get_tree()")
> > 
> > from the vfs tree.
> > 
> > I applied the following merge fix patch:
> > 
> > From: Stephen Rothwell 
> > Date: Mon, 8 Jul 2019 19:17:56 +1000
> > Subject: [PATCH] convert vmwballoon to use the new mount API
> > 
> > Signed-off-by: Stephen Rothwell 
> > ---
> >  drivers/misc/vmw_balloon.c | 14 --
> >  1 file changed, 4 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
> > index 91fa43051535..e8c0f7525f13 100644
> > --- a/drivers/misc/vmw_balloon.c
> > +++ b/drivers/misc/vmw_balloon.c
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1728,21 +1729,14 @@ static inline void vmballoon_debugfs_exit(struct 
> > vmballoon *b)
> >  
> >  #ifdef CONFIG_BALLOON_COMPACTION
> >  
> > -static struct dentry *vmballoon_mount(struct file_system_type *fs_type,
> > - int flags, const char *dev_name,
> > - void *data)
> > +static int vmballoon_init_fs_context(struct fs_context *fc)
> >  {
> > -   static const struct dentry_operations ops = {
> > -   .d_dname = simple_dname,
> > -   };
> > -
> > -   return mount_pseudo(fs_type, "balloon-vmware:", NULL, &ops,
> > -   BALLOON_VMW_MAGIC);
> > +   return init_pseudo(fc, BALLOON_VMW_MAGIC) ? 0 : -ENOMEM;
> >  }
> >  
> >  static struct file_system_type vmballoon_fs = {
> > .name   = "balloon-vmware",
> > -   .mount  = vmballoon_mount,
> > +   .init_fs_context  = vmballoon_init_fs_context,
> > .kill_sb= kill_anon_super,
> >  };
> >  
> 
> This is now a conflict between the vfs tree and Linus' tree.

Looks good to me, I'll watch out for this when Al's tree is merged.

thanks,

greg k-h

Re: Staging status of speakup

2019-07-12 Thread Greg Kroah-Hartman

On Sun, Jul 07, 2019 at 08:57:10AM +0200, Greg Kroah-Hartman wrote:
> On Sat, Jul 06, 2019 at 08:08:57PM +0100, Okash Khawaja wrote:
> > On Fri, 15 Mar 2019 20:18:31 -0700
> > Greg Kroah-Hartman  wrote:
> > 
> > > On Fri, Mar 15, 2019 at 01:01:27PM +0000, Okash Khawaja wrote:
> > > > Hi,
> > > > 
> > > > We have made progress on the items in TODO file of speakup driver in
> > > > staging directory and wanted to get some clarity on the remaining
> > > > items. Below is a summary of status of each item along with the
> > > > quotes from TODO file.
> > > > 
> > > > 1. "The first issue has to do with the way speakup communicates
> > > > with serial ports.  Currently, we communicate directly with the
> > > > hardware ports. This however conflicts with the standard serial
> > > > port drivers, which poses various problems. This is also not
> > > > working for modern hardware such as PCI-based serial ports.  Also,
> > > > there is not a way we can communicate with USB devices.  The
> > > > current serial port handling code is in serialio.c in this
> > > > directory."
> > > > 
> > > > Drivers for all external synths now use TTY to communicate with the
> > > > devices. Only ones still using direct communication with hardware
> > > > ports are internal synths: acntpc, decpc, dtlk and keypc. These are
> > > > typically ISA cards and generally hardware which is difficult to
> > > > make work. We can leave these in staging.  
> > > 
> > > Ok, that's fine.
> > > 
> > > > 2. "Some places are currently using in_atomic() because speakup
> > > > functions are called in various contexts, and a couple of things
> > > > can't happen in these cases. Pushing work to some worker thread
> > > > would probably help, as was already done for the serial port
> > > > driving part."
> > > > 
> > > > There aren't any uses of in_atomic anymore. Commit d7500135802c
> > > > "Staging: speakup: Move pasting into a work item" was the last one
> > > > that removed such uses.  
> > > 
> > > Great, let's remove that todo item then.
> > > 
> > > > 3. "There is a duplication of the selection functions in
> > > > selections.c. These functions should get exported from
> > > > drivers/char/selection.c (clear_selection notably) and used from
> > > > there instead."
> > > > 
> > > > This is yet to be done. I guess drivers/char/selection.c is now
> > > > under drivers/tty/vt/selection.c.  
> > > 
> > > Yes, someone should update the todo item :)
> > > 
> > > > 4. "The kobjects may have to move to a more proper place in /sys. The
> > > > discussion on lkml resulted to putting speech synthesizers in the
> > > > "speech" class, and the speakup screen reader itself
> > > > into /sys/class/vtconsole/vtcon0/speakup, the nasty path being
> > > > handled by userland tools."
> > > > 
> > > > Although this makes logical sense, the change will mean changing
> > > > interface with userspace and hence the user space tools. I tried to
> > > > search the lkml discussion but couldn't find it. It will be good to
> > > > know your thoughts on this.  
> > > 
> > > I don't remember, sorry.  I can review the kobject/sysfs usage if you
> > > think it is "good enough" now and see if I find anything
> > > objectionable.
> > > 
> > > > Finally there is an issue where text in output buffer sometimes gets
> > > > garbled on SMP systems, but we can continue working on it after the
> > > > driver is moved out of staging, if that's okay. Basically we need a
> > > > reproducer of this issue.
> > > > 
> > > > In addition to above, there are likely code style issues which will
> > > > need to be fixed.
> > > > 
> > > > We are very keen to get speakup out of staging, both for settling
> > > > the driver and for getting it included in distros which build
> > > > only the mainline drivers.
> > > 
> > > That's great, I am glad to see this happen.  How about work on the
> > > selection thing and then I can review the kobject stuff in a few
> > > weeks, and then we can start moving things for 5.2?
> > 
> > Hi Greg,
> > 
> > Apologies for the delay. I de-duplicated selection code in speakup to
> > use code that's already in kernel (commit ids 496124e5e16e and
> > 41f13084506a). Following items are what remain now:
> > 
> > 1. moving kobjects location
> > 2. fixing garbled text
> > 
> > I couldn't replicate garbled text but Simon (also in CC list) is
> > looking into it.
> > 
> > Can you please advise on the way forward?
> 
> I don't think the "garbled text" is an issue to get this out of staging
> if others do not see this.  It can be fixed like any other bug at a
> later point if it is figured out.
> 
> The kobject stuff does need to be looked at.  Let me carve out some time
> next week to do that and I will let you know what I see/recommend.

At first glance, this might all be just fine.

But, I can't quite figure out what some files are doing.  No matter
what, you will need Documentation/ABI/ entries for the speakup code for
these sysfs files.

Can you make up a patch to create a
drivers/

Re: [PATCH v3] media: si2168: Refactor command setup code

2019-07-12 Thread Uwe Kleine-König

Hello,

On Thu, Jul 04, 2019 at 12:33:22PM +0200, Marc Gonzalez wrote:
> Refactor the command setup code, and let the compiler determine
> the size of each command.
> 
> Reviewed-by: Jonathan Neuschäfer 
> Signed-off-by: Marc Gonzalez 
> ---
> Changes from v1:
> - Use a real function to populate struct si2168_cmd *cmd, and a trivial
> macro wrapping it (macro because sizeof).
> Changes from v2:
> - Fix header mess
> - Add Jonathan's tag
> ---
>  drivers/media/dvb-frontends/si2168.c | 146 +--
>  1 file changed, 45 insertions(+), 101 deletions(-)
> 
> diff --git a/drivers/media/dvb-frontends/si2168.c 
> b/drivers/media/dvb-frontends/si2168.c
> index c64b360ce6b5..5e81e076369c 100644
> --- a/drivers/media/dvb-frontends/si2168.c
> +++ b/drivers/media/dvb-frontends/si2168.c
> @@ -12,6 +12,16 @@
>  
>  static const struct dvb_frontend_ops si2168_ops;
>  
> +static void cmd_setup(struct si2168_cmd *cmd, char *args, int wlen, int rlen)

I'd add an "inline" here. And you could add a const for *args.

> +{
> + memcpy(cmd->args, args, wlen);
> + cmd->wlen = wlen;
> + cmd->rlen = rlen;
> +}
> +
> +#define CMD_SETUP(cmd, args, rlen) \
> + cmd_setup(cmd, args, sizeof(args) - 1, rlen)

Here is the chance to add some static checking. Also it is a good habit
to put parens around macro arguments.

Something like:

#define CMD_SETUP(cmd, args, rlen) ({ \
	BUILD_BUG_ON(sizeof((args)) - 1 > SI2168_ARGLEN); \
	cmd_setup((cmd), (args), __must_be_array((args)) + sizeof((args)) - 1, \
		  (rlen)); \
})

Maybe let this macro live in drivers/media/dvb-frontends/si2168_priv.h
where struct si2168_cmd is defined?
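
For readers who want to try the idea outside the kernel, here is a
minimal user-space sketch of the same pattern. It swaps BUILD_BUG_ON for
C11 _Static_assert and omits __must_be_array; struct cmd and ARGLEN are
stand-ins invented for this example, not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARGLEN 30	/* stand-in for SI2168_ARGLEN; assumed value */

/* Stand-in for struct si2168_cmd. */
struct cmd {
	unsigned char args[ARGLEN];
	unsigned int wlen;
	unsigned int rlen;
};

static inline void cmd_setup(struct cmd *cmd, const char *args,
			     int wlen, int rlen)
{
	/* Runtime guard for callers that bypass the macro. */
	assert(wlen >= 0 && wlen <= ARGLEN);
	memcpy(cmd->args, args, (size_t)wlen);
	cmd->wlen = (unsigned int)wlen;
	cmd->rlen = (unsigned int)rlen;
}

/*
 * sizeof(args) - 1 drops the string literal's trailing NUL, so the
 * compiler computes the command length. An oversized literal fails
 * to compile instead of overflowing cmd->args.
 */
#define CMD_SETUP(cmd, args, rlen) do {					\
	_Static_assert(sizeof(args) - 1 <= ARGLEN, "command too long");	\
	cmd_setup((cmd), (args), sizeof(args) - 1, (rlen));		\
} while (0)
```

A call such as CMD_SETUP(&c, "\x12\x00", 6) then records a write length
of 2 without the caller having to count bytes by hand.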

I looked over the transformations in the rest of the patch and this
looks good.

Best regards
Uwe



Re: [PATCH v2] printk: Do not lose last line in kmsg buffer dump

2019-07-12 Thread Vincent Whitchurch

On Fri, Jul 12, 2019 at 10:09:04AM +0200, Petr Mladek wrote:
> The patch looks like a hack using a hole that the next cycle
> no longer checks the number of really stored characters.
> 
> What would happen when msg_print_text() starts adding
> the trailing '\0' as suggested by
> https://lkml.kernel.org/r/20190710121049.rwhk7fknfzn3c...@pathway.suse.cz

I did have a look at that possibility, but I didn't see how that could
work without potentially affecting userspace users of the syslog ABI.
AFAICS the suggested change in msg_print_text() can be done in one of
three ways:

 (1) msg_print_text() adds the '\0' and includes this length both when
 it estimates the size (NULL buffer) and when it actually prints:

 If we do this:
 - kmsg_dump_get_line_nolock() would have to subtract 1 from the len
   since its callers expect that len is always smaller than the
   size of the buffer.
 - The buffers given to use via the syslog interface will now include
   a '\0', potentially affecting userspace applications which use
   this ABI.
 
 (2) msg_print_text() adds the '\0', and includes this in the length
 only when estimating the size, and not when it actually prints.

 If we do this:
 - SYSLOG_ACTION_SIZE_UNREAD uses the size estimate to give
   userspace a count of how many characters are present in the
   buffer, and now this count will start differing from the actual
   count that can be read, potentially affecting userspace
   applications.

 (3) msg_print_text() adds the '\0', and does not include this length
 in the result at all.

 If we do this:
 - The original kmsg dump issue is not solved, since the last line
   is still lost.

> BTW: What is the motivation for this fix? Is it a bug report
> or just some research of possible buffer overflows?

The fix is not attempting to fix a buffer overflow, theoretical or
otherwise.

It's a fix for a functional bug which has been observed on our
systems: we use pstore to save the kernel log when the kernel crashes,
and sometimes the log in the pstore is missing the last line. Since the
last line usually says why we're panicking, it's rather important not
to lose it.

> The commit message pretends that the problem is bigger than
> it really is. It is about one byte and not one line.

I'm not quite sure I follow.  The current code does fail to include the
*entire* last line.

The memcpy on line #1294 is never executed for the last line because we
stop the loop because of the check on line #1289:

  1270  static size_t msg_print_text(const struct printk_log *msg, bool syslog, char *buf, size_t size)
  1271  {
  1272  const char *text = log_text(msg);
  1273  size_t text_size = msg->text_len;
  1274  size_t len = 0;
  1275  
  1276  do {
  1277  const char *next = memchr(text, '\n', text_size);
  1278  size_t text_len;
  1279  
  1280  if (next) {
  1281  text_len = next - text;
  1282  next++;
  1283  text_size -= next - text;
  1284  } else {
  1285  text_len = text_size;
  1286  }
  1287  
  1288  if (buf) {
  1289  if (print_prefix(msg, syslog, NULL) +
  1290  text_len + 1 > size - len)
  1291  break;
  1292  
  1293  len += print_prefix(msg, syslog, buf + len);
  1294  memcpy(buf + len, text, text_len);
  1295  len += text_len;
  1296  buf[len++] = '\n';
  1297  } else {
  1298  /* SYSLOG_ACTION_* buffer size only calculation */
  1299  len += print_prefix(msg, syslog, NULL);
  1300  len += text_len;
  1301  len++;
  1302  }

Re: linux-next: Fixes tag needs some work in the block tree

2019-07-12 Thread Minwoo Im

On 19-07-11 16:03:22, Jens Axboe wrote:
> On 7/11/19 3:35 PM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > In commit
> > 
> >8f3858763d33 ("nvme: fix NULL deref for fabrics options")
> > 
> > Fixes tag
> > 
> >Fixes: 958f2a0f8 ("nvme-tcp: set the STABLE_WRITES flag when data digests
> > 
> > has these problem(s):
> > 
> >- SHA1 should be at least 12 digits long
> >  Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
> >  or later) just making sure it is not set (or set to "auto").
> >- Subject has leading but no trailing parentheses
> >- Subject has leading but no trailing quotes
> > 
> > Please do not split Fixes tags over more than one line.  Also do not
> > include blank lines among the tags.

Sorry for the noise.  I will keep that in mind.

Thanks Stephen,

> 
> I should have caught that. Since it's top-of-tree and recent, I'll
> amend it.

Jens, I will get it right next time.  Thanks for amending it.

[PATCH] staging: android: ion: Remove unused rbtree for ion_buffer

2019-07-12 Thread Lecopzer Chen

ion_buffer_add() inserts the ion_buffer into an rbtree every time an
ion_buffer is created, but the tree has not been used since the ION rework.
buffer_lock protects only the rbtree operations, so remove it as well.

Signed-off-by: Lecopzer Chen 
Cc: YJ Chiang 
Cc: Lecopzer Chen 
---
 drivers/staging/android/ion/ion.c | 36 ---
 drivers/staging/android/ion/ion.h | 10 +
 2 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c 
b/drivers/staging/android/ion/ion.c
index 92c2914239e3..e6b1ca141b93 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -29,32 +29,6 @@
 static struct ion_device *internal_dev;
 static int heap_id;
 
-/* this function should only be called while dev->lock is held */
-static void ion_buffer_add(struct ion_device *dev,
-  struct ion_buffer *buffer)
-{
-   struct rb_node **p = &dev->buffers.rb_node;
-   struct rb_node *parent = NULL;
-   struct ion_buffer *entry;
-
-   while (*p) {
-   parent = *p;
-   entry = rb_entry(parent, struct ion_buffer, node);
-
-   if (buffer < entry) {
-   p = &(*p)->rb_left;
-   } else if (buffer > entry) {
-   p = &(*p)->rb_right;
-   } else {
-   pr_err("%s: buffer already found.", __func__);
-   BUG();
-   }
-   }
-
-   rb_link_node(&buffer->node, parent, p);
-   rb_insert_color(&buffer->node, &dev->buffers);
-}
-
 /* this function should only be called while dev->lock is held */
 static struct ion_buffer *ion_buffer_create(struct ion_heap *heap,
struct ion_device *dev,
@@ -100,9 +74,6 @@ static struct ion_buffer *ion_buffer_create(struct ion_heap 
*heap,
 
INIT_LIST_HEAD(&buffer->attachments);
mutex_init(&buffer->lock);
-   mutex_lock(&dev->buffer_lock);
-   ion_buffer_add(dev, buffer);
-   mutex_unlock(&dev->buffer_lock);
return buffer;
 
 err1:
@@ -131,11 +102,6 @@ void ion_buffer_destroy(struct ion_buffer *buffer)
 static void _ion_buffer_destroy(struct ion_buffer *buffer)
 {
struct ion_heap *heap = buffer->heap;
-   struct ion_device *dev = buffer->dev;
-
-   mutex_lock(&dev->buffer_lock);
-   rb_erase(&buffer->node, &dev->buffers);
-   mutex_unlock(&dev->buffer_lock);
 
if (heap->flags & ION_HEAP_FLAG_DEFER_FREE)
ion_heap_freelist_add(heap, buffer);
@@ -694,8 +660,6 @@ static int ion_device_create(void)
}
 
idev->debug_root = debugfs_create_dir("ion", NULL);
-   idev->buffers = RB_ROOT;
-   mutex_init(&idev->buffer_lock);
init_rwsem(&idev->lock);
plist_head_init(&idev->heaps);
internal_dev = idev;
diff --git a/drivers/staging/android/ion/ion.h 
b/drivers/staging/android/ion/ion.h
index e291299fd35f..74914a266e25 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -23,7 +23,6 @@
 
 /**
  * struct ion_buffer - metadata for a particular buffer
- * @node:  node in the ion_device buffers tree
  * @list:  element in list of deferred freeable buffers
  * @dev:   back pointer to the ion_device
  * @heap:  back pointer to the heap the buffer came from
@@ -39,10 +38,7 @@
  * @attachments:   list of devices attached to this buffer
  */
 struct ion_buffer {
-   union {
-   struct rb_node node;
-   struct list_head list;
-   };
+   struct list_head list;
struct ion_device *dev;
struct ion_heap *heap;
unsigned long flags;
@@ -61,14 +57,10 @@ void ion_buffer_destroy(struct ion_buffer *buffer);
 /**
  * struct ion_device - the metadata of the ion device node
  * @dev:   the actual misc device
- * @buffers:   an rb tree of all the existing buffers
- * @buffer_lock:   lock protecting the tree of buffers
  * @lock:  rwsem protecting the tree of heaps and clients
  */
 struct ion_device {
struct miscdevice dev;
-   struct rb_root buffers;
-   struct mutex buffer_lock;
struct rw_semaphore lock;
struct plist_head heaps;
struct dentry *debug_root;
-- 
2.17.1

[PATCH] mm: sparse: Skip no-map regions in memblocks_present

2019-07-12 Thread KarimAllah Ahmed

Do not mark regions that are flagged as nomap as present; otherwise
these memblock regions cause unnecessary allocation of metadata.
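
The effect of the skip can be modelled in user space as follows. This is
only a sketch: struct region, REGION_NOMAP and the page-counting helper
are hypothetical stand-ins for the memblock structures, and counting
pages is a simplification of what memory_present() actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REGION_NOMAP 0x4u	/* hypothetical flag value for this sketch */

/* Stand-in for a memblock region, expressed in page frame numbers. */
struct region {
	unsigned long base_pfn;
	unsigned long end_pfn;	/* exclusive */
	unsigned int flags;
};

static inline bool region_is_nomap(const struct region *r)
{
	return (r->flags & REGION_NOMAP) != 0;
}

/* Count the pages that would be marked present, skipping no-map regions. */
static inline unsigned long count_present_pfns(const struct region *regs,
						size_t n)
{
	unsigned long total = 0;
	size_t i;

	assert(regs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (region_is_nomap(&regs[i]))
			continue;
		total += regs[i].end_pfn - regs[i].base_pfn;
	}
	return total;
}
```

In this model a no-map region contributes nothing to the total, which
corresponds to no metadata being set up for it.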

Cc: Andrew Morton 
Cc: Pavel Tatashin 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: Baoquan He 
Cc: Qian Cai 
Cc: Wei Yang 
Cc: Logan Gunthorpe 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 mm/sparse.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index fd13166..33810b6 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -256,6 +256,10 @@ void __init memblocks_present(void)
struct memblock_region *reg;
 
for_each_memblock(memory, reg) {
+
+   if (memblock_is_nomap(reg))
+   continue;
+
memory_present(memblock_get_region_node(reg),
   memblock_region_memory_base_pfn(reg),
   memblock_region_memory_end_pfn(reg));
-- 
2.7.4
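
As a rough illustration of the skip described in this patch, here is a small userspace sketch in C. The struct region type, the REGION_NOMAP flag value and count_present_pfns() are hypothetical stand-ins, not the kernel's memblock API; the sketch only shows flagged regions being passed over before any per-region work is done.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's memblock types. */
#define REGION_NOMAP 0x4u	/* illustrative flag value, assumed for this sketch */

struct region {
	unsigned long base_pfn;
	unsigned long end_pfn;
	unsigned int flags;
};

static bool region_is_nomap(const struct region *reg)
{
	return (reg->flags & REGION_NOMAP) != 0;
}

/* Count the page frames that would be marked present, skipping no-map regions. */
static unsigned long count_present_pfns(const struct region *regs, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (region_is_nomap(&regs[i]))
			continue;	/* no metadata for regions that must not be mapped */
		total += regs[i].end_pfn - regs[i].base_pfn;
	}
	return total;
}
```

With three regions of 16, 16 and 8 page frames where the middle one is flagged, only the outer two are counted.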

[PATCH] rdma/siw: avoid smp_store_mb() on a u64

2019-07-12 Thread Arnd Bergmann

The new siw driver fails to build on i386 with

drivers/infiniband/sw/siw/siw_qp.c:1025:3: error: invalid output size for 
constraint '+q'
smp_store_mb(*cq->notify, SIW_NOTIFY_NOT);
^
include/asm-generic/barrier.h:141:35: note: expanded from macro 'smp_store_mb'
 #define smp_store_mb(var, value)  __smp_store_mb(var, value)
  ^
arch/x86/include/asm/barrier.h:65:47: note: expanded from macro '__smp_store_mb'
 #define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
  ^
include/asm-generic/atomic-instrumented.h:1648:2: note: expanded from macro 
'xchg'
arch_xchg(__ai_ptr, __VA_ARGS__);   \
^
arch/x86/include/asm/cmpxchg.h:78:27: note: expanded from macro 'arch_xchg'
 #define arch_xchg(ptr, v)   __xchg_op((ptr), (v), xchg, "")
^
arch/x86/include/asm/cmpxchg.h:48:19: note: expanded from macro '__xchg_op'
  : "+q" (__ret), "+m" (*(ptr)) \
  ^
drivers/infiniband/sw/siw/siw_qp.o: In function `siw_sqe_complete':
siw_qp.c:(.text+0x1450): undefined reference to `__xchg_wrong_size'
drivers/infiniband/sw/siw/siw_qp.o: In function `siw_rqe_complete':
siw_qp.c:(.text+0x15b0): undefined reference to `__xchg_wrong_size'
drivers/infiniband/sw/siw/siw_verbs.o: In function `siw_req_notify_cq':
siw_verbs.c:(.text+0x18ff): undefined reference to `__xchg_wrong_size'

smp_store_mb() has to be an atomic store, but the architecture
can only do this on 32-bit quantities or smaller, and 'cq->notify'
is a 64-bit word.

Apparently the smp_store_mb() is paired with a READ_ONCE() here, which
seems like an odd choice because there is only a barrier on the writer
side and not the reader, and READ_ONCE() is already not atomic on
quantities larger than a CPU register.

I suspect it is sufficient to use the (possibly nonatomic) WRITE_ONCE()
and an SMP memory barrier here. If the access does need to be atomic on a
64-bit quantity, using atomic64_set_release()/atomic64_read_acquire()
may be a better choice.

Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Cc: Peter Zijlstra 
Signed-off-by: Arnd Bergmann 
---
 drivers/infiniband/sw/siw/siw_qp.c| 4 +++-
 drivers/infiniband/sw/siw/siw_verbs.c | 5 +++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp.c 
b/drivers/infiniband/sw/siw/siw_qp.c
index 11383d9f95ef..a2c08f17f13d 100644
--- a/drivers/infiniband/sw/siw/siw_qp.c
+++ b/drivers/infiniband/sw/siw/siw_qp.c
@@ -1016,13 +1016,15 @@ static bool siw_cq_notify_now(struct siw_cq *cq, u32 
flags)
if (!cq->base_cq.comp_handler)
return false;
 
+   smp_rmb();
cq_notify = READ_ONCE(*cq->notify);
 
if ((cq_notify & SIW_NOTIFY_NEXT_COMPLETION) ||
((cq_notify & SIW_NOTIFY_SOLICITED) &&
 (flags & SIW_WQE_SOLICITED))) {
/* dis-arm CQ */
-   smp_store_mb(*cq->notify, SIW_NOTIFY_NOT);
+   WRITE_ONCE(*cq->notify, SIW_NOTIFY_NOT);
+   smp_wmb();
 
return true;
}
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c 
b/drivers/infiniband/sw/siw/siw_verbs.c
index 32dc79d0e898..41c5ab293fe1 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -1142,10 +1142,11 @@ int siw_req_notify_cq(struct ib_cq *base_cq, enum 
ib_cq_notify_flags flags)
 
if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
/* CQ event for next solicited completion */
-   smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
+   WRITE_ONCE(*cq->notify, SIW_NOTIFY_SOLICITED);
else
/* CQ event for any signalled completion */
-   smp_store_mb(*cq->notify, SIW_NOTIFY_ALL);
+   WRITE_ONCE(*cq->notify, SIW_NOTIFY_ALL);
+   smp_wmb();
 
if (flags & IB_CQ_REPORT_MISSED_EVENTS)
return cq->cq_put - cq->cq_get;
-- 
2.20.0
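
The release/acquire alternative mentioned in the commit message can be sketched in userspace with C11 atomics. This is an analogy only: it uses <stdatomic.h> rather than the kernel's atomic64_t helpers, and the NOTIFY_* values, notify_word and the function names are hypothetical. Note that the read followed by the disarming store is not one atomic step, which matches the structure of the driver code shown in the diff.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical notification states, loosely modelled on the siw flags. */
enum { NOTIFY_NOT = 0, NOTIFY_SOLICITED = 1, NOTIFY_NEXT_COMPLETION = 2 };

static _Atomic uint64_t notify_word;

/* Writer: a release store orders earlier writes before the new state. */
static void arm_notify(uint64_t state)
{
	atomic_store_explicit(&notify_word, state, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above. */
static uint64_t read_notify(void)
{
	return atomic_load_explicit(&notify_word, memory_order_acquire);
}

/* Returns 1 and disarms if a notification is due, else 0. */
static int notify_now(int solicited)
{
	uint64_t v = read_notify();

	if ((v & NOTIFY_NEXT_COMPLETION) ||
	    ((v & NOTIFY_SOLICITED) && solicited)) {
		arm_notify(NOTIFY_NOT);
		return 1;
	}
	return 0;
}
```

On targets without native 64-bit atomics a C11 toolchain may fall back to a library implementation, which is the userspace counterpart of the size restriction the build error points at.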

[PATCH] rdma/siw: select CONFIG_DMA_VIRT_OPS

2019-07-12 Thread Arnd Bergmann

Without this symbol we get a link failure:

ERROR: "dma_virt_ops" [drivers/infiniband/sw/siw/siw.ko] undefined!

Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface")
Signed-off-by: Arnd Bergmann 
---
 drivers/infiniband/sw/siw/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/sw/siw/Kconfig 
b/drivers/infiniband/sw/siw/Kconfig
index 94f684174ce3..ea282789f466 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -1,6 +1,7 @@
 config RDMA_SIW
tristate "Software RDMA over TCP/IP (iWARP) driver"
depends on INET && INFINIBAND && CRYPTO_CRC32
+   select DMA_VIRT_OPS
help
This driver implements the iWARP RDMA transport over
the Linux TCP/IP network stack. It enables a system with a
-- 
2.20.0

[PATCH] rdma/siw: fix enum type mismatch warnings

2019-07-12 Thread Arnd Bergmann

The values in map_cqe_status[] don't match the type:

drivers/infiniband/sw/siw/siw_cq.c:31:4: error: implicit conversion from 
enumeration type 'enum siw_wc_status' to different enumeration type 'enum 
siw_opcode' [-Werror,-Wenum-conversion]
{ SIW_WC_SUCCESS, IB_WC_SUCCESS },
~ ^~
drivers/infiniband/sw/siw/siw_cq.c:32:4: error: implicit conversion from 
enumeration type 'enum siw_wc_status' to different enumeration type 'enum 
siw_opcode' [-Werror,-Wenum-conversion]
{ SIW_WC_LOC_LEN_ERR, IB_WC_LOC_LEN_ERR },
~ ^~

Change the struct definition to make them match and stop the
warning.

Fixes: b0fff7317bb4 ("rdma/siw: completion queue methods")
Signed-off-by: Arnd Bergmann 
---
 drivers/infiniband/sw/siw/siw_cq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/siw/siw_cq.c 
b/drivers/infiniband/sw/siw/siw_cq.c
index e2a0ee40d5b5..e381ae9b7d62 100644
--- a/drivers/infiniband/sw/siw/siw_cq.c
+++ b/drivers/infiniband/sw/siw/siw_cq.c
@@ -25,7 +25,7 @@ static int map_wc_opcode[SIW_NUM_OPCODES] = {
 };
 
 static struct {
-   enum siw_opcode siw;
+   enum siw_wc_status siw;
enum ib_wc_status ib;
 } map_cqe_status[SIW_NUM_WC_STATUS] = {
{ SIW_WC_SUCCESS, IB_WC_SUCCESS },
-- 
2.20.0
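
To show the kind of mismatch the warning is about, here is a minimal standalone sketch of a status mapping table whose members are declared with the same enum types as their initializers. All names (drv_status, core_status, status_map, map_status) are made up for illustration and are not the siw or RDMA core definitions.

```c
#include <stddef.h>

/* Hypothetical driver-side and core-side status codes. */
enum drv_status { DRV_OK, DRV_LEN_ERR, DRV_NUM_STATUS };
enum core_status { CORE_OK = 0, CORE_LEN_ERR = 1, CORE_GENERAL_ERR = 21 };

/* Each member is declared with the enum its initializers actually use. */
static const struct {
	enum drv_status drv;
	enum core_status core;
} status_map[DRV_NUM_STATUS] = {
	{ DRV_OK, CORE_OK },
	{ DRV_LEN_ERR, CORE_LEN_ERR },
};

/* Translate a driver status, falling back to a generic error when out of range. */
static enum core_status map_status(enum drv_status s)
{
	if ((size_t)s >= DRV_NUM_STATUS)
		return CORE_GENERAL_ERR;
	return status_map[s].core;
}
```

Declaring the first member with an unrelated enum would still compile with most compilers, but clang's -Wenum-conversion flags every initializer, as in the log above.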

[PATCH] platform/x86: pcengines-apu2 needs gpiolib

2019-07-12 Thread Arnd Bergmann

I ran into another build issue in randconfig testing for this driver,
when CONFIG_GPIOLIB is not set:

WARNING: unmet direct dependencies detected for GPIO_AMD_FCH
  Depends on [n]: GPIOLIB [=n] && HAS_IOMEM [=y]
  Selected by [y]:
  - PCENGINES_APU2 [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && INPUT [=y] 
&& INPUT_KEYBOARD [=y] && LEDS_CLASS [=y]

WARNING: unmet direct dependencies detected for KEYBOARD_GPIO_POLLED
  Depends on [n]: !UML && INPUT [=y] && INPUT_KEYBOARD [=y] && GPIOLIB [=n]
  Selected by [y]:
  - PCENGINES_APU2 [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && INPUT [=y] 
&& INPUT_KEYBOARD [=y] && LEDS_CLASS [=y]

Make the 'select' statements conditional on that so we don't have to
introduce another 'select'.

Fixes: f8eb0235f659 ("x86: pcengines apuv2 gpio/leds/keys platform driver")
Fixes: a422bf11bdb4 ("platform/x86: fix PCENGINES_APU2 Kconfig warning")
Signed-off-by: Arnd Bergmann 
---
 drivers/platform/x86/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index e869a5c760b6..cf48b9068843 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -1324,8 +1324,8 @@ config PCENGINES_APU2
tristate "PC Engines APUv2/3 front button and LEDs driver"
depends on INPUT && INPUT_KEYBOARD
depends on LEDS_CLASS
-   select GPIO_AMD_FCH
-   select KEYBOARD_GPIO_POLLED
+   select GPIO_AMD_FCH if GPIOLIB
+   select KEYBOARD_GPIO_POLLED if GPIOLIB
select LEDS_GPIO
help
  This driver provides support for the front button and LEDs on
-- 
2.20.0

Re: BUG: MAX_STACK_TRACE_ENTRIES too low! (2)

2019-07-12 Thread Peter Zijlstra

On Thu, Jul 11, 2019 at 11:53:12AM -0700, Bart Van Assche wrote:
> On 7/10/19 3:09 PM, Peter Zijlstra wrote:
> > One thing I mentioned when Thomas did the unwinder API changes was
> > trying to move lockdep over to something like stackdepot.
> > 
> > We can't directly use stackdepot as is, because it uses locks and memory
> > allocation, but we could maybe add a lower level API to it and use that
> > under the graph_lock() on static storage or something.
> > 
> > Otherwise we'll have to (re)implement something like it.
> > 
> > I've not looked at it in detail.
> 
> Hi Peter,
> 
> Is something like the untested patch below perhaps what you had in mind?

Most excellent, yes! Now I suppose the $64000 question is if it actually
reduces the amount of storage we use for stack traces..

Seems to boot just fine.. :-)

[PATCH] ASoC: audio-graph-card: fix type mismatch warning

2019-07-12 Thread Arnd Bergmann

The new temporary variable lacks a 'const' annotation:

sound/soc/generic/audio-graph-card.c:87:7: error: assigning to 'u32 *' (aka 
'unsigned int *') from 'const void *' discards qualifiers 
[-Werror,-Wincompatible-pointer-types-discards-qualifiers]

Fixes: c152f8491a8d ("ASoC: audio-graph-card: fix an use-after-free in 
graph_get_dai_id()")
Signed-off-by: Arnd Bergmann 
---
 sound/soc/generic/audio-graph-card.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/generic/audio-graph-card.c 
b/sound/soc/generic/audio-graph-card.c
index c8abb86afefa..288df245b2f0 100644
--- a/sound/soc/generic/audio-graph-card.c
+++ b/sound/soc/generic/audio-graph-card.c
@@ -63,7 +63,7 @@ static int graph_get_dai_id(struct device_node *ep)
struct device_node *endpoint;
struct of_endpoint info;
int i, id;
-   u32 *reg;
+   const u32 *reg;
int ret;
 
/* use driver specified DAI ID if exist */
-- 
2.20.0
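
The qualifier problem can be reproduced outside the kernel with a getter that returns read-only data. In this sketch get_prop() and read_reg() are hypothetical helpers, not the device-tree API; the only point is that the receiving pointer carries the same const as the value assigned to it.

```c
#include <stdint.h>

/* Hypothetical lookup that, like a property getter, returns read-only data. */
static const void *get_prop(const char *name)
{
	static const uint32_t reg_value = 3;

	(void)name;	/* a real lookup would search by name */
	return &reg_value;
}

static int read_reg(void)
{
	const uint32_t *reg;	/* const matches the qualifier of the source pointer */

	reg = get_prop("reg");
	if (!reg)
		return -1;
	return (int)*reg;
}
```

Dropping the const from the local declaration gives the same class of "discards qualifiers" diagnostic that the patch fixes.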

[PATCH 1/2] f2fs: introduce {page,io}_is_mergeable() for readability

2019-07-12 Thread Chao Yu

Wrap merge condition into function for readability, no logic change.

Signed-off-by: Chao Yu 
---
v2: remove bio validation check in page_is_mergeable().
 fs/f2fs/data.c | 40 +---
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6a8db4abdf5f..f1e401f9fc13 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -482,6 +482,33 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
return 0;
 }
 
+static bool page_is_mergeable(struct f2fs_sb_info *sbi, struct bio *bio,
+   block_t last_blkaddr, block_t cur_blkaddr)
+{
+   if (last_blkaddr + 1 != cur_blkaddr)
+   return false;
+   return __same_bdev(sbi, cur_blkaddr, bio);
+}
+
+static bool io_type_is_mergeable(struct f2fs_bio_info *io,
+   struct f2fs_io_info *fio)
+{
+   if (io->fio.op != fio->op)
+   return false;
+   return io->fio.op_flags == fio->op_flags;
+}
+
+static bool io_is_mergeable(struct f2fs_sb_info *sbi, struct bio *bio,
+   struct f2fs_bio_info *io,
+   struct f2fs_io_info *fio,
+   block_t last_blkaddr,
+   block_t cur_blkaddr)
+{
+   if (!page_is_mergeable(sbi, bio, last_blkaddr, cur_blkaddr))
+   return false;
+   return io_type_is_mergeable(io, fio);
+}
+
 int f2fs_merge_page_bio(struct f2fs_io_info *fio)
 {
struct bio *bio = *fio->bio;
@@ -495,8 +522,8 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
trace_f2fs_submit_page_bio(page, fio);
f2fs_trace_ios(fio, 0);
 
-   if (bio && (*fio->last_block + 1 != fio->new_blkaddr ||
-   !__same_bdev(fio->sbi, fio->new_blkaddr, bio))) {
+   if (bio && !page_is_mergeable(fio->sbi, bio, *fio->last_block,
+   fio->new_blkaddr)) {
__submit_bio(fio->sbi, bio, fio->type);
bio = NULL;
}
@@ -569,9 +596,8 @@ void f2fs_submit_page_write(struct f2fs_io_info *fio)
 
inc_page_count(sbi, WB_DATA_TYPE(bio_page));
 
-   if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 ||
-   (io->fio.op != fio->op || io->fio.op_flags != fio->op_flags) ||
-   !__same_bdev(sbi, fio->new_blkaddr, io->bio)))
+   if (io->bio && !io_is_mergeable(sbi, io->bio, io, fio,
+   io->last_block_in_bio, fio->new_blkaddr))
__submit_merged_bio(io);
 alloc_new:
if (io->bio == NULL) {
@@ -1643,8 +1669,8 @@ static int f2fs_read_single_page(struct inode *inode, 
struct page *page,
 * This page will go to BIO.  Do we need to send this
 * BIO off first?
 */
-   if (bio && (*last_block_in_bio != block_nr - 1 ||
-   !__same_bdev(F2FS_I_SB(inode), block_nr, bio))) {
+   if (bio && !page_is_mergeable(F2FS_I_SB(inode), bio,
+   *last_block_in_bio, block_nr)) {
 submit_and_realloc:
__submit_bio(F2FS_I_SB(inode), bio, DATA);
bio = NULL;
-- 
2.18.0.rc1
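
The idea of wrapping the merge condition into predicates can be sketched in standalone C. The sketch assumes that "mergeable" means the new block directly follows the last one on the same device, which is what the removed open-coded conditions test; same_device() with 1024 blocks per device is a made-up stand-in for __same_bdev(), and the op/flags arguments are plain ints rather than f2fs structures.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t block_t;

/* Hypothetical stand-in for __same_bdev(): 1024 blocks per device here. */
static bool same_device(block_t a, block_t b)
{
	return a / 1024 == b / 1024;
}

/* A page can join the bio only if it directly follows the last block. */
static bool page_mergeable(block_t last_blkaddr, block_t cur_blkaddr)
{
	if (last_blkaddr + 1 != cur_blkaddr)
		return false;
	return same_device(last_blkaddr, cur_blkaddr);
}

/* The operation and its flags must match as well. */
static bool io_type_mergeable(int op_a, int flags_a, int op_b, int flags_b)
{
	return op_a == op_b && flags_a == flags_b;
}

/* Combined check: contiguous on one device, and the same kind of I/O. */
static bool io_mergeable(block_t last, block_t cur,
			 int op_a, int flags_a, int op_b, int flags_b)
{
	if (!page_mergeable(last, cur))
		return false;
	return io_type_mergeable(op_a, flags_a, op_b, flags_b);
}
```

Keeping the "+ 1" inside the helper lets every caller pass the raw last and current block addresses, so the three call sites cannot drift apart.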

Re: [PATCH] arm: Extend the check for RAM in /dev/mem

2019-07-12 Thread Russell King - ARM Linux admin

On Fri, Jul 12, 2019 at 02:58:18AM +, Raslan, KarimAllah wrote:
> On Fri, 2019-07-12 at 08:06 +0530, Anshuman Khandual wrote:
> > 
> > On 07/12/2019 03:51 AM, KarimAllah Ahmed wrote:
> > > 
> > > Some valid RAM can live outside kernel control (e.g. using mem= kernel
> > > command-line). For these regions, pfn_valid would return "false" causing
> > > system RAM to be mapped as uncached. Use memblock instead to identify RAM.
> > 
> > Once the remaining memory is outside of the kernel (as the admin would have
> > intended with mem= command line) what is the particular concern regarding
> > the way those get mapped (cached or not) ? It is not to be used any way.
> 
> They can be used by user-space which might lead to them being used by the 
> kernel. One use-case would be using them as guest memory for KVM as I 
> detailed 
> here:
> 
> https://lwn.net/Articles/778240/

From the 32-bit ARM point of view...

What if someone's already doing something similar with a non-coherent
DSP and is relying on the current behaviour?  This change is a user
visible behavioural change that could end up breaking userspace.

In other words, it isn't something we should rush into.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

[PATCH] [net-next, netfilter] mlx5: avoid unused variable warning

2019-07-12 Thread Arnd Bergmann

Without CONFIG_MLX5_ESWITCH we get a harmless warning:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3467:21: error: unused 
variable 'priv' [-Werror,-Wunused-variable]
struct mlx5e_priv *priv = netdev_priv(dev);

Hide the declaration in the same #ifdef as its usage.

Fixes: 4e95bc268b91 ("net: flow_offload: add flow_block_cb_setup_simple()")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6d0ae87c8ded..b562ba904ea1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3464,7 +3464,9 @@ static LIST_HEAD(mlx5e_block_cb_list);
 static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
  void *type_data)
 {
+#ifdef CONFIG_MLX5_ESWITCH
struct mlx5e_priv *priv = netdev_priv(dev);
+#endif
 
switch (type) {
 #ifdef CONFIG_MLX5_ESWITCH
-- 
2.20.0
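
The pattern used by this patch, guarding a declaration with the same #ifdef as its only users, can be sketched as below. HAVE_FEATURE is a hypothetical build option standing in for a CONFIG_ symbol, and struct dev and setup() are made up for the example.

```c
#include <errno.h>

/* HAVE_FEATURE is a hypothetical build option standing in for a CONFIG_ symbol. */
struct dev {
	int feature_state;
};

static int setup(struct dev *dev, int type)
{
#ifdef HAVE_FEATURE
	int *state = &dev->feature_state;	/* only referenced when enabled */
#endif

	switch (type) {
#ifdef HAVE_FEATURE
	case 1:
		*state = 1;
		return 0;
#endif
	default:
		(void)dev;	/* keep the parameter used in both configurations */
		return -EOPNOTSUPP;
	}
}
```

Built without HAVE_FEATURE, neither the variable nor its user exists, so -Wunused-variable has nothing to report.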

Re: [PATCH 4/4] numa: introduce numa cling feature

2019-07-12 Thread 王贇




On 2019/7/12 3:53 PM, Peter Zijlstra wrote:
[snip]
return target;
  }
>>>
>>> Select idle sibling should never cross node boundaries and is thus the
>>> entirely wrong place to fix anything.
>>
>> Hmm.. in our early testing the printk show both select_task_rq_fair() and
>> task_numa_find_cpu() will call select_idle_sibling with prev and target on
>> different node, thus we pick this point to save few lines.
> 
> But it will never return @prev if it is not in the same cache domain as
> @target. See how everything is gated by:
> 
>   && cpus_share_cache(x, target)

Yeah, that's right.

> 
>> But if the semantics of select_idle_sibling() is to return cpu on the same
>> node of target, what about move the logical after select_idle_sibling() for
>> the two callers?
> 
> No, that's insane. You don't do select_idle_sibling() to then ignore the
> result. You have to change @target before calling select_idle_sibling().
> 

I see, we should not override the decision of select_idle_sibling().

Actually, the original design we were trying to achieve is:

  let wake affine select the target
  try find idle sibling of target
  if got one
pick it
  else if task cling to prev
pick prev

That is, to treat wake affine as superior to numa cling.

But on reflection this may not be necessary, since numa cling is
itself a kind of strong wake-affine hint, and maybe even a better
one for filtering out the bad cases.

I'll try changing @target instead and retest.

Regards,
Michael Wang

[PATCH] xen/trace: avoid clang warning on function pointers

2019-07-12 Thread Arnd Bergmann

clang-9 does not like the way that is_signed_type() compares
function pointers deep inside the trace event macros:

In file included from arch/x86/xen/trace.c:21:
In file included from include/trace/events/xen.h:475:
In file included from include/trace/define_trace.h:102:
In file included from include/trace/trace_events.h:467:
include/trace/events/xen.h:69:7: error: ordered comparison of function pointers 
('xen_mc_callback_fn_t' (aka 'void (*)(void *)') and 'xen_mc_callback_fn_t') 
[-Werror,-Wordered-compare-function-pointers]
__field(xen_mc_callback_fn_t, fn)
^
include/trace/trace_events.h:415:29: note: expanded from macro '__field'
 #define __field(type, item) __field_ext(type, item, FILTER_OTHER)
^
include/trace/trace_events.h:401:6: note: expanded from macro '__field_ext'
 is_signed_type(type), filter_type);\
 ^
include/linux/trace_events.h:540:44: note: expanded from macro 'is_signed_type'
 #define is_signed_type(type)(((type)(-1)) < (type)1)
  ^
note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see 
all)
include/trace/trace_events.h:77:16: note: expanded from macro 'TRACE_EVENT'
 PARAMS(tstruct),  \
 ~~~^~~~
include/linux/tracepoint.h:95:25: note: expanded from macro 'PARAMS'
 #define PARAMS(args...) args
^
include/trace/trace_events.h:455:2: note: expanded from macro 
'DECLARE_EVENT_CLASS'
tstruct;\
^~~

I guess the warning is reasonable in principle, though this seems to
be the only instance we get in the entire kernel today.
Shut up the warning by making it a void pointer in the exported
structure.

Fixes: c796f213a693 ("xen/trace: add multicall tracing")
Signed-off-by: Arnd Bergmann 
---
 include/trace/events/xen.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/trace/events/xen.h b/include/trace/events/xen.h
index 9a0e8af21310..f75b77414ac1 100644
--- a/include/trace/events/xen.h
+++ b/include/trace/events/xen.h
@@ -66,7 +66,7 @@ TRACE_EVENT(xen_mc_callback,
TP_PROTO(xen_mc_callback_fn_t fn, void *data),
TP_ARGS(fn, data),
TP_STRUCT__entry(
-   __field(xen_mc_callback_fn_t, fn)
+   __field(void *, fn)
__field(void *, data)
),
TP_fast_assign(
-- 
2.20.0
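
For reference, the is_signed_type() trick works by casting -1 and 1 to the type and comparing them. The sketch below reuses the macro as quoted in the diagnostic and applies it to integer types only; callback_record and make_record() are hypothetical and only show a callback being stored as a plain object pointer. Converting a function pointer to void * is a common extension rather than strict ISO C, which is an assumption of this sketch.

```c
/* Same shape as the macro quoted in the diagnostic. */
#define is_signed_type(type) (((type)(-1)) < (type)1)

typedef void (*callback_fn_t)(void *);

/* Hypothetical record layout: the callback is kept as a plain object pointer. */
struct callback_record {
	void *fn;
	void *data;
};

static struct callback_record make_record(callback_fn_t fn, void *data)
{
	/* function pointer to void * is an extension, accepted by gcc and clang */
	struct callback_record rec = { (void *)fn, data };

	return rec;
}
```

For a signed type the cast of -1 stays below 1, for an unsigned type it wraps to the maximum value; for a function pointer type the same comparison is what clang objects to.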

Re: [PATCH] arm: Extend the check for RAM in /dev/mem

2019-07-12 Thread Raslan, KarimAllah

On Fri, 2019-07-12 at 09:56 +0100, Russell King - ARM Linux admin wrote:
> On Fri, Jul 12, 2019 at 02:58:18AM +, Raslan, KarimAllah wrote:
> > 
> > On Fri, 2019-07-12 at 08:06 +0530, Anshuman Khandual wrote:
> > > 
> > > 
> > > On 07/12/2019 03:51 AM, KarimAllah Ahmed wrote:
> > > > 
> > > > 
> > > > Some valid RAM can live outside kernel control (e.g. using mem= kernel
> > > > command-line). For these regions, pfn_valid would return "false" causing
> > > > system RAM to be mapped as uncached. Use memblock instead to identify 
> > > > RAM.
> > > 
> > > Once the remaining memory is outside of the kernel (as the admin would 
> > > have
> > > intended with mem= command line) what is the particular concern regarding
> > > the way those get mapped (cached or not) ? It is not to be used any way.
> > 
> > They can be used by user-space which might lead to them being used by the 
> > kernel. One use-case would be using them as guest memory for KVM as I 
> > detailed 
> > here:
> > 
> > https://lwn.net/Articles/778240/
> 
> From the 32-bit ARM point of view...
> 
> What if someone's already doing something similar with a non-coherent
> DSP and is relying on the current behaviour?  This change is a user
> visible behavioural change that could end up breaking userspace.
> 
> In other words, it isn't something we should rush into.

Yes, that makes sense. How about adding a command-line option for this new 
behavior instead? Would this be more reasonable?



[PATCH] acpi: fix false-positive -Wuninitialized warning

2019-07-12 Thread Arnd Bergmann

clang gets confused by an uninitialized variable in what looks
to it like a never executed code path:

arch/x86/kernel/acpi/boot.c:618:13: error: variable 'polarity' is uninitialized 
when used here [-Werror,-Wuninitialized]
polarity = polarity ? ACPI_ACTIVE_LOW : ACPI_ACTIVE_HIGH;
   ^~~~
arch/x86/kernel/acpi/boot.c:606:32: note: initialize the variable 'polarity' to 
silence this warning
int rc, irq, trigger, polarity;
  ^
   = 0
arch/x86/kernel/acpi/boot.c:617:12: error: variable 'trigger' is uninitialized 
when used here [-Werror,-Wuninitialized]
trigger = trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE;
  ^~~
arch/x86/kernel/acpi/boot.c:606:22: note: initialize the variable 'trigger' to 
silence this warning
int rc, irq, trigger, polarity;
^
 = 0

This is unfortunately a design decision in clang and won't be fixed.

Changing the acpi_get_override_irq() macro to an inline function
reliably avoids the issue.

Signed-off-by: Arnd Bergmann 
---
 include/linux/acpi.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index a95cce5e82e7..9426b9aaed86 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -324,7 +324,10 @@ struct irq_domain *acpi_irq_create_hierarchy(unsigned int 
flags,
 #ifdef CONFIG_X86_IO_APIC
 extern int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity);
 #else
-#define acpi_get_override_irq(gsi, trigger, polarity) (-1)
+static inline int acpi_get_override_irq(u32 gsi, int *trigger, int *polarity)
+{
+   return -1;
+}
 #endif
 /*
  * This function undoes the effect of one call to acpi_register_gsi().
-- 
2.20.0
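
The difference between the two stub styles can be sketched in plain C. With the macro form the arguments vanish during preprocessing, so the compiler never sees the address of the locals being taken; with the inline function the call is still visible and type-checked. The names below are hypothetical and only mirror the shape of acpi_get_override_irq().

```c
/* Stub as a macro: the arguments disappear during preprocessing. */
#define get_override_macro(gsi, trigger, polarity) (-1)

/* Stub as an inline function: arguments are still type-checked and "used". */
static inline int get_override_inline(unsigned int gsi, int *trigger,
				      int *polarity)
{
	(void)gsi;
	(void)trigger;
	(void)polarity;
	return -1;
}

static int lookup_irq(unsigned int gsi)
{
	int trigger, polarity;

	if (get_override_inline(gsi, &trigger, &polarity) < 0)
		return -1;	/* trigger and polarity are never read on this path */
	return trigger | (polarity << 1);
}
```

Both stubs return the same value; only what the compiler can see about the locals differs, which is what the commit message relies on to avoid the warning.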

[PATCH] slab: work around clang bug #42570

2019-07-12 Thread Arnd Bergmann

Clang gets rather confused about two variables in the same special
section when one of them is not initialized, leading to an assembler
warning later:

/tmp/slab_common-18f869.s: Assembler messages:
/tmp/slab_common-18f869.s:7526: Warning: ignoring changed section attributes 
for .data..ro_after_init

Adding an initialization to kmalloc_caches is rather silly here
but does avoid the issue.

Link: https://bugs.llvm.org/show_bug.cgi?id=42570
Signed-off-by: Arnd Bergmann 
---
We might decide to wait until this is fixed in clang, but
so far all versions targeting x86 seem to be affected.
---
 mm/slab_common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6c49dbb3769e..807490fe217a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1028,7 +1028,8 @@ struct kmem_cache *__init create_kmalloc_cache(const char 
*name,
 }
 
 struct kmem_cache *
-kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init;
+kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init =
+{ /* initialization for https://bugs.llvm.org/show_bug.cgi?id=42570 */ };
 EXPORT_SYMBOL(kmalloc_caches);
 
 /*
-- 
2.20.0

[PATCH] [net-next] cxgb4: reduce kernel stack usage in cudbg_collect_mem_region()

2019-07-12 Thread Arnd Bergmann

cudbg_collect_mem_region() and cudbg_read_fw_mem() both use several
hundred bytes of kernel stack space. One gets inlined into the other,
which causes the stack usage to be combined beyond the warning limit
when building with clang:

drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c:1057:12: error: stack frame size 
of 1244 bytes in function 'cudbg_collect_mem_region' 
[-Werror,-Wframe-larger-than=]

Restructuring cudbg_collect_mem_region() lets clang do the same
optimization that gcc does and reuse the stack slots as it can
see that the large variables are never used together.

A better fix might be to avoid using cudbg_meminfo on the stack
altogether, but that requires a larger rewrite.

Fixes: a1c69520f785 ("cxgb4: collect MC memory dump")
Signed-off-by: Arnd Bergmann 
---
 .../net/ethernet/chelsio/cxgb4/cudbg_lib.c| 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c 
b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index a76529a7662d..c2e92786608b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -1054,14 +1054,12 @@ static void cudbg_t4_fwcache(struct cudbg_init 
*pdbg_init,
}
 }
 
-static int cudbg_collect_mem_region(struct cudbg_init *pdbg_init,
-   struct cudbg_buffer *dbg_buff,
-   struct cudbg_error *cudbg_err,
-   u8 mem_type)
+static unsigned long cudbg_mem_region_size(struct cudbg_init *pdbg_init,
+  struct cudbg_error *cudbg_err,
+  u8 mem_type)
 {
struct adapter *padap = pdbg_init->adap;
struct cudbg_meminfo mem_info;
-   unsigned long size;
u8 mc_idx;
int rc;
 
@@ -1075,7 +1073,16 @@ static int cudbg_collect_mem_region(struct cudbg_init 
*pdbg_init,
if (rc)
return rc;
 
-   size = mem_info.avail[mc_idx].limit - mem_info.avail[mc_idx].base;
+   return mem_info.avail[mc_idx].limit - mem_info.avail[mc_idx].base;
+}
+
+static int cudbg_collect_mem_region(struct cudbg_init *pdbg_init,
+   struct cudbg_buffer *dbg_buff,
+   struct cudbg_error *cudbg_err,
+   u8 mem_type)
+{
+   unsigned long size = cudbg_mem_region_size(pdbg_init, cudbg_err, 
mem_type);
+
return cudbg_read_fw_mem(pdbg_init, dbg_buff, mem_type, size,
 cudbg_err);
 }
-- 
2.20.0

[PATCH] lib/mpi: fix building with 32-bit x86

2019-07-12 Thread Arnd Bergmann

The mpi library contains some rather old inline assembly statements
that produce a lot of warnings for 32-bit x86, such as:

lib/mpi/mpih-div.c:76:16: error: invalid use of a cast in a inline asm context 
requiring an l-value: remove the cast or build with -fheinous-gnu-extensions
udiv_qrnnd(qp[i], n1, n1, np[i], d);
~~~^~~~
lib/mpi/longlong.h:423:20: note: expanded from macro 'udiv_qrnnd'
: "=a" ((USItype)(q)), \
~~^~

There is no point in doing a type cast for the output of an inline assembler
statement, so just remove the cast here, as we have done for other architectures
in the past.

See-also: dea632cadd12 ("lib/mpi: fix build with clang")
Signed-off-by: Arnd Bergmann 
---
 lib/mpi/longlong.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/mpi/longlong.h b/lib/mpi/longlong.h
index 08c60d10747f..3bb6260d8f42 100644
--- a/lib/mpi/longlong.h
+++ b/lib/mpi/longlong.h
@@ -397,8 +397,8 @@ do { \
 #define add_ss(sh, sl, ah, al, bh, bl) \
__asm__ ("addl %5,%1\n" \
   "adcl %3,%0" \
-   : "=r" ((USItype)(sh)), \
-"=&r" ((USItype)(sl)) \
+   : "=r" (sh), \
+"=&r" (sl) \
: "%0" ((USItype)(ah)), \
 "g" ((USItype)(bh)), \
 "%1" ((USItype)(al)), \
@@ -406,22 +406,22 @@ do { \
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
__asm__ ("subl %5,%1\n" \
   "sbbl %3,%0" \
-   : "=r" ((USItype)(sh)), \
-"=&r" ((USItype)(sl)) \
+   : "=r" (sh), \
+"=&r" (sl) \
: "0" ((USItype)(ah)), \
 "g" ((USItype)(bh)), \
 "1" ((USItype)(al)), \
 "g" ((USItype)(bl)))
 #define umul_ppmm(w1, w0, u, v) \
__asm__ ("mull %3" \
-   : "=a" ((USItype)(w0)), \
-"=d" ((USItype)(w1)) \
+   : "=a" (w0), \
+"=d" (w1) \
: "%0" ((USItype)(u)), \
 "rm" ((USItype)(v)))
 #define udiv_qrnnd(q, r, n1, n0, d) \
__asm__ ("divl %4" \
-   : "=a" ((USItype)(q)), \
-"=d" ((USItype)(r)) \
+   : "=a" (q), \
+"=d" (r) \
: "0" ((USItype)(n0)), \
 "1" ((USItype)(n1)), \
 "rm" ((USItype)(d)))
-- 
2.20.0

[PATCH] x86: math-emu: hide clang warnings for 16-bit overflow

2019-07-12 Thread Arnd Bergmann

clang warns about a few parts of the math-emu implementation
where a 16-bit integer becomes negative during assignment:

arch/x86/math-emu/poly_tan.c:88:35: error: implicit conversion from 'int' to 
'short' changes value from 49216 to -16320 [-Werror,-Wconstant-conversion]
  (0x41 + EXTENDED_Ebias) | SIGN_Negative);
  ^~~~
arch/x86/math-emu/fpu_emu.h:180:58: note: expanded from macro 'setexponent16'
 #define setexponent16(x,y)  { (*(short *)&((x)->exp)) = (y); }
  ~  ^
arch/x86/math-emu/reg_constant.c:37:32: error: implicit conversion from 'int' 
to 'short' changes value from 49085 to -16451 [-Werror,-Wconstant-conversion]
FPU_REG const CONST_PI2extra = MAKE_REG(NEG, -66,
   ^~
arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
 ~^~
arch/x86/math-emu/reg_constant.c:48:28: error: implicit conversion from 'int' 
to 'short' changes value from 65535 to -1 [-Werror,-Wconstant-conversion]
FPU_REG const CONST_QNaN = MAKE_REG(NEG, EXP_OVER, 0x00000000, 0xC0000000);
   ^~~
arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
 ~^~

The code seems correct to me, so add a typecast to shut up the warnings.
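
The arithmetic behind these warnings can be checked in isolation. What follows is only a minimal user-space sketch, not the kernel code: the names EXTENDED_EBIAS, SIGN_BIT, make_sigexp() and exponent_of() are made up for illustration, and the bias value 0x3fff is an assumption inferred from the numbers in the warnings (0x3fff + 0x41 with the sign bit set is 49216, 0x3fff - 66 with the sign bit set is 49085). It shows that the combined sign/exponent word always fits in 16 unsigned bits, which is why an explicit 16-bit cast keeps the bit pattern intact.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: the numbers in the warnings imply a bias of 0x3fff. */
#define EXTENDED_EBIAS	0x3fff
#define SIGN_BIT	0x8000

/* Build the 16-bit sign/exponent word; the cast is explicit on purpose. */
static uint16_t make_sigexp(int negative, int exponent)
{
	int word = (EXTENDED_EBIAS + exponent) | (negative ? SIGN_BIT : 0);

	/* The value must be representable in 16 unsigned bits. */
	assert(word >= 0 && word <= 0xffff);
	return (uint16_t)word;
}

/* Recover the unbiased exponent by masking off the sign bit. */
static int exponent_of(uint16_t sigexp)
{
	return (int)(sigexp & 0x7fff) - EXTENDED_EBIAS;
}
```

With these helpers, make_sigexp(1, -66) yields 49085, the same value clang reports before the implicit conversion to 'short'.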

Signed-off-by: Arnd Bergmann 
---
 arch/x86/math-emu/fpu_emu.h  | 2 +-
 arch/x86/math-emu/reg_constant.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/math-emu/fpu_emu.h b/arch/x86/math-emu/fpu_emu.h
index a5a41ec58072..0c16ca56 100644
--- a/arch/x86/math-emu/fpu_emu.h
+++ b/arch/x86/math-emu/fpu_emu.h
@@ -177,7 +177,7 @@ static inline void reg_copy(FPU_REG const *x, FPU_REG *y)
 #define setexponentpos(x,y) { (*(short *)&((x)->exp)) = \
   ((y) + EXTENDED_Ebias) & 0x7fff; }
 #define exponent16(x) (*(short *)&((x)->exp))
-#define setexponent16(x,y)  { (*(short *)&((x)->exp)) = (y); }
+#define setexponent16(x,y)  { (*(short *)&((x)->exp)) = (u16)(y); }
 #define addexponent(x,y){ (*(short *)&((x)->exp)) += (y); }
 #define stdexp(x)   { (*(short *)&((x)->exp)) += EXTENDED_Ebias; }
 
diff --git a/arch/x86/math-emu/reg_constant.c b/arch/x86/math-emu/reg_constant.c
index 8dc9095bab22..742619e94bdf 100644
--- a/arch/x86/math-emu/reg_constant.c
+++ b/arch/x86/math-emu/reg_constant.c
@@ -18,7 +18,7 @@
 #include "control_w.h"
 
 #define MAKE_REG(s, e, l, h) { l, h, \
-   ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
+   (u16)((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
 
 FPU_REG const CONST_1 = MAKE_REG(POS, 0, 0x00000000, 0x80000000);
 #if 0
-- 
2.20.0

[GIT PULL] Pin control bulk changes for v5.3

2019-07-12 Thread Linus Walleij

Hi Linus,

here is the bulk of pin control changes for the v5.3 kernel cycle.

This is pretty linear development in pin control, nothing really
stands out. We had a bit of SPDX fuzz with tglx fixing up tags
with scripts at the same time as maintainers were fixing up the
same tags, but I regard that as a one-off and not a good time
for an exercise in "what can be done differently". Let's resolve
the conflicts and move on (I don't know if there will be any,
don't think so.)

Please pull it in! Technical details in the signed tag.

Yours,
Linus Walleij

The following changes since commit a188339ca5a396acc588e5851ed7e19f66b0ebd9:

  Linux 5.2-rc1 (2019-05-19 15:47:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git
tags/pinctrl-v5.3-1

for you to fetch changes up to 4c105769bf6de29856bf80a4045e6725301c58ce:

  pinctrl: aspeed: Strip moved macros and structs from private header
(2019-07-10 11:19:20 +0200)


This is the bulk of pin control changes for the v5.3 kernel
cycle:

Core changes:

- Device links can optionally be added between a pin control
  producer and its consumers. This will affect how the system
  power management is handled: a pin controller will not suspend
  before all of its consumers have been suspended. This was
  necessary for the ST Microelectronics STMFX expander and
  needs to be tested on other systems as well: it makes sense
  to make this default in the long run. Right now it is
  opt-in per driver.

- Drive strength can be specified in microamps. As silicon
  process geometries shrink, milliamps are not granular enough, so
  make it possible to select drive strengths in microamps. Right
  now the Meson (Amlogic) driver needs this.

New drivers:

- New subdriver for the Tegra 194 SoC.

- New subdriver for the Qualcomm SDM845.

- New subdriver for the Qualcomm SM8150.

- New subdriver for the Freescale i.MX8MN (Freescale is now a
  product line of NXP).

- New subdriver for Marvell MV98DX1135.

Driver improvements:

- The Bitmain BM1880 driver now supports pin config in
  addition to muxing.

- The Qualcomm drivers can now reserve some GPIOs, setting them
  aside as not usable by users. This is used in ACPI systems
  to take out some GPIO lines used by the BIOS so that
  no one else (neither kernel nor userspace) will play with them
  by mistake and crash the machine.

- A slew of refurbishing around the Aspeed drivers (board
  management controllers for servers) in preparation for the
  new Aspeed AST2600 SoC.

- A slew of improvements over the SH PFC drivers as usual.

- Misc cleanups and fixes.


Alexandre Torgue (4):
  pinctrl: stm32: add suspend/resume management
  pinctrl: stm32: Enable suspend/resume for stm32mp157c SoC
  pinctrl: stm32: add lock mechanism for irqmux selection
  dt-bindings: pinctrl: Convert stm32 pinctrl bindings to json-schema

Andrew Jeffery (9):
  dt-bindings: pinctrl: aspeed: Split bindings document in two
  dt-bindings: pinctrl: aspeed: Convert AST2400 bindings to json-schema
  dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema
  MAINTAINERS: Add entry for ASPEED pinctrl drivers
  pinctrl: aspeed: Correct comment that is no longer true
  pinctrl: aspeed: Clarify comment about strapping W1C
  pinctrl: aspeed: Split out pinmux from general pinctrl
  pinctrl: aspeed: Add implementation-related documentation
  pinctrl: aspeed: Strip moved macros and structs from private header

Andy Shevchenko (3):
  pinctrl: baytrail: Use defined macro instead of magic in
byt_get_gpio_mux()
  pinctrl: baytrail: Re-use data structures from pinctrl-intel.h
  pinctrl: baytrail: Use GENMASK() consistently

Anson Huang (3):
  dt-bindings: imx: Correct pinfunc head file path for i.MX8MM
  dt-bindings: imx: Add pinctrl binding doc for i.MX8MN
  pinctrl: freescale: Add i.MX8MN pinctrl driver support

Benjamin Gaignard (2):
  pinctrl: Enable device link creation for pin control
  pinctrl: stmfx: enable links creations

Bjorn Andersson (1):
  pinctrl: qcom: sdm845: Expose ufs_reset as gpio

Charles Keepax (1):
  pinctrl: madera: Fixup SPDX headers

Chris Packham (2):
  dt-bindings: pinctrl: mvebu: Document bindings for 98DX1135
  pinctrl: mvebu: Add support for MV98DX1135

Colin Ian King (1):
  dt-bindings: pinctrl: fix spelling mistakes in pinctl documentation

Doug Berger (1):
  pinctrl: bcm: Allow PINCTRL_BCM2835 for ARCH_BRCMSTB

Enrico Weigelt (1):
  gpio: Fix build warnings on undefined struct pinctrl_dev

Florian Fainelli (1):
  dt-bindings: pinctrl: bcm2835-gpio: Document BCM7211 compatible

Geert Uytterhoeven (26):
  pinctrl: sh-pfc: Correct printk level of group reference warning
  pinctrl: sh-pfc: Mark run-time debug code __init
  pinctrl

Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality, statistic

2019-07-12 Thread 王贇




On 2019/7/12 3:58 PM, Peter Zijlstra wrote:
[snip]
>>>
>>> Then our task t1 should be accounted to B (as you do), but also to A and
>>> R.
>>
>> I get the point but not quite sure about this...
>>
>> Unlike pages, there is no hierarchical limitation on locality; also, tasks
> 
> You can use cpusets to affect that.

Could you please give more detail on this?

> 
>> running in a particular group have no influence on others, not to mention the
>> extra overhead. Is it really meaningful to account this
>> hierarchically?
> 
> AFAIU it's a requirement of cgroups to be hierarchical. All our other
> cgroup accounting is like that.

Ok, should respect the convention :-)

Regards,
Michael Wang


[PATCH] thp: fix unused shmem_parse_huge() function warning

2019-07-12 Thread Arnd Bergmann

When CONFIG_SYSFS is disabled but CONFIG_TMPFS is enabled, we get a warning
about shmem_parse_huge() never being called:

mm/shmem.c:417:12: error: unused function 'shmem_parse_huge' 
[-Werror,-Wunused-function]
static int shmem_parse_huge(const char *str)

Change the #ifdef so we no longer build this function in that configuration.

Fixes: 144df3b288c4 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use 
the new mount API")
Signed-off-by: Arnd Bergmann 
---
 mm/shmem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index ba40fac908c5..32aa9d46b87c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -413,7 +413,7 @@ static bool shmem_confirm_swap(struct address_space 
*mapping,
 
 static int shmem_huge __read_mostly;
 
-#if defined(CONFIG_SYSFS) || defined(CONFIG_TMPFS)
+#if defined(CONFIG_SYSFS)
 static int shmem_parse_huge(const char *str)
 {
if (!strcmp(str, "never"))
@@ -430,7 +430,9 @@ static int shmem_parse_huge(const char *str)
return SHMEM_HUGE_FORCE;
return -EINVAL;
 }
+#endif
 
+#if defined(CONFIG_SYSFS) || defined(CONFIG_TMPFS)
 static const char *shmem_format_huge(int huge)
 {
switch (huge) {
-- 
2.20.0

Re: [PATCH v2] printk: Do not lose last line in kmsg buffer dump

2019-07-12 Thread Petr Mladek

On Thu 2019-07-11 16:29:37, Vincent Whitchurch wrote:
> kmsg_dump_get_buffer() is supposed to select all the youngest log
> messages which fit into the provided buffer.  It determines the correct
> start index by using msg_print_text() with a NULL buffer to calculate
> the size of each entry.  However, when performing the actual writes,
> msg_print_text() only writes the entry to the buffer if the written len
> is lesser than the size of the buffer.  So if the lengths of the
> selected youngest log messages happen to precisely fill up the provided
> buffer, the last log message is not included.
> 
> We don't want to modify msg_print_text() to fill up the buffer and start
> returning a length which is equal to the size of the buffer, since
> callers of its other users, such as kmsg_dump_get_line(), depend upon
> the current behaviour.
> 
> Instead, fix kmsg_dump_get_buffer() to compensate for this.
> 
> For example, with the following two final prints:
> 
> [    6.427502] AAAAAAAAAAAAA
> [    6.427769] BBBBBBBB12345
> 
> A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
> patch:
> 
>  0000: 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37  <0>[    6.522197
>  0010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] AAAAAAAAAAAAA.
>  0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>  0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 
> After this patch:
> 
>  0000: 3c 30 3e 5b 20 20 20 20 36 2e 34 35 36 36 37 38  <0>[    6.456678
>  0010: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a  ] BBBBBBBB12345.
>  0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>  0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 
> Signed-off-by: Vincent Whitchurch 

I think that I need a vacation. I got lost in all the checks
and got it wrong this morning.

This patch fixes the calculation of which messages fit
into the buffer. It makes sure that the function that writes
the messages will really be able to write them.

It seems to be the correct fix.
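
For readers following along, the mismatch can be modelled outside the kernel. This is only a rough sketch under stated assumptions: select_first() and write_entries() are hypothetical stand-ins for the selection loop in kmsg_dump_get_buffer() and for the msg_print_text() write path, and entries are reduced to their lengths. The writer accepts an entry only if its length is strictly less than the remaining space, so a selector that allows the total to equal the buffer size picks one entry too many.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical writer: an entry is written only if its length is
 * strictly less than the space that is left in the buffer.
 * Returns the number of entries written, starting at index 'first'.
 */
static size_t write_entries(const size_t *lens, size_t n, size_t first,
			    size_t size)
{
	size_t used = 0, written = 0;

	for (size_t i = first; i < n; i++) {
		if (lens[i] >= size - used)
			break;
		used += lens[i];
		written++;
	}
	return written;
}

/*
 * Hypothetical selector: walk back from the youngest entry and return
 * the index of the oldest entry that still fits. With 'strict' set the
 * total must stay below 'size', matching what the writer accepts.
 */
static size_t select_first(const size_t *lens, size_t n, size_t size,
			   int strict)
{
	size_t total = 0, first = n;

	while (first > 0) {
		size_t next = total + lens[first - 1];

		if (strict ? next >= size : next > size)
			break;
		total = next;
		first--;
	}
	return first;
}
```

With two 32-byte entries and a 64-byte buffer, the non-strict selector starts at the older entry and the writer then stops before the youngest one; the strict selector starts at the youngest entry, which is then written.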

Reviewed-by: Petr Mladek 

Best Regards,
Petr

[PATCH 1/2] x86: kvm: avoid -Wsometimes-uninitialized warning

2019-07-12 Thread Arnd Bergmann

clang points out that running a 64-bit guest on a 32-bit host
would lead to uninitialized variables:

arch/x86/kvm/hyperv.c:1610:6: error: variable 'ingpa' is used uninitialized 
whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (!longmode) {
^
arch/x86/kvm/hyperv.c:1632:55: note: uninitialized use occurs here
trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa);
 ^
arch/x86/kvm/hyperv.c:1610:2: note: remove the 'if' if its condition is always 
true
if (!longmode) {
^~~
arch/x86/kvm/hyperv.c:1595:18: note: initialize the variable 'ingpa' to silence 
this warning
u64 param, ingpa, outgpa, ret = HV_STATUS_SUCCESS;
^
 = 0
arch/x86/kvm/hyperv.c:1610:6: error: variable 'outgpa' is used uninitialized 
whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
arch/x86/kvm/hyperv.c:1610:6: error: variable 'param' is used uninitialized 
whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]

Since that combination is not supported anyway, change the condition
to tell the compiler how the code is actually executed.
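
The idea can be sketched in plain C. This is an illustration under assumptions, not the kernel code: BUILD_IS_64BIT is a hypothetical stand-in for IS_ENABLED(CONFIG_X86_64), and hypercall_param() with its 'wide' argument is made up. When the 64-bit branch is compiled out, putting the build-time constant into the condition lets the compiler see that the remaining branch always runs, so the variable is always initialized.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_X86_64) on a 32-bit build. */
#ifndef BUILD_IS_64BIT
#define BUILD_IS_64BIT 0
#endif

static uint64_t hypercall_param(int longmode, uint32_t hi, uint32_t lo,
				uint64_t wide)
{
	uint64_t param;

	/* On a 32-bit build the constant makes this branch unconditional. */
	if (!BUILD_IS_64BIT || !longmode) {
		param = ((uint64_t)hi << 32) | lo;
	}
#if BUILD_IS_64BIT
	else {
		param = wide;
	}
#endif
	(void)wide;	/* unused when the 64-bit branch is compiled out */
	return param;
}
```

With BUILD_IS_64BIT left at 0, the result is the same whether or not 'longmode' is set, which mirrors the statement that the 64-bit guest on 32-bit host combination is not supported.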

Signed-off-by: Arnd Bergmann 
---
 arch/x86/kvm/hyperv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index a39e38f13029..950436c502ba 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1607,7 +1607,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 
longmode = is_64_bit_mode(vcpu);
 
-   if (!longmode) {
+   if (!IS_ENABLED(CONFIG_X86_64) || !longmode) {
param = ((u64)kvm_rdx_read(vcpu) << 32) |
(kvm_rax_read(vcpu) & 0xffffffff);
ingpa = ((u64)kvm_rbx_read(vcpu) << 32) |
-- 
2.20.0

[PATCH 2/2] x86: kvm: avoid constant-conversion warning

2019-07-12 Thread Arnd Bergmann

clang finds a construct suspicious that converts an unsigned
character to a signed integer and back, causing an overflow:

arch/x86/kvm/mmu.c:4605:39: error: implicit conversion from 'int' to 'u8' (aka 
'unsigned char') changes value from -205 to 51 [-Werror,-Wconstant-conversion]
u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
   ~~   ^~
arch/x86/kvm/mmu.c:4607:38: error: implicit conversion from 'int' to 'u8' (aka 
'unsigned char') changes value from -241 to 15 [-Werror,-Wconstant-conversion]
u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0;
   ~~  ^~
arch/x86/kvm/mmu.c:4609:39: error: implicit conversion from 'int' to 'u8' (aka 
'unsigned char') changes value from -171 to 85 [-Werror,-Wconstant-conversion]
u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0;
   ~~   ^~

Add an explicit cast to tell clang that everything works as
intended here.
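
The numbers in the warnings follow from C integer promotion: the operand of '~' is promoted to 'int' first, so the result is negative until it is truncated back to 8 bits. A small stand-alone sketch (fault_mask() is a made-up helper, and the input values are derived from the warning text, e.g. -205 is ~204):

```c
#include <assert.h>
#include <stdint.h>

/*
 * 'allowed' is promoted to int before '~' is applied, so ~allowed is a
 * negative int. The explicit cast truncates it back to the low 8 bits.
 */
static uint8_t fault_mask(int fault_bit_set, uint8_t allowed)
{
	return fault_bit_set ? (uint8_t)~allowed : 0;
}
```

For example, fault_mask(1, 0xCC) is 0x33 (51), matching the "-205 to 51" conversion clang reports.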

Signed-off-by: Arnd Bergmann 
---
 arch/x86/kvm/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 17ece7b994b1..aea7f969ecb8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4602,11 +4602,11 @@ static void update_permission_bitmask(struct kvm_vcpu 
*vcpu,
 */
 
/* Faults from writes to non-writable pages */
-   u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
+   u8 wf = (pfec & PFERR_WRITE_MASK) ? (u8)~w : 0;
/* Faults from user mode accesses to supervisor pages */
-   u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0;
+   u8 uf = (pfec & PFERR_USER_MASK) ? (u8)~u : 0;
/* Faults from fetches of non-executable pages*/
-   u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0;
+   u8 ff = (pfec & PFERR_FETCH_MASK) ? (u8)~x : 0;
/* Faults from kernel mode fetches of user pages */
u8 smepf = 0;
/* Faults from kernel mode accesses of user pages */
-- 
2.20.0

Re: [PATCH] dax: Fix missed PMD wakeups

2019-07-12 Thread Jan Kara

On Thu 11-07-19 08:25:50, Matthew Wilcox wrote:
> On Thu, Jul 11, 2019 at 07:13:50AM -0700, Matthew Wilcox wrote:
> > However, the XA_RETRY_ENTRY might be a good choice.  It doesn't normally
> > appear in an XArray (it may appear if you're looking at a deleted node,
> > but since we're holding the lock, we can't see deleted nodes).
> 
...

> @@ -254,7 +267,7 @@ static void wait_entry_unlocked(struct xa_state *xas, 
> void *entry)
>  static void put_unlocked_entry(struct xa_state *xas, void *entry)
>  {
>   /* If we were the only waiter woken, wake the next one */
> - if (entry)
> + if (entry && dax_is_conflict(entry))

This should be !dax_is_conflict(entry)...

>   dax_wake_entry(xas, entry, false);
>  }

Otherwise the patch looks good to me so feel free to add:

Reviewed-by: Jan Kara 

once you fix this.

Honza
-- 
Jan Kara 
SUSE Labs, CR

[PATCH] dma: ste_dma40: fix unneeded variable warning

2019-07-12 Thread Arnd Bergmann

clang-9 points out that there are two variables that depending on the
configuration may only be used in an ARRAY_SIZE() expression but not
referenced:

drivers/dma/ste_dma40.c:145:12: error: variable 'd40_backup_regs' is not needed 
and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static u32 d40_backup_regs[] = {
   ^
drivers/dma/ste_dma40.c:214:12: error: variable 'd40_backup_regs_chan' is not 
needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static u32 d40_backup_regs_chan[] = {

Mark these __maybe_unused to shut up the warning.
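
A reduced user-space illustration of the situation, assuming a GCC or clang compiler: MAYBE_UNUSED is a local stand-in for the kernel's __maybe_unused, and the array contents and helper name are invented. The array is referenced only inside a sizeof-based ARRAY_SIZE() expression, which is not an evaluated use, so the attribute is what tells the compiler this is intentional.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))
/* Local stand-in for __maybe_unused; GCC/clang attribute syntax assumed. */
#define MAYBE_UNUSED	__attribute__((unused))

/* Only ever referenced through ARRAY_SIZE() in this configuration. */
static MAYBE_UNUSED uint32_t backup_regs[] = { 0x10, 0x14, 0x18 };

#define BACKUP_REGS_SZ	ARRAY_SIZE(backup_regs)

/* Number of 32-bit words a save area for these registers would need. */
static size_t backup_words(void)
{
	return BACKUP_REGS_SZ;
}
```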

Signed-off-by: Arnd Bergmann 
---
 drivers/dma/ste_dma40.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/ste_dma40.c b/drivers/dma/ste_dma40.c
index 89d710899010..de8bfd9a76e9 100644
--- a/drivers/dma/ste_dma40.c
+++ b/drivers/dma/ste_dma40.c
@@ -142,7 +142,7 @@ enum d40_events {
  * when the DMA hw is powered off.
  * TODO: Add save/restore of D40_DREG_GCC on dma40 v3 or later, if that works.
  */
-static u32 d40_backup_regs[] = {
+static __maybe_unused u32 d40_backup_regs[] = {
D40_DREG_LCPA,
D40_DREG_LCLA,
D40_DREG_PRMSE,
@@ -211,7 +211,7 @@ static u32 d40_backup_regs_v4b[] = {
 
 #define BACKUP_REGS_SZ_V4B ARRAY_SIZE(d40_backup_regs_v4b)
 
-static u32 d40_backup_regs_chan[] = {
+static __maybe_unused u32 d40_backup_regs_chan[] = {
D40_CHAN_REG_SSCFG,
D40_CHAN_REG_SSELT,
D40_CHAN_REG_SSPTR,
-- 
2.20.0

Re: [PATCH] xen/trace: avoid clang warning on function pointers

2019-07-12 Thread Sedat Dilek

On Fri, Jul 12, 2019 at 10:59 AM Arnd Bergmann  wrote:
>
> clang-9 does not like the way that is_signed_type() compares
> function pointers deep inside of the trace event macros:
>
> In file included from arch/x86/xen/trace.c:21:
> In file included from include/trace/events/xen.h:475:
> In file included from include/trace/define_trace.h:102:
> In file included from include/trace/trace_events.h:467:
> include/trace/events/xen.h:69:7: error: ordered comparison of function 
> pointers ('xen_mc_callback_fn_t' (aka 'void (*)(void *)') and 
> 'xen_mc_callback_fn_t') [-Werror,-Wordered-compare-function-pointers]
> __field(xen_mc_callback_fn_t, fn)
> ^
> include/trace/trace_events.h:415:29: note: expanded from macro '__field'
>  #define __field(type, item) __field_ext(type, item, FILTER_OTHER)
> ^
> include/trace/trace_events.h:401:6: note: expanded from macro '__field_ext'
>  is_signed_type(type), filter_type);\
>  ^
> include/linux/trace_events.h:540:44: note: expanded from macro 
> 'is_signed_type'
>  #define is_signed_type(type)(((type)(-1)) < (type)1)
>   ^
> note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to 
> see all)
> include/trace/trace_events.h:77:16: note: expanded from macro 'TRACE_EVENT'
>  PARAMS(tstruct),  \
>  ~~~^~~~
> include/linux/tracepoint.h:95:25: note: expanded from macro 'PARAMS'
>  #define PARAMS(args...) args
> ^
> include/trace/trace_events.h:455:2: note: expanded from macro 
> 'DECLARE_EVENT_CLASS'
> tstruct;\
> ^~~
>
> I guess the warning is reasonable in principle, though this seems to
> be the only instance we get in the entire kernel today.
> Shut up the warning by making it a void pointer in the exported
> structure.
>

Thanks for bringing this up (again), Arnd.

As this is a known CBL issue please add...

Link: https://github.com/ClangBuiltLinux/linux/issues/97

...and...

Tested-by: Sedat Dilek 

For the sake of completeness see also the comments of Steven Rostedt
and user "Honeybyte" in the above Link - if not known/read.

- Sedat -

P.S.: I have been using this patch for 6 months in my
for-5.x/clang-warningfree local Git repository.
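
For reference, the is_signed_type() macro quoted in the warning above can be exercised on plain integer types in user space. This is only a sketch: the macro body is copied from the quoted diagnostic, while the wrapper functions are made up. It shows why the macro needs an ordered comparison, which is the operation clang objects to when 'type' is a function pointer.

```c
#include <assert.h>

/* Macro body as quoted in the clang diagnostic. */
#define is_signed_type(type)	(((type)(-1)) < (type)1)

/* -1 stays negative for signed types, so the comparison is true. */
static int int_is_signed(void)
{
	return is_signed_type(int);
}

/* -1 wraps to the maximum value for unsigned types, so it is false. */
static int uint_is_signed(void)
{
	return is_signed_type(unsigned int);
}
```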

> Fixes: c796f213a693 ("xen/trace: add multicall tracing")
> Signed-off-by: Arnd Bergmann 
> ---
>  include/trace/events/xen.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/trace/events/xen.h b/include/trace/events/xen.h
> index 9a0e8af21310..f75b77414ac1 100644
> --- a/include/trace/events/xen.h
> +++ b/include/trace/events/xen.h
> @@ -66,7 +66,7 @@ TRACE_EVENT(xen_mc_callback,
> TP_PROTO(xen_mc_callback_fn_t fn, void *data),
> TP_ARGS(fn, data),
> TP_STRUCT__entry(
> -   __field(xen_mc_callback_fn_t, fn)
> +   __field(void *, fn)
> __field(void *, data)
> ),
> TP_fast_assign(
> --
> 2.20.0
>

RE: [PATCH] clk: renesas: cpg-mssr: Fix reset control race condition

2019-07-12 Thread Yoshihiro Shimoda

Hi Geert-san,

> From: Geert Uytterhoeven, Sent: Thursday, July 11, 2019 10:04 PM
> 
> The module reset code in the Renesas CPG/MSSR driver uses
> read-modify-write (RMW) operations to write to a Software Reset Register
> (SRCRn), and simple writes to write to a Software Reset Clearing
> Register (SRSTCLRn), as was mandated by the R-Car Gen2 and Gen3 Hardware
> User's Manuals.
> 
> However, this may cause a race condition when two devices are reset in
> parallel: if the reset for device A completes in the middle of the RMW
> operation for device B, device A may be reset again, causing subtle
> failures (e.g. i2c timeouts):
> 
>   thread A                      thread B
>   --------                      --------
> 
>   val = SRCRn
>   val |= bit A
>   SRCRn = val
> 
>   delay
> 
>                                 val = SRCRn (bit A is set)
> 
>   SRSTCLRn = bit A
>   (bit A in SRCRn is cleared)
> 
>                                 val |= bit B
>                                 SRCRn = val (bit A and B are set)
> 
> This can be reproduced on e.g. Salvator-XS using:
> 
> $ while true; do i2cdump -f -y 4 0x6A b > /dev/null; done &
> $ while true; do i2cdump -f -y 2 0x10 b > /dev/null; done &
> 
> i2c-rcar e651.i2c: error -110 : 4002
> i2c-rcar e66d8000.i2c: error -110 : 4002
> 
> According to the R-Car Gen3 Hardware Manual Errata for Rev.
> 0.80 of Feb 28, 2018, reflected in Rev. 1.00 of the R-Car Gen3 Hardware
> User's Manual, writes to SRCRn do not require read-modify-write cycles.
> 
> Note that the R-Car Gen2 Hardware User's Manual has not been updated
> yet, and still says a read-modify-write sequence is required.  According
> to the hardware team, the reset hardware block is the same on both R-Car
> Gen2 and Gen3, though.
> 
> Hence fix the issue by replacing the read-modify-write operations on
> SRCRn by simple writes.
> 
> Reported-by: Yao Lihua 
> Fixes: 6197aa65c4905532 ("clk: renesas: cpg-mssr: Add support for reset 
> control")
> Signed-off-by: Geert Uytterhoeven 
> ---

Thank you for the patch! Our test team tested this patch, so

Tested-by: Linh Phung 

> So far I haven't been able to reproduce the issue on R-Car Gen2 (after
> forcing i2c reset on Gen2, too).  Perhaps my Koelsch doesn't have enough
> CPU cores.  What about Lager?

According to the test team, Lager also could not reproduce this issue.
Should we investigate why?

Best regards,
Yoshihiro Shimoda
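
The interleaving described in the quoted commit message can be replayed sequentially in a toy model. This is a sketch under assumptions, not driver code: struct reset_regs and the helpers are invented, and the register semantics (1-bits written to SRCRn assert reset and 0-bits are ignored, 1-bits written to SRSTCLRn release reset) are inferred from the statement that SRCRn does not need read-modify-write cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one reset register pair (not the real hardware). */
struct reset_regs {
	uint32_t srcr;	/* bits currently held in reset */
};

/* Assumed: 1-bits written to SRCRn assert reset, 0-bits are ignored. */
static void srcr_write(struct reset_regs *r, uint32_t val)
{
	r->srcr |= val;
}

/* 1-bits written to SRSTCLRn release the corresponding reset. */
static void srstclr_write(struct reset_regs *r, uint32_t val)
{
	r->srcr &= ~val;
}

/* Replay of the racy read-modify-write interleaving. */
static uint32_t replay_rmw_race(void)
{
	const uint32_t bit_a = 1u << 0, bit_b = 1u << 1;
	struct reset_regs r = { 0 };
	uint32_t val_a, val_b;

	val_a = r.srcr | bit_a;		/* thread A: read and modify */
	srcr_write(&r, val_a);		/* thread A: A is now in reset */
	val_b = r.srcr;			/* thread B: read (bit A is set) */
	srstclr_write(&r, bit_a);	/* thread A: release A */
	val_b |= bit_b;			/* thread B: modify stale value */
	srcr_write(&r, val_b);		/* thread B: A is reset again */
	return r.srcr;
}

/* Same ordering of events, but each thread writes only its own bit. */
static uint32_t replay_simple_writes(void)
{
	const uint32_t bit_a = 1u << 0, bit_b = 1u << 1;
	struct reset_regs r = { 0 };

	srcr_write(&r, bit_a);		/* thread A: assert A */
	srstclr_write(&r, bit_a);	/* thread A: release A */
	srcr_write(&r, bit_b);		/* thread B: assert B only */
	return r.srcr;
}
```

In the first replay both bits end up set, so device A is held in reset a second time; in the second replay only bit B is set.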

> Hi Mike, Stephen,
> 
> As this is a bugfix, can you please take this directly, if accepted?
> 
> Thanks!
> ---
>  drivers/clk/renesas/renesas-cpg-mssr.c | 16 ++--
>  1 file changed, 2 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c 
> b/drivers/clk/renesas/renesas-cpg-mssr.c
> index 52bbb9ce3807db31..d4075b13067429cd 100644
> --- a/drivers/clk/renesas/renesas-cpg-mssr.c
> +++ b/drivers/clk/renesas/renesas-cpg-mssr.c
> @@ -572,17 +572,11 @@ static int cpg_mssr_reset(struct reset_controller_dev 
> *rcdev,
>   unsigned int reg = id / 32;
>   unsigned int bit = id % 32;
>   u32 bitmask = BIT(bit);
> - unsigned long flags;
> - u32 value;
> 
>   dev_dbg(priv->dev, "reset %u%02u\n", reg, bit);
> 
>   /* Reset module */
> - spin_lock_irqsave(&priv->rmw_lock, flags);
> - value = readl(priv->base + SRCR(reg));
> - value |= bitmask;
> - writel(value, priv->base + SRCR(reg));
> - spin_unlock_irqrestore(&priv->rmw_lock, flags);
> + writel(bitmask, priv->base + SRCR(reg));
> 
>   /* Wait for at least one cycle of the RCLK clock (@ ca. 32 kHz) */
>   udelay(35);
> @@ -599,16 +593,10 @@ static int cpg_mssr_assert(struct reset_controller_dev 
> *rcdev, unsigned long id)
>   unsigned int reg = id / 32;
>   unsigned int bit = id % 32;
>   u32 bitmask = BIT(bit);
> - unsigned long flags;
> - u32 value;
> 
>   dev_dbg(priv->dev, "assert %u%02u\n", reg, bit);
> 
> - spin_lock_irqsave(&priv->rmw_lock, flags);
> - value = readl(priv->base + SRCR(reg));
> - value |= bitmask;
> - writel(value, priv->base + SRCR(reg));
> - spin_unlock_irqrestore(&priv->rmw_lock, flags);
> + writel(bitmask, priv->base + SRCR(reg));
>   return 0;
>  }
> 
> --
> 2.17.1

Re: Re: [PATCH] media: v4l: Add packed YUV444 24bpp pixel format

2019-07-12 Thread paul.kocialkow...@bootlin.com

Hi,

On Thu 11 Jul 19, 13:57, Mirela Rabulea wrote:
> On Thu, 2019-07-11 at 10:18 +0200, Paul Kocialkowski wrote:
> > Hi,
> > 
> > On Wed 03 Jul 19, 18:15, Mirela Rabulea wrote:
> > > 
> > > The added format is V4L2_PIX_FMT_YUV24, this is a packed
> > > YUV 4:4:4 format, with 8 bits for each component, 24 bits
> > > per sample.
> > > 
> > > This format is used by the i.MX 8QuadMax and i.MX
> > > 8DualXPlus/8QuadXPlus
> > > JPEG encoder/decoder.
> > So this format is not aligned to 32-bit words at all and we can
> > expect
> > to see cases where a single 32-bit word contains data for two pixels?
> > 
> > Nothing wrong with that, just checking whether I understood this
> > right :)
> > 
> 
> Hi Paul,
> yes, your understanding is correct.
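
The point about 32-bit words can be made concrete with a little offset arithmetic. This is a sketch with made-up helper names, assuming only what is stated above: three bytes per pixel, tightly packed within a line. Because a 4-byte word is longer than a 3-byte pixel, every word overlaps two pixels.

```c
#include <assert.h>
#include <stddef.h>

#define YUV24_BYTES_PER_PIXEL 3

/* Byte offset of pixel (x, y) for a given line stride in bytes. */
static size_t yuv24_pixel_offset(size_t x, size_t y, size_t bytesperline)
{
	return y * bytesperline + x * YUV24_BYTES_PER_PIXEL;
}

/* Index of the pixel that owns a given byte within one line. */
static size_t yuv24_pixel_of_byte(size_t byte_in_line)
{
	return byte_in_line / YUV24_BYTES_PER_PIXEL;
}

/* Number of distinct pixels touched by the 32-bit word at word_index. */
static size_t yuv24_pixels_in_word(size_t word_index)
{
	size_t first = yuv24_pixel_of_byte(word_index * 4);
	size_t last = yuv24_pixel_of_byte(word_index * 4 + 3);

	return last - first + 1;
}
```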

Out of curiosity, is the JPEG block related to (one of) the Hantro VPUs
or is it a totally different and unrelated hardware block?

Anyway the change looks good to me:
Reviewed-by: Paul Kocialkowski 

Cheers,

Paul

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

Re: [PATCH v2] printk: Do not lose last line in kmsg buffer dump

2019-07-12 Thread Sergey Senozhatsky

On (07/12/19 11:12), Petr Mladek wrote:
> > For example, with the following two final prints:
> > 
> > [    6.427502] AAAAAAAAAAAAA
> > [    6.427769] BBBBBBBB12345
> > 
> > A dump of a 64-byte buffer filled by kmsg_dump_get_buffer(), before this
> > patch:
> > 
> >  0000: 3c 30 3e 5b 20 20 20 20 36 2e 35 32 32 31 39 37  <0>[    6.522197
> >  0010: 5d 20 41 41 41 41 41 41 41 41 41 41 41 41 41 0a  ] AAAAAAAAAAAAA.
> >  0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >  0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > 
> > After this patch:
> > 
> >  0000: 3c 30 3e 5b 20 20 20 20 36 2e 34 35 36 36 37 38  <0>[    6.456678
> >  0010: 5d 20 42 42 42 42 42 42 42 42 31 32 33 34 35 0a  ] BBBBBBBB12345.
> >  0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >  0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > 
> > Signed-off-by: Vincent Whitchurch 
> 
> I think that I need a vacation. I got lost in all the checks
> and got it wrong this morning.
> 
> This patch fixes the calculation of which messages fit
> into the buffer. It makes sure that the function that writes
> the messages will really be able to write them.
> 
> It seems to be the correct fix.
> 
> Reviewed-by: Petr Mladek 

Looks correct to me as well.

Reviewed-by: Sergey Senozhatsky 

-ss

[HELP REQUESTED from the community] Was: Staging status of speakup

2019-07-12 Thread Samuel Thibault

Hello,

To readers of the linux-speakup list: could you help with this so we can get
Speakup into mainline?  Neither Okash nor I completely know what user-visible
consequences the files in /sys/accessibility/speakup/ have, so could
people give brief explanations for each file (something like 3-6 lines
of explanation)?

The i18n/ files have been already documented in section 14.1 of the
spkguide.txt, so we do not need help for them.

Thanks!
Samuel

Greg KH, on Fri 12 Jul 2019 10:38:19 +0200, wrote:
> Can you make up a patch to create a
> drivers/staging/speakup/sysfs-speakup file with the needed information?
> That way it will be much easier to determine exactly what these sysfs
> files do and my review can be easier, and perhaps not needed at all :)

[PATCH] [v2] mic: avoid statically declaring a 'struct device'.

2019-07-12 Thread Arnd Bergmann

Generally, declaring a platform device as a static variable is
a bad idea and can cause all kinds of problems, in particular
with the DMA configuration and lifetime rules.

A specific problem we hit here is from a bug in clang that warns
about certain (otherwise valid) macros when used in static variables:

drivers/misc/mic/card/mic_x100.c:285:27: warning: shift count >= width of type 
[-Wshift-count-overflow]
static u64 mic_dma_mask = DMA_BIT_MASK(64);
  ^~~~
include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK'
 #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
 ^ ~~~

A slightly better way here is to create the platform device dynamically
and set the dma mask in the probe function.
This avoids the warning and some other problems, but is still not ideal
because the device creation should really be separated from the driver,
and the fact that the device has no parent means we have to force
the dma mask rather than having it set up from the bus that the device
is actually on.

Fixes: dd8d8d44df64 ("misc: mic: MIC card driver specific changes to enable 
SCIF")
Signed-off-by: Arnd Bergmann 
---
v2: rewrite to use platform_device_register_simple() and make it
actually build

Please merge after -rc1 is out.
---
 drivers/misc/mic/card/mic_x100.c | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/misc/mic/card/mic_x100.c b/drivers/misc/mic/card/mic_x100.c
index 266ffb6f6c44..c8bff2916d3d 100644
--- a/drivers/misc/mic/card/mic_x100.c
+++ b/drivers/misc/mic/card/mic_x100.c
@@ -237,6 +237,9 @@ static int __init mic_probe(struct platform_device *pdev)
mdrv->dev = &pdev->dev;
snprintf(mdrv->name, sizeof(mic_driver_name), mic_driver_name);
 
+   /* FIXME: use dma_set_mask_and_coherent() and check result */
+   dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+
mdev->mmio.pa = MIC_X100_MMIO_BASE;
mdev->mmio.len = MIC_X100_MMIO_LEN;
mdev->mmio.va = devm_ioremap(&pdev->dev, MIC_X100_MMIO_BASE,
@@ -282,18 +285,6 @@ static void mic_platform_shutdown(struct platform_device 
*pdev)
mic_remove(pdev);
 }
 
-static u64 mic_dma_mask = DMA_BIT_MASK(64);
-
-static struct platform_device mic_platform_dev = {
-   .name = mic_driver_name,
-   .id   = 0,
-   .num_resources = 0,
-   .dev = {
-   .dma_mask = &mic_dma_mask,
-   .coherent_dma_mask = DMA_BIT_MASK(64),
-   },
-};
-
 static struct platform_driver __refdata mic_platform_driver = {
.probe = mic_probe,
.remove = mic_remove,
@@ -303,6 +294,8 @@ static struct platform_driver __refdata mic_platform_driver 
= {
},
 };
 
+static struct platform_device *mic_platform_dev;
+
 static int __init mic_init(void)
 {
int ret;
@@ -316,9 +309,12 @@ static int __init mic_init(void)
 
request_module("mic_x100_dma");
mic_init_card_debugfs();
-   ret = platform_device_register(&mic_platform_dev);
+
+   mic_platform_dev = platform_device_register_simple(mic_driver_name,
+  0, NULL, 0);
+   ret = PTR_ERR_OR_ZERO(mic_platform_dev);
if (ret) {
-   pr_err("platform_device_register ret %d\n", ret);
+   pr_err("platform_device_register_simple ret %d\n", ret);
goto cleanup_debugfs;
}
ret = platform_driver_register(&mic_platform_driver);
@@ -329,7 +325,7 @@ static int __init mic_init(void)
return ret;
 
 device_unregister:
-   platform_device_unregister(&mic_platform_dev);
+   platform_device_unregister(mic_platform_dev);
 cleanup_debugfs:
mic_exit_card_debugfs();
 done:
@@ -339,7 +335,7 @@ static int __init mic_init(void)
 static void __exit mic_exit(void)
 {
platform_driver_unregister(&mic_platform_driver);
-   platform_device_unregister(&mic_platform_dev);
+   platform_device_unregister(mic_platform_dev);
mic_exit_card_debugfs();
 }
 
-- 
2.20.0

Re: Staging status of speakup

2019-07-12 Thread Okash Khawaja

On Fri, Jul 12, 2019 at 9:38 AM Greg Kroah-Hartman
 wrote:
>
> On Sun, Jul 07, 2019 at 08:57:10AM +0200, Greg Kroah-Hartman wrote:
> > On Sat, Jul 06, 2019 at 08:08:57PM +0100, Okash Khawaja wrote:
> > > On Fri, 15 Mar 2019 20:18:31 -0700
> > > Greg Kroah-Hartman  wrote:
> > >
> > > > On Fri, Mar 15, 2019 at 01:01:27PM +0000, Okash Khawaja wrote:
> > > > > Hi,
> > > > >
> > > > > We have made progress on the items in TODO file of speakup driver in
> > > > > staging directory and wanted to get some clarity on the remaining
> > > > > items. Below is a summary of status of each item along with the
> > > > > quotes from TODO file.
> > > > >
> > > > > 1. "The first issue has to do with the way speakup communicates
> > > > > with serial ports.  Currently, we communicate directly with the
> > > > > hardware ports. This however conflicts with the standard serial
> > > > > port drivers, which poses various problems. This is also not
> > > > > working for modern hardware such as PCI-based serial ports.  Also,
> > > > > there is not a way we can communicate with USB devices.  The
> > > > > current serial port handling code is in serialio.c in this
> > > > > directory."
> > > > >
> > > > > > Drivers for all external synths now use TTY to communicate with the
> > > > > devices. Only ones still using direct communication with hardware
> > > > > ports are internal synths: acntpc, decpc, dtlk and keypc. These are
> > > > > typically ISA cards and generally hardware which is difficult to
> > > > > make work. We can leave these in staging.
> > > >
> > > > Ok, that's fine.
> > > >
> > > > > 2. "Some places are currently using in_atomic() because speakup
> > > > > functions are called in various contexts, and a couple of things
> > > > > can't happen in these cases. Pushing work to some worker thread
> > > > > would probably help, as was already done for the serial port
> > > > > driving part."
> > > > >
> > > > > There aren't any uses of in_atomic anymore. Commit d7500135802c
> > > > > "Staging: speakup: Move pasting into a work item" was the last one
> > > > > that removed such uses.
> > > >
> > > > Great, let's remove that todo item then.
> > > >
> > > > > 3. "There is a duplication of the selection functions in
> > > > > selections.c. These functions should get exported from
> > > > > drivers/char/selection.c (clear_selection notably) and used from
> > > > > there instead."
> > > > >
> > > > > This is yet to be done. I guess drivers/char/selection.c is now
> > > > > under drivers/tty/vt/selection.c.
> > > >
> > > > Yes, someone should update the todo item :)
> > > >
> > > > > > 4. "The kobjects may have to move to a more proper place in /sys. The
> > > > > discussion on lkml resulted to putting speech synthesizers in the
> > > > > "speech" class, and the speakup screen reader itself
> > > > > into /sys/class/vtconsole/vtcon0/speakup, the nasty path being
> > > > > handled by userland tools."
> > > > >
> > > > > Although this makes logical sense, the change will mean changing
> > > > > interface with userspace and hence the user space tools. I tried to
> > > > > search the lkml discussion but couldn't find it. It will be good to
> > > > > know your thoughts on this.
> > > >
> > > > I don't remember, sorry.  I can review the kobject/sysfs usage if you
> > > > think it is "good enough" now and see if I find anything
> > > > objectionable.
> > > >
> > > > > Finally there is an issue where text in output buffer sometimes gets
> > > > > garbled on SMP systems, but we can continue working on it after the
> > > > > driver is moved out of staging, if that's okay. Basically we need a
> > > > > reproducer of this issue.
> > > > >
> > > > > In addition to above, there are likely code style issues which will
> > > > > need to be fixed.
> > > > >
> > > > > > We are very keen to get speakup out of staging, both for settling
> > > > > > the driver and for getting it included in distros which build
> > > > > > only the mainline drivers.
> > > >
> > > > That's great, I am glad to see this happen.  How about work on the
> > > > selection thing and then I can review the kobject stuff in a few
> > > > weeks, and then we can start moving things for 5.2?
> > >
> > > Hi Greg,
> > >
> > > Apologies for the delay. I de-duplicated selection code in speakup to
> > > use code that's already in kernel (commit ids 496124e5e16e and
> > > 41f13084506a). Following items are what remain now:
> > >
> > > 1. moving kobjects location
> > > 2. fixing garbled text
> > >
> > > I couldn't replicate garbled text but Simon (also in CC list) is
> > > looking into it.
> > >
> > > Can you please advise on the way forward?
> >
> > I don't think the "garbled text" is an issue to get this out of staging
> > if others do not see this.  It can be fixed like any other bug at a
> > later point if it is figured out.
> >
> > The kobject stuff does need to be looked at.  Let me carve out some time
> > next week to do that and I will let you know what I see/recommend.
>
> At first glance,

Re: [PATCH v3] arm64: dts: sdm845: Add video nodes

2019-07-12 Thread Rajendra Nayak





On 7/2/2019 5:42 PM, Aniket Masule wrote:

From: Malathi Gottam 

This adds video nodes to sdm845 based on the examples
in the bindings.

Signed-off-by: Malathi Gottam 
Co-developed-by: Aniket Masule 
Signed-off-by: Aniket Masule 


Reviewed-by: Rajendra Nayak 


---
  arch/arm64/boot/dts/qcom/sdm845.dtsi | 30 ++
  1 file changed, 30 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index fcb9330..f3cd94f 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -1893,6 +1893,36 @@
};
};
  
+		video-codec@aa0 {
+   compatible = "qcom,sdm845-venus";
+   reg = <0 0x0aa0 0 0xff000>;
+   interrupts = ;
+   power-domains = <&videocc VENUS_GDSC>;
+   clocks = <&videocc VIDEO_CC_VENUS_CTL_CORE_CLK>,
+<&videocc VIDEO_CC_VENUS_AHB_CLK>,
+<&videocc VIDEO_CC_VENUS_CTL_AXI_CLK>;
+   clock-names = "core", "iface", "bus";
+   iommus = <&apps_smmu 0x10a0 0x8>,
+<&apps_smmu 0x10b0 0x0>;
+   memory-region = <&venus_mem>;
+
+   video-core0 {
+   compatible = "venus-decoder";
+   clocks = <&videocc VIDEO_CC_VCODEC0_CORE_CLK>,
+<&videocc VIDEO_CC_VCODEC0_AXI_CLK>;
+   clock-names = "core", "bus";
+   power-domains = <&videocc VCODEC0_GDSC>;
+   };
+
+   video-core1 {
+   compatible = "venus-encoder";
+   clocks = <&videocc VIDEO_CC_VCODEC1_CORE_CLK>,
+<&videocc VIDEO_CC_VCODEC1_AXI_CLK>;
+   clock-names = "core", "bus";
+   power-domains = <&videocc VCODEC1_GDSC>;
+   };
+   };
+
videocc: clock-controller@ab0 {
compatible = "qcom,sdm845-videocc";
reg = <0 0x0ab0 0 0x1>;



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
