Re: Enabling peer to peer device transactions for PCIe devices

2017-10-23 Thread Petrosyan, Ludwig
Yes I agree it has to be started with the write transaction, according of PCIe 
standard all write transaction are address routed, and I agree with Logan:
if in write transaction TLP the endpoint address written in header the TLP 
should not touch CPU, the PCIe Switch has to route it to endpoint.
The idea was: in MTCA system there is PCIe Switch on MCH (MTCA crate HUB) this 
switch connects CPU to other Crate Slots, so one port is Upstream and others 
are Downstream  ports, DMA read from CPU is usual write on endpoint side, 
Xilinx DMA core has two registers Destination Address and Source Address,
device driver to make DMA has to set up these registers,
usually device driver to start DMA read from Board sets Source address as FPGA 
memory address and Destination address is DMA prepared system address,
in case of testing p2p I set Destination address as physical address of other 
endpoint.
More detailed:
we have so called pcie universal driver: the idea behind is
1. all pcie configuration staff, find enabled BARs, mapping BARs, usual 
read/write and common ioctl (get slot number, get driver version ...) 
implemented in universal driver and EXPORTed.
2. if some system function in new kernel are changed we change it only in 
universal parts (easy support a big number of drivers )
so the universal driver something like PCIe Driver API
3. the universal driver provides read/writ functions so we have the same device 
access API for any PCIe device, we could use the same user application with any 
PCIe device

now. during BARs finding and mapping universal driver keeps pcie endpoint 
physical address in some internal structures, any top driver may get physical 
address
of other pcie endpoint by slot number.
in may case during get_resorce the physical address is 0xB200, I check 
lspci -H1 - -s psie switch port bus address (the endpoint connected to this 
port, checked by lspci -H1 -t) the same address (0xB20) is the memory 
behind bridge, 
I want to make p2p writes to offset 0x4, so I set DMA destination address 
0xB240
is something wrong?

thanks for help
regards

Ludwig

- Original Message -
From: "Logan Gunthorpe" 
To: "David Laight" , "Petrosyan, Ludwig" 

Cc: "Alexander Deucher" , "linux-kernel" 
, "linux-rdma" , 
"linux-nvdimm" , "Linux-media" 
, "dri-devel" , 
"linux-pci" , "John Bridgman" 
, "Felix Kuehling" , "Serguei 
Sagalovitch" , "Paul Blinzer" 
, "Christian Koenig" , "Suravee 
Suthikulpanit" , "Ben Sander" 

Sent: Tuesday, 24 October, 2017 00:04:26
Subject: Re: Enabling peer to peer device transactions for PCIe devices

On 23/10/17 10:08 AM, David Laight wrote:
> It is also worth checking that the hardware actually supports p2p transfers.
> Writes are more likely to be supported then reads.
> ISTR that some intel cpus support some p2p writes, but there could easily
> be errata against them.

Ludwig mentioned a PCIe switch. The few switches I'm aware of support 
P2P transfers. So if everything is setup correctly, the TLPs shouldn't 
even touch the CPU.

But, yes, generally it's a good idea to start with writes and see if 
they work first.

Logan


Re: Enabling peer to peer device transactions for PCIe devices

2017-10-23 Thread Petrosyan, Ludwig
Yes I agree it has to be started with the write transaction, according of PCIe 
standard all write transaction are address routed, and I agree with Logan:
if in write transaction TLP the endpoint address written in header the TLP 
should not touch CPU, the PCIe Switch has to route it to endpoint.
The idea was: in MTCA system there is PCIe Switch on MCH (MTCA crate HUB) this 
switch connects CPU to other Crate Slots, so one port is Upstream and others 
are Downstream  ports, DMA read from CPU is usual write on endpoint side, 
Xilinx DMA core has two registers Destination Address and Source Address,
device driver to make DMA has to set up these registers,
usually device driver to start DMA read from Board sets Source address as FPGA 
memory address and Destination address is DMA prepared system address,
in case of testing p2p I set Destination address as physical address of other 
endpoint.
More detailed:
we have so called pcie universal driver: the idea behind is
1. all pcie configuration staff, find enabled BARs, mapping BARs, usual 
read/write and common ioctl (get slot number, get driver version ...) 
implemented in universal driver and EXPORTed.
2. if some system function in new kernel are changed we change it only in 
universal parts (easy support a big number of drivers )
so the universal driver something like PCIe Driver API
3. the universal driver provides read/writ functions so we have the same device 
access API for any PCIe device, we could use the same user application with any 
PCIe device

now. during BARs finding and mapping universal driver keeps pcie endpoint 
physical address in some internal structures, any top driver may get physical 
address
of other pcie endpoint by slot number.
in may case during get_resorce the physical address is 0xB200, I check 
lspci -H1 - -s psie switch port bus address (the endpoint connected to this 
port, checked by lspci -H1 -t) the same address (0xB20) is the memory 
behind bridge, 
I want to make p2p writes to offset 0x4, so I set DMA destination address 
0xB240
is something wrong?

thanks for help
regards

Ludwig

- Original Message -
From: "Logan Gunthorpe" 
To: "David Laight" , "Petrosyan, Ludwig" 

Cc: "Alexander Deucher" , "linux-kernel" 
, "linux-rdma" , 
"linux-nvdimm" , "Linux-media" 
, "dri-devel" , 
"linux-pci" , "John Bridgman" 
, "Felix Kuehling" , "Serguei 
Sagalovitch" , "Paul Blinzer" 
, "Christian Koenig" , "Suravee 
Suthikulpanit" , "Ben Sander" 

Sent: Tuesday, 24 October, 2017 00:04:26
Subject: Re: Enabling peer to peer device transactions for PCIe devices

On 23/10/17 10:08 AM, David Laight wrote:
> It is also worth checking that the hardware actually supports p2p transfers.
> Writes are more likely to be supported then reads.
> ISTR that some intel cpus support some p2p writes, but there could easily
> be errata against them.

Ludwig mentioned a PCIe switch. The few switches I'm aware of support 
P2P transfers. So if everything is setup correctly, the TLPs shouldn't 
even touch the CPU.

But, yes, generally it's a good idea to start with writes and see if 
they work first.

Logan


Re: [PATCH 09/19] net: average: Kill off ACCESS_ONCE()

2017-10-23 Thread Johannes Berg
On Mon, 2017-10-23 at 21:07 +, Paul E. McKenney wrote:
> From: Mark Rutland 
> 
> For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
> preference to ACCESS_ONCE(), and new code is expected to use one of the
> former. So far, there's been no reason to change most existing uses of
> ACCESS_ONCE(), as these aren't currently harmful.
> 
> However, for some features it is necessary to instrument reads and
> writes separately, which is not possible with ACCESS_ONCE(). This
> distinction is critical to correct operation.
> 
> It's possible to transform the bulk of kernel code using the Coccinelle
> script below. However, this doesn't pick up some uses, including those
> in . As a preparatory step, this patch converts the
> file to use {READ,WRITE}_ONCE() consistently.
> 
> At the same time, this patch addds missing includes necessary for
> {READ,WRITE}_ONCE(), *BUG_ON*(), and ilog2().
> 
> 
> virtual patch
> 
> @ depends on patch @
> expression E1, E2;
> @@
> 
> - ACCESS_ONCE(E1) = E2
> + WRITE_ONCE(E1, E2)
> 
> @ depends on patch @
> expression E;
> @@
> 
> - ACCESS_ONCE(E)
> + READ_ONCE(E)
> 
> 
> Signed-off-by: Mark Rutland 
> Cc: Johannes Berg 

Reviewed-by: Johannes Berg 

Let me know if you want me to apply this, since I seem to be the
average.h maintainer :-)

johannes


Re: [PATCH 09/19] net: average: Kill off ACCESS_ONCE()

2017-10-23 Thread Johannes Berg
On Mon, 2017-10-23 at 21:07 +, Paul E. McKenney wrote:
> From: Mark Rutland 
> 
> For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
> preference to ACCESS_ONCE(), and new code is expected to use one of the
> former. So far, there's been no reason to change most existing uses of
> ACCESS_ONCE(), as these aren't currently harmful.
> 
> However, for some features it is necessary to instrument reads and
> writes separately, which is not possible with ACCESS_ONCE(). This
> distinction is critical to correct operation.
> 
> It's possible to transform the bulk of kernel code using the Coccinelle
> script below. However, this doesn't pick up some uses, including those
> in . As a preparatory step, this patch converts the
> file to use {READ,WRITE}_ONCE() consistently.
> 
> At the same time, this patch addds missing includes necessary for
> {READ,WRITE}_ONCE(), *BUG_ON*(), and ilog2().
> 
> 
> virtual patch
> 
> @ depends on patch @
> expression E1, E2;
> @@
> 
> - ACCESS_ONCE(E1) = E2
> + WRITE_ONCE(E1, E2)
> 
> @ depends on patch @
> expression E;
> @@
> 
> - ACCESS_ONCE(E)
> + READ_ONCE(E)
> 
> 
> Signed-off-by: Mark Rutland 
> Cc: Johannes Berg 

Reviewed-by: Johannes Berg 

Let me know if you want me to apply this, since I seem to be the
average.h maintainer :-)

johannes


Re: [PATCH 3/3] kdump: round up the total memory size to 128M for crashkernel reservation

2017-10-23 Thread Baoquan He
Hi Dave,

On 10/24/17 at 01:31pm, Dave Young wrote:
> The total memory size we get in kernel is usually slightly less than 2G with a
> 2G memory module machine. The main reason is bios/firmware reserve some area
> it will not export all memory as usable to Linux.
> 
> 2G memory X86 kvm guest test result of the total_mem value:
> UEFI boot with ovmf: 0x7ef1
> Legacy boot kvm guest: 0x7ff7cc00
> 
> An option is to use dmi/smbios to get physical memory size, but it's not
> reliable as well. According to Prarit hardware vendors sometimes screw this 
> up.
> Thus we choose to round up total size to 128M to workaround this problem.
> This is a best effort workaround, will improve it when we have better way
> in the future. 

Thanks for this posting. While I don't get the point of this patch. So
firmware take piece of memory, then why we need to count it into the
total memory which we want to calculate a crashkernel memory based on.

Not counting that, is there anyting incorrect?

Thanks
Baoquan

> 
> Signed-off-by: Dave Young 
> ---
>  kernel/crash_core.c |   17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> --- linux.orig/kernel/crash_core.c
> +++ linux/kernel/crash_core.c
> @@ -42,6 +42,15 @@ static int __init parse_crashkernel_mem(
>  {
>   char *cur = cmdline, *tmp;
>   bool infinite_end = false;
> + unsigned long long total_mem = system_ram;
> +
> + /*
> +  * Firmware usually reserves some memory regions for it's own use.
> +  * so we get less than actual system memory size.
> +  * We workaround this by round up the total size to 128M which is
> +  * enough for most test cases.
> +  */
> + total_mem = roundup(total_mem, 0x800);
>  
>   /* for each entry of the comma-separated list */
>   do {
> @@ -86,13 +95,13 @@ static int __init parse_crashkernel_mem(
>   return -EINVAL;
>   }
>   cur = tmp;
> - if (size >= system_ram) {
> + if (size >= total_mem) {
>   pr_warn("crashkernel: invalid size\n");
>   return -EINVAL;
>   }
>  
>   /* match ? */
> - if (system_ram >= start && system_ram < end) {
> + if (total_mem >= start && total_mem < end) {
>   *crash_size = size;
>   if (end == ULLONG_MAX)
>   infinite_end = true;
> @@ -126,9 +135,9 @@ static int __init parse_crashkernel_mem(
>   pr_warn("Memory reservation scale order 
> expected after '^'\n");
>   return -EINVAL;
>   }
> - size = (system_ram - *crash_size) >> shift;
> + size = (total_mem - *crash_size) >> shift;
>   size = *crash_size + roundup(size, 1ULL << 20);
> - if (size < system_ram)
> + if (size < total_mem)
>   *crash_size = size;
>   cur = tmp;
>   } else
> 
> 


Re: [PATCH 3/3] kdump: round up the total memory size to 128M for crashkernel reservation

2017-10-23 Thread Baoquan He
Hi Dave,

On 10/24/17 at 01:31pm, Dave Young wrote:
> The total memory size we get in kernel is usually slightly less than 2G with a
> 2G memory module machine. The main reason is bios/firmware reserve some area
> it will not export all memory as usable to Linux.
> 
> 2G memory X86 kvm guest test result of the total_mem value:
> UEFI boot with ovmf: 0x7ef1
> Legacy boot kvm guest: 0x7ff7cc00
> 
> An option is to use dmi/smbios to get physical memory size, but it's not
> reliable as well. According to Prarit hardware vendors sometimes screw this 
> up.
> Thus we choose to round up total size to 128M to workaround this problem.
> This is a best effort workaround, will improve it when we have better way
> in the future. 

Thanks for this posting. While I don't get the point of this patch. So
firmware take piece of memory, then why we need to count it into the
total memory which we want to calculate a crashkernel memory based on.

Not counting that, is there anyting incorrect?

Thanks
Baoquan

> 
> Signed-off-by: Dave Young 
> ---
>  kernel/crash_core.c |   17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> --- linux.orig/kernel/crash_core.c
> +++ linux/kernel/crash_core.c
> @@ -42,6 +42,15 @@ static int __init parse_crashkernel_mem(
>  {
>   char *cur = cmdline, *tmp;
>   bool infinite_end = false;
> + unsigned long long total_mem = system_ram;
> +
> + /*
> +  * Firmware usually reserves some memory regions for it's own use.
> +  * so we get less than actual system memory size.
> +  * We workaround this by round up the total size to 128M which is
> +  * enough for most test cases.
> +  */
> + total_mem = roundup(total_mem, 0x800);
>  
>   /* for each entry of the comma-separated list */
>   do {
> @@ -86,13 +95,13 @@ static int __init parse_crashkernel_mem(
>   return -EINVAL;
>   }
>   cur = tmp;
> - if (size >= system_ram) {
> + if (size >= total_mem) {
>   pr_warn("crashkernel: invalid size\n");
>   return -EINVAL;
>   }
>  
>   /* match ? */
> - if (system_ram >= start && system_ram < end) {
> + if (total_mem >= start && total_mem < end) {
>   *crash_size = size;
>   if (end == ULLONG_MAX)
>   infinite_end = true;
> @@ -126,9 +135,9 @@ static int __init parse_crashkernel_mem(
>   pr_warn("Memory reservation scale order 
> expected after '^'\n");
>   return -EINVAL;
>   }
> - size = (system_ram - *crash_size) >> shift;
> + size = (total_mem - *crash_size) >> shift;
>   size = *crash_size + roundup(size, 1ULL << 20);
> - if (size < system_ram)
> + if (size < total_mem)
>   *crash_size = size;
>   cur = tmp;
>   } else
> 
> 


[PATCH] arm: ubsan: select ARCH_HAS_UBSAN_SANITIZE_ALL

2017-10-23 Thread Seung-Woo Kim
Select ARCH_HAS_UBSAN_SANITIZE_ALL from arm confiuration to enable UBSAN
on arm.

Signed-off-by: Seung-Woo Kim 
---
 arch/arm/Kconfig  |1 +
 arch/arm/boot/compressed/Makefile |2 ++
 arch/arm/vdso/Makefile|2 ++
 3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 7888c98..5eb3c89 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -11,6 +11,7 @@ config ARM
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAS_GCOV_PROFILE_ALL
+   select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
diff --git a/arch/arm/boot/compressed/Makefile 
b/arch/arm/boot/compressed/Makefile
index d50430c..8f06ab2 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,8 @@ endif
 
 GCOV_PROFILE   := n
 
+UBSAN_SANITIZE := n
+
 #
 # Architecture dependencies
 #
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index 59a8fa7..e343700 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -29,6 +29,8 @@ CFLAGS_vgettimeofday.o = -O2
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
 
+UBSAN_SANITIZE := n
+
 # Force dependency
 $(obj)/vdso.o : $(obj)/vdso.so
 
-- 
1.7.4.1



[PATCH] arm: ubsan: select ARCH_HAS_UBSAN_SANITIZE_ALL

2017-10-23 Thread Seung-Woo Kim
Select ARCH_HAS_UBSAN_SANITIZE_ALL from arm confiuration to enable UBSAN
on arm.

Signed-off-by: Seung-Woo Kim 
---
 arch/arm/Kconfig  |1 +
 arch/arm/boot/compressed/Makefile |2 ++
 arch/arm/vdso/Makefile|2 ++
 3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 7888c98..5eb3c89 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -11,6 +11,7 @@ config ARM
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAS_GCOV_PROFILE_ALL
+   select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7
diff --git a/arch/arm/boot/compressed/Makefile 
b/arch/arm/boot/compressed/Makefile
index d50430c..8f06ab2 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,8 @@ endif
 
 GCOV_PROFILE   := n
 
+UBSAN_SANITIZE := n
+
 #
 # Architecture dependencies
 #
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index 59a8fa7..e343700 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -29,6 +29,8 @@ CFLAGS_vgettimeofday.o = -O2
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
 
+UBSAN_SANITIZE := n
+
 # Force dependency
 $(obj)/vdso.o : $(obj)/vdso.so
 
-- 
1.7.4.1



[PATCH v5 2/2] dma: sprd: Add Spreadtrum DMA driver

2017-10-23 Thread Baolin Wang
This patch adds the DMA controller driver for Spreadtrum SC9860 platform.

Signed-off-by: Baolin Wang 
---
Changes since v4:
 - Correct lisence.
 - Rename SPRD_DMA_WAIT_BDONE macro to SPRD_DMA_WAIT_BDONE_OFFSET.
 - Optimize sprd_dma_chn_update() function.
 - Print warning when getting incorrect interrupt type in 
sprd_dma_get_int_type()
   and will check the incorrect interrupt type in sprd_dma_check_trans_done().
 - Check if txstate is valid in sprd_dma_tx_status().
 - Replace of_property_read_u32() with device_property_read_u32().
 - Other coding style fixes.

Changes since v3:
 - Remove reduandant local 'mask' and 'val' variables.
 - Simplify sprd_dma_get_req_type() function.
 - Change  pm_runtime_put_sync() to  pm_runtime_put() in probe function.
 - Simplify sizeof function.
 - Other coding style fixes.

Changes since v2:
 - Add COMPILE_TEST as dependency.
 - Renamed DMA macro definition properly.
 - Add sprd_dma_chn_update() helpers to save lots of code.
 - Change pm_runtime_put_sync() to pm_runtime_put() when free resources.
 - Re-implement the sprd_dma_tx_status() function.
 - Free irq and kill tasklet when remove driver.
 - Add some documentaion.
 - Change to module_init() level and add MODULE_ALIAS().
 - Other coding style fixes.

Changes since v1:
 - Use virt-dma to manage dma descriptors.
 - Remove link-list and channel-start-channel Spreadtrum special features.
 - Remove device_config implementation.
 - Other optimization.

Note: This patch just implemented the basic DMA_MEMCPY function, and in
future we will send new patches to introduce some Speadtrum special features,
that will be talk about how to handle these features easily instead of in one
big patch which is hard to review.
---
 drivers/dma/Kconfig|8 +
 drivers/dma/Makefile   |1 +
 drivers/dma/sprd-dma.c |  988 
 3 files changed, 997 insertions(+)
 create mode 100644 drivers/dma/sprd-dma.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index fadc4d8..a2aa7fe 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -483,6 +483,14 @@ config STM32_DMA
  If you have a board based on such a MCU and wish to use DMA say Y
  here.
 
+config SPRD_DMA
+   tristate "Spreadtrum DMA support"
+   depends on ARCH_SPRD || COMPILE_TEST
+   select DMA_ENGINE
+   select DMA_VIRTUAL_CHANNELS
+   help
+ Enable support for the on-chip DMA controller on Spreadtrum platform.
+
 config S3C24XX_DMAC
bool "Samsung S3C24XX DMA support"
depends on ARCH_S3C24XX || COMPILE_TEST
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index f08f8de..9e7ec34 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_RENESAS_DMA) += sh/
 obj-$(CONFIG_SIRF_DMA) += sirf-dma.o
 obj-$(CONFIG_STE_DMA40) += ste_dma40.o ste_dma40_ll.o
 obj-$(CONFIG_STM32_DMA) += stm32-dma.o
+obj-$(CONFIG_SPRD_DMA) += sprd-dma.o
 obj-$(CONFIG_S3C24XX_DMAC) += s3c24xx-dma.o
 obj-$(CONFIG_TXX9_DMAC) += txx9dmac.o
 obj-$(CONFIG_TEGRA20_APB_DMA) += tegra20-apb-dma.o
diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
new file mode 100644
index 000..b652071
--- /dev/null
+++ b/drivers/dma/sprd-dma.c
@@ -0,0 +1,988 @@
+/*
+ * Copyright (C) 2017 Spreadtrum Communications Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "virt-dma.h"
+
+#define SPRD_DMA_CHN_REG_OFFSET0x1000
+#define SPRD_DMA_CHN_REG_LENGTH0x40
+#define SPRD_DMA_MEMCPY_MIN_SIZE   64
+
+/* DMA global registers definition */
+#define SPRD_DMA_GLB_PAUSE 0x0
+#define SPRD_DMA_GLB_FRAG_WAIT 0x4
+#define SPRD_DMA_GLB_REQ_PEND0_EN  0x8
+#define SPRD_DMA_GLB_REQ_PEND1_EN  0xc
+#define SPRD_DMA_GLB_INT_RAW_STS   0x10
+#define SPRD_DMA_GLB_INT_MSK_STS   0x14
+#define SPRD_DMA_GLB_REQ_STS   0x18
+#define SPRD_DMA_GLB_CHN_EN_STS0x1c
+#define SPRD_DMA_GLB_DEBUG_STS 0x20
+#define SPRD_DMA_GLB_ARB_SEL_STS   0x24
+#define SPRD_DMA_GLB_REQ_UID(uid)  (0x4 * ((uid) - 1))
+#define SPRD_DMA_GLB_REQ_UID_OFFSET0x2000
+
+/* DMA channel registers definition */
+#define SPRD_DMA_CHN_PAUSE 0x0
+#define SPRD_DMA_CHN_REQ   0x4
+#define SPRD_DMA_CHN_CFG   0x8
+#define SPRD_DMA_CHN_INTC  0xc
+#define SPRD_DMA_CHN_SRC_ADDR  0x10
+#define SPRD_DMA_CHN_DES_ADDR  0x14
+#define SPRD_DMA_CHN_FRG_LEN   0x18
+#define SPRD_DMA_CHN_BLK_LEN   0x1c
+#define SPRD_DMA_CHN_TRSC_LEN  0x20
+#define SPRD_DMA_CHN_TRSF_STEP 0x24
+#define SPRD_DMA_CHN_WARP_PTR  0x28
+#define SPRD_DMA_CHN_WARP_TO   0x2c
+#define SPRD_DMA_CHN_LLIST_PTR 0x30
+#define SPRD_DMA_CHN_FRAG_STEP 0x34
+#define 

[PATCH v5 2/2] dma: sprd: Add Spreadtrum DMA driver

2017-10-23 Thread Baolin Wang
This patch adds the DMA controller driver for Spreadtrum SC9860 platform.

Signed-off-by: Baolin Wang 
---
Changes since v4:
 - Correct lisence.
 - Rename SPRD_DMA_WAIT_BDONE macro to SPRD_DMA_WAIT_BDONE_OFFSET.
 - Optimize sprd_dma_chn_update() function.
 - Print warning when getting incorrect interrupt type in 
sprd_dma_get_int_type()
   and will check the incorrect interrupt type in sprd_dma_check_trans_done().
 - Check if txstate is valid in sprd_dma_tx_status().
 - Replace of_property_read_u32() with device_property_read_u32().
 - Other coding style fixes.

Changes since v3:
 - Remove reduandant local 'mask' and 'val' variables.
 - Simplify sprd_dma_get_req_type() function.
 - Change  pm_runtime_put_sync() to  pm_runtime_put() in probe function.
 - Simplify sizeof function.
 - Other coding style fixes.

Changes since v2:
 - Add COMPILE_TEST as dependency.
 - Renamed DMA macro definition properly.
 - Add sprd_dma_chn_update() helpers to save lots of code.
 - Change pm_runtime_put_sync() to pm_runtime_put() when free resources.
 - Re-implement the sprd_dma_tx_status() function.
 - Free irq and kill tasklet when remove driver.
 - Add some documentaion.
 - Change to module_init() level and add MODULE_ALIAS().
 - Other coding style fixes.

Changes since v1:
 - Use virt-dma to manage dma descriptors.
 - Remove link-list and channel-start-channel Spreadtrum special features.
 - Remove device_config implementation.
 - Other optimization.

Note: This patch just implemented the basic DMA_MEMCPY function, and in
future we will send new patches to introduce some Speadtrum special features,
that will be talk about how to handle these features easily instead of in one
big patch which is hard to review.
---
 drivers/dma/Kconfig|8 +
 drivers/dma/Makefile   |1 +
 drivers/dma/sprd-dma.c |  988 
 3 files changed, 997 insertions(+)
 create mode 100644 drivers/dma/sprd-dma.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index fadc4d8..a2aa7fe 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -483,6 +483,14 @@ config STM32_DMA
  If you have a board based on such a MCU and wish to use DMA say Y
  here.
 
+config SPRD_DMA
+   tristate "Spreadtrum DMA support"
+   depends on ARCH_SPRD || COMPILE_TEST
+   select DMA_ENGINE
+   select DMA_VIRTUAL_CHANNELS
+   help
+ Enable support for the on-chip DMA controller on Spreadtrum platform.
+
 config S3C24XX_DMAC
bool "Samsung S3C24XX DMA support"
depends on ARCH_S3C24XX || COMPILE_TEST
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index f08f8de..9e7ec34 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_RENESAS_DMA) += sh/
 obj-$(CONFIG_SIRF_DMA) += sirf-dma.o
 obj-$(CONFIG_STE_DMA40) += ste_dma40.o ste_dma40_ll.o
 obj-$(CONFIG_STM32_DMA) += stm32-dma.o
+obj-$(CONFIG_SPRD_DMA) += sprd-dma.o
 obj-$(CONFIG_S3C24XX_DMAC) += s3c24xx-dma.o
 obj-$(CONFIG_TXX9_DMAC) += txx9dmac.o
 obj-$(CONFIG_TEGRA20_APB_DMA) += tegra20-apb-dma.o
diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
new file mode 100644
index 000..b652071
--- /dev/null
+++ b/drivers/dma/sprd-dma.c
@@ -0,0 +1,988 @@
+/*
+ * Copyright (C) 2017 Spreadtrum Communications Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "virt-dma.h"
+
+#define SPRD_DMA_CHN_REG_OFFSET0x1000
+#define SPRD_DMA_CHN_REG_LENGTH0x40
+#define SPRD_DMA_MEMCPY_MIN_SIZE   64
+
+/* DMA global registers definition */
+#define SPRD_DMA_GLB_PAUSE 0x0
+#define SPRD_DMA_GLB_FRAG_WAIT 0x4
+#define SPRD_DMA_GLB_REQ_PEND0_EN  0x8
+#define SPRD_DMA_GLB_REQ_PEND1_EN  0xc
+#define SPRD_DMA_GLB_INT_RAW_STS   0x10
+#define SPRD_DMA_GLB_INT_MSK_STS   0x14
+#define SPRD_DMA_GLB_REQ_STS   0x18
+#define SPRD_DMA_GLB_CHN_EN_STS0x1c
+#define SPRD_DMA_GLB_DEBUG_STS 0x20
+#define SPRD_DMA_GLB_ARB_SEL_STS   0x24
+#define SPRD_DMA_GLB_REQ_UID(uid)  (0x4 * ((uid) - 1))
+#define SPRD_DMA_GLB_REQ_UID_OFFSET0x2000
+
+/* DMA channel registers definition */
+#define SPRD_DMA_CHN_PAUSE 0x0
+#define SPRD_DMA_CHN_REQ   0x4
+#define SPRD_DMA_CHN_CFG   0x8
+#define SPRD_DMA_CHN_INTC  0xc
+#define SPRD_DMA_CHN_SRC_ADDR  0x10
+#define SPRD_DMA_CHN_DES_ADDR  0x14
+#define SPRD_DMA_CHN_FRG_LEN   0x18
+#define SPRD_DMA_CHN_BLK_LEN   0x1c
+#define SPRD_DMA_CHN_TRSC_LEN  0x20
+#define SPRD_DMA_CHN_TRSF_STEP 0x24
+#define SPRD_DMA_CHN_WARP_PTR  0x28
+#define SPRD_DMA_CHN_WARP_TO   0x2c
+#define SPRD_DMA_CHN_LLIST_PTR 0x30
+#define SPRD_DMA_CHN_FRAG_STEP 0x34
+#define SPRD_DMA_CHN_SRC_BLK_STEP  

[PATCH v5 1/2] dt-bindings: dma: Add Spreadtrum SC9860 DMA controller

2017-10-23 Thread Baolin Wang
This patch adds the binding documentation for Spreadtrum SC9860 DMA
controller device.

Signed-off-by: Baolin Wang 
Acked-by: Rob Herring 
---
Changes since v4:
 - No updates.

Changes since v3:
 - No updates.

Changes since v2:
 - No updates.

Changes since v1:
 - Fix typos.
---
 Documentation/devicetree/bindings/dma/sprd-dma.txt |   41 
 1 file changed, 41 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/sprd-dma.txt

diff --git a/Documentation/devicetree/bindings/dma/sprd-dma.txt 
b/Documentation/devicetree/bindings/dma/sprd-dma.txt
new file mode 100644
index 000..7a10fea
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/sprd-dma.txt
@@ -0,0 +1,41 @@
+* Spreadtrum DMA controller
+
+This binding follows the generic DMA bindings defined in dma.txt.
+
+Required properties:
+- compatible: Should be "sprd,sc9860-dma".
+- reg: Should contain DMA registers location and length.
+- interrupts: Should contain one interrupt shared by all channel.
+- #dma-cells: must be <1>. Used to represent the number of integer
+   cells in the dmas property of client device.
+- #dma-channels : Number of DMA channels supported. Should be 32.
+- clock-names: Should contain the clock of the DMA controller.
+- clocks: Should contain a clock specifier for each entry in clock-names.
+
+Example:
+
+Controller:
+apdma: dma-controller@2010 {
+   compatible = "sprd,sc9860-dma";
+   reg = <0x2010 0x4000>;
+   interrupts = ;
+   #dma-cells = <1>;
+   #dma-channels = <32>;
+   clock-names = "enable";
+   clocks = <_ap_ahb_gates 5>;
+};
+
+
+Client:
+DMA clients connected to the Spreadtrum DMA controller must use the format
+described in the dma.txt file, using a two-cell specifier for each channel.
+The two cells in order are:
+1. A phandle pointing to the DMA controller.
+2. The channel id.
+
+spi0: spi@70a0{
+   ...
+   dma-names = "rx_chn", "tx_chn";
+   dmas = < 11>, < 12>;
+   ...
+};
-- 
1.7.9.5



[PATCH v5 1/2] dt-bindings: dma: Add Spreadtrum SC9860 DMA controller

2017-10-23 Thread Baolin Wang
This patch adds the binding documentation for Spreadtrum SC9860 DMA
controller device.

Signed-off-by: Baolin Wang 
Acked-by: Rob Herring 
---
Changes since v4:
 - No updates.

Changes since v3:
 - No updates.

Changes since v2:
 - No updates.

Changes since v1:
 - Fix typos.
---
 Documentation/devicetree/bindings/dma/sprd-dma.txt |   41 
 1 file changed, 41 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/sprd-dma.txt

diff --git a/Documentation/devicetree/bindings/dma/sprd-dma.txt 
b/Documentation/devicetree/bindings/dma/sprd-dma.txt
new file mode 100644
index 000..7a10fea
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/sprd-dma.txt
@@ -0,0 +1,41 @@
+* Spreadtrum DMA controller
+
+This binding follows the generic DMA bindings defined in dma.txt.
+
+Required properties:
+- compatible: Should be "sprd,sc9860-dma".
+- reg: Should contain DMA registers location and length.
+- interrupts: Should contain one interrupt shared by all channel.
+- #dma-cells: must be <1>. Used to represent the number of integer
+   cells in the dmas property of client device.
+- #dma-channels : Number of DMA channels supported. Should be 32.
+- clock-names: Should contain the clock of the DMA controller.
+- clocks: Should contain a clock specifier for each entry in clock-names.
+
+Example:
+
+Controller:
+apdma: dma-controller@2010 {
+   compatible = "sprd,sc9860-dma";
+   reg = <0x2010 0x4000>;
+   interrupts = ;
+   #dma-cells = <1>;
+   #dma-channels = <32>;
+   clock-names = "enable";
+   clocks = <_ap_ahb_gates 5>;
+};
+
+
+Client:
+DMA clients connected to the Spreadtrum DMA controller must use the format
+described in the dma.txt file, using a two-cell specifier for each channel.
+The two cells in order are:
+1. A phandle pointing to the DMA controller.
+2. The channel id.
+
+spi0: spi@70a0{
+   ...
+   dma-names = "rx_chn", "tx_chn";
+   dmas = < 11>, < 12>;
+   ...
+};
-- 
1.7.9.5



Re: [PATCH v5 5/6] input: Add MediaTek PMIC keys support

2017-10-23 Thread Dmitry Torokhov
On Wed, Sep 27, 2017 at 06:44:07PM +0800, Chen Zhong wrote:
> This patch add support to handle MediaTek PMIC MT6397/MT6323 key
> interrupts including pwrkey and homekey, also add setting for
> long press key shutdown behavior.
> 
> Signed-off-by: Chen Zhong 
> ---
>  drivers/input/keyboard/Kconfig |9 +
>  drivers/input/keyboard/Makefile|1 +
>  drivers/input/keyboard/mtk-pmic-keys.c |  341 
> 
>  3 files changed, 351 insertions(+)
>  create mode 100644 drivers/input/keyboard/mtk-pmic-keys.c
> 
> diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig
> index 4c4ab1c..bd4e20a 100644
> --- a/drivers/input/keyboard/Kconfig
> +++ b/drivers/input/keyboard/Kconfig
> @@ -756,4 +756,13 @@ config KEYBOARD_BCM
> To compile this driver as a module, choose M here: the
> module will be called bcm-keypad.
>  
> +config KEYBOARD_MTK_PMIC
> + tristate "MediaTek PMIC keys support"
> + depends on MFD_MT6397
> + help
> +   Say Y here if you want to use the pmic keys (powerkey/homekey).
> +
> +   To compile this driver as a module, choose M here: the
> +   module will be called pmic-keys.
> +
>  endif
> diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile
> index d2338ba..20c0b98 100644
> --- a/drivers/input/keyboard/Makefile
> +++ b/drivers/input/keyboard/Makefile
> @@ -40,6 +40,7 @@ obj-$(CONFIG_KEYBOARD_MATRIX)   += 
> matrix_keypad.o
>  obj-$(CONFIG_KEYBOARD_MAX7359)   += max7359_keypad.o
>  obj-$(CONFIG_KEYBOARD_MCS)   += mcs_touchkey.o
>  obj-$(CONFIG_KEYBOARD_MPR121)+= mpr121_touchkey.o
> +obj-$(CONFIG_KEYBOARD_MTK_PMIC)  += mtk-pmic-keys.o
>  obj-$(CONFIG_KEYBOARD_NEWTON)+= newtonkbd.o
>  obj-$(CONFIG_KEYBOARD_NOMADIK)   += nomadik-ske-keypad.o
>  obj-$(CONFIG_KEYBOARD_NSPIRE)+= nspire-keypad.o
> diff --git a/drivers/input/keyboard/mtk-pmic-keys.c 
> b/drivers/input/keyboard/mtk-pmic-keys.c
> new file mode 100644
> index 000..529fd95
> --- /dev/null
> +++ b/drivers/input/keyboard/mtk-pmic-keys.c
> @@ -0,0 +1,341 @@
> +/*
> + * Copyright (C) 2017 MediaTek, Inc.
> + *
> + * Author: Chen Zhong 
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define MTK_PMIC_PWRKEY_RST_EN_MASK  0x1
> +#define MTK_PMIC_PWRKEY_RST_EN_SHIFT 6
> +#define MTK_PMIC_HOMEKEY_RST_EN_MASK 0x1
> +#define MTK_PMIC_HOMEKEY_RST_EN_SHIFT5
> +#define MTK_PMIC_RST_DU_MASK 0x3
> +#define MTK_PMIC_RST_DU_SHIFT8
> +
> +#define MTK_PMIC_PWRKEY_RST  \
> + (MTK_PMIC_PWRKEY_RST_EN_MASK << MTK_PMIC_PWRKEY_RST_EN_SHIFT)
> +#define MTK_PMIC_HOMEKEY_RST \
> + (MTK_PMIC_HOMEKEY_RST_EN_MASK << MTK_PMIC_HOMEKEY_RST_EN_SHIFT)
> +
> +#define MTK_PMIC_PWRKEY_INDEX0
> +#define MTK_PMIC_HOMEKEY_INDEX   1
> +#define MTK_PMIC_MAX_KEY_COUNT   2
> +
> +struct mtk_pmic_keys_regs {
> + u32 deb_reg;
> + u32 deb_mask;
> + u32 intsel_reg;
> + u32 intsel_mask;
> +};
> +
> +#define MTK_PMIC_KEYS_REGS(_deb_reg, _deb_mask,  \
> + _intsel_reg, _intsel_mask)  \
> +{\
> + .deb_reg= _deb_reg, \
> + .deb_mask   = _deb_mask,\
> + .intsel_reg = _intsel_reg,  \
> + .intsel_mask= _intsel_mask, \
> +}
> +
> +struct mtk_pmic_regs {
> + const struct mtk_pmic_keys_regs keys_regs[MTK_PMIC_MAX_KEY_COUNT];
> + u32 pmic_rst_reg;
> +};
> +
> +static const struct mtk_pmic_regs mt6397_regs = {
> + .keys_regs[MTK_PMIC_PWRKEY_INDEX] =
> + MTK_PMIC_KEYS_REGS(MT6397_CHRSTATUS,
> + 0x8, MT6397_INT_RSV, 0x10),
> + .keys_regs[MTK_PMIC_HOMEKEY_INDEX] =
> + MTK_PMIC_KEYS_REGS(MT6397_OCSTATUS2,
> + 0x10, MT6397_INT_RSV, 0x8),
> + .pmic_rst_reg = MT6397_TOP_RST_MISC,
> +};
> +
> +static const struct mtk_pmic_regs mt6323_regs = {
> + .keys_regs[MTK_PMIC_PWRKEY_INDEX] =
> + MTK_PMIC_KEYS_REGS(MT6323_CHRSTATUS,
> + 0x2, MT6323_INT_MISC_CON, 0x10),
> + .keys_regs[MTK_PMIC_HOMEKEY_INDEX] =
> + MTK_PMIC_KEYS_REGS(MT6323_CHRSTATUS,
> +

Re: [PATCH v5 5/6] input: Add MediaTek PMIC keys support

2017-10-23 Thread Dmitry Torokhov
On Wed, Sep 27, 2017 at 06:44:07PM +0800, Chen Zhong wrote:
> This patch add support to handle MediaTek PMIC MT6397/MT6323 key
> interrupts including pwrkey and homekey, also add setting for
> long press key shutdown behavior.
> 
> Signed-off-by: Chen Zhong 
> ---
>  drivers/input/keyboard/Kconfig |9 +
>  drivers/input/keyboard/Makefile|1 +
>  drivers/input/keyboard/mtk-pmic-keys.c |  341 
> 
>  3 files changed, 351 insertions(+)
>  create mode 100644 drivers/input/keyboard/mtk-pmic-keys.c
> 
> diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig
> index 4c4ab1c..bd4e20a 100644
> --- a/drivers/input/keyboard/Kconfig
> +++ b/drivers/input/keyboard/Kconfig
> @@ -756,4 +756,13 @@ config KEYBOARD_BCM
> To compile this driver as a module, choose M here: the
> module will be called bcm-keypad.
>  
> +config KEYBOARD_MTK_PMIC
> + tristate "MediaTek PMIC keys support"
> + depends on MFD_MT6397
> + help
> +   Say Y here if you want to use the pmic keys (powerkey/homekey).
> +
> +   To compile this driver as a module, choose M here: the
> +   module will be called pmic-keys.
> +
>  endif
> diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile
> index d2338ba..20c0b98 100644
> --- a/drivers/input/keyboard/Makefile
> +++ b/drivers/input/keyboard/Makefile
> @@ -40,6 +40,7 @@ obj-$(CONFIG_KEYBOARD_MATRIX)   += 
> matrix_keypad.o
>  obj-$(CONFIG_KEYBOARD_MAX7359)   += max7359_keypad.o
>  obj-$(CONFIG_KEYBOARD_MCS)   += mcs_touchkey.o
>  obj-$(CONFIG_KEYBOARD_MPR121)+= mpr121_touchkey.o
> +obj-$(CONFIG_KEYBOARD_MTK_PMIC)  += mtk-pmic-keys.o
>  obj-$(CONFIG_KEYBOARD_NEWTON)+= newtonkbd.o
>  obj-$(CONFIG_KEYBOARD_NOMADIK)   += nomadik-ske-keypad.o
>  obj-$(CONFIG_KEYBOARD_NSPIRE)+= nspire-keypad.o
> diff --git a/drivers/input/keyboard/mtk-pmic-keys.c 
> b/drivers/input/keyboard/mtk-pmic-keys.c
> new file mode 100644
> index 000..529fd95
> --- /dev/null
> +++ b/drivers/input/keyboard/mtk-pmic-keys.c
> @@ -0,0 +1,341 @@
> +/*
> + * Copyright (C) 2017 MediaTek, Inc.
> + *
> + * Author: Chen Zhong 
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define MTK_PMIC_PWRKEY_RST_EN_MASK  0x1
> +#define MTK_PMIC_PWRKEY_RST_EN_SHIFT 6
> +#define MTK_PMIC_HOMEKEY_RST_EN_MASK 0x1
> +#define MTK_PMIC_HOMEKEY_RST_EN_SHIFT5
> +#define MTK_PMIC_RST_DU_MASK 0x3
> +#define MTK_PMIC_RST_DU_SHIFT8
> +
> +#define MTK_PMIC_PWRKEY_RST  \
> + (MTK_PMIC_PWRKEY_RST_EN_MASK << MTK_PMIC_PWRKEY_RST_EN_SHIFT)
> +#define MTK_PMIC_HOMEKEY_RST \
> + (MTK_PMIC_HOMEKEY_RST_EN_MASK << MTK_PMIC_HOMEKEY_RST_EN_SHIFT)
> +
> +#define MTK_PMIC_PWRKEY_INDEX0
> +#define MTK_PMIC_HOMEKEY_INDEX   1
> +#define MTK_PMIC_MAX_KEY_COUNT   2
> +
> +struct mtk_pmic_keys_regs {
> + u32 deb_reg;
> + u32 deb_mask;
> + u32 intsel_reg;
> + u32 intsel_mask;
> +};
> +
> +#define MTK_PMIC_KEYS_REGS(_deb_reg, _deb_mask,  \
> + _intsel_reg, _intsel_mask)  \
> +{\
> + .deb_reg= _deb_reg, \
> + .deb_mask   = _deb_mask,\
> + .intsel_reg = _intsel_reg,  \
> + .intsel_mask= _intsel_mask, \
> +}
> +
> +struct mtk_pmic_regs {
> + const struct mtk_pmic_keys_regs keys_regs[MTK_PMIC_MAX_KEY_COUNT];
> + u32 pmic_rst_reg;
> +};
> +
> +static const struct mtk_pmic_regs mt6397_regs = {
> + .keys_regs[MTK_PMIC_PWRKEY_INDEX] =
> + MTK_PMIC_KEYS_REGS(MT6397_CHRSTATUS,
> + 0x8, MT6397_INT_RSV, 0x10),
> + .keys_regs[MTK_PMIC_HOMEKEY_INDEX] =
> + MTK_PMIC_KEYS_REGS(MT6397_OCSTATUS2,
> + 0x10, MT6397_INT_RSV, 0x8),
> + .pmic_rst_reg = MT6397_TOP_RST_MISC,
> +};
> +
> +static const struct mtk_pmic_regs mt6323_regs = {
> + .keys_regs[MTK_PMIC_PWRKEY_INDEX] =
> + MTK_PMIC_KEYS_REGS(MT6323_CHRSTATUS,
> + 0x2, MT6323_INT_MISC_CON, 0x10),
> + .keys_regs[MTK_PMIC_HOMEKEY_INDEX] =
> + MTK_PMIC_KEYS_REGS(MT6323_CHRSTATUS,
> + 0x4, MT6323_INT_MISC_CON, 0x8),
> + 

Re: [PATCH] of: Devices with pci_epf_bus_type require DMA configuration

2017-10-23 Thread Kishon Vijay Abraham I
Hi,

On Monday 23 October 2017 06:35 PM, Robin Murphy wrote:
> On 23/10/17 06:43, Kishon Vijay Abraham I wrote:
>> Hi,
>>
>> On Wednesday 11 October 2017 10:15 PM, Robin Murphy wrote:
>>> On 11/10/17 09:00, Kishon Vijay Abraham I wrote:
 pci-epc-core.c invokes of_dma_configure in order to configure
 the coherent_dma_mask/dma_mask of endpoint function device. This is
 required for dma_alloc_coherent to succeed in pci function driver
 (pci-epf-test.c). However after
 commit 723288836628bc1c08 ("of: restrict DMA configuration"),
 of_dma_configure doesn't configure the coherent_dma_mask/dma_mask
 of endpoint function device (since it doesn't have dma-ranges
 property), resulting in dma_alloc_coherent in pci endpoint function
 driver to to fail. Fix it by making sure of_dma_configure configures
 coherent_dma_mask/dma_mask irrespective of whether the node has
 dma-ranges property or not.
>>>
>>> Frankly, what the endpoint stuff is doing looks wrong anyway. As I
>>> understand it, the endpoint functions aren't real devices, just a
>>> partitioning of resources - the only piece of hardware actually doing
>>> DMA is the EPC itself, which should already have been configured
>>> appropriately as a platform device.
>>
>> EPF devices use EPC devices which in turn use the actual platform device for
>> configuring the hardware. IMO the devices in one layer shouldn't have to
>> explicitly use devices in another layer other than using clearly defined 
>> API's.
>> Here platform_device is the bottom later, above which is epc_device and on 
>> top
>> is epf_device.
> 
> Note that the "sysdev"-type model that I'm implying already has
> precedent elsewhere, e.g. in the USB layer. Since you already have
> things abstracted behind the pci_epf_{alloc,free}_space() API, this
> seems like a simple change to make.

I got your point. The device in struct pci_epc is also a virtual device (it
doesn't have a driver associated with it). So we'll have to use something like
epf->epc->dev.parent to actually use the platform device.

It has a couple of indirections but it should be okay to change a couple of
API's I think. With that of_dma_configure can also be removed from
pci-epc-core.c. I'll send a patch fixing those.
> 
>> The idea is just by doing the initial setup in the framework, the epf driver 
>> be
>> able to use APIs like dma_alloc_coherent using it's own *device* rather than
>> the EPC's "device".
> 
> OK, but when would that actually happen? Please correct me if I've got
> it wrong, but as best I understand it, the hardware extent of the EPF is
> some registers controlling the config space that the EPC exposes to the
> other end of the PCI link - the only actual DMA happening will be in
> response to PCI traffic in and out of the EPF's BARs, between the EPC
> and the memory backing those BAR regions which is already controlled by
> your API. Sure, the EPF *driver* is free to access whatever memory it

That's correct. The allocated memory is used only using BARs.
> feels like, but as a software construct that doesn't constitute DMA.
> 
> To put it another way, if an endpoint driver *does* call
> dma_alloc_coherent(epf_device, ...), what does it then do with the
> resulting DMA address?
> 
>>> It seems to me that the EPF BAR allocations should just be using the EPC
>>> device directly, rather than trying to pretend the EPFs are distinct DMA
>>> masters.
>>>
>>> Furthermore, now that I've looked:
>>>
dma_addr_t phys_addr;
>>>
>>> please no :(
>>>
>>> (I can easily think of more than one system with an EP-capable DWC PCIe
>>> block integrated behind an IOMMU)
>>
>> hmm.. haven't used IOMMU but won't setting up parent-child relationship 
>> between
>> EPC and EPF help in the case of IOMMU?
> 
> I was merely referring to the characterisation throughout this code that
> a DMA address is a physical address; it isn't, for any number of reasons
> (IOMMUs are just the most obvious). Fortunately it's only a cosmetic
> naming problem, the actual usage looks OK.

right, that was named incorrectly.

Thanks
Kishon


Re: [PATCH] of: Devices with pci_epf_bus_type require DMA configuration

2017-10-23 Thread Kishon Vijay Abraham I
Hi,

On Monday 23 October 2017 06:35 PM, Robin Murphy wrote:
> On 23/10/17 06:43, Kishon Vijay Abraham I wrote:
>> Hi,
>>
>> On Wednesday 11 October 2017 10:15 PM, Robin Murphy wrote:
>>> On 11/10/17 09:00, Kishon Vijay Abraham I wrote:
 pci-epc-core.c invokes of_dma_configure in order to configure
 the coherent_dma_mask/dma_mask of endpoint function device. This is
 required for dma_alloc_coherent to succeed in pci function driver
 (pci-epf-test.c). However after
 commit 723288836628bc1c08 ("of: restrict DMA configuration"),
 of_dma_configure doesn't configure the coherent_dma_mask/dma_mask
 of endpoint function device (since it doesn't have dma-ranges
 property), resulting in dma_alloc_coherent in pci endpoint function
 driver to to fail. Fix it by making sure of_dma_configure configures
 coherent_dma_mask/dma_mask irrespective of whether the node has
 dma-ranges property or not.
>>>
>>> Frankly, what the endpoint stuff is doing looks wrong anyway. As I
>>> understand it, the endpoint functions aren't real devices, just a
>>> partitioning of resources - the only piece of hardware actually doing
>>> DMA is the EPC itself, which should already have been configured
>>> appropriately as a platform device.
>>
>> EPF devices use EPC devices which in turn use the actual platform device for
>> configuring the hardware. IMO the devices in one layer shouldn't have to
>> explicitly use devices in another layer other than using clearly defined 
>> API's.
>> Here platform_device is the bottom later, above which is epc_device and on 
>> top
>> is epf_device.
> 
> Note that the "sysdev"-type model that I'm implying already has
> precedent elsewhere, e.g. in the USB layer. Since you already have
> things abstracted behind the pci_epf_{alloc,free}_space() API, this
> seems like a simple change to make.

I got your point. The device in struct pci_epc is also a virtual device (it
doesn't have a driver associated with it). So we'll have to use something like
epf->epc->dev.parent to actually use the platform device.

It has a couple of indirections but it should be okay to change a couple of
API's I think. With that of_dma_configure can also be removed from
pci-epc-core.c. I'll send a patch fixing those.
> 
>> The idea is just by doing the initial setup in the framework, the epf driver 
>> be
>> able to use APIs like dma_alloc_coherent using it's own *device* rather than
>> the EPC's "device".
> 
> OK, but when would that actually happen? Please correct me if I've got
> it wrong, but as best I understand it, the hardware extent of the EPF is
> some registers controlling the config space that the EPC exposes to the
> other end of the PCI link - the only actual DMA happening will be in
> response to PCI traffic in and out of the EPF's BARs, between the EPC
> and the memory backing those BAR regions which is already controlled by
> your API. Sure, the EPF *driver* is free to access whatever memory it

That's correct. The allocated memory is used only using BARs.
> feels like, but as a software construct that doesn't constitute DMA.
> 
> To put it another way, if an endpoint driver *does* call
> dma_alloc_coherent(epf_device, ...), what does it then do with the
> resulting DMA address?
> 
>>> It seems to me that the EPF BAR allocations should just be using the EPC
>>> device directly, rather than trying to pretend the EPFs are distinct DMA
>>> masters.
>>>
>>> Furthermore, now that I've looked:
>>>
dma_addr_t phys_addr;
>>>
>>> please no :(
>>>
>>> (I can easily think of more than one system with an EP-capable DWC PCIe
>>> block integrated behind an IOMMU)
>>
>> hmm.. haven't used IOMMU but won't setting up parent-child relationship 
>> between
>> EPC and EPF help in the case of IOMMU?
> 
> I was merely referring to the characterisation throughout this code that
> a DMA address is a physical address; it isn't, for any number of reasons
> (IOMMUs are just the most obvious). Fortunately it's only a cosmetic
> naming problem, the actual usage looks OK.

right, that was named incorrectly.

Thanks
Kishon


Re: [RFC PATCH] fs: fsnotify: account fsnotify metadata to kmemcg

2017-10-23 Thread Amir Goldstein
On Tue, Oct 24, 2017 at 7:12 AM, Yang Shi  wrote:
>
>
> On 10/22/17 1:24 AM, Amir Goldstein wrote:
>>
>> On Sat, Oct 21, 2017 at 12:07 AM, Yang Shi  wrote:
>>>
>>>
>>>
>>> On 10/19/17 8:14 PM, Amir Goldstein wrote:


 On Fri, Oct 20, 2017 at 12:20 AM, Yang Shi 
 wrote:
>
>
> We observed some misbehaved user applications might consume significant
> amount of fsnotify slabs silently. It'd better to account those slabs
> in
> kmemcg so that we can get heads up before misbehaved applications use
> too
> much memory silently.



 In what way do they misbehave? create a lot of marks? create a lot of
 events?
 Not reading events in their queue?
>>>
>>>
>>>
>>> It looks both a lot marks and events. I'm not sure if it is the latter
>>> case.
>>> If I knew more about the details of the behavior, I would elaborated more
>>> in
>>> the commit log.
>>
>>
>> If you are not sure, do not refer to user application as "misbehaved".
>> Is updatedb(8) a misbehaved application because it produces a lot of
>> access
>> events?
>
>
> Should be not. It sounds like our in-house applications. But, it is a sort
> of blackbox to me.
>

If you know which process is "misbehaving" you can look at
ls -l /proc//fd |grep notify
and see the anonymous inotify/fanotify file descriptors

then you can look at  /proc//fdinfo/ file of those
file descriptors to learn more about the fanotify flags etc.

...

>
>>
>> But I think there is another problem, not introduced by your change, but
>> could
>> be amplified because of it - when a non-permission event allocation fails,
>> the
>> event is silently dropped, AFAICT, with no indication to listener.
>> That seems like a bug to me, because there is a perfectly safe way to deal
>> with
>> event allocation failure - queue the overflow event.
>
>
> I'm not sure if such issue could be amplified by the accounting since once
> the usage exceeds the limit any following kmem allocation would fail. So, it
> might fail at fsnotify event allocation, or other places, i.e. fork, open
> syscall, etc. So, in most cases the generator even can't generate new event
> any more.
>

To be clear, I did not mean that kmem limit would cause a storm of dropped
events. I meant if you have a listener outside memcp watching a single file
for access/modifications and you have many containers each with its own
limited memcg, then event drops probability goes to infinity as you run more
of those kmem limited containers with event producers.

> The typical output from my LTP test is filesystem dcache allocation error or
> fork error due to kmem limit is reached.

And that should be considered a success result of the test.
The only failure case is when producer touches the file and event is
not delivered
nor an overflow event delivered.
You can probably try to reduce allocation failure for fork and dentry by:
1. pin dentry cache of subject file on test init by opening the file
2. set the low kmem limit after forking

Then you should probably loop the test enough times
in some of the times, producer may fail to access the file
in others if will succeed and produce events properly
and many some times, producer will access the file and event
will be dropped, so event count is lower than access count.



>
>> I am not going to be the one to determine if fixing this alleged bug is a
>> prerequisite for merging your patch, but I think enforcing memory limits
>> on
>> event allocation could amplify that bug, so it should be fixed.
>>
>> The upside is that with both your accounting fix and ENOMEM = overlflow
>> fix, it going to be easy to write a test that verifies both of them:
>> - Run a listener in memcg with limited kmem and unlimited (or very
>> large) event queue
>> - Produce events inside memcg without listener reading them
>> - Read event and expect an OVERFLOW even
>>
>> This is a simple variant of LTP tests inotify05 and fanotify05.
>
>
> I tried to test your patch with LTP, but it sounds not that easy to setup a
> scenario to make fsnotify event allocation just hit the kmem limit, since
> the limit may be hit before a new event is allocated, for example allocating
> dentry cache in open syscall may hit the limit.
>
> So, it sounds the overflow event might be not generated by the producer in
> most cases.
>

Right. not as simple, but maybe still possible as I described above.
Assuming that my patch is not buggy...

Thanks,
Amir.


Re: [RFC PATCH] fs: fsnotify: account fsnotify metadata to kmemcg

2017-10-23 Thread Amir Goldstein
On Tue, Oct 24, 2017 at 7:12 AM, Yang Shi  wrote:
>
>
> On 10/22/17 1:24 AM, Amir Goldstein wrote:
>>
>> On Sat, Oct 21, 2017 at 12:07 AM, Yang Shi  wrote:
>>>
>>>
>>>
>>> On 10/19/17 8:14 PM, Amir Goldstein wrote:


 On Fri, Oct 20, 2017 at 12:20 AM, Yang Shi 
 wrote:
>
>
> We observed some misbehaved user applications might consume significant
> amount of fsnotify slabs silently. It'd better to account those slabs
> in
> kmemcg so that we can get heads up before misbehaved applications use
> too
> much memory silently.



 In what way do they misbehave? create a lot of marks? create a lot of
 events?
 Not reading events in their queue?
>>>
>>>
>>>
>>> It looks both a lot marks and events. I'm not sure if it is the latter
>>> case.
>>> If I knew more about the details of the behavior, I would elaborated more
>>> in
>>> the commit log.
>>
>>
>> If you are not sure, do not refer to user application as "misbehaved".
>> Is updatedb(8) a misbehaved application because it produces a lot of
>> access
>> events?
>
>
> Should be not. It sounds like our in-house applications. But, it is a sort
> of blackbox to me.
>

If you know which process is "misbehaving" you can look at
ls -l /proc//fd |grep notify
and see the anonymous inotify/fanotify file descriptors

then you can look at  /proc//fdinfo/ file of those
file descriptors to learn more about the fanotify flags etc.

...

>
>>
>> But I think there is another problem, not introduced by your change, but
>> could
>> be amplified because of it - when a non-permission event allocation fails,
>> the
>> event is silently dropped, AFAICT, with no indication to listener.
>> That seems like a bug to me, because there is a perfectly safe way to deal
>> with
>> event allocation failure - queue the overflow event.
>
>
> I'm not sure if such issue could be amplified by the accounting since once
> the usage exceeds the limit any following kmem allocation would fail. So, it
> might fail at fsnotify event allocation, or other places, i.e. fork, open
> syscall, etc. So, in most cases the generator even can't generate new event
> any more.
>

To be clear, I did not mean that kmem limit would cause a storm of dropped
events. I meant if you have a listener outside memcp watching a single file
for access/modifications and you have many containers each with its own
limited memcg, then event drops probability goes to infinity as you run more
of those kmem limited containers with event producers.

> The typical output from my LTP test is filesystem dcache allocation error or
> fork error due to kmem limit is reached.

And that should be considered a success result of the test.
The only failure case is when producer touches the file and event is
not delivered
nor an overflow event delivered.
You can probably try to reduce allocation failure for fork and dentry by:
1. pin dentry cache of subject file on test init by opening the file
2. set the low kmem limit after forking

Then you should probably loop the test enough times
in some of the times, producer may fail to access the file
in others if will succeed and produce events properly
and many some times, producer will access the file and event
will be dropped, so event count is lower than access count.



>
>> I am not going to be the one to determine if fixing this alleged bug is a
>> prerequisite for merging your patch, but I think enforcing memory limits
>> on
>> event allocation could amplify that bug, so it should be fixed.
>>
>> The upside is that with both your accounting fix and ENOMEM = overlflow
>> fix, it going to be easy to write a test that verifies both of them:
>> - Run a listener in memcg with limited kmem and unlimited (or very
>> large) event queue
>> - Produce events inside memcg without listener reading them
>> - Read event and expect an OVERFLOW even
>>
>> This is a simple variant of LTP tests inotify05 and fanotify05.
>
>
> I tried to test your patch with LTP, but it sounds not that easy to setup a
> scenario to make fsnotify event allocation just hit the kmem limit, since
> the limit may be hit before a new event is allocated, for example allocating
> dentry cache in open syscall may hit the limit.
>
> So, it sounds the overflow event might be not generated by the producer in
> most cases.
>

Right. not as simple, but maybe still possible as I described above.
Assuming that my patch is not buggy...

Thanks,
Amir.


[PATCH 2/3] X86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-10-23 Thread dyoung
Now crashkernel=X will fail if there's not enough memory at low region
(below 896M) when trying to reserve large memory size.  One can use
crashkernel=xM,high to reserve it at high region (>4G) but it is more
convinient to improve crashkernel=X to: 

 - First try to reserve X below 896M (for being compatible with old
   kexec-tools).
 - If fails, try to reserve X below 4G (swiotlb need to stay below 4G).
 - If fails, try to reserve X from MAXMEM top down.

It's more transparent and user-friendly.

If crashkernel is large and the reserved is beyond 896M, old kexec-tools
is not compatible with new kernel because old kexec-tools can not load
kernel at high memory region, there was an old discussion below:
https://lkml.org/lkml/2013/10/15/601

But actually the behavior is consistent during my test. Suppose
old kernel fail to reserve memory at low areas, kdump does not
work because no meory reserved. With this patch, suppose new kernel
successfully reserved memory at high areas, old kexec-tools still fail
to load kdump kernel (tested 2.0.2), so it is acceptable, no need to
worry about the compatibility.

Here is the test result (kexec-tools 2.0.2, no high memory load
support):
Crashkernel over 4G:
# cat /proc/iomem|grep Crash
  be00-cdff : Crash kernel
  21300-21eff : Crash kernel
# ./kexec  -p /boot/vmlinuz-`uname -r`
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Then try loading kdump kernel

crashkernel: 896M-4G:
# cat /proc/iomem|grep Crash
  9600-cdef : Crash kernel
# ./kexec -p /boot/vmlinuz-4.14.0-rc4+
ELF core (kcore) parse failed
Cannot load /boot/vmlinuz-4.14.0-rc4+

Signed-off-by: Dave Young 
---
 arch/x86/kernel/setup.c |   16 
 1 file changed, 16 insertions(+)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -568,6 +568,22 @@ static void __init reserve_crashkernel(v
high ? CRASH_ADDR_HIGH_MAX
 : CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+   /*
+* crashkernel=X reserve below 896M fails? Try below 4G
+*/
+   if (!high && !crash_base)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   (1ULL << 32),
+   crash_size, CRASH_ALIGN);
+   /*
+* crashkernel=X reserve below 4G fails? Try MAXMEM
+*/
+   if (!high && !crash_base)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_HIGH_MAX,
+   crash_size, CRASH_ALIGN);
+#endif
if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable 
area found.\n");
return;




[PATCH 2/3] X86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-10-23 Thread dyoung
Now crashkernel=X will fail if there's not enough memory at low region
(below 896M) when trying to reserve large memory size.  One can use
crashkernel=xM,high to reserve it at high region (>4G) but it is more
convinient to improve crashkernel=X to: 

 - First try to reserve X below 896M (for being compatible with old
   kexec-tools).
 - If fails, try to reserve X below 4G (swiotlb need to stay below 4G).
 - If fails, try to reserve X from MAXMEM top down.

It's more transparent and user-friendly.

If crashkernel is large and the reserved is beyond 896M, old kexec-tools
is not compatible with new kernel because old kexec-tools can not load
kernel at high memory region, there was an old discussion below:
https://lkml.org/lkml/2013/10/15/601

But actually the behavior is consistent during my test. Suppose
old kernel fail to reserve memory at low areas, kdump does not
work because no meory reserved. With this patch, suppose new kernel
successfully reserved memory at high areas, old kexec-tools still fail
to load kdump kernel (tested 2.0.2), so it is acceptable, no need to
worry about the compatibility.

Here is the test result (kexec-tools 2.0.2, no high memory load
support):
Crashkernel over 4G:
# cat /proc/iomem|grep Crash
  be00-cdff : Crash kernel
  21300-21eff : Crash kernel
# ./kexec  -p /boot/vmlinuz-`uname -r`
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Then try loading kdump kernel

crashkernel: 896M-4G:
# cat /proc/iomem|grep Crash
  9600-cdef : Crash kernel
# ./kexec -p /boot/vmlinuz-4.14.0-rc4+
ELF core (kcore) parse failed
Cannot load /boot/vmlinuz-4.14.0-rc4+

Signed-off-by: Dave Young 
---
 arch/x86/kernel/setup.c |   16 
 1 file changed, 16 insertions(+)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -568,6 +568,22 @@ static void __init reserve_crashkernel(v
high ? CRASH_ADDR_HIGH_MAX
 : CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
+#ifdef CONFIG_X86_64
+   /*
+* crashkernel=X reserve below 896M fails? Try below 4G
+*/
+   if (!high && !crash_base)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   (1ULL << 32),
+   crash_size, CRASH_ALIGN);
+   /*
+* crashkernel=X reserve below 4G fails? Try MAXMEM
+*/
+   if (!high && !crash_base)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_HIGH_MAX,
+   crash_size, CRASH_ALIGN);
+#endif
if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable 
area found.\n");
return;




[PATCH 1/3] kdump: extend crashkernel=range:size to dynamically increase reservation size

2017-10-23 Thread dyoung
crashkernel=range:size syntax allows to reserve specified size for system
with total memory fall into the specified range. For example:
crashkernel=2G-3G:128M,3G-:256M reserves 128M for system with memory >=2G
and memory <3G, and reserves 256M for system with memory >= 3G

In the above case 256M as a fixed value which can not fulfill very huge 
systems with large memory and IO devices. As memory size increases usually
minimum memory requirement for booting will also increase. It is nearly
impossible for a kernel to run with a fixed limited memory size for all
kinds of systems.

Thus extend the crashkernel=range:size to let user specify the scaling
ratio in kernel cmdline like below:
crashkernel=range:size^order, for example:
crashkernel=2G-:128M^14 reserve 128M + (total_memory - 128M) >> 14
for machines with over 2G memory.

Also s/start-[end]/[start]-end/ in kernel-parameters.txt according to code

Signed-off-by: Dave Young 
---
 Documentation/admin-guide/kernel-parameters.txt |   10 --
 Documentation/kdump/kdump.txt   |7 ++--
 kernel/crash_core.c |   35 +---
 3 files changed, 41 insertions(+), 11 deletions(-)

--- linux-x86.orig/kernel/crash_core.c
+++ linux-x86/kernel/crash_core.c
@@ -31,7 +31,7 @@ static unsigned char *vmcoreinfo_data_sa
 /*
  * This function parses command lines in the format
  *
- *   crashkernel=ramsize-range:size[,...][@offset]
+ *   crashkernel=ramsize-range:size[,...][@offset][^order]
  *
  * The function returns 0 on success and -EINVAL on failure.
  */
@@ -41,6 +41,7 @@ static int __init parse_crashkernel_mem(
unsigned long long *crash_base)
 {
char *cur = cmdline, *tmp;
+   bool infinite_end = false;
 
/* for each entry of the comma-separated list */
do {
@@ -93,13 +94,21 @@ static int __init parse_crashkernel_mem(
/* match ? */
if (system_ram >= start && system_ram < end) {
*crash_size = size;
+   if (end == ULLONG_MAX)
+   infinite_end = true;
break;
}
} while (*cur++ == ',');
 
-   if (*crash_size > 0) {
-   while (*cur && *cur != ' ' && *cur != '@')
+   if (*crash_size <= 0)
+   goto out;
+
+   while (*cur && *cur != ' ') {
+   if (*cur != '@' && *cur != '^') {
cur++;
+   continue;
+   }
+
if (*cur == '@') {
cur++;
*crash_base = memparse(cur, );
@@ -107,10 +116,28 @@ static int __init parse_crashkernel_mem(
pr_warn("Memory value expected after '@'\n");
return -EINVAL;
}
-   }
+   cur = tmp;
+   } else if (*cur == '^' && infinite_end ) {
+   unsigned long long shift, size;
+
+   cur++;
+   shift = memparse(cur, );
+   if (cur == tmp) {
+   pr_warn("Memory reservation scale order 
expected after '^'\n");
+   return -EINVAL;
+   }
+   size = (system_ram - *crash_size) >> shift;
+   size = *crash_size + roundup(size, 1ULL << 20);
+   if (size < system_ram)
+   *crash_size = size;
+   cur = tmp;
+   } else
+   cur++;
}
 
return 0;
+out:
+   return -EINVAL;
 }
 
 /*
--- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
+++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
@@ -680,12 +680,14 @@
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.
 
-   crashkernel=range1:size1[,range2:size2,...][@offset]
+   crashkernel=range1:size1[,range2:size2,...][@offset][^order]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
-   start-[end] where start and end are both
-   a memory unit (amount[KMG]). See also
-   Documentation/kdump/kdump.txt for an example.
+   [start]-end where start and end are both
+   a memory unit (amount[KMG]). In case the end of the
+   range is infinity '^order' can be used to scale
+   the size to size + (total_mem - start) >> 2^order
+   See also Documentation/kdump/kdump.txt for an example.
 
crashkernel=size[KMG],high
[KNL, x86_64] range could be above 4G. Allow 

[PATCH 1/3] kdump: extend crashkernel=range:size to dynamically increase reservation size

2017-10-23 Thread dyoung
crashkernel=range:size syntax allows to reserve specified size for system
with total memory fall into the specified range. For example:
crashkernel=2G-3G:128M,3G-:256M reserves 128M for system with memory >=2G
and memory <3G, and reserves 256M for system with memory >= 3G

In the above case 256M as a fixed value which can not fulfill very huge 
systems with large memory and IO devices. As memory size increases usually
minimum memory requirement for booting will also increase. It is nearly
impossible for a kernel to run with a fixed limited memory size for all
kinds of systems.

Thus extend the crashkernel=range:size to let user specify the scaling
ratio in kernel cmdline like below:
crashkernel=range:size^order, for example:
crashkernel=2G-:128M^14 reserve 128M + (total_memory - 128M) >> 14
for machines with over 2G memory.

Also s/start-[end]/[start]-end/ in kernel-parameters.txt according to code

Signed-off-by: Dave Young 
---
 Documentation/admin-guide/kernel-parameters.txt |   10 --
 Documentation/kdump/kdump.txt   |7 ++--
 kernel/crash_core.c |   35 +---
 3 files changed, 41 insertions(+), 11 deletions(-)

--- linux-x86.orig/kernel/crash_core.c
+++ linux-x86/kernel/crash_core.c
@@ -31,7 +31,7 @@ static unsigned char *vmcoreinfo_data_sa
 /*
  * This function parses command lines in the format
  *
- *   crashkernel=ramsize-range:size[,...][@offset]
+ *   crashkernel=ramsize-range:size[,...][@offset][^order]
  *
  * The function returns 0 on success and -EINVAL on failure.
  */
@@ -41,6 +41,7 @@ static int __init parse_crashkernel_mem(
unsigned long long *crash_base)
 {
char *cur = cmdline, *tmp;
+   bool infinite_end = false;
 
/* for each entry of the comma-separated list */
do {
@@ -93,13 +94,21 @@ static int __init parse_crashkernel_mem(
/* match ? */
if (system_ram >= start && system_ram < end) {
*crash_size = size;
+   if (end == ULLONG_MAX)
+   infinite_end = true;
break;
}
} while (*cur++ == ',');
 
-   if (*crash_size > 0) {
-   while (*cur && *cur != ' ' && *cur != '@')
+   if (*crash_size <= 0)
+   goto out;
+
+   while (*cur && *cur != ' ') {
+   if (*cur != '@' && *cur != '^') {
cur++;
+   continue;
+   }
+
if (*cur == '@') {
cur++;
*crash_base = memparse(cur, );
@@ -107,10 +116,28 @@ static int __init parse_crashkernel_mem(
pr_warn("Memory value expected after '@'\n");
return -EINVAL;
}
-   }
+   cur = tmp;
+   } else if (*cur == '^' && infinite_end ) {
+   unsigned long long shift, size;
+
+   cur++;
+   shift = memparse(cur, );
+   if (cur == tmp) {
+   pr_warn("Memory reservation scale order 
expected after '^'\n");
+   return -EINVAL;
+   }
+   size = (system_ram - *crash_size) >> shift;
+   size = *crash_size + roundup(size, 1ULL << 20);
+   if (size < system_ram)
+   *crash_size = size;
+   cur = tmp;
+   } else
+   cur++;
}
 
return 0;
+out:
+   return -EINVAL;
 }
 
 /*
--- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
+++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
@@ -680,12 +680,14 @@
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.
 
-   crashkernel=range1:size1[,range2:size2,...][@offset]
+   crashkernel=range1:size1[,range2:size2,...][@offset][^order]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
-   start-[end] where start and end are both
-   a memory unit (amount[KMG]). See also
-   Documentation/kdump/kdump.txt for an example.
+   [start]-end where start and end are both
+   a memory unit (amount[KMG]). In case the end of the
+   range is infinity '^order' can be used to scale
+   the size to size + (total_mem - start) >> 2^order
+   See also Documentation/kdump/kdump.txt for an example.
 
crashkernel=size[KMG],high
[KNL, x86_64] range could be above 4G. Allow kernel
--- 

[PATCH 0/3] kdump: crashkernel parameter improvement

2017-10-23 Thread dyoung
Hi,

Here is a try to improve current crashkernel kernel parameter

Patch 1/3 adds an extra functionality so that one can use like
crashkernel=2G-:128M^12 to reserve 128M for 2G+ machine but also scale the
size based on system memory, that means 128M + (total_mem - 128M) >> 12

Patch 2/3 is a resending of previous post to let crashkernel=xM to reserve
first from below 896M, then <4G, then 

[PATCH 0/3] kdump: crashkernel parameter improvement

2017-10-23 Thread dyoung
Hi,

Here is a try to improve current crashkernel kernel parameter

Patch 1/3 adds an extra functionality so that one can use like
crashkernel=2G-:128M^12 to reserve 128M for 2G+ machine but also scale the
size based on system memory, that means 128M + (total_mem - 128M) >> 12

Patch 2/3 is a resending of previous post to let crashkernel=xM to reserve
first from below 896M, then <4G, then 

[PATCH 3/3] kdump: round up the total memory size to 128M for crashkernel reservation

2017-10-23 Thread dyoung
The total memory size we get in kernel is usually slightly less than 2G with a
2G memory module machine. The main reason is bios/firmware reserve some area
it will not export all memory as usable to Linux.

2G memory X86 kvm guest test result of the total_mem value:
UEFI boot with ovmf: 0x7ef1
Legacy boot kvm guest: 0x7ff7cc00

An option is to use dmi/smbios to get physical memory size, but it's not
reliable as well. According to Prarit hardware vendors sometimes screw this up.
Thus we choose to round up total size to 128M to workaround this problem.
This is a best effort workaround, will improve it when we have better way
in the future. 

Signed-off-by: Dave Young 
---
 kernel/crash_core.c |   17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

--- linux.orig/kernel/crash_core.c
+++ linux/kernel/crash_core.c
@@ -42,6 +42,15 @@ static int __init parse_crashkernel_mem(
 {
char *cur = cmdline, *tmp;
bool infinite_end = false;
+   unsigned long long total_mem = system_ram;
+
+   /*
+* Firmware usually reserves some memory regions for it's own use.
+* so we get less than actual system memory size.
+* We workaround this by round up the total size to 128M which is
+* enough for most test cases.
+*/
+   total_mem = roundup(total_mem, 0x800);
 
/* for each entry of the comma-separated list */
do {
@@ -86,13 +95,13 @@ static int __init parse_crashkernel_mem(
return -EINVAL;
}
cur = tmp;
-   if (size >= system_ram) {
+   if (size >= total_mem) {
pr_warn("crashkernel: invalid size\n");
return -EINVAL;
}
 
/* match ? */
-   if (system_ram >= start && system_ram < end) {
+   if (total_mem >= start && total_mem < end) {
*crash_size = size;
if (end == ULLONG_MAX)
infinite_end = true;
@@ -126,9 +135,9 @@ static int __init parse_crashkernel_mem(
pr_warn("Memory reservation scale order 
expected after '^'\n");
return -EINVAL;
}
-   size = (system_ram - *crash_size) >> shift;
+   size = (total_mem - *crash_size) >> shift;
size = *crash_size + roundup(size, 1ULL << 20);
-   if (size < system_ram)
+   if (size < total_mem)
*crash_size = size;
cur = tmp;
} else




[PATCH 3/3] kdump: round up the total memory size to 128M for crashkernel reservation

2017-10-23 Thread dyoung
The total memory size we get in kernel is usually slightly less than 2G with a
2G memory module machine. The main reason is bios/firmware reserve some area
it will not export all memory as usable to Linux.

2G memory X86 kvm guest test result of the total_mem value:
UEFI boot with ovmf: 0x7ef1
Legacy boot kvm guest: 0x7ff7cc00

An option is to use dmi/smbios to get physical memory size, but it's not
reliable as well. According to Prarit hardware vendors sometimes screw this up.
Thus we choose to round up total size to 128M to workaround this problem.
This is a best effort workaround, will improve it when we have better way
in the future. 

Signed-off-by: Dave Young 
---
 kernel/crash_core.c |   17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

--- linux.orig/kernel/crash_core.c
+++ linux/kernel/crash_core.c
@@ -42,6 +42,15 @@ static int __init parse_crashkernel_mem(
 {
char *cur = cmdline, *tmp;
bool infinite_end = false;
+   unsigned long long total_mem = system_ram;
+
+   /*
+* Firmware usually reserves some memory regions for it's own use.
+* so we get less than actual system memory size.
+* We workaround this by round up the total size to 128M which is
+* enough for most test cases.
+*/
+   total_mem = roundup(total_mem, 0x800);
 
/* for each entry of the comma-separated list */
do {
@@ -86,13 +95,13 @@ static int __init parse_crashkernel_mem(
return -EINVAL;
}
cur = tmp;
-   if (size >= system_ram) {
+   if (size >= total_mem) {
pr_warn("crashkernel: invalid size\n");
return -EINVAL;
}
 
/* match ? */
-   if (system_ram >= start && system_ram < end) {
+   if (total_mem >= start && total_mem < end) {
*crash_size = size;
if (end == ULLONG_MAX)
infinite_end = true;
@@ -126,9 +135,9 @@ static int __init parse_crashkernel_mem(
pr_warn("Memory reservation scale order 
expected after '^'\n");
return -EINVAL;
}
-   size = (system_ram - *crash_size) >> shift;
+   size = (total_mem - *crash_size) >> shift;
size = *crash_size + roundup(size, 1ULL << 20);
-   if (size < system_ram)
+   if (size < total_mem)
*crash_size = size;
cur = tmp;
} else




Re: [PATCH V8 00/14] mmc: Add Command Queue support

2017-10-23 Thread Ulf Hansson
+ Bartlomiej

[...]

> So my conclusion is, let's start a as you suggested, by not completing
> the request in ->done() as to maintain existing behavior. Then we can
> address optimizations on top, which very likely will involve doing
> changes to host drivers as well.

 Have you tested the latest version now?

>>>
>>> Ping?
>>
>> Still ping?
>
> How is your silence in any way an acceptable way to execute your
> responsibilities as maintainer!

Seriously? You posted the new version Oct 13.

I had to make some late minute travel decisions, so unfortunate I
won't be able to test this on HW from this Friday.

However, you have completely ignored mine, Linus and Bartlomiej's
comments about that we want the blkmq port being a separate patch(es)
and then make the CMDQ patches on top. This worries me, because it
seems like our messages don't reach you.

Kind regards
Uffe


Re: [PATCH V8 00/14] mmc: Add Command Queue support

2017-10-23 Thread Ulf Hansson
+ Bartlomiej

[...]

> So my conclusion is, let's start a as you suggested, by not completing
> the request in ->done() as to maintain existing behavior. Then we can
> address optimizations on top, which very likely will involve doing
> changes to host drivers as well.

 Have you tested the latest version now?

>>>
>>> Ping?
>>
>> Still ping?
>
> How is your silence in any way an acceptable way to execute your
> responsibilities as maintainer!

Seriously? You posted the new version Oct 13.

I had to make some late minute travel decisions, so unfortunate I
won't be able to test this on HW from this Friday.

However, you have completely ignored mine, Linus and Bartlomiej's
comments about that we want the blkmq port being a separate patch(es)
and then make the CMDQ patches on top. This worries me, because it
seems like our messages don't reach you.

Kind regards
Uffe


Re: usb/input/gtco: slab-out-of-bounds in parse_hid_report_descriptor

2017-10-23 Thread Dmitry Torokhov
On Mon, Oct 23, 2017 at 01:24:23PM +0200, Andrey Konovalov wrote:
> Hi!
> 
> I've got the following report while fuzzing the kernel with syzkaller.
> 
> On commit 3e0cc09a3a2c40ec1ffb6b4e12da86e98feccb11 (4.14-rc5+).
> 
> parse_hid_report_descriptor() has a while (i < length) loop, which
> only guarantees that there's at least 1 byte in the buffer, but the
> loop body can read multiple bytes which causes out-of-bounds access.

Ugh, this whole driver should be moved over to HID, but I am not sure
who has hardware to test... I just sent a patch plugging this hole.

Thanks for the report.

-- 
Dmitry


Re: usb/input/gtco: slab-out-of-bounds in parse_hid_report_descriptor

2017-10-23 Thread Dmitry Torokhov
On Mon, Oct 23, 2017 at 01:24:23PM +0200, Andrey Konovalov wrote:
> Hi!
> 
> I've got the following report while fuzzing the kernel with syzkaller.
> 
> On commit 3e0cc09a3a2c40ec1ffb6b4e12da86e98feccb11 (4.14-rc5+).
> 
> parse_hid_report_descriptor() has a while (i < length) loop, which
> only guarantees that there's at least 1 byte in the buffer, but the
> loop body can read multiple bytes which causes out-of-bounds access.

Ugh, this whole driver should be moved over to HID, but I am not sure
who has hardware to test... I just sent a patch plugging this hole.

Thanks for the report.

-- 
Dmitry


[PATCH 1/1] mpi: check for shift exponent greater than 31.

2017-10-23 Thread Ayush Mittal
This patch check for shift exponent greater than 31,
detected by UBSAN.

1)UBSAN: Undefined behaviour in lib/mpi/generic_mpih-lshift.c:57:22
 shift exponent 32 is too large for 32-bit type 'long unsigned int'

2)UBSAN: Undefined behaviour in lib/mpi/generic_mpih-lshift.c:60:20
shift exponent 32 is too large for 32-bit type 'long unsigned int'

3)UBSAN: Undefined behaviour in lib/mpi/generic_mpih-rshift.c:57:21
shift exponent 32 is too large for 32-bit type 'long unsigned int'

4) UBSAN: Undefined behaviour in lib/mpi/generic_mpih-rshift.c:60:19
shift exponent 32 is too large for 32-bit type 'long unsigned int'

So instead of undefined behaviour defining behaviour when shift exponent 
is greater than 31 then value will be Zero .

Signed-off-by: Ayush Mittal 
Signed-off-by: Vaneet Narang 
---
 lib/mpi/generic_mpih-lshift.c | 6 +++---
 lib/mpi/generic_mpih-rshift.c | 6 +++---
 lib/mpi/mpi-internal.h| 3 +++
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/mpi/generic_mpih-lshift.c b/lib/mpi/generic_mpih-lshift.c
index 8631892..1c8277e 100644
--- a/lib/mpi/generic_mpih-lshift.c
+++ b/lib/mpi/generic_mpih-lshift.c
@@ -50,14 +50,14 @@
sh_2 = BITS_PER_MPI_LIMB - sh_1;
i = usize - 1;
low_limb = up[i];
-   retval = low_limb >> sh_2;
+   retval = MPI_RSHIFT(low_limb, sh_2);
high_limb = low_limb;
while (--i >= 0) {
low_limb = up[i];
-   wp[i] = (high_limb << sh_1) | (low_limb >> sh_2);
+   wp[i] = MPI_LSHIFT(high_limb, sh_1) | MPI_RSHIFT(low_limb, 
sh_2);
high_limb = low_limb;
}
-   wp[i] = high_limb << sh_1;
+   wp[i] = MPI_LSHIFT(high_limb, sh_1);
 
return retval;
 }
diff --git a/lib/mpi/generic_mpih-rshift.c b/lib/mpi/generic_mpih-rshift.c
index ffa3288..3a28bdd 100644
--- a/lib/mpi/generic_mpih-rshift.c
+++ b/lib/mpi/generic_mpih-rshift.c
@@ -50,14 +50,14 @@
wp -= 1;
sh_2 = BITS_PER_MPI_LIMB - sh_1;
high_limb = up[0];
-   retval = high_limb << sh_2;
+   retval = MPI_LSHIFT(high_limb, sh_2);
low_limb = high_limb;
for (i = 1; i < usize; i++) {
high_limb = up[i];
-   wp[i] = (low_limb >> sh_1) | (high_limb << sh_2);
+   wp[i] = MPI_RSHIFT(low_limb, sh_1) | MPI_LSHIFT(high_limb, 
sh_2);
low_limb = high_limb;
}
-   wp[i] = low_limb >> sh_1;
+   wp[i] = MPI_RSHIFT(low_limb, sh_1);
 
return retval;
 }
diff --git a/lib/mpi/mpi-internal.h b/lib/mpi/mpi-internal.h
index 7eceedd..9c94a98 100644
--- a/lib/mpi/mpi-internal.h
+++ b/lib/mpi/mpi-internal.h
@@ -37,6 +37,9 @@
 #include 
 #include 
 
+#define MPI_LSHIFT(val, shift) (shift >= BITS_PER_MPI_LIMB ? 0 : (val << 
shift))
+#define MPI_RSHIFT(val, shift) (shift >= BITS_PER_MPI_LIMB ? 0 : (val >> 
shift))
+
 #define log_debug printk
 #define log_bug printk
 
-- 
1.9.1



[PATCH 1/1] mpi: check for shift exponent greater than 31.

2017-10-23 Thread Ayush Mittal
This patch check for shift exponent greater than 31,
detected by UBSAN.

1)UBSAN: Undefined behaviour in lib/mpi/generic_mpih-lshift.c:57:22
 shift exponent 32 is too large for 32-bit type 'long unsigned int'

2)UBSAN: Undefined behaviour in lib/mpi/generic_mpih-lshift.c:60:20
shift exponent 32 is too large for 32-bit type 'long unsigned int'

3)UBSAN: Undefined behaviour in lib/mpi/generic_mpih-rshift.c:57:21
shift exponent 32 is too large for 32-bit type 'long unsigned int'

4) UBSAN: Undefined behaviour in lib/mpi/generic_mpih-rshift.c:60:19
shift exponent 32 is too large for 32-bit type 'long unsigned int'

So instead of undefined behaviour defining behaviour when shift exponent 
is greater than 31 then value will be Zero .

Signed-off-by: Ayush Mittal 
Signed-off-by: Vaneet Narang 
---
 lib/mpi/generic_mpih-lshift.c | 6 +++---
 lib/mpi/generic_mpih-rshift.c | 6 +++---
 lib/mpi/mpi-internal.h| 3 +++
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/mpi/generic_mpih-lshift.c b/lib/mpi/generic_mpih-lshift.c
index 8631892..1c8277e 100644
--- a/lib/mpi/generic_mpih-lshift.c
+++ b/lib/mpi/generic_mpih-lshift.c
@@ -50,14 +50,14 @@
sh_2 = BITS_PER_MPI_LIMB - sh_1;
i = usize - 1;
low_limb = up[i];
-   retval = low_limb >> sh_2;
+   retval = MPI_RSHIFT(low_limb, sh_2);
high_limb = low_limb;
while (--i >= 0) {
low_limb = up[i];
-   wp[i] = (high_limb << sh_1) | (low_limb >> sh_2);
+   wp[i] = MPI_LSHIFT(high_limb, sh_1) | MPI_RSHIFT(low_limb, 
sh_2);
high_limb = low_limb;
}
-   wp[i] = high_limb << sh_1;
+   wp[i] = MPI_LSHIFT(high_limb, sh_1);
 
return retval;
 }
diff --git a/lib/mpi/generic_mpih-rshift.c b/lib/mpi/generic_mpih-rshift.c
index ffa3288..3a28bdd 100644
--- a/lib/mpi/generic_mpih-rshift.c
+++ b/lib/mpi/generic_mpih-rshift.c
@@ -50,14 +50,14 @@
wp -= 1;
sh_2 = BITS_PER_MPI_LIMB - sh_1;
high_limb = up[0];
-   retval = high_limb << sh_2;
+   retval = MPI_LSHIFT(high_limb, sh_2);
low_limb = high_limb;
for (i = 1; i < usize; i++) {
high_limb = up[i];
-   wp[i] = (low_limb >> sh_1) | (high_limb << sh_2);
+   wp[i] = MPI_RSHIFT(low_limb, sh_1) | MPI_LSHIFT(high_limb, 
sh_2);
low_limb = high_limb;
}
-   wp[i] = low_limb >> sh_1;
+   wp[i] = MPI_RSHIFT(low_limb, sh_1);
 
return retval;
 }
diff --git a/lib/mpi/mpi-internal.h b/lib/mpi/mpi-internal.h
index 7eceedd..9c94a98 100644
--- a/lib/mpi/mpi-internal.h
+++ b/lib/mpi/mpi-internal.h
@@ -37,6 +37,9 @@
 #include 
 #include 
 
+#define MPI_LSHIFT(val, shift) (shift >= BITS_PER_MPI_LIMB ? 0 : (val << 
shift))
+#define MPI_RSHIFT(val, shift) (shift >= BITS_PER_MPI_LIMB ? 0 : (val >> 
shift))
+
 #define log_debug printk
 #define log_bug printk
 
-- 
1.9.1



[PATCH] Input: gtco - fix potential out-of-bound access

2017-10-23 Thread Dmitry Torokhov
parse_hid_report_descriptor() has a while (i < length) loop, which
only guarantees that there's at least 1 byte in the buffer, but the
loop body can read multiple bytes which causes out-of-bounds access.

Reported-by: Andrey Konovalov 
Signed-off-by: Dmitry Torokhov 
---
 drivers/input/tablet/gtco.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/input/tablet/gtco.c b/drivers/input/tablet/gtco.c
index b796e891e2ee..0351203b8c24 100644
--- a/drivers/input/tablet/gtco.c
+++ b/drivers/input/tablet/gtco.c
@@ -230,13 +230,24 @@ static void parse_hid_report_descriptor(struct gtco 
*device, char * report,
 
/* Walk  this report and pull out the info we need */
while (i < length) {
-   prefix = report[i];
-
-   /* Skip over prefix */
-   i++;
+   prefix = report[i++];
 
/* Determine data size and save the data in the proper variable 
*/
-   size = PREF_SIZE(prefix);
+   if (PREF_SIZE(prefix) < 1) {
+   dev_err(ddev,
+   "Invalid size %d in element at offset %d\n",
+   PREF_SIZE(prefix), i);
+   break;
+   }
+
+   size = 1U << (PREF_SIZE(prefix) - 1);
+   if (i + size >= length) {
+   dev_err(ddev,
+   "Not enough data (need %d, have %d)\n",
+   i + size, length);
+   break;
+   }
+
switch (size) {
case 1:
data = report[i];
@@ -244,8 +255,7 @@ static void parse_hid_report_descriptor(struct gtco 
*device, char * report,
case 2:
data16 = get_unaligned_le16([i]);
break;
-   case 3:
-   size = 4;
+   case 4:
data32 = get_unaligned_le32([i]);
break;
}
-- 
2.15.0.rc0.271.g36b669edcc-goog


-- 
Dmitry


[PATCH] Input: gtco - fix potential out-of-bound access

2017-10-23 Thread Dmitry Torokhov
parse_hid_report_descriptor() has a while (i < length) loop, which
only guarantees that there's at least 1 byte in the buffer, but the
loop body can read multiple bytes which causes out-of-bounds access.

Reported-by: Andrey Konovalov 
Signed-off-by: Dmitry Torokhov 
---
 drivers/input/tablet/gtco.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/input/tablet/gtco.c b/drivers/input/tablet/gtco.c
index b796e891e2ee..0351203b8c24 100644
--- a/drivers/input/tablet/gtco.c
+++ b/drivers/input/tablet/gtco.c
@@ -230,13 +230,24 @@ static void parse_hid_report_descriptor(struct gtco 
*device, char * report,
 
/* Walk  this report and pull out the info we need */
while (i < length) {
-   prefix = report[i];
-
-   /* Skip over prefix */
-   i++;
+   prefix = report[i++];
 
/* Determine data size and save the data in the proper variable 
*/
-   size = PREF_SIZE(prefix);
+   if (PREF_SIZE(prefix) < 1) {
+   dev_err(ddev,
+   "Invalid size %d in element at offset %d\n",
+   PREF_SIZE(prefix), i);
+   break;
+   }
+
+   size = 1U << (PREF_SIZE(prefix) - 1);
+   if (i + size >= length) {
+   dev_err(ddev,
+   "Not enough data (need %d, have %d)\n",
+   i + size, length);
+   break;
+   }
+
switch (size) {
case 1:
data = report[i];
@@ -244,8 +255,7 @@ static void parse_hid_report_descriptor(struct gtco 
*device, char * report,
case 2:
data16 = get_unaligned_le16([i]);
break;
-   case 3:
-   size = 4;
+   case 4:
data32 = get_unaligned_le32([i]);
break;
}
-- 
2.15.0.rc0.271.g36b669edcc-goog


-- 
Dmitry


Re: x86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-10-23 Thread Dave Young
On 10/20/17 at 01:52pm, Dave Young wrote:
> Now crashkernel=X will fail if there's not enough memory at low region
> (below 896M) when trying to reserve large memory size.  One can use
> crashkernel=xM,high to reserve it at high region (>4G) but it is more
> convinient to improve crashkernel=X to: 
> 
>  - First try to reserve X below 896M (for being compatible with old
>kexec-tools).
>  - If fails, try to reserve X below 4G (swiotlb need to stay below 4G).
>  - If fails, try to reserve X from MAXMEM top down.
> 
> It's more transparent and user-friendly.
> 
> If crashkernel is large and the reserved is beyond 896M, old kexec-tools
> is not compatible with new kernel because old kexec-tools can not load
> kernel at high memory region, there was an old discussion below
> (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601
> 
> But actually the behavior is consistent during my test. Suppose
> old kernel fail to reserve memory at low areas, kdump does not
> work because no memory reserved. With this patch, suppose new kernel
> successfully reserved memory at high areas, old kexec-tools still fail
> to load kdump kernel (tested 2.0.2), so it is acceptable, no need to
> worry about the compatibility.
> 
> Here is the test result (kexec-tools 2.0.2, no high memory load
> support):
> Crashkernel over 4G:
> # cat /proc/iomem|grep Crash
>   be00-cdff : Crash kernel
>   21300-21eff : Crash kernel
> # ./kexec  -p /boot/vmlinuz-`uname -r`
> Memory for crashkernel is not reserved
> Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
> Then try loading kdump kernel
> 
> crashkernel: 896M-4G:
> # cat /proc/iomem|grep Crash
>   9600-cdef : Crash kernel
> # ./kexec -p /boot/vmlinuz-4.14.0-rc4+
> ELF core (kcore) parse failed
> Cannot load /boot/vmlinuz-4.14.0-rc4+
> 
> Signed-off-by: Dave Young 
> ---
>  arch/x86/kernel/setup.c |   16 
>  1 file changed, 16 insertions(+)
> 
> --- linux-x86.orig/arch/x86/kernel/setup.c
> +++ linux-x86/arch/x86/kernel/setup.c
> @@ -568,6 +568,22 @@ static void __init reserve_crashkernel(v
>   high ? CRASH_ADDR_HIGH_MAX
>: CRASH_ADDR_LOW_MAX,
>   crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> +  * crashkernel=X reserve below 896M fails? Try below 4G
> +  */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + (1ULL << 32),
> + crash_size, CRASH_ALIGN);
> + /*
> +  * crashkernel=X reserve below 4G fails? Try MAXMEM
> +  */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
>   if (!crash_base) {
>   pr_info("crashkernel reservation failed - No suitable 
> area found.\n");
>   return;


Hmm, forgot to add [PATCH] prefix, will resend this along with other two
crashkernel patches.

Thanks
Dave


Re: x86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-10-23 Thread Dave Young
On 10/20/17 at 01:52pm, Dave Young wrote:
> Now crashkernel=X will fail if there's not enough memory at low region
> (below 896M) when trying to reserve large memory size.  One can use
> crashkernel=xM,high to reserve it at high region (>4G) but it is more
> convinient to improve crashkernel=X to: 
> 
>  - First try to reserve X below 896M (for being compatible with old
>kexec-tools).
>  - If fails, try to reserve X below 4G (swiotlb need to stay below 4G).
>  - If fails, try to reserve X from MAXMEM top down.
> 
> It's more transparent and user-friendly.
> 
> If crashkernel is large and the reserved is beyond 896M, old kexec-tools
> is not compatible with new kernel because old kexec-tools can not load
> kernel at high memory region, there was an old discussion below
> (previously posted by Chao Wang):
> https://lkml.org/lkml/2013/10/15/601
> 
> But actually the behavior is consistent during my test. Suppose
> old kernel fail to reserve memory at low areas, kdump does not
> work because no memory reserved. With this patch, suppose new kernel
> successfully reserved memory at high areas, old kexec-tools still fail
> to load kdump kernel (tested 2.0.2), so it is acceptable, no need to
> worry about the compatibility.
> 
> Here is the test result (kexec-tools 2.0.2, no high memory load
> support):
> Crashkernel over 4G:
> # cat /proc/iomem|grep Crash
>   be00-cdff : Crash kernel
>   21300-21eff : Crash kernel
> # ./kexec  -p /boot/vmlinuz-`uname -r`
> Memory for crashkernel is not reserved
> Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
> Then try loading kdump kernel
> 
> crashkernel: 896M-4G:
> # cat /proc/iomem|grep Crash
>   9600-cdef : Crash kernel
> # ./kexec -p /boot/vmlinuz-4.14.0-rc4+
> ELF core (kcore) parse failed
> Cannot load /boot/vmlinuz-4.14.0-rc4+
> 
> Signed-off-by: Dave Young 
> ---
>  arch/x86/kernel/setup.c |   16 
>  1 file changed, 16 insertions(+)
> 
> --- linux-x86.orig/arch/x86/kernel/setup.c
> +++ linux-x86/arch/x86/kernel/setup.c
> @@ -568,6 +568,22 @@ static void __init reserve_crashkernel(v
>   high ? CRASH_ADDR_HIGH_MAX
>: CRASH_ADDR_LOW_MAX,
>   crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> +  * crashkernel=X reserve below 896M fails? Try below 4G
> +  */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + (1ULL << 32),
> + crash_size, CRASH_ALIGN);
> + /*
> +  * crashkernel=X reserve below 4G fails? Try MAXMEM
> +  */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
>   if (!crash_base) {
>   pr_info("crashkernel reservation failed - No suitable 
> area found.\n");
>   return;


Hmm, forgot to add [PATCH] prefix, will resend this along with other two
crashkernel patches.

Thanks
Dave


Re: [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag

2017-10-23 Thread Ulf Hansson
On 16 October 2017 at 03:29, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> Define and document a SMART_SUSPEND flag to instruct bus types and PM
> domains that the system suspend callbacks provided by the driver can
> cope with runtime-suspended devices, so from the driver's perspective
> it should be safe to leave devices in runtime suspend during system
> suspend.
>
> Setting that flag also causes the PM core to skip the "late" and
> "noirq" phases of device suspend for devices that remain in runtime
> suspend at the beginning of the "late" phase (when runtime PM has
> been disabled for them) under the assumption that their state cannot
> (and should not) change after that point until the system suspend
> transition is complete.  Moreover, the PM core prevents runtime PM
> from acting on devices with DPM_FLAG_SMART_SUSPEND during system
> resume by setting their runtime PM status to "active" at the end of
> the "early" phase (right prior to enabling runtime PM for them).
> That allows system resume callbacks to do whatever is necessary to
> resume the device without worrying about runtime PM possibly
> running in parallel with them.

After some sleep, I woke up and realized that the hole thing of making
the PM core skip invoking system sleep callbacks, is not compatible
with devices being attached to the genpd. Sorry.

The reason is because genpd may not power off the PM domain, even if
all devices attached to it are runtime suspended. For example, due to
a subdomain holding it powered or because a PM QoS constraints
prevents to power off it in runtime. Then to understand whether it
shall power off/on the PM domain, during system-wide PM it requires
the system sleep callbacks to be invoked.

So, even if the driver can cope with the behavior from
DPM_FLAG_SMART_SUSPEND, then what happens when the PM domain (genpd)
can not?

Taking this into account, this feels like solution entirely specific
to ACPI and PCI. That is fine by me, however then we still have those
cross SoC drivers, the i2c-designware driver, which may have its
device attached to an ACPI PM domain or perhaps a genpd.

>
> However, that doesn't apply to transitions involving ->thaw_noirq,
> ->thaw_early and ->thaw callbacks during hibernation, as they
> generally are not expected to change the power states of devices.
> Consequently, if a device is in runtime suspend at the beginning
> of such a transition, it must stay in runtime suspend until the
> "complete" phase of it (since the callbacks may not change its
> power state).
>

[...]

Kind regards
Uffe


Re: [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag

2017-10-23 Thread Ulf Hansson
On 16 October 2017 at 03:29, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> Define and document a SMART_SUSPEND flag to instruct bus types and PM
> domains that the system suspend callbacks provided by the driver can
> cope with runtime-suspended devices, so from the driver's perspective
> it should be safe to leave devices in runtime suspend during system
> suspend.
>
> Setting that flag also causes the PM core to skip the "late" and
> "noirq" phases of device suspend for devices that remain in runtime
> suspend at the beginning of the "late" phase (when runtime PM has
> been disabled for them) under the assumption that their state cannot
> (and should not) change after that point until the system suspend
> transition is complete.  Moreover, the PM core prevents runtime PM
> from acting on devices with DPM_FLAG_SMART_SUSPEND during system
> resume by setting their runtime PM status to "active" at the end of
> the "early" phase (right prior to enabling runtime PM for them).
> That allows system resume callbacks to do whatever is necessary to
> resume the device without worrying about runtime PM possibly
> running in parallel with them.

After some sleep, I woke up and realized that the hole thing of making
the PM core skip invoking system sleep callbacks, is not compatible
with devices being attached to the genpd. Sorry.

The reason is because genpd may not power off the PM domain, even if
all devices attached to it are runtime suspended. For example, due to
a subdomain holding it powered or because a PM QoS constraints
prevents to power off it in runtime. Then to understand whether it
shall power off/on the PM domain, during system-wide PM it requires
the system sleep callbacks to be invoked.

So, even if the driver can cope with the behavior from
DPM_FLAG_SMART_SUSPEND, then what happens when the PM domain (genpd)
can not?

Taking this into account, this feels like solution entirely specific
to ACPI and PCI. That is fine by me, however then we still have those
cross SoC drivers, the i2c-designware driver, which may have its
device attached to an ACPI PM domain or perhaps a genpd.

>
> However, that doesn't apply to transitions involving ->thaw_noirq,
> ->thaw_early and ->thaw callbacks during hibernation, as they
> generally are not expected to change the power states of devices.
> Consequently, if a device is in runtime suspend at the beginning
> of such a transition, it must stay in runtime suspend until the
> "complete" phase of it (since the callbacks may not change its
> power state).
>

[...]

Kind regards
Uffe


Re: [RFC PATCH] kbuild: Allow specifying some base host CFLAGS

2017-10-23 Thread Masahiro Yamada
Hi Douglous

2017-10-20 14:06 GMT+09:00 Doug Anderson :
> Hi,
>
> On Wed, Oct 18, 2017 at 9:45 AM, Masahiro Yamada
>  wrote:
>> 2017-10-14 3:02 GMT+09:00 Douglas Anderson :
>>> Right now there is a way to add some CFLAGS that affect target builds,
>>> but no way to add CFLAGS that affect host builds.  Let's add a way.
>>> We'll document two environment variables: CFLAGS_HOST and
>>> CXXFLAGS_HOST.
>>>
>>> We'll document that these variables get appended to by the kernel to
>>> make the final CFLAGS.  That means that, though the environment can
>>> specify some flags, if there is a conflict the kernel can override and
>>> win.  This works differently than KCFLAGS which is appended (and thus
>>> can override) the kernel specified CFLAGS.
>>>
>>> Why would I make KCFLAGS and CFLAGS_HOST work differently in this way?
>>> My argument is that it's about expected usage.  Typically the build
>>> system invoking the kernel has some idea about some basic CFLAGS that
>>> it wants to use to build things for the host and things for the
>>> target.  In general the build system would expect that its flags can
>>> be overridden if necessary (perhaps we need to turn off a warning when
>>> compiling a certain file, for instance).  So, all other things being
>>> equal, the way I'm making CFLAGS_HOST is the way I'd expect things to
>>> work.
>>>
>>> So, if it's expected that the build system can pass in a base set of
>>> flags, why didn't we make KCFLAGS work that way?  The short answer is:
>>> when building for the target the kernel is just "special".  The build
>>> system's "target" CFLAGS are likely intended for userspace programs
>>> and likely make very little sense to use as a basis.  This was talked
>>> about in the seminal commit 69ee0b352242 ("kbuild: do not pick up
>>> CFLAGS from the environment").  Basically: if the build system REALLY
>>> knows what it's doing then it can pass in flags that the kernel will
>>> use, but otherwise it should butt out.  Presumably this build system
>>> that really knows what it's doing knows better than the kernel so
>>> KCFLAGS comes after the kernel's normal flags.
>>>
>>> One last note: I chose to add new variables rather than just having
>>> the build system try to pass HOSTCFLAGS in somehow (either through the
>>> environment or the command line) to avoid weird interactions with
>>> recursive invocations of make.
>>>
>>> Signed-off-by: Douglas Anderson 
>>> ---
>>
>> I'd like to know for-instance cases where this is useful.
>
> I'm not sure I have any exact use cases.  I know vapier@ (CCed) was
> pushing for making sure that these flags get passed from the portage
> ebuild into the kernel build, so maybe he has some cases?  Right now
> we have the "-pipe" flag that ought to be passed in to the host
> compiler but we're dropping it on the floor, but that doesn't seem
> terribly critical.
>
> ...but in general the Linux kernel doesn't have all the details about
> the host system.  That means it can't necessarily build the tools
> quite as optimally (it can't pass "-mtune, right?).  I could also
> imagine that there could be ABI flags that need to be specified?  Like
> if we had floating point math in a host tool it would be important
> that the build system could tell the kernel what to use for
> "-mfloat-abi".
>
> ...so basically: it's all theoretical at this point in time from my
> point of view, but I can definitely understand how it could be
> necessary in the right environment.


I get your point.

My concern is kbuild already supports various *FLAGS.

The reason for the missing optional host flags
is probably people have less interest in host-tool optimization.

Once we support a new FLAGS, it is hard to change/remove it.

I will wait more in case other people may want to say something to this.

-- 
Best Regards
Masahiro Yamada


Re: [RFC PATCH] kbuild: Allow specifying some base host CFLAGS

2017-10-23 Thread Masahiro Yamada
Hi Douglous

2017-10-20 14:06 GMT+09:00 Doug Anderson :
> Hi,
>
> On Wed, Oct 18, 2017 at 9:45 AM, Masahiro Yamada
>  wrote:
>> 2017-10-14 3:02 GMT+09:00 Douglas Anderson :
>>> Right now there is a way to add some CFLAGS that affect target builds,
>>> but no way to add CFLAGS that affect host builds.  Let's add a way.
>>> We'll document two environment variables: CFLAGS_HOST and
>>> CXXFLAGS_HOST.
>>>
>>> We'll document that these variables get appended to by the kernel to
>>> make the final CFLAGS.  That means that, though the environment can
>>> specify some flags, if there is a conflict the kernel can override and
>>> win.  This works differently than KCFLAGS which is appended (and thus
>>> can override) the kernel specified CFLAGS.
>>>
>>> Why would I make KCFLAGS and CFLAGS_HOST work differently in this way?
>>> My argument is that it's about expected usage.  Typically the build
>>> system invoking the kernel has some idea about some basic CFLAGS that
>>> it wants to use to build things for the host and things for the
>>> target.  In general the build system would expect that its flags can
>>> be overridden if necessary (perhaps we need to turn off a warning when
>>> compiling a certain file, for instance).  So, all other things being
>>> equal, the way I'm making CFLAGS_HOST is the way I'd expect things to
>>> work.
>>>
>>> So, if it's expected that the build system can pass in a base set of
>>> flags, why didn't we make KCFLAGS work that way?  The short answer is:
>>> when building for the target the kernel is just "special".  The build
>>> system's "target" CFLAGS are likely intended for userspace programs
>>> and likely make very little sense to use as a basis.  This was talked
>>> about in the seminal commit 69ee0b352242 ("kbuild: do not pick up
>>> CFLAGS from the environment").  Basically: if the build system REALLY
>>> knows what it's doing then it can pass in flags that the kernel will
>>> use, but otherwise it should butt out.  Presumably this build system
>>> that really knows what it's doing knows better than the kernel so
>>> KCFLAGS comes after the kernel's normal flags.
>>>
>>> One last note: I chose to add new variables rather than just having
>>> the build system try to pass HOSTCFLAGS in somehow (either through the
>>> environment or the command line) to avoid weird interactions with
>>> recursive invocations of make.
>>>
>>> Signed-off-by: Douglas Anderson 
>>> ---
>>
>> I'd like to know for-instance cases where this is useful.
>
> I'm not sure I have any exact use cases.  I know vapier@ (CCed) was
> pushing for making sure that these flags get passed from the portage
> ebuild into the kernel build, so maybe he has some cases?  Right now
> we have the "-pipe" flag that ought to be passed in to the host
> compiler but we're dropping it on the floor, but that doesn't seem
> terribly critical.
>
> ...but in general the Linux kernel doesn't have all the details about
> the host system.  That means it can't necessarily build the tools
> quite as optimally (it can't pass "-mtune, right?).  I could also
> imagine that there could be ABI flags that need to be specified?  Like
> if we had floating point math in a host tool it would be important
> that the build system could tell the kernel what to use for
> "-mfloat-abi".
>
> ...so basically: it's all theoretical at this point in time from my
> point of view, but I can definitely understand how it could be
> necessary in the right environment.


I get your point.

My concern is kbuild already supports various *FLAGS.

The reason for the missing optional host flags
is probably people have less interest in host-tool optimization.

Once we support a new FLAGS, it is hard to change/remove it.

I will wait more in case other people may want to say something to this.

-- 
Best Regards
Masahiro Yamada


Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-10-23 Thread Joonsoo Kim
On Mon, Oct 23, 2017 at 10:10:09AM +0200, Michal Hocko wrote:
> On Mon 23-10-17 14:23:09, Joonsoo Kim wrote:
> > On Fri, Oct 20, 2017 at 09:02:20AM +0200, Michal Hocko wrote:
> > > On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> > > > On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > > > > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > > > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > > > > [...]
> > > > > > > > > > > Hello,
> > > > > > > > > > > 
> > > > > > > > > > > This patch will break the CMA user. As you mentioned, CMA 
> > > > > > > > > > > allocation
> > > > > > > > > > > itself isn't migrateable. So, after a single page is 
> > > > > > > > > > > allocated through
> > > > > > > > > > > CMA allocation, has_unmovable_pages() will return true 
> > > > > > > > > > > for this
> > > > > > > > > > > pageblock. Then, futher CMA allocation request to this 
> > > > > > > > > > > pageblock will
> > > > > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > > > > 
> > > > > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > > > > has_unmovable_pages to return false here even though the 
> > > > > > > > > > memory is not
> > > > > > > > > > movable? This sounds really strange to me and kind of abuse 
> > > > > > > > > > of this
> > > > > > > > > 
> > > > > > > > > Your understanding is correct. Perhaps, abuse or wrong 
> > > > > > > > > function name.
> > > > > > > > >
> > > > > > > > > > function. Which path is that? Can we do the migrate type 
> > > > > > > > > > test theres?
> > > > > > > > > 
> > > > > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > > > > 
> > > > > > > > I see. It seems that the CMA and memory hotplug have a very 
> > > > > > > > different
> > > > > > > > view on what should happen during isolation.
> > > > > > > >  
> > > > > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() 
> > > > > > > > > and change
> > > > > > > > > it to check migrate type rather than has_unmovable_pages() if 
> > > > > > > > > 'XXX' is
> > > > > > > > > specified.
> > > > > > > > 
> > > > > > > > Can we use the migratetype argument and do the special thing for
> > > > > > > > MIGRATE_CMA? Like the following diff?
> > > > > > > 
> > > > > > > And with the full changelog.
> > > > > > > ---
> > > > > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 
> > > > > > > >00:00:00 2001
> > > > > > > From: Michal Hocko 
> > > > > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > > > > >  has_unmovable_pages
> > > > > > > 
> > > > > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > > > > has_unmovable_pages" would break CMA allocator because it relies 
> > > > > > > on
> > > > > > > has_unmovable_pages returning false even for CMA pageblocks which 
> > > > > > > in
> > > > > > > fact don't have to be movable:
> > > > > > > alloc_contig_range
> > > > > > >   start_isolate_page_range
> > > > > > > set_migratetype_isolate
> > > > > > >   has_unmovable_pages
> > > > > > > 
> > > > > > > This is a result of the code sharing between CMA and memory 
> > > > > > > hotplug
> > > > > > > while each one has a different idea of what has_unmovable_pages 
> > > > > > > should
> > > > > > > return. This is unfortunate but fixing it properly would require 
> > > > > > > a lot
> > > > > > > of code duplication.
> > > > > > > 
> > > > > > > Fix the issue by introducing the requested migrate type argument
> > > > > > > and special case MIGRATE_CMA case where CMA page blocks are 
> > > > > > > handled
> > > > > > > properly. This will work for memory hotplug because it requires
> > > > > > > MIGRATE_MOVABLE.
> > > > > > 
> > > > > > Unfortunately, alloc_contig_range() can be called with
> > > > > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > > > 
> > > > > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > > > > understanding is that only CMA allocator really depends on this weird
> > > > > semantic and that does MIGRATE_CMA unconditionally.
> > > > 
> > > > alloc_contig_range() could be called for partial pages in the
> > > > pageblock. With your patch, this case also fails unnecessarilly if the
> > > > other pages in the pageblock is pinned.
> > > 
> > > Is this really the case for GB pages? Do we really want to mess those
> > 
> > No, but, as I mentioned already, this API can be called with less
> > pages. I know that there is no user with less pages at this moment but
> > I cannot see any point 

Re: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-10-23 Thread Joonsoo Kim
On Mon, Oct 23, 2017 at 10:10:09AM +0200, Michal Hocko wrote:
> On Mon 23-10-17 14:23:09, Joonsoo Kim wrote:
> > On Fri, Oct 20, 2017 at 09:02:20AM +0200, Michal Hocko wrote:
> > > On Fri 20-10-17 15:50:14, Joonsoo Kim wrote:
> > > > On Fri, Oct 20, 2017 at 07:59:22AM +0200, Michal Hocko wrote:
> > > > > On Fri 20-10-17 11:13:29, Joonsoo Kim wrote:
> > > > > > On Thu, Oct 19, 2017 at 02:21:18PM +0200, Michal Hocko wrote:
> > > > > > > On Thu 19-10-17 10:20:41, Michal Hocko wrote:
> > > > > > > > On Thu 19-10-17 16:33:56, Joonsoo Kim wrote:
> > > > > > > > > On Thu, Oct 19, 2017 at 09:15:03AM +0200, Michal Hocko wrote:
> > > > > > > > > > On Thu 19-10-17 11:51:11, Joonsoo Kim wrote:
> > > > > > > > [...]
> > > > > > > > > > > Hello,
> > > > > > > > > > > 
> > > > > > > > > > > This patch will break the CMA user. As you mentioned, CMA 
> > > > > > > > > > > allocation
> > > > > > > > > > > itself isn't migrateable. So, after a single page is 
> > > > > > > > > > > allocated through
> > > > > > > > > > > CMA allocation, has_unmovable_pages() will return true 
> > > > > > > > > > > for this
> > > > > > > > > > > pageblock. Then, futher CMA allocation request to this 
> > > > > > > > > > > pageblock will
> > > > > > > > > > > fail because it requires isolating the pageblock.
> > > > > > > > > > 
> > > > > > > > > > Hmm, does this mean that the CMA allocation path depends on
> > > > > > > > > > has_unmovable_pages to return false here even though the 
> > > > > > > > > > memory is not
> > > > > > > > > > movable? This sounds really strange to me and kind of abuse 
> > > > > > > > > > of this
> > > > > > > > > 
> > > > > > > > > Your understanding is correct. Perhaps, abuse or wrong 
> > > > > > > > > function name.
> > > > > > > > >
> > > > > > > > > > function. Which path is that? Can we do the migrate type 
> > > > > > > > > > test theres?
> > > > > > > > > 
> > > > > > > > > alloc_contig_range() -> start_isolate_page_range() ->
> > > > > > > > > set_migratetype_isolate() -> has_unmovable_pages()
> > > > > > > > 
> > > > > > > > I see. It seems that the CMA and memory hotplug have a very 
> > > > > > > > different
> > > > > > > > view on what should happen during isolation.
> > > > > > > >  
> > > > > > > > > We can add one argument, 'XXX' to set_migratetype_isolate() 
> > > > > > > > > and change
> > > > > > > > > it to check migrate type rather than has_unmovable_pages() if 
> > > > > > > > > 'XXX' is
> > > > > > > > > specified.
> > > > > > > > 
> > > > > > > > Can we use the migratetype argument and do the special thing for
> > > > > > > > MIGRATE_CMA? Like the following diff?
> > > > > > > 
> > > > > > > And with the full changelog.
> > > > > > > ---
> > > > > > > >From 8cbd811d741f5dd93d1b21bb3ef94482a4d0bd32 Mon Sep 17 
> > > > > > > >00:00:00 2001
> > > > > > > From: Michal Hocko 
> > > > > > > Date: Thu, 19 Oct 2017 14:14:02 +0200
> > > > > > > Subject: [PATCH] mm: distinguish CMA and MOVABLE isolation in
> > > > > > >  has_unmovable_pages
> > > > > > > 
> > > > > > > Joonsoo has noticed that "mm: drop migrate type checks from
> > > > > > > has_unmovable_pages" would break CMA allocator because it relies 
> > > > > > > on
> > > > > > > has_unmovable_pages returning false even for CMA pageblocks which 
> > > > > > > in
> > > > > > > fact don't have to be movable:
> > > > > > > alloc_contig_range
> > > > > > >   start_isolate_page_range
> > > > > > > set_migratetype_isolate
> > > > > > >   has_unmovable_pages
> > > > > > > 
> > > > > > > This is a result of the code sharing between CMA and memory 
> > > > > > > hotplug
> > > > > > > while each one has a different idea of what has_unmovable_pages 
> > > > > > > should
> > > > > > > return. This is unfortunate but fixing it properly would require 
> > > > > > > a lot
> > > > > > > of code duplication.
> > > > > > > 
> > > > > > > Fix the issue by introducing the requested migrate type argument
> > > > > > > and special case MIGRATE_CMA case where CMA page blocks are 
> > > > > > > handled
> > > > > > > properly. This will work for memory hotplug because it requires
> > > > > > > MIGRATE_MOVABLE.
> > > > > > 
> > > > > > Unfortunately, alloc_contig_range() can be called with
> > > > > > MIGRATE_MOVABLE so this patch cannot perfectly fix the problem.
> > > > > 
> > > > > Yes, alloc_contig_range can be called with MIGRATE_MOVABLE but my
> > > > > understanding is that only CMA allocator really depends on this weird
> > > > > semantic and that does MIGRATE_CMA unconditionally.
> > > > 
> > > > alloc_contig_range() could be called for partial pages in the
> > > > pageblock. With your patch, this case also fails unnecessarilly if the
> > > > other pages in the pageblock is pinned.
> > > 
> > > Is this really the case for GB pages? Do we really want to mess those
> > 
> > No, but, as I mentioned already, this API can be called with less
> > pages. I know that there is no user with less pages at this moment but
> > I cannot see any point to reduce this 

[PATCH 2/3] drivers: phy: broadcom: Add driver for Cygnus USB phy controller

2017-10-23 Thread Raveendra Padasalagi
Add driver for Broadcom's USB phy controller's used in Cygnus
familyof SoC. Cygnus has three USB phy controller's, port 0,
port 1 provides USB host functionality and port 2 can be configured
for host/device role.

Configuration of host/device role for port 2 is achieved based on
the extcon events, the driver registers to extcon framework to get
appropriate connect events for Host/Device cables connect/disconnect
states based on VBUS and ID interrupts.

Signed-off-by: Raveendra Padasalagi 
---
 drivers/phy/broadcom/Kconfig  |  14 +
 drivers/phy/broadcom/Makefile |   1 +
 drivers/phy/broadcom/phy-bcm-cygnus-usb.c | 672 ++
 3 files changed, 687 insertions(+)
 create mode 100644 drivers/phy/broadcom/phy-bcm-cygnus-usb.c

diff --git a/drivers/phy/broadcom/Kconfig b/drivers/phy/broadcom/Kconfig
index 64fc59c..3179daf 100644
--- a/drivers/phy/broadcom/Kconfig
+++ b/drivers/phy/broadcom/Kconfig
@@ -1,6 +1,20 @@
 #
 # Phy drivers for Broadcom platforms
 #
+config PHY_BCM_CYGNUS_USB
+   tristate "Broadcom Cygnus USB PHY support"
+   depends on OF
+   depends on ARCH_BCM_CYGNUS || COMPILE_TEST
+   select GENERIC_PHY
+   select EXTCON_USB_GPIO
+   default ARCH_BCM_CYGNUS
+   help
+ Enable this to support three USB PHY's present in Broadcom's
+ Cygnus chip.
+
+ The phys are capable of supporting host mode on all ports and
+ device mode for port 2.
+
 config PHY_CYGNUS_PCIE
tristate "Broadcom Cygnus PCIe PHY driver"
depends on OF && (ARCH_BCM_CYGNUS || COMPILE_TEST)
diff --git a/drivers/phy/broadcom/Makefile b/drivers/phy/broadcom/Makefile
index 4eb82ec..3dec23c 100644
--- a/drivers/phy/broadcom/Makefile
+++ b/drivers/phy/broadcom/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_PHY_CYGNUS_PCIE)  += phy-bcm-cygnus-pcie.o
+obj-$(CONFIG_PHY_BCM_CYGNUS_USB)   += phy-bcm-cygnus-usb.o
 obj-$(CONFIG_BCM_KONA_USB2_PHY)+= phy-bcm-kona-usb2.o
 obj-$(CONFIG_PHY_BCM_NS_USB2)  += phy-bcm-ns-usb2.o
 obj-$(CONFIG_PHY_BCM_NS_USB3)  += phy-bcm-ns-usb3.o
diff --git a/drivers/phy/broadcom/phy-bcm-cygnus-usb.c 
b/drivers/phy/broadcom/phy-bcm-cygnus-usb.c
new file mode 100644
index 000..ef2a94c
--- /dev/null
+++ b/drivers/phy/broadcom/phy-bcm-cygnus-usb.c
@@ -0,0 +1,672 @@
+/*
+ * Copyright 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation (the "GPL").
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 (GPLv2) for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 (GPLv2) along with this source code.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define CDRU_USBPHY_CLK_RST_SEL_OFFSET 0x0
+#define CDRU_USBPHY2_HOST_DEV_SEL_OFFSET   0x4
+#define CDRU_USB_DEV_SUSPEND_RESUME_CTRL_OFFSET0x5C
+#define CDRU_USBPHY_P0_STATUS_OFFSET   0x1C
+#define CDRU_USBPHY_P1_STATUS_OFFSET   0x34
+#define CDRU_USBPHY_P2_STATUS_OFFSET   0x4C
+#define CRMU_USB_PHY_AON_CTRL_OFFSET   0x0
+
+#define CDRU_USBPHY_USBPHY_ILDO_ON_FLAGBIT(1)
+#define CDRU_USBPHY_USBPHY_PLL_LOCKBIT(0)
+#define CDRU_USB_DEV_SUSPEND_RESUME_CTRL_DISABLE   BIT(0)
+
+#define PHY2_DEV_HOST_CTRL_SEL_DEVICE  0
+#define PHY2_DEV_HOST_CTRL_SEL_HOST1
+#define PHY2_DEV_HOST_CTRL_SEL_IDLE2
+#define CRMU_USBPHY_P0_AFE_CORERDY_VDDCBIT(1)
+#define CRMU_USBPHY_P0_RESETB  BIT(2)
+#define CRMU_USBPHY_P1_AFE_CORERDY_VDDCBIT(9)
+#define CRMU_USBPHY_P1_RESETB  BIT(10)
+#define CRMU_USBPHY_P2_AFE_CORERDY_VDDCBIT(17)
+#define CRMU_USBPHY_P2_RESETB  BIT(18)
+
+#define USB2_IDM_IDM_IO_CONTROL_DIRECT_OFFSET  0x0408
+#define USB2_IDM_IDM_IO_CONTROL_DIRECT_CLK_ENABLE  BIT(0)
+#define SUSPEND_OVERRIDE_0 13
+#define SUSPEND_OVERRIDE_1 14
+#define SUSPEND_OVERRIDE_2 15
+#define USB2_IDM_IDM_RESET_CONTROL_OFFSET  0x0800
+#define USB2_IDM_IDM_RESET_CONTROL__RESET  0
+#define USB2D_IDM_IDM_IO_SS_CLEAR_NAK_NEMPTY_EN_I  BIT(24)
+
+#define PLL_LOCK_RETRY_COUNT   1000
+#define MAX_REGULATOR_NAME_LEN 25
+#define 

[PATCH 2/3] drivers: phy: broadcom: Add driver for Cygnus USB phy controller

2017-10-23 Thread Raveendra Padasalagi
Add driver for Broadcom's USB phy controller's used in Cygnus
familyof SoC. Cygnus has three USB phy controller's, port 0,
port 1 provides USB host functionality and port 2 can be configured
for host/device role.

Configuration of host/device role for port 2 is achieved based on
the extcon events, the driver registers to extcon framework to get
appropriate connect events for Host/Device cables connect/disconnect
states based on VBUS and ID interrupts.

Signed-off-by: Raveendra Padasalagi 
---
 drivers/phy/broadcom/Kconfig  |  14 +
 drivers/phy/broadcom/Makefile |   1 +
 drivers/phy/broadcom/phy-bcm-cygnus-usb.c | 672 ++
 3 files changed, 687 insertions(+)
 create mode 100644 drivers/phy/broadcom/phy-bcm-cygnus-usb.c

diff --git a/drivers/phy/broadcom/Kconfig b/drivers/phy/broadcom/Kconfig
index 64fc59c..3179daf 100644
--- a/drivers/phy/broadcom/Kconfig
+++ b/drivers/phy/broadcom/Kconfig
@@ -1,6 +1,20 @@
 #
 # Phy drivers for Broadcom platforms
 #
+config PHY_BCM_CYGNUS_USB
+   tristate "Broadcom Cygnus USB PHY support"
+   depends on OF
+   depends on ARCH_BCM_CYGNUS || COMPILE_TEST
+   select GENERIC_PHY
+   select EXTCON_USB_GPIO
+   default ARCH_BCM_CYGNUS
+   help
+ Enable this to support three USB PHY's present in Broadcom's
+ Cygnus chip.
+
+ The phys are capable of supporting host mode on all ports and
+ device mode for port 2.
+
 config PHY_CYGNUS_PCIE
tristate "Broadcom Cygnus PCIe PHY driver"
depends on OF && (ARCH_BCM_CYGNUS || COMPILE_TEST)
diff --git a/drivers/phy/broadcom/Makefile b/drivers/phy/broadcom/Makefile
index 4eb82ec..3dec23c 100644
--- a/drivers/phy/broadcom/Makefile
+++ b/drivers/phy/broadcom/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_PHY_CYGNUS_PCIE)  += phy-bcm-cygnus-pcie.o
+obj-$(CONFIG_PHY_BCM_CYGNUS_USB)   += phy-bcm-cygnus-usb.o
 obj-$(CONFIG_BCM_KONA_USB2_PHY)+= phy-bcm-kona-usb2.o
 obj-$(CONFIG_PHY_BCM_NS_USB2)  += phy-bcm-ns-usb2.o
 obj-$(CONFIG_PHY_BCM_NS_USB3)  += phy-bcm-ns-usb3.o
diff --git a/drivers/phy/broadcom/phy-bcm-cygnus-usb.c 
b/drivers/phy/broadcom/phy-bcm-cygnus-usb.c
new file mode 100644
index 000..ef2a94c
--- /dev/null
+++ b/drivers/phy/broadcom/phy-bcm-cygnus-usb.c
@@ -0,0 +1,672 @@
+/*
+ * Copyright 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation (the "GPL").
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 (GPLv2) for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 (GPLv2) along with this source code.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define CDRU_USBPHY_CLK_RST_SEL_OFFSET 0x0
+#define CDRU_USBPHY2_HOST_DEV_SEL_OFFSET   0x4
+#define CDRU_USB_DEV_SUSPEND_RESUME_CTRL_OFFSET0x5C
+#define CDRU_USBPHY_P0_STATUS_OFFSET   0x1C
+#define CDRU_USBPHY_P1_STATUS_OFFSET   0x34
+#define CDRU_USBPHY_P2_STATUS_OFFSET   0x4C
+#define CRMU_USB_PHY_AON_CTRL_OFFSET   0x0
+
+#define CDRU_USBPHY_USBPHY_ILDO_ON_FLAGBIT(1)
+#define CDRU_USBPHY_USBPHY_PLL_LOCKBIT(0)
+#define CDRU_USB_DEV_SUSPEND_RESUME_CTRL_DISABLE   BIT(0)
+
+#define PHY2_DEV_HOST_CTRL_SEL_DEVICE  0
+#define PHY2_DEV_HOST_CTRL_SEL_HOST1
+#define PHY2_DEV_HOST_CTRL_SEL_IDLE2
+#define CRMU_USBPHY_P0_AFE_CORERDY_VDDCBIT(1)
+#define CRMU_USBPHY_P0_RESETB  BIT(2)
+#define CRMU_USBPHY_P1_AFE_CORERDY_VDDCBIT(9)
+#define CRMU_USBPHY_P1_RESETB  BIT(10)
+#define CRMU_USBPHY_P2_AFE_CORERDY_VDDCBIT(17)
+#define CRMU_USBPHY_P2_RESETB  BIT(18)
+
+#define USB2_IDM_IDM_IO_CONTROL_DIRECT_OFFSET  0x0408
+#define USB2_IDM_IDM_IO_CONTROL_DIRECT_CLK_ENABLE  BIT(0)
+#define SUSPEND_OVERRIDE_0 13
+#define SUSPEND_OVERRIDE_1 14
+#define SUSPEND_OVERRIDE_2 15
+#define USB2_IDM_IDM_RESET_CONTROL_OFFSET  0x0800
+#define USB2_IDM_IDM_RESET_CONTROL__RESET  0
+#define USB2D_IDM_IDM_IO_SS_CLEAR_NAK_NEMPTY_EN_I  BIT(24)
+
+#define PLL_LOCK_RETRY_COUNT   1000
+#define MAX_REGULATOR_NAME_LEN 25
+#define DUAL_ROLE_PHY   

[PATCH 3/3] ARM: dts: Add dt node for Broadcom Cygnus USB phy

2017-10-23 Thread Raveendra Padasalagi
Add DT node for Broadcom's USB phy controller's used
in Cygnus family of SoC.

Signed-off-by: Raveendra Padasalagi 
---
 arch/arm/boot/dts/bcm-cygnus.dtsi | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi 
b/arch/arm/boot/dts/bcm-cygnus.dtsi
index 7c957ea..810df68 100644
--- a/arch/arm/boot/dts/bcm-cygnus.dtsi
+++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
@@ -96,6 +96,41 @@
#address-cells = <1>;
#size-cells = <1>;
 
+   extcon_usb: extcon_usb {
+   compatible = "linux,extcon-usb-gpio";
+   vbus-gpio = <_asiu 121 0>;
+   id-gpio = <_asiu 122 0>;
+   status = "okay";
+   };
+
+   usbphy: phy@0301c028 {
+   compatible = "brcm,cygnus-usb-phy";
+   reg = <0x0301c028 0x4>,
+ <0x0301d1b4 0x5c>,
+ <0x18115000 0xa00>,
+ <0x18111000 0xa00>;
+   reg-names = "crmu-usbphy-aon-ctrl", "cdru-usbphy",
+   "usb2h-idm", "usb2d-idm";
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   usbphy0: usb-phy@0 {
+   #phy-cells = <1>;
+   reg = <0>;
+   };
+
+   usbphy1: usb-phy@1 {
+   #phy-cells = <1>;
+   reg = <1>;
+   };
+
+   usbphy2: usb-phy@2 {
+   #phy-cells = <1>;
+   reg = <2>;
+   extcon = <_usb>;
+   };
+   };
+
otp: otp@0301c800 {
compatible = "brcm,ocotp";
reg = <0x0301c800 0x2c>;
-- 
1.9.1



[PATCH 3/3] ARM: dts: Add dt node for Broadcom Cygnus USB phy

2017-10-23 Thread Raveendra Padasalagi
Add DT node for Broadcom's USB phy controller's used
in Cygnus family of SoC.

Signed-off-by: Raveendra Padasalagi 
---
 arch/arm/boot/dts/bcm-cygnus.dtsi | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi 
b/arch/arm/boot/dts/bcm-cygnus.dtsi
index 7c957ea..810df68 100644
--- a/arch/arm/boot/dts/bcm-cygnus.dtsi
+++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
@@ -96,6 +96,41 @@
#address-cells = <1>;
#size-cells = <1>;
 
+   extcon_usb: extcon_usb {
+   compatible = "linux,extcon-usb-gpio";
+   vbus-gpio = <_asiu 121 0>;
+   id-gpio = <_asiu 122 0>;
+   status = "okay";
+   };
+
+   usbphy: phy@0301c028 {
+   compatible = "brcm,cygnus-usb-phy";
+   reg = <0x0301c028 0x4>,
+ <0x0301d1b4 0x5c>,
+ <0x18115000 0xa00>,
+ <0x18111000 0xa00>;
+   reg-names = "crmu-usbphy-aon-ctrl", "cdru-usbphy",
+   "usb2h-idm", "usb2d-idm";
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   usbphy0: usb-phy@0 {
+   #phy-cells = <1>;
+   reg = <0>;
+   };
+
+   usbphy1: usb-phy@1 {
+   #phy-cells = <1>;
+   reg = <1>;
+   };
+
+   usbphy2: usb-phy@2 {
+   #phy-cells = <1>;
+   reg = <2>;
+   extcon = <_usb>;
+   };
+   };
+
otp: otp@0301c800 {
compatible = "brcm,ocotp";
reg = <0x0301c800 0x2c>;
-- 
1.9.1



[PATCH 1/3] Documentation: DT: Add Cygnus usb phy binding

2017-10-23 Thread Raveendra Padasalagi
Add devicetree binding document for broadcom's
Cygnus SoC specific usb phy controller driver.

Signed-off-by: Raveendra Padasalagi 
---
 .../bindings/phy/brcm,cygnus-usb-phy.txt   | 101 +
 1 file changed, 101 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt

diff --git a/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt 
b/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
new file mode 100644
index 000..2d99fea
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
@@ -0,0 +1,101 @@
+BROADCOM CYGNUS USB PHY
+
+Required Properties:
+- compatible:  brcm,cygnus-usb-phy
+- reg : the register start address and length for crmu_usbphy_aon_ctrl,
+  cdru usb phy control and reset registers, usb host idm registers,
+  usb device idm registers.
+- reg-names: a list of the names corresponding to the previous register
+  ranges. Should contain "crmu-usbphy-aon-ctrl", "cdru-usbphy",
+  "usb2h-idm", "usb2d-idm".
+- address-cells: should be 1
+- size-cells: should be 0
+
+Sub-nodes:
+  Each port's PHY should be represented as a sub-node.
+
+Sub-nodes required properties:
+- reg: the PHY number
+- #phy-cells must be 1
+  The node that uses the phy must provide 1 integer argument specifying
+  port number.
+
+Optional Properties:
+- vbus-p#-supply : The regulator for vbus out control for the host
+  functionality enabled ports.
+- vbus-gpios: vbus gpio binding
+  This is mandatory for port 2, as port 2 is used as dual role phy.
+  Based on the vbus and id values device or host role is determined
+  for phy 2.
+
+- extcon: extcon phandle
+  This is mandatory for port 2,  as port 2 is used as dual role phy.
+  extcon should be phandle to external usb gpio module which provide
+  device or host role notifications based on the ID and VBUS gpio's state.
+
+
+Refer to phy/phy-bindings.txt for the generic PHY binding properties
+
+NOTE: port 0 and port 1 are host only and port 2 is dual role port.
+
+Example of phy :
+   usbphy: phy@0301c028 {
+   compatible = "brcm,cygnus-usb-phy";
+   reg = <0x0301c028 0x4>,
+ <0x0301d1b4 0x5c>,
+ <0x18115000 0xa00>,
+ <0x18111000 0xa00>;
+   reg-names = "crmu-usbphy-aon-ctrl", "cdru-usbphy",
+   "usb2h-idm", "usb2d-idm";
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   usbphy0: usb-phy@0 {
+   reg = <0>;
+   #phy-cells = <1>;
+   };
+
+   usbphy1: usb-phy@1 {
+   reg = <1>;
+   #phy-cells = <1>;
+   };
+
+   usbphy2: usb-phy@2 {
+   reg = <2>;
+   #phy-cells = <1>;
+   extcon = <_usb>;
+   };
+   };
+
+   extcon_usb: extcon_usb {
+   compatible = "linux,extcon-usb-gpio";
+   vbus-gpio = <_asiu 121 0>;
+   id-gpio = <_asiu 122 0>;
+   status = "okay";
+   };
+
+
+Example of node using the phy:
+
+   /* This nodes declares all three ports, port 0
+   and port 1 are host and port 2 is device */
+
+   ehci0: usb@18048000 {
+   compatible = "generic-ehci";
+   reg = <0x18048000 0x100>;
+   interrupts = ;
+   phys = < 0  1  2>;
+   phy-names = "usbp0","usbp1","usbp2";
+   status = "okay";
+   };
+
+   /* This node declares port 2 phy
+   and configures it for device */
+
+   usbd_udc_dwc1: usbd_udc_dwc@1804c000 {
+   compatible = "iproc-udc";
+   reg = <0x1804c000 0x2000>;
+   interrupts = ;
+   phys = < 2>;
+   phy-names = "usbdrd";
+   };
-- 
1.9.1



[PATCH 0/3] Add driver for Broadcom Cygnus USB phy controller

2017-10-23 Thread Raveendra Padasalagi
Add driver for Broadcom's USB phy controller's used in Cygnus family
of SoC and it's based on 4.14-rc3 tag.

The patch set can be fetched from iproc-cyg-usb-v1 branch of
https://github.com/Broadcom/arm64-linux.git

Raveendra Padasalagi (3):
  Documentation: DT: Add Cygnus usb phy binding
  drivers: phy: broadcom: Add driver for Cygnus USB phy controller
  ARM: dts: Add dt node for Broadcom Cygnus USB phy

 .../bindings/phy/brcm,cygnus-usb-phy.txt   | 101 
 arch/arm/boot/dts/bcm-cygnus.dtsi  |  35 ++
 drivers/phy/broadcom/Kconfig   |  14 +
 drivers/phy/broadcom/Makefile  |   1 +
 drivers/phy/broadcom/phy-bcm-cygnus-usb.c  | 672 +
 5 files changed, 823 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
 create mode 100644 drivers/phy/broadcom/phy-bcm-cygnus-usb.c

-- 
1.9.1



[PATCH 1/3] Documentation: DT: Add Cygnus usb phy binding

2017-10-23 Thread Raveendra Padasalagi
Add devicetree binding document for broadcom's
Cygnus SoC specific usb phy controller driver.

Signed-off-by: Raveendra Padasalagi 
---
 .../bindings/phy/brcm,cygnus-usb-phy.txt   | 101 +
 1 file changed, 101 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt

diff --git a/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt 
b/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
new file mode 100644
index 000..2d99fea
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
@@ -0,0 +1,101 @@
+BROADCOM CYGNUS USB PHY
+
+Required Properties:
+- compatible:  brcm,cygnus-usb-phy
+- reg : the register start address and length for crmu_usbphy_aon_ctrl,
+  cdru usb phy control and reset registers, usb host idm registers,
+  usb device idm registers.
+- reg-names: a list of the names corresponding to the previous register
+  ranges. Should contain "crmu-usbphy-aon-ctrl", "cdru-usbphy",
+  "usb2h-idm", "usb2d-idm".
+- address-cells: should be 1
+- size-cells: should be 0
+
+Sub-nodes:
+  Each port's PHY should be represented as a sub-node.
+
+Sub-nodes required properties:
+- reg: the PHY number
+- #phy-cells must be 1
+  The node that uses the phy must provide 1 integer argument specifying
+  port number.
+
+Optional Properties:
+- vbus-p#-supply : The regulator for vbus out control for the host
+  functionality enabled ports.
+- vbus-gpios: vbus gpio binding
+  This is mandatory for port 2, as port 2 is used as dual role phy.
+  Based on the vbus and id values device or host role is determined
+  for phy 2.
+
+- extcon: extcon phandle
+  This is mandatory for port 2,  as port 2 is used as dual role phy.
+  extcon should be phandle to external usb gpio module which provide
+  device or host role notifications based on the ID and VBUS gpio's state.
+
+
+Refer to phy/phy-bindings.txt for the generic PHY binding properties
+
+NOTE: port 0 and port 1 are host only and port 2 is dual role port.
+
+Example of phy :
+   usbphy: phy@0301c028 {
+   compatible = "brcm,cygnus-usb-phy";
+   reg = <0x0301c028 0x4>,
+ <0x0301d1b4 0x5c>,
+ <0x18115000 0xa00>,
+ <0x18111000 0xa00>;
+   reg-names = "crmu-usbphy-aon-ctrl", "cdru-usbphy",
+   "usb2h-idm", "usb2d-idm";
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   usbphy0: usb-phy@0 {
+   reg = <0>;
+   #phy-cells = <1>;
+   };
+
+   usbphy1: usb-phy@1 {
+   reg = <1>;
+   #phy-cells = <1>;
+   };
+
+   usbphy2: usb-phy@2 {
+   reg = <2>;
+   #phy-cells = <1>;
+   extcon = <_usb>;
+   };
+   };
+
+   extcon_usb: extcon_usb {
+   compatible = "linux,extcon-usb-gpio";
+   vbus-gpio = <_asiu 121 0>;
+   id-gpio = <_asiu 122 0>;
+   status = "okay";
+   };
+
+
+Example of node using the phy:
+
+   /* This nodes declares all three ports, port 0
+   and port 1 are host and port 2 is device */
+
+   ehci0: usb@18048000 {
+   compatible = "generic-ehci";
+   reg = <0x18048000 0x100>;
+   interrupts = ;
+   phys = < 0  1  2>;
+   phy-names = "usbp0","usbp1","usbp2";
+   status = "okay";
+   };
+
+   /* This node declares port 2 phy
+   and configures it for device */
+
+   usbd_udc_dwc1: usbd_udc_dwc@1804c000 {
+   compatible = "iproc-udc";
+   reg = <0x1804c000 0x2000>;
+   interrupts = ;
+   phys = < 2>;
+   phy-names = "usbdrd";
+   };
-- 
1.9.1



[PATCH 0/3] Add driver for Broadcom Cygnus USB phy controller

2017-10-23 Thread Raveendra Padasalagi
Add driver for Broadcom's USB phy controller's used in Cygnus family
of SoC and it's based on 4.14-rc3 tag.

The patch set can be fetched from iproc-cyg-usb-v1 branch of
https://github.com/Broadcom/arm64-linux.git

Raveendra Padasalagi (3):
  Documentation: DT: Add Cygnus usb phy binding
  drivers: phy: broadcom: Add driver for Cygnus USB phy controller
  ARM: dts: Add dt node for Broadcom Cygnus USB phy

 .../bindings/phy/brcm,cygnus-usb-phy.txt   | 101 
 arch/arm/boot/dts/bcm-cygnus.dtsi  |  35 ++
 drivers/phy/broadcom/Kconfig   |  14 +
 drivers/phy/broadcom/Makefile  |   1 +
 drivers/phy/broadcom/phy-bcm-cygnus-usb.c  | 672 +
 5 files changed, 823 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/phy/brcm,cygnus-usb-phy.txt
 create mode 100644 drivers/phy/broadcom/phy-bcm-cygnus-usb.c

-- 
1.9.1



clock event device’s next_event

2017-10-23 Thread Sodagudi Prasad

Hi Viresh and Thomas,

In the functions tick_nohz_stop_sched_tick(), when expires = KTIME_MAX 
we are canceling the tick_sched_timer timer but we are not updating the 
clock event device’s next_event to KTIME_MAX.
Due to that broadcast device’s next_event is not programmed properly and 
resulting unnecessary wakeups for this cpu.


/*
 * If the expiration time == KTIME_MAX, then we simply stop
 * the tick timer.
 */
if (unlikely(expires == KTIME_MAX)) {
if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
hrtimer_cancel(>sched_timer);
goto out;
}


After digging further, I see that following call flow is updating 
tick_cpu_device state to shutdown state but clock event device  
next_event is not updated to KTIME_MAX.
hrtimer_cancel -> __remove_hrtimer -> hrtimer_force_reprogram -> 
tick_program_event.


int tick_program_event(ktime_t expires, int force)
{
struct clock_event_device *dev = 
__this_cpu_read(tick_cpu_device.evtdev);


if (unlikely(expires == KTIME_MAX)) {
/*
 * We don't need the clock event device any more, stop 
it.

 */
clockevents_switch_state(dev, 
CLOCK_EVT_STATE_ONESHOT_STOPPED);

return 0;
}
In the above tick_program_event() function clock event device’s 
next_event is not getting updated as clockevents_program_event() 
function not called after state update.


-thanks, Prasad


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


clock event device’s next_event

2017-10-23 Thread Sodagudi Prasad

Hi Viresh and Thomas,

In the functions tick_nohz_stop_sched_tick(), when expires = KTIME_MAX 
we are canceling the tick_sched_timer timer but we are not updating the 
clock event device’s next_event to KTIME_MAX.
Due to that broadcast device’s next_event is not programmed properly and 
resulting unnecessary wakeups for this cpu.


/*
 * If the expiration time == KTIME_MAX, then we simply stop
 * the tick timer.
 */
if (unlikely(expires == KTIME_MAX)) {
if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
hrtimer_cancel(>sched_timer);
goto out;
}


After digging further, I see that following call flow is updating 
tick_cpu_device state to shutdown state but clock event device  
next_event is not updated to KTIME_MAX.
hrtimer_cancel -> __remove_hrtimer -> hrtimer_force_reprogram -> 
tick_program_event.


int tick_program_event(ktime_t expires, int force)
{
struct clock_event_device *dev = 
__this_cpu_read(tick_cpu_device.evtdev);


if (unlikely(expires == KTIME_MAX)) {
/*
 * We don't need the clock event device any more, stop 
it.

 */
clockevents_switch_state(dev, 
CLOCK_EVT_STATE_ONESHOT_STOPPED);

return 0;
}
In the above tick_program_event() function clock event device’s 
next_event is not getting updated as clockevents_program_event() 
function not called after state update.


-thanks, Prasad


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Herbert Xu
On Mon, Oct 23, 2017 at 09:01:46PM -0700, Alexei Starovoitov wrote:
>
> fwiw I had the same argument earlier:
> https://lkml.org/lkml/2017/10/9/1139

Fair point on eliminating a branch.  But I'd prefer something like

bool cond;

cond = code_that_does_something();
BUG_ON(cond);

rather than just

BUG_ON(code_that_does_something());

But maybe it's just me.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Herbert Xu
On Mon, Oct 23, 2017 at 09:01:46PM -0700, Alexei Starovoitov wrote:
>
> fwiw I had the same argument earlier:
> https://lkml.org/lkml/2017/10/9/1139

Fair point on eliminating a branch.  But I'd prefer something like

bool cond;

cond = code_that_does_something();
BUG_ON(cond);

rather than just

BUG_ON(code_that_does_something());

But maybe it's just me.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Gustavo A. R. Silva


Quoting Herbert Xu :


On Mon, Oct 23, 2017 at 10:50:43PM -0500, Gustavo A. R. Silva wrote:


Quoting Herbert Xu :

>On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:
>>Use BUG_ON instead of if condition followed by BUG.
>>
>>This issue was detected with the help of Coccinelle.
>>
>>Signed-off-by: Gustavo A. R. Silva 
>
>I think this patch is terrible.  Why on earth is Coccinelle even
>warning about this?
>
>If anything we should be converting these constructs to not use
>BUG.
>

Yeah, and under this assumption the original code is even worse.


No your patch makes it worse.  The original construct makes it
clear that the conditional is real code and not just some debugging
aid.

With your patch, real code is now inside BUG_ON.



What is the reason for BUG_ON to exist then if it makes the code worse  
than an _if_ condition followed by BUG?


Thanks
--
Gustavo A. R. Silva






Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Gustavo A. R. Silva


Quoting Herbert Xu :


On Mon, Oct 23, 2017 at 10:50:43PM -0500, Gustavo A. R. Silva wrote:


Quoting Herbert Xu :

>On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:
>>Use BUG_ON instead of if condition followed by BUG.
>>
>>This issue was detected with the help of Coccinelle.
>>
>>Signed-off-by: Gustavo A. R. Silva 
>
>I think this patch is terrible.  Why on earth is Coccinelle even
>warning about this?
>
>If anything we should be converting these constructs to not use
>BUG.
>

Yeah, and under this assumption the original code is even worse.


No your patch makes it worse.  The original construct makes it
clear that the conditional is real code and not just some debugging
aid.

With your patch, real code is now inside BUG_ON.



What is the reason for BUG_ON to exist then if it makes the code worse  
than an _if_ condition followed by BUG?


Thanks
--
Gustavo A. R. Silva






Re: [PATCH] kbuild: clang: fix build failures with sparse check

2017-10-23 Thread Masahiro Yamada
2017-10-21 6:09 GMT+09:00 David Lin :
> We should avoid using the space character when passing arguments to
> clang, because static code analysis check tool such as sparse may
> misinterpret the arguments followed by spaces as build targets hence
> cause the build to fail.
>
> Signed-off-by: David Lin 
> ---
>  Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 05f95df0a247..c8819d0de907 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -685,11 +685,11 @@ KBUILD_CFLAGS += $(stackp-flag)
>
>  ifeq ($(cc-name),clang)
>  ifneq ($(CROSS_COMPILE),)
> -CLANG_TARGET   := -target $(notdir $(CROSS_COMPILE:%-=%))
> +CLANG_TARGET   := --target=$(notdir $(CROSS_COMPILE:%-=%))
>  GCC_TOOLCHAIN  := $(realpath $(dir $(shell which $(LD)))/..)
>  endif
>  ifneq ($(GCC_TOOLCHAIN),)
> -CLANG_GCC_TC   := -gcc-toolchain $(GCC_TOOLCHAIN)
> +CLANG_GCC_TC   := --gcc-toolchain=$(GCC_TOOLCHAIN)
>  endif
>  KBUILD_CFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC)
>  KBUILD_AFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC)
> --
> 2.15.0.rc0.271.g36b669edcc-goog
>

Applied to linux-kbuild/fixes.  Thanks!


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] kbuild: clang: fix build failures with sparse check

2017-10-23 Thread Masahiro Yamada
2017-10-21 6:09 GMT+09:00 David Lin :
> We should avoid using the space character when passing arguments to
> clang, because static code analysis check tool such as sparse may
> misinterpret the arguments followed by spaces as build targets hence
> cause the build to fail.
>
> Signed-off-by: David Lin 
> ---
>  Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 05f95df0a247..c8819d0de907 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -685,11 +685,11 @@ KBUILD_CFLAGS += $(stackp-flag)
>
>  ifeq ($(cc-name),clang)
>  ifneq ($(CROSS_COMPILE),)
> -CLANG_TARGET   := -target $(notdir $(CROSS_COMPILE:%-=%))
> +CLANG_TARGET   := --target=$(notdir $(CROSS_COMPILE:%-=%))
>  GCC_TOOLCHAIN  := $(realpath $(dir $(shell which $(LD)))/..)
>  endif
>  ifneq ($(GCC_TOOLCHAIN),)
> -CLANG_GCC_TC   := -gcc-toolchain $(GCC_TOOLCHAIN)
> +CLANG_GCC_TC   := --gcc-toolchain=$(GCC_TOOLCHAIN)
>  endif
>  KBUILD_CFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC)
>  KBUILD_AFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC)
> --
> 2.15.0.rc0.271.g36b669edcc-goog
>

Applied to linux-kbuild/fixes.  Thanks!


-- 
Best Regards
Masahiro Yamada


Re: [RFC PATCH] fs: fsnotify: account fsnotify metadata to kmemcg

2017-10-23 Thread Yang Shi



On 10/22/17 1:24 AM, Amir Goldstein wrote:

On Sat, Oct 21, 2017 at 12:07 AM, Yang Shi  wrote:



On 10/19/17 8:14 PM, Amir Goldstein wrote:


On Fri, Oct 20, 2017 at 12:20 AM, Yang Shi  wrote:


We observed some misbehaved user applications might consume significant
amount of fsnotify slabs silently. It'd better to account those slabs in
kmemcg so that we can get heads up before misbehaved applications use too
much memory silently.



In what way do they misbehave? create a lot of marks? create a lot of
events?
Not reading events in their queue?



It looks both a lot marks and events. I'm not sure if it is the latter case.
If I knew more about the details of the behavior, I would elaborated more in
the commit log.


If you are not sure, do not refer to user application as "misbehaved".
Is updatedb(8) a misbehaved application because it produces a lot of access
events?


Should be not. It sounds like our in-house applications. But, it is a 
sort of blackbox to me.



It would be better if you provide the dry facts of your setup and slab counters
and say that you are missing information to analyse the distribution of slab
usage because of missing kmemcg accounting.


Yes, sure. Will add such information in the commit log for the new version.







The latter case is more interesting:

Process A is the one that asked to get the events.
Process B is the one that is generating the events and queuing them on
the queue that is owned by process A, who is also to blame if the queue
is not being read.



I agree it is not fair to account the memory to the generator. But, afaik,
accounting non-current memcg is not how memcg is designed and works. Please
see the below for some details.



So why should process B be held accountable for memory pressure
caused by, say, an FAN_UNLIMITED_QUEUE that process A created and
doesn't read from.

Is it possible to get an explicit reference to the memcg's  events cache
at fsnotify_group creation time, store it in the group struct and then
allocate
events from the event cache associated with the group (the listener)
rather
than the cache associated with the task generating the event?



I don't think current memcg design can do this. Because kmem accounting
happens at allocation (when calling kmem_cache_alloc) stage, and get the
associated memcg from current task, so basically who does the allocation who
get it accounted. If the producer is in the different memcg of consumer, it
should be just accounted to the producer memcg, although the problem might
be caused by the producer.

However, afaik, both producer and consumer are typically in the same memcg.
So, this might be not a big issue. But, I do admit such unfair accounting
may happen.



That is a reasonable argument, but please make a comment on that fact in
commit message and above creation of events cache, so that it is clear that
event slab accounting is mostly heuristic.


Yes, will add such information in the new version.



But I think there is another problem, not introduced by your change, but could
be amplified because of it - when a non-permission event allocation fails, the
event is silently dropped, AFAICT, with no indication to listener.
That seems like a bug to me, because there is a perfectly safe way to deal with
event allocation failure - queue the overflow event.


I'm not sure if such issue could be amplified by the accounting since 
once the usage exceeds the limit any following kmem allocation would 
fail. So, it might fail at fsnotify event allocation, or other places, 
i.e. fork, open syscall, etc. So, in most cases the generator even can't 
generate new event any more.


The typical output from my LTP test is filesystem dcache allocation 
error or fork error due to kmem limit is reached.



I am not going to be the one to determine if fixing this alleged bug is a
prerequisite for merging your patch, but I think enforcing memory limits on
event allocation could amplify that bug, so it should be fixed.

The upside is that with both your accounting fix and ENOMEM = overlflow
fix, it going to be easy to write a test that verifies both of them:
- Run a listener in memcg with limited kmem and unlimited (or very
large) event queue
- Produce events inside memcg without listener reading them
- Read event and expect an OVERFLOW even

This is a simple variant of LTP tests inotify05 and fanotify05.


I tried to test your patch with LTP, but it sounds not that easy to 
setup a scenario to make fsnotify event allocation just hit the kmem 
limit, since the limit may be hit before a new event is allocated, for 
example allocating dentry cache in open syscall may hit the limit.


So, it sounds the overflow event might be not generated by the producer 
in most cases.


Thanks,
Yang



I realize that is user application behavior change and that documentation
implies that an OVERFLOW event is not expected when using
FAN_UNLIMITED_QUEUE, but IMO no one will come 

Re: [RFC PATCH] fs: fsnotify: account fsnotify metadata to kmemcg

2017-10-23 Thread Yang Shi



On 10/22/17 1:24 AM, Amir Goldstein wrote:

On Sat, Oct 21, 2017 at 12:07 AM, Yang Shi  wrote:



On 10/19/17 8:14 PM, Amir Goldstein wrote:


On Fri, Oct 20, 2017 at 12:20 AM, Yang Shi  wrote:


We observed some misbehaved user applications might consume significant
amount of fsnotify slabs silently. It'd better to account those slabs in
kmemcg so that we can get heads up before misbehaved applications use too
much memory silently.



In what way do they misbehave? create a lot of marks? create a lot of
events?
Not reading events in their queue?



It looks both a lot marks and events. I'm not sure if it is the latter case.
If I knew more about the details of the behavior, I would elaborated more in
the commit log.


If you are not sure, do not refer to user application as "misbehaved".
Is updatedb(8) a misbehaved application because it produces a lot of access
events?


Should be not. It sounds like our in-house applications. But, it is a 
sort of blackbox to me.



It would be better if you provide the dry facts of your setup and slab counters
and say that you are missing information to analyse the distribution of slab
usage because of missing kmemcg accounting.


Yes, sure. Will add such information in the commit log for the new version.







The latter case is more interesting:

Process A is the one that asked to get the events.
Process B is the one that is generating the events and queuing them on
the queue that is owned by process A, who is also to blame if the queue
is not being read.



I agree it is not fair to account the memory to the generator. But, afaik,
accounting non-current memcg is not how memcg is designed and works. Please
see the below for some details.



So why should process B be held accountable for memory pressure
caused by, say, an FAN_UNLIMITED_QUEUE that process A created and
doesn't read from.

Is it possible to get an explicit reference to the memcg's  events cache
at fsnotify_group creation time, store it in the group struct and then
allocate
events from the event cache associated with the group (the listener)
rather
than the cache associated with the task generating the event?



I don't think current memcg design can do this. Because kmem accounting
happens at allocation (when calling kmem_cache_alloc) stage, and get the
associated memcg from current task, so basically who does the allocation who
get it accounted. If the producer is in the different memcg of consumer, it
should be just accounted to the producer memcg, although the problem might
be caused by the producer.

However, afaik, both producer and consumer are typically in the same memcg.
So, this might be not a big issue. But, I do admit such unfair accounting
may happen.



That is a reasonable argument, but please make a comment on that fact in
commit message and above creation of events cache, so that it is clear that
event slab accounting is mostly heuristic.


Yes, will add such information in the new version.



But I think there is another problem, not introduced by your change, but could
be amplified because of it - when a non-permission event allocation fails, the
event is silently dropped, AFAICT, with no indication to listener.
That seems like a bug to me, because there is a perfectly safe way to deal with
event allocation failure - queue the overflow event.


I'm not sure if such issue could be amplified by the accounting since 
once the usage exceeds the limit any following kmem allocation would 
fail. So, it might fail at fsnotify event allocation, or other places, 
i.e. fork, open syscall, etc. So, in most cases the generator even can't 
generate new event any more.


The typical output from my LTP test is filesystem dcache allocation 
error or fork error due to kmem limit is reached.



I am not going to be the one to determine if fixing this alleged bug is a
prerequisite for merging your patch, but I think enforcing memory limits on
event allocation could amplify that bug, so it should be fixed.

The upside is that with both your accounting fix and ENOMEM = overlflow
fix, it going to be easy to write a test that verifies both of them:
- Run a listener in memcg with limited kmem and unlimited (or very
large) event queue
- Produce events inside memcg without listener reading them
- Read event and expect an OVERFLOW even

This is a simple variant of LTP tests inotify05 and fanotify05.


I tried to test your patch with LTP, but it sounds not that easy to 
setup a scenario to make fsnotify event allocation just hit the kmem 
limit, since the limit may be hit before a new event is allocated, for 
example allocating dentry cache in open syscall may hit the limit.


So, it sounds the overflow event might be not generated by the producer 
in most cases.


Thanks,
Yang



I realize that is user application behavior change and that documentation
implies that an OVERFLOW event is not expected when using
FAN_UNLIMITED_QUEUE, but IMO no one will come shouting
if we stop silently dropping events, so it is 

[PATCH] pinctrl: uniphier: remove eMMC hardware reset pin-mux

2017-10-23 Thread Masahiro Yamada
This is handled by the mmc-pwrseq-emmc driver, which controls
an eMMC hardware reset via a GPIO line.

Remove it from the function pin-mux settings.

Signed-off-by: Masahiro Yamada 
---

 drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c | 4 ++--
 drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c | 4 ++--
 drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c 
b/drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c
index 9c5e359..8a5ecd6 100644
--- a/drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c
+++ b/drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c
@@ -472,8 +472,8 @@ static const struct pinctrl_pin_desc uniphier_ld11_pins[] = 
{
 
 static const unsigned aout_pins[] = {135, 136, 137, 138, 139, 140, 141, 142};
 static const int aout_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
-static const unsigned emmc_pins[] = {18, 19, 20, 21, 22, 23, 24, 25};
-static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
+static const unsigned int emmc_pins[] = {19, 20, 21, 22, 23, 24, 25};
+static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0};
 static const unsigned emmc_dat8_pins[] = {26, 27, 28, 29};
 static const int emmc_dat8_muxvals[] = {0, 0, 0, 0};
 static const unsigned ether_rmii_pins[] = {6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
diff --git a/drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c 
b/drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c
index 8334128..3be7967 100644
--- a/drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c
+++ b/drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c
@@ -553,8 +553,8 @@ static const struct pinctrl_pin_desc uniphier_ld20_pins[] = 
{
 
 static const unsigned aout_pins[] = {135, 136, 137, 138, 139, 140, 141, 142};
 static const int aout_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
-static const unsigned emmc_pins[] = {18, 19, 20, 21, 22, 23, 24, 25};
-static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
+static const unsigned int emmc_pins[] = {19, 20, 21, 22, 23, 24, 25};
+static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0};
 static const unsigned emmc_dat8_pins[] = {26, 27, 28, 29};
 static const int emmc_dat8_muxvals[] = {0, 0, 0, 0};
 static const unsigned ether_rgmii_pins[] = {30, 31, 32, 33, 34, 35, 36, 37, 38,
diff --git a/drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c 
b/drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c
index d9f166f..dbe94a9 100644
--- a/drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c
+++ b/drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c
@@ -776,8 +776,8 @@ static const struct pinctrl_pin_desc uniphier_pxs3_pins[] = 
{
 250, UNIPHIER_PIN_PULL_DOWN),
 };
 
-static const unsigned int emmc_pins[] = {31, 32, 33, 34, 35, 36, 37, 38};
-static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
+static const unsigned int emmc_pins[] = {32, 33, 34, 35, 36, 37, 38};
+static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0};
 static const unsigned int emmc_dat8_pins[] = {39, 40, 41, 42};
 static const int emmc_dat8_muxvals[] = {0, 0, 0, 0};
 static const unsigned int ether_rgmii_pins[] = {52, 53, 54, 55, 56, 57, 58, 59,
-- 
2.7.4



[PATCH] pinctrl: uniphier: remove eMMC hardware reset pin-mux

2017-10-23 Thread Masahiro Yamada
This is handled by the mmc-pwrseq-emmc driver, which controls
an eMMC hardware reset via a GPIO line.

Remove it from the function pin-mux settings.

Signed-off-by: Masahiro Yamada 
---

 drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c | 4 ++--
 drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c | 4 ++--
 drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c 
b/drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c
index 9c5e359..8a5ecd6 100644
--- a/drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c
+++ b/drivers/pinctrl/uniphier/pinctrl-uniphier-ld11.c
@@ -472,8 +472,8 @@ static const struct pinctrl_pin_desc uniphier_ld11_pins[] = 
{
 
 static const unsigned aout_pins[] = {135, 136, 137, 138, 139, 140, 141, 142};
 static const int aout_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
-static const unsigned emmc_pins[] = {18, 19, 20, 21, 22, 23, 24, 25};
-static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
+static const unsigned int emmc_pins[] = {19, 20, 21, 22, 23, 24, 25};
+static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0};
 static const unsigned emmc_dat8_pins[] = {26, 27, 28, 29};
 static const int emmc_dat8_muxvals[] = {0, 0, 0, 0};
 static const unsigned ether_rmii_pins[] = {6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
diff --git a/drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c 
b/drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c
index 8334128..3be7967 100644
--- a/drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c
+++ b/drivers/pinctrl/uniphier/pinctrl-uniphier-ld20.c
@@ -553,8 +553,8 @@ static const struct pinctrl_pin_desc uniphier_ld20_pins[] = 
{
 
 static const unsigned aout_pins[] = {135, 136, 137, 138, 139, 140, 141, 142};
 static const int aout_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
-static const unsigned emmc_pins[] = {18, 19, 20, 21, 22, 23, 24, 25};
-static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
+static const unsigned int emmc_pins[] = {19, 20, 21, 22, 23, 24, 25};
+static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0};
 static const unsigned emmc_dat8_pins[] = {26, 27, 28, 29};
 static const int emmc_dat8_muxvals[] = {0, 0, 0, 0};
 static const unsigned ether_rgmii_pins[] = {30, 31, 32, 33, 34, 35, 36, 37, 38,
diff --git a/drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c 
b/drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c
index d9f166f..dbe94a9 100644
--- a/drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c
+++ b/drivers/pinctrl/uniphier/pinctrl-uniphier-pxs3.c
@@ -776,8 +776,8 @@ static const struct pinctrl_pin_desc uniphier_pxs3_pins[] = 
{
 250, UNIPHIER_PIN_PULL_DOWN),
 };
 
-static const unsigned int emmc_pins[] = {31, 32, 33, 34, 35, 36, 37, 38};
-static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0, 0};
+static const unsigned int emmc_pins[] = {32, 33, 34, 35, 36, 37, 38};
+static const int emmc_muxvals[] = {0, 0, 0, 0, 0, 0, 0};
 static const unsigned int emmc_dat8_pins[] = {39, 40, 41, 42};
 static const int emmc_dat8_muxvals[] = {0, 0, 0, 0};
 static const unsigned int ether_rgmii_pins[] = {52, 53, 54, 55, 56, 57, 58, 59,
-- 
2.7.4



Re: [PATCH v7 1/3] PCI: Add support for wake irq

2017-10-23 Thread jeffy

Hi Brian,

On 10/24/2017 07:02 AM, Brian Norris wrote:

+ PM folks

Hi Jeffy,

It's probably good if you send the whole thing to linux-pm@ in the
future, if you're really trying to implement generic PCI/PM for device
tree systems.

ok


On Thu, Oct 19, 2017 at 07:10:05PM +0800, Jeffy Chen wrote:

Add support for PCIE_WAKE pin.


This is kind of an important change, so it feels like you should
document it a little more thoroughly than this. Particularly, I have a
few questions below, and it seems like some of these questions should be
acknowledged up front. e.g., why does this look so different than the
ACPI hooks?

sure, will do in next version.




Signed-off-by: Jeffy Chen 
---

Changes in v7:
Move PCIE_WAKE handling into pci core.

Changes in v6:
Fix device_init_wake error handling, and add some comments.

Changes in v5:
Rebase

Changes in v3:
Fix error handling

Changes in v2:
Use dev_pm_set_dedicated_wake_irq
 -- Suggested by Brian Norris 

  drivers/pci/pci.c| 34 --
  drivers/pci/probe.c  | 32 
  drivers/pci/remove.c |  9 +
  3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f0d68066c726..49080a10bdf0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -603,10 +603,40 @@ static inline pci_power_t 
platform_pci_choose_state(struct pci_dev *dev)
pci_platform_pm->choose_state(dev) : PCI_POWER_ERROR;
  }

+static int pci_dev_check_wakeup(struct pci_dev *dev, void *data)
+{
+   bool *wakeup = data;
+
+   if (device_may_wakeup(>dev))
+   *wakeup = true;
+
+   return *wakeup;
+}
+
  static inline int platform_pci_set_wakeup(struct pci_dev *dev, bool enable)
  {
-   return pci_platform_pm ?
-   pci_platform_pm->set_wakeup(dev, enable) : -ENODEV;
+   struct pci_dev *parent = dev;
+   struct pci_bus *bus;
+   bool wakeup = false;


It feels like you're implementing a set of pci_platform_pm_ops, except
you're not actually implementing them. It almost seems like we should
have a drivers/pci/pci-of.c to do this. But that brings up a few
questions
i saw the drivers might call set_wakeup() in suspend/resume/shutdown to 
configure the wakeup ability, maybe we can call 
device_set_wakeup_enable() here as a common part of set_wakeup()?


static inline int platform_pci_set_wakeup(struct pci_dev *dev, bool 
enable) {

device_set_wakeup_enable()
...
return pci_platform_pm ? pci_platform_pm->set_wakeup(dev, enable) : 
-ENODEV;





+
+   if (pci_platform_pm)


So, if somebody already registered ops, then you won't follow the "OF"
route? That means this all breaks as soon as a kernel has both
CONFIG_ACPI and CONFIG_OF enabled. This is possible on at least ARM64,
which 'select's OF and may also be built/run with CONFIG_ACPI.

And that conflict is the same if we try to register pci_platform_pm_ops
for OF systems -- it'll be a race over who sets them up first (or
rather, last).

Also, what happens on !ACPI && !OF? Or if the device tree did not
contain a "wakeup" definition? You're now implementing a default path
that doesn't make much sense IMO; you may claim wakeup capability
without actually having set it up somewhere.
maybe we can use device_set_wakeup_enable(), which will check the setup 
before setting?


I think you could use some more comments, and (again) a real commit
message.

ok, will do.



+   return pci_platform_pm->set_wakeup(dev, enable);
+
+   device_set_wakeup_capable(>dev, enable);


Why are you setting that here? This function should just be telling the
lower layers to enable the physical WAKE# ability. In our case, it just
means configuring the WAKE# interrupt for wakeup -- or, since you've
used dev_pm_set_dedicated_wake_irq() which handles most of this
automatically...do you need this at all? It seems like you should
*either* implement these callbacks to manually manage the wakeup IRQ or
else use the dedicated wakeirq infrastructure -- not both.

And even if you need this, I don't think you need to do this many times;
you should only need to set up the capabilities once, when you first set
up the device.

And BTW, the description for the set_wakeup() callback says:

  * @set_wakeup: enables/disables wakeup capability for the device

I *don't* think that means "capability" as in the device framework's
view of "wakeup capable"; I think it means capability as in the physical
ability (a la, enable_irq_wake() or similar).
i was thinking maybe we should disable the wakeup if all children 
request set_wakeup(false)?


and it seems like the dedicated wakeirq can be disabled by:
1/ dev_pm_clear_wake_irq(), then we may need to store the irq somewhere 
to set it up again in the future?


2/ let device_may_wakeup return false:
void dev_pm_arm_wake_irq(struct wake_irq *wirq)
{
if (!wirq)
 

Re: [PATCH v7 1/3] PCI: Add support for wake irq

2017-10-23 Thread jeffy

Hi Brian,

On 10/24/2017 07:02 AM, Brian Norris wrote:

+ PM folks

Hi Jeffy,

It's probably good if you send the whole thing to linux-pm@ in the
future, if you're really trying to implement generic PCI/PM for device
tree systems.

ok


On Thu, Oct 19, 2017 at 07:10:05PM +0800, Jeffy Chen wrote:

Add support for PCIE_WAKE pin.


This is kind of an important change, so it feels like you should
document it a little more thoroughly than this. Particularly, I have a
few questions below, and it seems like some of these questions should be
acknowledged up front. e.g., why does this look so different than the
ACPI hooks?

sure, will do in next version.




Signed-off-by: Jeffy Chen 
---

Changes in v7:
Move PCIE_WAKE handling into pci core.

Changes in v6:
Fix device_init_wake error handling, and add some comments.

Changes in v5:
Rebase

Changes in v3:
Fix error handling

Changes in v2:
Use dev_pm_set_dedicated_wake_irq
 -- Suggested by Brian Norris 

  drivers/pci/pci.c| 34 --
  drivers/pci/probe.c  | 32 
  drivers/pci/remove.c |  9 +
  3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f0d68066c726..49080a10bdf0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -603,10 +603,40 @@ static inline pci_power_t 
platform_pci_choose_state(struct pci_dev *dev)
pci_platform_pm->choose_state(dev) : PCI_POWER_ERROR;
  }

+static int pci_dev_check_wakeup(struct pci_dev *dev, void *data)
+{
+   bool *wakeup = data;
+
+   if (device_may_wakeup(>dev))
+   *wakeup = true;
+
+   return *wakeup;
+}
+
  static inline int platform_pci_set_wakeup(struct pci_dev *dev, bool enable)
  {
-   return pci_platform_pm ?
-   pci_platform_pm->set_wakeup(dev, enable) : -ENODEV;
+   struct pci_dev *parent = dev;
+   struct pci_bus *bus;
+   bool wakeup = false;


It feels like you're implementing a set of pci_platform_pm_ops, except
you're not actually implementing them. It almost seems like we should
have a drivers/pci/pci-of.c to do this. But that brings up a few
questions
i saw the drivers might call set_wakeup() in suspend/resume/shutdown to 
configure the wakeup ability, maybe we can call 
device_set_wakeup_enable() here as a common part of set_wakeup()?


static inline int platform_pci_set_wakeup(struct pci_dev *dev, bool 
enable) {

device_set_wakeup_enable()
...
return pci_platform_pm ? pci_platform_pm->set_wakeup(dev, enable) : 
-ENODEV;





+
+   if (pci_platform_pm)


So, if somebody already registered ops, then you won't follow the "OF"
route? That means this all breaks as soon as a kernel has both
CONFIG_ACPI and CONFIG_OF enabled. This is possible on at least ARM64,
which 'select's OF and may also be built/run with CONFIG_ACPI.

And that conflict is the same if we try to register pci_platform_pm_ops
for OF systems -- it'll be a race over who sets them up first (or
rather, last).

Also, what happens on !ACPI && !OF? Or if the device tree did not
contain a "wakeup" definition? You're now implementing a default path
that doesn't make much sense IMO; you may claim wakeup capability
without actually having set it up somewhere.
maybe we can use device_set_wakeup_enable(), which will check the setup 
before setting?


I think you could use some more comments, and (again) a real commit
message.

ok, will do.



+   return pci_platform_pm->set_wakeup(dev, enable);
+
+   device_set_wakeup_capable(>dev, enable);


Why are you setting that here? This function should just be telling the
lower layers to enable the physical WAKE# ability. In our case, it just
means configuring the WAKE# interrupt for wakeup -- or, since you've
used dev_pm_set_dedicated_wake_irq() which handles most of this
automatically...do you need this at all? It seems like you should
*either* implement these callbacks to manually manage the wakeup IRQ or
else use the dedicated wakeirq infrastructure -- not both.

And even if you need this, I don't think you need to do this many times;
you should only need to set up the capabilities once, when you first set
up the device.

And BTW, the description for the set_wakeup() callback says:

  * @set_wakeup: enables/disables wakeup capability for the device

I *don't* think that means "capability" as in the device framework's
view of "wakeup capable"; I think it means capability as in the physical
ability (a la, enable_irq_wake() or similar).
i was thinking maybe we should disable the wakeup if all children 
request set_wakeup(false)?


and it seems like the dedicated wakeirq can be disabled by:
1/ dev_pm_clear_wake_irq(), then we may need to store the irq somewhere 
to set it up again in the future?


2/ let device_may_wakeup return false:
void dev_pm_arm_wake_irq(struct wake_irq *wirq)
{
if (!wirq)
return;

if 

Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Alexei Starovoitov
On Tue, Oct 24, 2017 at 11:53:20AM +0800, Herbert Xu wrote:
> On Mon, Oct 23, 2017 at 10:50:43PM -0500, Gustavo A. R. Silva wrote:
> > 
> > Quoting Herbert Xu :
> > 
> > >On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:
> > >>Use BUG_ON instead of if condition followed by BUG.
> > >>
> > >>This issue was detected with the help of Coccinelle.
> > >>
> > >>Signed-off-by: Gustavo A. R. Silva 
> > >
> > >I think this patch is terrible.  Why on earth is Coccinelle even
> > >warning about this?
> > >
> > >If anything we should be converting these constructs to not use
> > >BUG.
> > >
> > 
> > Yeah, and under this assumption the original code is even worse.
> 
> No your patch makes it worse.  The original construct makes it
> clear that the conditional is real code and not just some debugging
> aid.
> 
> With your patch, real code is now inside BUG_ON.

fwiw I had the same argument earlier:
https://lkml.org/lkml/2017/10/9/1139



Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Alexei Starovoitov
On Tue, Oct 24, 2017 at 11:53:20AM +0800, Herbert Xu wrote:
> On Mon, Oct 23, 2017 at 10:50:43PM -0500, Gustavo A. R. Silva wrote:
> > 
> > Quoting Herbert Xu :
> > 
> > >On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:
> > >>Use BUG_ON instead of if condition followed by BUG.
> > >>
> > >>This issue was detected with the help of Coccinelle.
> > >>
> > >>Signed-off-by: Gustavo A. R. Silva 
> > >
> > >I think this patch is terrible.  Why on earth is Coccinelle even
> > >warning about this?
> > >
> > >If anything we should be converting these constructs to not use
> > >BUG.
> > >
> > 
> > Yeah, and under this assumption the original code is even worse.
> 
> No your patch makes it worse.  The original construct makes it
> clear that the conditional is real code and not just some debugging
> aid.
> 
> With your patch, real code is now inside BUG_ON.

fwiw I had the same argument earlier:
https://lkml.org/lkml/2017/10/9/1139



Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Herbert Xu
On Mon, Oct 23, 2017 at 10:50:43PM -0500, Gustavo A. R. Silva wrote:
> 
> Quoting Herbert Xu :
> 
> >On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:
> >>Use BUG_ON instead of if condition followed by BUG.
> >>
> >>This issue was detected with the help of Coccinelle.
> >>
> >>Signed-off-by: Gustavo A. R. Silva 
> >
> >I think this patch is terrible.  Why on earth is Coccinelle even
> >warning about this?
> >
> >If anything we should be converting these constructs to not use
> >BUG.
> >
> 
> Yeah, and under this assumption the original code is even worse.

No your patch makes it worse.  The original construct makes it
clear that the conditional is real code and not just some debugging
aid.

With your patch, real code is now inside BUG_ON.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Herbert Xu
On Mon, Oct 23, 2017 at 10:50:43PM -0500, Gustavo A. R. Silva wrote:
> 
> Quoting Herbert Xu :
> 
> >On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:
> >>Use BUG_ON instead of if condition followed by BUG.
> >>
> >>This issue was detected with the help of Coccinelle.
> >>
> >>Signed-off-by: Gustavo A. R. Silva 
> >
> >I think this patch is terrible.  Why on earth is Coccinelle even
> >warning about this?
> >
> >If anything we should be converting these constructs to not use
> >BUG.
> >
> 
> Yeah, and under this assumption the original code is even worse.

No your patch makes it worse.  The original construct makes it
clear that the conditional is real code and not just some debugging
aid.

With your patch, real code is now inside BUG_ON.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] usb: gadget: f_tcm: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 703128
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/gadget/function/f_tcm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/function/f_tcm.c 
b/drivers/usb/gadget/function/f_tcm.c
index a82e2bd..c9d741d 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1145,6 +1145,7 @@ static int usbg_submit_command(struct f_uas *fu,
default:
pr_debug_once("Unsupported prio_attr: %02x.\n",
cmd_iu->prio_attr);
+   /* fall through */
case UAS_SIMPLE_TAG:
cmd->prio_attr = TCM_SIMPLE_TAG;
break;
-- 
2.7.4



[PATCH] usb: gadget: f_tcm: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 703128
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/gadget/function/f_tcm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/function/f_tcm.c 
b/drivers/usb/gadget/function/f_tcm.c
index a82e2bd..c9d741d 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1145,6 +1145,7 @@ static int usbg_submit_command(struct f_uas *fu,
default:
pr_debug_once("Unsupported prio_attr: %02x.\n",
cmd_iu->prio_attr);
+   /* fall through */
case UAS_SIMPLE_TAG:
cmd->prio_attr = TCM_SIMPLE_TAG;
break;
-- 
2.7.4



Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Gustavo A. R. Silva


Quoting Herbert Xu :


On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:

Use BUG_ON instead of if condition followed by BUG.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 


I think this patch is terrible.  Why on earth is Coccinelle even
warning about this?

If anything we should be converting these constructs to not use
BUG.



Yeah, and under this assumption the original code is even worse.

Thanks
--
Gustavo A. R. Silva






Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Gustavo A. R. Silva


Quoting Herbert Xu :


On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:

Use BUG_ON instead of if condition followed by BUG.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 


I think this patch is terrible.  Why on earth is Coccinelle even
warning about this?

If anything we should be converting these constructs to not use
BUG.



Yeah, and under this assumption the original code is even worse.

Thanks
--
Gustavo A. R. Silva






[PATCH] usb: gadget: goku_udc: mark expected switch fall-throughs

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 145713
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/gadget/udc/goku_udc.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/udc/goku_udc.c 
b/drivers/usb/gadget/udc/goku_udc.c
index 8433c22..a85407e 100644
--- a/drivers/usb/gadget/udc/goku_udc.c
+++ b/drivers/usb/gadget/udc/goku_udc.c
@@ -127,11 +127,15 @@ goku_ep_enable(struct usb_ep *_ep, const struct 
usb_endpoint_descriptor *desc)
mode = 0;
max = get_unaligned_le16(>wMaxPacketSize);
switch (max) {
-   case 64:mode++;
-   case 32:mode++;
-   case 16:mode++;
-   case 8: mode <<= 3;
-   break;
+   case 64:
+   mode++; /* fall through */
+   case 32:
+   mode++; /* fall through */
+   case 16:
+   mode++; /* fall through */
+   case 8:
+   mode <<= 3;
+   break;
default:
return -EINVAL;
}
-- 
2.7.4



[PATCH] usb: gadget: goku_udc: mark expected switch fall-throughs

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 145713
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/gadget/udc/goku_udc.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/udc/goku_udc.c 
b/drivers/usb/gadget/udc/goku_udc.c
index 8433c22..a85407e 100644
--- a/drivers/usb/gadget/udc/goku_udc.c
+++ b/drivers/usb/gadget/udc/goku_udc.c
@@ -127,11 +127,15 @@ goku_ep_enable(struct usb_ep *_ep, const struct 
usb_endpoint_descriptor *desc)
mode = 0;
max = get_unaligned_le16(>wMaxPacketSize);
switch (max) {
-   case 64:mode++;
-   case 32:mode++;
-   case 16:mode++;
-   case 8: mode <<= 3;
-   break;
+   case 64:
+   mode++; /* fall through */
+   case 32:
+   mode++; /* fall through */
+   case 16:
+   mode++; /* fall through */
+   case 8:
+   mode <<= 3;
+   break;
default:
return -EINVAL;
}
-- 
2.7.4



[PATCH] usb: core: urb: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1162594
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/core/urb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 8b800e3..06e0151 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -514,6 +514,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags)
if ((urb->interval < 6)
&& (xfertype == USB_ENDPOINT_XFER_INT))
return -EINVAL;
+   /* fall through */
default:
if (urb->interval <= 0)
return -EINVAL;
-- 
2.7.4



[PATCH] usb: core: urb: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1162594
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/core/urb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 8b800e3..06e0151 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -514,6 +514,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags)
if ((urb->interval < 6)
&& (xfertype == USB_ENDPOINT_XFER_INT))
return -EINVAL;
+   /* fall through */
default:
if (urb->interval <= 0)
return -EINVAL;
-- 
2.7.4



Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Herbert Xu
On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:
> Use BUG_ON instead of if condition followed by BUG.
> 
> This issue was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

I think this patch is terrible.  Why on earth is Coccinelle even
warning about this?

If anything we should be converting these constructs to not use
BUG.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] net: xfrm_user: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread Herbert Xu
On Mon, Oct 23, 2017 at 01:18:57PM -0500, Gustavo A. R. Silva wrote:
> Use BUG_ON instead of if condition followed by BUG.
> 
> This issue was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

I think this patch is terrible.  Why on earth is Coccinelle even
warning about this?

If anything we should be converting these constructs to not use
BUG.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] usb: phy: phy-msm-usb: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1222118
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/phy/phy-msm-usb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/phy/phy-msm-usb.c b/drivers/usb/phy/phy-msm-usb.c
index 3d0dd2f..8bc3403 100644
--- a/drivers/usb/phy/phy-msm-usb.c
+++ b/drivers/usb/phy/phy-msm-usb.c
@@ -1261,6 +1261,7 @@ static void msm_chg_detect_work(struct work_struct *w)
/* fall through */
case USB_CHG_STATE_SECONDARY_DONE:
motg->chg_state = USB_CHG_STATE_DETECTED;
+   /* fall through */
case USB_CHG_STATE_DETECTED:
msm_chg_block_off(motg);
dev_dbg(phy->dev, "charger = %d\n", motg->chg_type);
-- 
2.7.4



[PATCH] usb: phy: phy-msm-usb: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1222118
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/phy/phy-msm-usb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/phy/phy-msm-usb.c b/drivers/usb/phy/phy-msm-usb.c
index 3d0dd2f..8bc3403 100644
--- a/drivers/usb/phy/phy-msm-usb.c
+++ b/drivers/usb/phy/phy-msm-usb.c
@@ -1261,6 +1261,7 @@ static void msm_chg_detect_work(struct work_struct *w)
/* fall through */
case USB_CHG_STATE_SECONDARY_DONE:
motg->chg_state = USB_CHG_STATE_DETECTED;
+   /* fall through */
case USB_CHG_STATE_DETECTED:
msm_chg_block_off(motg);
dev_dbg(phy->dev, "charger = %d\n", motg->chg_type);
-- 
2.7.4



Re: [PATCH v2 3/3] watchdog: hpwdt: add ioctl WDIOC_GETPRETIMEOUT

2017-10-23 Thread Guenter Roeck

On 10/23/2017 03:46 PM, Jerry Hoemann wrote:

Add support for WDIOC_GETPRETIMEOUT ioctl so that user applications
can determine when the NMI should arrive.

Signed-off-by: Jerry Hoemann 


I'll leave this for Wim to decide. My take is that we should not add 
functionality
to old-style drivers and convert those drivers to use the watchdog core instead.

Thanks,
Guenter


---
  drivers/watchdog/hpwdt.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index e616583..b64ce43 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -50,6 +50,7 @@
  static bool nowayout = WATCHDOG_NOWAYOUT;
  static char expect_release;
  static unsigned long hpwdt_is_open;
+static const int pretimeout = 9;
  
  static void __iomem *pci_mem_addr;		/* the PCI-memory address */

  static unsigned long __iomem *hpwdt_nmistat;
@@ -631,6 +632,12 @@ static long hpwdt_ioctl(struct file *file, unsigned int 
cmd,
}
break;
  
+	case WDIOC_GETPRETIMEOUT:

+   ret = copy_to_user(argp, , sizeof(pretimeout));
+   if (ret)
+   ret = -EFAULT;
+   break;
+
case WDIOC_SETTIMEOUT:
ret = get_user(new_margin, p);
if (ret)





Re: [PATCH v2 3/3] watchdog: hpwdt: add ioctl WDIOC_GETPRETIMEOUT

2017-10-23 Thread Guenter Roeck

On 10/23/2017 03:46 PM, Jerry Hoemann wrote:

Add support for WDIOC_GETPRETIMEOUT ioctl so that user applications
can determine when the NMI should arrive.

Signed-off-by: Jerry Hoemann 


I'll leave this for Wim to decide. My take is that we should not add 
functionality
to old-style drivers and convert those drivers to use the watchdog core instead.

Thanks,
Guenter


---
  drivers/watchdog/hpwdt.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index e616583..b64ce43 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -50,6 +50,7 @@
  static bool nowayout = WATCHDOG_NOWAYOUT;
  static char expect_release;
  static unsigned long hpwdt_is_open;
+static const int pretimeout = 9;
  
  static void __iomem *pci_mem_addr;		/* the PCI-memory address */

  static unsigned long __iomem *hpwdt_nmistat;
@@ -631,6 +632,12 @@ static long hpwdt_ioctl(struct file *file, unsigned int 
cmd,
}
break;
  
+	case WDIOC_GETPRETIMEOUT:

+   ret = copy_to_user(argp, , sizeof(pretimeout));
+   if (ret)
+   ret = -EFAULT;
+   break;
+
case WDIOC_SETTIMEOUT:
ret = get_user(new_margin, p);
if (ret)





Re: [PATCH v2 2/3] watchdog: hpwdt: Check source of NMI

2017-10-23 Thread Guenter Roeck

On 10/23/2017 03:46 PM, Jerry Hoemann wrote:

Do not claim the NMI (i.e. return NMI_DONE) if the source of
the NMI isn't the iLO watchdog or debug.

Signed-off-by: Jerry Hoemann 


Reviewed-by: Guenter Roeck 


---
  drivers/watchdog/hpwdt.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 9fd869f..e616583 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -52,6 +52,7 @@
  static unsigned long hpwdt_is_open;
  
  static void __iomem *pci_mem_addr;		/* the PCI-memory address */

+static unsigned long __iomem *hpwdt_nmistat;
  static unsigned long __iomem *hpwdt_timer_reg;
  static unsigned long __iomem *hpwdt_timer_con;
  
@@ -474,6 +475,11 @@ static int hpwdt_time_left(void)

return TICKS_TO_SECS(ioread16(hpwdt_timer_reg));
  }
  
+static int hpwdt_my_nmi(void)

+{
+   return ioread8(hpwdt_nmistat) & 0x6;
+}
+
  #ifdef CONFIG_HPWDT_NMI_DECODING
  /*
   *NMI Handler
@@ -486,6 +492,9 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct 
pt_regs *regs)
if (!hpwdt_nmi_decoding)
return NMI_DONE;
  
+	if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())

+   return NMI_DONE;
+
spin_lock_irqsave(_lock, rom_pl);
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(_regs, cru_rom_addr);
@@ -842,6 +851,7 @@ static int hpwdt_init_one(struct pci_dev *dev,
retval = -ENOMEM;
goto error_pci_iomap;
}
+   hpwdt_nmistat   = pci_mem_addr + 0x6e;
hpwdt_timer_reg = pci_mem_addr + 0x70;
hpwdt_timer_con = pci_mem_addr + 0x72;
  





Re: [PATCH v2 1/3] watchdog: hpwdt: SMBIOS check

2017-10-23 Thread Guenter Roeck

On 10/23/2017 03:46 PM, Jerry Hoemann wrote:

This corrects:
commit cce78da76601 ("watchdog: hpwdt: Add check for UEFI bits")

The test on HPE SMBIOS extension type 219 record "Misc Features"
bits for UEFI support is incorrect.  The definition of the Misc Features
bits in the HPE SMBIOS OEM Extensions specification (and related
firmware) was changed to use a different pair of bits to
represent UEFI supported.  Howerver, a corresponding change
to Linux was missed.

Current code/platform work because the iCRU test is working.
But purpose of cce78da766 is to ensure correct functionality
on future systems where iCRU isn't supported.

Signed-off-by: Jerry Hoemann 


Reviewed-by: Guenter Roeck 


---
  drivers/watchdog/hpwdt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 67fbe35..9fd869f 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -700,7 +700,7 @@ static void dmi_find_icru(const struct dmi_header *dm, void 
*dummy)
smbios_proliant_ptr = (struct smbios_proliant_info *) dm;
if (smbios_proliant_ptr->misc_features & 0x01)
is_icru = 1;
-   if (smbios_proliant_ptr->misc_features & 0x408)
+   if (smbios_proliant_ptr->misc_features & 0x1400)
is_uefi = 1;
}
  }





Re: [PATCH v2 2/3] watchdog: hpwdt: Check source of NMI

2017-10-23 Thread Guenter Roeck

On 10/23/2017 03:46 PM, Jerry Hoemann wrote:

Do not claim the NMI (i.e. return NMI_DONE) if the source of
the NMI isn't the iLO watchdog or debug.

Signed-off-by: Jerry Hoemann 


Reviewed-by: Guenter Roeck 


---
  drivers/watchdog/hpwdt.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 9fd869f..e616583 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -52,6 +52,7 @@
  static unsigned long hpwdt_is_open;
  
  static void __iomem *pci_mem_addr;		/* the PCI-memory address */

+static unsigned long __iomem *hpwdt_nmistat;
  static unsigned long __iomem *hpwdt_timer_reg;
  static unsigned long __iomem *hpwdt_timer_con;
  
@@ -474,6 +475,11 @@ static int hpwdt_time_left(void)

return TICKS_TO_SECS(ioread16(hpwdt_timer_reg));
  }
  
+static int hpwdt_my_nmi(void)

+{
+   return ioread8(hpwdt_nmistat) & 0x6;
+}
+
  #ifdef CONFIG_HPWDT_NMI_DECODING
  /*
   *NMI Handler
@@ -486,6 +492,9 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct 
pt_regs *regs)
if (!hpwdt_nmi_decoding)
return NMI_DONE;
  
+	if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())

+   return NMI_DONE;
+
spin_lock_irqsave(_lock, rom_pl);
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(_regs, cru_rom_addr);
@@ -842,6 +851,7 @@ static int hpwdt_init_one(struct pci_dev *dev,
retval = -ENOMEM;
goto error_pci_iomap;
}
+   hpwdt_nmistat   = pci_mem_addr + 0x6e;
hpwdt_timer_reg = pci_mem_addr + 0x70;
hpwdt_timer_con = pci_mem_addr + 0x72;
  





Re: [PATCH v2 1/3] watchdog: hpwdt: SMBIOS check

2017-10-23 Thread Guenter Roeck

On 10/23/2017 03:46 PM, Jerry Hoemann wrote:

This corrects:
commit cce78da76601 ("watchdog: hpwdt: Add check for UEFI bits")

The test on HPE SMBIOS extension type 219 record "Misc Features"
bits for UEFI support is incorrect.  The definition of the Misc Features
bits in the HPE SMBIOS OEM Extensions specification (and related
firmware) was changed to use a different pair of bits to
represent UEFI supported.  Howerver, a corresponding change
to Linux was missed.

Current code/platform work because the iCRU test is working.
But purpose of cce78da766 is to ensure correct functionality
on future systems where iCRU isn't supported.

Signed-off-by: Jerry Hoemann 


Reviewed-by: Guenter Roeck 


---
  drivers/watchdog/hpwdt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 67fbe35..9fd869f 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -700,7 +700,7 @@ static void dmi_find_icru(const struct dmi_header *dm, void 
*dummy)
smbios_proliant_ptr = (struct smbios_proliant_info *) dm;
if (smbios_proliant_ptr->misc_features & 0x01)
is_icru = 1;
-   if (smbios_proliant_ptr->misc_features & 0x408)
+   if (smbios_proliant_ptr->misc_features & 0x1400)
is_uefi = 1;
}
  }





[PATCH] usb: gadget: serial: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1350962
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/gadget/function/u_serial.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/function/u_serial.c 
b/drivers/usb/gadget/function/u_serial.c
index 4176216..961457e 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -1078,6 +1078,7 @@ static void gs_complete_out(struct usb_ep *ep, struct 
usb_request *req)
default:
pr_warn("%s: unexpected %s status %d\n",
__func__, ep->name, req->status);
+   /* fall through */
case 0:
/* normal completion */
spin_lock(>con_lock);
-- 
2.7.4



[PATCH] usb: gadget: serial: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1350962
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/gadget/function/u_serial.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/function/u_serial.c 
b/drivers/usb/gadget/function/u_serial.c
index 4176216..961457e 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -1078,6 +1078,7 @@ static void gs_complete_out(struct usb_ep *ep, struct 
usb_request *req)
default:
pr_warn("%s: unexpected %s status %d\n",
__func__, ep->name, req->status);
+   /* fall through */
case 0:
/* normal completion */
spin_lock(>con_lock);
-- 
2.7.4



[PATCH] usb: musb_core: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1397608
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/musb/musb_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index ff5a1a8..889ca9b 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -767,6 +767,7 @@ static irqreturn_t musb_stage0_irq(struct musb *musb, u8 
int_usb,
case OTG_STATE_B_IDLE:
if (!musb->is_active)
break;
+   /* fall through */
case OTG_STATE_B_PERIPHERAL:
musb_g_suspend(musb);
musb->is_active = musb->g.b_hnp_enable;
-- 
2.7.4



[PATCH] usb: musb_core: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1397608
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/musb/musb_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index ff5a1a8..889ca9b 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -767,6 +767,7 @@ static irqreturn_t musb_stage0_irq(struct musb *musb, u8 
int_usb,
case OTG_STATE_B_IDLE:
if (!musb->is_active)
break;
+   /* fall through */
case OTG_STATE_B_PERIPHERAL:
musb_g_suspend(musb);
musb->is_active = musb->g.b_hnp_enable;
-- 
2.7.4



[PATCH] phy: phy-mtk-tphy: use auto instead of force to bypass utmi signals

2017-10-23 Thread Chunfeng Yun
When system is running, if usb2 phy is forced to bypass utmi signals,
all PLL will be turned off, and it can't detect device connection
anymore, so replace force mode with auto mode which can bypass utmi
signals automatically if no device attached for normal flow.
But keep the force mode to fix RX sensitivity degradation issue.

Signed-off-by: Chunfeng Yun 
---
 drivers/phy/mediatek/phy-mtk-tphy.c |   19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/phy/mediatek/phy-mtk-tphy.c 
b/drivers/phy/mediatek/phy-mtk-tphy.c
index 402385f..54cc44b 100644
--- a/drivers/phy/mediatek/phy-mtk-tphy.c
+++ b/drivers/phy/mediatek/phy-mtk-tphy.c
@@ -440,9 +440,9 @@ static void u2_phy_instance_init(struct mtk_tphy *tphy,
u32 index = instance->index;
u32 tmp;
 
-   /* switch to USB function. (system register, force ip into usb mode) */
+   /* switch to USB function, and enable usb pll */
tmp = readl(com + U3P_U2PHYDTM0);
-   tmp &= ~P2C_FORCE_UART_EN;
+   tmp &= ~(P2C_FORCE_UART_EN | P2C_FORCE_SUSPENDM);
tmp |= P2C_RG_XCVRSEL_VAL(1) | P2C_RG_DATAIN_VAL(0);
writel(tmp, com + U3P_U2PHYDTM0);
 
@@ -502,10 +502,8 @@ static void u2_phy_instance_power_on(struct mtk_tphy *tphy,
u32 index = instance->index;
u32 tmp;
 
-   /* (force_suspendm=0) (let suspendm=1, enable usb 480MHz pll) */
tmp = readl(com + U3P_U2PHYDTM0);
-   tmp &= ~(P2C_FORCE_SUSPENDM | P2C_RG_XCVRSEL);
-   tmp &= ~(P2C_RG_DATAIN | P2C_DTM0_PART_MASK);
+   tmp &= ~(P2C_RG_XCVRSEL | P2C_RG_DATAIN | P2C_DTM0_PART_MASK);
writel(tmp, com + U3P_U2PHYDTM0);
 
/* OTG Enable */
@@ -540,7 +538,6 @@ static void u2_phy_instance_power_off(struct mtk_tphy *tphy,
 
tmp = readl(com + U3P_U2PHYDTM0);
tmp &= ~(P2C_RG_XCVRSEL | P2C_RG_DATAIN);
-   tmp |= P2C_FORCE_SUSPENDM;
writel(tmp, com + U3P_U2PHYDTM0);
 
/* OTG Disable */
@@ -548,18 +545,16 @@ static void u2_phy_instance_power_off(struct mtk_tphy 
*tphy,
tmp &= ~PA6_RG_U2_OTG_VBUSCMP_EN;
writel(tmp, com + U3P_USBPHYACR6);
 
-   /* let suspendm=0, set utmi into analog power down */
-   tmp = readl(com + U3P_U2PHYDTM0);
-   tmp &= ~P2C_RG_SUSPENDM;
-   writel(tmp, com + U3P_U2PHYDTM0);
-   udelay(1);
-
tmp = readl(com + U3P_U2PHYDTM1);
tmp &= ~(P2C_RG_VBUSVALID | P2C_RG_AVALID);
tmp |= P2C_RG_SESSEND;
writel(tmp, com + U3P_U2PHYDTM1);
 
if (tphy->pdata->avoid_rx_sen_degradation && index) {
+   tmp = readl(com + U3P_U2PHYDTM0);
+   tmp &= ~(P2C_RG_SUSPENDM | P2C_FORCE_SUSPENDM);
+   writel(tmp, com + U3P_U2PHYDTM0);
+
tmp = readl(com + U3D_U2PHYDCR0);
tmp &= ~P2C_RG_SIF_U2PLL_FORCE_ON;
writel(tmp, com + U3D_U2PHYDCR0);
-- 
1.7.9.5



[PATCH] phy: phy-mtk-tphy: use auto instead of force to bypass utmi signals

2017-10-23 Thread Chunfeng Yun
When system is running, if usb2 phy is forced to bypass utmi signals,
all PLL will be turned off, and it can't detect device connection
anymore, so replace force mode with auto mode which can bypass utmi
signals automatically if no device attached for normal flow.
But keep the force mode to fix RX sensitivity degradation issue.

Signed-off-by: Chunfeng Yun 
---
 drivers/phy/mediatek/phy-mtk-tphy.c |   19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/phy/mediatek/phy-mtk-tphy.c 
b/drivers/phy/mediatek/phy-mtk-tphy.c
index 402385f..54cc44b 100644
--- a/drivers/phy/mediatek/phy-mtk-tphy.c
+++ b/drivers/phy/mediatek/phy-mtk-tphy.c
@@ -440,9 +440,9 @@ static void u2_phy_instance_init(struct mtk_tphy *tphy,
u32 index = instance->index;
u32 tmp;
 
-   /* switch to USB function. (system register, force ip into usb mode) */
+   /* switch to USB function, and enable usb pll */
tmp = readl(com + U3P_U2PHYDTM0);
-   tmp &= ~P2C_FORCE_UART_EN;
+   tmp &= ~(P2C_FORCE_UART_EN | P2C_FORCE_SUSPENDM);
tmp |= P2C_RG_XCVRSEL_VAL(1) | P2C_RG_DATAIN_VAL(0);
writel(tmp, com + U3P_U2PHYDTM0);
 
@@ -502,10 +502,8 @@ static void u2_phy_instance_power_on(struct mtk_tphy *tphy,
u32 index = instance->index;
u32 tmp;
 
-   /* (force_suspendm=0) (let suspendm=1, enable usb 480MHz pll) */
tmp = readl(com + U3P_U2PHYDTM0);
-   tmp &= ~(P2C_FORCE_SUSPENDM | P2C_RG_XCVRSEL);
-   tmp &= ~(P2C_RG_DATAIN | P2C_DTM0_PART_MASK);
+   tmp &= ~(P2C_RG_XCVRSEL | P2C_RG_DATAIN | P2C_DTM0_PART_MASK);
writel(tmp, com + U3P_U2PHYDTM0);
 
/* OTG Enable */
@@ -540,7 +538,6 @@ static void u2_phy_instance_power_off(struct mtk_tphy *tphy,
 
tmp = readl(com + U3P_U2PHYDTM0);
tmp &= ~(P2C_RG_XCVRSEL | P2C_RG_DATAIN);
-   tmp |= P2C_FORCE_SUSPENDM;
writel(tmp, com + U3P_U2PHYDTM0);
 
/* OTG Disable */
@@ -548,18 +545,16 @@ static void u2_phy_instance_power_off(struct mtk_tphy 
*tphy,
tmp &= ~PA6_RG_U2_OTG_VBUSCMP_EN;
writel(tmp, com + U3P_USBPHYACR6);
 
-   /* let suspendm=0, set utmi into analog power down */
-   tmp = readl(com + U3P_U2PHYDTM0);
-   tmp &= ~P2C_RG_SUSPENDM;
-   writel(tmp, com + U3P_U2PHYDTM0);
-   udelay(1);
-
tmp = readl(com + U3P_U2PHYDTM1);
tmp &= ~(P2C_RG_VBUSVALID | P2C_RG_AVALID);
tmp |= P2C_RG_SESSEND;
writel(tmp, com + U3P_U2PHYDTM1);
 
if (tphy->pdata->avoid_rx_sen_degradation && index) {
+   tmp = readl(com + U3P_U2PHYDTM0);
+   tmp &= ~(P2C_RG_SUSPENDM | P2C_FORCE_SUSPENDM);
+   writel(tmp, com + U3P_U2PHYDTM0);
+
tmp = readl(com + U3D_U2PHYDCR0);
tmp &= ~P2C_RG_SIF_U2PLL_FORCE_ON;
writel(tmp, com + U3D_U2PHYDCR0);
-- 
1.7.9.5



[PATCH] usb: image: mdc800: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/image/mdc800.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/image/mdc800.c b/drivers/usb/image/mdc800.c
index e92540a..185c4e2 100644
--- a/drivers/usb/image/mdc800.c
+++ b/drivers/usb/image/mdc800.c
@@ -893,6 +893,7 @@ static ssize_t mdc800_device_write (struct file *file, 
const char __user *buf, s
return -EIO;
}
mdc800->pic_len=-1;
+   /* fall through */
 
case 0x09: /* Download Thumbnail */
mdc800->download_left=answersize+64;
-- 
2.7.4



[PATCH] usb: image: mdc800: mark expected switch fall-through

2017-10-23 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/image/mdc800.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/image/mdc800.c b/drivers/usb/image/mdc800.c
index e92540a..185c4e2 100644
--- a/drivers/usb/image/mdc800.c
+++ b/drivers/usb/image/mdc800.c
@@ -893,6 +893,7 @@ static ssize_t mdc800_device_write (struct file *file, 
const char __user *buf, s
return -EIO;
}
mdc800->pic_len=-1;
+   /* fall through */
 
case 0x09: /* Download Thumbnail */
mdc800->download_left=answersize+64;
-- 
2.7.4



492b95e597 ("rcuperf: Set more user-friendly defaults"): WARNING: CPU: 0 PID: 1 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule

2017-10-23 Thread Fengguang Wu
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 492b95e59735998312f678d77a2d5fe20af6b0b9
Author: Paul E. McKenney 
AuthorDate: Fri Apr 21 16:09:15 2017 -0700
Commit: Paul E. McKenney 
CommitDate: Thu Jun 8 08:25:31 2017 -0700

 rcuperf: Set more user-friendly defaults
 
 Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built
 as a module, rcuperf.shutdown.  This commit therefore sets the default
 for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
 to zero if rcuperf is built as a module and to one otherwise.
 
 Signed-off-by: Paul E. McKenney 

3ddf20c953  srcu: Shrink Tiny SRCU a bit more
492b95e597  rcuperf: Set more user-friendly defaults
bb176f6709  Linux 4.14-rc6
36ef71cae3  Add linux-next specific files for 20171018
+--+++---+---+
|  | 3ddf20c953 | 
492b95e597 | v4.14-rc6 | next-20171018 |
+--+++---+---+
| boot_successes   | 910| 
21 | 199   | 60|
| boot_failures| 0  | 9 
 | 111   | 21|
| WARNING:at_arch/x86/kernel/smp.c:#native_smp_send_reschedule | 0  | 9 
 | 111   | 21|
+--+++---+---+

[   29.173654] augmented rbtree testing
[   32.263469]  -> 83223 cycles
[   32.540332] gpio_it87: no device
[   32.541422] sched: Unexpected reschedule of offline CPU#1!
[   32.542355] [ cut here ]
[   32.543057] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/smp.c:128 
native_smp_send_reschedule+0x83/0xbb
[   32.544646] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.12.0-rc4-00028-g492b95e #1
[   32.545397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   32.546381] task: 88001e2e task.stack: 88001e2e8000
[   32.547031] RIP: 0010:native_smp_send_reschedule+0x83/0xbb
[   32.547642] RSP: :88001e2ebca0 EFLAGS: 00010086
[   32.548225] RAX: 002e RBX: 0003 RCX: 
[   32.548989] RDX: 002e0001 RSI:  RDI: 0096
[   32.550034] RBP: 88001e2ebcb8 R08: 0003 R09: 
[   32.551039] R10: 88001e2e0758 R11: 00021001 R12: 0001
[   32.551996] R13:  R14: 88001e38 R15: 88001e50
[   32.552727] FS:  () GS:88001e40() 
knlGS:
[   32.553585] CS:  0010 DS:  ES:  CR0: 80050033
[   32.554227] CR2:  CR3: 0260f000 CR4: 06b0
[   32.555288] Call Trace:
[   32.555687]  smp_send_reschedule+0xa/0xc
[   32.556313]  try_to_wake_up+0x27e/0x2e4
[   32.556870]  wake_up_process+0x15/0x17
[   32.557374]  wake_up_worker+0x5f/0x6a
[   32.557982]  pwq_adjust_max_active+0x154/0x191
[   32.558444]  link_pwq+0xc3/0x10a
[   32.558828]  __alloc_workqueue_key+0x325/0x695
[   32.559288]  ? ftrace_likely_update+0x78/0x81
[   32.559810]  ? pci_epf_init+0x52/0x52
[   32.560222]  ? do_early_param+0xfd/0xfd
[   32.560643]  pci_epf_test_init+0x30/0x7c
[   32.561063]  ? pci_epf_init+0x52/0x52
[   32.561431]  do_one_initcall+0xbc/0x1f7
[   32.562099]  ? do_early_param+0xfd/0xfd
[   32.562760]  kernel_init_freeable+0x13f/0x233
[   32.563497]  ? rest_init+0xd6/0xd6
[   32.564046]  kernel_init+0x14/0x1c5
[   32.564565]  ? rest_init+0xd6/0xd6
[   32.564987]  ret_from_fork+0x2a/0x40
[   32.565345] Code: 00 00 31 d2 be 01 00 00 00 48 c7 c7 88 40 7d 82 e8 8c 48 
18 00 44 89 e6 48 c7 c7 5a 68 3d 82 48 ff 05 70 78 87 01 e8 45 a5 1b 00 <0f> ff 
b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 58 40 7d 82 
[   32.568020] ---[ end trace 6590e27884081b3a ]---
[   32.570056] VIA Graphics Integration Chipset framebuffer 2.4 initializing

   # HH:MM RESULT GOOD 
BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start v4.13 v4.12 --
git bisect  bad 63a86362130f4c17eaa57f3ef5171ec43111a54e  # 14:39  B 16 
10   6  Merge tag 'pm-4.13-rc2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect  bad 090a81d8766e21d33ab3e4d24e6c8e5eedf086dd  # 14:44  B 11 
50   7  Merge branch 'for-spi' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect  bad 3bad2f1c676581d01e7645eb03e9b27e28b0a92e  # 16:22  B 45 
27  13  Merge branch 

492b95e597 ("rcuperf: Set more user-friendly defaults"): WARNING: CPU: 0 PID: 1 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule

2017-10-23 Thread Fengguang Wu
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 492b95e59735998312f678d77a2d5fe20af6b0b9
Author: Paul E. McKenney 
AuthorDate: Fri Apr 21 16:09:15 2017 -0700
Commit: Paul E. McKenney 
CommitDate: Thu Jun 8 08:25:31 2017 -0700

 rcuperf: Set more user-friendly defaults
 
 Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built
 as a module, rcuperf.shutdown.  This commit therefore sets the default
 for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
 to zero if rcuperf is built as a module and to one otherwise.
 
 Signed-off-by: Paul E. McKenney 

3ddf20c953  srcu: Shrink Tiny SRCU a bit more
492b95e597  rcuperf: Set more user-friendly defaults
bb176f6709  Linux 4.14-rc6
36ef71cae3  Add linux-next specific files for 20171018
+--+++---+---+
|  | 3ddf20c953 | 
492b95e597 | v4.14-rc6 | next-20171018 |
+--+++---+---+
| boot_successes   | 910| 
21 | 199   | 60|
| boot_failures| 0  | 9 
 | 111   | 21|
| WARNING:at_arch/x86/kernel/smp.c:#native_smp_send_reschedule | 0  | 9 
 | 111   | 21|
+--+++---+---+

[   29.173654] augmented rbtree testing
[   32.263469]  -> 83223 cycles
[   32.540332] gpio_it87: no device
[   32.541422] sched: Unexpected reschedule of offline CPU#1!
[   32.542355] [ cut here ]
[   32.543057] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/smp.c:128 
native_smp_send_reschedule+0x83/0xbb
[   32.544646] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.12.0-rc4-00028-g492b95e #1
[   32.545397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   32.546381] task: 88001e2e task.stack: 88001e2e8000
[   32.547031] RIP: 0010:native_smp_send_reschedule+0x83/0xbb
[   32.547642] RSP: :88001e2ebca0 EFLAGS: 00010086
[   32.548225] RAX: 002e RBX: 0003 RCX: 
[   32.548989] RDX: 002e0001 RSI:  RDI: 0096
[   32.550034] RBP: 88001e2ebcb8 R08: 0003 R09: 
[   32.551039] R10: 88001e2e0758 R11: 00021001 R12: 0001
[   32.551996] R13:  R14: 88001e38 R15: 88001e50
[   32.552727] FS:  () GS:88001e40() 
knlGS:
[   32.553585] CS:  0010 DS:  ES:  CR0: 80050033
[   32.554227] CR2:  CR3: 0260f000 CR4: 06b0
[   32.555288] Call Trace:
[   32.555687]  smp_send_reschedule+0xa/0xc
[   32.556313]  try_to_wake_up+0x27e/0x2e4
[   32.556870]  wake_up_process+0x15/0x17
[   32.557374]  wake_up_worker+0x5f/0x6a
[   32.557982]  pwq_adjust_max_active+0x154/0x191
[   32.558444]  link_pwq+0xc3/0x10a
[   32.558828]  __alloc_workqueue_key+0x325/0x695
[   32.559288]  ? ftrace_likely_update+0x78/0x81
[   32.559810]  ? pci_epf_init+0x52/0x52
[   32.560222]  ? do_early_param+0xfd/0xfd
[   32.560643]  pci_epf_test_init+0x30/0x7c
[   32.561063]  ? pci_epf_init+0x52/0x52
[   32.561431]  do_one_initcall+0xbc/0x1f7
[   32.562099]  ? do_early_param+0xfd/0xfd
[   32.562760]  kernel_init_freeable+0x13f/0x233
[   32.563497]  ? rest_init+0xd6/0xd6
[   32.564046]  kernel_init+0x14/0x1c5
[   32.564565]  ? rest_init+0xd6/0xd6
[   32.564987]  ret_from_fork+0x2a/0x40
[   32.565345] Code: 00 00 31 d2 be 01 00 00 00 48 c7 c7 88 40 7d 82 e8 8c 48 
18 00 44 89 e6 48 c7 c7 5a 68 3d 82 48 ff 05 70 78 87 01 e8 45 a5 1b 00 <0f> ff 
b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 58 40 7d 82 
[   32.568020] ---[ end trace 6590e27884081b3a ]---
[   32.570056] VIA Graphics Integration Chipset framebuffer 2.4 initializing

   # HH:MM RESULT GOOD 
BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start v4.13 v4.12 --
git bisect  bad 63a86362130f4c17eaa57f3ef5171ec43111a54e  # 14:39  B 16 
10   6  Merge tag 'pm-4.13-rc2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect  bad 090a81d8766e21d33ab3e4d24e6c8e5eedf086dd  # 14:44  B 11 
50   7  Merge branch 'for-spi' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect  bad 3bad2f1c676581d01e7645eb03e9b27e28b0a92e  # 16:22  B 45 
27  13  Merge branch 'work.misc-set_fs' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect  

[PATCH -mm] mm, swap: Fix false error message in __swp_swapcount()

2017-10-23 Thread Huang, Ying
From: Ying Huang 

__swp_swapcount() is used in __read_swap_cache_async().  Where the
invalid swap entry (offset > max) may be supplied during swap
readahead.  But __swp_swapcount() will print error message for these
expected invalid swap entry as below, which will make the users
confusing.

  swap_info_get: Bad swap offset entry 0200f8a7

So the swap entry checking code in __swp_swapcount() is changed to
avoid printing error message for it.  To avoid to duplicate code with
__swap_duplicate(), a new helper function named
__swap_info_get_silence() is added and invoked in both places.

Cc: Tim Chen 
Cc: Minchan Kim 
Cc: Michal Hocko 
Cc:  # 4.11-4.13
Reported-by: Christian Kujau 
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" 
---
 mm/swapfile.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3074b02eaa09..3193aa670c90 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1107,6 +1107,30 @@ static struct swap_info_struct 
*swap_info_get_cont(swp_entry_t entry,
return p;
 }
 
+static struct swap_info_struct *__swap_info_get_silence(swp_entry_t entry)
+{
+   struct swap_info_struct *p;
+   unsigned long offset, type;
+
+   if (non_swap_entry(entry))
+   goto out;
+
+   type = swp_type(entry);
+   if (type >= nr_swapfiles)
+   goto bad_file;
+   p = swap_info[type];
+   offset = swp_offset(entry);
+   if (unlikely(offset >= p->max))
+   goto out;
+
+   return p;
+
+bad_file:
+   pr_err("swap_info_get_silence: %s%08lx\n", Bad_file, entry.val);
+out:
+   return NULL;
+}
+
 static unsigned char __swap_entry_free(struct swap_info_struct *p,
   swp_entry_t entry, unsigned char usage)
 {
@@ -1357,7 +1381,7 @@ int __swp_swapcount(swp_entry_t entry)
int count = 0;
struct swap_info_struct *si;
 
-   si = __swap_info_get(entry);
+   si = __swap_info_get_silence(entry);
if (si)
count = swap_swapcount(si, entry);
return count;
@@ -3356,22 +3380,16 @@ static int __swap_duplicate(swp_entry_t entry, unsigned 
char usage)
 {
struct swap_info_struct *p;
struct swap_cluster_info *ci;
-   unsigned long offset, type;
+   unsigned long offset;
unsigned char count;
unsigned char has_cache;
int err = -EINVAL;
 
-   if (non_swap_entry(entry))
+   p = __swap_info_get_silence(entry);
+   if (!p)
goto out;
 
-   type = swp_type(entry);
-   if (type >= nr_swapfiles)
-   goto bad_file;
-   p = swap_info[type];
offset = swp_offset(entry);
-   if (unlikely(offset >= p->max))
-   goto out;
-
ci = lock_cluster_or_swap_info(p, offset);
 
count = p->swap_map[offset];
@@ -3418,10 +3436,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned 
char usage)
unlock_cluster_or_swap_info(p, ci);
 out:
return err;
-
-bad_file:
-   pr_err("swap_dup: %s%08lx\n", Bad_file, entry.val);
-   goto out;
 }
 
 /*
-- 
2.14.2



[PATCH -mm] mm, swap: Fix false error message in __swp_swapcount()

2017-10-23 Thread Huang, Ying
From: Ying Huang 

__swp_swapcount() is used in __read_swap_cache_async().  Where the
invalid swap entry (offset > max) may be supplied during swap
readahead.  But __swp_swapcount() will print error message for these
expected invalid swap entry as below, which will make the users
confusing.

  swap_info_get: Bad swap offset entry 0200f8a7

So the swap entry checking code in __swp_swapcount() is changed to
avoid printing error message for it.  To avoid to duplicate code with
__swap_duplicate(), a new helper function named
__swap_info_get_silence() is added and invoked in both places.

Cc: Tim Chen 
Cc: Minchan Kim 
Cc: Michal Hocko 
Cc:  # 4.11-4.13
Reported-by: Christian Kujau 
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" 
---
 mm/swapfile.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3074b02eaa09..3193aa670c90 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1107,6 +1107,30 @@ static struct swap_info_struct 
*swap_info_get_cont(swp_entry_t entry,
return p;
 }
 
+static struct swap_info_struct *__swap_info_get_silence(swp_entry_t entry)
+{
+   struct swap_info_struct *p;
+   unsigned long offset, type;
+
+   if (non_swap_entry(entry))
+   goto out;
+
+   type = swp_type(entry);
+   if (type >= nr_swapfiles)
+   goto bad_file;
+   p = swap_info[type];
+   offset = swp_offset(entry);
+   if (unlikely(offset >= p->max))
+   goto out;
+
+   return p;
+
+bad_file:
+   pr_err("swap_info_get_silence: %s%08lx\n", Bad_file, entry.val);
+out:
+   return NULL;
+}
+
 static unsigned char __swap_entry_free(struct swap_info_struct *p,
   swp_entry_t entry, unsigned char usage)
 {
@@ -1357,7 +1381,7 @@ int __swp_swapcount(swp_entry_t entry)
int count = 0;
struct swap_info_struct *si;
 
-   si = __swap_info_get(entry);
+   si = __swap_info_get_silence(entry);
if (si)
count = swap_swapcount(si, entry);
return count;
@@ -3356,22 +3380,16 @@ static int __swap_duplicate(swp_entry_t entry, unsigned 
char usage)
 {
struct swap_info_struct *p;
struct swap_cluster_info *ci;
-   unsigned long offset, type;
+   unsigned long offset;
unsigned char count;
unsigned char has_cache;
int err = -EINVAL;
 
-   if (non_swap_entry(entry))
+   p = __swap_info_get_silence(entry);
+   if (!p)
goto out;
 
-   type = swp_type(entry);
-   if (type >= nr_swapfiles)
-   goto bad_file;
-   p = swap_info[type];
offset = swp_offset(entry);
-   if (unlikely(offset >= p->max))
-   goto out;
-
ci = lock_cluster_or_swap_info(p, offset);
 
count = p->swap_map[offset];
@@ -3418,10 +3436,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned 
char usage)
unlock_cluster_or_swap_info(p, ci);
 out:
return err;
-
-bad_file:
-   pr_err("swap_dup: %s%08lx\n", Bad_file, entry.val);
-   goto out;
 }
 
 /*
-- 
2.14.2



ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-23 Thread Fengguang Wu
2.052366]  ? mmap_region+0x248/0x480
[2.052366]  ? mmap_region+0x2d2/0x480
[2.052366]  ? do_mmap+0x2c5/0x3a0
[2.052366]  ? vm_mmap_pgoff+0x8f/0xb0
[2.052366]  ? SyS_mmap_pgoff+0x1e7/0x210
[2.052366]  ? do_int80_syscall_32+0x76/0x130
[2.052366]  ? entry_INT80_32+0x33/0x33
[2.052366] Code: 00 00 00 a1 60 0e be c8 05 00 00 80 00 39 c2 72 bb a1 78 
94 30 c8 2d 00 b0 78 00 25 00 00 e0 ff 2d 00 20 00 00 39 c2 73 a3 0f 0b <0f> 0b 
8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 53 e8
[2.052366] EIP: __phys_addr+0x80/0x90 SS:ESP: 0068:cd915e58
[2.058327] ---[ end trace 51b6b410d44658b1 ]---
[2.058607] Kernel panic - not syncing: Fatal exception

   # HH:MM RESULT GOOD 
BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 13769afc2a5ef8e2d19b0b1486bf8ae08caf9f4b 
33d930e59a98fa10a0db9f56c7fa2f21a4aef9b9 --
git bisect good 7021889c264abc7a4eef71cb0586f76a22091658  # 16:32  G 10 
00   0  Merge 'sailus-media/atomisp' into devel-spot-201710231057
git bisect  bad 56b2129ddeae19f6a20494b88b61eaba91e519b5  # 17:00  B  0 
8   20   0  Merge 
'linux-review/Aishwarya-Pant/coccinelle-boolconv-improve-script-to-handle-more-cases/20171022-210918'
 into devel-spot-201710231057
git bisect  bad c7d414af43141682ee0b828bd71d1d9cc190f1bd  # 17:22  B  0
11   24   0  Merge 'f2fs/dev-test' into devel-spot-201710231057
git bisect good 89630c8626339b2ec6368ac195237c2ebea3ca23  # 17:47  G 10 
00   0  Merge 'jpirko-mlxsw/jiri_devel_miniq' into devel-spot-201710231057
git bisect good dbf5855b11e4857696b24d9f621aaf1d4ad35dc2  # 18:04  G 10 
00   0  Merge 
'linux-review/SF-Markus-Elfring/gpio-adnp-Use-common-error-handling-code-in-adnp_gpio_dbg_show/20171023-043514'
 into devel-spot-201710231057
git bisect  bad a0831a3f7f72d8ce846ffd2ff7ea73b88a59da17  # 18:42  B  0
11   24   0  Merge 
'linux-review/SF-Markus-Elfring/dmaengine-ioat-Use-common-error-handling-code-in-ioat_xor_val_self_test/20171023-032235'
 into devel-spot-201710231057
git bisect good 085cf9bfc92a20a7297468f01e868cf2a4f6f4c3  # 19:00  G 10 
00   0  Merge branch 'perf-urgent-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect  bad ce56a86e2ade45d052b3228cdfebe913a1ae7381  # 19:14  B  0
11   23   0  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
git bisect good 723f2828a98c8ca19842042f418fb30dd8cfc0f7  # 19:47  G 10 
00   0  x86/microcode/intel: Disable late loading on model 79
git bisect good 4e57b94664fef55aa71cac33b4632fdfdd52b695  # 20:11  G 10 
00   0  x86/mm: Tidy up "x86/mm: Flush more aggressively in lazy TLB mode"
git bisect good 7ac7f2c315ef76437f5119df354d334448534fb5  # 20:49  G 10 
00   0  x86/mm: Remove debug/x86/tlb_defer_switch_to_init_mm
# first bad commit: [ce56a86e2ade45d052b3228cdfebe913a1ae7381] x86/mm: Limit 
mmap() of /dev/mem to valid physical addresses
git bisect good 7ac7f2c315ef76437f5119df354d334448534fb5  # 21:14  G 30 
00   0  x86/mm: Remove debug/x86/tlb_defer_switch_to_init_mm
# extra tests with CONFIG_DEBUG_INFO_REDUCED
git bisect  bad ce56a86e2ade45d052b3228cdfebe913a1ae7381  # 21:39  B  0
11   24   0  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
# extra tests on HEAD of linux-devel/devel-spot-201710231057
git bisect  bad 13769afc2a5ef8e2d19b0b1486bf8ae08caf9f4b  # 21:39  B  0
12   27   0  0day head guard for 'devel-spot-201710231057'
# extra tests on tree/branch linus/master
git bisect  bad bb176f67090ca54869fc1262c913aa69d2ede070  # 21:52  B  0
11   23   0  Linux 4.14-rc6
# extra tests with first bad commit reverted
git bisect good 668ce515181e53af5f88325ee13fb17d79295670  # 22:16  G 11 
00   0  Revert "x86/mm: Limit mmap() of /dev/mem to valid physical 
addresses"
# extra tests on tree/branch linux-next/master
git bisect good 36ef71cae353f88fd6e095e2aaa3e5953af1685d  # 22:45  G 10 
03  22  Add linux-next specific files for 20171018

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/lkp  Intel Corporation


dmesg-yocto-lkp-hsw01-102:20171023191152:i386-randconfig-c0-10231306:4.14.0-rc5-7-gce56a86:1.gz
Description: application/gzip
#!/bin/bash

kernel=$1
initrd=yocto-trinity-i386.cgz

wget --no-clobber 
https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd

kvm=(
qemu-system-x86_64
-enable-kvm
-cpu Haswell,+smep
-kernel $kernel
-initrd $initrd
-m 256
-smp 1
-device e1000,netdev=net0
-netdev user,id=net0
-boot order=nc
-no-reboot
-watchdog i6300esb
-watchdog-action debug
-rtc base=localtime
-serial stdio
-display none
-monitor null

ce56a86e2a ("x86/mm: Limit mmap() of /dev/mem to valid physical addresses"): kernel BUG at arch/x86/mm/physaddr.c:79!

2017-10-23 Thread Fengguang Wu
 00 20 00 00 39 c2 73 a3 0f 0b <0f> 0b 
8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 53 e8
[2.052366] EIP: __phys_addr+0x80/0x90 SS:ESP: 0068:cd915e58
[2.058327] ---[ end trace 51b6b410d44658b1 ]---
[2.058607] Kernel panic - not syncing: Fatal exception

   # HH:MM RESULT GOOD 
BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 13769afc2a5ef8e2d19b0b1486bf8ae08caf9f4b 
33d930e59a98fa10a0db9f56c7fa2f21a4aef9b9 --
git bisect good 7021889c264abc7a4eef71cb0586f76a22091658  # 16:32  G 10 
00   0  Merge 'sailus-media/atomisp' into devel-spot-201710231057
git bisect  bad 56b2129ddeae19f6a20494b88b61eaba91e519b5  # 17:00  B  0 
8   20   0  Merge 
'linux-review/Aishwarya-Pant/coccinelle-boolconv-improve-script-to-handle-more-cases/20171022-210918'
 into devel-spot-201710231057
git bisect  bad c7d414af43141682ee0b828bd71d1d9cc190f1bd  # 17:22  B  0
11   24   0  Merge 'f2fs/dev-test' into devel-spot-201710231057
git bisect good 89630c8626339b2ec6368ac195237c2ebea3ca23  # 17:47  G 10 
00   0  Merge 'jpirko-mlxsw/jiri_devel_miniq' into devel-spot-201710231057
git bisect good dbf5855b11e4857696b24d9f621aaf1d4ad35dc2  # 18:04  G 10 
00   0  Merge 
'linux-review/SF-Markus-Elfring/gpio-adnp-Use-common-error-handling-code-in-adnp_gpio_dbg_show/20171023-043514'
 into devel-spot-201710231057
git bisect  bad a0831a3f7f72d8ce846ffd2ff7ea73b88a59da17  # 18:42  B  0
11   24   0  Merge 
'linux-review/SF-Markus-Elfring/dmaengine-ioat-Use-common-error-handling-code-in-ioat_xor_val_self_test/20171023-032235'
 into devel-spot-201710231057
git bisect good 085cf9bfc92a20a7297468f01e868cf2a4f6f4c3  # 19:00  G 10 
00   0  Merge branch 'perf-urgent-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect  bad ce56a86e2ade45d052b3228cdfebe913a1ae7381  # 19:14  B  0
11   23   0  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
git bisect good 723f2828a98c8ca19842042f418fb30dd8cfc0f7  # 19:47  G 10 
00   0  x86/microcode/intel: Disable late loading on model 79
git bisect good 4e57b94664fef55aa71cac33b4632fdfdd52b695  # 20:11  G 10 
00   0  x86/mm: Tidy up "x86/mm: Flush more aggressively in lazy TLB mode"
git bisect good 7ac7f2c315ef76437f5119df354d334448534fb5  # 20:49  G 10 
00   0  x86/mm: Remove debug/x86/tlb_defer_switch_to_init_mm
# first bad commit: [ce56a86e2ade45d052b3228cdfebe913a1ae7381] x86/mm: Limit 
mmap() of /dev/mem to valid physical addresses
git bisect good 7ac7f2c315ef76437f5119df354d334448534fb5  # 21:14  G 30 
00   0  x86/mm: Remove debug/x86/tlb_defer_switch_to_init_mm
# extra tests with CONFIG_DEBUG_INFO_REDUCED
git bisect  bad ce56a86e2ade45d052b3228cdfebe913a1ae7381  # 21:39  B  0
11   24   0  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
# extra tests on HEAD of linux-devel/devel-spot-201710231057
git bisect  bad 13769afc2a5ef8e2d19b0b1486bf8ae08caf9f4b  # 21:39  B  0
12   27   0  0day head guard for 'devel-spot-201710231057'
# extra tests on tree/branch linus/master
git bisect  bad bb176f67090ca54869fc1262c913aa69d2ede070  # 21:52  B  0
11   23   0  Linux 4.14-rc6
# extra tests with first bad commit reverted
git bisect good 668ce515181e53af5f88325ee13fb17d79295670  # 22:16  G 11 
00   0  Revert "x86/mm: Limit mmap() of /dev/mem to valid physical 
addresses"
# extra tests on tree/branch linux-next/master
git bisect good 36ef71cae353f88fd6e095e2aaa3e5953af1685d  # 22:45  G 10 
03  22  Add linux-next specific files for 20171018

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/lkp  Intel Corporation


dmesg-yocto-lkp-hsw01-102:20171023191152:i386-randconfig-c0-10231306:4.14.0-rc5-7-gce56a86:1.gz
Description: application/gzip
#!/bin/bash

kernel=$1
initrd=yocto-trinity-i386.cgz

wget --no-clobber 
https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd

kvm=(
qemu-system-x86_64
-enable-kvm
-cpu Haswell,+smep
-kernel $kernel
-initrd $initrd
-m 256
-smp 1
-device e1000,netdev=net0
-netdev user,id=net0
-boot order=nc
-no-reboot
-watchdog i6300esb
-watchdog-action debug
-rtc base=localtime
-serial stdio
-display none
-monitor null
)

append=(
root=/dev/ram0
hung_task_panic=1
debug
apic=debug
sysrq_always_enabled
rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0
printk.devkmsg=on
panic=-1
softlockup_panic=1
nmi_watchdog=panic
oops=panic
load_ramdisk=2
prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err
ignore_log

  1   2   3   4   5   6   7   8   9   10   >