[PATCH v2] perf/core: Fix updating cgroup time with descendants

2017-09-28 Thread Lin Xiulei
From: "leilei.lin" 

This fixes the cgroup time update when an event is being scheduled in
by descendants.

Signed-off-by: leilei.lin 
Reviewed-and-tested-by: Jiri Olsa 
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3e691b7..e3a5e32 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -662,7 +662,7 @@ static inline void update_cgrp_time_from_event(struct perf_event *event)
 	/*
 	 * Do not update time when cgroup is not active
 	 */
-	if (cgrp == event->cgrp)
+	if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup))
 		__update_cgrp_time(event->cgrp);
 }
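
For readers who want to see what the changed condition does, here is a minimal user-space sketch. It is not kernel code: struct toy_cgroup and every toy_/old_/new_ name are hypothetical stand-ins, and the only assumption is that a cgroup counts as its own descendant, which is how cgroup_is_descendant() is documented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup: only the parent link matters here. */
struct toy_cgroup {
	const struct toy_cgroup *parent;	/* NULL for the root */
};

/* True when cgrp is ancestor itself or sits somewhere below it. */
bool toy_is_descendant(const struct toy_cgroup *cgrp,
		       const struct toy_cgroup *ancestor)
{
	assert(ancestor != NULL);
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return true;
	return false;
}

/* Old condition: update only when the running cgroup is the event's cgroup. */
bool old_should_update(const struct toy_cgroup *running,
		       const struct toy_cgroup *event_cgrp)
{
	return running == event_cgrp;
}

/* New condition: the running cgroup may also be a descendant of it. */
bool new_should_update(const struct toy_cgroup *running,
		       const struct toy_cgroup *event_cgrp)
{
	return toy_is_descendant(running, event_cgrp);
}
```

With a root -> parent -> child hierarchy and an event attached to parent, the old condition is false while child is running and the new one is true; a sibling of parent is rejected by both.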


Re: [Part1 PATCH v5 16/17] X86/KVM: Decrypt shared per-cpu variables when SEV is active

2017-09-28 Thread Borislav Petkov
On Wed, Sep 27, 2017 at 10:13:28AM -0500, Brijesh Singh wrote:
> When SEV is active, guest memory is encrypted with a guest-specific key; a
> guest memory region shared with the hypervisor must be mapped as decrypted
> before we can share it.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Borislav Petkov 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Tom Lendacky 
> Cc: x...@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: k...@vger.kernel.org
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/kernel/kvm.c | 41 ++---
>  1 file changed, 38 insertions(+), 3 deletions(-)

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)


linux-next: Tree for Sep 29

2017-09-28 Thread Stephen Rothwell
Hi all,

News: I will not be doing linux-next releases from Sept 30 to Oct 30
(inclusive).

Changes since 20170928:

The net-next tree gained a build failure, due to an interaction with the
net tree, for which I applied a merge fix patch.

The drm tree still had its build failure for which I applied a fix patch.

The ipmi tree lost its build failure.

The akpm tree lost a patch that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 2815
 2768 files changed, 96917 insertions(+), 40111 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And
finally, a simple boot test of the powerpc pseries_le_defconfig kernel
in qemu.

Below is a summary of the state of the merge.

I am currently merging 267 trees (counting Linus' and 41 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (770b782f555d Merge tag 'acpi-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (cd4175b11685 Merge branch 'parisc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux)
Merging arc-current/for-curr (ef6c1bae4792 arc: remove redundant UTS_MACHINE define in arch/arc/Makefile)
Merging arm-current/fixes (746a272e4414 ARM: 8692/1: mm: abort uaccess retries upon fatal signal)
Merging m68k-current/for-linus (558d5ad276c9 m68k/mac: Avoid soft-lockup warning after mach_power_off)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (d8bd9f3f0925 powerpc: Handle MCE on POWER9 with only DSISR bit 30 set)
Merging sparc/master (23198ddffb6c sparc32: Add cmpxchg64().)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files)
Merging net/master (9d538fa60bad net: Set sk_prot_creator when cloning sockets to the right proto)
Merging ipsec/master (dd269db84908 xfrm: don't call xfrm_policy_cache_flush under xfrm_state_lock)
Merging netfilter/master (7f4f7dd4417d netfilter: ipset: ipset list may return wrong member count for set with timeout)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set)
Merging wireless-drivers/master (3e747fa18202 Merge ath-current from ath.git)
Merging mac80211/master (265698d7e613 nl80211: fix null-ptr dereference on invalid mesh configuration)
Merging sound-current/for-linus (bfc81a8bc18e ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor)
Merging pci-current/for-linus (9561475db680 PCI: Fix race condition with driver_override)
Merging driver-core.current/driver-core-linus (850fdec8d2fd driver core: remove DRIVER_ATTR)
Merging tty.current/tty-linus (c91261437985 serial: sccnxp: Fix error handling in sccnxp_probe())
Merging usb.current/usb-linus (8fec9355a968 USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse)
Merging usb-gadget-fixes/fixes (c3cdce45f8d3 usb: dwc3: of-simple: Add compatible for Spreadtrum SC9860 platform)
Merging usb-serial-fixes/usb-linus (c496ad835c31 USB: serial: cp210x: add support for ELV TFD500)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: check before accessing ci_role in ci_role_show)
Merging phy/fixes (26e03d803c81 phy: rockchip-typec: Don't set the aux voltage swing to 400 mV)
Merging staging.current/staging-linus (b2

Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-28 Thread Harsh Jain

On 28-09-2017 18:35, Raj, Ashok wrote:
> Thanks for trying that Harsh. 
>
> sp_off turns off super page support. With this mode, do you still see offsets 
> greater than 4k?
Yes, offsets greater than 4k are still there. See below.
[56732.774872] offset 4110 len 76 dma addr 3a531200e dma len 76
[56732.804187] offset 4110 len 84 dma addr 3a63b200e dma len 84
[56732.805104] offset 4110 len 68 dma addr 3a531200e dma len 68
[56732.806870] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.808987] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.811215] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.813155] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.814823] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.816481] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.818159] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.819712] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.821629] offset 4110 len 56 dma addr 3a531200e dma len 56
[root@heptagon linux_t4_build]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.9.51 root=UUID=ccbb7f18-b3f0-43df-89de-07521e9c02fe ro intel_iommu=sp_off crashkernel=auto rhgb quiet rhgb quiet console=ttyS0,115200, console=tty0 LANG=en_US.UTF-8

>
> On Thu, Sep 28, 2017 at 07:08:21PM +0530, Harsh Jain wrote:
>>
>> Today I tried the "intel_iommu=sp_off" boot option. Traffic ran without 
>> any error for more than 1 hour. What magic did this option do? :)
> Cheers,
> Ashok
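
As a side note on the numbers in the log: offset 4110 is 4096 + 14, and both DMA addresses end in 0x00e (14). The sketch below only illustrates that arithmetic. It assumes 4 KiB pages, and the toy_ names are hypothetical helpers, not kernel or IOMMU API; it makes no claim about where the DMA error comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on the x86 machine in the log. */
#define TOY_PAGE_SHIFT 12u
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)

static_assert((TOY_PAGE_SIZE & (TOY_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* An offset split into whole pages plus a remainder inside the last page. */
struct toy_sg_pos {
	uint32_t page_index;	/* whole pages to skip */
	uint32_t in_page;	/* byte offset within that page */
};

/* Low bits of an address or offset, i.e. the position inside its page. */
uint32_t toy_in_page(uint64_t addr)
{
	return (uint32_t)(addr & (TOY_PAGE_SIZE - 1));
}

/* Split an sg-style offset that may be larger than one page. */
struct toy_sg_pos toy_split_offset(uint32_t offset)
{
	struct toy_sg_pos pos = {
		.page_index = offset >> TOY_PAGE_SHIFT,
		.in_page = toy_in_page(offset),
	};
	return pos;
}
```

For offset 4110 this gives page_index 1 and in_page 14, and toy_in_page() of the logged DMA addresses 0x3a531200e and 0x3a63b200e is also 14.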



[PATCH] nvme-pci: Use PCI bus address for data/queues in CMB

2017-09-28 Thread Abhishek Shah
Currently, the NVMe PCI host driver programs the CMB DMA address as
the I/O SQ addresses. This results in failures on systems where 1:1
outbound mapping is not used (for example, Broadcom iProc SoCs) because
the CMB BAR will be programmed with the PCI bus address but the NVMe
PCI EP will try to access the CMB using the DMA address.

To have CMB working on systems without 1:1 outbound mapping, we
program the PCI bus address for I/O SQs instead of the DMA address.
This approach works on systems with or without 1:1 outbound mapping.

The patch is tested on the Broadcom Stingray platform (arm64), which
does not have 1:1 outbound mapping, as well as on an x86 platform,
which has 1:1 outbound mapping.

Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available")
Cc: sta...@vger.kernel.org
Signed-off-by: Abhishek Shah 
Reviewed-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/nvme/host/pci.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4a21213..29e3bd8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -94,6 +94,7 @@ struct nvme_dev {
bool subsystem;
void __iomem *cmb;
dma_addr_t cmb_dma_addr;
+   pci_bus_addr_t cmb_bus_addr;
u64 cmb_size;
u32 cmbsz;
u32 cmbloc;
@@ -1220,7 +1221,7 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz)) {
unsigned offset = (qid - 1) * roundup(SQ_SIZE(depth),
  dev->ctrl.page_size);
-   nvmeq->sq_dma_addr = dev->cmb_dma_addr + offset;
+   nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
nvmeq->sq_cmds_io = dev->cmb + offset;
} else {
nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
@@ -1514,8 +1515,28 @@ static ssize_t nvme_cmb_show(struct device *dev,
 }
 static DEVICE_ATTR(cmb, S_IRUGO, nvme_cmb_show, NULL);
 
+static int nvme_find_cmb_bus_addr(struct pci_dev *pdev,
+ dma_addr_t dma_addr,
+ u64 size,
+ pci_bus_addr_t *bus_addr)
+{
+   struct resource *res;
+   struct pci_bus_region region;
+   struct resource tres = DEFINE_RES_MEM(dma_addr, size);
+
+   res = pci_find_resource(pdev, &tres);
+   if (!res)
+   return -EIO;
+
+   pcibios_resource_to_bus(pdev->bus, &region, res);
+   *bus_addr = region.start + (dma_addr - res->start);
+
+   return 0;
+}
+
 static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
 {
+   int rc;
u64 szu, size, offset;
resource_size_t bar_size;
struct pci_dev *pdev = to_pci_dev(dev->dev);
@@ -1553,6 +1574,13 @@ static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
 
dev->cmb_dma_addr = dma_addr;
dev->cmb_size = size;
+
+   rc = nvme_find_cmb_bus_addr(pdev, dma_addr, size, &dev->cmb_bus_addr);
+   if (rc) {
+   iounmap(cmb);
+   return NULL;
+   }
+
return cmb;
 }
 
-- 
2.7.4
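
The translation done in nvme_find_cmb_bus_addr() comes down to "bus address = start of the bus region + offset of the address inside the CPU-side resource". A minimal user-space sketch of that arithmetic follows. struct toy_window and toy_cpu_to_bus() are hypothetical names for illustration only, and the window values used with them are made up rather than taken from any real platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outbound window: a CPU-side range and where it starts on the PCI bus. */
struct toy_window {
	uint64_t cpu_start;	/* plays the role of res->start */
	uint64_t bus_start;	/* plays the role of region.start */
	uint64_t size;
};

/*
 * Translate [cpu_addr, cpu_addr + len) into a bus address.
 * Returns false when the range does not fit inside the window.
 */
bool toy_cpu_to_bus(const struct toy_window *w, uint64_t cpu_addr,
		    uint64_t len, uint64_t *bus_addr)
{
	assert(w != NULL && bus_addr != NULL);

	if (cpu_addr < w->cpu_start || len > w->size)
		return false;
	if (cpu_addr - w->cpu_start > w->size - len)
		return false;

	/* Same shape as: *bus_addr = region.start + (dma_addr - res->start) */
	*bus_addr = w->bus_start + (cpu_addr - w->cpu_start);
	return true;
}
```

With a 1:1 window (cpu_start equal to bus_start) the result equals the input, which matches the x86 case described in the commit message. With a shifted window the two addresses differ, which is the situation the patch is addressing.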



Re: [PATCH 10/12] writeback: only allow one inflight and pending full flush

2017-09-28 Thread Amir Goldstein
On Fri, Sep 29, 2017 at 3:17 AM, Jens Axboe  wrote:
> On 09/28/2017 11:44 PM, Linus Torvalds wrote:
>> On Thu, Sep 28, 2017 at 2:41 PM, Andrew Morton
>>  wrote:
>>>
>>> test_and_set_bit()?
>>
>> If there aren't any atomicity concerns (either because of higher-level
>> locking, or because racing and having two people set the bit is fine),
>> it can be better to do them separately if the test_bit() is the common
>> case and you can avoid dirtying a cacheline that way.
>>
>> But yeah, if that is the case, it might be worth documenting, because
>> test_and_set_bit() is the more obviously appropriate "there can be
>> only one" model.
>
> It is documented though, but maybe not well enough...
>
> I've actually had to document/explain it enough times now, that it
> might be worth making a general construct. Though it has to be
> used carefully, so perhaps it's better contained as separate use
> cases.
>

Maybe change "Ensure that we only allow one of them pending"
in the comment above. Only the "allow one inflight" part is correct.

Or apply your follow up patch and be done with it...

Amir.
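
For anyone following the test_bit()/set_bit() versus test_and_set_bit() point quoted above, here is a rough user-space sketch with C11 atomics. It is not the kernel bitops API and the toy_ names are hypothetical. The first helper always performs an atomic read-modify-write. The second reads first and only writes when the bit looked clear, so the common "already set" case stays read-only, at the price of a window in which two callers can both see the bit clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic read-modify-write: exactly one caller sees "was clear". */
bool toy_test_and_set(atomic_uint *flags, unsigned int bit)
{
	assert(bit < 32);
	unsigned int mask = 1u << bit;

	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/*
 * Separate test, then set. Returns what the initial read observed.
 * Racy by design: two callers may both observe "clear" and both return false.
 */
bool toy_test_then_set(atomic_uint *flags, unsigned int bit)
{
	assert(bit < 32);
	unsigned int mask = 1u << bit;

	if (atomic_load_explicit(flags, memory_order_relaxed) & mask)
		return true;		/* common case: no write at all */

	atomic_fetch_or(flags, mask);	/* result ignored on purpose */
	return false;
}
```

Single-threaded, both helpers report "clear" on the first call and "set" on the second. The difference only shows up under concurrency, which is why the second form needs either higher-level locking or a tolerance for two winners.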


[PATCH] usb: dwc3: workaround: disable device-initiated U1/U2

2017-09-28 Thread Ran Wang
Issue: When the USB controller is configured in USB device mode,
the device initiates low power when an ACK is pending for a
data packet (DP). When operating in SuperSpeed mode and when the
internal condition for low power (U1/U2) is satisfied, the device
initiates U1/U2 even though it has just received the data packet
header (DPH) of the DP. This causes the link to enter and exit low
power before the device sends an ACK for the DP. This behavior can
cause a transaction timeout on the host for the DP.

Impact: Depending on the host transaction timeout value, the host
may time out on the transaction and retry the transfer. If the same
issue happens again, this could result in the host resetting the
device and re-enumerating.

Workaround: Disable the USB_DCTL (InitU1Ena, InitU2Ena) bits. As a
result, the device does not initiate low-power requests; however,
it can still accept low-power requests from the host/hub and enter
low power.

Signed-off-by: Ran Wang 
---
 Documentation/devicetree/bindings/usb/dwc3.txt | 2 ++
 drivers/usb/dwc3/core.c| 2 ++
 drivers/usb/dwc3/core.h| 2 ++
 drivers/usb/dwc3/ep0.c | 4 ++--
 drivers/usb/dwc3/gadget.c  | 7 +++
 5 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
index 7d4f90c16cd4..9afa8e95831e 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -47,6 +47,8 @@ Optional properties:
from P0 to P1/P2/P3 without delay.
  - snps,dis-tx-ipgap-linecheck-quirk: when set, disable u2mac linestate check
during HS transmit.
+ - snps,disable_devinit_u1u2: when set, disable device-initiated U1/U2
+   LPM request in USB device mode.
  - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
utmi_l1_suspend_n, false when asserts utmi_sleep_n
  - snps,hird-threshold: HIRD threshold
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index f13096b0900e..63d599872a43 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1143,6 +1143,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
 
dwc->tx_de_emphasis_quirk = device_property_read_bool(dev,
"snps,tx_de_emphasis_quirk");
+   dwc->disable_devinit_u1u2_quirk = device_property_read_bool(dev,
+   "snps,disable_devinit_u1u2");
device_property_read_u8(dev, "snps,tx_de_emphasis",
&tx_de_emphasis);
device_property_read_string(dev, "snps,hsphy_interface",
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 7c2f84dc218a..2be63c1a6ab6 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -896,6 +896,7 @@ struct dwc3_scratchpad_array {
  * 1   - -3.5dB de-emphasis
  * 2   - No de-emphasis
  * 3   - Reserved
+ * @disable_devinit_u1u2_quirk: disable device-initiated U1/U2 request.
  * @imod_interval: set the interrupt moderation interval in 250ns
  * increments or 0 to disable.
  */
@@ -1057,6 +1058,7 @@ struct dwc3 {
 
unsigned tx_de_emphasis_quirk:1;
unsigned tx_de_emphasis:2;
+   unsigned disable_devinit_u1u2_quirk:1;
 
u16 imod_interval;
 };
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 827e376bfa97..bbbf46f031e2 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -391,7 +391,7 @@ static int dwc3_ep0_handle_u1(struct dwc3 *dwc, enum usb_device_state state,
return -EINVAL;
 
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
-   if (set)
+   if (set && !dwc->disable_devinit_u1u2_quirk)
reg |= DWC3_DCTL_INITU1ENA;
else
reg &= ~DWC3_DCTL_INITU1ENA;
@@ -413,7 +413,7 @@ static int dwc3_ep0_handle_u2(struct dwc3 *dwc, enum usb_device_state state,
return -EINVAL;
 
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
-   if (set)
+   if (set && !dwc->disable_devinit_u1u2_quirk)
reg |= DWC3_DCTL_INITU2ENA;
else
reg &= ~DWC3_DCTL_INITU2ENA;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index f064f1549333..61141c6350dc 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3199,6 +3199,7 @@ int dwc3_gadget_init(struct dwc3 *dwc)
 {
int ret;
int irq;
+   u32 reg;
 
irq = dwc3_gadget_get_irq(dwc);
if (irq < 0) {
@@ -3275,6 +3276,12 @@ int dwc3_gadget_init(struct dwc3 *dwc)
goto err4;
}
 
+   if (dwc->disable_devinit_u1u2_quirk) {
+   reg = dwc3_readl(dwc->regs, DWC3_DCTL);
+   reg &= ~(DWC3_DCTL_INITU1ENA | 
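
(The archived patch is cut off at this point.)

For readers following the logic of the hunks above, here is a minimal
user-space sketch of what the quirk does to the DCTL value. It is not the
driver code: the bit positions, the function names, and the assumption that
the truncated gadget-init hunk clears both INITU1ENA and INITU2ENA are
illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DCTL_INITU1ENA (UINT32_C(1) << 10)
#define DCTL_INITU2ENA (UINT32_C(1) << 12)

/*
 * Mirrors the patched ep0 condition: the enable bit is set only when the
 * host asks for it (set) and the quirk is not active. In every other case
 * the bit is cleared.
 */
static uint32_t dctl_handle_u1(uint32_t reg, bool set, bool quirk)
{
	if (set && !quirk)
		reg |= DCTL_INITU1ENA;
	else
		reg &= ~DCTL_INITU1ENA;
	return reg;
}

static uint32_t dctl_handle_u2(uint32_t reg, bool set, bool quirk)
{
	if (set && !quirk)
		reg |= DCTL_INITU2ENA;
	else
		reg &= ~DCTL_INITU2ENA;
	return reg;
}

/*
 * Assumed shape of the gadget-init hunk: with the quirk active, make sure
 * neither device-initiated bit is left set. Other bits are untouched.
 */
static uint32_t dctl_gadget_init(uint32_t reg, bool quirk)
{
	if (quirk)
		reg &= ~(DCTL_INITU1ENA | DCTL_INITU2ENA);
	return reg;
}
```

With the quirk active, a SET_FEATURE request for U1 or U2 therefore leaves
the corresponding bit clear, while requests initiated by the host or hub are
not affected by these bits.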

Re: [PATCH] ratelimit: use deferred printk() version

2017-09-28 Thread Sergey Senozhatsky
Hello,

(Cc-ing Andrew  
lkml.kernel.org/r/20170928120405.18273-1-sergey.senozhat...@gmail.com )

On (09/28/17 21:13), Sergey Senozhatsky wrote:
> (Cc-ing Sasha)
> 
> On (09/28/17 21:04), Sergey Senozhatsky wrote:
> [..]
> >  : process 9121 (trinity-c78) no longer affine to cpu8
> >  : smpboot: CPU 8 is now offline
> > 
> > Fixes: 6b1d174b0c27b ("ratelimit: extend to print suppressed messages on 
> > release")
> > Signed-off-by: Sergey Senozhatsky 
> > Reported-by: Sasha Levin 
> > Cc: sta...@vger.kernel.org
> > Cc: Peter Zijlstra 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: Borislav Petkov 
> > Cc: Steven Rostedt 
> > Cc: Petr Mladek 

A quick question: who is going to pick this up? Or shall we ask Andrew?

-ss


Re: [PATCH] extcon: Split out extcon header file for consumer and provider device

2017-09-28 Thread Chanwoo Choi
Hi,

On 2017-09-29 11:03, Yoshihiro Shimoda wrote:
> Hi,
> 
>> From: Chanwoo Choi
>> Sent: Friday, September 29, 2017 9:02 AM
>>
> < snip >
>>  drivers/phy/renesas/phy-rcar-gen3-usb2.c  |   2 +-
> < snip >
>>  drivers/usb/gadget/udc/renesas_usb3.c |   2 +-
> 
> These two drivers need the modification.
> But...
> 
> < snip >
>> diff --git a/drivers/usb/renesas_usbhs/common.h 
>> b/drivers/usb/renesas_usbhs/common.h
>> index 8c5fc12ad778..a78764bc23eb 100644
>> --- a/drivers/usb/renesas_usbhs/common.h
>> +++ b/drivers/usb/renesas_usbhs/common.h
>> @@ -17,7 +17,7 @@
>>  #ifndef RENESAS_USB_DRIVER_H
>>  #define RENESAS_USB_DRIVER_H
>>
>> -#include <linux/extcon.h>
>> +#include <linux/extcon-provider.h>
> 
> Since this driver doesn't use any extcon-provider APIs for now,
> we don't need the modification, IIUC.

The v2 patch does not modify 'drivers/usb/renesas_usbhs/common.h'.
Thanks for your comment.

> 
> Best regards,
> Yoshihiro Shimoda
> 
> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


[PATCH] zsmalloc: calling zs_map_object() from irq is a bug

2017-09-28 Thread Sergey Senozhatsky
Use BUG_ON(in_interrupt()) in zs_map_object(). This is not a
new BUG_ON(), it's always been there, but was recently changed
to VM_BUG_ON(). There are several problems there. First, we use
per-CPU mappings both in zsmalloc and in zram, and an interrupt
may easily corrupt those buffers. Second, and more importantly,
we believe it's possible to start leaking sensitive information.
Consider the following case:

-> process P
swap out
 zram
  per-cpu mapping CPU1
   compress page A
-> IRQ

swap out
 zram
  per-cpu mapping CPU1
   compress page B
write page from per-cpu mapping CPU1 to zsmalloc pool
iret

-> process P
write page from per-cpu mapping CPU1 to zsmalloc pool  [*]
return

* so we store overwritten data that actually belongs to another
  page (task) and potentially contains sensitive data. When
  process P later page faults, it is going to read (swap in) that
  other task's data.

Signed-off-by: Sergey Senozhatsky 
Acked-by: Minchan Kim 
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7c38e850a8fc..685049a9048d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1349,7 +1349,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 * pools/users, we can't allow mapping in interrupt context
 * because it can corrupt another users mappings.
 */
-   WARN_ON_ONCE(in_interrupt());
+   BUG_ON(in_interrupt());
 
/* From now on, migration cannot move the object */
pin_tag(handle);
-- 
2.14.2
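
The scenario in the commit message can be replayed in user space. The
sketch below is only a model: one static buffer stands in for the per-CPU
mapping area, the "interrupt" is an ordinary call made between the two
steps of the process-context store, and all names are hypothetical.

```c
#include <string.h>

#define PAGE_BYTES 16

/* Stands in for the single per-CPU mapping area shared by all users. */
static char percpu_buf[PAGE_BYTES];

/* "Compress" a page into the per-CPU buffer (here: a bounded copy). */
static void map_and_fill(const char *page)
{
	strncpy(percpu_buf, page, PAGE_BYTES - 1);
	percpu_buf[PAGE_BYTES - 1] = '\0';
}

/* Write whatever the per-CPU buffer currently holds into a pool slot. */
static void store_to_pool(char *slot)
{
	memcpy(slot, percpu_buf, PAGE_BYTES);
}

/*
 * Process P starts storing page A; before it finishes, an interrupt on
 * the same CPU stores page B through the same buffer.
 */
static void simulate_irq_reentry(const char *page_a, const char *page_b,
				 char *slot_a, char *slot_b)
{
	map_and_fill(page_a);	/* process P: compress page A */
	map_and_fill(page_b);	/* IRQ: compress page B, same buffer */
	store_to_pool(slot_b);	/* IRQ: write to pool, then return */
	store_to_pool(slot_a);	/* process P: stores B's bytes as A [*] */
}
```

After the call, the slot that was meant to hold page A contains page B's
contents, which is the leak marked [*] above.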



[PATCH] USB: serial: qcserial: add Dell DW5818, DW5819

2017-09-28 Thread Shrirang Bagul
Dell Wireless 5819/5818 devices are re-branded Sierra Wireless MC74
series devices which by default boot with VID 0x413c and PIDs 0x81cf,
0x81d0, 0x81d1 and 0x81d2.

Signed-off-by: Shrirang Bagul 
---
 drivers/usb/serial/qcserial.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c
index ebc0beea69d6..eb9928963a53 100644
--- a/drivers/usb/serial/qcserial.c
+++ b/drivers/usb/serial/qcserial.c
@@ -174,6 +174,10 @@ static const struct usb_device_id id_table[] = {
{DEVICE_SWI(0x413c, 0x81b3)},   /* Dell Wireless 5809e Gobi(TM) 4G LTE Mobile Broadband Card (rev3) */
{DEVICE_SWI(0x413c, 0x81b5)},   /* Dell Wireless 5811e QDL */
{DEVICE_SWI(0x413c, 0x81b6)},   /* Dell Wireless 5811e QDL */
+   {DEVICE_SWI(0x413c, 0x81cf)},   /* Dell Wireless 5819 */
+   {DEVICE_SWI(0x413c, 0x81d0)},   /* Dell Wireless 5819 */
+   {DEVICE_SWI(0x413c, 0x81d1)},   /* Dell Wireless 5818 */
+   {DEVICE_SWI(0x413c, 0x81d2)},   /* Dell Wireless 5818 */
 
/* Huawei devices */
{DEVICE_HWI(0x03f0, 0x581d)},   /* HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e) */
-- 
2.11.0
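
As a rough illustration of what the new table entries achieve, the sketch
below matches a device's vendor/product pair against a small ID table. It
is a simplified model and not the USB core's matching code; the struct and
function names are made up for the example, and only the four IDs added by
this patch are listed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a USB device ID table entry. */
struct usb_id {
	uint16_t vid;
	uint16_t pid;
};

/* The four vendor/product pairs added by the patch. */
static const struct usb_id dell_dw581x_ids[] = {
	{ 0x413c, 0x81cf },	/* Dell Wireless 5819 */
	{ 0x413c, 0x81d0 },	/* Dell Wireless 5819 */
	{ 0x413c, 0x81d1 },	/* Dell Wireless 5818 */
	{ 0x413c, 0x81d2 },	/* Dell Wireless 5818 */
};

/* Return true when the (vid, pid) pair appears in the table. */
static bool id_table_matches(uint16_t vid, uint16_t pid)
{
	size_t n = sizeof(dell_dw581x_ids) / sizeof(dell_dw581x_ids[0]);

	for (size_t i = 0; i < n; i++) {
		if (dell_dw581x_ids[i].vid == vid &&
		    dell_dw581x_ids[i].pid == pid)
			return true;
	}
	return false;
}
```

A device that reports an ID outside the table is simply not claimed by the
driver, which is why the new PIDs have to be listed explicitly.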



[PATCH v2] drm/i915: Replace *_reference/unreference() or *_ref/unref with _get/put()

2017-09-28 Thread Harsha Sharma
Replace instances of drm_framebuffer_reference/unreference() with
drm_framebuffer_get/put() and drm_dev_unref() with drm_dev_put(),
because get/put is shorter and consistent with the kernel's use of
*_get/put suffixes.
Done with the following Coccinelle semantic patch:

@@ 
expression ex; 
@@ 

( 
-drm_framebuffer_unreference(ex); 
+drm_framebuffer_put(ex); 
| 
-drm_dev_unref(ex); 
+drm_dev_put(ex); 
| 
-drm_framebuffer_reference(ex); 
+drm_framebuffer_get(ex); 
) 


Signed-off-by: Harsha Sharma 
---
Changes in v2:
 -Added coccinelle patch in log message 
 -cc to all driver-specific mailing lists

 drivers/gpu/drm/i915/i915_pci.c                    |  2 +-
 drivers/gpu/drm/i915/intel_display.c               | 10 +++++-----
 drivers/gpu/drm/i915/intel_fbdev.c                 |  4 ++--
 drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_evict.c    |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c      |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_object.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_request.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c          |  2 +-
 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c |  2 +-
 10 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 09d97e0..2f106cc 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -510,7 +510,7 @@ static void i915_pci_remove(struct pci_dev *pdev)
struct drm_device *dev = pci_get_drvdata(pdev);
 
i915_driver_unload(dev);
-   drm_dev_unref(dev);
+   drm_dev_put(dev);
 }
 
 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f172755..92f8304 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2856,7 +2856,7 @@ static int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
 
if (intel_plane_ggtt_offset(state) == plane_config->base) {
fb = c->primary->fb;
-   drm_framebuffer_reference(fb);
+   drm_framebuffer_get(fb);
goto valid_fb;
}
}
@@ -2887,7 +2887,7 @@ static int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
  intel_crtc->pipe, PTR_ERR(intel_state->vma));
 
intel_state->vma = NULL;
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
return;
}
 
@@ -2908,7 +2908,7 @@ static int skl_format_to_fourcc(int format, bool rgb_order, bool alpha)
if (i915_gem_object_is_tiled(obj))
dev_priv->preserve_bios_swizzle = true;
 
-   drm_framebuffer_reference(fb);
+   drm_framebuffer_get(fb);
primary->fb = primary->state->fb = fb;
primary->crtc = primary->state->crtc = &intel_crtc->base;
 
@@ -9847,7 +9847,7 @@ struct drm_framebuffer *
if (obj->base.size < mode->vdisplay * fb->pitches[0])
return NULL;
 
-   drm_framebuffer_reference(fb);
+   drm_framebuffer_get(fb);
return fb;
 #else
return NULL;
@@ -10028,7 +10028,7 @@ int intel_get_load_detect_pipe(struct drm_connector *connector,
if (ret)
goto fail;
 
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
 
ret = drm_atomic_set_mode_for_crtc(&crtc_state->base, mode);
if (ret)
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 262e75c..1ff7149 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -189,7 +189,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
  " releasing it\n",
  intel_fb->base.width, intel_fb->base.height,
  sizes->fb_width, sizes->fb_height);
-   drm_framebuffer_unreference(&intel_fb->base);
+   drm_framebuffer_put(&intel_fb->base);
intel_fb = ifbdev->fb = NULL;
}
if (!intel_fb || WARN_ON(!intel_fb->obj)) {
@@ -624,7 +624,7 @@ static bool intel_fbdev_init_bios(struct drm_device *dev,
ifbdev->preferred_bpp = fb->base.format->cpp[0] * 8;
ifbdev->fb = fb;
 
-   drm_framebuffer_reference(&ifbdev->fb->base);
+   drm_framebuffer_get(&ifbdev->fb->base);
 
/* Final pass to check if any active pipes don't have fbs */
for_each_crtc(dev, crtc) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
index 89dc25a..a7055b1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
@@ -389,7 +389,7 @@ int i915_gem_dmabuf_mock_selftests(void)
 
err = i915_subtests(tests, i915);
 
- 
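
(The archived patch is cut off at this point.)

The rename is mechanical, but it only stays correct if every former
reference() becomes a get and every former unreference() becomes a put.
The user-space sketch below shows the get/put pairing on a toy
reference-counted object. The names fb_create(), fb_get() and fb_put() are
hypothetical, and the counter is a plain int rather than the kernel's
kref, so this is a model of the convention and not the DRM implementation.

```c
#include <stdlib.h>

/* Toy reference-counted object; not the DRM framebuffer structure. */
struct fb {
	int refcount;
	int *released;	/* set to 1 when the last reference is dropped */
};

/* Allocate an object holding one initial reference. */
static struct fb *fb_create(int *released)
{
	struct fb *fb = malloc(sizeof(*fb));

	if (!fb)
		return NULL;
	fb->refcount = 1;
	fb->released = released;
	*released = 0;
	return fb;
}

/* Take an additional reference. */
static void fb_get(struct fb *fb)
{
	fb->refcount++;
}

/* Drop a reference and free the object when the count reaches zero. */
static void fb_put(struct fb *fb)
{
	if (--fb->refcount == 0) {
		*fb->released = 1;
		free(fb);
	}
}
```

Every fb_get() needs exactly one matching fb_put(). Swapping one for the
other shifts the count by two relative to the intended value, which shows
up later as either a leak or a premature free.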

Re: [virtio-dev] Re: [PATCH v15 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-09-28 Thread Michael S. Tsirkin
On Fri, Sep 08, 2017 at 07:09:24PM +0800, Wei Wang wrote:
> On 09/08/2017 11:36 AM, Michael S. Tsirkin wrote:
> > On Tue, Aug 29, 2017 at 11:09:18AM +0800, Wei Wang wrote:
> > > On 08/29/2017 02:03 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Aug 28, 2017 at 06:08:31PM +0800, Wei Wang wrote:
> > > > > Add a new feature, VIRTIO_BALLOON_F_SG, which enables the transfer
> > > > > of balloon (i.e. inflated/deflated) pages using scatter-gather lists
> > > > > to the host.
> > > > > 
> > > > > The implementation of the previous virtio-balloon is not very
> > > > > efficient, because the balloon pages are transferred to the
> > > > > host one by one. Here is the breakdown of the time in percentage
> > > > > spent on each step of the balloon inflating process (inflating
> > > > > 7GB of an 8GB idle guest).
> > > > > 
> > > > > 1) allocating pages (6.5%)
> > > > > 2) sending PFNs to host (68.3%)
> > > > > 3) address translation (6.1%)
> > > > > 4) madvise (19%)
> > > > > 
> > > > > It takes about 4126ms for the inflating process to complete.
> > > > > The above profiling shows that the bottlenecks are stage 2)
> > > > > and stage 4).
> > > > > 
> > > > > > This patch optimizes step 2) by transferring pages to the host in
> > > > > > sgs. An sg describes a chunk of guest physically contiguous pages.
> > > > > With this mechanism, step 4) can also be optimized by doing address
> > > > > translation and madvise() in chunks rather than page by page.
> > > > > 
> > > > > With this new feature, the above ballooning process takes ~597ms
> > > > > resulting in an improvement of ~86%.
> > > > > 
> > > > > TODO: optimize stage 1) by allocating/freeing a chunk of pages
> > > > > instead of a single page each time.
> > > > > 
> > > > > Signed-off-by: Wei Wang 
> > > > > Signed-off-by: Liang Li 
> > > > > Suggested-by: Michael S. Tsirkin 
> > > > > ---
> > > > >drivers/virtio/virtio_balloon.c | 171 
> > > > > 
> > > > >include/uapi/linux/virtio_balloon.h |   1 +
> > > > >2 files changed, 155 insertions(+), 17 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/virtio/virtio_balloon.c 
> > > > > b/drivers/virtio/virtio_balloon.c
> > > > > index f0b3a0b..8ecc1d4 100644
> > > > > --- a/drivers/virtio/virtio_balloon.c
> > > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > > @@ -32,6 +32,8 @@
> > > > >#include 
> > > > >#include 
> > > > >#include 
> > > > > +#include 
> > > > > +#include 
> > > > >/*
> > > > > * Balloon device works in 4K page units.  So each page is pointed 
> > > > > to by
> > > > > @@ -79,6 +81,9 @@ struct virtio_balloon {
> > > > >   /* Synchronize access/update to this struct virtio_balloon 
> > > > > elements */
> > > > >   struct mutex balloon_lock;
> > > > > + /* The xbitmap used to record balloon pages */
> > > > > + struct xb page_xb;
> > > > > +
> > > > >   /* The array of pfns we tell the Host about. */
> > > > >   unsigned int num_pfns;
> > > > >   __virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
> > > > > @@ -141,13 +146,111 @@ static void set_page_pfns(struct 
> > > > > virtio_balloon *vb,
> > > > > page_to_balloon_pfn(page) + 
> > > > > i);
> > > > >}
> > > > > > +static int add_one_sg(struct virtqueue *vq, void *addr, uint32_t size)
> > > > > > +{
> > > > > > + struct scatterlist sg;
> > > > > > +
> > > > > > + sg_init_one(&sg, addr, size);
> > > > > > + return virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
> > > > > +}
> > > > > +
> > > > > +static void send_balloon_page_sg(struct virtio_balloon *vb,
> > > > > +  struct virtqueue *vq,
> > > > > +  void *addr,
> > > > > +  uint32_t size,
> > > > > +  bool batch)
> > > > > +{
> > > > > + unsigned int len;
> > > > > + int err;
> > > > > +
> > > > > + err = add_one_sg(vq, addr, size);
> > > > > + /* Sanity check: this can't really happen */
> > > > > + WARN_ON(err);
> > > > It might be cleaner to detect that add failed due to
> > > > ring full and kick then. Just an idea, up to you
> > > > whether to do it.
> > > > 
> > > > > +
> > > > > > + /* If batching is in use, we batch the sgs till the vq is full. */
> > > > > > + if (!batch || !vq->num_free) {
> > > > > > + virtqueue_kick(vq);
> > > > > > + wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> > > > > + /* Release all the entries if there are */
> > > > Meaning
> > > > Account for all used entries if any
> > > > ?
> > > > 
> > > > > > + while (virtqueue_get_buf(vq, &len))
> > > > > + ;
> > > > Above code is reused below. Add a function?
> > > > 
> > > > > + }
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Send balloon pages in sgs to host. The balloon pages are recorded 
> > > > > in the
> > > > > + * page xbitmap. Each bit in 
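
(The archived message is cut off at this point.)

The quoted figures work out as (4126 - 597) / 4126, roughly 85.5%, which
matches the stated improvement of about 86%. The gain comes from describing
runs of physically contiguous pages with one entry each instead of sending
every PFN. The sketch below shows that coalescing step in isolation, in
user space. It assumes the PFNs arrive sorted and without duplicates, and
the struct and function names are hypothetical; the patch itself records
pages in an xbitmap and builds scatterlist entries.

```c
#include <stddef.h>
#include <stdint.h>

/* One run of physically contiguous pages. */
struct pfn_chunk {
	uint64_t start_pfn;
	uint64_t nr_pages;
};

/*
 * Coalesce ascending, duplicate-free PFNs into contiguous chunks.
 * Writes at most max_chunks entries to out and returns how many were
 * written. PFNs that would need a further chunk are left unprocessed.
 */
static size_t coalesce_pfns(const uint64_t *pfns, size_t n,
			    struct pfn_chunk *out, size_t max_chunks)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		/* Extend the current chunk when this PFN directly follows it. */
		if (used > 0 &&
		    out[used - 1].start_pfn + out[used - 1].nr_pages == pfns[i]) {
			out[used - 1].nr_pages++;
			continue;
		}
		if (used == max_chunks)
			break;
		out[used].start_pfn = pfns[i];
		out[used].nr_pages = 1;
		used++;
	}
	return used;
}
```

For an idle guest, where most ballooned pages fall into long contiguous
runs, the number of chunks is far smaller than the number of pages, so far
fewer entries cross the virtqueue.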

Re: Kernel panic - not syncing: Fatal exception in interrupt (file_free_rcu+0x14)

2017-09-28 Thread Linus Torvalds
On Thu, Sep 28, 2017 at 8:32 PM, Kyle Sanderson  wrote:
> Not sure if the stack is crap or not, but this looks like an RCU crash?
>
> https://i.imgur.com/sBnNe1p.jpg

Hmm. Not the clearest picture, and the "Code:" line in particular is
missing the interesting part, but at a guess it's taking a fault in
put_cred(), which inlines to

if (atomic_dec_and_test(&(cred)->usage))
__put_cred(cred);

and I think it's that "cred" pointer that may be NULL, which makes
"&(cred)->usage" be a NULL pointer too, and you get a page fault when
it tries to decrement the usage count.

Now, it goes without saying that the cred pointer should never *be*
NULL on a filp that is on the RCU freeing list, because we always
initialize file->f_cred when we allocate a file to the current creds.

So there's something odd going on. Possibly entirely unrelated memory
corruption.

Nothing obvious stands out, I think we'd need to see more of a pattern
of the problem to see what is up.

 Linus
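
To make the pointer arithmetic above concrete: if the usage counter is the
first member of the structure, as the remark about "&(cred)->usage" being
NULL implies, then a NULL cred yields a NULL counter address and the
decrement faults. The user-space sketch below models that with a
hypothetical struct and C11 atomics. The NULL guard exists only so the
sketch is safe to call; it is not a suggestion that the kernel should add
one.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in with the counter as its first member. */
struct cred_like {
	atomic_int usage;
	int uid;
};

/* User-space model of atomic_dec_and_test(): true when count hits zero. */
static bool dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

/*
 * Model of the put path. Returns true when the last reference was
 * dropped. With cred == NULL and usage at offset 0, &cred->usage would
 * itself be address 0, so the sketch bails out before touching it.
 */
static bool put_cred_like(struct cred_like *cred)
{
	if (cred == NULL)
		return false;
	return dec_and_test(&cred->usage);
}
```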


[PATCH] ASoC: rockchip: Fix wrong allocation size of dapm routes

2017-09-28 Thread Jeffy Chen
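The buffer for the dapm routes is allocated with
sizeof(rockchip_routes), which is the size of the per-DAI route table
rather than of the routes it describes. Allocate room for the sum of
all num_routes entries instead.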

Fixes: d9f9c167edae ("ASoC: rockchip: Init dapm routes dynamically")
Signed-off-by: Jeffy Chen 
---

 sound/soc/rockchip/rk3399_gru_sound.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/sound/soc/rockchip/rk3399_gru_sound.c b/sound/soc/rockchip/rk3399_gru_sound.c
index 30eed83e8a13..008452e55ef8 100644
--- a/sound/soc/rockchip/rk3399_gru_sound.c
+++ b/sound/soc/rockchip/rk3399_gru_sound.c
@@ -493,14 +493,18 @@ static int rockchip_sound_of_parse_dais(struct device *dev,
struct device_node *np_codec;
struct snd_soc_dai_link *dai;
struct snd_soc_dapm_route *routes;
-   int i, index;
+   int i, index, max_num_routes;
 
card->dai_link = devm_kzalloc(dev, sizeof(rockchip_dais),
  GFP_KERNEL);
if (!card->dai_link)
return -ENOMEM;
 
-   routes = devm_kzalloc(dev, sizeof(rockchip_routes),
+   max_num_routes = 0;
+   for (i = 0; i < ARRAY_SIZE(rockchip_dais); i++)
+   max_num_routes += rockchip_routes[i].num_routes;
+
+   routes = devm_kzalloc(dev, max_num_routes * sizeof(*routes),
  GFP_KERNEL);
if (!routes)
return -ENOMEM;
-- 
2.11.0
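
The size mismatch this patch addresses can be sketched in user space. Everything below is hypothetical: the `mock_route` and `mock_route_group` types and the route names are invented for illustration and are not the driver's definitions. The point is that the size of a table of per-codec descriptors is unrelated to the space needed to hold every route those descriptors point at.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for the driver's types (illustration only). */
struct mock_route {
	const char *sink;
	const char *control;
	const char *source;
};

struct mock_route_group {
	const struct mock_route *routes;
	size_t num_routes;
};

/* Made-up routes; the names carry no meaning beyond the example. */
static const struct mock_route codec_a_routes[] = {
	{ "Headphones", NULL, "HPOL" },
	{ "Headphones", NULL, "HPOR" },
	{ "Speakers", NULL, "SPKL" },
};

static const struct mock_route codec_b_routes[] = {
	{ "DMIC", NULL, "Int Mic" },
	{ "Line Out", NULL, "LOUT" },
};

static const struct mock_route_group route_groups[] = {
	{ codec_a_routes, ARRAY_SIZE(codec_a_routes) },
	{ codec_b_routes, ARRAY_SIZE(codec_b_routes) },
};

/* Size of the descriptor table itself: the quantity the old code used. */
size_t descriptor_table_bytes(void)
{
	return sizeof(route_groups);
}

/* Space for every route of every group: the quantity the fix computes. */
size_t all_routes_bytes(void)
{
	size_t i, total = 0;

	for (i = 0; i < ARRAY_SIZE(route_groups); i++)
		total += route_groups[i].num_routes;

	return total * sizeof(struct mock_route);
}
```

In this made-up layout the descriptor table is smaller than the space needed for the five routes, so sizing the buffer from the table would under-allocate.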




Kernel panic - not syncing: Fatal exception in interrupt (file_free_rcu+0x14)

2017-09-28 Thread Kyle Sanderson
Not sure if the stack is crap or not, but this looks like an RCU crash?

https://i.imgur.com/sBnNe1p.jpg

Kyle.

FileServer ~ # uname -a
Linux FileServer.OpenWRT.local 4.12.5-gentoo #1 SMP PREEMPT Fri Aug 18
17:23:00 PDT 2017 x86_64 Intel(R) Atom(TM) CPU 330 @ 1.60GHz
GenuineIntel GNU/Linux
FileServer ~ # cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 28
model name  : Intel(R) Atom(TM) CPU  330   @ 1.60GHz
stepping: 2
microcode   : 0x20d
cpu MHz : 1999.917
cache size  : 512 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts nopl cpuid aperfmperf pni dtes64
monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
bugs:
bogomips: 3999.83
clflush size: 64
cache_alignment : 64
address sizes   : 32 bits physical, 48 bits virtual
power management:


[PATCH 1/7] regulator: axp20x: Fix poly-phase bit offset for AXP803 DCDC5/6

2017-09-28 Thread Chen-Yu Tsai
The bit offset used to check if DCDC5 and DCDC6 are tied together in
poly-phase output is wrong. It was checking against a reserved bit,
so the check was always false.

In reality, neither the reference design layout nor actually produced
boards tie these two buck regulators together. But we should still
fix it, just in case.

Fixes: 1dbe0ccb0631 ("regulator: axp20x-regulator: add support for AXP803")
Signed-off-by: Chen-Yu Tsai 
Tested-by: Maxime Ripard 
---
 drivers/regulator/axp20x-regulator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/axp20x-regulator.c b/drivers/regulator/axp20x-regulator.c
index f18b36dd57dd..376a99b7cf5d 100644
--- a/drivers/regulator/axp20x-regulator.c
+++ b/drivers/regulator/axp20x-regulator.c
@@ -590,7 +590,7 @@ static bool axp20x_is_polyphase_slave(struct axp20x_dev *axp20x, int id)
case AXP803_DCDC3:
return !!(reg & BIT(6));
case AXP803_DCDC6:
-   return !!(reg & BIT(7));
+   return !!(reg & BIT(5));
}
break;
 
-- 
2.14.2
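
A minimal user-space sketch of why testing a reserved bit can never succeed. The register model here is an assumption made for illustration: bit 7 is treated as reserved and always reading as zero, and `read_ctrl_reg()` is a hypothetical helper, not a driver function. Only the two `BIT()` tests mirror the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Assumption for this sketch: bit 7 is reserved and always reads as zero. */
#define RESERVED_MASK BIT(7)

/* Simulate reading a control register whose reserved bits read as zero. */
uint8_t read_ctrl_reg(uint8_t raw)
{
	return raw & (uint8_t)~RESERVED_MASK;
}

/* Old check: tests the reserved bit, so it can never report true. */
bool dcdc6_is_slave_old(uint8_t reg)
{
	return !!(reg & BIT(7));
}

/* Fixed check: tests bit 5, as in the patch. */
bool dcdc6_is_slave_fixed(uint8_t reg)
{
	return !!(reg & BIT(5));
}
```

Even when every writable bit is set, the old check stays false while the fixed one follows bit 5.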



[PATCH 4/7] ARM: dts: sunxi: Add dtsi for AXP81x PMIC

2017-09-28 Thread Chen-Yu Tsai
The AXP81x family of PMIC is used with the Allwinner A83T and H8 SoCs.
This includes the AXP813 and AXP818. There is no discernible difference
except the labeling. The AXP813 is paired with the A83T, while the
AXP818 is paired with the H8.

This patch adds a dtsi file for all the common bindings for these two
PMICs. Currently this is just listing all the regulator nodes. The
regulators are initialized based on their device node names.

In the future this would be expanded to include power supplies and
GPIO controllers.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/axp81x.dtsi | 139 ++
 1 file changed, 139 insertions(+)
 create mode 100644 arch/arm/boot/dts/axp81x.dtsi

diff --git a/arch/arm/boot/dts/axp81x.dtsi b/arch/arm/boot/dts/axp81x.dtsi
new file mode 100644
index 000000000000..73b761f850c5
--- /dev/null
+++ b/arch/arm/boot/dts/axp81x.dtsi
@@ -0,0 +1,139 @@
+/*
+ * Copyright 2017 Chen-Yu Tsai
+ *
+ * Chen-Yu Tsai 
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/* AXP813/818 Integrated Power Management Chip */
+
+ {
+   interrupt-controller;
+   #interrupt-cells = <1>;
+
+   regulators {
+   /* Default work frequency for buck regulators */
+   x-powers,dcdc-freq = <3000>;
+
+   reg_dcdc1: dcdc1 {
+   };
+
+   reg_dcdc2: dcdc2 {
+   };
+
+   reg_dcdc3: dcdc3 {
+   };
+
+   reg_dcdc4: dcdc4 {
+   };
+
+   reg_dcdc5: dcdc5 {
+   };
+
+   reg_dcdc6: dcdc6 {
+   };
+
+   reg_dcdc7: dcdc7 {
+   };
+
+   reg_aldo1: aldo1 {
+   };
+
+   reg_aldo2: aldo2 {
+   };
+
+   reg_aldo3: aldo3 {
+   };
+
+   reg_dldo1: dldo1 {
+   };
+
+   reg_dldo2: dldo2 {
+   };
+
+   reg_dldo3: dldo3 {
+   };
+
+   reg_dldo4: dldo4 {
+   };
+
+   reg_eldo1: eldo1 {
+   };
+
+   reg_eldo2: eldo2 {
+   };
+
+   reg_eldo3: eldo3 {
+   };
+
+   reg_fldo1: fldo1 {
+   };
+
+   reg_fldo2: fldo2 {
+   };
+
+   reg_fldo3: fldo3 {
+   };
+
+   reg_ldo_io0: ldo-io0 {
+   /* Disable by default to avoid conflicts with GPIO */
+   status = "disabled";
+   };
+
+   reg_ldo_io1: ldo-io1 {
+   /* Disable by default to avoid conflicts with GPIO */
+   status = "disabled";
+   };
+
+   reg_rtc_ldo: rtc-ldo {
+   /* RTC_LDO is a fixed, always-on regulator */
+   regulator-always-on;
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
+   };
+
+   

Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively

2017-09-28 Thread Linus Torvalds
On Thu, Sep 28, 2017 at 6:53 PM, Mimi Zohar  wrote:
>
> The locking issue isn't with validating the file hash, but with the
> setxattr, chmod, chown syscalls.  Each of these syscalls takes the
> i_rwsem exclusively before IMA (or EVM) is called.

Read my email again.

> In setxattr, chmod, chown syscalls, IMA (and EVM) are called after the
> i_rwsem is already taken.  So the locking would be:
>
> lock: i_rwsem
> lock: iint->mutex

No.

Two locks. One inner, one outer. Only the actual ones that calculates
the hash would take the outer one. Read my email.

   Linus


[PATCH 2/7] regulator: axp20x: Add support for AXP813 regulators

2017-09-28 Thread Chen-Yu Tsai
The AXP813 PMIC has 7 DC-DC buck regulators, 16 LDOs (including the
fixed RTC LDO and 2 GPIO LDOs), and 1 switchable. The drive-vbus
feature is also supported. All the hardware details are very similar
to the AXP803, with the following exceptions:

  - Extra DCDC7 buck regulator, with the same range as DCDC6

  - Switch now has a separate supply pin, instead of being chained
    internally from DCDC1

  - RTC LDO output voltage is now 1.8V

  - FLDO3 is an LDO with switchable supplies, but unconfigurable output
voltage. The voltage is always half that of its supply.

Support for FLDO3 is currently unimplemented, as it requires runtime
switching of its supplies, something the regulator subsystem does not
support. It is used in neither the reference designs nor any of the
boards actually produced.

Signed-off-by: Chen-Yu Tsai 
Tested-by: Maxime Ripard 
---
 drivers/regulator/axp20x-regulator.c | 102 +--
 include/linux/mfd/axp20x.h   |   3 ++
 2 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/drivers/regulator/axp20x-regulator.c b/drivers/regulator/axp20x-regulator.c
index 376a99b7cf5d..e1761df4cbfd 100644
--- a/drivers/regulator/axp20x-regulator.c
+++ b/drivers/regulator/axp20x-regulator.c
@@ -244,6 +244,7 @@ static const struct regulator_desc axp22x_drivevbus_regulator = {
.ops= _ops_sw,
 };
 
+/* DCDC ranges shared with AXP813 */
 static const struct regulator_linear_range axp803_dcdc234_ranges[] = {
REGULATOR_LINEAR_RANGE(500000, 0x0, 0x46, 10000),
REGULATOR_LINEAR_RANGE(1220000, 0x47, 0x4b, 20000),
@@ -426,6 +427,69 @@ static const struct regulator_desc axp809_regulators[] = {
AXP_DESC_SW(AXP809, SW, "sw", "swin", AXP22X_PWR_OUT_CTRL2, BIT(6)),
 };
 
+static const struct regulator_desc axp813_regulators[] = {
+   AXP_DESC(AXP813, DCDC1, "dcdc1", "vin1", 1600, 3400, 100,
+AXP803_DCDC1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL1, BIT(0)),
+   AXP_DESC_RANGES(AXP813, DCDC2, "dcdc2", "vin2", axp803_dcdc234_ranges,
+   76, AXP803_DCDC2_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(1)),
+   AXP_DESC_RANGES(AXP813, DCDC3, "dcdc3", "vin3", axp803_dcdc234_ranges,
+   76, AXP803_DCDC3_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(2)),
+   AXP_DESC_RANGES(AXP813, DCDC4, "dcdc4", "vin4", axp803_dcdc234_ranges,
+   76, AXP803_DCDC4_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(3)),
+   AXP_DESC_RANGES(AXP813, DCDC5, "dcdc5", "vin5", axp803_dcdc5_ranges,
+   68, AXP803_DCDC5_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(4)),
+   AXP_DESC_RANGES(AXP813, DCDC6, "dcdc6", "vin6", axp803_dcdc6_ranges,
+   72, AXP803_DCDC6_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(5)),
+   AXP_DESC_RANGES(AXP813, DCDC7, "dcdc7", "vin7", axp803_dcdc6_ranges,
+   72, AXP813_DCDC7_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(6)),
+   AXP_DESC(AXP813, ALDO1, "aldo1", "aldoin", 700, 3300, 100,
+AXP22X_ALDO1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL3, BIT(5)),
+   AXP_DESC(AXP813, ALDO2, "aldo2", "aldoin", 700, 3300, 100,
+AXP22X_ALDO2_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL3, BIT(6)),
+   AXP_DESC(AXP813, ALDO3, "aldo3", "aldoin", 700, 3300, 100,
+AXP22X_ALDO3_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL3, BIT(7)),
+   AXP_DESC(AXP813, DLDO1, "dldo1", "dldoin", 700, 3300, 100,
+AXP22X_DLDO1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(3)),
+   AXP_DESC_RANGES(AXP813, DLDO2, "dldo2", "dldoin", axp803_dldo2_ranges,
+   32, AXP22X_DLDO2_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2,
+   BIT(4)),
+   AXP_DESC(AXP813, DLDO3, "dldo3", "dldoin", 700, 3300, 100,
+AXP22X_DLDO3_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(5)),
+   AXP_DESC(AXP813, DLDO4, "dldo4", "dldoin", 700, 3300, 100,
+AXP22X_DLDO4_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(6)),
+   AXP_DESC(AXP813, ELDO1, "eldo1", "eldoin", 700, 1900, 50,
+AXP22X_ELDO1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(0)),
+   AXP_DESC(AXP813, ELDO2, "eldo2", "eldoin", 700, 1900, 50,
+AXP22X_ELDO2_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(1)),
+   AXP_DESC(AXP813, ELDO3, "eldo3", "eldoin", 700, 1900, 50,
+AXP22X_ELDO3_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(2)),
+   /* to do / check ... */
+   AXP_DESC(AXP813, FLDO1, "fldo1", "fldoin", 700, 1450, 50,
+AXP803_FLDO1_V_OUT, 0x0f, AXP22X_PWR_OUT_CTRL3, BIT(2)),
+   AXP_DESC(AXP813, FLDO2, "fldo2", "fldoin", 700, 1450, 50,
+AXP803_FLDO2_V_OUT, 0x0f, AXP22X_PWR_OUT_CTRL3, BIT(3)),
+   /*
+* TODO: FLDO3 = {DCDC5, FLDOIN} / 2
+*
+* This means FLDO3 effectively switches 
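
As a rough illustration of how a single linear range such as the dcdc1 entry maps register selectors to voltages, here is a user-space sketch. It assumes the three numeric arguments in that entry are minimum, maximum and step in millivolts. The `mock_linear_desc` type and both helpers are hypothetical and are not the regulator core's API.

```c
/* Hypothetical descriptor for a regulator with one linear voltage range. */
struct mock_linear_desc {
	int min_mv;	/* voltage at selector 0, in millivolts */
	int max_mv;	/* highest supported voltage, in millivolts */
	int step_mv;	/* increment per selector step, in millivolts */
};

/* Number of valid selectors for the range. */
int n_voltages(const struct mock_linear_desc *d)
{
	return (d->max_mv - d->min_mv) / d->step_mv + 1;
}

/* Map a selector to millivolts; returns -1 for an out-of-range selector. */
int selector_to_mv(const struct mock_linear_desc *d, int sel)
{
	if (sel < 0 || sel >= n_voltages(d))
		return -1;
	return d->min_mv + sel * d->step_mv;
}

/* Values from the dcdc1 entry in the patch: 1600..3400 mV in 100 mV steps. */
const struct mock_linear_desc mock_dcdc1 = { 1600, 3400, 100 };
```

Under these assumptions dcdc1 has 19 selectors, which fits within the 0x1f voltage mask used for that entry.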

[PATCH 3/7] mfd: axp20x: Add axp20x-regulator cell for AXP813

2017-09-28 Thread Chen-Yu Tsai
Now that axp20x-regulator supports AXP813, we can add a cell for it
to enable it.

Signed-off-by: Chen-Yu Tsai 
Tested-by: Maxime Ripard 
---
 drivers/mfd/axp20x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mfd/axp20x.c b/drivers/mfd/axp20x.c
index 336de66ca408..2468b431bb22 100644
--- a/drivers/mfd/axp20x.c
+++ b/drivers/mfd/axp20x.c
@@ -876,6 +876,8 @@ static struct mfd_cell axp813_cells[] = {
.name   = "axp221-pek",
.num_resources  = ARRAY_SIZE(axp803_pek_resources),
.resources  = axp803_pek_resources,
+   }, {
+   .name   = "axp20x-regulator",
}
 };
 
-- 
2.14.2



[PATCH 0/7] regulator: axp20x: Add support for AXP813/818 regulators

2017-09-28 Thread Chen-Yu Tsai
Hi everyone,

This series adds support for the X-Powers AXP813/818 [1] PMICs'
regulators. The series is quite straightforward. There are no
compile-time dependencies between the driver patches, so each can go
through its respective (mfd or regulator) tree.

Patch 1 fixes a wrong bit offset for the AXP803 DCDC5/6 poly-phase
detection code. This code path is not exercised as we don't have any
boards that tie these two outputs together.

Patch 2 adds driver support for the AXP813 regulators. The DT binding
part was merged together with the PMIC compatible string and basic
descriptions.

Patch 3 adds an axp20x-regulator cell for AXP813, thereby enabling the
regulators.

Patch 4 adds a shared dtsi file for the PMIC. This currently contains
a list of regulator nodes, but will be expanded with Quentin's power
supply work.

Patches 5 through 7 add regulator nodes to board dts files for the A83T
boards that I have. They are not squashed together as each file has
substantial additions.

Originally my work also included enabling SDIO WiFi and Ethernet. But
the Ethernet bindings were reverted, and SDIO probing somehow didn't
work after v4.14-rc1. Everything can be found here:

https://github.com/wens/linux/tree/a83t-regulator-wifi-eth

Please have a look and merge if everything looks OK.


Regards
ChenYu


[1] AXP813 and AXP818 are functionally identical. They have different
labels and are bundled with different SoCs (A83T and H8), as a sort
of product or market segmentation.


Chen-Yu Tsai (7):
  regulator: axp20x: Fix poly-phase bit offset for AXP803 DCDC5/6
  regulator: axp20x: Add support for AXP813 regulators
  mfd: axp20x: Add axp20x-regulator cell for AXP813
  ARM: dts: sunxi: Add dtsi for AXP81x PMIC
  ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes
  ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes
  ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator
nodes

 .../{sun8i-a83t-bananapi-m3.dts => axp81x.dtsi}| 157 ++---
 .../boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts  | 126 -
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts   | 134 +-
 arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts   | 150 +++-
 drivers/mfd/axp20x.c   |   2 +
 drivers/regulator/axp20x-regulator.c   | 104 +-
 include/linux/mfd/axp20x.h |   3 +
 7 files changed, 582 insertions(+), 94 deletions(-)
 copy arch/arm/boot/dts/{sun8i-a83t-bananapi-m3.dts => axp81x.dtsi} (52%)

-- 
2.14.2



[PATCH 7/7] ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator nodes

2017-09-28 Thread Chen-Yu Tsai
This patch adds device nodes for all the regulators of the AXP818 PMIC.
References to the 3V dummy regulator are replaced, and it is disabled.
The 3.3V and 5V dummy regulators are also disabled.

Signed-off-by: Chen-Yu Tsai 
---
 .../boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts  | 126 -
 1 file changed, 124 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts b/arch/arm/boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts
index 1f0d60afb25b..1c7371d6bbb2 100644
--- a/arch/arm/boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts
+++ b/arch/arm/boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts
@@ -65,7 +65,7 @@
  {
pinctrl-names = "default";
pinctrl-0 = <_pins>;
-   vmmc-supply = <&reg_vcc3v0>;
+   vmmc-supply = <&reg_dcdc1>;
cd-gpios = < 5 6 GPIO_ACTIVE_HIGH>; /* PF6 */
bus-width = <4>;
cd-inverted;
@@ -75,7 +75,8 @@
  {
pinctrl-names = "default";
pinctrl-0 = <_8bit_emmc_pins>;
-   vmmc-supply = <&reg_vcc3v0>;
+   vmmc-supply = <&reg_dcdc1>;
+   vqmmc-supply = <&reg_dcdc1>;
bus-width = <8>;
non-removable;
cap-mmc-hw-reset;
@@ -104,6 +105,8 @@
reg = <0x3a3>;
interrupt-parent = <_intc>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
+   eldoin-supply = <&reg_dcdc1>;
+   swin-supply = <&reg_dcdc1>;
};
 
ac100: codec@e89 {
@@ -131,6 +134,125 @@
};
 };
 
+#include "axp81x.dtsi"
+
+&reg_aldo1 {
+   regulator-always-on;
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
+   regulator-name = "vcc18-csi2-dsi-efuse-hdmi";
+};
+
+&reg_aldo2 {
+   regulator-always-on;
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
+   regulator-name = "vdd-drampll-vcc18-pll-adc-cpvdd-ldoin";
+};
+
+&reg_aldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <3000000>;
+   regulator-max-microvolt = <3000000>;
+   regulator-name = "vcc-pl-avcc";
+};
+
+&reg_dcdc1 {
+   regulator-always-on;
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-name = "vcc-3v3";
+};
+
+&reg_dcdc2 {
+   regulator-always-on;
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1100000>;
+   regulator-name = "vdd-cpua";
+};
+
+&reg_dcdc3 {
+   regulator-always-on;
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1100000>;
+   regulator-name = "vdd-cpub";
+};
+
+&reg_dcdc4 {
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1100000>;
+   regulator-name = "vdd-gpu";
+};
+
+&reg_dcdc5 {
+   regulator-always-on;
+   regulator-min-microvolt = <1500000>;
+   regulator-max-microvolt = <1500000>;
+   regulator-name = "vcc-dram";
+};
+
+&reg_dcdc6 {
+   regulator-always-on;
+   regulator-min-microvolt = <900000>;
+   regulator-max-microvolt = <900000>;
+   regulator-name = "vdd-sys-vdd09-usb0-hdmi";
+};
+
+&reg_dldo2 {
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-name = "vcc-mipi-3v3";
+};
+
+&reg_dldo4 {
+   /*
+* The PHY requires 20ms after all voltages are applied until core
+* logic is ready and 30ms after the reset pin is de-asserted.
+* Set a 100ms delay to account for PMIC ramp time and board traces.
+*/
+   regulator-enable-ramp-delay = <100000>;
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-name = "vdd33-pd-ave-ephy";
+};
+
+&reg_fldo1 {
+   regulator-min-microvolt = <1080000>;
+   regulator-max-microvolt = <1320000>;
+   regulator-name = "vdd12-hsic";
+};
+
+&reg_fldo2 {
+   /*
+* Despite the embedded CPUs core not being used in any way,
+* this must remain on or the system will hang.
+*/
+   regulator-always-on;
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1100000>;
+   regulator-name = "vdd-cpus";
+};
+
+&reg_rtc_ldo {
+   regulator-name = "vcc-rtc-vdd1v8-io-vdd18-lvds";
+};
+
+&reg_sw {
+   regulator-name = "vcc-wifi";
+};
+
+&reg_vcc3v0 {
+   status = "disabled";
+};
+
+&reg_vcc3v3 {
+   status = "disabled";
+};
+
+&reg_vcc5v0 {
+   status = "disabled";
+};
+
  {
pinctrl-names = "default";
pinctrl-0 = <_pb_pins>;
-- 
2.14.2



[PATCH 6/7] ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes

2017-09-28 Thread Chen-Yu Tsai
This patch adds device nodes for all the regulators of the AXP813 PMIC.
References to the 3.3V dummy regulator are replaced, and it is disabled.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts | 134 ++-
 1 file changed, 132 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
index 2bafd7e99ef7..c7dae2e5a668 100644
--- a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
+++ b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
@@ -71,7 +71,7 @@
  {
pinctrl-names = "default";
pinctrl-0 = <_pins>;
-   vmmc-supply = <_vcc3v3>;
+   vmmc-supply = <_dcdc1>;
bus-width = <4>;
cd-gpios = < 5 6 GPIO_ACTIVE_HIGH>; /* PF6 */
cd-inverted;
@@ -81,7 +81,7 @@
  {
pinctrl-names = "default";
pinctrl-0 = <_8bit_emmc_pins>;
-   vmmc-supply = <_vcc3v3>;
+   vmmc-supply = <_dcdc1>;
bus-width = <8>;
non-removable;
cap-mmc-hw-reset;
@@ -96,6 +96,10 @@
reg = <0x3a3>;
interrupt-parent = <_intc>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
+   eldoin-supply = <_dcdc1>;
+   fldoin-supply = <_dcdc5>;
+   swin-supply = <_dcdc1>;
+   x-powers,drive-vbus-en;
};
 
ac100: codec@e89 {
@@ -123,6 +127,128 @@
};
 };
 
+#include "axp81x.dtsi"
+
+_aldo1 {
+   regulator-always-on;
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-name = "vcc18-csi2-dsi-efuse-hdmi";
+};
+
+_aldo2 {
+   regulator-always-on;
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-name = "vdd-drampll-vcc18-pll-adc-cpvdd-ldoin";
+};
+
+_aldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <300>;
+   regulator-max-microvolt = <300>;
+   regulator-name = "vcc-pl-avcc";
+};
+
+_dcdc1 {
+   /* schematics says 3.1V but FEX file says 3.3V */
+   regulator-always-on;
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vcc-3v3";
+};
+
+_dcdc2 {
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpua";
+};
+
+_dcdc3 {
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpub";
+};
+
+_dcdc4 {
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-gpu";
+};
+
+_dcdc5 {
+   regulator-always-on;
+   regulator-min-microvolt = <120>;
+   regulator-max-microvolt = <120>;
+   regulator-name = "vcc-dram";
+};
+
+_dcdc6 {
+   regulator-always-on;
+   regulator-min-microvolt = <90>;
+   regulator-max-microvolt = <90>;
+   regulator-name = "vdd-sys-vdd09-usb0-hdmi";
+};
+
+_dldo1 {
+   /*
+* This powers both the WiFi/BT module's main power, I/O supply,
+* and external pull-ups on all the data lines. It should be set
+* to the same voltage as the I/O supply (DCDC1 in this case) to
+* avoid any leakage or mismatch.
+*/
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vcc-wifi";
+};
+
+_dldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <250>;
+   regulator-max-microvolt = <250>;
+   regulator-name = "vcc-pd";
+};
+
+_drivevbus {
+   regulator-name = "usb0-vbus";
+   status = "okay";
+};
+
+_fldo1 {
+   regulator-min-microvolt = <108>;
+   regulator-max-microvolt = <132>;
+   regulator-name = "vdd12-hsic";
+};
+
+_fldo2 {
+   /*
+* Despite the embedded CPUs core not being used in any way,
+* this must remain on or the system will hang.
+*/
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpus";
+};
+
+_rtc_ldo {
+   regulator-name = "vcc-rtc-vdd1v8-io-vdd18-lvds";
+};
+
+_sw {
+   /*
+* The PHY requires 20ms after all voltages
+* are applied until core logic is ready and
+* 30ms after the reset pin is de-asserted.
+* Set a 100ms delay to account for PMIC
+* ramp time and board traces.
+*/
+   regulator-enable-ramp-delay = <10>;
+   regulator-name = "vcc-gmac";
+};
+
 _usb1_vbus {
gpio = < 3 24 GPIO_ACTIVE_HIGH>; /* PD24 */
status = "okay";
@@ -132,6 +258,10 @@
status = "disabled";
 };
 
+_vcc3v3 {
+   status = "disabled";
+};
+
 _vcc5v0 {
status = "disabled";
 };
-- 
2.14.2



[PATCH 5/7] ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes

2017-09-28 Thread Chen-Yu Tsai
This patch adds device nodes for all the regulators of the AXP818 PMIC.
References to the 3.3V dummy regulator are replaced, and it is disabled.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts | 150 ++-
 1 file changed, 148 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts b/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts
index 716a205c6dbb..7e1b1f6ca5f4 100644
--- a/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts
+++ b/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts
@@ -127,7 +127,7 @@
  {
pinctrl-names = "default";
pinctrl-0 = <_pins>;
-   vmmc-supply = <_vcc3v3>;
+   vmmc-supply = <_dcdc1>;
bus-width = <4>;
cd-gpios = < 5 6 GPIO_ACTIVE_HIGH>; /* PF6 */
cd-inverted;
@@ -137,7 +137,7 @@
  {
pinctrl-names = "default";
pinctrl-0 = <_8bit_emmc_pins>;
-   vmmc-supply = <_vcc3v3>;
+   vmmc-supply = <_dcdc1>;
bus-width = <8>;
non-removable;
cap-mmc-hw-reset;
@@ -152,6 +152,9 @@
reg = <0x3a3>;
interrupt-parent = <_intc>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
+   eldoin-supply = <_dcdc1>;
+   swin-supply = <_dcdc1>;
+   x-powers,drive-vbus-en;
};
 
ac100: codec@e89 {
@@ -179,6 +182,145 @@
};
 };
 
+#include "axp81x.dtsi"
+
+_aldo1 {
+   regulator-always-on;
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-name = "vcc18-csi2-dsi-efuse-hdmi-d4dp";
+};
+
+_aldo2 {
+   regulator-always-on;
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-name = "vdd-drampll-vcc18-pll-adc-cpvdd-ldoin";
+};
+
+_aldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <300>;
+   regulator-max-microvolt = <300>;
+   regulator-name = "vcc-pl-avcc";
+};
+
+_dcdc1 {
+   /*
+* The schematics say this should be 3.3V, but the FEX file says
+* it should be 3V. The latter makes sense, as the WiFi module's
+* I/O is indirectly powered from DCDC1, through SW. It is rated
+* at 2.98V maximum.
+*/
+   regulator-always-on;
+   regulator-min-microvolt = <300>;
+   regulator-max-microvolt = <300>;
+   regulator-name = "vcc-3v";
+};
+
+_dcdc2 {
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpua";
+};
+
+_dcdc3 {
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpub";
+};
+
+_dcdc4 {
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-gpu";
+};
+
+_dcdc5 {
+   regulator-always-on;
+   regulator-min-microvolt = <150>;
+   regulator-max-microvolt = <150>;
+   regulator-name = "vcc-dram";
+};
+
+_dcdc6 {
+   regulator-always-on;
+   regulator-min-microvolt = <90>;
+   regulator-max-microvolt = <90>;
+   regulator-name = "vdd-sys-vdd09-usb0-hdmi";
+};
+
+_dldo2 {
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vcc-mipi-3v3-d4dpio";
+};
+
+_dldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <250>;
+   regulator-max-microvolt = <250>;
+   regulator-name = "vcc-pd-vdd25-ephy";
+};
+
+_dldo4 {
+   /*
+* The PHY requires 20ms after all voltages are applied until core
+* logic is ready and 30ms after the reset pin is de-asserted.
+* Set a 100ms delay to account for PMIC ramp time and board traces.
+*/
+   regulator-enable-ramp-delay = <10>;
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vdd33-ephy";
+};
+
+_drivevbus {
+   regulator-name = "usb0-vbus";
+   status = "okay";
+};
+
+_eldo1 {
+   regulator-min-microvolt = <120>;
+   regulator-max-microvolt = <120>;
+   regulator-name = "vdd12-d4dp-1";
+};
+
+_eldo2 {
+   regulator-min-microvolt = <120>;
+   regulator-max-microvolt = <120>;
+   regulator-name = "vdd12-d4dp-2";
+};
+
+_fldo1 {
+   /* TODO should be handled by USB PHY */
+   regulator-always-on;
+   regulator-min-microvolt = <108>;
+   regulator-max-microvolt = <132>;
+   regulator-name = "vdd12-hsic";
+};
+
+_fldo2 {
+   /*
+* Despite the embedded CPUs core not being used in any way,
+* this must remain on or the system will hang.
+*/
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpus";
+};
+

[PATCH 0/7] regulator: axp20x: Add support for AXP813/818 regulators

2017-09-28 Thread Chen-Yu Tsai
Hi everyone,

This series adds support for the X-Powers AXP813/818 [1] PMICs'
regulators. The series is quite straightforward. There are no compile-time
dependencies between the driver patches, so each can go through
its respective (mfd or regulator) tree.

Patch 1 fixes a wrong bit offset for the AXP803 DCDC5/6 poly-phase
detection code. This code path is not exercised as we don't have any
boards that tie these two outputs together.

Patch 2 adds driver support for the AXP813 regulators. The DT binding
part was merged together with the PMIC compatible string and basic
descriptions.

Patch 3 adds an axp20x-regulator cell for AXP813, thereby enabling the
regulators.

Patch 4 adds a shared dtsi file for the PMIC. This currently contains
a list of regulator nodes, but will be expanded with Quentin's power
supply work.

Patches 5 through 7 add regulator nodes to board dts files for the A83T
boards that I have. They are not squashed together as each file has
substantial additions.

Originally my work also included enabling SDIO WiFi and Ethernet. But
the Ethernet bindings were reverted, and SDIO probing somehow didn't
work after v4.14-rc1. Everything can be found here:

https://github.com/wens/linux/tree/a83t-regulator-wifi-eth

Please have a look and merge if everything looks OK.


Regards
ChenYu


[1] AXP813 and AXP818 are functionally identical. They have different
labels and are bundled with different SoCs (A83T and H8), as a sort
of product or market segmentation.


Chen-Yu Tsai (7):
  regulator: axp20x: Fix poly-phase bit offset for AXP803 DCDC5/6
  regulator: axp20x: Add support for AXP813 regulators
  mfd: axp20x: Add axp20x-regulator cell for AXP813
  ARM: dts: sunxi: Add dtsi for AXP81x PMIC
  ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes
  ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes
  ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator
nodes

 .../{sun8i-a83t-bananapi-m3.dts => axp81x.dtsi}| 157 ++---
 .../boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts  | 126 -
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts   | 134 +-
 arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts   | 150 +++-
 drivers/mfd/axp20x.c   |   2 +
 drivers/regulator/axp20x-regulator.c   | 104 +-
 include/linux/mfd/axp20x.h |   3 +
 7 files changed, 582 insertions(+), 94 deletions(-)
 copy arch/arm/boot/dts/{sun8i-a83t-bananapi-m3.dts => axp81x.dtsi} (52%)

-- 
2.14.2



[RFC PATCH v6 0/3] ACPI / EC: Tune the timing of EC events arrival during S3-exit

2017-09-28 Thread Lv Zheng
If EC events that occur during the BIOS S3-exit and early OS S3-exit steps
can be detected by the OS earlier, there are fewer driver ordering issues
between acpi_ec_resume() and other drivers' .resume() hooks (e.g.
acpi_button_resume()).

However, EC firmware is known to drop EC events during S3, and it takes
time for EC firmware to initialize (maximum 1.4 observed) while
Windows behaves normally, so detecting EC events earlier might just be a
workaround for other drivers (they should be aware of this ordering issue
and deal with it themselves). As such, this patchset is marked as an RFC.

If the Linux EC driver starts detecting events during early OS S3-exit, it
needs to poll EC events periodically during the noirq stages, because no
EC event triggering source is available in these stages.

This patchset implements earlier EC event handling for Linux.

Lv Zheng (3):
  ACPI / EC: Fix possible driver order issue by moving EC event handling
earlier
  ACPI / EC: Add event detection support for noirq stages
  ACPI / EC: Enable noirq stage event detection

 drivers/acpi/ec.c   | 128 +++-
 drivers/acpi/internal.h |   1 +
 2 files changed, 118 insertions(+), 11 deletions(-)

-- 
2.7.4



[RFC PATCH v6 1/3] ACPI / EC: Fix possible driver order issue by moving EC event handling earlier

2017-09-28 Thread Lv Zheng
This patch tries to detect EC events earlier after resume, so that an
event that occurred before acpi_ec_unblock_transactions() is invoked can be
detected by acpi_ec_unblock_transactions(), which is the earliest EC driver
call after resume.

However, after the noirq stage, if an event occurred after
acpi_ec_unblock_transactions() and before acpi_ec_resume(), there was no
means to detect and trigger it right then; it could only be detected and
handled after acpi_ec_resume().

Now the final logic is:
1. If ec_freeze_events=Y, event handling is stopped in acpi_ec_suspend(),
   restarted in acpi_ec_resume();
2. If ec_freeze_events=N, event handling is stopped in
   acpi_ec_block_transactions(), restarted in
   acpi_ec_unblock_transactions();
3. In order to handle the conflict between the edge-triggered nature of the
   EC IRQ and the Linux noirq stage, advance_transaction() is invoked where
   event handling is enabled and the noirq stage has ended.

Known issue:
1. An event that occurs between acpi_ec_unblock_transactions() and
   acpi_ec_resume() may still lead to the order issue. This can only be
   fixed by adding a periodic detection mechanism during the noirq stage.
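
The stop/restart rules listed above can be sketched as a small userspace
model. This is only an illustration, not kernel code: struct ec_policy,
enum ec_phase and the helper names are hypothetical, and the combined
condition mirrors the acpi_ec_no_sleep_events() helper that the diff below
introduces (acpi_sleep_no_ec_events() && ec_freeze_events).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two inputs of acpi_ec_no_sleep_events(). */
struct ec_policy {
	bool sleep_no_ec_events;	/* models acpi_sleep_no_ec_events() */
	bool freeze_events;		/* models the ec_freeze_events parameter */
};

/* Hypothetical names for the four driver entry points mentioned above. */
enum ec_phase {
	EC_PHASE_SUSPEND,		/* acpi_ec_suspend() */
	EC_PHASE_BLOCK_TRANSACTIONS,	/* acpi_ec_block_transactions() */
	EC_PHASE_UNBLOCK_TRANSACTIONS,	/* acpi_ec_unblock_transactions() */
	EC_PHASE_RESUME,		/* acpi_ec_resume() */
};

/* Same boolean combination as the helper added by this patch. */
static bool no_sleep_events(const struct ec_policy *p)
{
	return p->sleep_no_ec_events && p->freeze_events;
}

/* Phase in which event handling is stopped on the way into S3. */
static enum ec_phase event_stop_phase(const struct ec_policy *p)
{
	return no_sleep_events(p) ? EC_PHASE_SUSPEND
				  : EC_PHASE_BLOCK_TRANSACTIONS;
}

/* Phase in which event handling is restarted on the way out of S3. */
static enum ec_phase event_restart_phase(const struct ec_policy *p)
{
	return no_sleep_events(p) ? EC_PHASE_RESUME
				  : EC_PHASE_UNBLOCK_TRANSACTIONS;
}
```

In this model the stop and restart points always form a matching pair:
either suspend/resume or block/unblock.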

Signed-off-by: Lv Zheng 
Tested-by: Tomislav Ivek 
Tested-by: Luya Tshimbalanga 
---
 drivers/acpi/ec.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index df84246..f1f320b 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -249,6 +249,11 @@ static bool acpi_ec_started(struct acpi_ec *ec)
   !test_bit(EC_FLAGS_STOPPED, &ec->flags);
 }
 
+static bool acpi_ec_no_sleep_events(void)
+{
+   return acpi_sleep_no_ec_events() && ec_freeze_events;
+}
+
 static bool acpi_ec_event_enabled(struct acpi_ec *ec)
 {
/*
@@ -260,14 +265,14 @@ static bool acpi_ec_event_enabled(struct acpi_ec *ec)
return false;
/*
 * However, disabling the event handling is experimental for late
-* stage (suspend), and is controlled by the boot parameter of
-* "ec_freeze_events":
+* stage (suspend), and is controlled by
+* "acpi_ec_no_sleep_events()":
 * 1. true:  The EC event handling is disabled before entering
 *   the noirq stage.
 * 2. false: The EC event handling is automatically disabled as
 *   soon as the EC driver is stopped.
 */
-   if (ec_freeze_events)
+   if (acpi_ec_no_sleep_events())
return acpi_ec_started(ec);
else
return test_bit(EC_FLAGS_STARTED, &ec->flags);
@@ -524,8 +529,8 @@ static bool acpi_ec_query_flushed(struct acpi_ec *ec)
 static void __acpi_ec_flush_event(struct acpi_ec *ec)
 {
/*
-* When ec_freeze_events is true, we need to flush events in
-* the proper position before entering the noirq stage.
+* When acpi_ec_no_sleep_events() is true, we need to flush events
+* in the proper position before entering the noirq stage.
 */
wait_event(ec->wait, acpi_ec_query_flushed(ec));
if (ec_query_wq)
@@ -948,7 +953,8 @@ static void acpi_ec_start(struct acpi_ec *ec, bool resuming)
if (!resuming) {
acpi_ec_submit_request(ec);
ec_dbg_ref(ec, "Increase driver");
-   }
+   } else if (!acpi_ec_no_sleep_events())
+   __acpi_ec_enable_event(ec);
ec_log_drv("EC started");
}
spin_unlock_irqrestore(&ec->lock, flags);
@@ -980,7 +986,7 @@ static void acpi_ec_stop(struct acpi_ec *ec, bool suspending)
if (!suspending) {
acpi_ec_complete_request(ec);
ec_dbg_ref(ec, "Decrease driver");
-   } else if (!ec_freeze_events)
+   } else if (!acpi_ec_no_sleep_events())
__acpi_ec_disable_event(ec);
clear_bit(EC_FLAGS_STARTED, &ec->flags);
clear_bit(EC_FLAGS_STOPPED, &ec->flags);
@@ -1910,7 +1916,7 @@ static int acpi_ec_suspend(struct device *dev)
struct acpi_ec *ec =
acpi_driver_data(to_acpi_device(dev));
 
-   if (acpi_sleep_no_ec_events() && ec_freeze_events)
+   if (acpi_ec_no_sleep_events())
acpi_ec_disable_event(ec);
return 0;
 }
@@ -1946,7 +1952,18 @@ static int acpi_ec_resume(struct device *dev)
struct acpi_ec *ec =
acpi_driver_data(to_acpi_device(dev));
 
-   acpi_ec_enable_event(ec);
+   if (acpi_ec_no_sleep_events())
+   acpi_ec_enable_event(ec);
+   else {
+   /*
+* Though whether there is an event pending has been
+* checked in acpi_ec_unblock_transactions() when
+* acpi_ec_no_sleep_events() is false, check it one more
+* time after 

[RFC PATCH v6 2/3] ACPI / EC: Add event detection support for noirq stages

2017-09-28 Thread Lv Zheng
This patch adds a timer to poll EC events:
1. between acpi_ec_suspend() and acpi_ec_block_transactions(),
2. between acpi_ec_unblock_transactions() and acpi_ec_resume().
During these periods, if an EC event occurs, we have no means to detect
it. Thus events occurring in late S3-entry could be dropped, and
events occurring in early S3-exit could be deferred to acpi_ec_resume().

This patch solves event losses in S3-entry and resume ordering issues in
S3-exit by periodically polling EC events during these periods.
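
The polling scheme described above can be modelled in userspace as follows.
This is a rough sketch under stated assumptions, not the driver code: struct
poller and the poller_*() helpers are hypothetical, time is passed in as
plain millisecond values instead of jiffies, and the re-arm in
poller_tick() is an assumption about how the timer callback behaves. The
500 ms default and the "start only once, poll immediately" behaviour follow
ACPI_EC_EVENT_INTERVAL and ec_start_gpe_poller() in the diff below.

```c
#include <stdbool.h>

#define EVENT_INTERVAL_MS 500	/* mirrors ACPI_EC_EVENT_INTERVAL */

/* Hypothetical model of the poller state kept in struct acpi_ec. */
struct poller {
	bool polling;			/* models EC_FLAGS_GPE_POLLING */
	unsigned int interval_ms;	/* models ec_detect_noirq_interval */
	unsigned long next_due_ms;	/* when the modelled timer fires next */
	unsigned int polls;		/* number of event checks performed */
};

/* Start polling; only the idle -> polling transition does any work. */
static bool poller_start(struct poller *p, unsigned long now_ms)
{
	if (p->polling)
		return false;
	p->polling = true;
	p->polls++;			/* immediate check, no initial delay */
	p->next_due_ms = now_ms + p->interval_ms;
	return true;
}

/* Stop polling; returns true only if the poller was running. */
static bool poller_stop(struct poller *p)
{
	if (!p->polling)
		return false;
	p->polling = false;
	return true;
}

/* Timer model: check for events when due, then re-arm (assumed). */
static bool poller_tick(struct poller *p, unsigned long now_ms)
{
	if (!p->polling || now_ms < p->next_due_ms)
		return false;
	p->polls++;
	p->next_due_ms = now_ms + p->interval_ms;
	return true;
}
```

Guarding the start and stop paths with a single flag keeps repeated calls
harmless, which is the same reason the patch uses test_and_set_bit() and
test_and_clear_bit() on EC_FLAGS_GPE_POLLING.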

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196129 [#1]
Signed-off-by: Lv Zheng 
Tested-by: Tomislav Ivek 
---
 drivers/acpi/ec.c   | 93 +++--
 drivers/acpi/internal.h |  1 +
 2 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index f1f320b..389c499 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "internal.h"
@@ -102,6 +103,7 @@ enum ec_command {
 #define ACPI_EC_CLEAR_MAX  100 /* Maximum number of events to query
 * when trying to clear the EC */
 #define ACPI_EC_MAX_QUERIES16  /* Maximum number of parallel queries */
+#define ACPI_EC_EVENT_INTERVAL 500 /* Detecting event every 500ms */
 
 enum {
EC_FLAGS_QUERY_ENABLED, /* Query is enabled */
@@ -113,6 +115,7 @@ enum {
EC_FLAGS_STARTED,   /* Driver is started */
EC_FLAGS_STOPPED,   /* Driver is stopped */
EC_FLAGS_GPE_MASKED,/* GPE masked */
+   EC_FLAGS_GPE_POLLING,   /* GPE polling is enabled */
 };
 
 #define ACPI_EC_COMMAND_POLL   0x01 /* Available for command byte */
@@ -154,6 +157,15 @@ static bool ec_no_wakeup __read_mostly;
 module_param(ec_no_wakeup, bool, 0644);
 MODULE_PARM_DESC(ec_no_wakeup, "Do not wake up from suspend-to-idle");
 
+static bool ec_detect_noirq_events __read_mostly;
+module_param(ec_detect_noirq_events, bool, 0644);
+MODULE_PARM_DESC(ec_detect_noirq_events, "Enabling event detection during noirq stage");
+
+static unsigned int
+ec_detect_noirq_interval __read_mostly = ACPI_EC_EVENT_INTERVAL;
+module_param(ec_detect_noirq_interval, uint, 0644);
+MODULE_PARM_DESC(ec_detect_noirq_interval, "Event detection interval(ms) during noirq stage");
+
 struct acpi_ec_query_handler {
struct list_head node;
acpi_ec_query_func func;
@@ -358,6 +370,48 @@ static inline bool acpi_ec_is_gpe_raised(struct acpi_ec *ec)
return (gpe_status & ACPI_EVENT_FLAG_STATUS_SET) ? true : false;
 }
 
+static void acpi_ec_gpe_tick(struct acpi_ec *ec)
+{
+   mod_timer(&ec->timer,
+ jiffies + msecs_to_jiffies(ec_detect_noirq_interval));
+}
+
+static void ec_start_gpe_poller(struct acpi_ec *ec)
+{
+   unsigned long flags;
+   bool start_tick = false;
+
+   if (!acpi_ec_no_sleep_events() || !ec_detect_noirq_events)
+   return;
+   spin_lock_irqsave(&ec->lock, flags);
+   if (!test_and_set_bit(EC_FLAGS_GPE_POLLING, &ec->flags)) {
+   ec_log_drv("GPE poller started");
+   start_tick = true;
+   /* kick off GPE polling without delay */
+   advance_transaction(ec);
+   }
+   spin_unlock_irqrestore(&ec->lock, flags);
+   if (start_tick)
+   acpi_ec_gpe_tick(ec);
+}
+
+static void ec_stop_gpe_poller(struct acpi_ec *ec)
+{
+   unsigned long flags;
+   bool stop_tick = false;
+
+   if (!acpi_ec_no_sleep_events() || !ec_detect_noirq_events)
+   return;
+   spin_lock_irqsave(&ec->lock, flags);
+   if (test_and_clear_bit(EC_FLAGS_GPE_POLLING, &ec->flags))
+   stop_tick = true;
+   spin_unlock_irqrestore(&ec->lock, flags);
+   if (stop_tick) {
+   del_timer_sync(&ec->timer);
+   ec_log_drv("GPE poller stopped");
+   }
+}
+
 static inline void acpi_ec_enable_gpe(struct acpi_ec *ec, bool open)
 {
if (open)
@@ -1017,6 +1071,12 @@ static void acpi_ec_leave_noirq(struct acpi_ec *ec)
spin_unlock_irqrestore(&ec->lock, flags);
 }
 
+/*
+ * Note: this API is prepared for tuning the order of the ACPI
+ * suspend/resume steps as the last entry of EC during suspend, thus it
+ * must be invoked after acpi_ec_suspend() or everything should be done in
+ * acpi_ec_suspend().
+ */
 void acpi_ec_block_transactions(void)
 {
struct acpi_ec *ec = first_ec;
@@ -1028,16 +1088,28 @@ void acpi_ec_block_transactions(void)
/* Prevent transactions from being carried out */
acpi_ec_stop(ec, true);
mutex_unlock(&ec->mutex);
+   ec_stop_gpe_poller(ec);
 }
 
+/*
+ * Note: this API is prepared for tuning the order of the ACPI
+ * suspend/resume steps as the first entry of EC during resume, thus it
+ * must be invoked before acpi_ec_resume() or everything should be done in
+ * acpi_ec_resume().
+ */

[RFC PATCH v6 3/3] ACPI / EC: Enable noirq stage event detection

2017-09-28 Thread Lv Zheng
This patch enables noirq stage event detection for the EC driver.

EC is a very special driver, required to detect events throughout the
entire suspend/resume process. Thus this patch enables event detection for
EC during noirq stages to meet this requirement. This is done by making
sure that the EC sleep APIs:
  acpi_ec_block_transactions()
  acpi_ec_unblock_transactions()
rather than the EC driver suspend/resume hooks:
  acpi_ec_suspend()
  acpi_ec_resume()
are the boundary of the EC event handling during suspend/resume, so that
the ACPI sleep core can tune their invocation timing to handle special BIOS
requirements.

If this commit is bisected to be a regression culprit, please report this
to bugzilla.kernel.org for further investigation.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196129
Signed-off-by: Lv Zheng 
Tested-by: Tomislav Ivek 
---
 drivers/acpi/ec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 389c499..a48a2b3 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -157,7 +157,7 @@ static bool ec_no_wakeup __read_mostly;
 module_param(ec_no_wakeup, bool, 0644);
 MODULE_PARM_DESC(ec_no_wakeup, "Do not wake up from suspend-to-idle");
 
-static bool ec_detect_noirq_events __read_mostly;
+static bool ec_detect_noirq_events __read_mostly = true;
 module_param(ec_detect_noirq_events, bool, 0644);
 MODULE_PARM_DESC(ec_detect_noirq_events, "Enabling event detection during noirq stage");
 
-- 
2.7.4



[RFC PATCH v6 0/3] ACPI / EC: Tune the timing of EC events arrival during S3-exit

2017-09-28 Thread Lv Zheng
If EC events that occur during BIOS S3-exit and early OS S3-exit steps can
be detected by the OS earlier, then there can be fewer driver order issues
between acpi_ec_resume() and some other drivers' .resume() hooks (e.g.
acpi_button_resume()).

However there are known facts that EC FW does drop EC events during S3,
and it takes time for EC FW to initialize (maximum 1.4 observed) while
Windows acts normally, so detecting EC event earlier might just be a
workaround for other drivers (they should be aware of this order issue and
deal with it themselves). As such, this patchset is marked as an RFC.

If the Linux EC driver is to detect events during early OS S3-exit, it
needs to poll EC events in a timely manner during the noirq stages, as
there is no EC event triggering source in these stages.

This patchset implements earlier EC event handling for Linux.

Lv Zheng (3):
  ACPI / EC: Fix possible driver order issue by moving EC event handling
earlier
  ACPI / EC: Add event detection support for noirq stages
  ACPI / EC: Enable noirq stage event detection

 drivers/acpi/ec.c   | 128 +++-
 drivers/acpi/internal.h |   1 +
 2 files changed, 118 insertions(+), 11 deletions(-)

-- 
2.7.4
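
As a rough illustration of what periodic polling costs in the noirq stages, the sketch below computes how long an event waits before a poll can see it. It assumes the behaviour of patch 2/3: one poll right when the poller starts, then one poll every interval (500 ms by default there). The helper `poll_latency_ms()` and its parameters are hypothetical and exist only for this example.

```c
#include <assert.h>

/*
 * Hypothetical helper: delay between an event raised at event_ms and the
 * first poll that can observe it, when polls happen at start_ms,
 * start_ms + interval_ms, start_ms + 2 * interval_ms, and so on.
 */
static unsigned long poll_latency_ms(unsigned long event_ms,
                                     unsigned long start_ms,
                                     unsigned long interval_ms)
{
    unsigned long since_start, ticks;

    assert(interval_ms > 0);

    /* An event that predates the poller is caught by the initial poll. */
    if (event_ms <= start_ms)
        return start_ms - event_ms;

    since_start = event_ms - start_ms;
    /* Number of whole intervals until the next poll, rounded up. */
    ticks = (since_start + interval_ms - 1) / interval_ms;
    return start_ms + ticks * interval_ms - event_ms;
}
```

Once the poller is running, the wait stays below one interval, so under these assumptions the interval parameter bounds how far an event can be deferred.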



[RFC PATCH v6 1/3] ACPI / EC: Fix possible driver order issue by moving EC event handling earlier

2017-09-28 Thread Lv Zheng
This patch tries to detect EC events earlier after resume, so that if an
event occurred before invoking acpi_ec_unblock_transactions(), it could be
detected by acpi_ec_unblock_transactions() which is the earliest EC driver
call after resume.

However, after the noirq stage, if an event occurred after
acpi_ec_unblock_transactions() and before acpi_ec_resume(), there was no
means to detect and trigger it right then; it could only be detected and
handled after acpi_ec_resume().

Now the final logic is:
1. If ec_freeze_events=Y, event handling is stopped in acpi_ec_suspend(),
   restarted in acpi_ec_resume();
2. If ec_freeze_events=N, event handling is stopped in
   acpi_ec_block_transactions(), restarted in
   acpi_ec_unblock_transactions();
3. In order to handle the conflict of the edge-trigger nature of EC IRQ
   and the Linux noirq stage, advance_transaction() is invoked where the
   event handling is enabled and the noirq stage is ended.
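
The first two rules above can be sketched as a small decision function. This is a user-space model, not driver code: `ec_event_window` and the `model_*` functions are hypothetical names, and combining the sleep-core condition with ec_freeze_events follows the acpi_ec_no_sleep_events() helper added in the diff of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type: which calls bound EC event handling. */
struct ec_event_window {
    const char *stopped_in;
    const char *restarted_in;
};

/* Models acpi_ec_no_sleep_events(): both conditions must hold. */
static bool model_no_sleep_events(bool sleep_no_ec_events, bool freeze_events)
{
    return sleep_no_ec_events && freeze_events;
}

/* Rules 1 and 2 from the commit message as a lookup. */
static struct ec_event_window model_event_window(bool sleep_no_ec_events,
                                                 bool freeze_events)
{
    struct ec_event_window w;

    if (model_no_sleep_events(sleep_no_ec_events, freeze_events)) {
        w.stopped_in = "acpi_ec_suspend";
        w.restarted_in = "acpi_ec_resume";
    } else {
        w.stopped_in = "acpi_ec_block_transactions";
        w.restarted_in = "acpi_ec_unblock_transactions";
    }
    return w;
}
```

With freezing requested and honoured by the sleep core, the window is bounded by the driver's suspend/resume hooks; in every other case it is bounded by the block/unblock calls.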

Known issue:
1. An event occurring between acpi_ec_unblock_transactions() and
   acpi_ec_resume() may still lead to the order issue. This can only be
   fixed by adding a periodic detection mechanism during the noirq stage.

Signed-off-by: Lv Zheng 
Tested-by: Tomislav Ivek 
Tested-by: Luya Tshimbalanga 
---
 drivers/acpi/ec.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index df84246..f1f320b 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -249,6 +249,11 @@ static bool acpi_ec_started(struct acpi_ec *ec)
   !test_bit(EC_FLAGS_STOPPED, &ec->flags);
 }
 
+static bool acpi_ec_no_sleep_events(void)
+{
+   return acpi_sleep_no_ec_events() && ec_freeze_events;
+}
+
 static bool acpi_ec_event_enabled(struct acpi_ec *ec)
 {
/*
@@ -260,14 +265,14 @@ static bool acpi_ec_event_enabled(struct acpi_ec *ec)
return false;
/*
 * However, disabling the event handling is experimental for late
-* stage (suspend), and is controlled by the boot parameter of
-* "ec_freeze_events":
+* stage (suspend), and is controlled by
+* "acpi_ec_no_sleep_events()":
 * 1. true:  The EC event handling is disabled before entering
 *   the noirq stage.
 * 2. false: The EC event handling is automatically disabled as
 *   soon as the EC driver is stopped.
 */
-   if (ec_freeze_events)
+   if (acpi_ec_no_sleep_events())
return acpi_ec_started(ec);
else
return test_bit(EC_FLAGS_STARTED, &ec->flags);
@@ -524,8 +529,8 @@ static bool acpi_ec_query_flushed(struct acpi_ec *ec)
 static void __acpi_ec_flush_event(struct acpi_ec *ec)
 {
/*
-* When ec_freeze_events is true, we need to flush events in
-* the proper position before entering the noirq stage.
+* When acpi_ec_no_sleep_events() is true, we need to flush events
+* in the proper position before entering the noirq stage.
 */
wait_event(ec->wait, acpi_ec_query_flushed(ec));
if (ec_query_wq)
@@ -948,7 +953,8 @@ static void acpi_ec_start(struct acpi_ec *ec, bool resuming)
if (!resuming) {
acpi_ec_submit_request(ec);
ec_dbg_ref(ec, "Increase driver");
-   }
+   } else if (!acpi_ec_no_sleep_events())
+   __acpi_ec_enable_event(ec);
ec_log_drv("EC started");
}
spin_unlock_irqrestore(&ec->lock, flags);
@@ -980,7 +986,7 @@ static void acpi_ec_stop(struct acpi_ec *ec, bool suspending)
if (!suspending) {
acpi_ec_complete_request(ec);
ec_dbg_ref(ec, "Decrease driver");
-   } else if (!ec_freeze_events)
+   } else if (!acpi_ec_no_sleep_events())
__acpi_ec_disable_event(ec);
clear_bit(EC_FLAGS_STARTED, &ec->flags);
clear_bit(EC_FLAGS_STOPPED, &ec->flags);
@@ -1910,7 +1916,7 @@ static int acpi_ec_suspend(struct device *dev)
struct acpi_ec *ec =
acpi_driver_data(to_acpi_device(dev));
 
-   if (acpi_sleep_no_ec_events() && ec_freeze_events)
+   if (acpi_ec_no_sleep_events())
acpi_ec_disable_event(ec);
return 0;
 }
@@ -1946,7 +1952,18 @@ static int acpi_ec_resume(struct device *dev)
struct acpi_ec *ec =
acpi_driver_data(to_acpi_device(dev));
 
-   acpi_ec_enable_event(ec);
+   if (acpi_ec_no_sleep_events())
+   acpi_ec_enable_event(ec);
+   else {
+   /*
+* Though whether there is an event pending has been
+* checked in acpi_ec_unblock_transactions() when
+* acpi_ec_no_sleep_events() is false, check it one more
+* time after noirq stage to detect events occurred after
+* 


Re: [kbuild-all] [0-Day CI notification] 0-Day kernel test service will be shut down from Sep 30 3PM to Oct 5

2017-09-28 Thread Fengguang Wu

CC LKML. Sorry it's a site level power down during the 10.1 holidays. :(

On Fri, Sep 29, 2017 at 10:12:20AM +0800, Philip Li wrote:

Hi all, this is Philip who maintains the 0-Day kernel test service. Thanks for
subscribing to 0-Day kernel testing. We will have lab power down from Oct 1
to Oct 5, so that the service will be shut down from Asia Pacific Time Sep 30 3PM
and will recover from Oct 6 as soon as we can. Sorry for any inconvenience caused
due to the service shut down.

Thanks
___
kbuild-all mailing list
kbuild-...@lists.01.org
https://lists.01.org/mailman/listinfo/kbuild-all


Re: [lkp-robot] [mac80211] 31e9170bde: hwsim.sta_dynamic_down_up.fail

2017-09-28 Thread Xiang Gao
Thanks, I will look into it.
Xiang Gao


2017-09-28 4:06 GMT-04:00 kernel test robot :
>
> FYI, we noticed the following commit:
>
> commit: 31e9170bdeb6ebe66426337b4e2b9924683a412b ("mac80211: aead api to reduce redundancy")
> url: https://github.com/0day-ci/linux/commits/Xiang-Gao/mac80211-aead-api-to-reduce-redundancy/20170926-053110
> base: https://git.kernel.org/cgit/linux/kernel/git/jberg/mac80211-next.git master
>
> in testcase: hwsim
> with following parameters:
>
> group: hwsim-10
>
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 2G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> 2017-09-27 16:04:27 ./run-tests.py sta_dynamic_down_up
> DEV: wlan0: 02:00:00:00:00:00
> DEV: wlan1: 02:00:00:00:01:00
> DEV: wlan2: 02:00:00:00:02:00
> APDEV: wlan3
> APDEV: wlan4
> START sta_dynamic_down_up 1/1
> Test: Dynamically added wpa_supplicant interface down/up
> Starting AP wlan3
> Create a dynamic wpa_supplicant interface and connect
> Connect STA wlan5 to AP
> dev1->dev2 unicast data delivery failed
> Traceback (most recent call last):
>   File "./run-tests.py", line 453, in main
>     t(dev, apdev)
>   File "/lkp/benchmarks/hwsim/tests/hwsim/test_sta_dynamic.py", line 122, in test_sta_dynamic_down_up
>     hwsim_utils.test_connectivity(wpas, hapd)
>   File "/lkp/benchmarks/hwsim/tests/hwsim/hwsim_utils.py", line 165, in test_connectivity
>     raise Exception(last_err)
> Exception: dev1->dev2 unicast data delivery failed
> FAIL sta_dynamic_down_up 5.397413 2017-09-27 16:04:32.540689
> passed 0 test case(s)
> skipped 0 test case(s)
> failed tests: sta_dynamic_down_up
>
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k  job-script  # job-script is attached in this email
>
>
>
> Thanks,
> Xiaolong


Re: [PATCH v6 2/2] tracing: Add support for preempt and irq enable/disable events

2017-09-28 Thread Joel Fernandes
Hi Steven, Peter,

I'm planning to make the following changes for the next rev, could you
let me know if you're Ok with it?

1. Drop the stop_critical_timings changes - previous patch was
generating the preempt_enable/disable events but they aren't "real"
events. Instead since we already have cpuidle trace events, we can
just rely on those for now to understand how much time was spent in
idle. A future patch could do something smarter.

2. Drop the recursion protection from trace_preempt_enable/disable.
The trace_preempt_enable/disable calls don't nest, so there's no need
to protect it with a per-cpu variable.

3. trace_irq_enable/disable, on the other hand, can be called in a nested
way, so I'll add some comments about why the per-cpu variable is needed.

thanks a lot,

- Joel

On Mon, Sep 25, 2017 at 3:57 PM, Joel Fernandes  wrote:
> On Mon, Sep 25, 2017 at 3:52 AM, Steven Rostedt  wrote:
>> On Mon, 25 Sep 2017 12:32:23 +0200
>> Peter Zijlstra  wrote:
>>
>>
>>> > You mean you want to trace all calls to preempt and irq off even if
>>> > preempt and irqs are already off?
>>>
>>> Sure, why not? This stuff naturally nests, and who is to say its not a
>>> useful thing to trace all of them?
>>>
>>> By also tracing the nested sections you can, for instance, see how much
>>> you'd really win by getting rid of the outer one. If, for instance, the
>>> outer only accounts for 1% of the time, while the inner ones are
>>> interlinked and span the other 99%, there's more work to do than if that
>>> were not the case.
>>
>> If we do this we need a field to record if the preemption or irqs were
>> toggled by that call. Something that filters could easily be added to
>> only show what this patch set has.
>
> I request that we please not do this for my patchset, there are a
> number of reasons in my mind:
>
> 1. trace_preempt_off in existing code is only called the first time
> preempt is turned off. This is the definition of "preempt off": it's
> the first time preemption is actually turned off, and has nothing much
> to do with going into a deeper preempt count. Whether the count
> increases or not, preempt is already off and that's confirmed by the
> first preempt off event.
>
> This is how I read it in the comments in sched/core.c as well:
> "
>  * If the value passed in is equal to the current preempt count
>  * then we just disabled preemption."
>
> This is how I based this patchset as well; again, it's not my use case
> and it can be a future patch if it's useful to track this.
>
> 2. This stuff is already a bit trace heavy, I prefer not to generate
> an event every time the preempt count increases. Of course there are
> filters, but even then we have the filtering overhead for a use case
> that I am not really targeting with this patchset.
>
> 3. It will complicate the patch more, apart from adding filters as
> Steven suggested, it would also mean we change how
> preempt_latency_start in sched/core.c works.
>
> Do you mind if we please keep this as a 'future' use case for a future
> patch? It's not my use case at all for this patchset and not what I was
> intending.
>
> I will reply to Peter's other comments on the other email shortly.
>
> thanks!
>
> - Joel
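
The "first time preempt is turned off" rule from point 1 of the quoted reply can be sketched as follows. This is a simplified user-space model, assuming only the check quoted from sched/core.c (the value passed in equals the current count); `cpu_model` and the `model_*` functions are hypothetical names, and plain counters stand in for the trace events.

```c
#include <assert.h>

/* Hypothetical per-CPU state for the model. */
struct cpu_model {
    int preempt_count;
    int off_events;  /* stands in for trace_preempt_off firing */
    int on_events;   /* stands in for trace_preempt_on firing */
};

static void model_preempt_count_add(struct cpu_model *cpu, int val)
{
    cpu->preempt_count += val;
    /* Count equals what was just added: this call disabled preemption. */
    if (cpu->preempt_count == val)
        cpu->off_events++;
}

static void model_preempt_count_sub(struct cpu_model *cpu, int val)
{
    assert(cpu->preempt_count >= val);
    /* Count equals what is about to be removed: this call re-enables it. */
    if (cpu->preempt_count == val)
        cpu->on_events++;
    cpu->preempt_count -= val;
}
```

Three nested disable calls followed by three enable calls produce one "off" and one "on" event in this model. Tracing every nested call, as suggested earlier in the thread, would instead need an event on each add and sub plus a field saying whether the state actually toggled.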


[0-Day CI notification] 0-Day kernel test service will be shut down from Sep 30 3PM to Oct 5

2017-09-28 Thread Philip Li
Hi all, this is Philip who maintains the 0-Day kernel test service. Thanks for
subscribing to 0-Day kernel testing. We will have lab power down from Oct 1
to Oct 5, so that the service will be shut down from Asia Pacific Time Sep 30 3PM
and will recover from Oct 6 as soon as we can. Sorry for any inconvenience caused
due to the service shut down.

Thanks


Re: linux-next: build failure after merge of the net-next tree

2017-09-28 Thread Florian Fainelli
On 09/28/17 at 18:36, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the net-next tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> net/dsa/slave.c: In function 'dsa_slave_create':
> net/dsa/slave.c:1191:18: error: 'struct dsa_slave_priv' has no member named 'phy'
>   phy_disconnect(p->phy);
>   ^
> 
> Caused by commit
> 
>   0115dcd1787d ("net: dsa: use slave device phydev")
> 
> Interacting with commit
> 
>   e804441cfe0b ("net: dsa: Fix network device registration order")
> 
> from the net tree.
> 
> I applied the following merge fix patch (which I am not sure about):

Your resolution looks fine to me, thanks Stephen!

> 
> From: Stephen Rothwell 
> Date: Fri, 29 Sep 2017 11:28:45 +1000
> Subject: [PATCH] net: dsa: merge fix patch for removal of phy
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  net/dsa/slave.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 8869954485db..9191c929c6c8 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -1188,7 +1188,7 @@ int dsa_slave_create(struct dsa_port *port, const char *name)
>   return 0;
>  
>  out_phy:
> - phy_disconnect(p->phy);
> + phy_disconnect(slave_dev->phydev);
>   if (of_phy_is_fixed_link(p->dp->dn))
>   of_phy_deregister_fixed_link(p->dp->dn);
>  out_free:
> 


-- 
Florian


Re: [PATCH for-next 2/9] RDMA/hns: Factor out the code for checking sdb status into a new function

2017-09-28 Thread Wei Hu (Xavier)



On 2017/9/28 21:50, Leon Romanovsky wrote:
> On Thu, Sep 28, 2017 at 12:57:27PM +0800, Wei Hu (Xavier) wrote:
>> From: Lijun Ou 
>>
>> It mainly places the lines for checking send doorbell status
>> into a special functions. As a result, we can directly call it in
>> check_qp_db_process_status function and keep consistent indenting
>> style.
>>
>> It fixes: 5f110ac4bed8 ("IB/hns: Fix for checkpatch.pl comment style)
>
> You forgot " at the end of the line, and there is need to put fixes
> (should be Fixes) in the line before Signed-off-by.

Thanks, Leon.
We will change the statement to "Fixes: xx" and put it on the line before
Signed-off-by in patch v2.

>> The warning from static checker:
>> drivers/infiniband/hw/hns/hns_roce_hw_v1.c:3562 check_qp_db_process_status()
>> warn: inconsistent indenting
>>
>> Signed-off-by: Lijun Ou 
>> Signed-off-by: Wei Hu (Xavier) 
>> Signed-off-by: Shaobo Xu 
>> ---
>>   drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 95 --
>>   1 file changed, 51 insertions(+), 44 deletions(-)
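
As a rough illustration of the tag format requested in the review above, the following sketch checks that a line has the form `Fixes: <12 hex digits> ("<subject>")`. The function name `is_valid_fixes_line` and the fixed 12-character hash length are assumptions made for this example (the hash quoted in the thread happens to be 12 characters long); it is not taken from checkpatch.pl.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `line` looks like
 *   Fixes: 5f110ac4bed8 ("subject text")
 * i.e. a capitalised tag, 12 hex digits, and a subject wrapped in ("..."). */
bool is_valid_fixes_line(const char *line)
{
    static const char prefix[] = "Fixes: ";
    size_t plen = sizeof(prefix) - 1;
    size_t len, i;

    if (strncmp(line, prefix, plen) != 0)
        return false;               /* wrong tag or wrong capitalisation */
    line += plen;

    for (i = 0; i < 12; i++)
        if (!isxdigit((unsigned char)line[i]))
            return false;           /* hash too short or not hexadecimal */
    line += 12;

    if (strncmp(line, " (\"", 3) != 0)
        return false;               /* subject must open with (" */
    line += 3;

    len = strlen(line);
    if (len < 3)
        return false;               /* need a subject plus the closing ") */
    return line[len - 2] == '"' && line[len - 1] == ')';
}
```

With this check, the line from the original patch is rejected twice over: it starts with "It fixes:" rather than "Fixes:", and the subject is missing its closing quote.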






Re: [PATCH v3 0/4] Update TMDSEVM3530 support for omap3-evm

2017-09-28 Thread Derald D. Woods
On Tue, Sep 12, 2017 at 06:48:20PM -0500, Derald D. Woods wrote:
> This patch set allows TMDSEVM3530(omap3-evm.dts) to boot using common
> processor module data that is shared with 'omap3-evm-37xx.dts'. A new
> common file for processor module data is introduced to help facilitate
> the updated OMAP3530 support.
> 


Are there any further concerns after v3?

Derald


> Changes in v3
> -
> - Drop unnecessary compatible string change to Sharp LCD panel
> 
> Changes in v2
> -
> - Pull in change from linux-next
>   (ARM: dts: omap*: Replace deprecated "vmmc_aux" with "vqmmc")
> - Add compatible and supply fix for LCD panel
> - Add supply references for DSS
> - Add "Signed-off-by" for each patch
> 
> Derald D. Woods (4):
>   ARM: dts: omap3-evm-37xx: Add common processor module support
>   ARM: dts: omap3-evm: Add OMAP3530 specific device tree processor data
>   ARM: dts: omap3: Add Sharp LS037V7DW01 'envdd' supply
>   ARM: dts: omap3-evm: Add DSS {vdds_dsi,vdda_video}-supply references
> 
>  arch/arm/boot/dts/omap3-evm-37xx.dts   | 209 +---
>  arch/arm/boot/dts/omap3-evm-processor-common.dtsi  | 216 
> +
>  arch/arm/boot/dts/omap3-evm.dts|  76 +++-
>  .../boot/dts/omap3-panel-sharp-ls037v7dw01.dtsi|   1 +
>  4 files changed, 290 insertions(+), 212 deletions(-)
>  create mode 100644 arch/arm/boot/dts/omap3-evm-processor-common.dtsi
> 
> -- 
> 2.14.1
> 


RE: [PATCH] extcon: Split out extcon header file for consumer and provider device

2017-09-28 Thread Yoshihiro Shimoda
Hi,

> From: Chanwoo Choi
> Sent: Friday, September 29, 2017 9:02 AM
> 
< snip >
>  drivers/phy/renesas/phy-rcar-gen3-usb2.c  |   2 +-
< snip >
>  drivers/usb/gadget/udc/renesas_usb3.c |   2 +-

These two drivers need the modification.
But...

< snip >
> diff --git a/drivers/usb/renesas_usbhs/common.h 
> b/drivers/usb/renesas_usbhs/common.h
> index 8c5fc12ad778..a78764bc23eb 100644
> --- a/drivers/usb/renesas_usbhs/common.h
> +++ b/drivers/usb/renesas_usbhs/common.h
> @@ -17,7 +17,7 @@
>  #ifndef RENESAS_USB_DRIVER_H
>  #define RENESAS_USB_DRIVER_H
> 
> -#include 
> +#include 

Since this driver doesn't use any extcon-provider APIs for now,
we don't need the modification, IIUC.

Best regards,
Yoshihiro Shimoda



Re: [PATCH 01/12] mmc: dt-bindings: update Mediatek MMC bindings

2017-09-28 Thread Chaotian Jing
On Wed, 2017-09-27 at 09:18 +0800, Chaotian Jing wrote:
> On Wed, 2017-09-27 at 00:33 +0200, Ulf Hansson wrote:
> > On 14 September 2017 at 04:10, Chaotian Jing  
> > wrote:
> > > On Wed, 2017-09-13 at 09:10 -0500, Rob Herring wrote:
> > >> On Tue, Sep 12, 2017 at 05:07:41PM +0800, Chaotian Jing wrote:
> > >> > Change the comptiable for support of multi-platform
> > >> > Add description for reg
> > >> > Add description for source_cg
> > >> > Add description for mediatek,latch-ck
> > >>
> > >> This is at least the 3rd patch with exactly the same vague subject.
> > >> Please make the subject somewhat unique.
> > >>
> > > Thx, will change the subject at next version
> > >> >
> > >> > Signed-off-by: Chaotian Jing 
> > >> > ---
> > >> >  Documentation/devicetree/bindings/mmc/mtk-sd.txt | 13 ++---
> > >> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >> >
> > >> > diff --git a/Documentation/devicetree/bindings/mmc/mtk-sd.txt 
> > >> > b/Documentation/devicetree/bindings/mmc/mtk-sd.txt
> > >> > index 4182ea3..405cd06 100644
> > >> > --- a/Documentation/devicetree/bindings/mmc/mtk-sd.txt
> > >> > +++ b/Documentation/devicetree/bindings/mmc/mtk-sd.txt
> > >> > @@ -7,10 +7,15 @@ This file documents differences between the core 
> > >> > properties in mmc.txt
> > >> >  and the properties used by the msdc driver.
> > >> >
> > >> >  Required properties:
> > >> > -- compatible: Should be "mediatek,mt8173-mmc","mediatek,mt8135-mmc"
> > >> > +- compatible: value should be either of the following.
> > >> > +   "mediatek,mt8135-mmc": for mmc host ip compatible with mt8135
> > >> > +   "mediatek,mt8173-mmc": for mmc host ip compatible with mt8173
> > >> > +   "mediatek,mt2701-mmc": for mmc host ip compatible with mt2701
> > >> > +   "mediatek,mt2712-mmc": for mmc host ip compatible with mt2712
> > >> > +- reg: physical base address of the controller and length
> > >> >  - interrupts: Should contain MSDC interrupt number
> > >> > -- clocks: MSDC source clock, HCLK
> > >> > -- clock-names: "source", "hclk"
> > >> > +- clocks: MSDC source clock, HCLK, source_cg
> > >> > +- clock-names: "source", "hclk", "source_cg"
> > >>
> > >> All chips support source_cg? That's not backwards compatible for
> > >> existing compatible strings if the driver requires it.
> > > Not all chips support source_cg, for chips which do not support
> > > source_cg, no need source_cg here, and the driver will parse it
> > > to know if current chip support it.
> > 
> > In such case you must not add add a required binding for it. I think
> > that is what Rob is trying to point out for you.
> > 
> > [...]
> > 
> > Kind regards
> > Uffe
> The source_cg is required(MUST) at MT2712 and future SoCs, but not
> required(do not have it) at previous SoCs, so that put it at required
> properties, let the driver to handle it.

Are there any other comments on this? Must source_cg still not be added as a
required binding? If it should be an optional binding, how should it be added,
given that "clocks" and "clock-names" cannot be duplicated in one node?
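
One way to picture the "driver parses it" approach discussed in this thread: the node keeps a single `clock-names` list, and the driver treats the third entry as optional by looking it up by name. The sketch below models that lookup in plain C. `find_clock_name`, `msdc_check_clocks` and the string arrays are hypothetical stand-ins, not the kernel's clock or device tree API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a by-name lookup in a node's "clock-names" list.
 * Returns the index of `name`, or -1 if the node does not list it. */
int find_clock_name(const char *const *names, size_t count, const char *name)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Returns 1 if the node can be probed: "source" and "hclk" are mandatory,
 * while "source_cg" is only used when present. *has_source_cg reports
 * whether the optional gate clock was found. */
int msdc_check_clocks(const char *const *names, size_t count,
                      int *has_source_cg)
{
    if (find_clock_name(names, count, "source") < 0 ||
        find_clock_name(names, count, "hclk") < 0)
        return 0;

    *has_source_cg = find_clock_name(names, count, "source_cg") >= 0;
    return 1;
}
```

Under this model an older node listing only "source" and "hclk" still probes, which is the backwards-compatibility concern raised in the review.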




Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively

2017-09-28 Thread Mimi Zohar
On Thu, 2017-09-28 at 17:33 -0700, Linus Torvalds wrote:
> On Thu, Sep 28, 2017 at 5:12 PM, Mimi Zohar  wrote:
> >
> > Originally IMA did define it's own lock, prior to IMA-appraisal.  IMA-
> > appraisal introduced writing the file hash as an xattr, which required
> > taking the i_mutex.  process_measurement() and ima_file_free() took
> > the iint->mutex first and then the i_mutex, while setxattr, chmod and
> > chown took the locks in reverse order.  To resolve the potential
> > deadlock, the iint->mutex was eliminated.
> 
> Umm. You already have an explicit invalidation model, where you
> invalidate after a write has occurred.

Invalidating after each write would perform horribly.  Only
after all the changes are made, after the file close, is the file
integrity status invalidated and the file hash re-calculated and
written out.

At some point, we might want to go back and look at having finer-grained
file integrity invalidation.

> But the locking of the generation count (or "invalidation status" or
> whatever) can - and should be - entirely independent of the locking of
> the actual appraisal.

The locking issue isn't with validating the file hash, but with the
setxattr, chmod, chown syscalls.  Each of these syscalls takes the
i_rwsem exclusively before IMA (or EVM) is called.

In ima_file_free(), the locking would be:

lock: iint->mutex
lock: i_rwsem
write hash as xattr
unlock: i_rwsem
unlock iint->mutex


In setxattr, chmod, chown syscalls, IMA (and EVM) are called after the
i_rwsem is already taken.  So the locking would be:

lock: i_rwsem
lock: iint->mutex

unlock: iint->mutex
unlock: i_rwsem

Perhaps now the problem is clearer?

Mimi
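
The two sequences above take the same pair of locks in opposite orders. If two tasks each run one of the paths, each can end up holding one lock while waiting for the other. A minimal sketch of how such an inversion can be detected, assuming a hypothetical rank-based tracker (`tracker_lock`, `tracker_unlock` and the `RANK_*` values are invented for illustration and are not kernel APIs; in the kernel this class of bug is what lockdep reports):

```c
#include <stdbool.h>

/* Hypothetical lock ranks: a lock may only be taken while every lock
 * already held has a strictly lower rank. */
enum lock_rank { RANK_IINT_MUTEX = 1, RANK_I_RWSEM = 2 };

struct lock_tracker {
    int held[8];   /* ranks currently held, in acquisition order */
    int depth;
};

/* Record an acquisition; returns false if it violates the rank order. */
bool tracker_lock(struct lock_tracker *t, enum lock_rank rank)
{
    bool ok;

    if (t->depth >= 8)
        return false;
    ok = t->depth == 0 || t->held[t->depth - 1] < (int)rank;
    t->held[t->depth++] = (int)rank;
    return ok;
}

/* Release the most recently taken lock. */
void tracker_unlock(struct lock_tracker *t)
{
    if (t->depth > 0)
        t->depth--;
}

/* ima_file_free()-style path: iint->mutex first, then i_rwsem. */
bool path_file_free(void)
{
    struct lock_tracker t = { {0}, 0 };
    bool ok = tracker_lock(&t, RANK_IINT_MUTEX);

    ok = tracker_lock(&t, RANK_I_RWSEM) && ok;
    tracker_unlock(&t);
    tracker_unlock(&t);
    return ok;
}

/* setxattr/chmod/chown-style path: i_rwsem is already held when the
 * integrity code runs, so iint->mutex is taken second. */
bool path_setxattr(void)
{
    struct lock_tracker t = { {0}, 0 };
    bool ok = tracker_lock(&t, RANK_I_RWSEM);

    ok = tracker_lock(&t, RANK_IINT_MUTEX) && ok;
    tracker_unlock(&t);
    tracker_unlock(&t);
    return ok;
}
```

Whatever ranks are chosen, one of the two paths fails the check, which is why a single fixed order for this pair of locks cannot satisfy both callers.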
 

> So make the appraisal itself use a semaphore ("only one appraisal at a time").
> 
> But use a separate lock for the generation count.
> So then appraisal is:
> 
>  - get appraisal semaphore
>   - get generation count lock
> read generation count
>   - drop generation count lock
>   - do the actual appraisal
>  - drop appraisal semaphore
> 
> Note that you now have a tuple of "generation count, appraisal" that
> you have *not* saved off yet, but it's your stable thing.
> 
> Now you can write the xattr:
> 
>   - get exclusive inode lock (for xattr)
>   - get generation count lock
>   - if the appraisal generation does not match, do NOT write
> the appraisal you just calculated, since it's pointless: it's already
> stale.
>   - otherwise write the appraisal and generation count to the xattr
>   - drop generation count lock
>   - release exclusive inode lock
> 
> and then for anything that does setxattr or chmod or whatever, just
> use that generation count lock to invalidate the appraisal. You don't
> need to actual appraisal lock for that.
> 
> So now the appraisal lock is always the outermost one, and the
> generation count lock is always the innermost.
> 
> Anyway, I haven't looked at the details of what IMA does, but
> something like the above really sounds like it should work and seems
> pretty straightforward.
> 
> No?
> 
>Linus
> 
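
The scheme quoted above can be sketched in user space as follows. This is only a model under stated assumptions: `struct file_state`, `appraise`, `write_appraisal` and `invalidate` are hypothetical names, pthread mutexes stand in for the kernel locks, and the "hash" is a placeholder computation. It is not IMA code.

```c
#include <pthread.h>
#include <stdbool.h>

struct file_state {
    pthread_mutex_t appraisal_sem; /* outermost: one appraisal at a time */
    pthread_mutex_t inode_lock;    /* stands in for the exclusive inode lock */
    pthread_mutex_t gen_lock;      /* innermost: protects `generation` */
    unsigned long generation;      /* bumped by setxattr/chmod-style paths */
    unsigned long content;         /* stand-in for the file data */
    unsigned long xattr_hash;      /* stored appraisal */
    unsigned long xattr_gen;       /* generation the stored appraisal is for */
};

#define FILE_STATE_INIT { PTHREAD_MUTEX_INITIALIZER, \
                          PTHREAD_MUTEX_INITIALIZER, \
                          PTHREAD_MUTEX_INITIALIZER, 0, 0, 0, 0 }

struct appraisal {
    unsigned long gen;
    unsigned long hash;
};

/* Step 1: compute the (generation, appraisal) tuple. The inode lock is
 * never taken here, so this cannot invert against setxattr-style callers. */
struct appraisal appraise(struct file_state *f)
{
    struct appraisal a;

    pthread_mutex_lock(&f->appraisal_sem);
    pthread_mutex_lock(&f->gen_lock);
    a.gen = f->generation;
    pthread_mutex_unlock(&f->gen_lock);
    a.hash = f->content * 31 + 7;      /* placeholder for the real hash */
    pthread_mutex_unlock(&f->appraisal_sem);
    return a;
}

/* Step 2: store the tuple unless it went stale in the meantime. */
bool write_appraisal(struct file_state *f, struct appraisal a)
{
    bool fresh;

    pthread_mutex_lock(&f->inode_lock);
    pthread_mutex_lock(&f->gen_lock);
    fresh = (a.gen == f->generation);
    if (fresh) {
        f->xattr_hash = a.hash;
        f->xattr_gen = a.gen;
    }
    pthread_mutex_unlock(&f->gen_lock);
    pthread_mutex_unlock(&f->inode_lock);
    return fresh;
}

/* Invalidation from a path that may already hold the inode lock:
 * only the innermost generation lock is needed. */
void invalidate(struct file_state *f)
{
    pthread_mutex_lock(&f->gen_lock);
    f->generation++;
    pthread_mutex_unlock(&f->gen_lock);
}
```

In this model the generation lock is always taken last and held only briefly, so it can nest inside either of the other two locks without creating an ordering cycle.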



Re: [PATCH v3] mm, sysctl: make NUMA stats configurable

2017-09-28 Thread kemi


On 2017-09-29 05:29, Andrew Morton wrote:
> On Thu, 28 Sep 2017 14:11:41 +0800 Kemi Wang  wrote:
> 
>> This is the second step which introduces a tunable interface that allow
>> numa stats configurable for optimizing zone_statistics(), as suggested by
>> Dave Hansen and Ying Huang.
> 
> Looks OK I guess.
> 
> I fiddled with it a lot.  Please consider:
> 

Thanks for your help in making it more graceful! I will be more careful next time.
There may be a typo in Documentation/sysctl/vm.txt; see the comment below.

> From: Andrew Morton 
> Subject: mm-sysctl-make-numa-stats-configurable-fix
> 
> - tweak documentation
> 
> - move advisory message from start_kernel() into mm_init() (I'm not sure
>   we really need this message)
> 
> - use strcasecmp() in __parse_vm_numa_stats_mode()
> 
> - clean up coding style amd nessages in sysctl_vm_numa_stats_mode_handler()
> 
> Cc: Aaron Lu 
> Cc: Andi Kleen 
> Cc: Christopher Lameter 
> Cc: Dave Hansen 
> Cc: Jesper Dangaard Brouer 
> Cc: Johannes Weiner 
> Cc: Jonathan Corbet 
> Cc: Kees Cook 
> Cc: Kemi Wang 
> Cc: "Luis R . Rodriguez" 
> Cc: Mel Gorman 
> Cc: Michal Hocko 
> Cc: Sebastian Andrzej Siewior 
> Cc: Tim Chen 
> Cc: Vlastimil Babka 
> Cc: Ying Huang 
> Signed-off-by: Andrew Morton 
> ---
> 
>  Documentation/sysctl/vm.txt |   15 ++---
>  init/main.c |6 ++---
>  mm/vmstat.c |   39 +++---
>  3 files changed, 29 insertions(+), 31 deletions(-)
> 
> diff -puN 
> Documentation/sysctl/vm.txt~mm-sysctl-make-numa-stats-configurable-fix 
> Documentation/sysctl/vm.txt
> --- a/Documentation/sysctl/vm.txt~mm-sysctl-make-numa-stats-configurable-fix
> +++ a/Documentation/sysctl/vm.txt
> @@ -853,7 +853,7 @@ ten times more freeable objects than the
>  
>  numa_stats_mode
>  
> -This interface allows numa statistics configurable.
> +This interface allows runtime configuration *or* numa statistics.
>  

typo? or->of/for?

>  When page allocation performance becomes a bottleneck and you can tolerate
>  some possible tool breakage and decreased numa counter precision, you can
> @@ -864,13 +864,14 @@ When page allocation performance is not
>  tooling to work, you can do:
>   echo [S|s]trict > /proc/sys/vm/numa_stat_mode
>  
> -We recommend automatic detection of numa statistics by system, because numa
> -statistics does not affect system's decision and it is very rarely
> -consumed. you can do:
> +We recommend automatic detection of numa statistics by system, because
> +numa statistics do not affect system decisions and it is very rarely
> +consumed.  In this case you can do:
>   echo [A|a]uto > /proc/sys/vm/numa_stats_mode
> -This is also system default configuration, with this default setting, numa
> -counters update is skipped unless the counter is *read* by users at least
> -once.
> +
> +This is the system default configuration.  With this default setting, numa
> +counter updates are skipped until the counter is *read* by userspace at
> +least once.
>  
>  ==
>  
> diff -puN drivers/base/node.c~mm-sysctl-make-numa-stats-configurable-fix 
> drivers/base/node.c
> diff -puN include/linux/vmstat.h~mm-sysctl-make-numa-stats-configurable-fix 
> include/linux/vmstat.h
> diff -puN init/main.c~mm-sysctl-make-numa-stats-configurable-fix init/main.c
> --- a/init/main.c~mm-sysctl-make-numa-stats-configurable-fix
> +++ a/init/main.c
> @@ -504,6 +504,9 @@ static void __init mm_init(void)
>   pgtable_init();
>   vmalloc_init();
>   ioremap_huge_init();
> +#ifdef CONFIG_NUMA
> + pr_info("vmstat: NUMA stat updates are skipped unless they have been 
> used\n");
> +#endif
>  }
>  
>  asmlinkage __visible void __init start_kernel(void)
> @@ -567,9 +570,6 @@ asmlinkage __visible void __init start_k
>   sort_main_extable();
>   trap_init();
>   mm_init();
> -#ifdef CONFIG_NUMA
> - pr_info("vmstat: NUMA stats is skipped unless it has been consumed\n");
> -#endif
>  
>   ftrace_init();
>  
> diff -puN kernel/sysctl.c~mm-sysctl-make-numa-stats-configurable-fix 
> kernel/sysctl.c
> diff -puN mm/page_alloc.c~mm-sysctl-make-numa-stats-configurable-fix 
> mm/page_alloc.c
> diff -puN mm/vmstat.c~mm-sysctl-make-numa-stats-configurable-fix mm/vmstat.c
> --- a/mm/vmstat.c~mm-sysctl-make-numa-stats-configurable-fix
> +++ a/mm/vmstat.c
> @@ -40,13 +40,11 @@ static DEFINE_MUTEX(vm_numa_stats_mode_l
>  
>  static int __parse_vm_numa_stats_mode(char *s)
>  {
> - const char *str = s;
> -
> - if (strcmp(str, "auto") == 0 || strcmp(str, "Auto") == 0)
> + if (strcasecmp(s, "auto"))
>   vm_numa_stats_mode = VM_NUMA_STAT_AUTO_MODE;
> - else if (strcmp(str, "strict") == 0 || strcmp(str, "Strict") == 0)
> + else if (strcasecmp(s, "strict") == 0)
>   vm_numa_stats_mode = VM_NUMA_STAT_STRICT_MODE;
> - else if (strcmp(str, "coarse") == 0 || strcmp(str, "Coarse") == 0)
> + else if (strcasecmp(s, 
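
For readers following the strcmp() to strcasecmp() change in the diff above: strcasecmp() compares case-insensitively and, like strcmp(), returns 0 when the strings match, so one `== 0` test can replace each pair of exact-case comparisons. A minimal user-space sketch, where the enum and `parse_numa_stats_mode` are hypothetical names rather than the kernel's identifiers:

```c
#include <strings.h>   /* strcasecmp() (POSIX) */

enum numa_stats_mode { MODE_AUTO, MODE_STRICT, MODE_COARSE, MODE_INVALID };

/* Hypothetical user-space analogue of the mode parser. strcasecmp()
 * returns 0 on a match, so every branch compares against 0. */
enum numa_stats_mode parse_numa_stats_mode(const char *s)
{
    if (strcasecmp(s, "auto") == 0)
        return MODE_AUTO;
    if (strcasecmp(s, "strict") == 0)
        return MODE_STRICT;
    if (strcasecmp(s, "coarse") == 0)
        return MODE_COARSE;
    return MODE_INVALID;
}
```

A branch written as `if (strcasecmp(s, "auto"))` without the comparison would be taken for every string except "auto", so the explicit `== 0` matters.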

Re: [PATCH] x86/asm: Fix inline asm call constraints for GCC 4.4

2017-09-28 Thread Josh Poimboeuf
On Thu, Sep 28, 2017 at 04:53:09PM -0700, Linus Torvalds wrote:
> On Thu, Sep 28, 2017 at 2:58 PM, Josh Poimboeuf  wrote:
> >
> > Reported-by: kernel test robot 
> > Fixes: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
> > Signed-off-by: Josh Poimboeuf 
> 
> Side note: it's not like I personally need the credit, but in general
> I really want people to pick up on who debugged the code and pointed
> to the solution. That's often more of the work than the fix itself.
> 
> The kernel test robot report looked to be ignored as a "gcc-4.4 is too
> old to worry about" thing. People who then step up and analyze the
> problem are rare as it is. They need to be credited in the commit
> logs.
> 
> We don't have any fixed format for that, but it's pretty free-form. So
> we have tags like
> 
>   Root-caused-by:
>   Diagnosed-by:
>   Analyzed-by:
>   Debugged-by:
>   Bisected-by:
>   Fix-suggested-by:
> 
> etc for giving credit to people who figured out some part of a bug
> (and, having grepped for this, we also a _shitload_ of miss-spellings
> of various things ;)

Indeed, credit is important and I try to give it where it's due.  Sorry
for the snub!  I anoint you with:

Debugged-by: Linus Torvalds 

-- 
Josh


linux-next: build failure after merge of the net-next tree

2017-09-28 Thread Stephen Rothwell
Hi all,

After merging the net-next tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

net/dsa/slave.c: In function 'dsa_slave_create':
net/dsa/slave.c:1191:18: error: 'struct dsa_slave_priv' has no member named 
'phy'
  phy_disconnect(p->phy);
  ^

Caused by commit

  0115dcd1787d ("net: dsa: use slave device phydev")

Interacting with commit

  e804441cfe0b ("net: dsa: Fix network device registration order")

from the net tree.

I applied the following merge fix patch (which I am not sure about):

From: Stephen Rothwell 
Date: Fri, 29 Sep 2017 11:28:45 +1000
Subject: [PATCH] net: dsa: merge fix patch for removal of phy

Signed-off-by: Stephen Rothwell 
---
 net/dsa/slave.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8869954485db..9191c929c6c8 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1188,7 +1188,7 @@ int dsa_slave_create(struct dsa_port *port, const char *name)
 	return 0;
 
 out_phy:
-	phy_disconnect(p->phy);
+	phy_disconnect(slave_dev->phydev);
 	if (of_phy_is_fixed_link(p->dp->dn))
 		of_phy_deregister_fixed_link(p->dp->dn);
 out_free:
-- 
2.14.1

-- 
Cheers,
Stephen Rothwell


[PATCH RESEND] KVM: nVMX: Fix nested #PF intends to break L1's vmlaunch/vmresume

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

------------[ cut here ]------------
 WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: GW  OE   4.13.0+ #17
 RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 Call Trace:
  ? emulator_read_emulated+0x15/0x20 [kvm]
  ? segmented_read+0xae/0xf0 [kvm]
  vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  x86_emulate_instruction+0x733/0x810 [kvm]
  vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
  kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2

A nested #PF is triggered while L0 is emulating an instruction for L2.
However, the code does not take into account that we must not break L1's
vmlaunch/vmresume. This patch fixes it by queuing the #PF exception
instead, requesting an immediate VM exit from L2 and keeping the exception
for L1 pending for a subsequent nested VM exit.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c83d28b..1ca91c8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9840,7 +9840,8 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
 
 	WARN_ON(!is_guest_mode(vcpu));
 
-	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code)) {
+	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
+		!to_vmx(vcpu)->nested.nested_run_pending) {
 		vmcs12->vm_exit_intr_error_code = fault->error_code;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
 				  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
-- 
2.7.4
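
The decision this hunk changes can be modelled outside the kernel. The
sketch below is a hypothetical userspace model, not KVM code: struct
vcpu_model, enum pf_action and inject_nested_page_fault() are made-up
names, and the "queued" branch only stands in for whatever the existing
else branch of vmx_inject_page_fault_nested() does. Only the condition
(L1 intercepts the fault and no nested run is pending) is taken from the
patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vCPU state the hunk looks at. */
struct vcpu_model {
	bool l1_intercepts_pf;   /* models nested_vmx_is_page_fault_vmexit() */
	bool nested_run_pending; /* L1's vmlaunch/vmresume has not completed */
	bool exception_queued;   /* #PF kept pending for later delivery */
};

enum pf_action { PF_VMEXIT_TO_L1, PF_QUEUED };

/* Reflect the fault to L1 only when that cannot break a pending nested run. */
static enum pf_action inject_nested_page_fault(struct vcpu_model *v)
{
	assert(v != NULL);

	if (v->l1_intercepts_pf && !v->nested_run_pending)
		return PF_VMEXIT_TO_L1;

	/* Otherwise keep the exception pending instead of exiting now. */
	v->exception_queued = true;
	return PF_QUEUED;
}
```

With nested_run_pending set, the model never takes the VM-exit branch,
which is the case the WARNING in the trace above complains about.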



Re: [PATCH 2/3] srcu: queue work without holding the lock

2017-09-28 Thread Paul E. McKenney
On Thu, Sep 28, 2017 at 06:03:57PM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-09-22 11:46:10 [-0700], Paul E. McKenney wrote:
> > On Fri, Sep 22, 2017 at 05:28:05PM +0200, Sebastian Andrzej Siewior wrote:
> > > On RT we can't invoke queue_delayed_work() within an atomic section
> > > (which is provided by raw_spin_lock_irqsave()).
> > > srcu_reschedule() invokes queue_delayed_work() outside of the
> > > raw_spin_lock_irq_rcu_node() section so this should be fine here, too.
> > > If the remaining callers of call_srcu() aren't atomic
> > > (spin_lock_irqsave() is fine) then this should work on RT, too.
> > 
> > Just to make sure I understand...   The problem is not the _irqsave,
> > but rather the raw_?
> 
> exactly. The _irqsave is translated into a sleeping lock on RT and does
> not matter. The raw_ ones stay as they are and queue_delayed_work() uses
> sleeping locks itself and this is where things fall apart.

OK, internally I could get rid of raw_ at the expense of some code bloat,
but in the call_srcu() case, the caller might well hold a raw_ lock.

Thoughts?

Thanx, Paul
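
The pattern named in the subject line (decide under the lock, queue the
work after dropping it) can be sketched in userspace. This is a minimal
model under stated assumptions, not the SRCU code: struct srcu_model,
model_queue_work() and model_funnel() are hypothetical, and the
raw_locked flag merely stands for "a raw_ spinlock is held".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; raw_locked stands for a held raw_ spinlock. */
struct srcu_model {
	bool raw_locked;
	bool gp_needed;
	int work_queued;
};

/*
 * Models queue_delayed_work(): on RT it takes sleeping locks, so calling it
 * with a raw_ lock held is a bug. The model reports that misuse as false.
 */
static bool model_queue_work(struct srcu_model *s)
{
	assert(s != NULL);
	if (s->raw_locked)
		return false;
	s->work_queued++;
	return true;
}

/* Record the decision under the lock, act on it only after unlocking. */
static bool model_funnel(struct srcu_model *s)
{
	bool queue;

	assert(s != NULL);
	s->raw_locked = true;   /* stands for raw_spin_lock_irqsave() */
	queue = s->gp_needed;
	s->gp_needed = false;
	s->raw_locked = false;  /* stands for raw_spin_unlock_irqrestore() */

	return queue ? model_queue_work(s) : true;
}
```

The model also shows the limit Paul points out: if the caller of
call_srcu() itself holds a raw_ lock, moving the queueing outside the
internal lock does not help, because raw_locked is still set on entry.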



Re: [PATCH 1/3] srcu: use cpu_online() instead custom check

2017-09-28 Thread Paul E. McKenney
On Thu, Sep 28, 2017 at 06:02:08PM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-09-22 11:43:14 [-0700], Paul E. McKenney wrote:
> > On Fri, Sep 22, 2017 at 05:28:04PM +0200, Sebastian Andrzej Siewior wrote:
> > > The current check via srcu_online is slightly racy because after looking
> > > at srcu_online there could be an interrupt that interrupted us long
> > > enough until the CPU we checked against went offline.
> > 
> > But in that case, wouldn't the interrupt block the synchronize_sched()
> > later in the offline sequence?
> 
> What I meant is:
> 
>   CPU0                                       CPU1
>   preempt_disable();
>   if (READ_ONCE(per_cpu(srcu_online, 1)))
>   *interrupt*
>                                              WRITE_ONCE(per_cpu(srcu_online, cpu), false);
>                                              and CPU is offline
>
>   ret = queue_delayed_work_on(1, wq, dwork, delay);
>
> Is this possible or is there a safety belt for this?

I don't see anything that would prevent this.  It is unlikely, but not
so unlikely that it should not be fixed.

> > More to the point, are you actually seeing this failure, or is this
> > a theoretical bug?
> 
> I need to get rid of the preempt_disable() section in which
> queue_delayed_work*() is invoked for RT.

OK, but please see below...

> > > An alternative would be to hold the hotplug rwsem (so the CPUs don't
> > > change their state) and then check based on cpu_online() if we queue it
> > > on a specific CPU or not. queue_work_on() itself can handle if something
> > > is enqueued on an offline CPU but a timer which is enqueued on an offline
> > > CPU won't fire until the CPU is back online.
> > > 
> > > I am not sure if the removal in rcu_init() is okay or not. I assume that
> > > SRCU won't enqueue a work item before SRCU is up and ready.
> > 
> > Another alternative would be to disable preemption across the check and
> > the call to queue_delayed_work_on().
> 
> you need to ensure the *other* CPU won't go offline in the middle of
> checking its status. preempt_disable() won't do this on the other CPU.

Agreed.

> > Yet another alternative would be to have an SRCU-specific per-CPU lock
> > that is acquired across the setting and clearing of srcu_online,
> > and also across the check and the call to queue_delayed_work_on().
> > This last would be more consistent with a desire to remove the
> > synchronize_sched() from the offline sequence.
> > 
> > Or am I missing something here?
>
> The per-CPU lock should work. And cpus_read_lock() is basically that
> except that srcu_online_cpu() is not holding it but the CPU-HP code.
> 
> So do you want to keep things as-is or do you prefer a per-CPU rwsem instead?

The per-CPU rwsem seems like a reasonable approach.  Except for the
call_srcu() path, given that call_srcu()'s caller might have preemption
(or even interrupts) disabled.

Thoughts?

Thanx, Paul
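
The interleaving discussed above can be replayed deterministically in
userspace. The sketch below is a hypothetical single-threaded model, not
kernel code: struct cpu_model and both functions are made-up names, and
the lock_held flag only stands for the per-CPU lock (or rwsem) proposed
in the thread. The offline attempt is deliberately placed between the
check and the queueing, which is the worst-case timing from the diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cpu_model {
	bool online;     /* models per_cpu(srcu_online, cpu) */
	bool lock_held;  /* models the proposed per-CPU lock */
	int work_queued; /* work items queued on this CPU */
};

/* Offline path: with the lock in use it cannot proceed while a checker holds it. */
static bool model_cpu_offline(struct cpu_model *c, bool use_lock)
{
	assert(c != NULL);
	if (use_lock && c->lock_held)
		return false; /* would block until the checker is done */
	c->online = false;
	return true;
}

/*
 * Checker path with the racing offline attempt interleaved between the
 * check and the queueing. Returns true if work ended up on an offline CPU.
 */
static bool model_check_then_queue(struct cpu_model *c, bool use_lock)
{
	bool saw_online;

	assert(c != NULL);
	if (use_lock)
		c->lock_held = true;
	saw_online = c->online;                /* the check */
	(void)model_cpu_offline(c, use_lock);  /* the racing offline */
	if (saw_online)
		c->work_queued++;              /* queue_delayed_work_on(cpu, ...) */
	if (use_lock)
		c->lock_held = false;

	return c->work_queued > 0 && !c->online;
}
```

Without the lock the model queues work on a CPU that has already gone
offline; with it, the offline attempt is held off until the queueing is
done. As noted in the reply, a sleeping lock would not be usable on the
call_srcu() path when the caller has preemption or interrupts disabled.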



[PATCH v2] KVM: VMX: Don't expose PLE enable if there is no hardware support

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

PLE_Window: Software can configure this field as an upper bound on the
amount of time a guest is allowed to execute in a PAUSE loop.

KVM doesn't expose the PLE capability to the L1 hypervisor; however,
ple_window still shows the default value on the L1 hypervisor. This patch
fixes it by clearing all the PLE-related module parameters if there is no
PLE capability.

Reviewed-by: Konrad Rzeszutek Wilk 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
v1 -> v2:
 * fix typo in patch description

 arch/x86/kvm/vmx.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c83d28b..4d4f9b4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6781,8 +6781,13 @@ static __init int hardware_setup(void)
 	if (enable_ept && !cpu_has_vmx_ept_2m_page())
 		kvm_disable_largepages();
 
-	if (!cpu_has_vmx_ple())
+	if (!cpu_has_vmx_ple()) {
 		ple_gap = 0;
+		ple_window = 0;
+		ple_window_grow = 0;
+		ple_window_max = 0;
+		ple_window_shrink = 0;
+	}
 
 	if (!cpu_has_vmx_apicv()) {
 		enable_apicv = 0;
-- 
2.7.4
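
The effect of the hunk can be shown with a small userspace model. This is
a sketch under stated assumptions: struct ple_params and
ple_params_fixup() are hypothetical names that mirror the five module
parameters touched above, and any values a caller fills in are
illustrative rather than the kernel defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the PLE module parameters touched by the hunk. */
struct ple_params {
	int gap;
	int window;
	int window_grow;
	int window_shrink;
	int window_max;
};

/* Without the hardware capability, every PLE knob should read back as 0. */
static void ple_params_fixup(struct ple_params *p, bool cpu_has_ple)
{
	assert(p != NULL);
	if (cpu_has_ple)
		return;

	p->gap = 0;
	p->window = 0;
	p->window_grow = 0;
	p->window_shrink = 0;
	p->window_max = 0;
}
```

Before the patch only the gap was cleared, so the other parameters kept
reporting their defaults on a host (here: the L1 hypervisor) that has no
PLE support.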



[PATCH v2 2/4] KVM: LAPIC: Keep timer running when switching between one-shot and periodic mode

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

If we take TSC-deadline mode timer out of the picture, the Intel SDM
does not say that the timer is disabled when the timer mode is changed,
either from one-shot to periodic or vice versa.

After this patch, the timer is no longer disarmed on change of mode, so
the counter (TMCCT) keeps counting down.

So what does a write to LVTT change? On bare metal, the change of mode
is probably taken into account only when the counter reaches 0. When this
happens, LVTT is used to figure out if the counter should restart counting
down from TMICT (so periodic mode) or stop counting (if one-shot mode).

This patch is based on observation of the behavior of the APIC timer on
bare metal, as well as a check that it does not go against the description
written in the Intel SDM.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a739cbb..946c11b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1301,7 +1301,7 @@ static void update_divide_count(struct kvm_lapic *apic)
   apic->divide_count);
 }
 
-static void apic_update_lvtt(struct kvm_lapic *apic)
+static bool apic_update_lvtt(struct kvm_lapic *apic)
 {
u32 timer_mode = kvm_lapic_get_reg(apic, APIC_LVTT) &
apic->lapic_timer.timer_mode_mask;
@@ -1309,7 +1309,9 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
if (apic->lapic_timer.timer_mode != timer_mode) {
apic->lapic_timer.timer_mode = timer_mode;
hrtimer_cancel(&apic->lapic_timer.timer);
+   return true;
}
+   return false;
 }
 
 static void apic_timer_expired(struct kvm_lapic *apic)
@@ -1430,11 +1432,12 @@ static void start_sw_period(struct kvm_lapic *apic)
HRTIMER_MODE_ABS_PINNED);
 }
 
-static bool set_target_expiration(struct kvm_lapic *apic)
+static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
 {
-   ktime_t now;
-   u64 tscl = rdtsc();
+   ktime_t now, remaining;
+   u64 tscl = rdtsc(), delta;
 
+   /* Calculate the next time the timer should trigger an interrupt */
now = ktime_get();
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
* APIC_BUS_CYCLE_NS * apic->divide_count;
@@ -1470,9 +1473,21 @@ static bool set_target_expiration(struct kvm_lapic *apic)
   ktime_to_ns(ktime_add_ns(now,
apic->lapic_timer.period)));
 
+   if (!timer_update)
+   delta = apic->lapic_timer.period;
+   else {
+   remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
+   if (ktime_to_ns(remaining) < 0)
+   remaining = 0;
+   delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
+   }
+
+   if (!delta)
+   return false;
+
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
-   nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
-   apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
+   nsec_to_cycles(apic->vcpu, delta);
+   apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
 
return true;
 }
@@ -1609,12 +1624,12 @@ void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
restart_apic_timer(apic);
 }
 
-static void start_apic_timer(struct kvm_lapic *apic)
+static void start_apic_timer(struct kvm_lapic *apic, bool timer_update)
 {
atomic_set(&apic->lapic_timer.pending, 0);
 
if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-   && !set_target_expiration(apic))
+   && !set_target_expiration(apic, timer_update))
return;
 
restart_apic_timer(apic);
@@ -1729,7 +1744,8 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
val |= APIC_LVT_MASKED;
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
kvm_lapic_set_reg(apic, APIC_LVTT, val);
-   apic_update_lvtt(apic);
+   if (apic_update_lvtt(apic) && !apic_lvtt_tscdeadline(apic))
+   start_apic_timer(apic, true);
break;
 
case APIC_TMICT:
@@ -1738,7 +1754,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
hrtimer_cancel(&apic->lapic_timer.timer);
kvm_lapic_set_reg(apic, APIC_TMICT, val);
-   start_apic_timer(apic);
+   start_apic_timer(apic, false);
break;
 
case APIC_TDCR:
@@ -1872,7 +1888,7 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 
hrtimer_cancel(&apic->lapic_timer.timer);
apic->lapic_timer.tscdeadline = data;
-   
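
The expiration arithmetic added to set_target_expiration() can be checked
in isolation. The function below is a userspace sketch, not the KVM code:
next_expiry_delta_ns() is a hypothetical name, plain nanosecond integers
replace ktime_t, and the % operator replaces mod_64(). A return value of 0
corresponds to the "return false" (do not arm the timer) paths in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Nanoseconds until the emulated timer should fire next.
 * timer_update == false: TMICT was just written, count a full period.
 * timer_update == true:  only the mode changed, keep what was left of the
 *                        running countdown.
 */
static uint64_t next_expiry_delta_ns(int64_t now_ns,
				     int64_t target_expiration_ns,
				     uint64_t period_ns, bool timer_update)
{
	int64_t remaining;

	if (period_ns == 0)
		return 0;
	if (!timer_update)
		return period_ns;

	remaining = target_expiration_ns - now_ns;
	if (remaining < 0)
		remaining = 0;

	return (uint64_t)remaining % period_ns;
}
```

For example, with a period of 1000 ns, now = 1000 and a previous target of
1600, a mode switch keeps the remaining 600 ns, while a TMICT write
restarts a full 1000 ns countdown.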

[PATCH v2 4/4] KVM: LAPIC: Don't silently accept bad vectors

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

Vectors 0-15 are reserved, and a physical LAPIC - upon sending or
receiving one - would generate an APIC error instead of doing the
requested action. Make our emulation behave similarly.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6bafd06..a779ba9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -935,6 +935,25 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
return ret;
 }
 
+static void apic_error(struct kvm_lapic *apic, unsigned long errmask)
+{
+   uint32_t esr;
+
+   esr = kvm_lapic_get_reg(apic, APIC_ESR);
+
+   if ((esr & errmask) != errmask) {
+   uint32_t lvterr = kvm_lapic_get_reg(apic, APIC_LVTERR);
+
+   kvm_lapic_set_reg(apic, APIC_ESR, esr | errmask);
+   if (!(lvterr & APIC_LVT_MASKED)) {
+   struct kvm_lapic_irq irq;
+
+   irq.vector = lvterr & 0xff;
+   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
+   }
+   }
+}
+
 /*
  * Add a pending IRQ into lapic.
  * Return 1 if successfully added and 0 if discarded.
@@ -946,6 +965,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
int result = 0;
struct kvm_vcpu *vcpu = apic->vcpu;
 
+   if (unlikely(vector < 16) && delivery_mode == APIC_DM_FIXED) {
+   apic_error(apic, APIC_ESR_RECVILL);
+   return 0;
+   }
+
trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
  trig_mode, vector);
switch (delivery_mode) {
@@ -1146,7 +1170,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)
   irq.trig_mode, irq.level, irq.dest_mode, irq.delivery_mode,
   irq.vector, irq.msi_redir_hint);
 
-   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
+   if (unlikely(irq.vector < 16 && irq.delivery_mode == APIC_DM_FIXED))
+   apic_error(apic, APIC_ESR_SENDILL);
+   else
+   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
 }
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
@@ -1734,7 +1761,6 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
case APIC_LVTPC:
case APIC_LVT1:
case APIC_LVTERR:
-   /* TODO: Check vector */
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
 
-- 
2.7.4
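
The error path added by this patch can be modelled in userspace. The
sketch below is hypothetical and simplified: struct lapic_model,
vector_is_illegal() and model_apic_error() are made-up names, and
interrupt delivery is reduced to a counter. The bit positions used for the
ESR illegal-vector flags and the LVT mask bit are the architectural ones
as far as I know, but the MODEL_ constants are defined locally here rather
than taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LVT_MASKED  (1u << 16) /* mask bit of an LVT entry */
#define MODEL_ESR_SENDILL (1u << 5)  /* send illegal vector */
#define MODEL_ESR_RECVILL (1u << 6)  /* received illegal vector */

struct lapic_model {
	uint32_t esr;        /* error status register */
	uint32_t lvterr;     /* LVT error entry: mask bit plus vector */
	int error_irqs;      /* number of error interrupts raised */
	uint8_t last_vector; /* vector of the last error interrupt */
};

/* Vectors 0-15 are reserved for fixed-delivery interrupts. */
static bool vector_is_illegal(uint32_t vector, bool fixed_delivery)
{
	return fixed_delivery && vector < 16;
}

/* Latch the error bit; raise the LVT error interrupt only for a new error. */
static void model_apic_error(struct lapic_model *a, uint32_t errmask)
{
	assert(a != NULL);
	if ((a->esr & errmask) == errmask)
		return; /* already recorded, nothing new to signal */

	a->esr |= errmask;
	if (!(a->lvterr & MODEL_LVT_MASKED)) {
		a->last_vector = (uint8_t)(a->lvterr & 0xff);
		a->error_irqs++;
	}
}
```

As in the patch, a repeated error of the same kind does not raise a second
interrupt until the ESR bit has been cleared, and a masked LVTERR entry
records the error without delivering anything.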



[PATCH v2 3/4] KVM: LAPIC: Apply change to TDCR right away to the timer

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

The description in the Intel SDM of how the divide configuration
register is used: "The APIC timer frequency will be the processor's bus
clock or core crystal clock frequency divided by the value specified in
the divide configuration register."

Observation of bare metal showed that when the TDCR is changed, the TMCCT
does not change or make a big jump in value, but the rate at which it
counts down changes.

The patch updates the emulation of the APIC timer so that a change to the
divide configuration is reflected in the value of the counter and in
when the next interrupt is triggered.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 946c11b..6bafd06 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1432,7 +1432,7 @@ static void start_sw_period(struct kvm_lapic *apic)
HRTIMER_MODE_ABS_PINNED);
 }
 
-static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
+static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update, uint32_t old_divisor)
 {
ktime_t now, remaining;
u64 tscl = rdtsc(), delta;
@@ -1440,7 +1440,7 @@ static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
/* Calculate the next time the timer should trigger an interrupt */
now = ktime_get();
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
-   * APIC_BUS_CYCLE_NS * apic->divide_count;
+   * APIC_BUS_CYCLE_NS * old_divisor;
 
if (!apic->lapic_timer.period)
return false;
@@ -1485,6 +1485,12 @@ static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
if (!delta)
return false;
 
+   if (apic->divide_count != old_divisor) {
+   apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+   * APIC_BUS_CYCLE_NS * apic->divide_count;
+   delta = delta * apic->divide_count / old_divisor;
+   }
+
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
nsec_to_cycles(apic->vcpu, delta);
apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
@@ -1624,12 +1630,13 @@ void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
restart_apic_timer(apic);
 }
 
-static void start_apic_timer(struct kvm_lapic *apic, bool timer_update)
+static void start_apic_timer(struct kvm_lapic *apic, bool timer_update,
+   uint32_t old_divisor)
 {
atomic_set(&apic->lapic_timer.pending, 0);
 
if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-   && !set_target_expiration(apic, timer_update))
+   && !set_target_expiration(apic, timer_update, old_divisor))
return;
 
restart_apic_timer(apic);
@@ -1745,7 +1752,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
kvm_lapic_set_reg(apic, APIC_LVTT, val);
if (apic_update_lvtt(apic) && !apic_lvtt_tscdeadline(apic))
-   start_apic_timer(apic, true);
+   start_apic_timer(apic, true, apic->divide_count);
break;
 
case APIC_TMICT:
@@ -1754,16 +1761,20 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
hrtimer_cancel(&apic->lapic_timer.timer);
kvm_lapic_set_reg(apic, APIC_TMICT, val);
-   start_apic_timer(apic, false);
+   start_apic_timer(apic, false, apic->divide_count);
break;
 
-   case APIC_TDCR:
+   case APIC_TDCR: {
+   uint32_t current_divisor = apic->divide_count;
+
if (val & 4)
apic_debug("KVM_WRITE:TDCR %x\n", val);
kvm_lapic_set_reg(apic, APIC_TDCR, val);
update_divide_count(apic);
+   hrtimer_cancel(&apic->lapic_timer.timer);
+   start_apic_timer(apic, true, current_divisor);
break;
-
+   }
case APIC_ESR:
if (apic_x2apic_mode(apic) && val != 0) {
apic_debug("KVM_WRITE:ESR not zero %x\n", val);
@@ -1888,7 +1899,7 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 
hrtimer_cancel(&apic->lapic_timer.timer);
apic->lapic_timer.tscdeadline = data;
-   start_apic_timer(apic, false);
+   start_apic_timer(apic, false, apic->divide_count);
 }
 
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
@@ -2254,7 +2265,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct 
kvm_lapic_state *s)

[PATCH v2 4/4] KVM: LAPIC: Don't silently accept bad vectors

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

Vectors 0-15 are reserved, and a physical LAPIC - upon sending or
receiving one - would generate an APIC error instead of doing the
requested action. Make our emulation behave similarly.
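
The check described above can be sketched in standalone C. This is only a reduced model under stated assumptions, not the KVM code: `struct model_apic` and `model_check_vector` are hypothetical names, and the constants are assumed to match arch/x86/include/asm/apicdef.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match arch/x86/include/asm/apicdef.h. */
#define APIC_DM_FIXED     0x00000u
#define APIC_ESR_SENDILL  0x00020u  /* send illegal vector */
#define APIC_ESR_RECVILL  0x00040u  /* received illegal vector */

/* Hypothetical, reduced stand-in for the emulated LAPIC state. */
struct model_apic {
	uint32_t esr;
};

/*
 * Returns true if the interrupt may be delivered. For a fixed-mode
 * interrupt with a reserved vector (0-15), records errmask in the
 * ESR and returns false instead.
 */
static bool model_check_vector(struct model_apic *apic, uint32_t delivery_mode,
			       uint32_t vector, uint32_t errmask)
{
	if (delivery_mode == APIC_DM_FIXED && vector < 16) {
		apic->esr |= errmask;
		return false;
	}
	return true;
}
```

The sender would pass APIC_ESR_SENDILL and the receiver APIC_ESR_RECVILL. The sketch leaves out raising the LVT error interrupt.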

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6bafd06..a779ba9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -935,6 +935,25 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
return ret;
 }
 
+static void apic_error(struct kvm_lapic *apic, unsigned long errmask)
+{
+   uint32_t esr;
+
+   esr = kvm_lapic_get_reg(apic, APIC_ESR);
+
+   if ((esr & errmask) != errmask) {
+   uint32_t lvterr = kvm_lapic_get_reg(apic, APIC_LVTERR);
+
+   kvm_lapic_set_reg(apic, APIC_ESR, esr | errmask);
+   if (!(lvterr & APIC_LVT_MASKED)) {
+   struct kvm_lapic_irq irq;
+
+   irq.vector = lvterr & 0xff;
+   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
+   }
+   }
+}
+
 /*
  * Add a pending IRQ into lapic.
  * Return 1 if successfully added and 0 if discarded.
@@ -946,6 +965,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
int result = 0;
struct kvm_vcpu *vcpu = apic->vcpu;
 
+   if (unlikely(vector < 16) && delivery_mode == APIC_DM_FIXED) {
+   apic_error(apic, APIC_ESR_RECVILL);
+   return 0;
+   }
+
trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
  trig_mode, vector);
switch (delivery_mode) {
@@ -1146,7 +1170,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)
   irq.trig_mode, irq.level, irq.dest_mode, irq.delivery_mode,
   irq.vector, irq.msi_redir_hint);
 
-   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
+   if (unlikely(irq.vector < 16 && irq.delivery_mode == APIC_DM_FIXED))
+   apic_error(apic, APIC_ESR_SENDILL);
+   else
+   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
 }
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
@@ -1734,7 +1761,6 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
case APIC_LVTPC:
case APIC_LVT1:
case APIC_LVTERR:
-   /* TODO: Check vector */
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
 
-- 
2.7.4



[PATCH v2 3/4] KVM: LAPIC: Apply change to TDCR right away to the timer

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

The description in the Intel SDM of how the divide configuration
register is used: "The APIC timer frequency will be the processor's bus
clock or core crystal clock frequency divided by the value specified in
the divide configuration register."

Observation of bare metal showed that when the TDCR is changed, the TMCCT
does not change or make a big jump in value, but the rate at which it
counts down changes.

This patch updates the emulation of the APIC timer so that a change to the
divide configuration is reflected in the value of the counter and in
when the next interrupt is triggered.
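
The arithmetic behind this can be sketched in standalone C. The helper names `tdcr_to_divisor` and `rescale_remaining_ns` are hypothetical. The decoding assumes the SDM encoding in which bits 0, 1 and 3 of the register select the divisor, and the rescaling mirrors the `delta * apic->divide_count / old_divisor` step in the diff below.

```c
#include <stdint.h>

/*
 * Decode the divide configuration register into a divisor.
 * Bits 0, 1 and 3 are significant; 0b1011 means divide by 1.
 */
static uint32_t tdcr_to_divisor(uint32_t tdcr)
{
	uint32_t low = tdcr & 0xf;
	uint32_t shift = ((low & 0x3) | ((low & 0x8) >> 1)) + 1;

	return 1u << (shift & 0x7);
}

/*
 * Rescale the time left until expiry when the divisor changes.
 * The remaining count (TMCCT) stays the same; only the duration
 * of each count changes.
 */
static uint64_t rescale_remaining_ns(uint64_t remaining_ns,
				     uint32_t old_divisor,
				     uint32_t new_divisor)
{
	return remaining_ns * new_divisor / old_divisor;
}
```

For example, with 1000 ns remaining at divisor 2, switching to divisor 8 leaves 4000 ns until expiry, while the count the guest reads back from TMCCT is unchanged.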

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 946c11b..6bafd06 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1432,7 +1432,7 @@ static void start_sw_period(struct kvm_lapic *apic)
HRTIMER_MODE_ABS_PINNED);
 }
 
-static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
+static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update, uint32_t old_divisor)
 {
ktime_t now, remaining;
u64 tscl = rdtsc(), delta;
@@ -1440,7 +1440,7 @@ static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
/* Calculate the next time the timer should trigger an interrupt */
now = ktime_get();
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
-   * APIC_BUS_CYCLE_NS * apic->divide_count;
+   * APIC_BUS_CYCLE_NS * old_divisor;
 
if (!apic->lapic_timer.period)
return false;
@@ -1485,6 +1485,12 @@ static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
if (!delta)
return false;
 
+   if (apic->divide_count != old_divisor) {
+   apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+   * APIC_BUS_CYCLE_NS * apic->divide_count;
+   delta = delta * apic->divide_count / old_divisor;
+   }
+
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
nsec_to_cycles(apic->vcpu, delta);
apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
@@ -1624,12 +1630,13 @@ void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
restart_apic_timer(apic);
 }
 
-static void start_apic_timer(struct kvm_lapic *apic, bool timer_update)
+static void start_apic_timer(struct kvm_lapic *apic, bool timer_update,
+   uint32_t old_divisor)
 {
atomic_set(&apic->lapic_timer.pending, 0);
 
if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-   && !set_target_expiration(apic, timer_update))
+   && !set_target_expiration(apic, timer_update, old_divisor))
return;
 
restart_apic_timer(apic);
@@ -1745,7 +1752,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
kvm_lapic_set_reg(apic, APIC_LVTT, val);
if (apic_update_lvtt(apic) && !apic_lvtt_tscdeadline(apic))
-   start_apic_timer(apic, true);
+   start_apic_timer(apic, true, apic->divide_count);
break;
 
case APIC_TMICT:
@@ -1754,16 +1761,20 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
hrtimer_cancel(&apic->lapic_timer.timer);
kvm_lapic_set_reg(apic, APIC_TMICT, val);
-   start_apic_timer(apic, false);
+   start_apic_timer(apic, false, apic->divide_count);
break;
 
-   case APIC_TDCR:
+   case APIC_TDCR: {
+   uint32_t current_divisor = apic->divide_count;
+
if (val & 4)
apic_debug("KVM_WRITE:TDCR %x\n", val);
kvm_lapic_set_reg(apic, APIC_TDCR, val);
update_divide_count(apic);
+   hrtimer_cancel(&apic->lapic_timer.timer);
+   start_apic_timer(apic, true, current_divisor);
break;
-
+   }
case APIC_ESR:
if (apic_x2apic_mode(apic) && val != 0) {
apic_debug("KVM_WRITE:ESR not zero %x\n", val);
@@ -1888,7 +1899,7 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 
hrtimer_cancel(&apic->lapic_timer.timer);
apic->lapic_timer.tscdeadline = data;
-   start_apic_timer(apic, false);
+   start_apic_timer(apic, false, apic->divide_count);
 }
 
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
@@ -2254,7 +2265,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
apic_update_lvtt(apic);
apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));

[PATCH v2 1/4] KVM: LAPIC: Fix lapic timer mode transition

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

SDM 10.5.4.1 "TSC-Deadline Mode" states that "Transitioning between
TSC-Deadline mode and other timer modes also disarms the timer". So the
APIC Timer Initial Count Register used for one-shot/periodic mode should be
reset on such a transition. This patch does so.
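
The transition test can be sketched in standalone C as follows. The helper names `lvtt_is_tscdeadline` and `crosses_tscdeadline` are hypothetical; the mode constants are the ones this patch touches in apicdef.h, written with unsigned literals for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Timer mode field of the LVT timer register (bits 17-18). */
#define APIC_LVT_TIMER_MASK         (3u << 17)
#define APIC_LVT_TIMER_ONESHOT      (0u << 17)
#define APIC_LVT_TIMER_PERIODIC     (1u << 17)
#define APIC_LVT_TIMER_TSCDEADLINE  (2u << 17)

static bool lvtt_is_tscdeadline(uint32_t lvtt)
{
	return (lvtt & APIC_LVT_TIMER_MASK) == APIC_LVT_TIMER_TSCDEADLINE;
}

/* True when replacing old_lvtt with new_lvtt enters or leaves TSC-deadline mode. */
static bool crosses_tscdeadline(uint32_t old_lvtt, uint32_t new_lvtt)
{
	return lvtt_is_tscdeadline(old_lvtt) != lvtt_is_tscdeadline(new_lvtt);
}
```

Only when `crosses_tscdeadline()` is true would the initial count be cleared; a switch between one-shot and periodic leaves it alone.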

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/apicdef.h | 1 +
 arch/x86/kvm/lapic.c   | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index c46bb99..d8ef1b4 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -100,6 +100,7 @@
 #define APIC_TIMER_BASE_CLKIN      0x0
 #define APIC_TIMER_BASE_TMBASE     0x1
 #define APIC_TIMER_BASE_DIV        0x2
+#define APIC_LVT_TIMER_MASK        (3 << 17)
 #define APIC_LVT_TIMER_ONESHOT     (0 << 17)
 #define APIC_LVT_TIMER_PERIODIC    (1 << 17)
 #define APIC_LVT_TIMER_TSCDEADLINE (2 << 17)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 69c5612..a739cbb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1722,6 +1722,9 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
break;
 
case APIC_LVTT:
+   if (apic_lvtt_tscdeadline(apic) != ((val &
+   APIC_LVT_TIMER_MASK) == APIC_LVT_TIMER_TSCDEADLINE))
+   kvm_lapic_set_reg(apic, APIC_TMICT, 0);
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
-- 
2.7.4



[PATCH v2 0/4] KVM: LAPIC: Rework lapic timer to behave more like real-hardware

2017-09-28 Thread Wanpeng Li
The issue was reported in the Xen community.

Anthony PERARD pointed out:

https://www.mail-archive.com/xen-devel@lists.xen.org/msg117283.html#

 | When developing PVH for OVMF, I've used the lapic timer. It turns out that
 | the way it is used by OVMF did not work with Xen [1]. I tried to find out
 | how real hardware behaves, and wrote XTF tests [2]. And this patch series
 | tries to fix the behavior of the vlapic timer.
 | 
 | 
 | The OVMF driver for the APIC timer initializes the timer like this:
 |  write to TMICT (initial counter)
 |  write to TMDCR (divide configuration)
 |  enable the timer (this may change timer mode from one-shot to periodic)
 | It turns out that TMICT is set to 0 on the last step, but OVMF expects the
 | timer to run.
 | 
 | Here is some description of the APIC timer, based on observation as well as
 | a reading of the Intel SDM. The description is also part of the patch
 | descriptions (reworded).
 | 
 | Maybe a way of thinking about how the APIC timer is evaluated is to think
 | of how hardware would do it. There is a counter TMCCT which always keeps
 | counting down.
 | 
 | Setting TMICT also sets TMCCT, nothing else matters.
 | Setting LVTT does not change anything right away.
 | Setting TMDCR does not change much.
 | 
 | Now TMCCT keeps counting down, at a rate related to TMDCR.
 | Once TMCCT reaches 0, it is only at this time that LVTT is taken into account.
 | Is there an interrupt to deliver? Should the timer restart counting from the
 | value in TMICT?
 | 
 | In the Intel SDM, the word "disarm" is used for the timer. I guess the
 | easiest way to disarm the APIC timer (when in periodic or one-shot mode) is
 | to set TMICT to 0. But if we take TSC-Deadline mode out of the picture, there
 | is nothing in the manual that says that the timer is disarmed or stopped when
 | changing timer mode (there are only two modes left, periodic and one-shot).
 | 
 | As for the TSC-deadline timer mode, observation showed that changing to it (or
 | from it) does reset and disarm both timers, so effectively TMICT and the
 | tscdeadline are set to 0.
 | 
 | [1] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg00959.html
 | [2] v1: https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg02533.html
 |     v2: look for "[XTF PATCH V2 0/3] Testing vlapic timer"
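
The counter model described in the quote above (leaving TSC-deadline mode aside) can be sketched in standalone C. Everything here is a simplification for illustration: `struct model_timer`, `model_write_tmict` and `model_tick` are hypothetical names, one tick stands for one divided bus cycle, and the LVT constants are assumed to match apicdef.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_LVT_MASKED          (1u << 16)
#define APIC_LVT_TIMER_MASK      (3u << 17)
#define APIC_LVT_TIMER_PERIODIC  (1u << 17)

/* Hypothetical, reduced timer state: initial count, current count, LVT timer. */
struct model_timer {
	uint32_t tmict;
	uint32_t tmcct;
	uint32_t lvtt;
};

/* Writing TMICT also loads TMCCT; LVTT is not consulted here. */
static void model_write_tmict(struct model_timer *t, uint32_t val)
{
	t->tmict = val;
	t->tmcct = val;
}

/*
 * Count down by one. LVTT is only looked at when the counter reaches 0.
 * Returns true if an interrupt should be delivered.
 */
static bool model_tick(struct model_timer *t)
{
	if (t->tmcct == 0)
		return false;		/* disarmed */
	if (--t->tmcct != 0)
		return false;
	if ((t->lvtt & APIC_LVT_TIMER_MASK) == APIC_LVT_TIMER_PERIODIC)
		t->tmcct = t->tmict;	/* restart from the initial count */
	return !(t->lvtt & APIC_LVT_MASKED);
}
```

In this model, writing LVTT while the counter is running changes nothing until expiry, which is the OVMF sequence described above: TMICT is written first and the mode is switched to periodic afterwards, yet the timer still fires and reloads.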

In addition, patch 4/4 implements the illegal vector error handling according
to SDM 10.5.2~10.5.3.

v1 -> v2:
 * add cover-letter and collect recent lapic patches to one patchset

Wanpeng Li (4):
  KVM: LAPIC: Fix lapic timer mode transition
  KVM: LAPIC: Keep timer running when switching between one-shot and periodic mode
  KVM: LAPIC: Apply change to TDCR right away to the timer
  KVM: LAPIC: Don't silently accept bad vectors

 arch/x86/include/asm/apicdef.h |  1 +
 arch/x86/kvm/lapic.c   | 90 ++
 2 files changed, 74 insertions(+), 17 deletions(-)

-- 
2.7.4


