Re: [Part2 PATCH v4 01/29] Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV)

2017-09-28 Thread Borislav Petkov
On Tue, Sep 19, 2017 at 03:45:59PM -0500, Brijesh Singh wrote:
> Create a Documentation entry to describe the AMD Secure Encrypted
> Virtualization (SEV) feature.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Jonathan Corbet 
> Cc: Borislav Petkov 
> Cc: Tom Lendacky 
> Cc: k...@vger.kernel.org
> Cc: x...@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh 
> ---
>  Documentation/virtual/kvm/00-INDEX |   3 +
>  .../virtual/kvm/amd-memory-encryption.txt  | 210 
> +
>  2 files changed, 213 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/amd-memory-encryption.txt

Here's a diff which applies on top of this one; it moves the KVM_SEV_*
commands to Documentation/virtual/kvm/api.txt where they're all together
in one place for obvious advantages.

Also I did some small cleanups while at it.

Notably, there are fewer commands now: KVM_SEV_RECEIVE_UPDATE_DATA,
KVM_SEV_RECEIVE_START and a couple more are missing.

On purpose?

---
diff --git a/Documentation/virtual/kvm/amd-memory-encryption.txt b/Documentation/virtual/kvm/amd-memory-encryption.txt
index 5b38b4feb9b7..2f17400f5720 100644
--- a/Documentation/virtual/kvm/amd-memory-encryption.txt
+++ b/Documentation/virtual/kvm/amd-memory-encryption.txt
@@ -1,16 +1,18 @@
 Secure Encrypted Virtualization (SEV) is a feature found on AMD processors.
 
-SEV is an extension to the AMD-V architecture which supports running virtual
-machines (VMs) under the control of a hypervisor. When enabled, the memory
-contents of VM will be transparently encrypted with a key unique to the VM.
+SEV is an extension to the AMD-V architecture which supports running
+virtual machines (VMs) under the control of a hypervisor. When enabled,
+the memory contents of a VM will be transparently encrypted with a key
+unique to that VM.
 
-Hypervisor can determine the SEV support through the CPUID instruction. The CPUID
-function 0x8000001f reports information related to SEV:
+The hypervisor can determine the SEV support through the CPUID
+instruction. The CPUID function 0x8000001f reports information related
+to SEV:
 
0x8000001f[eax]:
Bit[1]  indicates support for SEV
-
-   0x8000001f[ecx]:
+   ...
+ [ecx]:
Bits[31:0]  Number of encrypted guests supported simultaneously
 
 If support for SEV is present, MSR 0xc001_0010 (MSR_K8_SYSCFG) and MSR 0xc001_0015
@@ -24,8 +26,8 @@ If support for SEV is present, MSR 0xc001_0010 (MSR_K8_SYSCFG) and MSR 0xc001_00
Bit[0] 1 = memory encryption can be enabled
   0 = memory encryption can not be enabled
 
-When SEV support is available, it can be enabled in a specific VM by setting SEV
-bit before executing VMRUN.
+When SEV support is available, it can be enabled in a specific VM by
+setting the SEV bit before executing VMRUN.
 
VMCB[0x90]:
Bit[1]  1 = SEV is enabled
@@ -45,182 +47,6 @@ more information, see SEV Key Management spec at
 
 http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf
 
-KVM implements the following commands to support SEV guests common lifecycle
-events such as launching, running, snapshotting, migrating and decommissioning
-guests.
-
-1. KVM_SEV_INIT
-
-Returns: 0 on success, -negative on error
-
-The KVM_SEV_INIT command is used by the hypervisor to initialize the SEV platform
-context. In a typical workflow, this command should be the first command issued.
-
-2. KVM_SEV_LAUNCH_START
-
-Parameters: struct  kvm_sev_launch_start (in/out)
-Returns: 0 on success, -negative on error
-
-The KVM_SEV_LAUNCH_START command is used for creating the memory encryption
-context. To create the encryption context, user must provide a guest policy,
-the owner's public Diffie-Hellman (PDH) key and session information.
-
-struct kvm_sev_launch_start {
-   /* if zero then firmware creates a new handle */
-   __u32 handle;
-
-   /* guest's policy */
-   __u32 policy;
-
-   /* userspace address pointing to the guest owner's PDH key */
-   __u64 dh_uaddr;
-   __u32 dh_len;
-
-   /* userspace address which points to the guest session information */
-   __u64 session_addr;
-   __u32 session_len;
-};
-
-On success, the 'handle' field contain a new handle and on error, a negative value.
-
-For more details, see SEV spec Section 6.2.
-
-3. KVM_SEV_LAUNCH_UPDATE_DATA
-
-Parameters (in): struct  kvm_sev_launch_update_data
-Returns: 0 on success, -negative on error
-
-The KVM_SEV_LAUNCH_UPDATE_DATA is used for encrypting the memory region. It also
-calculates a measurement of the memory contents. The measurement is a signature
-of the memory contents that can be sent to the guest owners as an attestation
-that the memory was encrypted correctly by the firmware.
-
-struct kvm_sev_laun
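
The CPUID and VMCB bit checks described in the documentation text under review can be sketched as pure decoding helpers. This is only an illustration, not code from the patch: the helper names are made up, and the leaf number 0x8000001f is taken from AMD's CPUID documentation for the SEV leaf.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf that reports SEV information (AMD Fn8000_001F). */
#define SEV_CPUID_LEAF 0x8000001fu

/* EAX bit 1 indicates support for SEV. (Hypothetical helper name.) */
static bool sev_supported(uint32_t eax)
{
	return (eax >> 1) & 1u;
}

/* ECX bits 31:0 hold the number of encrypted guests supported simultaneously. */
static uint32_t sev_max_guests(uint32_t ecx)
{
	return ecx;
}

/* VMCB offset 0x90, bit 1: SEV is enabled for this guest. */
static bool vmcb_sev_enabled(uint64_t vmcb_0x90)
{
	return (vmcb_0x90 >> 1) & 1u;
}
```

On an x86 build with GCC or Clang, the eax/ecx inputs could be obtained with __get_cpuid() from <cpuid.h>; the helpers themselves only decode register values, so they can be exercised on any machine.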

Re: [virtio-dev] Re: [PATCH v15 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-09-28 Thread Wei Wang

On 09/29/2017 12:01 PM, Michael S. Tsirkin wrote:

On Fri, Sep 08, 2017 at 07:09:24PM +0800, Wei Wang wrote:

On 09/08/2017 11:36 AM, Michael S. Tsirkin wrote:

On Tue, Aug 29, 2017 at 11:09:18AM +0800, Wei Wang wrote:

On 08/29/2017 02:03 AM, Michael S. Tsirkin wrote:

On Mon, Aug 28, 2017 at 06:08:31PM +0800, Wei Wang wrote:

Add a new feature, VIRTIO_BALLOON_F_SG, which enables the transfer
of balloon (i.e. inflated/deflated) pages using scatter-gather lists
to the host.

The implementation of the previous virtio-balloon is not very
efficient, because the balloon pages are transferred to the
host one by one. Here is the breakdown of the time in percentage
spent on each step of the balloon inflating process (inflating
7GB of an 8GB idle guest).

1) allocating pages (6.5%)
2) sending PFNs to host (68.3%)
3) address translation (6.1%)
4) madvise (19%)

It takes about 4126ms for the inflating process to complete.
The above profiling shows that the bottlenecks are stage 2)
and stage 4).

This patch optimizes step 2) by transferring pages to the host in
sgs. An sg describes a chunk of guest physically contiguous pages.
With this mechanism, step 4) can also be optimized by doing address
translation and madvise() in chunks rather than page by page.

With this new feature, the above ballooning process takes ~597ms
resulting in an improvement of ~86%.
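
As a rough illustration of the chunking idea (not the patch's implementation, which searches an xbitmap with xb_find_next_bit()), the sketch below coalesces a sorted array of PFNs into contiguous runs, each of which would map to one sg. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One chunk of physically contiguous pages: first PFN and page count. */
struct pfn_run {
	uint64_t start_pfn;
	uint64_t nr_pages;
};

/*
 * Coalesce a sorted array of PFNs into contiguous runs.
 * Returns the number of runs written to @out (at most @max_runs).
 */
static size_t coalesce_pfns(const uint64_t *pfns, size_t n,
			    struct pfn_run *out, size_t max_runs)
{
	size_t runs = 0;

	for (size_t i = 0; i < n; i++) {
		if (runs &&
		    out[runs - 1].start_pfn + out[runs - 1].nr_pages == pfns[i]) {
			out[runs - 1].nr_pages++;	/* extends the current run */
			continue;
		}
		if (runs == max_runs)
			break;				/* no room for another run */
		out[runs].start_pfn = pfns[i];
		out[runs].nr_pages = 1;
		runs++;
	}
	return runs;
}
```

With PFNs 10, 11, 12, 20, 21 and 30, six per-page transfers collapse into three runs, which is the effect the commit message attributes to sending sgs instead of individual PFNs.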

TODO: optimize stage 1) by allocating/freeing a chunk of pages
instead of a single page each time.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Suggested-by: Michael S. Tsirkin 
---
drivers/virtio/virtio_balloon.c | 171 

include/uapi/linux/virtio_balloon.h |   1 +
2 files changed, 155 insertions(+), 17 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..8ecc1d4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -32,6 +32,8 @@
#include 
#include 
#include 
+#include 
+#include 
/*
 * Balloon device works in 4K page units.  So each page is pointed to by
@@ -79,6 +81,9 @@ struct virtio_balloon {
/* Synchronize access/update to this struct virtio_balloon elements */
struct mutex balloon_lock;
+   /* The xbitmap used to record balloon pages */
+   struct xb page_xb;
+
/* The array of pfns we tell the Host about. */
unsigned int num_pfns;
__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
@@ -141,13 +146,111 @@ static void set_page_pfns(struct virtio_balloon *vb,
  page_to_balloon_pfn(page) + i);
}
+static int add_one_sg(struct virtqueue *vq, void *addr, uint32_t size)
+{
+   struct scatterlist sg;
+
+   sg_init_one(&sg, addr, size);
+   return virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
+}
+
+static void send_balloon_page_sg(struct virtio_balloon *vb,
+struct virtqueue *vq,
+void *addr,
+uint32_t size,
+bool batch)
+{
+   unsigned int len;
+   int err;
+
+   err = add_one_sg(vq, addr, size);
+   /* Sanity check: this can't really happen */
+   WARN_ON(err);

It might be cleaner to detect that add failed due to
ring full and kick then. Just an idea, up to you
whether to do it.


+
+   /* If batching is in use, we batch the sgs till the vq is full. */
+   if (!batch || !vq->num_free) {
+   virtqueue_kick(vq);
+   wait_event(vb->acked, virtqueue_get_buf(vq, &len));
+   /* Release all the entries if there are */

Meaning
Account for all used entries if any
?


+   while (virtqueue_get_buf(vq, &len))
+   ;

Above code is reused below. Add a function?


+   }
+}
+
+/*
+ * Send balloon pages in sgs to host. The balloon pages are recorded in the
+ * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE.
+ * The page xbitmap is searched for continuous "1" bits, which correspond
+ * to continuous pages, to chunk into sgs.
+ *
+ * @page_xb_start and @page_xb_end form the range of bits in the xbitmap that
+ * need to be searched.
+ */
+static void tell_host_sgs(struct virtio_balloon *vb,
+ struct virtqueue *vq,
+ unsigned long page_xb_start,
+ unsigned long page_xb_end)
+{
+   unsigned long sg_pfn_start, sg_pfn_end;
+   void *sg_addr;
+   uint32_t sg_len, sg_max_len = round_down(UINT_MAX, PAGE_SIZE);
+
+   sg_pfn_start = page_xb_start;
+   while (sg_pfn_start < page_xb_end) {
+   sg_pfn_start = xb_find_next_bit(&vb->page_xb, sg_pfn_start,
+   page_xb_end, 1);
+   if (sg_pfn_start == page_xb_end + 1)
+   break;
+   sg_pfn_end = xb_find_next_bit(&vb->page_xb, sg_pfn_start + 

[PATCH] powernv: Add OCC driver to mmap sensor area

2017-09-28 Thread Shilpasri G Bhat
This driver provides interface to mmap the OCC sensor area
to userspace to parse and read OCC inband sensors.

Signed-off-by: Shilpasri G Bhat 
---
- The skiboot patch for this is posted here:
https://lists.ozlabs.org/pipermail/skiboot/2017-September/009209.html

 arch/powerpc/platforms/powernv/Makefile   |  2 +-
 arch/powerpc/platforms/powernv/opal-occ.c | 88 +++
 arch/powerpc/platforms/powernv/opal.c |  3 ++
 3 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 37d60f7..7911295 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
+obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o opal-occ.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-occ.c b/arch/powerpc/platforms/powernv/opal-occ.c
new file mode 100644
index 0000000..5ca3a41
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-occ.c
@@ -0,0 +1,88 @@
+/*
+ * Copyright IBM Corporation 2017
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) "opal-occ: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct miscdevice occ;
+static u64 sensor_base, sensor_size;
+
+static int opal_occ_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   if (vma->vm_flags & VM_WRITE)
+   return -EINVAL;
+
+   return vm_iomap_memory(vma, sensor_base, sensor_size);
+}
+
+static const struct file_operations opal_occ_fops = {
+   .mmap   = opal_occ_mmap,
+   .owner  = THIS_MODULE,
+};
+
+static int opal_occ_probe(struct platform_device *pdev)
+{
+   u64 reg[2];
+   int rc;
+
+   if (!pdev || !pdev->dev.of_node)
+   return -ENODEV;
+
+   if (of_property_read_u64_array(pdev->dev.of_node, "occ-sensors",
+  &reg[0], 2)) {
+   pr_warn("occ-sensors property not found\n");
+   return -ENODEV;
+   }
+
+   sensor_base = reg[0];
+   sensor_size = reg[1];
+   occ.minor = MISC_DYNAMIC_MINOR;
+   occ.name = "occ";
+   occ.fops = &opal_occ_fops;
+   rc = misc_register(&occ);
+   if (rc)
+   pr_warn("Failed to register OCC device\n");
+
+   return rc;
+}
+
+static int opal_occ_remove(struct platform_device *pdev)
+{
+   misc_deregister(&occ);
+   return 0;
+}
+
+static const struct of_device_id opal_occ_match[] = {
+   { .compatible = "ibm,opal-occ-inband-sensors" },
+   { },
+};
+
+static struct platform_driver opal_occ_driver = {
+   .driver = {
+   .name   = "opal_occ",
+   .of_match_table = opal_occ_match,
+},
+   .probe  = opal_occ_probe,
+   .remove = opal_occ_remove,
+};
+
+module_platform_driver(opal_occ_driver);
+
+MODULE_DESCRIPTION("PowerNV OPAL-OCC driver");
+MODULE_LICENSE("GPL");
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 65c79ec..a4f977f 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -889,6 +889,9 @@ static int __init opal_init(void)
/* Initialise OPAL sensor groups */
opal_sensor_groups_init();
 
+   /* Initialise OCC driver */
+   opal_pdev_init("ibm,opal-occ-inband-sensors");
+
return 0;
 }
 machine_subsys_initcall(powernv, opal_init);
-- 
1.8.3.1
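
For illustration, a userspace consumer of such a device might map the sensor area roughly as follows. This is a sketch under assumptions: the node path (for example /dev/occ, derived from the misc device name "occ") and the mapping length are not specified by the patch and must be supplied by the caller, and the helper name is made up.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map @len bytes of @path read-only. Returns MAP_FAILED on error.
 * The driver in the patch rejects VM_WRITE mappings, so only PROT_READ
 * is requested here.
 */
static void *map_sensor_area(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return MAP_FAILED;

	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* an established mapping stays valid after close */
	return p;
}
```

A caller would check the result against MAP_FAILED, parse the sensor data in place, and release the mapping with munmap() when done.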



Re: [PATCH -tip v3 6/7] kprobes/x86: Remove disable_irq from ftrace-based/optimized kprobe

2017-09-28 Thread Masami Hiramatsu
On Thu, 28 Sep 2017 09:25:41 +0200
Ingo Molnar  wrote:

> 
> * Masami Hiramatsu  wrote:
> 
> > Actually kprobes doesn't need to disable irq if it is
> > called from ftrace/jump trampoline code because
> > Documentation/kprobes.txt says
> > 
> > -
> > Probe handlers are run with preemption disabled.  Depending on the
> > architecture and optimization state, handlers may also run with
> > interrupts disabled (e.g., kretprobe handlers and optimized kprobe
> > handlers run without interrupt disabled on x86/x86-64).
> > -
> > 
> > So let's remove irq disabling from those handlers.
> 
> > -   local_irq_save(flags);
> 
> The title is talking about disable_irq():
> 
>   kprobes/x86: Remove disable_irq from ftrace-based/optimized kprobe
> 
> ... but the patch is actually using local_irq_save(), which is an entirely 
> different thing! You probably wanted to say:
> 
>   kprobes/x86: Remove irq disabling from ftrace-based/optimized kprobes

Correct! That's my mistake. thanks!

> 
> Also note the plural of 'kprobes' when we refer to them as a generic thing.
> 
> I fixed the title, but _please_ read changelogs more carefully before sending 
> them.

Thank you again,

> 
> Thanks,
> 
>   Ingo


-- 
Masami Hiramatsu 


Re: [PATCH 1/1] mm: only dispaly online cpus of the numa node

2017-09-28 Thread Leizhen (ThunderTown)


On 2017/8/28 21:13, Michal Hocko wrote:
> On Fri 25-08-17 18:34:33, Will Deacon wrote:
>> On Thu, Aug 24, 2017 at 10:32:26AM +0200, Michal Hocko wrote:
>>> It seems this has slipped through cracks. Let's CC arm64 guys
>>>
>>> On Tue 20-06-17 20:43:28, Zhen Lei wrote:
>>>> When I executed numactl -H(which read /sys/devices/system/node/nodeX/cpumap
>>>> and display cpumask_of_node for each node), but I got different result on
>>>> X86 and arm64. For each numa node, the former only displayed online CPUs,
>>>> and the latter displayed all possible CPUs. Unfortunately, both Linux
>>>> documentation and numactl manual have not described it clear.
>>>>
>>>> I sent a mail to ask for help, and Michal Hocko  replied
>>>> that he preferred to print online cpus because it doesn't really make much
>>>> sense to bind anything on offline nodes.
>>>
>>> Yes printing offline CPUs is just confusing and more so when the
>>> behavior is not consistent over architectures. I believe that x86
>>> behavior is the more appropriate one because it is more logical to dump
>>> the NUMA topology and use it for affinity setting than adding one
>>> additional step to check the cpu state to achieve the same.
>>>
>>> It is true that the online/offline state might change at any time so the
>>> above might be tricky on its own but if we should at least make the
>>> behavior consistent.
>>>
>>>> Signed-off-by: Zhen Lei 
>>>
>>> Acked-by: Michal Hocko 
>>
>> The concept looks fine to me, but shouldn't we use cpumask_var_t and
>> alloc/free_cpumask_var?
> 
> This will be safer but both callers of node_read_cpumap are shallow
> stack so I am not sure a stack is a limiting factor here.
> 
> Zhen Lei, would you care to update that part please?
> 
Sure, I will send v2 immediately.

I'm so sorry that I missed this email until someone told me.
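
The behavior being agreed on, showing only the online CPUs of a node, boils down to intersecting the node's CPU mask with the online mask. The toy model below uses a 64-bit word as the mask; it is an assumption-laden sketch, not the kernel change, which would operate on struct cpumask (for example with cpumask_and() against cpu_online_mask, and cpumask_var_t storage as Will suggests).

```c
#include <stdint.h>

/* Toy cpumask: bit N set means CPU N is in the mask (up to 64 CPUs). */
typedef uint64_t toy_cpumask_t;

/* CPUs that belong to the node and are currently online. */
static toy_cpumask_t node_online_cpus(toy_cpumask_t node_cpus,
				      toy_cpumask_t online_cpus)
{
	return node_cpus & online_cpus;
}
```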

-- 
Thanks!
Best Regards



Re: [Part1 PATCH v5 00/17] x86: Secure Encrypted Virtualization (AMD)

2017-09-28 Thread Borislav Petkov
On Wed, Sep 27, 2017 at 10:13:12AM -0500, Brijesh Singh wrote:
> This part of Secure Encrypted Virtualization (SEV) series focuses on the
> changes required in a guest OS for SEV support.

...

> This series is based on tip/master commit : a35205980288 (Merge branch 
> 'WIP.x86/fpu').
> 
> Complete git tree is available: 
> https://github.com/codomania/tip/tree/sev-v5-p1

Ok, so far so good, that part boots:

[    0.000000] AMD Secure Memory Encryption (SME) active

:-)

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v4 2/2] memory: ti-emif-sram: introduce relocatable suspend/resume handlers

2017-09-28 Thread kbuild test robot
Hi Dave,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc2]
[cannot apply to next-20170928]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Dave-Gerlach/Documentation-dt-Update-ti-emif-bindings/20170929-111250
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

>> drivers/memory/ti-emif-sram-pm.S:22:30: fatal error: emif-asm-offsets.h: No such file or directory
#include "emif-asm-offsets.h"
 ^
   compilation terminated.

vim +22 drivers/memory/ti-emif-sram-pm.S

20  
21  #include "emif.h"
  > 22  #include "emif-asm-offsets.h"
23  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[linux-next][DLPAR] kernel BUG at arch/powerpc/lib/locks.c:34!

2017-09-28 Thread Abdul Haleem
Hi,

Memory hot-unplug operation on linux-next kernel (4K pagesize) results
in BUG_ON() at arch/powerpc/lib/locks.c

/*
 * Waiting for a read lock or a write lock on a rwlock...
 * This turns out to be the same for read and write locks, since
 * we only know the holder if it is write-locked.
 */
void __rw_yield(arch_rwlock_t *rw)
{
int lock_value;
unsigned int holder_cpu, yield_count;

lock_value = rw->lock;
if (lock_value >= 0)
return; /* no write lock at present */
holder_cpu = lock_value & 0xffff;
>>  BUG_ON(holder_cpu >= NR_CPUS);
yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
if ((yield_count & 1) == 0)
return; /* virtual cpu is currently running */
rmb();
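
The check that fires can be modeled in user space as below. This is a sketch only: it assumes the holder CPU lives in the low 16 bits of the lock word (a mask of 0xffff, as in the upstream source), and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 16 bits of a held lock word name the holder CPU (mask assumed 0xffff). */
static unsigned int lock_holder_cpu(int32_t lock_value)
{
	return (uint32_t)lock_value & 0xffffu;
}

/* Mirrors the BUG_ON() condition: true when the holder CPU is out of range. */
static bool holder_out_of_range(int32_t lock_value, unsigned int nr_cpus)
{
	return lock_holder_cpu(lock_value) >= nr_cpus;
}
```

With NR_CPUS=2048, as in the log below, any lock word whose low 16 bits decode to 2048 or more would trip the assertion, which points at a corrupted or stale lock value rather than a genuinely held lock.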


Machine Type: Power 8 PowerVM LPAR
kernel : 4.14.0-rc2-next-20170928
gcc: version 6.3.1
Test : DLPAR Memory
config:
CONFIG_PPC_4K_PAGES=y
# CONFIG_PPC_64K_PAGES is not set


logs:

Offlined Pages 65536
Offlined Pages 65536
Offlined Pages 65536
Offlined Pages 65536
------------[ cut here ]------------
kernel BUG at arch/powerpc/lib/locks.c:34!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Dumping ftrace buffer: 
   (ftrace buffer empty)
Modules linked in: rpadlpar_io rpaphp bridge stp llc xt_tcpudp ipt_REJECT 
nf_reject_ipv4 xt_conntrack nfnetlink iptable_mangle iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter 
vmx_crypto pseries_rng rng_core binfmt_misc nfsd ip_tables x_tables autofs4
CPU: 0 PID: 12030 Comm: touch Not tainted 4.14.0-rc2-next-20170928-autotest #1
task: c00271aecc00 task.stack: c0026c24c000
NIP:  c16a50d0 LR: c17ff2c4 CTR: c1af4270
REGS: c0026c24f860 TRAP: 0700   Not tainted  
(4.14.0-rc2-next-20170928-autotest)
MSR:  80029033   CR: 42008884  XER:   
CFAR: c17ff2c0 SOFTE: 1 
GPR00: c17ff2c4 c0026c24fae0 c3572500 c0026b7f37f0 
GPR04: 0002 c00270179b10 c3622500 00103265 
GPR08: 0001 a1e0 0323a1e0 c00270060420 
GPR12: 82008288 cfdc   
GPR16:     
GPR20:    0002 
GPR24: c2b252f0 c0026b7f37f0 fffd c00271aecc00 
GPR28: c00270008000 c0026b7f37e8 c0026b7f37f0 c361ff50 
NIP [c16a50d0] __spin_yield+0x60/0x130
LR [c17ff2c4] do_raw_spin_lock+0x2d4/0x2e0
Call Trace:
[c0026c24fae0] [c0026c24fb30] 0xc0026c24fb30 (unreliable)
[c0026c24fb50] [c17ff2c4] do_raw_spin_lock+0x2d4/0x2e0
[c0026c24fb80] [c27ca540] _raw_spin_lock+0x40/0x70
[c0026c24fba0] [c27bfbf0] __mutex_lock.isra.0+0x1a0/0x11f0
[c0026c24fca0] [c27c0f24] __mutex_lock_slowpath+0x44/0x70
[c0026c24fcc0] [c27c0ff4] mutex_lock+0xa4/0xd0
[c0026c24fce0] [c1af42b8] pipe_release+0x48/0x1e0
[c0026c24fd20] [c1ae0efc] __fput+0x12c/0x4f0
[c0026c24fd80] [c1ae12ec] fput+0x2c/0x50
[c0026c24fda0] [c178eb3c] task_work_run+0x17c/0x200
[c0026c24fe00] [c160adb8] do_notify_resume+0x1f8/0x220
[c0026c24fe30] [c15ebec4] ret_from_except_lite+0x70/0x74
Instruction dump:
2faa 39290001 f926da50 419e0078 3ce2000b e8e7da60 5549043e 3cc2000b 
210907ff 79080fe0 38e70001 f8e6da60 <0b08> 3ce20007 38e7ea78 1d290480 
---[ end trace 1343a8353f7a1a73 ]---

Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer: 
   (ftrace buffer empty)
Rebooting in 10 seconds..


Test script to recreate :
https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/memhotplug.py

$ avocado run memhotplug.py --show-job-log

-- 
Regards

Abdul Haleem
IBM Linux Technology Centre


#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.11.0-rc7 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
# CONFIG_POWER7_CPU is not set
CONFIG_POWER8_CPU=y
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_VSX=y
# CONFIG_PPC_ICSWX is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_64=y
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2048
CONFIG_PPC_DOORBELL=y
# CONFIG_CPU_BIG_ENDIAN is not set
CONFIG_CPU_LITTLE_ENDIAN=y
CONFIG_PPC64_BOOT_WRAPPER=y
CONFIG_64BIT=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_MMU=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_ARCH_HAS_ILOG2_U64=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DM

Re: [PATCH v5 1/2] dax: introduce CONFIG_DAX_DRIVER

2017-09-28 Thread Michael Ellerman
Dan Williams  writes:

> In support of allowing device-mapper to compile out idle/dead code when
> there are no dax providers in the system, introduce the DAX_DRIVER
> symbol. This is selected by all leaf drivers that device-mapper might be
> layered on top. This allows device-mapper to conditionally 'select DAX'
> only when a provider is present.
>
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Martin Schwidefsky 
> Cc: Heiko Carstens 
> Cc: Gerald Schaefer 
> Cc: Benjamin Herrenschmidt 
> Cc: Mike Snitzer 
> Cc: Bart Van Assche 
> Signed-off-by: Dan Williams 
> ---
>  arch/powerpc/platforms/Kconfig |1 +
>  drivers/block/Kconfig  |1 +
>  drivers/dax/Kconfig|4 +++-
>  drivers/nvdimm/Kconfig |1 +
>  drivers/s390/block/Kconfig |1 +
>  5 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index 4fd64d3f5c44..4561340c1f92 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -296,6 +296,7 @@ config AXON_RAM
>   tristate "Axon DDR2 memory device driver"
>   depends on PPC_IBM_CELL_BLADE && BLOCK
>   select DAX
> + select DAX_DRIVER


I would have thought you'd want to replace the select of DAX with
a select of DAX_DRIVER?

With the driver selecting both there's no need for the core to select
DAX, because all the DAX drivers have done it already.

cheers


Re: [PATCH for-next 3/9] RDMA/hns: Add return statement when kzalloc return NULL in hns_roce_v1_recreate_lp_qp

2017-09-28 Thread Wei Hu (Xavier)



On 2017/9/28 20:59, Leon Romanovsky wrote:

On Thu, Sep 28, 2017 at 07:56:59PM +0800, Wei Hu (Xavier) wrote:


On 2017/9/28 17:13, Leon Romanovsky wrote:

On Thu, Sep 28, 2017 at 12:57:28PM +0800, Wei Hu (Xavier) wrote:

From: Lijun Ou 

When the allocation of lp_qp_work fails (kzalloc returns NULL), the function
should return -ENOMEM. This patch adds that check.

This patch fixes the smatch error below:
drivers/infiniband/hw/hns/hns_roce_hw_v1.c:918 hns_roce_v1_recreate_lp_qp()
error: potential null dereference 'lp_qp_work'.  (kzalloc returns null)

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Shaobo Xu 
---
   drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 95f5c88..1071fa2 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -912,6 +912,8 @@ static int hns_roce_v1_recreate_lp_qp(struct hns_roce_dev *hr_dev)

lp_qp_work = kzalloc(sizeof(struct hns_roce_recreate_lp_qp_work),
 GFP_KERNEL);
+   if (!lp_qp_work)
+   return -ENOMEM;


You will treat this error in the same way as you will treat timeout,
which is wrong.

Thanks,  Leon
We will send v2 to fix the compatible warn info.

No, you missed the point.
From the code flow below, the behavior of hns_roce_v1_recreate_lp_qp
for ENOMEM and ETIMEDOUT returns will be the same, and that is wrong.

For ETIMEDOUT you can continue; for ENOMEM you should properly
unfold the whole flow.

Thanks


Hi, Leon
We plan to modify the warning message as below:

if (hr_dev->hw->dereg_mr && hns_roce_v1_recreate_lp_qp(hr_dev))
dev_warn(&hr_dev->pdev->dev, "recreate lp qp failed!\n");

For -ETIMEDOUT there is already a specific warning, shown below, but there is
no such warning for -ENOMEM.
dev_warn(dev, "recreate lp qp failed 20s timeout and return failed!\n");


static int hns_roce_v1_recreate_lp_qp(struct hns_roce_dev *hr_dev)
{

	lp_qp_work = kzalloc(sizeof(struct hns_roce_recreate_lp_qp_work),
			     GFP_KERNEL);
	if (!lp_qp_work)
		return -ENOMEM;


	dev_warn(dev, "recreate lp qp failed 20s timeout and return failed!\n");

	return -ETIMEDOUT;
}

Regards
Wei Hu
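
Leon's point, that a timeout and an allocation failure call for different handling at the call site, could be sketched as follows. This is a user-space model under assumptions, not driver code: the enum and function names are hypothetical, and what "unwinding" involves in the real driver is not shown in the thread.

```c
#include <errno.h>

/* What the caller should do with the result (hypothetical names). */
enum recreate_action {
	RECREATE_OK,		/* success: nothing to do */
	RECREATE_CONTINUE,	/* timed out: warn and carry on */
	RECREATE_UNWIND,	/* e.g. -ENOMEM: undo earlier steps, propagate */
};

/* Classify the return value instead of treating every non-zero alike. */
static enum recreate_action classify_recreate_result(int ret)
{
	switch (ret) {
	case 0:
		return RECREATE_OK;
	case -ETIMEDOUT:
		return RECREATE_CONTINUE;
	default:
		return RECREATE_UNWIND;
	}
}
```

The existing call site only tests for a non-zero return and prints one warning, so both error codes end up on the same path; a switch of this kind would separate them.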

1656  */
1657 if (hr_dev->hw->dereg_mr && hns_roce_v1_recreate_lp_qp(hr_dev))
1658 dev_warn(&hr_dev->pdev->dev, "recreate lp qp timeout!\n");
1659
1660 p = (u32 *)(&addr[0]);



INIT_WORK(&(lp_qp_work->work), hns_roce_v1_recreate_lp_qp_work_fn);

--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html







Re: [PATCH v8 12/28] x86/insn-eval: Add utility functions to get segment selector

2017-09-28 Thread Ricardo Neri
On Thu, 2017-09-28 at 11:36 +0200, Borislav Petkov wrote:
> On Wed, Sep 27, 2017 at 03:32:26PM -0700, Ricardo Neri wrote:
> > 
> > The idea is that get_overridden_seg_reg() would implement the logic you
> > just described. It would return INAT_SEG_REG_DEFAULT/IGNORE when
> > segment override prefixes are not allowed (i.e., valid insn with
> > operand rDI and string instruction; and rIP) or needed (i.e., long
> > mode, except if there are override prefixes for FS or GS); or
> > INAT_SEG_REG_[CSDEFG]S otherwise.
> Ok, lemme see if we're talking the same thing. Your diff is linewrapped
> so parsing that is hard.
> 
> Do this
> 
> if (regoff == offsetof(struct pt_regs, ip)) {
> if (user_64bit_mode(regs))
> return INAT_SEG_REG_IGNORE;
> else
> return INAT_SEG_REG_DEFAULT;
> }
> 
> and all the other checking *before* you do insn_init(). Because you have
> crazy stuff like:
> 
> if (seg_reg == INAT_SEG_REG_IGNORE)
> return seg_reg;
> 
> which shortcuts those functions and is simply clumsy and complicates
> following the code. The mere fact that you have to call the function
> "get_overridden_seg_reg_if_any_or_needed()" already tells you that that
> function is doing too many things at once.
> 
> When the function is called get_segment_register() then it should do
> only that. And all the checking is done before or in wrappers.

Yes, I realized this while I was typing.
> 
> IOW, all the rIP checking and early return down the
> insn_get_seg_base() -> resolve_seg_register() -> .. should be done
> separately.

Agreed now.
> 
> *Then* you do insn_init() and hand it down to insn_get_seg_base() and
> from now on you have a proper insn pointer which you hand around and
> check for NULL only once, on function entry.

I agree. In fact, insn_get_seg_base() does not need insn at all. All it needs is
an INAT_SEG_REG_* index. This would make things clear. UMIP (and callers that
need to copy_from_user code) can do insn_get_seg_base(regs, INAT_SEG_REG_CS). No
insn needed.

In fact, it is only the insn_get_addr_ref_xx() family of functions that does
need to inspect insn (which will be populated and validated) to determine what
registers are used as operands... and determine the applicable segment register.

However, insn_get_addr_ref_xx() functions call insn_get_seg_base() several times
each. Each time they would need to do:

if (can_use_seg_override_prefixes(insn, regoff))
    idx = get_overridden_seg_reg(insn, regs)
else
    idx = get_default_seg_reg()

The pseudocode above looks like a resolve_reg_idx() to me.

Then insn_get_addr_ref_xx() can call insn_get_seg_base(idx).

> 
> Then your code flow is much simpler: first you take care of the case
> where rIP doesn't do segment overrides and all the other cases are
> handled by the normal path, with a proper struct insn.

Do you think the pseudocode above addresses your concerns?

* insn_get_seg_base() will take an INAT_SEG_REG_* index.
* insn_get_addr_ref_xx() receives an initialized insn and can check it for NULL.
* A reworked resolve_seg_reg_idx() will clearly check whether it can use segment
  override prefixes and obtain them. If not, it will use default values.
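
To make the proposed resolve step concrete, here is a simplified, self-contained model of it. Everything in it is an assumption for illustration: the enum stands in for the INAT_SEG_REG_* indices, the operand description is reduced to a few flags, and the rules are only those stated in this thread (no overrides for rIP or for the rDI operand of string instructions; in long mode only FS/GS overrides matter).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the INAT_SEG_REG_* indices discussed above. */
enum seg_idx {
	SEG_IGNORE, SEG_DEFAULT,
	SEG_CS, SEG_SS, SEG_DS, SEG_ES, SEG_FS, SEG_GS,
};

/* Simplified description of one operand; all fields are assumptions. */
struct operand_info {
	bool is_rip;		/* operand is the instruction pointer */
	bool is_string_rdi;	/* rDI operand of a string instruction */
	bool long_mode;		/* what user_64bit_mode() would report */
	enum seg_idx override;	/* SEG_DEFAULT when no override prefix */
};

/* Overrides are not allowed for rIP or for rDI in string instructions. */
static bool can_use_seg_override(const struct operand_info *op)
{
	return !op->is_rip && !op->is_string_rdi;
}

static enum seg_idx resolve_seg_reg_idx(const struct operand_info *op)
{
	if (can_use_seg_override(op) && op->override != SEG_DEFAULT) {
		/* In long mode only FS and GS overrides have an effect. */
		if (!op->long_mode ||
		    op->override == SEG_FS || op->override == SEG_GS)
			return op->override;
	}
	/* No usable override: segmentation is ignored in long mode. */
	return op->long_mode ? SEG_IGNORE : SEG_DEFAULT;
}
```

With the index resolved up front like this, a function such as insn_get_seg_base() would only need to consume the index, which matches the split of responsibilities listed above.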

Thanks and BR,
Ricardo


[PATCH 3/5] f2fs: fix to show ino management cache size correctly

2017-09-28 Thread Chao Yu
The size of the ino management cache should be accounted for all entry
types, not only the orphan ino type.

Fixes: 652be55162dc ("f2fs: show # of orphan inodes")
Signed-off-by: Chao Yu 
---
 fs/f2fs/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 14095fbb4039..d441660c3ba6 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -240,7 +240,7 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
si->cache_mem += NM_I(sbi)->dirty_nat_cnt *
sizeof(struct nat_entry_set);
si->cache_mem += si->inmem_pages * sizeof(struct inmem_pages);
-   for (i = 0; i <= ORPHAN_INO; i++)
+   for (i = 0; i < MAX_INO_ENTRY; i++)
si->cache_mem += sbi->im[i].ino_num * sizeof(struct ino_entry);
si->cache_mem += atomic_read(&sbi->total_ext_tree) *
sizeof(struct extent_tree);
-- 
2.13.1.388.g69e6b9b4f4a9
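
The effect of the changed loop bound can be seen in a small stand-alone model. The enum layout below is an assumption inferred from the loop bounds in the patch (ORPHAN_INO first, MAX_INO_ENTRY last); the helper names are hypothetical and the counts are made up.

```c
#include <stddef.h>

/* Entry types, ordered as the loop bounds in the patch imply (assumed). */
enum ino_type { ORPHAN_INO, APPEND_INO, UPDATE_INO, MAX_INO_ENTRY };

/* Sum ino_num[0..last], inclusive. */
static size_t sum_upto(const size_t *ino_num, int last)
{
	size_t total = 0;

	for (int i = 0; i <= last; i++)
		total += ino_num[i];
	return total;
}

/* Old bound (i <= ORPHAN_INO): only the orphan type is counted. */
static size_t count_old(const size_t *ino_num)
{
	return sum_upto(ino_num, ORPHAN_INO);
}

/* New bound (i < MAX_INO_ENTRY): every entry type is counted. */
static size_t count_new(const size_t *ino_num)
{
	return sum_upto(ino_num, MAX_INO_ENTRY - 1);
}
```

With per-type counts of 3, 5 and 7, the old bound reports 3 entries while the new one reports 15, which is the under-accounting the patch corrects.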



[PATCH 2/5] f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush

2017-09-28 Thread Chao Yu
If we fail to issue a flush in ->fsync, we need to keep the FI_UPDATE_WRITE
flag to make sure a flush is triggered in the next ->fsync.
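
The intended ordering can be sketched as follows. This is a simplified
model with hypothetical names (a bool standing in for FI_UPDATE_WRITE and
stub flush callbacks), not the real f2fs_do_sync_file().

```c
#include <stdbool.h>

/* Hypothetical inode state: the bool stands in for FI_UPDATE_WRITE. */
struct inode_sketch {
	bool update_write;
};

/* Hypothetical flush stubs: 0 on success, negative errno on failure. */
int flush_ok(void)   { return 0; }
int flush_fail(void) { return -5; /* -EIO */ }

int sync_file_sketch(struct inode_sketch *inode, int (*issue_flush)(void))
{
	int ret = issue_flush();

	/* Drop the flag only once the flush is known to have succeeded. */
	if (!ret)
		inode->update_write = false;
	return ret;
}
```

If the flush fails, the flag stays set, so a later fsync will try the
flush again.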

Signed-off-by: Chao Yu 
---
 fs/f2fs/file.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 224379a9848c..18ca8b305699 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -294,10 +294,12 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
start, loff_t end,
remove_ino_entry(sbi, ino, APPEND_INO);
clear_inode_flag(inode, FI_APPEND_WRITE);
 flush_out:
-   remove_ino_entry(sbi, ino, UPDATE_INO);
-   clear_inode_flag(inode, FI_UPDATE_WRITE);
if (!atomic)
ret = f2fs_issue_flush(sbi);
+   if (!ret) {
+   remove_ino_entry(sbi, ino, UPDATE_INO);
+   clear_inode_flag(inode, FI_UPDATE_WRITE);
+   }
f2fs_update_time(sbi, REQ_TIME);
 out:
trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret);
-- 
2.13.1.388.g69e6b9b4f4a9



[PATCH 4/5] f2fs: enhance multiple device flush

2017-09-28 Thread Chao Yu
When the multiple device feature is enabled, ->fsync issues a flush to
all devices to make sure the node/data of the file is persisted to
storage. But some of those flushes may be unnecessary, as the file's
data may not have been written back to those devices. So this patch adds
and manages a per-inode bitmap in the global cache to indicate which
devices are dirty and need a flush during ->fsync; hence, we can improve
fsync performance in the multiple device scenario.
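
A rough model of the idea, using plain bit operations and hypothetical
names rather than f2fs_set_bit()/f2fs_test_bit() and the real ino_entry:

```c
#include <stdbool.h>

/* Hypothetical per-inode entry carrying one dirty bit per device. */
struct ino_entry_sketch {
	unsigned int dirty_device;
};

void mark_dev_dirty(struct ino_entry_sketch *e, unsigned int devidx)
{
	e->dirty_device |= 1u << devidx;
}

bool dev_is_dirty(const struct ino_entry_sketch *e, unsigned int devidx)
{
	return (e->dirty_device & (1u << devidx)) != 0;
}

/* Flush only the devices this inode wrote to; return how many were flushed. */
int fsync_flush_sketch(struct ino_entry_sketch *e, unsigned int ndevs)
{
	int flushed = 0;

	for (unsigned int i = 0; i < ndevs; i++) {
		if (!dev_is_dirty(e, i))
			continue;
		/* A real implementation would submit a flush to device i here. */
		e->dirty_device &= ~(1u << i);
		flushed++;
	}
	return flushed;
}
```

For an inode that only wrote to devices 1 and 3 of a four-device volume,
this issues two flushes instead of four.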

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 36 +++-
 fs/f2fs/data.c   |  1 +
 fs/f2fs/f2fs.h   | 14 +++---
 fs/f2fs/file.c   |  3 ++-
 fs/f2fs/gc.c |  2 ++
 fs/f2fs/inline.c |  1 +
 fs/f2fs/inode.c  |  1 +
 fs/f2fs/node.c   |  3 ++-
 fs/f2fs/segment.c| 46 +++---
 9 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 04fe1df052b2..571980793542 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -401,7 +401,8 @@ const struct address_space_operations f2fs_meta_aops = {
 #endif
 };
 
-static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
+static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino,
+   unsigned int devidx, int type)
 {
struct inode_management *im = &sbi->im[type];
struct ino_entry *e, *tmp;
@@ -426,6 +427,10 @@ static void __add_ino_entry(struct f2fs_sb_info *sbi, 
nid_t ino, int type)
if (type != ORPHAN_INO)
im->ino_num++;
}
+
+   if (type == FLUSH_INO)
+   f2fs_set_bit(devidx, (char *)&e->dirty_device);
+
spin_unlock(&im->ino_lock);
radix_tree_preload_end();
 
@@ -454,7 +459,7 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, 
nid_t ino, int type)
 void add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
 {
/* add new dirty ino entry into list */
-   __add_ino_entry(sbi, ino, type);
+   __add_ino_entry(sbi, ino, 0, type);
 }
 
 void remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
@@ -480,7 +485,7 @@ void release_ino_entry(struct f2fs_sb_info *sbi, bool all)
struct ino_entry *e, *tmp;
int i;
 
-   for (i = all ? ORPHAN_INO: APPEND_INO; i <= UPDATE_INO; i++) {
+   for (i = all ? ORPHAN_INO : APPEND_INO; i < MAX_INO_ENTRY; i++) {
struct inode_management *im = &sbi->im[i];
 
spin_lock(&im->ino_lock);
@@ -494,6 +499,27 @@ void release_ino_entry(struct f2fs_sb_info *sbi, bool all)
}
 }
 
+void set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
+   unsigned int devidx, int type)
+{
+   __add_ino_entry(sbi, ino, devidx, type);
+}
+
+bool is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
+   unsigned int devidx, int type)
+{
+   struct inode_management *im = &sbi->im[type];
+   struct ino_entry *e;
+   bool is_dirty = false;
+
+   spin_lock(&im->ino_lock);
+   e = radix_tree_lookup(&im->ino_root, ino);
+   if (e && f2fs_test_bit(devidx, (char *)&e->dirty_device))
+   is_dirty = true;
+   spin_unlock(&im->ino_lock);
+   return is_dirty;
+}
+
 int acquire_orphan_inode(struct f2fs_sb_info *sbi)
 {
struct inode_management *im = &sbi->im[ORPHAN_INO];
@@ -530,7 +556,7 @@ void release_orphan_inode(struct f2fs_sb_info *sbi)
 void add_orphan_inode(struct inode *inode)
 {
/* add new orphan ino entry into list */
-   __add_ino_entry(F2FS_I_SB(inode), inode->i_ino, ORPHAN_INO);
+   __add_ino_entry(F2FS_I_SB(inode), inode->i_ino, 0, ORPHAN_INO);
update_inode_page(inode);
 }
 
@@ -554,7 +580,7 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, 
nid_t ino)
return err;
}
 
-   __add_ino_entry(sbi, ino, ORPHAN_INO);
+   __add_ino_entry(sbi, ino, 0, ORPHAN_INO);
 
inode = f2fs_iget_retry(sbi->sb, ino);
if (IS_ERR(inode)) {
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 506281d9807d..861dd95f78b7 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1494,6 +1494,7 @@ static int __write_data_page(struct page *page, bool 
*submitted,
int err = 0;
struct f2fs_io_info fio = {
.sbi = sbi,
+   .ino = inode->i_ino,
.type = DATA,
.op = REQ_OP_WRITE,
.op_flags = wbc_to_write_flags(wbc),
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 3440551289d2..ce63e778136a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -179,12 +179,14 @@ enum {
ORPHAN_INO, /* for orphan ino list */
APPEND_INO, /* for append ino list */
UPDATE_INO, /* for update ino list */
+   FLUSH_INO,  /* for multiple device flushing */
MA

[PATCH 5/5] f2fs: fix to flush multiple device in checkpoint

2017-09-28 Thread Chao Yu
If f2fs manages multiple devices, then in checkpoint we need to issue a
flush to those devices which contain dirty data/node in their cache before
we write the checkpoint region; otherwise, filesystem metadata could be
corrupted if we hit SPO (sudden power-off) after checkpoint.
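
The ordering constraint can be sketched like this. It is a simplified
model with hypothetical names; only the loop bounds are taken from
f2fs_flush_device_cache() in this patch.

```c
/* Hypothetical, pared-down superblock state. */
struct sb_sketch {
	unsigned int dirty_device;	/* bit i set: device i has unflushed writes */
	int ndevs;			/* number of devices */
	int cp_written;			/* set once the checkpoint block is written */
};

/* Hypothetical flush stubs: 0 on success, negative errno on failure. */
int flush_dev_ok(int dev)   { (void)dev; return 0; }
int flush_dev_fail(int dev) { (void)dev; return -5; /* -EIO */ }

int do_checkpoint_sketch(struct sb_sketch *sb, int (*flush_dev)(int))
{
	/* Loop bounds mirror f2fs_flush_device_cache() in this patch. */
	for (int i = 1; i < sb->ndevs; i++) {
		int ret;

		if (!(sb->dirty_device & (1u << i)))
			continue;
		ret = flush_dev(i);
		if (ret)
			return ret;	/* the checkpoint must not be written */
		sb->dirty_device &= ~(1u << i);
	}

	/* Only reached once every dirty device cache has been flushed. */
	sb->cp_written = 1;
	return 0;
}
```

A failed flush leaves both the dirty bits and the old checkpoint
untouched, so the next checkpoint attempt starts from a consistent state.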

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c |  6 ++
 fs/f2fs/f2fs.h   |  3 +++
 fs/f2fs/segment.c| 29 +
 fs/f2fs/super.c  |  3 +++
 4 files changed, 41 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 571980793542..201608281681 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1172,6 +1172,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct 
cp_control *cpc)
struct super_block *sb = sbi->sb;
struct curseg_info *seg_i = CURSEG_I(sbi, CURSEG_HOT_NODE);
u64 kbytes_written;
+   int err;
 
/* Flush all the NAT/SIT pages */
while (get_pages(sbi, F2FS_DIRTY_META)) {
@@ -1265,6 +1266,11 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
if (unlikely(f2fs_cp_error(sbi)))
return -EIO;
 
+   /* flush all device cache */
+   err = f2fs_flush_device_cache(sbi);
+   if (err)
+   return err;
+
/* write out checkpoint buffer at block 0 */
update_meta_page(sbi, ckpt, start_blk++);
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ce63e778136a..c85f49c41003 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1145,6 +1145,8 @@ struct f2fs_sb_info {
struct list_head s_list;
int s_ndevs;/* number of devices */
struct f2fs_dev_info *devs; /* for device list */
+   unsigned int dirty_device;  /* for checkpoint data flush */
+   spinlock_t dev_lock;/* protect dirty_device */
struct mutex umount_mutex;
unsigned int shrinker_run_no;
 
@@ -2555,6 +2557,7 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need);
 void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi);
 int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino);
 int create_flush_cmd_control(struct f2fs_sb_info *sbi);
+int f2fs_flush_device_cache(struct f2fs_sb_info *sbi);
 void destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free);
 void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr);
 bool is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9d096f0014dc..2fe3343d876c 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -659,6 +659,28 @@ void destroy_flush_cmd_control(struct f2fs_sb_info *sbi, 
bool free)
}
 }
 
+int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
+{
+   int ret = 0, i;
+
+   if (!sbi->s_ndevs)
+   return 0;
+
+   for (i = 1; i < sbi->s_ndevs; i++) {
+   if (!f2fs_test_bit(i, (char *)&sbi->dirty_device))
+   continue;
+   ret = __submit_flush_wait(sbi, FDEV(i).bdev);
+   if (ret)
+   break;
+
+   spin_lock(&sbi->dev_lock);
+   f2fs_clear_bit(i, (char *)&sbi->dirty_device);
+   spin_unlock(&sbi->dev_lock);
+   }
+
+   return ret;
+}
+
 static void __locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int 
segno,
enum dirty_type dirty_type)
 {
@@ -2515,6 +2537,13 @@ static void update_device_state(struct f2fs_io_info *fio)
 
/* update device state for fsync */
set_dirty_device(sbi, fio->ino, devidx, FLUSH_INO);
+
+   /* update device state for checkpoint */
+   if (!f2fs_test_bit(devidx, (char *)&sbi->dirty_device)) {
+   spin_lock(&sbi->dev_lock);
+   f2fs_set_bit(devidx, (char *)&sbi->dirty_device);
+   spin_unlock(&sbi->dev_lock);
+   }
 }
 
 static void do_write_page(struct f2fs_summary *sum, struct f2fs_io_info *fio)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 1c56100d28c1..1d68c18a487b 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1996,6 +1996,9 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
for (j = HOT; j < NR_TEMP_TYPE; j++)
mutex_init(&sbi->wio_mutex[i][j]);
spin_lock_init(&sbi->cp_lock);
+
+   sbi->dirty_device = 0;
+   spin_lock_init(&sbi->dev_lock);
 }
 
 static int init_percpu_info(struct f2fs_sb_info *sbi)
-- 
2.13.1.388.g69e6b9b4f4a9



[PATCH 1/5] f2fs: obsolete ALLOC_NID_LIST list

2017-09-28 Thread Chao Yu
As Fan Li reported, there is no user traversing nid_list[ALLOC_NID_LIST]
which is used for tracking preallocated nids. Let's drop it, and only
track preallocated nids in free_nid_root radix-tree.

Reported-by: Fan Li 
Signed-off-by: Chao Yu 
---
 fs/f2fs/debug.c|  8 ++---
 fs/f2fs/f2fs.h | 15 +
 fs/f2fs/node.c | 97 ++
 fs/f2fs/node.h | 15 ++---
 fs/f2fs/shrinker.c |  2 +-
 5 files changed, 64 insertions(+), 73 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 00c1d4a9f356..14095fbb4039 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -98,9 +98,9 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->dirty_nats = NM_I(sbi)->dirty_nat_cnt;
si->sits = MAIN_SEGS(sbi);
si->dirty_sits = SIT_I(sbi)->dirty_sentries;
-   si->free_nids = NM_I(sbi)->nid_cnt[FREE_NID_LIST];
+   si->free_nids = NM_I(sbi)->nid_cnt[FREE_NID];
si->avail_nids = NM_I(sbi)->available_nids;
-   si->alloc_nids = NM_I(sbi)->nid_cnt[ALLOC_NID_LIST];
+   si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID];
si->bg_gc = sbi->bg_gc;
si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
@@ -233,8 +233,8 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
}
 
/* free nids */
-   si->cache_mem += (NM_I(sbi)->nid_cnt[FREE_NID_LIST] +
-   NM_I(sbi)->nid_cnt[ALLOC_NID_LIST]) *
+   si->cache_mem += (NM_I(sbi)->nid_cnt[FREE_NID] +
+   NM_I(sbi)->nid_cnt[PREALLOC_NID]) *
sizeof(struct free_nid);
si->cache_mem += NM_I(sbi)->nat_cnt * sizeof(struct nat_entry);
si->cache_mem += NM_I(sbi)->dirty_nat_cnt *
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c95069435b47..3440551289d2 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -688,10 +688,13 @@ static inline void __try_update_largest_extent(struct 
inode *inode,
}
 }
 
-enum nid_list {
-   FREE_NID_LIST,
-   ALLOC_NID_LIST,
-   MAX_NID_LIST,
+/*
+ * For free nid management
+ */
+enum nid_state {
+   FREE_NID,   /* newly added to free nid list */
+   PREALLOC_NID,   /* it is preallocated */
+   MAX_NID_STATE,
 };
 
 struct f2fs_nm_info {
@@ -714,8 +717,8 @@ struct f2fs_nm_info {
 
/* free node ids management */
struct radix_tree_root free_nid_root;/* root of the free_nid cache */
-   struct list_head nid_list[MAX_NID_LIST];/* lists for free nids */
-   unsigned int nid_cnt[MAX_NID_LIST]; /* the number of free node id */
+   struct list_head free_nid_list; /* list for free nids excluding 
preallocated nids */
+   unsigned int nid_cnt[MAX_NID_STATE];/* the number of free node id */
spinlock_t nid_list_lock;   /* protect nid lists ops */
struct mutex build_lock;/* lock for build free nids */
unsigned char (*free_nid_bitmap)[NAT_ENTRY_BITMAP_SIZE];
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 275ad3f35579..d087b45b3f72 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -46,7 +46,7 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int type)
 * give 25%, 25%, 50%, 50%, 50% memory for each components respectively
 */
if (type == FREE_NIDS) {
-   mem_size = (nm_i->nid_cnt[FREE_NID_LIST] *
+   mem_size = (nm_i->nid_cnt[FREE_NID] *
sizeof(struct free_nid)) >> PAGE_SHIFT;
res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2);
} else if (type == NAT_ENTRIES) {
@@ -1757,8 +1757,8 @@ static struct free_nid *__lookup_free_nid_list(struct 
f2fs_nm_info *nm_i,
return radix_tree_lookup(&nm_i->free_nid_root, n);
 }
 
-static int __insert_nid_to_list(struct f2fs_sb_info *sbi,
-   struct free_nid *i, enum nid_list list, bool new)
+static int __insert_free_nid(struct f2fs_sb_info *sbi,
+   struct free_nid *i, enum nid_state state, bool new)
 {
struct f2fs_nm_info *nm_i = NM_I(sbi);
 
@@ -1768,22 +1768,22 @@ static int __insert_nid_to_list(struct f2fs_sb_info 
*sbi,
return err;
}
 
-   f2fs_bug_on(sbi, list == FREE_NID_LIST ? i->state != NID_NEW :
-   i->state != NID_ALLOC);
-   nm_i->nid_cnt[list]++;
-   list_add_tail(&i->list, &nm_i->nid_list[list]);
+   f2fs_bug_on(sbi, state != i->state);
+   nm_i->nid_cnt[state]++;
+   if (state == FREE_NID)
+   list_add_tail(&i->list, &nm_i->free_nid_list);
return 0;
 }
 
-static void __remove_nid_from_list(struct f2fs_sb_info *sbi,
-   struct free_nid *i, enum nid_list list, bool reuse)
+static void __remove_free_nid(struct f2fs_sb_info *sbi,

[PATCH v2] perf/core: Fix updating cgroup time with descendants

2017-09-28 Thread Lin Xiulei
From: "leilei.lin" 

Fix updating the cgroup time when an event is being scheduled in
by a descendant cgroup.
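
For readers unfamiliar with the check, its semantics can be sketched as a
walk up the parent chain. The struct and function below are hypothetical
and are not how the kernel implements cgroup_is_descendant(); they only
show that a cgroup counts as a descendant of itself and of every ancestor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cgroup node with just a parent pointer. */
struct cg_sketch {
	struct cg_sketch *parent;
};

/* True if cg is the same as ancestor or sits anywhere below it. */
bool cg_is_descendant(const struct cg_sketch *cg,
		      const struct cg_sketch *ancestor)
{
	for (; cg; cg = cg->parent)
		if (cg == ancestor)
			return true;
	return false;
}
```

The old "cgrp == event->cgrp" test only covers the first case (same
cgroup), which is why events owned by an ancestor were skipped.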

Signed-off-by: leilei.lin 
Reviewed-and-tested-by: Jiri Olsa 
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3e691b7..e3a5e32 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -662,7 +662,7 @@ static inline void
update_cgrp_time_from_event(struct perf_event *event)
/*
 * Do not update time when cgroup is not active
 */
-   if (cgrp == event->cgrp)
+   if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup))
__update_cgrp_time(event->cgrp);
 }


Re: [Part1 PATCH v5 16/17] X86/KVM: Decrypt shared per-cpu variables when SEV is active

2017-09-28 Thread Borislav Petkov
On Wed, Sep 27, 2017 at 10:13:28AM -0500, Brijesh Singh wrote:
> When SEV is active, guest memory is encrypted with a guest-specific key, a
> guest memory region shared with the hypervisor must be mapped as decrypted
> before we can share it.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Borislav Petkov 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Tom Lendacky 
> Cc: x...@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: k...@vger.kernel.org
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/kernel/kvm.c | 41 ++---
>  1 file changed, 38 insertions(+), 3 deletions(-)

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


linux-next: Tree for Sep 29

2017-09-28 Thread Stephen Rothwell
Hi all,

News: I will not be doing linux-next releases from Sep 30 to Oct 30
(inclusive).

Changes since 20170928:

The net-next tree gained a build failure, due to an interaction with the
net tree, for which I applied a merge fix patch.

The drm tree still had its build failure for which I applied a fix patch.

The ipmi tree lost its build failure.

The akpm tree lost a patch that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 2815
 2768 files changed, 96917 insertions(+), 40111 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And
finally, a simple boot test of the powerpc pseries_le_defconfig kernel
in qemu.

Below is a summary of the state of the merge.

I am currently merging 267 trees (counting Linus' and 41 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (770b782f555d Merge tag 'acpi-4.14-rc3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (cd4175b11685 Merge branch 'parisc-4.14-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux)
Merging arc-current/for-curr (ef6c1bae4792 arc: remove redundant UTS_MACHINE 
define in arch/arc/Makefile)
Merging arm-current/fixes (746a272e4414 ARM: 8692/1: mm: abort uaccess retries 
upon fatal signal)
Merging m68k-current/for-linus (558d5ad276c9 m68k/mac: Avoid soft-lockup 
warning after mach_power_off)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (d8bd9f3f0925 powerpc: Handle MCE on POWER9 with 
only DSISR bit 30 set)
Merging sparc/master (23198ddffb6c sparc32: Add cmpxchg64().)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (9d538fa60bad net: Set sk_prot_creator when cloning sockets 
to the right proto)
Merging ipsec/master (dd269db84908 xfrm: don't call xfrm_policy_cache_flush 
under xfrm_state_lock)
Merging netfilter/master (7f4f7dd4417d netfilter: ipset: ipset list may return 
wrong member count for set with timeout)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (3e747fa18202 Merge ath-current from ath.git)
Merging mac80211/master (265698d7e613 nl80211: fix null-ptr dereference on 
invalid mesh configuration)
Merging sound-current/for-linus (bfc81a8bc18e ALSA: usb-audio: Check 
out-of-bounds access by corrupted buffer descriptor)
Merging pci-current/for-linus (9561475db680 PCI: Fix race condition with 
driver_override)
Merging driver-core.current/driver-core-linus (850fdec8d2fd driver core: remove 
DRIVER_ATTR)
Merging tty.current/tty-linus (c91261437985 serial: sccnxp: Fix error handling 
in sccnxp_probe())
Merging usb.current/usb-linus (8fec9355a968 USB: cdc-wdm: ignore -EPIPE from 
GetEncapsulatedResponse)
Merging usb-gadget-fixes/fixes (c3cdce45f8d3 usb: dwc3: of-simple: Add 
compatible for Spreadtrum SC9860 platform)
Merging usb-serial-fixes/usb-linus (c496ad835c31 USB: serial: cp210x: add 
support for ELV TFD500)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: 
check before accessing ci_role in ci_role_show)
Merging phy/fixes (26e03d803c81 phy: rockchip-typec: Don't set the aux voltage 
swi

Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-28 Thread Harsh Jain

On 28-09-2017 18:35, Raj, Ashok wrote:
> Thanks for trying that Harsh. 
>
> sp_off turns off super page support. With this mode, do you still see offsets 
> greater than 4k?
Yes, offsets greater than 4k are still there. See below.
[56732.774872] offset 4110 len 76 dma addr 3a531200e dma len 76
[56732.804187] offset 4110 len 84 dma addr 3a63b200e dma len 84
[56732.805104] offset 4110 len 68 dma addr 3a531200e dma len 68
[56732.806870] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.808987] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.811215] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.813155] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.814823] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.816481] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.818159] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.819712] offset 4110 len 56 dma addr 3a531200e dma len 56
[56732.821629] offset 4110 len 56 dma addr 3a531200e dma len 56
[root@heptagon linux_t4_build]#
[root@heptagon linux_t4_build]#
[root@heptagon linux_t4_build]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.9.51 root=UUID=ccbb7f18-b3f0-43df-89de-07521e9c02fe ro 
intel_iommu=sp_off crashkernel=auto rhgb quiet rhgb quiet console=ttyS0,115200, 
console=tty0 LANG=en_US.UTF-8
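
For reference, the arithmetic behind those log lines, as a sketch assuming
4 KiB pages (the macro and struct names are made up for illustration):
offset 4110 is 4096 + 14, i.e. one whole page plus 0xe, and 0xe matches
the low 12 bits of the logged dma addr values.

```c
/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT_SKETCH	12
#define PAGE_SIZE_SKETCH	(1u << PAGE_SHIFT_SKETCH)

/* Hypothetical result type: whole pages skipped plus the in-page remainder. */
struct sg_norm {
	unsigned int page_index;
	unsigned int in_page_offset;
};

/* Split an sg offset that may exceed PAGE_SIZE into page index and remainder. */
struct sg_norm normalize_sg_offset(unsigned int offset)
{
	struct sg_norm n = {
		offset >> PAGE_SHIFT_SKETCH,
		offset & (PAGE_SIZE_SKETCH - 1),
	};
	return n;
}
```

So normalize_sg_offset(4110) gives page_index 1 and in_page_offset 14.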

>
> On Thu, Sep 28, 2017 at 07:08:21PM +0530, Harsh Jain wrote:
>>
>> Today I tried the "intel_iommu=sp_off" boot option. Traffic runs without 
>> any error for more than 1 hour. What magic did this option do? :)
> Cheers,
> Ashok



[PATCH] nvme-pci: Use PCI bus address for data/queues in CMB

2017-09-28 Thread Abhishek Shah
Currently, the NVMe PCI host driver programs the CMB DMA address as the
I/O SQ addresses. This results in failures on systems where 1:1
outbound mapping is not used (for example, Broadcom iProc SoCs) because
the CMB BAR will be programmed with a PCI bus address but the NVMe PCI EP
will try to access the CMB using the DMA address.

To have the CMB working on systems without 1:1 outbound mapping, we
program the PCI bus address for I/O SQs instead of the DMA address. This
approach will work on systems with or without 1:1 outbound mapping.

The patch is tested on the Broadcom Stingray platform (arm64), which
does not have 1:1 outbound mapping, as well as on an x86 platform,
which has 1:1 outbound mapping.
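
The translation itself is just an offset within an outbound window. A
minimal sketch with a hypothetical window structure (the patch uses
pci_find_resource() and pcibios_resource_to_bus() for this):

```c
#include <stdint.h>

/* Hypothetical outbound window: CPU-side range and its PCI bus-side start. */
struct window_sketch {
	uint64_t cpu_start;
	uint64_t bus_start;
	uint64_t size;
};

/* Returns 0 and stores the bus address, or -1 if cpu_addr is outside. */
int cpu_to_bus_addr(const struct window_sketch *w, uint64_t cpu_addr,
		    uint64_t *bus_addr)
{
	if (cpu_addr < w->cpu_start || cpu_addr - w->cpu_start >= w->size)
		return -1;
	*bus_addr = w->bus_start + (cpu_addr - w->cpu_start);
	return 0;
}
```

With a 1:1 window (cpu_start == bus_start) the result equals the input,
which is why the existing code happens to work on such systems.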

Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available")
Cc: sta...@vger.kernel.org
Signed-off-by: Abhishek Shah 
Reviewed-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/nvme/host/pci.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4a21213..29e3bd8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -94,6 +94,7 @@ struct nvme_dev {
bool subsystem;
void __iomem *cmb;
dma_addr_t cmb_dma_addr;
+   pci_bus_addr_t cmb_bus_addr;
u64 cmb_size;
u32 cmbsz;
u32 cmbloc;
@@ -1220,7 +1221,7 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, 
struct nvme_queue *nvmeq,
if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz)) {
unsigned offset = (qid - 1) * roundup(SQ_SIZE(depth),
  dev->ctrl.page_size);
-   nvmeq->sq_dma_addr = dev->cmb_dma_addr + offset;
+   nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
nvmeq->sq_cmds_io = dev->cmb + offset;
} else {
nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
@@ -1514,8 +1515,28 @@ static ssize_t nvme_cmb_show(struct device *dev,
 }
 static DEVICE_ATTR(cmb, S_IRUGO, nvme_cmb_show, NULL);
 
+static int nvme_find_cmb_bus_addr(struct pci_dev *pdev,
+ dma_addr_t dma_addr,
+ u64 size,
+ pci_bus_addr_t *bus_addr)
+{
+   struct resource *res;
+   struct pci_bus_region region;
+   struct resource tres = DEFINE_RES_MEM(dma_addr, size);
+
+   res = pci_find_resource(pdev, &tres);
+   if (!res)
+   return -EIO;
+
+   pcibios_resource_to_bus(pdev->bus, &region, res);
+   *bus_addr = region.start + (dma_addr - res->start);
+
+   return 0;
+}
+
 static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
 {
+   int rc;
u64 szu, size, offset;
resource_size_t bar_size;
struct pci_dev *pdev = to_pci_dev(dev->dev);
@@ -1553,6 +1574,13 @@ static void __iomem *nvme_map_cmb(struct nvme_dev *dev)
 
dev->cmb_dma_addr = dma_addr;
dev->cmb_size = size;
+
+   rc = nvme_find_cmb_bus_addr(pdev, dma_addr, size, &dev->cmb_bus_addr);
+   if (rc) {
+   iounmap(cmb);
+   return NULL;
+   }
+
return cmb;
 }
 
-- 
2.7.4



Re: [PATCH 10/12] writeback: only allow one inflight and pending full flush

2017-09-28 Thread Amir Goldstein
On Fri, Sep 29, 2017 at 3:17 AM, Jens Axboe  wrote:
> On 09/28/2017 11:44 PM, Linus Torvalds wrote:
>> On Thu, Sep 28, 2017 at 2:41 PM, Andrew Morton
>>  wrote:
>>>
>>> test_and_set_bit()?
>>
>> If there aren't any atomicity concerns (either because of higher-level
>> locking, or because racing and having two people set the bit is fine),
>> it can be better to do them separately if the test_bit() is the common
>> case and you can avoid dirtying a cacheline that way.
>>
>> But yeah, if that is the case, it might be worth documenting, because
>> test_and_set_bit() is the more obviously appropriate "there can be
>> only one" model.
>
> It is documented though, but maybe not well enough...
>
> I've actually had to document/explain it enough times now, that it
> might be worth making a general construct. Though it has to be
> used carefully, so perhaps it's better contained as separate use
> cases.
>

Maybe change "Ensure that we only allow one of them pending"
in the comment above. Only the "allow one inflight" part is correct.

Or apply your follow up patch and be done with it...
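
For reference, the "cheap test first, atomic set only when needed" idea
from the quoted discussion can be sketched in userspace with C11 atomics.
This is the combined variant that keeps the "only one winner" guarantee,
not the separate test_bit()/set_bit() pair in the patch, and try_claim()
is a made-up name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true if this caller is the one that set the bit. */
bool try_claim(atomic_uint *flags, unsigned int bit)
{
	unsigned int mask = 1u << bit;

	/* Read-only check first: no cacheline dirtying in the common case. */
	if (atomic_load_explicit(flags, memory_order_relaxed) & mask)
		return false;

	/* Atomic read-modify-write decides the winner among racing callers. */
	return !(atomic_fetch_or(flags, mask) & mask);
}
```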

Amir.


[PATCH] usb: dwc3: workaround: disable device-initiated U1/U2

2017-09-28 Thread Ran Wang
Issue: When the USB controller is configured in USB device mode, the
device initiates low power when an ACK is pending for a data packet
(DP). When operating in SuperSpeed mode and when the internal condition
for low power (U1/U2) is satisfied, the device initiates U1/U2 even
though it has just received the data packet header (DPH) of the DP.
This causes the link to enter and exit low power before the device
sends an ACK for the DP. This behavior can cause a transaction timeout
on the host for the DP.

Impact: Depending on the host transaction timeout value, the host may
time out on the transaction and retry the transfer. If the same issue
happens again, this could result in the host resetting the device and
re-enumerating.

Workaround: Disable the USB_DCTL (InitU1Ena, InitU2Ena) bits. As a
result, the device does not initiate low-power requests; however,
it can still accept low-power requests from the host/hub and enter
low power.
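
The register handling boils down to masking two enable bits. A standalone
sketch follows; the function names are hypothetical and the bit positions
are illustrative assumptions, not values taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DCTL_INITU1ENA	(1u << 10)
#define DCTL_INITU2ENA	(1u << 12)

/* Host asks to set/clear U1 enable: honour "set" only without the quirk. */
uint32_t dctl_apply_u1(uint32_t reg, bool set, bool quirk)
{
	if (set && !quirk)
		reg |= DCTL_INITU1ENA;
	else
		reg &= ~DCTL_INITU1ENA;
	return reg;
}

/* Init-time workaround: never let the device initiate U1/U2. */
uint32_t dctl_disable_devinit(uint32_t reg)
{
	return reg & ~(DCTL_INITU1ENA | DCTL_INITU2ENA);
}
```

The bits that let the device accept U1/U2 requests from the host are left
alone, which matches the workaround text above.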

Signed-off-by: Ran Wang 
---
 Documentation/devicetree/bindings/usb/dwc3.txt | 2 ++
 drivers/usb/dwc3/core.c| 2 ++
 drivers/usb/dwc3/core.h| 2 ++
 drivers/usb/dwc3/ep0.c | 4 ++--
 drivers/usb/dwc3/gadget.c  | 7 +++
 5 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt 
b/Documentation/devicetree/bindings/usb/dwc3.txt
index 7d4f90c16cd4..9afa8e95831e 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -47,6 +47,8 @@ Optional properties:
from P0 to P1/P2/P3 without delay.
  - snps,dis-tx-ipgap-linecheck-quirk: when set, disable u2mac linestate check
during HS transmit.
+ - snps,disable_devinit_u1u2: when set, disable device-initiated U1/U2
+   LPM request in USB device mode.
  - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
utmi_l1_suspend_n, false when asserts utmi_sleep_n
  - snps,hird-threshold: HIRD threshold
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index f13096b0900e..63d599872a43 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1143,6 +1143,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
 
dwc->tx_de_emphasis_quirk = device_property_read_bool(dev,
"snps,tx_de_emphasis_quirk");
+   dwc->disable_devinit_u1u2_quirk = device_property_read_bool(dev,
+   "snps,disable_devinit_u1u2");
device_property_read_u8(dev, "snps,tx_de_emphasis",
&tx_de_emphasis);
device_property_read_string(dev, "snps,hsphy_interface",
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 7c2f84dc218a..2be63c1a6ab6 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -896,6 +896,7 @@ struct dwc3_scratchpad_array {
  * 1   - -3.5dB de-emphasis
  * 2   - No de-emphasis
  * 3   - Reserved
+ * @disable_devinit_u1u2_quirk: disable device-initiated U1/U2 request.
  * @imod_interval: set the interrupt moderation interval in 250ns
  * increments or 0 to disable.
  */
@@ -1057,6 +1058,7 @@ struct dwc3 {
 
unsignedtx_de_emphasis_quirk:1;
unsignedtx_de_emphasis:2;
+   unsigneddisable_devinit_u1u2_quirk:1;
 
u16 imod_interval;
 };
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 827e376bfa97..bbbf46f031e2 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -391,7 +391,7 @@ static int dwc3_ep0_handle_u1(struct dwc3 *dwc, enum 
usb_device_state state,
return -EINVAL;
 
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
-   if (set)
+   if (set && !dwc->disable_devinit_u1u2_quirk)
reg |= DWC3_DCTL_INITU1ENA;
else
reg &= ~DWC3_DCTL_INITU1ENA;
@@ -413,7 +413,7 @@ static int dwc3_ep0_handle_u2(struct dwc3 *dwc, enum 
usb_device_state state,
return -EINVAL;
 
reg = dwc3_readl(dwc->regs, DWC3_DCTL);
-   if (set)
+   if (set && !dwc->disable_devinit_u1u2_quirk)
reg |= DWC3_DCTL_INITU2ENA;
else
reg &= ~DWC3_DCTL_INITU2ENA;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index f064f1549333..61141c6350dc 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3199,6 +3199,7 @@ int dwc3_gadget_init(struct dwc3 *dwc)
 {
int ret;
int irq;
+   u32 reg;
 
irq = dwc3_gadget_get_irq(dwc);
if (irq < 0) {
@@ -3275,6 +3276,12 @@ int dwc3_gadget_init(struct dwc3 *dwc)
goto err4;
}
 
+   if (dwc->disable_devinit_u1u2_quirk) {
+   reg = dwc3_readl(dwc->regs, DWC3_DCTL);
+   reg &= ~(DWC3_DCTL_INITU1ENA | 

Re: [PATCH] ratelimit: use deferred printk() version

2017-09-28 Thread Sergey Senozhatsky
Hello,

(Cc-ing Andrew  
lkml.kernel.org/r/20170928120405.18273-1-sergey.senozhat...@gmail.com )

On (09/28/17 21:13), Sergey Senozhatsky wrote:
> (Cc-ing Sasha)
> 
> On (09/28/17 21:04), Sergey Senozhatsky wrote:
> [..]
> >  : process 9121 (trinity-c78) no longer affine to cpu8
> >  : smpboot: CPU 8 is now offline
> > 
> > Fixes: 6b1d174b0c27b ("ratelimit: extend to print suppressed messages on 
> > release")
> > Signed-off-by: Sergey Senozhatsky 
> > Reported-by: Sasha Levin 
> > Cc: sta...@vger.kernel.org
> > Cc: Peter Zijlstra 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: Borislav Petkov 
> > Cc: Steven Rostedt 
> > Cc: Petr Mladek 

a quick question, who is going to pick it up? or shall we ask Andrew?

-ss


Re: [PATCH] extcon: Split out extcon header file for consumer and provider device

2017-09-28 Thread Chanwoo Choi
Hi,

On 2017년 09월 29일 11:03, Yoshihiro Shimoda wrote:
> Hi,
> 
>> From: Chanwoo Choi
>> Sent: Friday, September 29, 2017 9:02 AM
>>
> < snip >
>>  drivers/phy/renesas/phy-rcar-gen3-usb2.c  |   2 +-
> < snip >
>>  drivers/usb/gadget/udc/renesas_usb3.c |   2 +-
> 
> These two drivers need the modification.
> But...
> 
> < snip >
>> diff --git a/drivers/usb/renesas_usbhs/common.h 
>> b/drivers/usb/renesas_usbhs/common.h
>> index 8c5fc12ad778..a78764bc23eb 100644
>> --- a/drivers/usb/renesas_usbhs/common.h
>> +++ b/drivers/usb/renesas_usbhs/common.h
>> @@ -17,7 +17,7 @@
>>  #ifndef RENESAS_USB_DRIVER_H
>>  #define RENESAS_USB_DRIVER_H
>>
>> -#include 
>> +#include 
> 
> Since this driver doesn't use any extcon-provider APIs for now,
> we don't need the modification, IIUC.

I don't modify 'drivers/usb/renesas_usbhs/common.h'
on v2 patch. Thanks for your comment.

> 
> Best regards,
> Yoshihiro Shimoda
> 
> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


[PATCH] zsmalloc: calling zs_map_object() from irq is a bug

2017-09-28 Thread Sergey Senozhatsky
Use BUG_ON(in_interrupt()) in zs_map_object(). This is not a
new BUG_ON(), it's always been there, but was recently changed
to VM_BUG_ON(). There are several problems there. First, we use
per-CPU mappings both in zsmalloc and in zram, and an interrupt
may easily corrupt those buffers. Second, and more importantly,
we believe it's possible to start leaking sensitive information.
Consider the following case:

-> process P
swap out
 zram
  per-cpu mapping CPU1
   compress page A
-> IRQ

swap out
 zram
  per-cpu mapping CPU1
   compress page B
write page from per-cpu mapping CPU1 to zsmalloc pool
iret

-> process P
write page from per-cpu mapping CPU1 to zsmalloc pool  [*]
return

* so we store overwritten data that actually belongs to another
  page (task) and potentially contains sensitive data. When
  process P later page faults, it is going to read (swap in) that
  other task's data.

Signed-off-by: Sergey Senozhatsky 
Acked-by: Minchan Kim 
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7c38e850a8fc..685049a9048d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1349,7 +1349,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long 
handle,
 * pools/users, we can't allow mapping in interrupt context
 * because it can corrupt another users mappings.
 */
-   WARN_ON_ONCE(in_interrupt());
+   BUG_ON(in_interrupt());
 
/* From now on, migration cannot move the object */
pin_tag(handle);
-- 
2.14.2
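
The interleaving described in the commit message above can be modelled in
userspace. This is only a sketch under stated assumptions: cpu_buf,
compress_into_cpu_buf(), store_from_cpu_buf() and
simulate_irq_during_swap_out() are hypothetical names, a plain copy stands
in for compression, and an ordinary function call stands in for the
interrupt. It is not the zram or zsmalloc code.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 16

/* Hypothetical stand-in for the single per-CPU mapping of CPU1. */
static char cpu_buf[BUF_SIZE];

/* "Compress" a page into the per-CPU buffer; a copy stands in for compression. */
static void compress_into_cpu_buf(const char *page)
{
	assert(strlen(page) < BUF_SIZE);
	memset(cpu_buf, 0, sizeof(cpu_buf));
	memcpy(cpu_buf, page, strlen(page));
}

/* Write whatever currently sits in the per-CPU buffer to a pool slot. */
static void store_from_cpu_buf(char *pool_slot)
{
	memcpy(pool_slot, cpu_buf, BUF_SIZE);
}

/*
 * Model of the race: process P fills the buffer with page A, an
 * "interrupt" on the same CPU reuses the buffer for page B, and P then
 * stores the buffer. Returns 1 if P's slot ended up holding page B.
 */
static int simulate_irq_during_swap_out(void)
{
	char slot_p[BUF_SIZE];
	char slot_irq[BUF_SIZE];

	compress_into_cpu_buf("page A");	/* process P */

	compress_into_cpu_buf("page B");	/* IRQ, same CPU, same buffer */
	store_from_cpu_buf(slot_irq);		/* IRQ stores its page, iret */

	store_from_cpu_buf(slot_p);		/* process P resumes: [*] */

	return strcmp(slot_p, "page B") == 0 &&
	       strcmp(slot_irq, "page B") == 0;
}
```

In this model simulate_irq_during_swap_out() reports that the slot written
on behalf of process P holds page B, which is the leak the patch guards
against by refusing to map from interrupt context.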



[PATCH] USB: serial: qcserial: add Dell DW5818, DW5819

2017-09-28 Thread Shrirang Bagul
Dell Wireless 5819/5818 devices are re-branded Sierra Wireless MC74
series devices which by default boot with VID 0x413c and PIDs 0x81cf,
0x81d0, 0x81d1, 0x81d2.

Signed-off-by: Shrirang Bagul 
---
 drivers/usb/serial/qcserial.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c
index ebc0beea69d6..eb9928963a53 100644
--- a/drivers/usb/serial/qcserial.c
+++ b/drivers/usb/serial/qcserial.c
@@ -174,6 +174,10 @@ static const struct usb_device_id id_table[] = {
{DEVICE_SWI(0x413c, 0x81b3)},   /* Dell Wireless 5809e Gobi(TM) 4G LTE 
Mobile Broadband Card (rev3) */
{DEVICE_SWI(0x413c, 0x81b5)},   /* Dell Wireless 5811e QDL */
{DEVICE_SWI(0x413c, 0x81b6)},   /* Dell Wireless 5811e QDL */
+   {DEVICE_SWI(0x413c, 0x81cf)},   /* Dell Wireless 5819 */
+   {DEVICE_SWI(0x413c, 0x81d0)},   /* Dell Wireless 5819 */
+   {DEVICE_SWI(0x413c, 0x81d1)},   /* Dell Wireless 5818 */
+   {DEVICE_SWI(0x413c, 0x81d2)},   /* Dell Wireless 5818 */
 
/* Huawei devices */
{DEVICE_HWI(0x03f0, 0x581d)},   /* HP lt4112 LTE/HSPA+ Gobi 4G Modem 
(Huawei me906e) */
-- 
2.11.0
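
As an illustration of what the new table entries amount to, here is a
minimal userspace sketch of a VID/PID lookup. struct usb_id, dw581x_ids
and dw581x_match() are hypothetical names invented for this example; the
real driver uses struct usb_device_id and the USB core's matching, which
is not reproduced here. Only the four VID/PID pairs come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ID entry (not the kernel's usb_device_id). */
struct usb_id {
	uint16_t vid;
	uint16_t pid;
};

/* The four VID/PID pairs added by the patch above. */
static const struct usb_id dw581x_ids[] = {
	{ 0x413c, 0x81cf },	/* Dell Wireless 5819 */
	{ 0x413c, 0x81d0 },	/* Dell Wireless 5819 */
	{ 0x413c, 0x81d1 },	/* Dell Wireless 5818 */
	{ 0x413c, 0x81d2 },	/* Dell Wireless 5818 */
};

/* Return 1 if the given VID/PID pair is listed in the table, else 0. */
static int dw581x_match(uint16_t vid, uint16_t pid)
{
	size_t i;

	for (i = 0; i < sizeof(dw581x_ids) / sizeof(dw581x_ids[0]); i++) {
		if (dw581x_ids[i].vid == vid && dw581x_ids[i].pid == pid)
			return 1;
	}
	return 0;
}
```

A device whose pair is not listed is simply not claimed by the table,
which is why each new product ID needs its own entry.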



[PATCH v2] drm/i915: Replace *_reference/unreference() or *_ref/unref with _get/put()

2017-09-28 Thread Harsha Sharma
Replace instances of drm_framebuffer_reference/unreference() with
*_get/put() suffixes and drm_dev_unref with *_put() suffix
because get/put is shorter and consistent with the
kernel use of *_get/put suffixes.
Done with the following Coccinelle semantic patch:

@@ 
expression ex; 
@@ 

( 
-drm_framebuffer_unreference(ex); 
+drm_framebuffer_put(ex); 
| 
-drm_dev_unref(ex); 
+drm_dev_put(ex); 
| 
-drm_framebuffer_reference(ex); 
+drm_framebuffer_get(ex); 
) 


Signed-off-by: Harsha Sharma 
---
Changes in v2:
 -Added coccinelle patch in log message 
 -cc to all driver-specific mailing lists

 drivers/gpu/drm/i915/i915_pci.c|  2 +-
 drivers/gpu/drm/i915/intel_display.c   | 10 +-
 drivers/gpu/drm/i915/intel_fbdev.c |  4 ++--
 drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_evict.c|  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_object.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_request.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c  |  2 +-
 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c |  2 +-
 10 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 09d97e0..2f106cc 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -510,7 +510,7 @@ static void i915_pci_remove(struct pci_dev *pdev)
struct drm_device *dev = pci_get_drvdata(pdev);
 
i915_driver_unload(dev);
-   drm_dev_unref(dev);
+   drm_dev_put(dev);
 }
 
 static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id 
*ent)
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index f172755..92f8304 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2856,7 +2856,7 @@ static int skl_format_to_fourcc(int format, bool 
rgb_order, bool alpha)
 
if (intel_plane_ggtt_offset(state) == plane_config->base) {
fb = c->primary->fb;
-   drm_framebuffer_reference(fb);
+   drm_framebuffer_get(fb);
goto valid_fb;
}
}
@@ -2887,7 +2887,7 @@ static int skl_format_to_fourcc(int format, bool 
rgb_order, bool alpha)
  intel_crtc->pipe, PTR_ERR(intel_state->vma));
 
intel_state->vma = NULL;
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
return;
}
 
@@ -2908,7 +2908,7 @@ static int skl_format_to_fourcc(int format, bool 
rgb_order, bool alpha)
if (i915_gem_object_is_tiled(obj))
dev_priv->preserve_bios_swizzle = true;
 
-   drm_framebuffer_reference(fb);
+   drm_framebuffer_get(fb);
primary->fb = primary->state->fb = fb;
primary->crtc = primary->state->crtc = &intel_crtc->base;
 
@@ -9847,7 +9847,7 @@ struct drm_framebuffer *
if (obj->base.size < mode->vdisplay * fb->pitches[0])
return NULL;
 
-   drm_framebuffer_reference(fb);
+   drm_framebuffer_get(fb);
return fb;
 #else
return NULL;
@@ -10028,7 +10028,7 @@ int intel_get_load_detect_pipe(struct drm_connector 
*connector,
if (ret)
goto fail;
 
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
 
ret = drm_atomic_set_mode_for_crtc(&crtc_state->base, mode);
if (ret)
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c 
b/drivers/gpu/drm/i915/intel_fbdev.c
index 262e75c..1ff7149 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -189,7 +189,7 @@ static int intelfb_create(struct drm_fb_helper *helper,
  " releasing it\n",
  intel_fb->base.width, intel_fb->base.height,
  sizes->fb_width, sizes->fb_height);
-   drm_framebuffer_unreference(&intel_fb->base);
+   drm_framebuffer_put(&intel_fb->base);
intel_fb = ifbdev->fb = NULL;
}
if (!intel_fb || WARN_ON(!intel_fb->obj)) {
@@ -624,7 +624,7 @@ static bool intel_fbdev_init_bios(struct drm_device *dev,
ifbdev->preferred_bpp = fb->base.format->cpp[0] * 8;
ifbdev->fb = fb;
 
-   drm_framebuffer_reference(&ifbdev->fb->base);
+   drm_framebuffer_get(&ifbdev->fb->base);
 
/* Final pass to check if any active pipes don't have fbs */
for_each_crtc(dev, crtc) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
index 89dc25a..a7055b1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
@@ -389,7 +389,7 @@ int i915_gem_dmabuf_mock_selftests(void)
 
   

Re: [virtio-dev] Re: [PATCH v15 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-09-28 Thread Michael S. Tsirkin
On Fri, Sep 08, 2017 at 07:09:24PM +0800, Wei Wang wrote:
> On 09/08/2017 11:36 AM, Michael S. Tsirkin wrote:
> > On Tue, Aug 29, 2017 at 11:09:18AM +0800, Wei Wang wrote:
> > > On 08/29/2017 02:03 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Aug 28, 2017 at 06:08:31PM +0800, Wei Wang wrote:
> > > > > Add a new feature, VIRTIO_BALLOON_F_SG, which enables the transfer
> > > > > of balloon (i.e. inflated/deflated) pages using scatter-gather lists
> > > > > to the host.
> > > > > 
> > > > > The implementation of the previous virtio-balloon is not very
> > > > > efficient, because the balloon pages are transferred to the
> > > > > host one by one. Here is the breakdown of the time in percentage
> > > > > spent on each step of the balloon inflating process (inflating
> > > > > 7GB of an 8GB idle guest).
> > > > > 
> > > > > 1) allocating pages (6.5%)
> > > > > 2) sending PFNs to host (68.3%)
> > > > > 3) address translation (6.1%)
> > > > > 4) madvise (19%)
> > > > > 
> > > > > It takes about 4126ms for the inflating process to complete.
> > > > > The above profiling shows that the bottlenecks are stage 2)
> > > > > and stage 4).
> > > > > 
> > > > > This patch optimizes step 2) by transferring pages to the host in
> > > > > sgs. An sg describes a chunk of guest physically continuous pages.
> > > > > With this mechanism, step 4) can also be optimized by doing address
> > > > > translation and madvise() in chunks rather than page by page.
> > > > > 
> > > > > With this new feature, the above ballooning process takes ~597ms
> > > > > resulting in an improvement of ~86%.
> > > > > 
> > > > > TODO: optimize stage 1) by allocating/freeing a chunk of pages
> > > > > instead of a single page each time.
> > > > > 
> > > > > Signed-off-by: Wei Wang 
> > > > > Signed-off-by: Liang Li 
> > > > > Suggested-by: Michael S. Tsirkin 
> > > > > ---
> > > > >drivers/virtio/virtio_balloon.c | 171 
> > > > > 
> > > > >include/uapi/linux/virtio_balloon.h |   1 +
> > > > >2 files changed, 155 insertions(+), 17 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/virtio/virtio_balloon.c 
> > > > > b/drivers/virtio/virtio_balloon.c
> > > > > index f0b3a0b..8ecc1d4 100644
> > > > > --- a/drivers/virtio/virtio_balloon.c
> > > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > > @@ -32,6 +32,8 @@
> > > > >#include 
> > > > >#include 
> > > > >#include 
> > > > > +#include 
> > > > > +#include 
> > > > >/*
> > > > > * Balloon device works in 4K page units.  So each page is pointed 
> > > > > to by
> > > > > @@ -79,6 +81,9 @@ struct virtio_balloon {
> > > > >   /* Synchronize access/update to this struct virtio_balloon 
> > > > > elements */
> > > > >   struct mutex balloon_lock;
> > > > > + /* The xbitmap used to record balloon pages */
> > > > > + struct xb page_xb;
> > > > > +
> > > > >   /* The array of pfns we tell the Host about. */
> > > > >   unsigned int num_pfns;
> > > > >   __virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
> > > > > @@ -141,13 +146,111 @@ static void set_page_pfns(struct 
> > > > > virtio_balloon *vb,
> > > > > page_to_balloon_pfn(page) + 
> > > > > i);
> > > > >}
> > > > > +static int add_one_sg(struct virtqueue *vq, void *addr, uint32_t 
> > > > > size)
> > > > > +{
> > > > > + struct scatterlist sg;
> > > > > +
> > > > > + sg_init_one(&sg, addr, size);
> > > > > + return virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
> > > > > +}
> > > > > +
> > > > > +static void send_balloon_page_sg(struct virtio_balloon *vb,
> > > > > +  struct virtqueue *vq,
> > > > > +  void *addr,
> > > > > +  uint32_t size,
> > > > > +  bool batch)
> > > > > +{
> > > > > + unsigned int len;
> > > > > + int err;
> > > > > +
> > > > > + err = add_one_sg(vq, addr, size);
> > > > > + /* Sanity check: this can't really happen */
> > > > > + WARN_ON(err);
> > > > It might be cleaner to detect that add failed due to
> > > > ring full and kick then. Just an idea, up to you
> > > > whether to do it.
> > > > 
> > > > > +
> > > > > + /* If batching is in use, we batch the sgs till the vq is full. 
> > > > > */
> > > > > + if (!batch || !vq->num_free) {
> > > > > + virtqueue_kick(vq);
> > > > > + wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> > > > > + /* Release all the entries if there are */
> > > > Meaning
> > > > Account for all used entries if any
> > > > ?
> > > > 
> > > > > + while (virtqueue_get_buf(vq, &len))
> > > > > + ;
> > > > Above code is reused below. Add a function?
> > > > 
> > > > > + }
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Send balloon pages in sgs to host. The balloon pages are recorded 
> > > > > in the
> > > > > + * page xbitmap. E

Re: Kernel panic - not syncing: Fatal exception in interrupt (file_free_rcu+0x14)

2017-09-28 Thread Linus Torvalds
On Thu, Sep 28, 2017 at 8:32 PM, Kyle Sanderson  wrote:
> Not sure if the stack is crap or not, but this looks like an RCU crash?
>
> https://i.imgur.com/sBnNe1p.jpg

Hmm. Not the clearest picture, and the "Code:" line in particular is
missing the interesting part, but at a guess it's taking a fault in
put_cred(), which inlines to

if (atomic_dec_and_test(&(cred)->usage))
__put_cred(cred);

and I think it's that "cred" pointer that may be NULL, which makes
"&(cred)->usage" be a NULL pointer too, and you get a page fault when
it tries to decrement the usage count.

Now, it goes without saying that the cred pointer should never *be*
NULL on a filp that is on the RCU freeing list, because we always
initialize file->f_cred when we allocate a file to the current creds.

So there's something odd going on. Possibly entirely unrelated memory
corruption.

Nothing obvious stands out, I think we'd need to see more of a pattern
of the problem to see what is up.

 Linus
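
The pointer arithmetic behind that guess can be sketched in userspace
without dereferencing anything invalid. struct cred_like and
usage_field_address() are hypothetical names, and the layout (reference
count as the first member) is an assumption made for this illustration
rather than a copy of the kernel's struct cred.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a credentials structure. */
struct cred_like {
	int usage;		/* reference count, assumed first member */
	unsigned int uid;
};

/*
 * Address that "&(cred)->usage" would evaluate to for a given value of
 * the cred pointer. offsetof() is used so no pointer is dereferenced.
 */
static uintptr_t usage_field_address(uintptr_t cred_addr)
{
	return cred_addr + offsetof(struct cred_like, usage);
}
```

With the count at offset 0, a NULL cred yields address 0 for the usage
field as well, so the atomic decrement would fault on the very first
access, which matches the kind of crash described above.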


[PATCH] ASoC: rockchip: Fix wrong allocation size of dapm routes

2017-09-28 Thread Jeffy Chen
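The allocation size of the dapm routes is wrong: sizeof(rockchip_routes)
is the size of the lookup table, not of the routes it lists. Allocate
room for the sum of all num_routes entries instead.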

Fixes: d9f9c167edae ("ASoC: rockchip: Init dapm routes dynamically")
Signed-off-by: Jeffy Chen 
---

 sound/soc/rockchip/rk3399_gru_sound.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/sound/soc/rockchip/rk3399_gru_sound.c 
b/sound/soc/rockchip/rk3399_gru_sound.c
index 30eed83e8a13..008452e55ef8 100644
--- a/sound/soc/rockchip/rk3399_gru_sound.c
+++ b/sound/soc/rockchip/rk3399_gru_sound.c
@@ -493,14 +493,18 @@ static int rockchip_sound_of_parse_dais(struct device 
*dev,
struct device_node *np_codec;
struct snd_soc_dai_link *dai;
struct snd_soc_dapm_route *routes;
-   int i, index;
+   int i, index, max_num_routes;
 
card->dai_link = devm_kzalloc(dev, sizeof(rockchip_dais),
  GFP_KERNEL);
if (!card->dai_link)
return -ENOMEM;
 
-   routes = devm_kzalloc(dev, sizeof(rockchip_routes),
+   max_num_routes = 0;
+   for (i = 0; i < ARRAY_SIZE(rockchip_dais); i++)
+   max_num_routes += rockchip_routes[i].num_routes;
+
+   routes = devm_kzalloc(dev, max_num_routes * sizeof(*routes),
  GFP_KERNEL);
if (!routes)
return -ENOMEM;
-- 
2.11.0




Kernel panic - not syncing: Fatal exception in interrupt (file_free_rcu+0x14)

2017-09-28 Thread Kyle Sanderson
Not sure if the stack is crap or not, but this looks like an RCU crash?

https://i.imgur.com/sBnNe1p.jpg

Kyle.

FileServer ~ # uname -a
Linux FileServer.OpenWRT.local 4.12.5-gentoo #1 SMP PREEMPT Fri Aug 18
17:23:00 PDT 2017 x86_64 Intel(R) Atom(TM) CPU 330 @ 1.60GHz
GenuineIntel GNU/Linux
FileServer ~ # cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 28
model name  : Intel(R) Atom(TM) CPU  330   @ 1.60GHz
stepping: 2
microcode   : 0x20d
cpu MHz : 1999.917
cache size  : 512 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts nopl cpuid aperfmperf pni dtes64
monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
bugs:
bogomips: 3999.83
clflush size: 64
cache_alignment : 64
address sizes   : 32 bits physical, 48 bits virtual
power management:


[PATCH 1/7] regulator: axp20x: Fix poly-phase bit offset for AXP803 DCDC5/6

2017-09-28 Thread Chen-Yu Tsai
The bit offset used to check if DCDC5 and DCDC6 are tied together in
poly-phase output is wrong. It was checking against a reserved bit,
which is always false.

In reality, neither the reference design layout nor actually produced
boards tie these two buck regulators together. But we should still
fix it, just in case.

Fixes: 1dbe0ccb0631 ("regulator: axp20x-regulator: add support for AXP803")
Signed-off-by: Chen-Yu Tsai 
Tested-by: Maxime Ripard 
---
 drivers/regulator/axp20x-regulator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/axp20x-regulator.c 
b/drivers/regulator/axp20x-regulator.c
index f18b36dd57dd..376a99b7cf5d 100644
--- a/drivers/regulator/axp20x-regulator.c
+++ b/drivers/regulator/axp20x-regulator.c
@@ -590,7 +590,7 @@ static bool axp20x_is_polyphase_slave(struct axp20x_dev 
*axp20x, int id)
case AXP803_DCDC3:
return !!(reg & BIT(6));
case AXP803_DCDC6:
-   return !!(reg & BIT(7));
+   return !!(reg & BIT(5));
}
break;
 
-- 
2.14.2
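
A minimal userspace sketch of the corrected bit test follows. The enum
values and is_polyphase_slave() are hypothetical stand-ins for the driver
code; the bit positions (bit 6 for DCDC3, bit 5 for DCDC6) are the ones
visible in the diff above, and reg is assumed to be the already-read
poly-phase control register value.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical regulator IDs for this illustration only. */
enum {
	ID_DCDC3,
	ID_DCDC6,
};

/* Return 1 if the given regulator is marked as a poly-phase slave. */
static int is_polyphase_slave(uint8_t reg, int id)
{
	switch (id) {
	case ID_DCDC3:
		return !!(reg & BIT(6));
	case ID_DCDC6:
		return !!(reg & BIT(5));	/* was BIT(7) before the fix */
	}
	return 0;
}
```

With the old mask, a register value of 0x20 (bit 5 set) would not have
been recognised for DCDC6; with the fix it is.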



[PATCH 4/7] ARM: dts: sunxi: Add dtsi for AXP81x PMIC

2017-09-28 Thread Chen-Yu Tsai
The AXP81x family of PMIC is used with the Allwinner A83T and H8 SoCs.
This includes the AXP813 and AXP818. There is no discernible difference
except the labeling. The AXP813 is paired with the A83T, while the
AXP818 is paired with the H8.

This patch adds a dtsi file for all the common bindings for these two
PMICs. Currently this is just listing all the regulator nodes. The
regulators are initialized based on their device node names.

In the future this would be expanded to include power supplies and
GPIO controllers.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/axp81x.dtsi | 139 ++
 1 file changed, 139 insertions(+)
 create mode 100644 arch/arm/boot/dts/axp81x.dtsi

diff --git a/arch/arm/boot/dts/axp81x.dtsi b/arch/arm/boot/dts/axp81x.dtsi
new file mode 100644
index 000000000000..73b761f850c5
--- /dev/null
+++ b/arch/arm/boot/dts/axp81x.dtsi
@@ -0,0 +1,139 @@
+/*
+ * Copyright 2017 Chen-Yu Tsai
+ *
+ * Chen-Yu Tsai 
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/* AXP813/818 Integrated Power Management Chip */
+
+&axp81x {
+   interrupt-controller;
+   #interrupt-cells = <1>;
+
+   regulators {
+   /* Default work frequency for buck regulators */
+   x-powers,dcdc-freq = <3000>;
+
+   reg_dcdc1: dcdc1 {
+   };
+
+   reg_dcdc2: dcdc2 {
+   };
+
+   reg_dcdc3: dcdc3 {
+   };
+
+   reg_dcdc4: dcdc4 {
+   };
+
+   reg_dcdc5: dcdc5 {
+   };
+
+   reg_dcdc6: dcdc6 {
+   };
+
+   reg_dcdc7: dcdc7 {
+   };
+
+   reg_aldo1: aldo1 {
+   };
+
+   reg_aldo2: aldo2 {
+   };
+
+   reg_aldo3: aldo3 {
+   };
+
+   reg_dldo1: dldo1 {
+   };
+
+   reg_dldo2: dldo2 {
+   };
+
+   reg_dldo3: dldo3 {
+   };
+
+   reg_dldo4: dldo4 {
+   };
+
+   reg_eldo1: eldo1 {
+   };
+
+   reg_eldo2: eldo2 {
+   };
+
+   reg_eldo3: eldo3 {
+   };
+
+   reg_fldo1: fldo1 {
+   };
+
+   reg_fldo2: fldo2 {
+   };
+
+   reg_fldo3: fldo3 {
+   };
+
+   reg_ldo_io0: ldo-io0 {
+   /* Disable by default to avoid conflicts with GPIO */
+   status = "disabled";
+   };
+
+   reg_ldo_io1: ldo-io1 {
+   /* Disable by default to avoid conflicts with GPIO */
+   status = "disabled";
+   };
+
+   reg_rtc_ldo: rtc-ldo {
+   /* RTC_LDO is a fixed, always-on regulator */
+   regulator-always-on;
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
+   };
+
+

[PATCH 2/7] regulator: axp20x: Add support for AXP813 regulators

2017-09-28 Thread Chen-Yu Tsai
The AXP813 PMIC has 7 DC-DC buck regulators, 16 LDOs (including the
fixed RTC LDO and 2 GPIO LDOs), and 1 switchable output. The drive-vbus
feature is also supported. All the hardware details are very similar
to the AXP803, with the following exceptions:

  - Extra DCDC7 buck regulator, with the same range as DCDC6

  - Switch now has a separate supply pin, instead of being chained
internally from DCDC1

  - RTC LDO output voltage is now 1.8V

  - FLDO3 is an LDO with switchable supplies, but unconfigurable output
voltage. The voltage is always half that of its supply.

Support for FLDO3 is currently unimplemented, as it requires runtime
switching of its supplies, something the regulator subsystem does not
support. It is used in neither the reference designs nor the boards
actually produced.

Signed-off-by: Chen-Yu Tsai 
Tested-by: Maxime Ripard 
---
 drivers/regulator/axp20x-regulator.c | 102 +--
 include/linux/mfd/axp20x.h   |   3 ++
 2 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/drivers/regulator/axp20x-regulator.c 
b/drivers/regulator/axp20x-regulator.c
index 376a99b7cf5d..e1761df4cbfd 100644
--- a/drivers/regulator/axp20x-regulator.c
+++ b/drivers/regulator/axp20x-regulator.c
@@ -244,6 +244,7 @@ static const struct regulator_desc 
axp22x_drivevbus_regulator = {
.ops= &axp20x_ops_sw,
 };
 
+/* DCDC ranges shared with AXP813 */
 static const struct regulator_linear_range axp803_dcdc234_ranges[] = {
REGULATOR_LINEAR_RANGE(500000, 0x0, 0x46, 10000),
REGULATOR_LINEAR_RANGE(1220000, 0x47, 0x4b, 20000),
@@ -426,6 +427,69 @@ static const struct regulator_desc axp809_regulators[] = {
AXP_DESC_SW(AXP809, SW, "sw", "swin", AXP22X_PWR_OUT_CTRL2, BIT(6)),
 };
 
+static const struct regulator_desc axp813_regulators[] = {
+   AXP_DESC(AXP813, DCDC1, "dcdc1", "vin1", 1600, 3400, 100,
+AXP803_DCDC1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL1, BIT(0)),
+   AXP_DESC_RANGES(AXP813, DCDC2, "dcdc2", "vin2", axp803_dcdc234_ranges,
+   76, AXP803_DCDC2_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(1)),
+   AXP_DESC_RANGES(AXP813, DCDC3, "dcdc3", "vin3", axp803_dcdc234_ranges,
+   76, AXP803_DCDC3_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(2)),
+   AXP_DESC_RANGES(AXP813, DCDC4, "dcdc4", "vin4", axp803_dcdc234_ranges,
+   76, AXP803_DCDC4_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(3)),
+   AXP_DESC_RANGES(AXP813, DCDC5, "dcdc5", "vin5", axp803_dcdc5_ranges,
+   68, AXP803_DCDC5_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(4)),
+   AXP_DESC_RANGES(AXP813, DCDC6, "dcdc6", "vin6", axp803_dcdc6_ranges,
+   72, AXP803_DCDC6_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(5)),
+   AXP_DESC_RANGES(AXP813, DCDC7, "dcdc7", "vin7", axp803_dcdc6_ranges,
+   72, AXP813_DCDC7_V_OUT, 0x7f, AXP22X_PWR_OUT_CTRL1,
+   BIT(6)),
+   AXP_DESC(AXP813, ALDO1, "aldo1", "aldoin", 700, 3300, 100,
+AXP22X_ALDO1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL3, BIT(5)),
+   AXP_DESC(AXP813, ALDO2, "aldo2", "aldoin", 700, 3300, 100,
+AXP22X_ALDO2_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL3, BIT(6)),
+   AXP_DESC(AXP813, ALDO3, "aldo3", "aldoin", 700, 3300, 100,
+AXP22X_ALDO3_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL3, BIT(7)),
+   AXP_DESC(AXP813, DLDO1, "dldo1", "dldoin", 700, 3300, 100,
+AXP22X_DLDO1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(3)),
+   AXP_DESC_RANGES(AXP813, DLDO2, "dldo2", "dldoin", axp803_dldo2_ranges,
+   32, AXP22X_DLDO2_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2,
+   BIT(4)),
+   AXP_DESC(AXP813, DLDO3, "dldo3", "dldoin", 700, 3300, 100,
+AXP22X_DLDO3_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(5)),
+   AXP_DESC(AXP813, DLDO4, "dldo4", "dldoin", 700, 3300, 100,
+AXP22X_DLDO4_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(6)),
+   AXP_DESC(AXP813, ELDO1, "eldo1", "eldoin", 700, 1900, 50,
+AXP22X_ELDO1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(0)),
+   AXP_DESC(AXP813, ELDO2, "eldo2", "eldoin", 700, 1900, 50,
+AXP22X_ELDO2_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(1)),
+   AXP_DESC(AXP813, ELDO3, "eldo3", "eldoin", 700, 1900, 50,
+AXP22X_ELDO3_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(2)),
+   /* to do / check ... */
+   AXP_DESC(AXP813, FLDO1, "fldo1", "fldoin", 700, 1450, 50,
+AXP803_FLDO1_V_OUT, 0x0f, AXP22X_PWR_OUT_CTRL3, BIT(2)),
+   AXP_DESC(AXP813, FLDO2, "fldo2", "fldoin", 700, 1450, 50,
+AXP803_FLDO2_V_OUT, 0x0f, AXP22X_PWR_OUT_CTRL3, BIT(3)),
+   /*
+* TODO: FLDO3 = {DCDC5, FLDOIN} / 2
+*
+* This means FLDO3 effectively switches s

Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively

2017-09-28 Thread Linus Torvalds
On Thu, Sep 28, 2017 at 6:53 PM, Mimi Zohar  wrote:
>
> The locking issue isn't with validating the file hash, but with the
> setxattr, chmod, chown syscalls.  Each of these syscalls takes the
> i_rwsem exclusively before IMA (or EVM) is called.

Read my email again.

> In setxattr, chmod, chown syscalls, IMA (and EVM) are called after the
> i_rwsem is already taken.  So the locking would be:
>
> lock: i_rwsem
> lock: iint->mutex

No.

Two locks. One inner, one outer. Only the actual ones that calculate
the hash would take the outer one. Read my email.

   Linus


[PATCH 3/7] mfd: axp20x: Add axp20x-regulator cell for AXP813

2017-09-28 Thread Chen-Yu Tsai
Now that axp20x-regulator supports AXP813, we can add a cell for it
to enable it.

Signed-off-by: Chen-Yu Tsai 
Tested-by: Maxime Ripard 
---
 drivers/mfd/axp20x.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mfd/axp20x.c b/drivers/mfd/axp20x.c
index 336de66ca408..2468b431bb22 100644
--- a/drivers/mfd/axp20x.c
+++ b/drivers/mfd/axp20x.c
@@ -876,6 +876,8 @@ static struct mfd_cell axp813_cells[] = {
.name   = "axp221-pek",
.num_resources  = ARRAY_SIZE(axp803_pek_resources),
.resources  = axp803_pek_resources,
+   }, {
+   .name   = "axp20x-regulator",
}
 };
 
-- 
2.14.2



[PATCH 0/7] regulator: axp20x: Add support for AXP813/818 regulators

2017-09-28 Thread Chen-Yu Tsai
Hi everyone,

This series adds support for the X-Powers AXP813/818 [1] PMICs'
regulators. The series is quite straightforward. There are no compile
time dependencies between the driver patches, so each can go through
their respective (mfd and regulator) trees.

Patch 1 fixes a wrong bit offset for the AXP803 DCDC5/6 poly-phase
detection code. This code path is not exercised as we don't have any
boards that tie these two outputs together.

Patch 2 adds driver support for the AXP813 regulators. The DT binding
part was merged together with the PMIC compatible string and basic
descriptions.

Patch 3 adds an axp20x-regulator cell for AXP813, thereby enabling the
regulators.

Patch 4 adds a shared dtsi file for the PMIC. This currently contains
a list of regulator nodes, but will be expanded with Quentin's power
supply work.

Patches 5 through 7 add regulator nodes to board dts files for the A83T
boards that I have. They are not squashed together as each file has
substantial additions.

Originally my work also included enabling SDIO WiFi and Ethernet. But
the Ethernet bindings were reverted, and SDIO probing somehow didn't
work after v4.14-rc1. Everything can be found here:

https://github.com/wens/linux/tree/a83t-regulator-wifi-eth

Please have a look and merge if everything looks OK.


Regards
ChenYu


[1] AXP813 and AXP818 are functionally identical. They have different
labels and are bundled with different SoCs (A83T and H8), as a sort
of product or market segmentation.


Chen-Yu Tsai (7):
  regulator: axp20x: Fix poly-phase bit offset for AXP803 DCDC5/6
  regulator: axp20x: Add support for AXP813 regulators
  mfd: axp20x: Add axp20x-regulator cell for AXP813
  ARM: dts: sunxi: Add dtsi for AXP81x PMIC
  ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes
  ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes
  ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator
nodes

 .../{sun8i-a83t-bananapi-m3.dts => axp81x.dtsi}| 157 ++---
 .../boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts  | 126 -
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts   | 134 +-
 arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts   | 150 +++-
 drivers/mfd/axp20x.c   |   2 +
 drivers/regulator/axp20x-regulator.c   | 104 +-
 include/linux/mfd/axp20x.h |   3 +
 7 files changed, 582 insertions(+), 94 deletions(-)
 copy arch/arm/boot/dts/{sun8i-a83t-bananapi-m3.dts => axp81x.dtsi} (52%)

-- 
2.14.2



[PATCH 7/7] ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator nodes

2017-09-28 Thread Chen-Yu Tsai
This patch adds device nodes for all the regulators of the AXP818 PMIC.
References to the 3V dummy regulator are replaced, and it is disabled.
The 3.3V and 5V dummy regulators are also disabled.

Signed-off-by: Chen-Yu Tsai 
---
 .../boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts  | 126 -
 1 file changed, 124 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts 
b/arch/arm/boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts
index 1f0d60afb25b..1c7371d6bbb2 100644
--- a/arch/arm/boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts
+++ b/arch/arm/boot/dts/sun8i-a83t-allwinner-h8homlet-v2.dts
@@ -65,7 +65,7 @@
 &mmc0 {
pinctrl-names = "default";
pinctrl-0 = <&mmc0_pins>;
-   vmmc-supply = <&reg_vcc3v0>;
+   vmmc-supply = <&reg_dcdc1>;
cd-gpios = <&pio 5 6 GPIO_ACTIVE_HIGH>; /* PF6 */
bus-width = <4>;
cd-inverted;
@@ -75,7 +75,8 @@
 &mmc2 {
pinctrl-names = "default";
pinctrl-0 = <&mmc2_8bit_emmc_pins>;
-   vmmc-supply = <&reg_vcc3v0>;
+   vmmc-supply = <&reg_dcdc1>;
+   vqmmc-supply = <&reg_dcdc1>;
bus-width = <8>;
non-removable;
cap-mmc-hw-reset;
@@ -104,6 +105,8 @@
reg = <0x3a3>;
interrupt-parent = <&r_intc>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
+   eldoin-supply = <&reg_dcdc1>;
+   swin-supply = <&reg_dcdc1>;
};
 
ac100: codec@e89 {
@@ -131,6 +134,125 @@
};
 };
 
+#include "axp81x.dtsi"
+
+&reg_aldo1 {
+   regulator-always-on;
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
+   regulator-name = "vcc18-csi2-dsi-efuse-hdmi";
+};
+
+&reg_aldo2 {
+   regulator-always-on;
+   regulator-min-microvolt = <1800000>;
+   regulator-max-microvolt = <1800000>;
+   regulator-name = "vdd-drampll-vcc18-pll-adc-cpvdd-ldoin";
+};
+
+&reg_aldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <3000000>;
+   regulator-max-microvolt = <3000000>;
+   regulator-name = "vcc-pl-avcc";
+};
+
+&reg_dcdc1 {
+   regulator-always-on;
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-name = "vcc-3v3";
+};
+
+&reg_dcdc2 {
+   regulator-always-on;
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1100000>;
+   regulator-name = "vdd-cpua";
+};
+
+&reg_dcdc3 {
+   regulator-always-on;
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1100000>;
+   regulator-name = "vdd-cpub";
+};
+
+&reg_dcdc4 {
+   regulator-min-microvolt = <700000>;
+   regulator-max-microvolt = <1100000>;
+   regulator-name = "vdd-gpu";
+};
+
+&reg_dcdc5 {
+   regulator-always-on;
+   regulator-min-microvolt = <1500000>;
+   regulator-max-microvolt = <1500000>;
+   regulator-name = "vcc-dram";
+};
+
+&reg_dcdc6 {
+   regulator-always-on;
+   regulator-min-microvolt = <900000>;
+   regulator-max-microvolt = <900000>;
+   regulator-name = "vdd-sys-vdd09-usb0-hdmi";
+};
+
+&reg_dldo2 {
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vcc-mipi-3v3";
+};
+
+&reg_dldo4 {
+   /*
+* The PHY requires 20ms after all voltages are applied until core
+* logic is ready and 30ms after the reset pin is de-asserted.
+* Set a 100ms delay to account for PMIC ramp time and board traces.
+*/
+   regulator-enable-ramp-delay = <10>;
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vdd33-pd-ave-ephy";
+};
+
+&reg_fldo1 {
+   regulator-min-microvolt = <108>;
+   regulator-max-microvolt = <132>;
+   regulator-name = "vdd12-hsic";
+};
+
+&reg_fldo2 {
+   /*
+* Despite the embedded CPUs core not being used in any way,
+* this must remain on or the system will hang.
+*/
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpus";
+};
+
+&reg_rtc_ldo {
+   regulator-name = "vcc-rtc-vdd1v8-io-vdd18-lvds";
+};
+
+&reg_sw {
+   regulator-name = "vcc-wifi";
+};
+
+&reg_vcc3v0 {
+   status = "disabled";
+};
+
+&reg_vcc3v3 {
+   status = "disabled";
+};
+
+&reg_vcc5v0 {
+   status = "disabled";
+};
+
 &uart0 {
pinctrl-names = "default";
pinctrl-0 = <&uart0_pb_pins>;
-- 
2.14.2



[PATCH 6/7] ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes

2017-09-28 Thread Chen-Yu Tsai
This patch adds device nodes for all the regulators of the AXP813 PMIC.
References to the 3.3V dummy regulator are replaced, and it is disabled.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts | 134 ++-
 1 file changed, 132 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts 
b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
index 2bafd7e99ef7..c7dae2e5a668 100644
--- a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
+++ b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
@@ -71,7 +71,7 @@
 &mmc0 {
pinctrl-names = "default";
pinctrl-0 = <&mmc0_pins>;
-   vmmc-supply = <&reg_vcc3v3>;
+   vmmc-supply = <&reg_dcdc1>;
bus-width = <4>;
cd-gpios = <&pio 5 6 GPIO_ACTIVE_HIGH>; /* PF6 */
cd-inverted;
@@ -81,7 +81,7 @@
 &mmc2 {
pinctrl-names = "default";
pinctrl-0 = <&mmc2_8bit_emmc_pins>;
-   vmmc-supply = <&reg_vcc3v3>;
+   vmmc-supply = <&reg_dcdc1>;
bus-width = <8>;
non-removable;
cap-mmc-hw-reset;
@@ -96,6 +96,10 @@
reg = <0x3a3>;
interrupt-parent = <&r_intc>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
+   eldoin-supply = <&reg_dcdc1>;
+   fldoin-supply = <&reg_dcdc5>;
+   swin-supply = <&reg_dcdc1>;
+   x-powers,drive-vbus-en;
};
 
ac100: codec@e89 {
@@ -123,6 +127,128 @@
};
 };
 
+#include "axp81x.dtsi"
+
+&reg_aldo1 {
+   regulator-always-on;
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-name = "vcc18-csi2-dsi-efuse-hdmi";
+};
+
+&reg_aldo2 {
+   regulator-always-on;
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-name = "vdd-drampll-vcc18-pll-adc-cpvdd-ldoin";
+};
+
+&reg_aldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <300>;
+   regulator-max-microvolt = <300>;
+   regulator-name = "vcc-pl-avcc";
+};
+
+&reg_dcdc1 {
+   /* schematics says 3.1V but FEX file says 3.3V */
+   regulator-always-on;
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vcc-3v3";
+};
+
+&reg_dcdc2 {
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpua";
+};
+
+&reg_dcdc3 {
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpub";
+};
+
+&reg_dcdc4 {
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-gpu";
+};
+
+&reg_dcdc5 {
+   regulator-always-on;
+   regulator-min-microvolt = <120>;
+   regulator-max-microvolt = <120>;
+   regulator-name = "vcc-dram";
+};
+
+&reg_dcdc6 {
+   regulator-always-on;
+   regulator-min-microvolt = <90>;
+   regulator-max-microvolt = <90>;
+   regulator-name = "vdd-sys-vdd09-usb0-hdmi";
+};
+
+&reg_dldo1 {
+   /*
+* This powers both the WiFi/BT module's main power, I/O supply,
+* and external pull-ups on all the data lines. It should be set
+* to the same voltage as the I/O supply (DCDC1 in this case) to
+* avoid any leakage or mismatch.
+*/
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vcc-wifi";
+};
+
+&reg_dldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <250>;
+   regulator-max-microvolt = <250>;
+   regulator-name = "vcc-pd";
+};
+
+&reg_drivevbus {
+   regulator-name = "usb0-vbus";
+   status = "okay";
+};
+
+&reg_fldo1 {
+   regulator-min-microvolt = <108>;
+   regulator-max-microvolt = <132>;
+   regulator-name = "vdd12-hsic";
+};
+
+&reg_fldo2 {
+   /*
+* Despite the embedded CPUs core not being used in any way,
+* this must remain on or the system will hang.
+*/
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpus";
+};
+
+&reg_rtc_ldo {
+   regulator-name = "vcc-rtc-vdd1v8-io-vdd18-lvds";
+};
+
+&reg_sw {
+   /*
+* The PHY requires 20ms after all voltages
+* are applied until core logic is ready and
+* 30ms after the reset pin is de-asserted.
+* Set a 100ms delay to account for PMIC
+* ramp time and board traces.
+*/
+   regulator-enable-ramp-delay = <10>;
+   regulator-name = "vcc-gmac";
+};
+
 &reg_usb1_vbus {
gpio = <&pio 3 24 GPIO_ACTIVE_HIGH>; /* PD24 */
status = "okay";
@@ -132,6 +258,10 @@
status = "disabled";
 };
 
+&reg_vcc3v3 {
+   status = "disabled";
+};
+
 &reg_vcc5v0 {
status = "disabled";
 };
-- 
2.14.2



[PATCH 5/7] ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes

2017-09-28 Thread Chen-Yu Tsai
This patch adds device nodes for all the regulators of the AXP818 PMIC.
References to the 3.3V dummy regulator are replaced, and it is disabled.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts | 150 ++-
 1 file changed, 148 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts 
b/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts
index 716a205c6dbb..7e1b1f6ca5f4 100644
--- a/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts
+++ b/arch/arm/boot/dts/sun8i-a83t-cubietruck-plus.dts
@@ -127,7 +127,7 @@
 &mmc0 {
pinctrl-names = "default";
pinctrl-0 = <&mmc0_pins>;
-   vmmc-supply = <&reg_vcc3v3>;
+   vmmc-supply = <&reg_dcdc1>;
bus-width = <4>;
cd-gpios = <&pio 5 6 GPIO_ACTIVE_HIGH>; /* PF6 */
cd-inverted;
@@ -137,7 +137,7 @@
 &mmc2 {
pinctrl-names = "default";
pinctrl-0 = <&mmc2_8bit_emmc_pins>;
-   vmmc-supply = <&reg_vcc3v3>;
+   vmmc-supply = <&reg_dcdc1>;
bus-width = <8>;
non-removable;
cap-mmc-hw-reset;
@@ -152,6 +152,9 @@
reg = <0x3a3>;
interrupt-parent = <&r_intc>;
interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
+   eldoin-supply = <&reg_dcdc1>;
+   swin-supply = <&reg_dcdc1>;
+   x-powers,drive-vbus-en;
};
 
ac100: codec@e89 {
@@ -179,6 +182,145 @@
};
 };
 
+#include "axp81x.dtsi"
+
+&reg_aldo1 {
+   regulator-always-on;
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-name = "vcc18-csi2-dsi-efuse-hdmi-d4dp";
+};
+
+&reg_aldo2 {
+   regulator-always-on;
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-name = "vdd-drampll-vcc18-pll-adc-cpvdd-ldoin";
+};
+
+&reg_aldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <300>;
+   regulator-max-microvolt = <300>;
+   regulator-name = "vcc-pl-avcc";
+};
+
+&reg_dcdc1 {
+   /*
+* The schematics say this should be 3.3V, but the FEX file says
+* it should be 3V. The latter makes sense, as the WiFi module's
+* I/O is indirectly powered from DCDC1, through SW. It is rated
+* at 2.98V maximum.
+*/
+   regulator-always-on;
+   regulator-min-microvolt = <300>;
+   regulator-max-microvolt = <300>;
+   regulator-name = "vcc-3v";
+};
+
+&reg_dcdc2 {
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpua";
+};
+
+&reg_dcdc3 {
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-cpub";
+};
+
+&reg_dcdc4 {
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   regulator-name = "vdd-gpu";
+};
+
+&reg_dcdc5 {
+   regulator-always-on;
+   regulator-min-microvolt = <150>;
+   regulator-max-microvolt = <150>;
+   regulator-name = "vcc-dram";
+};
+
+&reg_dcdc6 {
+   regulator-always-on;
+   regulator-min-microvolt = <90>;
+   regulator-max-microvolt = <90>;
+   regulator-name = "vdd-sys-vdd09-usb0-hdmi";
+};
+
+&reg_dldo2 {
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vcc-mipi-3v3-d4dpio";
+};
+
+&reg_dldo3 {
+   regulator-always-on;
+   regulator-min-microvolt = <250>;
+   regulator-max-microvolt = <250>;
+   regulator-name = "vcc-pd-vdd25-ephy";
+};
+
+&reg_dldo4 {
+   /*
+* The PHY requires 20ms after all voltages are applied until core
+* logic is ready and 30ms after the reset pin is de-asserted.
+* Set a 100ms delay to account for PMIC ramp time and board traces.
+*/
+   regulator-enable-ramp-delay = <10>;
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-name = "vdd33-ephy";
+};
+
+&reg_drivevbus {
+   regulator-name = "usb0-vbus";
+   status = "okay";
+};
+
+&reg_eldo1 {
+   regulator-min-microvolt = <120>;
+   regulator-max-microvolt = <120>;
+   regulator-name = "vdd12-d4dp-1";
+};
+
+&reg_eldo2 {
+   regulator-min-microvolt = <120>;
+   regulator-max-microvolt = <120>;
+   regulator-name = "vdd12-d4dp-2";
+};
+
+&reg_fldo1 {
+   /* TODO should be handled by USB PHY */
+   regulator-always-on;
+   regulator-min-microvolt = <108>;
+   regulator-max-microvolt = <132>;
+   regulator-name = "vdd12-hsic";
+};
+
+&reg_fldo2 {
+   /*
+* Despite the embedded CPUs core not being used in any way,
+* this must remain on or the system will hang.
+*/
+   regulator-always-on;
+   regulator-min-microvolt = <70>;
+   regulator-max-microvolt = <110>;
+   re

[RFC PATCH v6 0/3] ACPI / EC: Tune the timing of EC events arrival during S3-exit

2017-09-28 Thread Lv Zheng
If EC events that occur during BIOS S3-exit and early OS S3-exit steps can
be detected by the OS earlier, then there can be fewer driver order issues
between acpi_ec_resume() and some other drivers' .resume() hooks (e.g.
acpi_button_resume()).

However there are known facts that EC FW does drop EC events during S3,
and it takes time for EC FW to initialize (maximum 1.4 observed) while
Windows acts normally, so detecting EC event earlier might just be a
workaround for other drivers (they should be aware of this order issue and
deal with it themselves). As such, this patchset is marked as an RFC.

If the Linux EC driver starts to detect events during early OS S3-exit, it
needs to poll EC events periodically during the noirq stages, as in these
stages there is no EC event triggering source.

This patchset implements earlier EC event handling for Linux.

Lv Zheng (3):
  ACPI / EC: Fix possible driver order issue by moving EC event handling
earlier
  ACPI / EC: Add event detection support for noirq stages
  ACPI / EC: Enable noirq stage event detection

 drivers/acpi/ec.c   | 128 +++-
 drivers/acpi/internal.h |   1 +
 2 files changed, 118 insertions(+), 11 deletions(-)

-- 
2.7.4



[RFC PATCH v6 1/3] ACPI / EC: Fix possible driver order issue by moving EC event handling earlier

2017-09-28 Thread Lv Zheng
This patch tries to detect EC events earlier after resume, so that if an
event occurred before invoking acpi_ec_unblock_transactions(), it could be
detected by acpi_ec_unblock_transactions() which is the earliest EC driver
call after resume.

However, after the noirq stage, if an event occurred after
acpi_ec_unblock_transactions() and before acpi_ec_resume(), there was no
means to detect and trigger it right then; it could only be detected and
handled after acpi_ec_resume().

Now the final logic is:
1. If ec_freeze_events=Y, event handling is stopped in acpi_ec_suspend(),
   restarted in acpi_ec_resume();
2. If ec_freeze_events=N, event handling is stopped in
   acpi_ec_block_transactions(), restarted in
   acpi_ec_unblock_transactions();
3. In order to handle the conflict between the edge-triggered nature of the
   EC IRQ and the Linux noirq stage, advance_transaction() is invoked where
   event handling is enabled and the noirq stage has ended.
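
The stop/restart points in items 1 and 2 above can be sketched as a small
userspace model. This is only an illustration, not driver code: the enum and
the helper names are hypothetical, and the single bool stands in for the
combined acpi_ec_no_sleep_events() condition used in the diff.

```c
#include <stdbool.h>

/* Hypothetical labels for the four EC suspend/resume entry points, in
 * the order they run during one suspend/resume cycle. */
enum ec_pm_step {
	EC_STEP_SUSPEND,	/* acpi_ec_suspend() */
	EC_STEP_BLOCK,		/* acpi_ec_block_transactions() */
	EC_STEP_UNBLOCK,	/* acpi_ec_unblock_transactions() */
	EC_STEP_RESUME,		/* acpi_ec_resume() */
};

/* Step in which event handling is stopped (items 1 and 2 above). */
enum ec_pm_step ec_event_stop_step(bool freeze_events)
{
	return freeze_events ? EC_STEP_SUSPEND : EC_STEP_BLOCK;
}

/* Step in which event handling is restarted (items 1 and 2 above). */
enum ec_pm_step ec_event_restart_step(bool freeze_events)
{
	return freeze_events ? EC_STEP_RESUME : EC_STEP_UNBLOCK;
}

/* True if events are handled once the given step has completed. */
bool ec_events_handled_after(enum ec_pm_step step, bool freeze_events)
{
	bool stopped = step >= ec_event_stop_step(freeze_events) &&
		       step < ec_event_restart_step(freeze_events);

	return !stopped;
}
```

With the flag set, the model reports events as unhandled from
acpi_ec_suspend() until acpi_ec_resume(); with it clear, only between the
block and unblock calls.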

Known issue:
1. An event that occurs between acpi_ec_unblock_transactions() and
   acpi_ec_resume() may still lead to the order issue. This can only be
   fixed by adding a periodic detection mechanism during the noirq stage.

Signed-off-by: Lv Zheng 
Tested-by: Tomislav Ivek 
Tested-by: Luya Tshimbalanga 
---
 drivers/acpi/ec.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index df84246..f1f320b 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -249,6 +249,11 @@ static bool acpi_ec_started(struct acpi_ec *ec)
   !test_bit(EC_FLAGS_STOPPED, &ec->flags);
 }
 
+static bool acpi_ec_no_sleep_events(void)
+{
+   return acpi_sleep_no_ec_events() && ec_freeze_events;
+}
+
 static bool acpi_ec_event_enabled(struct acpi_ec *ec)
 {
/*
@@ -260,14 +265,14 @@ static bool acpi_ec_event_enabled(struct acpi_ec *ec)
return false;
/*
 * However, disabling the event handling is experimental for late
-* stage (suspend), and is controlled by the boot parameter of
-* "ec_freeze_events":
+* stage (suspend), and is controlled by
+* "acpi_ec_no_sleep_events()":
 * 1. true:  The EC event handling is disabled before entering
 *   the noirq stage.
 * 2. false: The EC event handling is automatically disabled as
 *   soon as the EC driver is stopped.
 */
-   if (ec_freeze_events)
+   if (acpi_ec_no_sleep_events())
return acpi_ec_started(ec);
else
return test_bit(EC_FLAGS_STARTED, &ec->flags);
@@ -524,8 +529,8 @@ static bool acpi_ec_query_flushed(struct acpi_ec *ec)
 static void __acpi_ec_flush_event(struct acpi_ec *ec)
 {
/*
-* When ec_freeze_events is true, we need to flush events in
-* the proper position before entering the noirq stage.
+* When acpi_ec_no_sleep_events() is true, we need to flush events
+* in the proper position before entering the noirq stage.
 */
wait_event(ec->wait, acpi_ec_query_flushed(ec));
if (ec_query_wq)
@@ -948,7 +953,8 @@ static void acpi_ec_start(struct acpi_ec *ec, bool resuming)
if (!resuming) {
acpi_ec_submit_request(ec);
ec_dbg_ref(ec, "Increase driver");
-   }
+   } else if (!acpi_ec_no_sleep_events())
+   __acpi_ec_enable_event(ec);
ec_log_drv("EC started");
}
spin_unlock_irqrestore(&ec->lock, flags);
@@ -980,7 +986,7 @@ static void acpi_ec_stop(struct acpi_ec *ec, bool 
suspending)
if (!suspending) {
acpi_ec_complete_request(ec);
ec_dbg_ref(ec, "Decrease driver");
-   } else if (!ec_freeze_events)
+   } else if (!acpi_ec_no_sleep_events())
__acpi_ec_disable_event(ec);
clear_bit(EC_FLAGS_STARTED, &ec->flags);
clear_bit(EC_FLAGS_STOPPED, &ec->flags);
@@ -1910,7 +1916,7 @@ static int acpi_ec_suspend(struct device *dev)
struct acpi_ec *ec =
acpi_driver_data(to_acpi_device(dev));
 
-   if (acpi_sleep_no_ec_events() && ec_freeze_events)
+   if (acpi_ec_no_sleep_events())
acpi_ec_disable_event(ec);
return 0;
 }
@@ -1946,7 +1952,18 @@ static int acpi_ec_resume(struct device *dev)
struct acpi_ec *ec =
acpi_driver_data(to_acpi_device(dev));
 
-   acpi_ec_enable_event(ec);
+   if (acpi_ec_no_sleep_events())
+   acpi_ec_enable_event(ec);
+   else {
+   /*
+* Though whether there is an event pending has been
+* checked in acpi_ec_unblock_transactions() when
+* acpi_ec_no_sleep_events() is false, check it one more
+* time after noirq stage to detect events occurred after
+

[RFC PATCH v6 2/3] ACPI / EC: Add event detection support for noirq stages

2017-09-28 Thread Lv Zheng
This patch adds a timer to poll EC events:
1. between acpi_ec_suspend() and acpi_ec_block_transactions(),
2. between acpi_ec_unblock_transactions() and acpi_ec_resume().
During these periods, if an EC event occurred, we have no means to detect
it. Thus events occurring in late S3-entry could be dropped, and events
occurring in early S3-exit could be deferred to acpi_ec_resume().

This patch solves event losses in S3-entry and resume order in S3-exit by
periodically polling EC events during these periods.
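
A rough userspace model of the start/stop/tick behaviour described above,
for illustration only. The struct and function names are hypothetical; the
"polling" field stands in for the EC_FLAGS_GPE_POLLING bit and the deadline
field for the kernel timer. Unlike the driver, this model does not kick off
a poll immediately on start.

```c
#include <stdbool.h>

/* Hypothetical model of the poller state; not the kernel's struct acpi_ec. */
struct ec_poller {
	bool polling;			/* models EC_FLAGS_GPE_POLLING */
	unsigned long next_tick_ms;	/* models the timer expiry */
};

/* Start polling; returns true only if this call actually started it. */
bool ec_poller_start(struct ec_poller *p, unsigned long now_ms,
		     unsigned long interval_ms)
{
	if (p->polling)
		return false;
	p->polling = true;
	p->next_tick_ms = now_ms + interval_ms;
	return true;
}

/* Stop polling; returns true only if polling was active. */
bool ec_poller_stop(struct ec_poller *p)
{
	if (!p->polling)
		return false;
	p->polling = false;
	return true;
}

/* Returns true if a poll is due at now_ms and re-arms the next deadline. */
bool ec_poller_tick(struct ec_poller *p, unsigned long now_ms,
		    unsigned long interval_ms)
{
	if (!p->polling || now_ms < p->next_tick_ms)
		return false;
	p->next_tick_ms = now_ms + interval_ms;
	return true;
}
```

Making start and stop idempotent mirrors the test-and-set/test-and-clear
pattern in the diff, so repeated calls from the suspend/resume paths cannot
double-arm or double-cancel the timer.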

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196129 [#1]
Signed-off-by: Lv Zheng 
Tested-by: Tomislav Ivek 
---
 drivers/acpi/ec.c   | 93 +++--
 drivers/acpi/internal.h |  1 +
 2 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index f1f320b..389c499 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "internal.h"
@@ -102,6 +103,7 @@ enum ec_command {
 #define ACPI_EC_CLEAR_MAX  100 /* Maximum number of events to query
 * when trying to clear the EC */
 #define ACPI_EC_MAX_QUERIES16  /* Maximum number of parallel queries */
+#define ACPI_EC_EVENT_INTERVAL 500 /* Detecting event every 500ms */
 
 enum {
EC_FLAGS_QUERY_ENABLED, /* Query is enabled */
@@ -113,6 +115,7 @@ enum {
EC_FLAGS_STARTED,   /* Driver is started */
EC_FLAGS_STOPPED,   /* Driver is stopped */
EC_FLAGS_GPE_MASKED,/* GPE masked */
+   EC_FLAGS_GPE_POLLING,   /* GPE polling is enabled */
 };
 
 #define ACPI_EC_COMMAND_POLL   0x01 /* Available for command byte */
@@ -154,6 +157,15 @@ static bool ec_no_wakeup __read_mostly;
 module_param(ec_no_wakeup, bool, 0644);
 MODULE_PARM_DESC(ec_no_wakeup, "Do not wake up from suspend-to-idle");
 
+static bool ec_detect_noirq_events __read_mostly;
+module_param(ec_detect_noirq_events, bool, 0644);
+MODULE_PARM_DESC(ec_detect_noirq_events, "Enabling event detection during 
noirq stage");
+
+static unsigned int
+ec_detect_noirq_interval __read_mostly = ACPI_EC_EVENT_INTERVAL;
+module_param(ec_detect_noirq_interval, uint, 0644);
+MODULE_PARM_DESC(ec_detect_noirq_interval, "Event detection interval(ms) 
during noirq stage");
+
 struct acpi_ec_query_handler {
struct list_head node;
acpi_ec_query_func func;
@@ -358,6 +370,48 @@ static inline bool acpi_ec_is_gpe_raised(struct acpi_ec 
*ec)
return (gpe_status & ACPI_EVENT_FLAG_STATUS_SET) ? true : false;
 }
 
+static void acpi_ec_gpe_tick(struct acpi_ec *ec)
+{
+   mod_timer(&ec->timer,
+ jiffies + msecs_to_jiffies(ec_detect_noirq_interval));
+}
+
+static void ec_start_gpe_poller(struct acpi_ec *ec)
+{
+   unsigned long flags;
+   bool start_tick = false;
+
+   if (!acpi_ec_no_sleep_events() || !ec_detect_noirq_events)
+   return;
+   spin_lock_irqsave(&ec->lock, flags);
+   if (!test_and_set_bit(EC_FLAGS_GPE_POLLING, &ec->flags)) {
+   ec_log_drv("GPE poller started");
+   start_tick = true;
+   /* kick off GPE polling without delay */
+   advance_transaction(ec);
+   }
+   spin_unlock_irqrestore(&ec->lock, flags);
+   if (start_tick)
+   acpi_ec_gpe_tick(ec);
+}
+
+static void ec_stop_gpe_poller(struct acpi_ec *ec)
+{
+   unsigned long flags;
+   bool stop_tick = false;
+
+   if (!acpi_ec_no_sleep_events() || !ec_detect_noirq_events)
+   return;
+   spin_lock_irqsave(&ec->lock, flags);
+   if (test_and_clear_bit(EC_FLAGS_GPE_POLLING, &ec->flags))
+   stop_tick = true;
+   spin_unlock_irqrestore(&ec->lock, flags);
+   if (stop_tick) {
+   del_timer_sync(&ec->timer);
+   ec_log_drv("GPE poller stopped");
+   }
+}
+
 static inline void acpi_ec_enable_gpe(struct acpi_ec *ec, bool open)
 {
if (open)
@@ -1017,6 +1071,12 @@ static void acpi_ec_leave_noirq(struct acpi_ec *ec)
spin_unlock_irqrestore(&ec->lock, flags);
 }
 
+/*
+ * Note: this API is prepared for tuning the order of the ACPI
+ * suspend/resume steps as the last entry of EC during suspend, thus it
+ * must be invoked after acpi_ec_suspend() or everything should be done in
+ * acpi_ec_suspend().
+ */
 void acpi_ec_block_transactions(void)
 {
struct acpi_ec *ec = first_ec;
@@ -1028,16 +1088,28 @@ void acpi_ec_block_transactions(void)
/* Prevent transactions from being carried out */
acpi_ec_stop(ec, true);
mutex_unlock(&ec->mutex);
+   ec_stop_gpe_poller(ec);
 }
 
+/*
+ * Note: this API is prepared for tuning the order of the ACPI
+ * suspend/resume steps as the first entry of EC during resume, thus it
+ * must be invoked before acpi_ec_resume() or everything should be done in
+ * acpi_ec_

[RFC PATCH v6 3/3] ACPI / EC: Enable noirq stage event detection

2017-09-28 Thread Lv Zheng
This patch enables noirq stage event detection for the EC driver.

EC is a very special driver, required to detect events throughout the
entire suspend/resume process. Thus this patch enables event detection for
EC during noirq stages to meet this requirement. This is done by making
sure that the EC sleep APIs:
  acpi_ec_block_transactions()
  acpi_ec_unblock_transactions()
rather than the EC driver suspend/resume hooks:
  acpi_ec_suspend()
  acpi_ec_resume()
are the boundary of the EC event handling during suspend/resume, so that
the ACPI sleep core can tune their invocation timing to handle special BIOS
requirements.

If this commit is bisected to be a regression culprit, please report this
to bugzilla.kernel.org for further investigation.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196129
Signed-off-by: Lv Zheng 
Tested-by: Tomislav Ivek 
---
 drivers/acpi/ec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 389c499..a48a2b3 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -157,7 +157,7 @@ static bool ec_no_wakeup __read_mostly;
 module_param(ec_no_wakeup, bool, 0644);
 MODULE_PARM_DESC(ec_no_wakeup, "Do not wake up from suspend-to-idle");
 
-static bool ec_detect_noirq_events __read_mostly;
+static bool ec_detect_noirq_events __read_mostly = true;
 module_param(ec_detect_noirq_events, bool, 0644);
 MODULE_PARM_DESC(ec_detect_noirq_events, "Enabling event detection during 
noirq stage");
 
-- 
2.7.4



Re: [kbuild-all] [0-Day CI notification] 0-Day kernel test service will be shut down from Sep 30 3PM to Oct 5

2017-09-28 Thread Fengguang Wu

CC LKML. Sorry it's a site level power down during the 10.1 holidays. :(

On Fri, Sep 29, 2017 at 10:12:20AM +0800, Philip Li wrote:

Hi all, this is Philip who maintains the 0-Day kernel test service. Thanks for
subscribing to 0-Day kernel testing. We will have lab power down from Oct 1
to Oct 5, so that the service will be shut down from Asia Pacific Time Sep 30 3PM
and will recover from Oct 6 as soon as we can. Sorry for any inconvenience caused
due to the service shut down.

Thanks
___
kbuild-all mailing list
kbuild-...@lists.01.org
https://lists.01.org/mailman/listinfo/kbuild-all


Re: [lkp-robot] [mac80211] 31e9170bde: hwsim.sta_dynamic_down_up.fail

2017-09-28 Thread Xiang Gao
Thanks, I will look into it.
Xiang Gao


2017-09-28 4:06 GMT-04:00 kernel test robot :
>
> FYI, we noticed the following commit:
>
> commit: 31e9170bdeb6ebe66426337b4e2b9924683a412b ("mac80211: aead api to 
> reduce redundancy")
> url: 
> https://github.com/0day-ci/linux/commits/Xiang-Gao/mac80211-aead-api-to-reduce-redundancy/20170926-053110
> base: https://git.kernel.org/cgit/linux/kernel/git/jberg/mac80211-next.git 
> master
>
> in testcase: hwsim
> with following parameters:
>
> group: hwsim-10
>
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 2G
>
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
>
>
> 2017-09-27 16:04:27 ./run-tests.py sta_dynamic_down_up
> DEV: wlan0: 02:00:00:00:00:00
> DEV: wlan1: 02:00:00:00:01:00
> DEV: wlan2: 02:00:00:00:02:00
> APDEV: wlan3
> APDEV: wlan4
> START sta_dynamic_down_up 1/1
> Test: Dynamically added wpa_supplicant interface down/up
> Starting AP wlan3
> Create a dynamic wpa_supplicant interface and connect
> Connect STA wlan5 to AP
> dev1->dev2 unicast data delivery failed
> Traceback (most recent call last):
>   File "./run-tests.py", line 453, in main
> t(dev, apdev)
>   File "/lkp/benchmarks/hwsim/tests/hwsim/test_sta_dynamic.py", line 122, in 
> test_sta_dynamic_down_up
> hwsim_utils.test_connectivity(wpas, hapd)
>   File "/lkp/benchmarks/hwsim/tests/hwsim/hwsim_utils.py", line 165, in 
> test_connectivity
> raise Exception(last_err)
> Exception: dev1->dev2 unicast data delivery failed
> FAIL sta_dynamic_down_up 5.397413 2017-09-27 16:04:32.540689
> passed 0 test case(s)
> skipped 0 test case(s)
> failed tests: sta_dynamic_down_up
>
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k  job-script  # job-script is attached in 
> this email
>
>
>
> Thanks,
> Xiaolong


Re: [PATCH v6 2/2] tracing: Add support for preempt and irq enable/disable events

2017-09-28 Thread Joel Fernandes
Hi Steven, Peter,

I'm planning to make the following changes for the next rev, could you
let me know if you're Ok with it?

1. Drop the stop_critical_timings changes - previous patch was
generating the preempt_enable/disable events but they aren't "real"
events. Instead since we already have cpuidle trace events, we can
just rely on those for now to understand how much time was spent in
idle. A future patch could do something smarter.

2. Drop the recursion protection from trace_preempt_enable/disable.
The trace_preempt_enable/disable calls don't nest, so there's no need
to protect it with a per-cpu variable.

3. trace_irq_enable/disable on the other hand are called in this way,
so I'll add some comments about why per-cpu variable is needed.

thanks a lot,

- Joel

On Mon, Sep 25, 2017 at 3:57 PM, Joel Fernandes  wrote:
> On Mon, Sep 25, 2017 at 3:52 AM, Steven Rostedt  wrote:
>> On Mon, 25 Sep 2017 12:32:23 +0200
>> Peter Zijlstra  wrote:
>>
>>
>>> > You mean you want to trace all calls to preempt and irq off even if
>>> > preempt and irqs are already off?
>>>
>>> Sure, why not? This stuff naturally nests, and who is to say its not a
>>> useful thing to trace all of them?
>>>
>>> By also tracing the nested sections you can, for instance, see how much
>>> you'd really win by getting rid of the outer one. If, for instance, the
>>> outer only accounts for 1% of the time, while the inner ones are
>>> interlinked and span the other 99%, there's more work to do than if that
>>> were not the case.
>>
>> If we do this we need a field to record if the preemption or irqs were
>> toggled by that call. Something that filters could easily be added to
>> only show what this patch set has.
>
> I request that we please not do this for my patchset, there are a
> number of reasons in my mind:
>
> 1. trace_preempt_off in existing code is only called the first time
> preempt is turned off. This is the definition of "preempt off", its
> the first time Preemption is actually turned off, and has nothing much
> to do with going into a deeper preempt count. Whether the count
> increases or not, preempt is already off and that's confirmed by the
> first preempt off event.
>
> This is how I read it in the comments in sched/core.c as well:
> "
>  * If the value passed in is equal to the current preempt count
>  * then we just disabled preemption."
>
> This is how I based this patchset as well; again, it's not my use case
> and it can be a future patch if it's useful to track this.
>
> 2. This stuff is already a bit trace heavy, I prefer not to generate
> event every time the preempt count increases. Of course there are filters,
> but then we still have the filtering overhead for a use case that I am not
> really targeting with this patchset.
>
> 3. It will complicate the patch more, apart from adding filters as
> Steven suggested, it would also mean we change how
> preempt_latency_start in sched/core.c works.
>
> Do you mind if we please keep this as a 'future' usecase for a future
> patch? Its not my usecase at all for this patchset and not what I was
> intending.
>
> I will reply to Peter's other comments on the other email shortly.
>
> thanks!
>
> - Joel
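
For reference, the "first disable only" behaviour described in point 1 of
the quoted message can be modelled in a few lines of userspace C. This is a
sketch under assumptions: the struct and function names are hypothetical,
and plain counters stand in for the real tracepoints.

```c
/* Hypothetical model: counters stand in for emitted trace events. */
struct preempt_model {
	int count;	/* nesting depth, like the preempt count */
	int off_events;	/* "preempt off" events emitted */
	int on_events;	/* "preempt on" events emitted */
};

/* Emit an "off" event only on the outermost disable. */
void model_preempt_disable(struct preempt_model *m)
{
	if (m->count++ == 0)
		m->off_events++;
}

/* Emit an "on" event only when the outermost section ends. */
void model_preempt_enable(struct preempt_model *m)
{
	if (m->count > 0 && --m->count == 0)
		m->on_events++;
}
```

Nested disable/enable pairs change only the depth here; tracing every
nested call, as suggested earlier in the thread, would instead emit an
event per call plus a field saying whether the state actually toggled.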


[0-Day CI notification] 0-Day kernel test service will be shut down from Sep 30 3PM to Oct 5

2017-09-28 Thread Philip Li
Hi all, this is Philip who maintains the 0-Day kernel test service. Thanks for
subscribing to 0-Day kernel testing. We will have lab power down from Oct 1
to Oct 5, so that the service will be shut down from Asia Pacific Time Sep 30 3PM
and will recover from Oct 6 as soon as we can. Sorry for any inconvenience caused
due to the service shut down.

Thanks


Re: linux-next: build failure after merge of the net-next tree

2017-09-28 Thread Florian Fainelli
Le 09/28/17 à 18:36, Stephen Rothwell a écrit :
> Hi all,
> 
> After merging the net-next tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> net/dsa/slave.c: In function 'dsa_slave_create':
> net/dsa/slave.c:1191:18: error: 'struct dsa_slave_priv' has no member named 
> 'phy'
>   phy_disconnect(p->phy);
>   ^
> 
> Caused by commit
> 
>   0115dcd1787d ("net: dsa: use slave device phydev")
> 
> Interacting with commit
> 
>   e804441cfe0b ("net: dsa: Fix network device registration order")
> 
> from the net tree.
> 
> I applied the following merge fix patch (which I am not sure about):

Your resolution looks fine to me, thanks Stephen!

> 
> From: Stephen Rothwell 
> Date: Fri, 29 Sep 2017 11:28:45 +1000
> Subject: [PATCH] net: dsa: merge fix patch for removal of phy
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  net/dsa/slave.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 8869954485db..9191c929c6c8 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -1188,7 +1188,7 @@ int dsa_slave_create(struct dsa_port *port, const char 
> *name)
>   return 0;
>  
>  out_phy:
> - phy_disconnect(p->phy);
> + phy_disconnect(slave_dev->phydev);
>   if (of_phy_is_fixed_link(p->dp->dn))
>   of_phy_deregister_fixed_link(p->dp->dn);
>  out_free:
> 


-- 
Florian


Re: [PATCH for-next 2/9] RDMA/hns: Factor out the code for checking sdb status into a new function

2017-09-28 Thread Wei Hu (Xavier)



On 2017/9/28 21:50, Leon Romanovsky wrote:

On Thu, Sep 28, 2017 at 12:57:27PM +0800, Wei Hu (Xavier) wrote:

From: Lijun Ou 

It mainly places the lines for checking send doorbell status
into a special functions. As a result, we can directly call it in
check_qp_db_process_status function and keep consistent indenting
style.

It fixes: 5f110ac4bed8 ("IB/hns: Fix for checkpatch.pl comment style)

You forgot " at the end of the line, and there is need to put fixes
(should be Fixes) in the line before Signed-off-by.

Thanks, Leon
We will modify the statement (Fixes: xx) and put it before Signed-off-by
in patch v2.



The warning from static checker:
drivers/infiniband/hw/hns/hns_roce_hw_v1.c:3562 check_qp_db_process_status()
warn: inconsistent indenting

Signed-off-by: Lijun Ou 
Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Shaobo Xu 
---
  drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 95 --
  1 file changed, 51 insertions(+), 44 deletions(-)






Re: [PATCH v3 0/4] Update TMDSEVM3530 support for omap3-evm

2017-09-28 Thread Derald D. Woods
On Tue, Sep 12, 2017 at 06:48:20PM -0500, Derald D. Woods wrote:
> This patch set allows TMDSEVM3530(omap3-evm.dts) to boot using common
> processor module data that is shared with 'omap3-evm-37xx.dts'. A new
> common file for processor module data is introduced to help facilitate
> the updated OMAP3530 support.
> 


Are there any further concerns after v3?

Derald


> Changes in v3
> -
> - Drop unnecessary compatible string change to Sharp LCD panel
> 
> Changes in v2
> -
> - Pull in change from linux-next
>   (ARM: dts: omap*: Replace deprecated "vmmc_aux" with "vqmmc")
> - Add compatible and supply fix for LCD panel
> - Add supply references for DSS
> - Add "Signed-off-by" for each patch
> 
> Derald D. Woods (4):
>   ARM: dts: omap3-evm-37xx: Add common processor module support
>   ARM: dts: omap3-evm: Add OMAP3530 specific device tree processor data
>   ARM: dts: omap3: Add Sharp LS037V7DW01 'envdd' supply
>   ARM: dts: omap3-evm: Add DSS {vdds_dsi,vdda_video}-supply references
> 
>  arch/arm/boot/dts/omap3-evm-37xx.dts   | 209 +---
>  arch/arm/boot/dts/omap3-evm-processor-common.dtsi  | 216 
> +
>  arch/arm/boot/dts/omap3-evm.dts|  76 +++-
>  .../boot/dts/omap3-panel-sharp-ls037v7dw01.dtsi|   1 +
>  4 files changed, 290 insertions(+), 212 deletions(-)
>  create mode 100644 arch/arm/boot/dts/omap3-evm-processor-common.dtsi
> 
> -- 
> 2.14.1
> 


RE: [PATCH] extcon: Split out extcon header file for consumer and provider device

2017-09-28 Thread Yoshihiro Shimoda
Hi,

> From: Chanwoo Choi
> Sent: Friday, September 29, 2017 9:02 AM
> 
< snip >
>  drivers/phy/renesas/phy-rcar-gen3-usb2.c  |   2 +-
< snip >
>  drivers/usb/gadget/udc/renesas_usb3.c |   2 +-

These two drivers need the modification.
But...

< snip >
> diff --git a/drivers/usb/renesas_usbhs/common.h b/drivers/usb/renesas_usbhs/common.h
> index 8c5fc12ad778..a78764bc23eb 100644
> --- a/drivers/usb/renesas_usbhs/common.h
> +++ b/drivers/usb/renesas_usbhs/common.h
> @@ -17,7 +17,7 @@
>  #ifndef RENESAS_USB_DRIVER_H
>  #define RENESAS_USB_DRIVER_H
> 
> -#include 
> +#include 

Since this driver doesn't use any extcon-provider APIs for now,
we don't need the modification, IIUC.

Best regards,
Yoshihiro Shimoda



Re: [PATCH 01/12] mmc: dt-bindings: update Mediatek MMC bindings

2017-09-28 Thread Chaotian Jing
On Wed, 2017-09-27 at 09:18 +0800, Chaotian Jing wrote:
> On Wed, 2017-09-27 at 00:33 +0200, Ulf Hansson wrote:
> > > On 14 September 2017 at 04:10, Chaotian Jing wrote:
> > > On Wed, 2017-09-13 at 09:10 -0500, Rob Herring wrote:
> > >> On Tue, Sep 12, 2017 at 05:07:41PM +0800, Chaotian Jing wrote:
> > > >> > Change the compatible for support of multi-platform
> > >> > Add description for reg
> > >> > Add description for source_cg
> > >> > Add description for mediatek,latch-ck
> > >>
> > >> This is at least the 3rd patch with exactly the same vague subject.
> > >> Please make the subject somewhat unique.
> > >>
> > > Thx, will change the subject at next version
> > >> >
> > >> > Signed-off-by: Chaotian Jing 
> > >> > ---
> > >> >  Documentation/devicetree/bindings/mmc/mtk-sd.txt | 13 ++---
> > >> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >> >
> > > >> > diff --git a/Documentation/devicetree/bindings/mmc/mtk-sd.txt b/Documentation/devicetree/bindings/mmc/mtk-sd.txt
> > >> > index 4182ea3..405cd06 100644
> > >> > --- a/Documentation/devicetree/bindings/mmc/mtk-sd.txt
> > >> > +++ b/Documentation/devicetree/bindings/mmc/mtk-sd.txt
> > >> > @@ -7,10 +7,15 @@ This file documents differences between the core 
> > >> > properties in mmc.txt
> > >> >  and the properties used by the msdc driver.
> > >> >
> > >> >  Required properties:
> > >> > -- compatible: Should be "mediatek,mt8173-mmc","mediatek,mt8135-mmc"
> > >> > +- compatible: value should be either of the following.
> > >> > +   "mediatek,mt8135-mmc": for mmc host ip compatible with mt8135
> > >> > +   "mediatek,mt8173-mmc": for mmc host ip compatible with mt8173
> > >> > +   "mediatek,mt2701-mmc": for mmc host ip compatible with mt2701
> > >> > +   "mediatek,mt2712-mmc": for mmc host ip compatible with mt2712
> > >> > +- reg: physical base address of the controller and length
> > >> >  - interrupts: Should contain MSDC interrupt number
> > >> > -- clocks: MSDC source clock, HCLK
> > >> > -- clock-names: "source", "hclk"
> > >> > +- clocks: MSDC source clock, HCLK, source_cg
> > >> > +- clock-names: "source", "hclk", "source_cg"
> > >>
> > >> All chips support source_cg? That's not backwards compatible for
> > >> existing compatible strings if the driver requires it.
> > > > Not all chips support source_cg. For chips which do not support
> > > > source_cg, there is no need for source_cg here, and the driver will
> > > > parse it to know whether the current chip supports it.
> > 
> > > In such case you must not add a required binding for it. I think
> > that is what Rob is trying to point out for you.
> > 
> > [...]
> > 
> > Kind regards
> > Uffe
> The source_cg is required (MUST) on MT2712 and future SoCs, but not
> required (they do not have it) on previous SoCs, so it was put under the
> required properties, leaving the driver to handle it.

Any other comments about it? Must it still not be added as a required
binding? If it is added as an optional binding, how should it be added,
given that "clocks" & "clock-names" cannot be duplicated in one node?




Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively

2017-09-28 Thread Mimi Zohar
On Thu, 2017-09-28 at 17:33 -0700, Linus Torvalds wrote:
> On Thu, Sep 28, 2017 at 5:12 PM, Mimi Zohar  wrote:
> >
> > Originally IMA did define its own lock, prior to IMA-appraisal.  IMA-
> > appraisal introduced writing the file hash as an xattr, which required
> > taking the i_mutex.  process_measurement() and ima_file_free() took
> > the iint->mutex first and then the i_mutex, while setxattr, chmod and
> > chown took the locks in reverse order.  To resolve the potential
> > deadlock, the iint->mutex was eliminated.
> 
> Umm. You already have an explicit invalidation model, where you
> invalidate after a write has occurred.

Invalidating after each write would be horrible for performance.  Only
after all the changes are made, after the file close, is the file
integrity status invalidated and the file hash re-calculated and
written out.

At some point, we might want to go back and look at having finer grain
file integrity invalidation.

> But the locking of the generation count (or "invalidation status" or
> whatever) can - and should be - entirely independent of the locking of
> the actual appraisal.

The locking issue isn't with validating the file hash, but with the
setxattr, chmod, chown syscalls.  Each of these syscalls takes the
i_rwsem exclusively before IMA (or EVM) is called.

In ima_file_free(), the locking would be:

lock: iint->mutex
lock: i_rwsem
write hash as xattr
unlock: i_rwsem
unlock: iint->mutex


In setxattr, chmod, chown syscalls, IMA (and EVM) are called after the
i_rwsem is already taken.  So the locking would be:

lock: i_rwsem
lock: iint->mutex

unlock: iint->mutex
unlock: i_rwsem

Perhaps now the problem is clearer?
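
For anyone following along, the inversion can be modelled in a few lines of userspace C. This is only a sketch, not IMA code: the lock identifiers and the record_path() helper are hypothetical. It records which lock was taken while another was held, and flags a later path that takes the same two locks the other way round.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two locks discussed above. */
enum lock_id { IINT_MUTEX, I_RWSEM, NR_LOCKS };

/* held_before[a][b] is true once some path took b while holding a. */
static bool held_before[NR_LOCKS][NR_LOCKS];

/*
 * Record the order in which one code path acquires its locks.
 * Returns true if the path takes two locks in the opposite order
 * to an earlier path, i.e. a potential ABBA deadlock.
 */
static bool record_path(const enum lock_id *seq, size_t n)
{
	bool inversion = false;

	for (size_t i = 0; i < n; i++) {
		assert(seq[i] < NR_LOCKS);
		for (size_t j = i + 1; j < n; j++) {
			if (held_before[seq[j]][seq[i]])
				inversion = true;
			held_before[seq[i]][seq[j]] = true;
		}
	}
	return inversion;
}

/* ima_file_free(): iint->mutex first, then i_rwsem. */
static const enum lock_id file_free_path[] = { IINT_MUTEX, I_RWSEM };

/* setxattr/chmod/chown: i_rwsem is already held when IMA is called. */
static const enum lock_id setxattr_path[] = { I_RWSEM, IINT_MUTEX };
```

Feeding file_free_path and then setxattr_path to record_path() reports the inversion on the second call, which is the potential deadlock described above.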

Mimi
 

> So make the appraisal itself use a semaphore ("only one appraisal at a time").
> 
> But use a separate lock for the generation count.
> So then appraisal is:
> 
>  - get appraisal semaphore
>   - get generation count lock
> read generation count
>   - drop generation count lock
>   - do the actual appraisal
>  - drop appraisal semaphore
> 
> Note that you now have a tuple of "generation count, appraisal" that
> you have *not* saved off yet, but it's your stable thing.
> 
> Now you can write the xattr:
> 
>   - get exclusive inode lock (for xattr)
>   - get generation count lock
>   - if the appraisal generation does not match, do NOT write
> the appraisal you just calculated, since it's pointless: it's already
> stale.
>   - otherwise write the appraisal and generation count to the xattr
>   - drop generation count lock
>   - release exclusive inode lock
> 
> and then for anything that does setxattr or chmod or whatever, just
> use that generation count lock to invalidate the appraisal. You don't
> need the actual appraisal lock for that.
> 
> So now the appraisal lock is always the outermost one, and the
> generation count lock is always the innermost.
> 
> Anyway, I haven't looked at the details of what IMA does, but
> something like the above really sounds like it should work and seems
> pretty straightforward.
> 
> No?
> 
>Linus
> 
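
A minimal single-threaded sketch of the generation-count scheme quoted above may help. All names here are hypothetical and nothing is taken from the IMA code; the locks themselves are left out, with comments marking where each would be held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-inode state; none of these names come from IMA. */
struct appraisal_state {
	uint64_t generation;   /* bumped by setxattr/chmod/chown/write */
	uint64_t stored_gen;   /* generation recorded alongside the xattr */
	uint32_t stored_hash;  /* stand-in for the xattr contents */
	bool xattr_valid;
};

/* Invalidation only needs the (innermost) generation count lock. */
static void invalidate(struct appraisal_state *st)
{
	st->generation++;
}

/* Under the appraisal semaphore: snapshot the generation, then appraise. */
static uint64_t begin_appraisal(const struct appraisal_state *st)
{
	return st->generation;
}

/*
 * Under the exclusive inode lock plus the generation count lock:
 * write the result only if nothing invalidated it in the meantime.
 * Returns true if the xattr was written.
 */
static bool commit_appraisal(struct appraisal_state *st, uint64_t snap_gen,
			     uint32_t hash)
{
	assert(snap_gen <= st->generation);
	if (snap_gen != st->generation)
		return false;   /* stale: skip the pointless write */
	st->stored_gen = snap_gen;
	st->stored_hash = hash;
	st->xattr_valid = true;
	return true;
}
```

An invalidate() between begin_appraisal() and commit_appraisal() makes the commit return false, so a stale hash is never written.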



Re: [PATCH v3] mm, sysctl: make NUMA stats configurable

2017-09-28 Thread kemi


On 2017年09月29日 05:29, Andrew Morton wrote:
> On Thu, 28 Sep 2017 14:11:41 +0800 Kemi Wang  wrote:
> 
>> This is the second step, which introduces a tunable interface that allows
>> numa stats to be configured for optimizing zone_statistics(), as suggested
>> by Dave Hansen and Ying Huang.
> 
> Looks OK I guess.
> 
> I fiddled with it a lot.  Please consider:
> 

Thanks for your help in making it more graceful! I will be more careful next time.
There may be a typo in Documentation/sysctl/vm.txt; see the comment below.

> From: Andrew Morton 
> Subject: mm-sysctl-make-numa-stats-configurable-fix
> 
> - tweak documentation
> 
> - move advisory message from start_kernel() into mm_init() (I'm not sure
>   we really need this message)
> 
> - use strcasecmp() in __parse_vm_numa_stats_mode()
> 
> - clean up coding style and messages in sysctl_vm_numa_stats_mode_handler()
> 
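
On the strcasecmp() item: like strcmp(), it returns 0 on a match, so each branch has to compare the result against 0. A small userspace sketch follows, in which the enum and the parse_mode() name are hypothetical and only the three mode strings come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Hypothetical mode values, for illustration only. */
enum numa_stats_mode { MODE_AUTO, MODE_STRICT, MODE_COARSE, MODE_INVALID };

static enum numa_stats_mode parse_mode(const char *s)
{
	assert(s != NULL);

	/* strcasecmp() returns 0 when the strings match, ignoring case. */
	if (strcasecmp(s, "auto") == 0)
		return MODE_AUTO;
	if (strcasecmp(s, "strict") == 0)
		return MODE_STRICT;
	if (strcasecmp(s, "coarse") == 0)
		return MODE_COARSE;
	return MODE_INVALID;
}
```

Note that this accepts any capitalisation ("AUTO", "aUtO"), which is broader than the [A|a]uto form the documentation describes.
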
> Cc: Aaron Lu 
> Cc: Andi Kleen 
> Cc: Christopher Lameter 
> Cc: Dave Hansen 
> Cc: Jesper Dangaard Brouer 
> Cc: Johannes Weiner 
> Cc: Jonathan Corbet 
> Cc: Kees Cook 
> Cc: Kemi Wang 
> Cc: "Luis R . Rodriguez" 
> Cc: Mel Gorman 
> Cc: Michal Hocko 
> Cc: Sebastian Andrzej Siewior 
> Cc: Tim Chen 
> Cc: Vlastimil Babka 
> Cc: Ying Huang 
> Signed-off-by: Andrew Morton 
> ---
> 
>  Documentation/sysctl/vm.txt |   15 ++---
>  init/main.c |6 ++---
>  mm/vmstat.c |   39 +++---
>  3 files changed, 29 insertions(+), 31 deletions(-)
> 
> diff -puN Documentation/sysctl/vm.txt~mm-sysctl-make-numa-stats-configurable-fix Documentation/sysctl/vm.txt
> --- a/Documentation/sysctl/vm.txt~mm-sysctl-make-numa-stats-configurable-fix
> +++ a/Documentation/sysctl/vm.txt
> @@ -853,7 +853,7 @@ ten times more freeable objects than the
>  
>  numa_stats_mode
>  
> -This interface allows numa statistics configurable.
> +This interface allows runtime configuration *or* numa statistics.
>  

typo? or->of/for?

>  When page allocation performance becomes a bottleneck and you can tolerate
>  some possible tool breakage and decreased numa counter precision, you can
> @@ -864,13 +864,14 @@ When page allocation performance is not
>  tooling to work, you can do:
>   echo [S|s]trict > /proc/sys/vm/numa_stat_mode
>  
> -We recommend automatic detection of numa statistics by system, because numa
> -statistics does not affect system's decision and it is very rarely
> -consumed. you can do:
> +We recommend automatic detection of numa statistics by system, because
> +numa statistics do not affect system decisions and it is very rarely
> +consumed.  In this case you can do:
>   echo [A|a]uto > /proc/sys/vm/numa_stats_mode
> -This is also system default configuration, with this default setting, numa
> -counters update is skipped unless the counter is *read* by users at least
> -once.
> +
> +This is the system default configuration.  With this default setting, numa
> +counter updates are skipped until the counter is *read* by userspace at
> +least once.
>  
>  ==
>  
> diff -puN drivers/base/node.c~mm-sysctl-make-numa-stats-configurable-fix drivers/base/node.c
> diff -puN include/linux/vmstat.h~mm-sysctl-make-numa-stats-configurable-fix include/linux/vmstat.h
> diff -puN init/main.c~mm-sysctl-make-numa-stats-configurable-fix init/main.c
> --- a/init/main.c~mm-sysctl-make-numa-stats-configurable-fix
> +++ a/init/main.c
> @@ -504,6 +504,9 @@ static void __init mm_init(void)
>   pgtable_init();
>   vmalloc_init();
>   ioremap_huge_init();
> +#ifdef CONFIG_NUMA
> + pr_info("vmstat: NUMA stat updates are skipped unless they have been used\n");
> +#endif
>  }
>  
>  asmlinkage __visible void __init start_kernel(void)
> @@ -567,9 +570,6 @@ asmlinkage __visible void __init start_k
>   sort_main_extable();
>   trap_init();
>   mm_init();
> -#ifdef CONFIG_NUMA
> - pr_info("vmstat: NUMA stats is skipped unless it has been consumed\n");
> -#endif
>  
>   ftrace_init();
>  
> diff -puN kernel/sysctl.c~mm-sysctl-make-numa-stats-configurable-fix kernel/sysctl.c
> diff -puN mm/page_alloc.c~mm-sysctl-make-numa-stats-configurable-fix mm/page_alloc.c
> diff -puN mm/vmstat.c~mm-sysctl-make-numa-stats-configurable-fix mm/vmstat.c
> --- a/mm/vmstat.c~mm-sysctl-make-numa-stats-configurable-fix
> +++ a/mm/vmstat.c
> @@ -40,13 +40,11 @@ static DEFINE_MUTEX(vm_numa_stats_mode_l
>  
>  static int __parse_vm_numa_stats_mode(char *s)
>  {
> - const char *str = s;
> -
> - if (strcmp(str, "auto") == 0 || strcmp(str, "Auto") == 0)
> + if (strcasecmp(s, "auto") == 0)
>   vm_numa_stats_mode = VM_NUMA_STAT_AUTO_MODE;
> - else if (strcmp(str, "strict") == 0 || strcmp(str, "Strict") == 0)
> + else if (strcasecmp(s, "strict") == 0)
>   vm_numa_stats_mode = VM_NUMA_STAT_STRICT_MODE;
> - else if (strcmp(str, "coarse") == 0 || strcmp(str, "Coarse") == 0)
> + else if (strcasecmp(s, "co

Re: [PATCH] x86/asm: Fix inline asm call constraints for GCC 4.4

2017-09-28 Thread Josh Poimboeuf
On Thu, Sep 28, 2017 at 04:53:09PM -0700, Linus Torvalds wrote:
> On Thu, Sep 28, 2017 at 2:58 PM, Josh Poimboeuf  wrote:
> >
> > Reported-by: kernel test robot 
> > Fixes: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
> > Signed-off-by: Josh Poimboeuf 
> 
> Side note: it's not like I personally need the credit, but in general
> I really want people to pick up on who debugged the code and pointed
> to the solution. That's often more of the work than the fix itself.
> 
> The kernel test robot report looked to be ignored as a "gcc-4.4 is too
> old to worry about" thing. People who then step up and analyze the
> problem are rare as it is. They need to be credited in the commit
> logs.
> 
> We don't have any fixed format for that, but it's pretty free-form. So
> we have tags like
> 
>   Root-caused-by:
>   Diagnosed-by:
>   Analyzed-by:
>   Debugged-by:
>   Bisected-by:
>   Fix-suggested-by:
> 
> etc for giving credit to people who figured out some part of a bug
> (and, having grepped for this, we also have a _shitload_ of miss-spellings
> of various things ;)

Indeed, credit is important and I try to give it where it's due.  Sorry
for the snub!  I anoint you with:

Debugged-by: Linus Torvalds 

-- 
Josh


linux-next: build failure after merge of the net-next tree

2017-09-28 Thread Stephen Rothwell
Hi all,

After merging the net-next tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

net/dsa/slave.c: In function 'dsa_slave_create':
net/dsa/slave.c:1191:18: error: 'struct dsa_slave_priv' has no member named 'phy'
  phy_disconnect(p->phy);
  ^

Caused by commit

  0115dcd1787d ("net: dsa: use slave device phydev")

Interacting with commit

  e804441cfe0b ("net: dsa: Fix network device registration order")

from the net tree.

I applied the following merge fix patch (which I am not sure about):

From: Stephen Rothwell 
Date: Fri, 29 Sep 2017 11:28:45 +1000
Subject: [PATCH] net: dsa: merge fix patch for removal of phy

Signed-off-by: Stephen Rothwell 
---
 net/dsa/slave.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8869954485db..9191c929c6c8 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1188,7 +1188,7 @@ int dsa_slave_create(struct dsa_port *port, const char *name)
return 0;
 
 out_phy:
-   phy_disconnect(p->phy);
+   phy_disconnect(slave_dev->phydev);
if (of_phy_is_fixed_link(p->dp->dn))
of_phy_deregister_fixed_link(p->dp->dn);
 out_free:
-- 
2.14.1

-- 
Cheers,
Stephen Rothwell


[PATCH RESEND] KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

[ cut here ]
 WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: GW  OE   4.13.0+ #17
 RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 Call Trace:
  ? emulator_read_emulated+0x15/0x20 [kvm]
  ? segmented_read+0xae/0xf0 [kvm]
  vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  x86_emulate_instruction+0x733/0x810 [kvm]
  vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
  kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2

A nested #PF is triggered while L0 is emulating an instruction for L2. However,
the injection path doesn't consider that we must not break L1's
vmlaunch/vmresume. This patch fixes it by queuing the #PF exception instead,
requesting an immediate VM exit from L2 and keeping the exception for L1
pending for a subsequent nested VM exit.
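
The decision the one-line change makes can be written as a small truth table. This is a userspace sketch only; the struct, the enum and nested_pf_action() are hypothetical names, and the "queue" outcome stands for whatever the function's existing fall-through path does, which the hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the two conditions the patch tests. */
struct nested_state {
	bool l1_intercepts_pf;    /* nested_vmx_is_page_fault_vmexit() result */
	bool nested_run_pending;  /* L1's vmlaunch/vmresume not yet completed */
};

/* Hypothetical outcome labels; the kernel does not define this enum. */
enum pf_action {
	PF_EXIT_TO_L1_NOW,   /* perform the nested VM exit immediately */
	PF_QUEUE_EXCEPTION,  /* take the existing queueing path instead */
};

static enum pf_action nested_pf_action(const struct nested_state *st)
{
	assert(st != NULL);

	/* Exit to L1 right away only if no nested entry is still pending. */
	if (st->l1_intercepts_pf && !st->nested_run_pending)
		return PF_EXIT_TO_L1_NOW;
	return PF_QUEUE_EXCEPTION;
}
```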

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c83d28b..1ca91c8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9840,7 +9840,8 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
 
WARN_ON(!is_guest_mode(vcpu));
 
-   if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code)) {
+   if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
+   !to_vmx(vcpu)->nested.nested_run_pending) {
vmcs12->vm_exit_intr_error_code = fault->error_code;
nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
-- 
2.7.4



Re: [PATCH 2/3] srcu: queue work without holding the lock

2017-09-28 Thread Paul E. McKenney
On Thu, Sep 28, 2017 at 06:03:57PM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-09-22 11:46:10 [-0700], Paul E. McKenney wrote:
> > On Fri, Sep 22, 2017 at 05:28:05PM +0200, Sebastian Andrzej Siewior wrote:
> > > On RT we can't invoke queue_delayed_work() within an atomic section
> > > (which is provided by raw_spin_lock_irqsave()).
> > > srcu_reschedule() invokes queue_delayed_work() outside of the
> > > raw_spin_lock_irq_rcu_node() section so this should be fine here, too.
> > > If the remaining callers of call_srcu() aren't atomic
> > > (spin_lock_irqsave() is fine) then this should work on RT, too.
> > 
> > Just to make sure I understand...   The problem is not the _irqsave,
> > but rather the raw_?
> 
> exactly. The _irqsave is translated into a sleeping lock on RT and does
> not matter. The raw_ ones stay as they are and queue_delayed_work() uses
> sleeping locks itself and this is where things fall apart.

OK, internally I could get rid of raw_ at the expense of some code bloat,
but in the call_srcu() case, the caller might well hold a raw_ lock.

Thoughts?

Thanx, Paul



Re: [PATCH 1/3] srcu: use cpu_online() instead custom check

2017-09-28 Thread Paul E. McKenney
On Thu, Sep 28, 2017 at 06:02:08PM +0200, Sebastian Andrzej Siewior wrote:
> On 2017-09-22 11:43:14 [-0700], Paul E. McKenney wrote:
> > On Fri, Sep 22, 2017 at 05:28:04PM +0200, Sebastian Andrzej Siewior wrote:
> > > The current check via srcu_online is slightly racy because after looking
> > > at srcu_online there could be an interrupt that interrupted us long
> > > enough until the CPU we checked against went offline.
> > 
> > But in that case, wouldn't the interrupt block the synchronize_sched()
> > later in the offline sequence?
> 
> What I meant is:
> 
>   CPU0                                          CPU1
>   preempt_disable();
>   if (READ_ONCE(per_cpu(srcu_online, 1)))
>   *interrupt*
>                                                 WRITE_ONCE(per_cpu(srcu_online, cpu), false);
>                                                 and CPU is offline
>   
>   ret = queue_delayed_work_on(1, wq, dwork, delay);
> 
> Is this possible, or is there a safety belt for this?

I don't see anything that would prevent this.  It is unlikely, but not
so unlikely that it should not be fixed.

> > More to the point, are you actually seeing this failure, or is this
> > a theoretical bug?
> 
> I need to get rid of the preempt_disable() section in which
> queue_delayed_work*() is invoked for RT.

OK, but please see below...

> > > An alternative would be to hold the hotplug rwsem (so the CPUs don't
> > > change their state) and then check based on cpu_online() if we queue it
> > > on a specific CPU or not. queue_work_on() itself can handle if something
> > > is enqueued on an offline CPU but a timer which is enqueued on an offline
> > > CPU won't fire until the CPU is back online.
> > > 
> > > I am not sure if the removal in rcu_init() is okay or not. I assume that
> > > SRCU won't enqueue a work item before SRCU is up and ready.
> > 
> > Another alternative would be to disable preemption across the check and
> > the call to queue_delayed_work_on().
> 
> you need to ensure the *other* CPU won't go offline while we are in the
> middle of checking its status. preempt_disable() won't do this on the
> other CPU.

Agreed.

> > Yet another alternative would be to have an SRCU-specific per-CPU lock
> > that is acquired across the setting and clearing of srcu_online,
> > and also across the check and the call to queue_delayed_work_on().
> > This last would be more consistent with a desire to remove the
> > synchronize_sched() from the offline sequence.
> > 
> > Or am I missing something here?
> The per-CPU lock should work. And cpus_read_lock() is basically that
> except that srcu_online_cpu() is not holding it but the CPU-HP code.
> 
> So do you want to keep things as-is, or do you prefer a per-CPU rwsem instead?

The per-CPU rwsem seems like a reasonable approach.  Except for the
call_srcu() path, given that call_srcu()'s caller might have preemption
(or even interrupts) disabled.
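
To make the per-CPU lock idea concrete, here is a userspace model. It is a sketch under loud assumptions: a pthread mutex stands in for the per-CPU lock, and cpu_state, set_cpu_online() and queue_work_on_cpu() are hypothetical names, not SRCU or workqueue APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical per-CPU state; a mutex stands in for the per-CPU lock. */
struct cpu_state {
	pthread_mutex_t lock;
	bool online;
	int queued;   /* work items queued on this CPU */
};

static struct cpu_state cpus[NR_CPUS] = {
	{ PTHREAD_MUTEX_INITIALIZER, true, 0 },
	{ PTHREAD_MUTEX_INITIALIZER, true, 0 },
};

/* Hotplug side: the online flag only changes under the per-CPU lock. */
static void set_cpu_online(int cpu, bool online)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	pthread_mutex_lock(&cpus[cpu].lock);
	cpus[cpu].online = online;
	pthread_mutex_unlock(&cpus[cpu].lock);
}

/*
 * Queue on @cpu only if it is online. The check and the queueing happen
 * under the same lock, so the CPU cannot go offline in between.
 * Returns false if the caller has to fall back to an unbound queue.
 */
static bool queue_work_on_cpu(int cpu)
{
	bool queued = false;

	assert(cpu >= 0 && cpu < NR_CPUS);
	pthread_mutex_lock(&cpus[cpu].lock);
	if (cpus[cpu].online) {
		cpus[cpu].queued++;
		queued = true;
	}
	pthread_mutex_unlock(&cpus[cpu].lock);
	return queued;
}
```

A sleeping lock like this one is exactly what would not be usable from a call_srcu() caller that has preemption or interrupts disabled, which is the open question above.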

Thoughts?

Thanx, Paul



[PATCH v2] KVM: VMX: Don't expose PLE enable if there is no hardware support

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

PLE_Window: Software can configure this field as an upper bound on the amount
of time a guest is allowed to execute in a PAUSE loop.

KVM doesn't expose the PLE capability to the L1 hypervisor; however, ple_window
still shows the default value on the L1 hypervisor. This patch fixes it by
clearing all the PLE-related module parameters if there is no PLE capability.

Reviewed-by: Konrad Rzeszutek Wilk 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
v1 -> v2:
 * fix typo in patch description

 arch/x86/kvm/vmx.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c83d28b..4d4f9b4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6781,8 +6781,13 @@ static __init int hardware_setup(void)
if (enable_ept && !cpu_has_vmx_ept_2m_page())
kvm_disable_largepages();
 
-   if (!cpu_has_vmx_ple())
+   if (!cpu_has_vmx_ple()) {
ple_gap = 0;
+   ple_window = 0;
+   ple_window_grow = 0;
+   ple_window_max = 0;
+   ple_window_shrink = 0;
+   }
 
if (!cpu_has_vmx_apicv()) {
enable_apicv = 0;
-- 
2.7.4



[PATCH v2 2/4] KVM: LAPIC: Keep timer running when switching between one-shot and periodic mode

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

If we take TSC-deadline mode timer out of the picture, the Intel SDM
does not say that the timer is disabled when the timer mode is changed,
either from one-shot to periodic or vice versa.

After this patch, the timer is no longer disarmed on a change of mode, so
the counter (TMCCT) keeps counting down.

So what does a write to LVTT change? On bare metal, the change of mode
is probably taken into account only when the counter reaches 0. When this
happens, LVTT is used to figure out whether the counter should restart
counting down from TMICT (periodic mode) or stop counting (one-shot mode).

This patch is based on observation of the behavior of the APIC timer on
bare metal, as well as a check that it does not go against the description
written in the Intel SDM.
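
The arithmetic behind "keeps counting down" can be sketched in userspace as follows. The helper name and its plain-integer nanosecond arguments are hypothetical; the logic mirrors the delta computation added to set_target_expiration() in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: time until the next timer interrupt, in ns.
 * timer_update == false: TMICT was written, so start a full period.
 * timer_update == true:  only the mode changed, so keep the time that
 *                        was still remaining on the running timer.
 */
static uint64_t next_expiry_delta_ns(int64_t now_ns,
				     int64_t target_expiration_ns,
				     uint64_t period_ns, bool timer_update)
{
	int64_t remaining;

	/* The patch returns early when the period is 0. */
	assert(period_ns != 0);

	if (!timer_update)
		return period_ns;

	remaining = target_expiration_ns - now_ns;
	if (remaining < 0)
		remaining = 0;
	return (uint64_t)remaining % period_ns;
}
```

A result of 0 corresponds to the "if (!delta) return false;" case in the patch, where the timer is not restarted.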

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 40 
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a739cbb..946c11b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1301,7 +1301,7 @@ static void update_divide_count(struct kvm_lapic *apic)
   apic->divide_count);
 }
 
-static void apic_update_lvtt(struct kvm_lapic *apic)
+static bool apic_update_lvtt(struct kvm_lapic *apic)
 {
u32 timer_mode = kvm_lapic_get_reg(apic, APIC_LVTT) &
apic->lapic_timer.timer_mode_mask;
@@ -1309,7 +1309,9 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
if (apic->lapic_timer.timer_mode != timer_mode) {
apic->lapic_timer.timer_mode = timer_mode;
hrtimer_cancel(&apic->lapic_timer.timer);
+   return true;
}
+   return false;
 }
 
 static void apic_timer_expired(struct kvm_lapic *apic)
@@ -1430,11 +1432,12 @@ static void start_sw_period(struct kvm_lapic *apic)
HRTIMER_MODE_ABS_PINNED);
 }
 
-static bool set_target_expiration(struct kvm_lapic *apic)
+static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
 {
-   ktime_t now;
-   u64 tscl = rdtsc();
+   ktime_t now, remaining;
+   u64 tscl = rdtsc(), delta;
 
+   /* Calculate the next time the timer should trigger an interrupt */
now = ktime_get();
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
* APIC_BUS_CYCLE_NS * apic->divide_count;
@@ -1470,9 +1473,21 @@ static bool set_target_expiration(struct kvm_lapic *apic)
   ktime_to_ns(ktime_add_ns(now,
apic->lapic_timer.period)));
 
+   if (!timer_update)
+   delta = apic->lapic_timer.period;
+   else {
+   remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
+   if (ktime_to_ns(remaining) < 0)
+   remaining = 0;
+   delta = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
+   }
+
+   if (!delta)
+   return false;
+
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
-   nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
-   apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
+   nsec_to_cycles(apic->vcpu, delta);
+   apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
 
return true;
 }
@@ -1609,12 +1624,12 @@ void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
restart_apic_timer(apic);
 }
 
-static void start_apic_timer(struct kvm_lapic *apic)
+static void start_apic_timer(struct kvm_lapic *apic, bool timer_update)
 {
atomic_set(&apic->lapic_timer.pending, 0);
 
if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-   && !set_target_expiration(apic))
+   && !set_target_expiration(apic, timer_update))
return;
 
restart_apic_timer(apic);
@@ -1729,7 +1744,8 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
val |= APIC_LVT_MASKED;
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
kvm_lapic_set_reg(apic, APIC_LVTT, val);
-   apic_update_lvtt(apic);
+   if (apic_update_lvtt(apic) && !apic_lvtt_tscdeadline(apic))
+   start_apic_timer(apic, true);
break;
 
case APIC_TMICT:
@@ -1738,7 +1754,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
hrtimer_cancel(&apic->lapic_timer.timer);
kvm_lapic_set_reg(apic, APIC_TMICT, val);
-   start_apic_timer(apic);
+   start_apic_timer(apic, false);
break;
 
case APIC_TDCR:
@@ -1872,7 +1888,7 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 
hrtimer_cancel(&apic->lapic_timer.timer);
apic->lapic_timer.tscde

[PATCH v2 4/4] KVM: LAPIC: Don't silently accept bad vectors

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

Vectors 0-15 are reserved, and a physical LAPIC - upon sending or
receiving one - would generate an APIC error instead of doing the
requested action. Make our emulation behave similarly.
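
As a rough userspace model of the behaviour described above: the struct and function names below are hypothetical, and the bit positions are my reading of the SDM's ESR and LVT layouts (send illegal vector = bit 5, receive illegal vector = bit 6, LVT mask = bit 16), so treat them as assumptions rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions; see the note above. */
#define ESR_SEND_ILLEGAL_VECTOR (1u << 5)
#define ESR_RECV_ILLEGAL_VECTOR (1u << 6)
#define LVT_MASKED              (1u << 16)

/* Hypothetical two-register model of the local APIC. */
struct lapic_model {
	uint32_t esr;
	uint32_t lvterr;
};

/* Vectors 0-15 are reserved. */
static bool vector_is_reserved(unsigned int vector)
{
	return vector < 16;
}

/*
 * Latch @errmask into the ESR. Returns true if an error interrupt
 * should be delivered: the bits were not already all set and the
 * LVT error entry is not masked.
 */
static bool model_apic_error(struct lapic_model *apic, uint32_t errmask)
{
	assert(apic != NULL);

	if ((apic->esr & errmask) == errmask)
		return false;   /* already recorded, nothing new to report */
	apic->esr |= errmask;
	return !(apic->lvterr & LVT_MASKED);
}
```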

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6bafd06..a779ba9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -935,6 +935,25 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
return ret;
 }
 
+static void apic_error(struct kvm_lapic *apic, unsigned long errmask)
+{
+   uint32_t esr;
+
+   esr = kvm_lapic_get_reg(apic, APIC_ESR);
+
+   if ((esr & errmask) != errmask) {
+   uint32_t lvterr = kvm_lapic_get_reg(apic, APIC_LVTERR);
+
+   kvm_lapic_set_reg(apic, APIC_ESR, esr | errmask);
+   if (!(lvterr & APIC_LVT_MASKED)) {
+   struct kvm_lapic_irq irq;
+
+   irq.vector = lvterr & 0xff;
+   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
+   }
+   }
+}
+
 /*
  * Add a pending IRQ into lapic.
  * Return 1 if successfully added and 0 if discarded.
@@ -946,6 +965,11 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
int result = 0;
struct kvm_vcpu *vcpu = apic->vcpu;
 
+   if (unlikely(vector < 16) && delivery_mode == APIC_DM_FIXED) {
+   apic_error(apic, APIC_ESR_RECVILL);
+   return 0;
+   }
+
trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
  trig_mode, vector);
switch (delivery_mode) {
@@ -1146,7 +1170,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)
   irq.trig_mode, irq.level, irq.dest_mode, irq.delivery_mode,
   irq.vector, irq.msi_redir_hint);
 
-   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
+   if (unlikely(irq.vector < 16 && irq.delivery_mode == APIC_DM_FIXED))
+   apic_error(apic, APIC_ESR_SENDILL);
+   else
+   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
 }
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
@@ -1734,7 +1761,6 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
case APIC_LVTPC:
case APIC_LVT1:
case APIC_LVTERR:
-   /* TODO: Check vector */
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
 
-- 
2.7.4



[PATCH v2 3/4] KVM: LAPIC: Apply change to TDCR right away to the timer

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

The description in the Intel SDM of how the divide configuration
register is used: "The APIC timer frequency will be the processor's bus
clock or core crystal clock frequency divided by the value specified in
the divide configuration register."

Observation of bare metal showed that when the TDCR is changed, the TMCCT
does not change or make a big jump in value, but the rate at which it
counts down changes.

The patch updates the emulation of the APIC timer so that a change to the
divide configuration is reflected in the value of the counter and in
when the next interrupt is triggered.
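
The rescaling can be sketched as below. Both helpers are hypothetical userspace stand-ins: the first follows the period formula visible in the diff (TMICT * APIC_BUS_CYCLE_NS * divide_count, with the bus cycle length passed in as a parameter), the second the "delta * new / old" adjustment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: full timer period in ns for a given divisor. */
static uint64_t timer_period_ns(uint32_t tmict, uint32_t bus_cycle_ns,
				uint32_t divide_count)
{
	return (uint64_t)tmict * bus_cycle_ns * divide_count;
}

/*
 * Hypothetical: rescale the time left until expiry when the divisor
 * changes. The counter value stays the same, but each tick now takes
 * new_divisor / old_divisor times as long.
 */
static uint64_t rescale_delta_ns(uint64_t delta_ns, uint32_t old_divisor,
				 uint32_t new_divisor)
{
	assert(old_divisor != 0);

	if (new_divisor == old_divisor)
		return delta_ns;
	return delta_ns * new_divisor / old_divisor;
}
```

For example, 1000 ns remaining at divisor 2 becomes 2000 ns at divisor 4. The multiplication is done first to avoid losing precision, so very large deltas could overflow 64 bits in this sketch.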

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 946c11b..6bafd06 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1432,7 +1432,7 @@ static void start_sw_period(struct kvm_lapic *apic)
HRTIMER_MODE_ABS_PINNED);
 }
 
-static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
+static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update, uint32_t old_divisor)
 {
ktime_t now, remaining;
u64 tscl = rdtsc(), delta;
@@ -1440,7 +1440,7 @@ static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
/* Calculate the next time the timer should trigger an interrupt */
now = ktime_get();
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
-   * APIC_BUS_CYCLE_NS * apic->divide_count;
+   * APIC_BUS_CYCLE_NS * old_divisor;
 
if (!apic->lapic_timer.period)
return false;
@@ -1485,6 +1485,12 @@ static bool set_target_expiration(struct kvm_lapic *apic, bool timer_update)
if (!delta)
return false;
 
+   if (apic->divide_count != old_divisor) {
+   apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
+   * APIC_BUS_CYCLE_NS * apic->divide_count;
+   delta = delta * apic->divide_count / old_divisor;
+   }
+
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
nsec_to_cycles(apic->vcpu, delta);
apic->lapic_timer.target_expiration = ktime_add_ns(now, delta);
@@ -1624,12 +1630,13 @@ void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu)
restart_apic_timer(apic);
 }
 
-static void start_apic_timer(struct kvm_lapic *apic, bool timer_update)
+static void start_apic_timer(struct kvm_lapic *apic, bool timer_update,
+   uint32_t old_divisor)
 {
atomic_set(&apic->lapic_timer.pending, 0);
 
if ((apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
-   && !set_target_expiration(apic, timer_update))
+   && !set_target_expiration(apic, timer_update, old_divisor))
return;
 
restart_apic_timer(apic);
@@ -1745,7 +1752,7 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
kvm_lapic_set_reg(apic, APIC_LVTT, val);
if (apic_update_lvtt(apic) && !apic_lvtt_tscdeadline(apic))
-   start_apic_timer(apic, true);
+   start_apic_timer(apic, true, apic->divide_count);
break;
 
case APIC_TMICT:
@@ -1754,16 +1761,20 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
hrtimer_cancel(&apic->lapic_timer.timer);
kvm_lapic_set_reg(apic, APIC_TMICT, val);
-   start_apic_timer(apic, false);
+   start_apic_timer(apic, false, apic->divide_count);
break;
 
-   case APIC_TDCR:
+   case APIC_TDCR: {
+   uint32_t current_divisor = apic->divide_count;
+
if (val & 4)
apic_debug("KVM_WRITE:TDCR %x\n", val);
kvm_lapic_set_reg(apic, APIC_TDCR, val);
update_divide_count(apic);
+   hrtimer_cancel(&apic->lapic_timer.timer);
+   start_apic_timer(apic, true, current_divisor);
break;
-
+   }
case APIC_ESR:
if (apic_x2apic_mode(apic) && val != 0) {
apic_debug("KVM_WRITE:ESR not zero %x\n", val);
@@ -1888,7 +1899,7 @@ void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 
hrtimer_cancel(&apic->lapic_timer.timer);
apic->lapic_timer.tscdeadline = data;
-   start_apic_timer(apic, false);
+   start_apic_timer(apic, false, apic->divide_count);
 }
 
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8)
@@ -2254,7 +2265,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
apic_update_lvtt(apic);
apic_manage_nmi_watchdog(apic, kvm_lapic_get
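
The rescaling done in set_target_expiration() above can be checked with plain arithmetic. What follows is a minimal userspace sketch, not kernel code: the helper names rescale_remaining_ns() and period_ns() are hypothetical, and the bus cycle length is passed in as a parameter rather than taken from APIC_BUS_CYCLE_NS.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: scale the time left on a
 * running timer when the divide configuration changes from old_div to
 * new_div. Mirrors "delta = delta * apic->divide_count / old_divisor".
 */
static uint64_t rescale_remaining_ns(uint64_t remaining_ns,
				     uint32_t old_div, uint32_t new_div)
{
	if (old_div == 0)	/* guard added for the sketch only */
		return remaining_ns;
	return remaining_ns * new_div / old_div;
}

/* Full period in ns for a given initial count (TMICT) and divisor. */
static uint64_t period_ns(uint32_t tmict, uint32_t bus_cycle_ns, uint32_t div)
{
	return (uint64_t)tmict * bus_cycle_ns * div;
}
```

Doubling the divisor doubles the time left, and going from divide-by-4 to divide-by-1 cuts it to a quarter, which is what a guest writing TDCR while the timer runs would expect to observe.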

[PATCH v2 1/4] KVM: LAPIC: Fix lapic timer mode transition

2017-09-28 Thread Wanpeng Li
From: Wanpeng Li 

SDM 10.5.4.1 "TSC-Deadline Mode" states that "Transitioning between
TSC-Deadline mode and other timer modes also disarms the timer". So the
APIC Timer Initial Count Register for one-shot/periodic mode should be
reset on such a transition. This patch does that.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/apicdef.h | 1 +
 arch/x86/kvm/lapic.c   | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index c46bb99..d8ef1b4 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -100,6 +100,7 @@
 #define APIC_TIMER_BASE_CLKIN       0x0
 #define APIC_TIMER_BASE_TMBASE      0x1
 #define APIC_TIMER_BASE_DIV         0x2
+#define APIC_LVT_TIMER_MASK         (3 << 17)
 #define APIC_LVT_TIMER_ONESHOT      (0 << 17)
 #define APIC_LVT_TIMER_PERIODIC     (1 << 17)
 #define APIC_LVT_TIMER_TSCDEADLINE  (2 << 17)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 69c5612..a739cbb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1722,6 +1722,9 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
break;
 
case APIC_LVTT:
+   if (apic_lvtt_tscdeadline(apic) != ((val &
+   APIC_LVT_TIMER_MASK) == APIC_LVT_TIMER_TSCDEADLINE))
+   kvm_lapic_set_reg(apic, APIC_TMICT, 0);
if (!kvm_apic_sw_enabled(apic))
val |= APIC_LVT_MASKED;
val &= (apic_lvt_mask[0] | apic->lapic_timer.timer_mode_mask);
-- 
2.7.4
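
The condition added to the APIC_LVTT case compares "was in TSC-deadline mode" with "will be in TSC-deadline mode" and resets TMICT when the two differ. Here is a standalone sketch of that test, assuming the mode field sits in bits 17-18 as in the apicdef.h hunk above. The LVT_* names and both helpers are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the timer-mode encodings, for the sketch only. */
#define LVT_TIMER_MASK        (3u << 17)
#define LVT_TIMER_ONESHOT     (0u << 17)
#define LVT_TIMER_PERIODIC    (1u << 17)
#define LVT_TIMER_TSCDEADLINE (2u << 17)

static bool is_tscdeadline(uint32_t lvtt)
{
	return (lvtt & LVT_TIMER_MASK) == LVT_TIMER_TSCDEADLINE;
}

/*
 * True when replacing old_lvtt with new_lvtt crosses the TSC-deadline
 * boundary in either direction, i.e. when the timer must be disarmed.
 */
static bool lvtt_write_disarms(uint32_t old_lvtt, uint32_t new_lvtt)
{
	return is_tscdeadline(old_lvtt) != is_tscdeadline(new_lvtt);
}
```

Switching between one-shot and periodic does not trip the check, which matches the intent of leaving the timer running in that case.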



[PATCH v2 0/4] KVM: LAPIC: Rework lapic timer to behave more like real-hardware

2017-09-28 Thread Wanpeng Li
The issue was reported in the Xen community.

Anthony PERARD pointed out:

https://www.mail-archive.com/xen-devel@lists.xen.org/msg117283.html#

 | When developing PVH for OVMF, I've used the lapic timer. It turns out that the
 | way it is used by OVMF did not work with Xen [1]. I tried to find out how
 | real hardware behaves, and wrote XTF tests [2]. This patch series tries to fix
 | the behavior of the vlapic timer.
 | 
 | 
 | The OVMF driver for the APIC timer initializes the timer like this:
 |  write to TMICT (initial counter)
 |  write to TMDCR (divide configuration)
 |  enable the timer (this may change timer mode from one-shot to periodic)
 | It turns out that TMICT is set to 0 on the last step, but OVMF expects the
 | timer to run.
 | 
 | Here is a description of the APIC timer, based on observation as well as a
 | reading of the Intel SDM. The description is also part of the patch
 | descriptions (reworded).
 | 
 | One way of thinking about how the APIC timer is evaluated is to think of how
 | hardware would do it. There is a counter, TMCCT, which always keeps counting
 | down.
 | 
 | Setting TMICT also sets TMCCT; nothing else matters.
 | Setting LVTT does not change anything right away.
 | Setting TMDCR does not change much.
 | 
 | Now TMCCT keeps counting down, at a rate related to TMDCR.
 | Only once TMCCT reaches 0 is LVTT taken into account.
 | Is there an interrupt to deliver? Should the timer restart counting from the
 | value in TMICT?
 | 
 | The Intel SDM uses the word "disarm" for the timer. I guess the
 | easiest way to disarm the APIC timer (when in periodic or one-shot mode) is
 | to set TMICT to 0. But if we take TSC-Deadline mode out of the picture, there
 | is nothing in the manual that says that the timer is disarmed or stopped when
 | changing timer mode (there are only two modes left, periodic and one-shot).
 | 
 | As for the TSC-deadline timer mode, observation showed that changing to it (or
 | from it) does reset and disarm both timers, so effectively TMICT and the
 | tscdeadline are set to 0.
 | 
 | [1] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg00959.html
 | [2] v1: https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg02533.html
 |     v2: look for "[XTF PATCH V2 0/3] Testing vlapic timer"

 In addition, Patch 3/4 implements the illegal vector error handling according
 to SDM 10.5.2~10.5.3.

v1 -> v2:
 * add cover-letter and collect recent lapic patches to one patchset

Wanpeng Li (4):
  KVM: LAPIC: Fix lapic timer mode transition
  KVM: LAPIC: Keep timer running when switching between one-shot and periodic mode
  KVM: LAPIC: Apply change to TDCR right away to the timer
  KVM: LAPIC: Don't silently accept bad vectors

 arch/x86/include/asm/apicdef.h |  1 +
 arch/x86/kvm/lapic.c   | 90 ++
 2 files changed, 74 insertions(+), 17 deletions(-)

-- 
2.7.4
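
The timer behaviour described in the quoted text can be modelled as a tiny state machine: a counter that always counts down, with the LVTT mode consulted only at the moment it reaches zero. Below is a rough userspace sketch under those assumptions. It ignores the divide configuration, interrupt masking and TSC-deadline mode, and every name in it is invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the one-shot/periodic APIC timer; not KVM code. */
struct timer_model {
	uint32_t tmict;		/* initial count */
	uint32_t tmcct;		/* current count, counts down while non-zero */
	bool periodic;		/* LVTT mode, only looked at when tmcct hits 0 */
	unsigned int fired;	/* number of expirations seen so far */
};

/* Writing TMICT also loads TMCCT. */
static void write_tmict(struct timer_model *t, uint32_t val)
{
	t->tmict = val;
	t->tmcct = val;
}

/* Writing the mode changes nothing right away. */
static void write_lvtt_mode(struct timer_model *t, bool periodic)
{
	t->periodic = periodic;
}

/* Advance the model by n timer ticks. */
static void tick(struct timer_model *t, uint32_t n)
{
	while (n--) {
		if (t->tmcct == 0)
			continue;	/* disarmed: nothing to count */
		if (--t->tmcct == 0) {
			t->fired++;
			if (t->periodic)
				t->tmcct = t->tmict;	/* reload from TMICT */
		}
	}
}
```

In this model the OVMF sequence (write TMICT, then switch to periodic) keeps the timer running, because the mode write does not touch the counter.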



Re: [RFC PATCH 1/2] arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables

2017-09-28 Thread Paul E. McKenney
On Fri, Sep 29, 2017 at 07:59:09AM +1300, Michael Cree wrote:
> On Thu, Sep 28, 2017 at 08:43:54AM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 28, 2017 at 09:45:35AM +0100, Will Deacon wrote:
> > > On Thu, Sep 28, 2017 at 10:38:01AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 27, 2017 at 04:49:28PM +0100, Will Deacon wrote:
> > > > > In many cases, page tables can be accessed concurrently by either another
> > > > > CPU (due to things like fast gup) or by the hardware page table walker
> > > > > itself, which may set access/dirty bits. In such cases, it is important
> > > > > to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
> > > > > entries cannot be torn, merged or subject to apparent loss of coherence.
> > > > 
> > > > In fact, we should use lockless_dereference() for many of them. Yes
> > > > Alpha is the only one that cares about the difference between that and
> > > > READ_ONCE() and they do have the extra barrier, but if we're going to do
> > > > this, we might as well do it 'right' :-)
> > > 
> > > I know this sounds daft, but I think one of the big reasons why
> > > lockless_dereference() doesn't get an awful lot of use is because it's
> > > such a mouthful! Why don't we just move the smp_read_barrier_depends()
> > > into READ_ONCE? Would anybody actually care about the potential impact on
> > > Alpha (which, frankly, is treading on thin ice given the low adoption of
> > > lockless_dereference())?
> > 
> > This is my cue to ask my usual question...  ;-)
> > 
> > Are people still running mainline kernels on Alpha?  (Added Alpha folks.)
> 
> Yes.  I run two Alpha build daemons that build the unofficial
> debian-alpha port.  Debian popcon reports nine machines running
> Alpha, which are likely to be running the 4.12.y kernel which
> is currently in debian-alpha, (and presumably soon to be 4.13.y
> which is now built on Alpha in experimental).

I salute your dedication to Alpha!  ;-)

Thanx, Paul
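
For readers following the READ_ONCE/WRITE_ONCE part of the thread, here is a simplified userspace stand-in for the idea: take one snapshot of a shared entry through a volatile access, then test the snapshot. This is only a sketch. It assumes GCC or Clang (for __typeof__), the macros are far simpler than the kernel's, it contains no dependency barrier of the kind Alpha needs, and pte_model_t and its bit meanings are invented for the example.

```c
/* Simplified stand-ins; the kernel macros do more than this. */
#define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

typedef unsigned long pte_model_t;	/* hypothetical page table entry */

#define PTE_MODEL_PRESENT 0x1UL		/* made-up bit layout */
#define PTE_MODEL_YOUNG   0x2UL

/*
 * Read the entry exactly once and test the local copy, so that both
 * checks see the same value even if another agent updates the entry.
 */
static int pte_model_present_and_young(pte_model_t *ptep)
{
	pte_model_t pte = READ_ONCE(*ptep);

	return (pte & PTE_MODEL_PRESENT) && (pte & PTE_MODEL_YOUNG);
}
```

Without the single snapshot, the compiler would be free to load the entry twice and the two tests could disagree with each other.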



Re: [PATCH v1 14/14] tee: shm: inline tee_shm getter functions

2017-09-28 Thread Yury Norov
On Thu, Sep 28, 2017 at 09:04:11PM +0300, Volodymyr Babchuk wrote:
> From: Volodymyr Babchuk 
> 
> Now, when struct tee_shm is defined in public header,
> we can inline small getter functions.

struct tee_shm is moved to the public header in the first patch of the series,
so you can put tee_shm_is_registered() in its proper place from the start, right?

> 
> Signed-off-by: Volodymyr Babchuk 
> ---
>  drivers/tee/tee_shm.c   | 17 -
>  include/linux/tee_drv.h | 10 --
>  2 files changed, 8 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
> index 5176c83..453700a 100644
> --- a/drivers/tee/tee_shm.c
> +++ b/drivers/tee/tee_shm.c
> @@ -494,23 +494,6 @@ struct tee_shm *tee_shm_get_from_id(struct tee_context *ctx, int id)
>  }
>  EXPORT_SYMBOL_GPL(tee_shm_get_from_id);
>  
> -bool tee_shm_is_registered(struct tee_shm *shm)
> -{
> - return shm && (shm->flags & TEE_SHM_REGISTER);
> -}
> -EXPORT_SYMBOL_GPL(tee_shm_is_registered);
> -
> -/**
> - * tee_shm_get_id() - Get id of a shared memory object
> - * @shm: Shared memory handle
> - * @returns id
> - */
> -int tee_shm_get_id(struct tee_shm *shm)
> -{
> - return shm->id;
> -}
> -EXPORT_SYMBOL_GPL(tee_shm_get_id);
> -
>  /**
>   * tee_shm_put() - Decrease reference count on a shared memory handle
>   * @shm: Shared memory handle
> diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
> index 6aaef65..2ae0286 100644
> --- a/include/linux/tee_drv.h
> +++ b/include/linux/tee_drv.h
> @@ -429,7 +429,10 @@ static inline size_t tee_shm_get_page_offset(struct tee_shm *shm)
>   * @shm: Shared memory handle
>   * @returns id
>   */
> -int tee_shm_get_id(struct tee_shm *shm);
> +static inline int tee_shm_get_id(struct tee_shm *shm)
> +{
> + return shm->id;
> +}
>  
>  /**
>   * tee_shm_get_from_id() - Find shared memory object and increase reference
> @@ -445,6 +448,9 @@ struct tee_shm *tee_shm_get_from_id(struct tee_context *ctx, int id);
>   * @shm: Shared memory handle
>   * @returns true if object is registered in TEE
>   */
> -bool tee_shm_is_registered(struct tee_shm *shm);
> +static inline bool tee_shm_is_registered(struct tee_shm *shm)
> +{
> + return shm && (shm->flags & TEE_SHM_REGISTER);
> +}
>  
>  #endif /*__TEE_DRV_H*/
> -- 
> 2.7.4


[PATCH 1/3] printk: Introduce per-console loglevel setting

2017-09-28 Thread Calvin Owens
Not all consoles are created equal: depending on the actual hardware,
the latency of a printk() call can vary dramatically. The worst examples
are serial consoles, where it can spin for tens of milliseconds banging
the UART to emit a message, which can cause application-level problems
when the kernel spews onto the console.

At Facebook we use netconsole to monitor our fleet, but we still have
serial consoles attached on each host for live debugging, and the latter
has caused problems. An obvious solution is to disable the kernel
console output to ttyS0, but this makes live debugging frustrating,
since crashes become silent and opaque to the ttyS0 user. Enabling it on
the fly when needed isn't feasible, since boxes you need to debug via
serial are likely to be borked in ways that make this impossible.

That puts us between a rock and a hard place: we'd love to set
kernel.printk to KERN_INFO and get all the logs. But while netconsole is
fast enough to permit that without perturbing userspace, ttyS0 is not,
and we're forced to limit console logging to KERN_WARNING and higher.

This patch introduces a new per-console loglevel setting, and changes
console_unlock() to use max(global_level, per_console_level) when
deciding whether or not to emit a given log message.

This lets us have our cake and eat it too: instead of being forced to
limit the verbosity of all consoles based on the speed of the slowest one, we
can "promote" the faster console while still using a conservative system
loglevel setting to avoid disturbing applications.

Cc: Petr Mladek 
Cc: Steven Rostedt 
Cc: Sergey Senozhatsky 
Signed-off-by: Calvin Owens 
---
(V1: https://lkml.org/lkml/2017/4/4/783)

Changes in V2:
* Honor the ignore_loglevel setting in all cases
* Change semantics to use max(global, console) as the loglevel
  for a console, instead of the previous patch where we treated
  the per-console one as a filter downstream of the global one.

 include/linux/console.h |  1 +
 kernel/printk/printk.c  | 38 +++---
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a0..a5b5d79 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -147,6 +147,7 @@ struct console {
int cflag;
void *data;
struct   console *next;
+   int level;
 };
 
 /*
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 512f7c2..3f1675e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1141,9 +1141,14 @@ module_param(ignore_loglevel, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(ignore_loglevel,
 "ignore loglevel setting (prints all kernel messages to the console)");
 
-static bool suppress_message_printing(int level)
+static int effective_loglevel(struct console *con)
 {
-   return (level >= console_loglevel && !ignore_loglevel);
+   return max(console_loglevel, con ? con->level : LOGLEVEL_EMERG);
+}
+
+static bool suppress_message_printing(int level, struct console *con)
+{
+   return (level >= effective_loglevel(con) && !ignore_loglevel);
 }
 
 #ifdef CONFIG_BOOT_PRINTK_DELAY
@@ -1175,7 +1180,7 @@ static void boot_delay_msec(int level)
unsigned long timeout;
 
if ((boot_delay == 0 || system_state >= SYSTEM_RUNNING)
-   || suppress_message_printing(level)) {
+   || suppress_message_printing(level, NULL)) {
return;
}
 
@@ -1549,7 +1554,7 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
  * The console_lock must be held.
  */
 static void call_console_drivers(const char *ext_text, size_t ext_len,
-const char *text, size_t len)
+const char *text, size_t len, int level)
 {
struct console *con;
 
@@ -1568,6 +1573,8 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
if (!cpu_online(smp_processor_id()) &&
!(con->flags & CON_ANYTIME))
continue;
+   if (suppress_message_printing(level, con))
+   continue;
if (con->flags & CON_EXTENDED)
con->write(con, ext_text, ext_len);
else
@@ -1856,10 +1863,9 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
  char *dict, size_t dict_len,
  char *text, size_t text_len) { return 0; }
 static void call_console_drivers(const char *ext_text, size_t ext_len,
-const char *text, size_t len) {}
+const char *text, size_t len, int level) {}
 static size_t msg_print_text(const struct printk_log *msg,
 bool syslog, char *buf, size_t size) { return 0; }
-static bool suppress_message_printing(int level) { return false; }
 
 #endif
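
The max(global, per-console) rule from the commit message can be tried outside the kernel. This is a minimal sketch of the decision only: struct console_model and both functions are stand-ins written for this example, and a NULL console falls back to level 0 just as the patch falls back to LOGLEVEL_EMERG.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the one field of struct console that matters here. */
struct console_model {
	int level;	/* per-console loglevel, 0 means "no preference" */
};

/* Larger of the system-wide loglevel and the console's own level. */
static int effective_level(int global_level, const struct console_model *con)
{
	int per_console = con ? con->level : 0;

	return global_level > per_console ? global_level : per_console;
}

/* A message is dropped when its level is not below the effective level. */
static bool suppressed(int msg_level, int global_level,
		       const struct console_model *con, bool ignore_loglevel)
{
	return msg_level >= effective_level(global_level, con) &&
	       !ignore_loglevel;
}
```

With a system loglevel of 4, a console whose level is 7 still receives level-6 (KERN_INFO) messages, while a console left at 0 only sees levels 0 to 3.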

[PATCH 3/3] printk: Add ability to set loglevel via "console=" cmdline

2017-09-28 Thread Calvin Owens
This extends the "console=" interface to allow setting the per-console
loglevel by adding "/N" to the string, where N is the desired loglevel
expressed as a base 10 integer. Invalid values are silently ignored.

Cc: Petr Mladek 
Cc: Steven Rostedt 
Cc: Sergey Senozhatsky 
Signed-off-by: Calvin Owens 
---
 Documentation/admin-guide/kernel-parameters.txt |  6 ++---
 kernel/printk/console_cmdline.h |  1 +
 kernel/printk/printk.c  | 30 -
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0549662..f22b992 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -607,10 +607,10 @@
ttyS[,options]
ttyUSB0[,options]
Use the specified serial port.  The options are of
-   the form "pnf", where "" is the baud rate,
+   the form "pnf/l", where "" is the baud rate,
"p" is parity ("n", "o", or "e"), "n" is number of
-   bits, and "f" is flow control ("r" for RTS or
-   omit it).  Default is "9600n8".
+   bits, "f" is flow control ("r" for RTS or omit it),
+   and "l" is the loglevel on [0,7]. Default is "9600n8".
 
See Documentation/admin-guide/serial-console.rst for more
information.  See
diff --git a/kernel/printk/console_cmdline.h b/kernel/printk/console_cmdline.h
index 2ca4a8b..269e666 100644
--- a/kernel/printk/console_cmdline.h
+++ b/kernel/printk/console_cmdline.h
@@ -5,6 +5,7 @@ struct console_cmdline
 {
char name[16];   /* Name of the driver   */
int index;  /* Minor dev. to use*/
+   int loglevel;   /* Loglevel to use */
char *options;   /* Options for the driver   */
 #ifdef CONFIG_A11Y_BRAILLE_CONSOLE
char *brl_options;   /* Options for braille driver */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 488bda3..4c14cf2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1892,7 +1892,7 @@ asmlinkage __visible void early_printk(const char *fmt, ...)
 #endif
 
 static int __add_preferred_console(char *name, int idx, char *options,
-  char *brl_options)
+  int loglevel, char *brl_options)
 {
struct console_cmdline *c;
int i;
@@ -1918,6 +1918,7 @@ static int __add_preferred_console(char *name, int idx, char *options,
c->options = options;
braille_set_options(c, brl_options);
 
+   c->loglevel = loglevel;
c->index = idx;
return 0;
 }
@@ -1928,8 +1929,8 @@ static int __add_preferred_console(char *name, int idx, char *options,
 static int __init console_setup(char *str)
 {
char buf[sizeof(console_cmdline[0].name) + 4]; /* 4 for "ttyS" */
-   char *s, *options, *brl_options = NULL;
-   int idx;
+   char *s, *options, *llevel, *brl_options = NULL;
+   int idx, loglevel = LOGLEVEL_EMERG;
 
if (_braille_console_setup(&str, &brl_options))
return 1;
@@ -1947,6 +1948,14 @@ static int __init console_setup(char *str)
options = strchr(str, ',');
if (options)
*(options++) = 0;
+
+   llevel = strchr(str, '/');
+   if (llevel) {
+   *(llevel++) = 0;
+   if (kstrtoint(llevel, 10, &loglevel))
+   loglevel = LOGLEVEL_EMERG;
+   }
+
 #ifdef __sparc__
if (!strcmp(str, "ttya"))
strcpy(buf, "ttyS0");
@@ -1959,7 +1968,7 @@ static int __init console_setup(char *str)
idx = simple_strtoul(s, NULL, 10);
*s = 0;
 
-   __add_preferred_console(buf, idx, options, brl_options);
+   __add_preferred_console(buf, idx, options, loglevel, brl_options);
console_set_on_cmdline = 1;
return 1;
 }
@@ -1980,7 +1989,8 @@ __setup("console=", console_setup);
  */
 int add_preferred_console(char *name, int idx, char *options)
 {
-   return __add_preferred_console(name, idx, options, NULL);
+   return __add_preferred_console(name, idx, options, LOGLEVEL_EMERG,
+  NULL);
 }
 
 bool console_suspend_enabled = true;
@@ -2475,6 +2485,7 @@ void register_console(struct console *newcon)
struct console *bcon = NULL;
struct console_cmdline *c;
static bool has_preferred;
+   bool extant = false;
 
if (console_drivers)
for_each_console(bcon)
@@ -2541,6 +2552,12 @@ void register_console(struct console *newcon)
if (newcon->index < 0)
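
The "/N" suffix described in the commit message can be illustrated with a userspace parser. This is a sketch under assumptions, not the kernel code. split_console_arg() is a hypothetical name, strtol() stands in for kstrtoint(), and the [0,7] range check is an addition made for the example (the commit message only says invalid values are ignored).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "name[,options][/N]" in place.
 * On return, str holds the name, *options points at the option string
 * (or is NULL), and the return value is the loglevel, with 0 meaning
 * "absent or invalid".
 */
static int split_console_arg(char *str, char **options)
{
	char *slash = strchr(str, '/');
	char *comma;
	int level = 0;

	if (slash) {
		char *end;
		long val;

		*slash++ = '\0';
		errno = 0;
		val = strtol(slash, &end, 10);
		/* Accept only a complete base-10 number within [0,7]. */
		if (errno == 0 && end != slash && *end == '\0' &&
		    val >= 0 && val <= 7)
			level = (int)val;
	}

	comma = strchr(str, ',');
	if (comma)
		*comma++ = '\0';
	*options = comma;

	return level;
}
```

The sketch looks for '/' before cutting the string at the comma, so a level placed after the options (as in "ttyS0,115200n8/7") is still seen. That ordering is a choice made here for the example.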

[PATCH 2/3] printk: Add /sys/consoles/ interface

2017-09-28 Thread Calvin Owens
This adds a new sysfs interface that contains a directory for each
console registered on the system. Each directory contains a single
"loglevel" file for reading and setting the per-console loglevel.

We can let kobject destruction race with console removal: if it does,
loglevel_{show,store}() will safely fail with -ENODEV. This is a little
weird, but avoids embedding the kobject and therefore needing to totally
refactor the way we handle console struct lifetime.

Cc: Petr Mladek 
Cc: Steven Rostedt 
Cc: Sergey Senozhatsky 
Signed-off-by: Calvin Owens 
---
(V1: https://lkml.org/lkml/2017/4/4/784)

Changes in V2:
* Honor minimum_console_loglevel when setting loglevels
* Added entry in Documentation/ABI/testing

 Documentation/ABI/testing/sysfs-consoles | 13 +
 include/linux/console.h  |  1 +
 kernel/printk/printk.c   | 88 
 3 files changed, 102 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-consoles

diff --git a/Documentation/ABI/testing/sysfs-consoles b/Documentation/ABI/testing/sysfs-consoles
new file mode 100644
index 000..6a1593e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-consoles
@@ -0,0 +1,13 @@
+What:  /sys/consoles/
+Date:  September 2017
+KernelVersion: 4.15
+Contact:   Calvin Owens 
+Description:   The /sys/consoles tree contains a directory for each console
+   configured on the system. These directories contain the
+   following attributes:
+
+   * "loglevel"Set the per-console loglevel: the kernel uses
+   max(system_loglevel, perconsole_loglevel) when
+   deciding whether to emit a given message. The
+   default is 0, which means max() always yields
+   the system setting in the kernel.printk sysctl.
diff --git a/include/linux/console.h b/include/linux/console.h
index a5b5d79..76840be 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -148,6 +148,7 @@ struct console {
void *data;
struct   console *next;
int level;
+   struct kobject *kobj;
 };
 
 /*
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3f1675e..488bda3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -105,6 +105,8 @@ enum devkmsg_log_masks {
 
 static unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
 
+static struct kobject *consoles_dir_kobj;
+
 static int __control_devkmsg(char *str)
 {
if (!str)
@@ -2371,6 +2373,82 @@ static int __init keep_bootcon_setup(char *str)
 
 early_param("keep_bootcon", keep_bootcon_setup);
 
+static ssize_t loglevel_show(struct kobject *kobj, struct kobj_attribute *attr,
+char *buf)
+{
+   struct console *con;
+   ssize_t ret = -ENODEV;
+
+   console_lock();
+   for_each_console(con) {
+   if (con->kobj == kobj) {
+   ret = sprintf(buf, "%d\n", con->level);
+   break;
+   }
+   }
+   console_unlock();
+
+   return ret;
+}
+
+static ssize_t loglevel_store(struct kobject *kobj, struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct console *con;
+   ssize_t ret;
+   int tmp;
+
+   ret = kstrtoint(buf, 10, &tmp);
+   if (ret < 0)
+   return ret;
+
+   if (tmp < LOGLEVEL_EMERG)
+   return -ERANGE;
+
+   /*
+* Mimic the behavior of /dev/kmsg with respect to minimum_loglevel
+*/
+   if (tmp < minimum_console_loglevel)
+   tmp = minimum_console_loglevel;
+
+   ret = -ENODEV;
+   console_lock();
+   for_each_console(con) {
+   if (con->kobj == kobj) {
+   con->level = tmp;
+   ret = count;
+   break;
+   }
+   }
+   console_unlock();
+
+   return ret;
+}
+
+static const struct kobj_attribute console_loglevel_attr =
+   __ATTR(loglevel, 0644, loglevel_show, loglevel_store);
+
+static void console_register_sysfs(struct console *newcon)
+{
+   /*
+* We might be called very early from register_console(): in that case,
+* printk_late_init() will take care of this later.
+*/
+   if (!consoles_dir_kobj)
+   return;
+
+   newcon->kobj = kobject_create_and_add(newcon->name, consoles_dir_kobj);
+   if (WARN_ON(!newcon->kobj))
+   return;
+
+   WARN_ON(sysfs_create_file(newcon->kobj, &console_loglevel_attr.attr));
+}
+
+static void console_unregister_sysfs(struct console *oldcon)
+{
+   kobject_put(oldcon->kobj);
+}
+
 /*
  * The console driver calls this routine during kernel initialization
  * to register the console printing procedure with printk() and to
@@ -2495,6 +2573,7 @@ void registe
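
The value handling in loglevel_store() comes down to two rules: reject negative values, and raise anything below minimum_console_loglevel up to that minimum. Here is a small standalone sketch of just that step. The function name is invented, and the minimum is passed in as a parameter instead of being read from the kernel variable.

```c
#include <errno.h>

/*
 * Validate a requested per-console loglevel.
 * Returns 0 and stores the level to use in *out, or -ERANGE for
 * negative input (mirroring the "tmp < LOGLEVEL_EMERG" check).
 */
static int clamp_console_level(int requested, int minimum, int *out)
{
	if (requested < 0)
		return -ERANGE;

	*out = requested < minimum ? minimum : requested;
	return 0;
}
```

From userspace this would show up as a write of "0" to a console's loglevel file succeeding but reading back as the minimum.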

Re: [PATCH v2 0/2] Replace PID bitmap allocation with IDR API

2017-09-28 Thread Rik van Riel
On Fri, 2017-09-29 at 01:35 +0530, Gargi Sharma wrote:
> On Thu, Sep 28, 2017 at 3:46 PM, Rik van Riel wrote:
> > On Fri, 2017-09-29 at 01:09 +0530, Gargi Sharma wrote:
> > 
> > > 1000 processes that just sleep and sit around without doing
> > > anything(100 second sleep and then exit).
> > 
> > Is that with or without your patches?
> > 
> > How does it compare to a kernel with(out) your patches?
> 
> Ah thanks for pointing this out. Those were without the patches.
> Here are the stats for easier comparison.
> 
> Command   Time   With patches   Without patches
> pstree    real   0m0.542s       0m0.859s
>           user   0m0.335s       0m0.536s
>           sys    0m0.150s       0m0.172s
> 
> ps        real   0m0.722s       0m0.918s
>           user   0m0.064s       0m0.100s
>           sys    0m0.162s       0m0.172s
> 
> readdir   real   0m0.080s       0m0.092s
>           user   0m0.000s       0m0.000s
>           sys    0m0.021s       0m0.020s

So your patches speed up the use of /proc?

I suspect pstree and ps benefit from the simplification
and speedup of find_pid_ns, which is called from
find_task_by_pid_ns.

That is great news.

-- 
All Rights Reversed.



Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively

2017-09-28 Thread Linus Torvalds
On Thu, Sep 28, 2017 at 5:12 PM, Mimi Zohar  wrote:
>
> Originally IMA did define its own lock, prior to IMA-appraisal.
> IMA-appraisal introduced writing the file hash as an xattr, which required
> taking the i_mutex.  process_measurement() and ima_file_free() took
> the iint->mutex first and then the i_mutex, while setxattr, chmod and
> chown took the locks in reverse order.  To resolve the potential
> deadlock, the iint->mutex was eliminated.

Umm. You already have an explicit invalidation model, where you
invalidate after a write has occurred.

But the locking of the generation count (or "invalidation status" or
whatever) can - and should be - entirely independent of the locking of
the actual appraisal.

So make the appraisal itself use a semaphore ("only one appraisal at a time").

But use a separate lock for the generation count.
So then appraisal is:

 - get appraisal semaphore
  - get generation count lock
read generation count
  - drop generation count lock
  - do the actual appraisal
 - drop appraisal semaphore

Note that you now have a tuple of "generation count, appraisal" that
you have *not* saved off yet, but it's your stable thing.

Now you can write the xattr:

  - get exclusive inode lock (for xattr)
  - get generation count lock
  - if the appraisal generation does not match, do NOT write
the appraisal you just calculated, since it's pointless: it's already
stale.
  - otherwise write the appraisal and generation count to the xattr
  - drop generation count lock
  - release exclusive inode lock

and then for anything that does setxattr or chmod or whatever, just
use that generation count lock to invalidate the appraisal. You don't
need the actual appraisal lock for that.

So now the appraisal lock is always the outermost one, and the
generation count lock is always the innermost.

Anyway, I haven't looked at the details of what IMA does, but
something like the above really sounds like it should work and seems
pretty straightforward.

No?
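
The lock ordering outlined above can be written down as a userspace sketch. It is a model only: pthread mutexes stand in for the appraisal semaphore, the exclusive inode lock and the generation count lock. The structure, the function names and the placeholder "hash" are all invented for the illustration, and none of it reflects what IMA actually does.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct inode_model {
	pthread_mutex_t appraisal_lock;	/* outermost: one appraisal at a time */
	pthread_mutex_t inode_lock;	/* stands in for the exclusive inode lock */
	pthread_mutex_t gen_lock;	/* innermost: protects generation */
	uint64_t generation;		/* bumped on every invalidation */
	uint64_t data;			/* stand-in for the file contents */
	uint64_t xattr_hash;		/* stored appraisal */
	uint64_t xattr_generation;
	bool xattr_valid;
};

struct appraisal {
	uint64_t generation;
	uint64_t hash;
};

/* setxattr/chmod/write path: only the generation lock is needed. */
static void invalidate(struct inode_model *ino)
{
	pthread_mutex_lock(&ino->gen_lock);
	ino->generation++;
	pthread_mutex_unlock(&ino->gen_lock);
}

/* Compute a (generation, appraisal) pair without touching the xattr. */
static struct appraisal appraise(struct inode_model *ino)
{
	struct appraisal a;

	pthread_mutex_lock(&ino->appraisal_lock);
	pthread_mutex_lock(&ino->gen_lock);
	a.generation = ino->generation;
	pthread_mutex_unlock(&ino->gen_lock);
	a.hash = ino->data * 31 + 7;	/* placeholder, not a real hash */
	pthread_mutex_unlock(&ino->appraisal_lock);

	return a;
}

/* Write the appraisal back unless it has gone stale in the meantime. */
static bool store_appraisal(struct inode_model *ino, const struct appraisal *a)
{
	bool fresh;

	pthread_mutex_lock(&ino->inode_lock);
	pthread_mutex_lock(&ino->gen_lock);
	fresh = (a->generation == ino->generation);
	if (fresh) {
		ino->xattr_hash = a->hash;
		ino->xattr_generation = a->generation;
		ino->xattr_valid = true;
	}
	pthread_mutex_unlock(&ino->gen_lock);
	pthread_mutex_unlock(&ino->inode_lock);

	return fresh;
}
```

In this model the generation lock is only ever taken last and is never held across the expensive step, so the invalidation path cannot deadlock against an appraisal in progress.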

   Linus


Re: [PATCH v1 06/14] tee: optee: add page list manipulation functions

2017-09-28 Thread Yury Norov
On Thu, Sep 28, 2017 at 09:04:03PM +0300, Volodymyr Babchuk wrote:
> From: Volodymyr Babchuk 
> 
> These functions will be used to pass information about shared
> buffers to OP-TEE.
> 
> Signed-off-by: Volodymyr Babchuk 
> ---
>  drivers/tee/optee/call.c  | 48 +++
>  drivers/tee/optee/optee_private.h |  4 
>  2 files changed, 52 insertions(+)
> 
> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> index f7b7b40..f8e044d 100644
> --- a/drivers/tee/optee/call.c
> +++ b/drivers/tee/optee/call.c
> @@ -11,6 +11,7 @@
>   * GNU General Public License for more details.
>   *
>   */
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -442,3 +443,50 @@ void optee_disable_shm_cache(struct optee *optee)
>   }
>   optee_cq_wait_final(&optee->call_queue, &w);
>  }
> +
> +/**
> + * optee_fill_pages_list() - write list of user pages to given shared
> + * buffer.
> + *
> + * @dst: page-aligned buffer where list of pages will be stored

I'm not very familiar with the subsystem you work on, but I don't
understand why the type of dst is u64 *. If it's just a buffer, it
should be void *. Also, if we assume this runs on arm, where pointers
are 32-bit, the result of page_to_phys() will be u32, and you will
waste half of your u64 array storing zeroes in this line:
*dst = page_to_phys(pages[i]);

> + * @pages: array of pages that represents shared buffer
> + * @num_pages: number of entries in @pages
> + *
> + * @dst should be big enough to hold list of user page addresses and
> + *   links to the next pages of buffer
> + */
> +void optee_fill_pages_list(u64 *dst, struct page **pages, size_t num_pages)
> +{
> + size_t i;
> +
> + /* TODO: add support for RichOS page sizes that != 4096 */
> + BUILD_BUG_ON(PAGE_SIZE != OPTEE_MSG_NONCONTIG_PAGE_SIZE);

RichOS stands for Linux? Why I am still not a rich OS developer? :)
This is the first occurrence of the term in kernel sources, please
explain it.

Also, I think that it would be more logical to add the dependency on
page size to Kconfig, not here, and move the comment there, so user
will be simply unable to build the whole module.

> + for (i = 0; i < num_pages; i++, dst++) {
> + /* Check if we are going to roll over the page boundary */
> + if (IS_ALIGNED((uintptr_t)(dst + 1),
> +OPTEE_MSG_NONCONTIG_PAGE_SIZE)) {
> + *dst = virt_to_phys(dst + 1);
> + dst++;
> + }

Is my understanding correct that @dst is not a simple array of buffer
page addresses?  Instead, it has a more complex structure: the first 511
records store buffer page entries, and the last one points to the next page
of dst. Is this documented somewhere? Also, did you consider creating a
header structure for the buffer page, like memory allocators do? You could
place the number of entries, a pointer to the next page, and maybe some flags
there. I think it would be more transparent, especially if we consider this a
communication protocol between independent software products.

> + *dst = page_to_phys(pages[i]);
> + }
> +}
> +
> +static size_t get_pages_array_size(size_t num_entries)
> +{
> + /* Number of user pages + number of pages to hold list of user pages */
> + return sizeof(u64) *
> + (num_entries + (sizeof(u64) * num_entries) /
> +  OPTEE_MSG_NONCONTIG_PAGE_SIZE);
> +}
> +
> +u64 *optee_allocate_pages_array(size_t num_entries)
> +{
> + return alloc_pages_exact(get_pages_array_size(num_entries), GFP_KERNEL);
> +}
> +
> +void optee_free_pages_array(void *array, size_t num_entries)
> +{
> + free_pages_exact(array, get_pages_array_size(num_entries));
> +}
> +
> diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
> index c374cd5..caa3c04 100644
> --- a/drivers/tee/optee/optee_private.h
> +++ b/drivers/tee/optee/optee_private.h
> @@ -165,6 +165,10 @@ int optee_from_msg_param(struct tee_param *params, size_t num_params,
>  int optee_to_msg_param(struct optee_msg_param *msg_params, size_t num_params,
>  const struct tee_param *params);
>  
> +u64 *optee_allocate_pages_array(size_t num_entries);
> +void optee_free_pages_array(void *array, size_t num_entries);
> +void optee_fill_pages_list(u64 *dst, struct page **pages, size_t num_pages);
> +
>  /*
>   * Small helpers
>   */
> -- 
> 2.7.4


Re: [PATCH 10/12] writeback: only allow one inflight and pending full flush

2017-09-28 Thread Jens Axboe
On 09/28/2017 11:44 PM, Linus Torvalds wrote:
> On Thu, Sep 28, 2017 at 2:41 PM, Andrew Morton
>  wrote:
>>
>> test_and_set_bit()?
> 
> If there aren't any atomicity concerns (either because of higher-level
> locking, or because racing and having two people set the bit is fine),
> it can be better to do them separately if the test_bit() is the common
> case and you can avoid dirtying a cacheline that way.
> 
> But yeah, if that is the case, it might be worth documenting, because
> test_and_set_bit() is the more obviously appropriate "there can be
> only one" model.

It is documented though, but maybe not well enough...

I've actually had to document/explain it enough times now, that it
might be worth making a general construct. Though it has to be
used carefully, so perhaps it's better contained as separate use
cases.

-- 
Jens Axboe



Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively

2017-09-28 Thread Mimi Zohar
On Thu, 2017-09-28 at 16:39 -0700, Linus Torvalds wrote:
> On Thu, Sep 28, 2017 at 3:02 PM, Dave Chinner  wrote:
> > On Thu, Sep 28, 2017 at 08:39:33AM -0400, Mimi Zohar wrote:
> >> Don't attempt to take the i_rwsem, if it has already been taken
> >> exclusively.
> >>
> >> Signed-off-by:  Mimi Zohar 
> >
> > That's bloody awful.
> >
> > The locking in filesystem IO paths is already complex enough without
> > adding a new IO path semantic that says "caller has already locked
> > the i_rwsem in some order and some dependencies that we have no idea
> > about".
> 
> I do have to admit that I never got a satisfactory answer on why IMA
> doesn't just use its own private per-inode lock for this all.
> 
> It isn't using the i_rwsem for file consistency reasons anyway, so it
> seems to be purely about serializing the actual signature generation
> with the xattr writing, but since IMA does those both, why isn't IMA
> just using its own lock (not the filesystem lock) to do that?

Originally IMA did define its own lock, prior to IMA-appraisal.  IMA-
appraisal introduced writing the file hash as an xattr, which required
taking the i_mutex.  process_measurement() and ima_file_free() took
the iint->mutex first and then the i_mutex, while setxattr, chmod and
chown took the locks in reverse order.  To resolve the potential
deadlock, the iint->mutex was eliminated.
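
The inversion described above is the usual AB/BA pattern. As a rough userspace sketch (pthread mutexes standing in for the kernel locks; `struct inode_like`, `lock_both()`, `measure()` and `set_xattr()` are hypothetical names), the alternative to removing one lock is to make every path take both locks in one fixed order. This is not what IMA did, since it dropped the iint->mutex; the sketch only illustrates why a single order avoids the deadlock.

```c
#include <pthread.h>

/* Hypothetical stand-ins for i_mutex and iint->mutex. */
struct inode_like {
	pthread_mutex_t i_mutex;    /* filesystem-level lock */
	pthread_mutex_t iint_mutex; /* IMA-private lock */
	int xattr_hash;
	int measured;
};

/* The single allowed order: i_mutex first, then iint_mutex. */
static void lock_both(struct inode_like *in)
{
	pthread_mutex_lock(&in->i_mutex);
	pthread_mutex_lock(&in->iint_mutex);
}

/* Release in the reverse order of acquisition. */
static void unlock_both(struct inode_like *in)
{
	pthread_mutex_unlock(&in->iint_mutex);
	pthread_mutex_unlock(&in->i_mutex);
}

/* Measurement-like path: uses the shared order, not iint -> i_mutex. */
static void measure(struct inode_like *in, int hash)
{
	lock_both(in);
	in->xattr_hash = hash;
	in->measured = 1;
	unlock_both(in);
}

/* setxattr-like path: same order, so it cannot invert against measure(). */
static void set_xattr(struct inode_like *in, int hash)
{
	lock_both(in);
	in->xattr_hash = hash;
	in->measured = 0;
	unlock_both(in);
}
```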

Mimi



Re: [PATCH 10/12] writeback: only allow one inflight and pending full flush

2017-09-28 Thread Jens Axboe
On 09/28/2017 11:41 PM, Andrew Morton wrote:
> On Wed, 27 Sep 2017 14:13:57 -0600 Jens Axboe  wrote:
> 
>> When someone calls wakeup_flusher_threads() or
>> wakeup_flusher_threads_bdi(), they schedule writeback of all dirty
>> pages in the system (or on that bdi). If we are tight on memory, we
>> can get tons of these queued from kswapd/vmscan. This causes (at
>> least) two problems:
>>
>> 1) We consume a ton of memory just allocating writeback work items.
>>We've seen as much as 600 million of these writeback work items
>>pending. That's a lot of memory to pointlessly hold hostage,
>>while the box is under memory pressure.
>>
>> 2) We spend so much time processing these work items, that we
>>introduce a softlockup in writeback processing. This is because
>>each of the writeback work items don't end up doing any work (it's
>>hard when you have millions of identical ones coming in to the
>>flush machinery), so we just sit in a tight loop pulling work
>>items and deleting/freeing them.
>>
>> Fix this by adding a 'start_all' bit to the writeback structure, and
>> set that when someone attempts to flush all dirty pages. The bit is
>> cleared when we start writeback on that work item. If the bit is
>> already set when we attempt to queue !nr_pages writeback, then we
>> simply ignore it.
>>
>> This provides us one full flush in flight, with one pending as well,
>> and makes for more efficient handling of this type of writeback.
>>
>> ...
>>
>> @@ -953,12 +954,27 @@ static void wb_start_writeback(struct bdi_writeback 
>> *wb, bool range_cyclic,
>>  return;
>>  
>>  /*
>> + * All callers of this function want to start writeback of all
>> + * dirty pages. Places like vmscan can call this at a very
>> + * high frequency, causing pointless allocations of tons of
>> + * work items and keeping the flusher threads busy retrieving
>> + * that work. Ensure that we only allow one of them pending and
>> + * inflight at the time. It doesn't matter if we race a little
>> + * bit on this, so use the faster separate test/set bit variants.
>> + */
>> +if (test_bit(WB_start_all, &wb->state))
>> +return;
>> +
>> +set_bit(WB_start_all, &wb->state);
> 
> test_and_set_bit()?

Like Linus says, this is done purposely. I've even included a bit about
it in the comment above, though maybe it's not clear enough. I've used
this trick in blk-mq quite a bit as well, and for high frequency calls,
it can make a substantial difference not to redirty that cache line if
you can avoid it.

If you do care about atomicity, this works really well too:

if (test_bit(bit, addr) || test_and_set_bit(bit, addr))
...

just to avoid the locked operation. Also see this commit:
commit 7fcbbaf18392f0b17c95e2f033c8ccf87eecde1d
Author: Jens Axboe 
Date:   Thu May 22 11:54:16 2014 -0700

mm/filemap.c: avoid always dirtying mapping->flags on O_DIRECT

where there are some actual numbers on a specific case.

For the case at hand, we don't even need to do the test_and_set
case, since we don't care about a small race there.
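
For readers outside the kernel, the two variants above can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel bitops: `flag_word`, `flag_test()`, `flag_test_and_set()`, `start_all_racy()` and `start_all_exact()` are hypothetical names, and `<stdatomic.h>` stands in for test_bit(), set_bit() and test_and_set_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a word of state bits such as wb->state. */
typedef struct {
	atomic_ulong bits;
} flag_word;

/* Plain load: does not write to (dirty) the cache line. */
static bool flag_test(flag_word *w, unsigned int bit)
{
	return (atomic_load_explicit(&w->bits, memory_order_relaxed) >> bit) & 1UL;
}

/* Atomic read-modify-write: returns the previous value of the bit. */
static bool flag_test_and_set(flag_word *w, unsigned int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(&w->bits, mask) & mask) != 0;
}

/*
 * Racy variant: returns true if the caller should queue the work.
 * Two racing callers can both see the bit clear and both return true.
 */
static bool start_all_racy(flag_word *w, unsigned int bit)
{
	if (flag_test(w, bit))
		return false;
	atomic_fetch_or(&w->bits, 1UL << bit);
	return true;
}

/*
 * Exact variant: cheap load first, read-modify-write only when the bit
 * looked clear, so exactly one racing caller gets true.
 */
static bool start_all_exact(flag_word *w, unsigned int bit)
{
	if (flag_test(w, bit) || flag_test_and_set(w, bit))
		return false;
	return true;
}
```

In both variants the common "bit already set" case is served by a plain load; they differ only in whether two racing callers can both win.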

-- 
Jens Axboe



[PATCH] extcon: Split out extcon header file for consumer and provider device

2017-09-28 Thread Chanwoo Choi
Extcon has two types of extcon devices, as follows.
- 'extcon provider device' adds a new extcon device and detects the
   state/properties of the external connector. It also notifies the
   extcon consumer device of the state/properties.
- 'extcon consumer device' gets the changed state/properties
   from the extcon provider device.
Until now, include/linux/extcon.h contained all exported APIs for
both provider and consumer device drivers. To clarify the meaning of
the header file and to remove the wrong use-case on consumer devices,
this patch splits it into extcon.h and extcon-provider.h.

[Description for include/linux/{extcon.h|extcon-provider.h}]
- extcon.h includes the extcon API and data structure for extcon consumer
  device driver. This header file contains the following APIs:
  : Register/unregister the notifier to catch the change of extcon device
  : Get the extcon device instance
  : Get the extcon device name
  : Get the state of each external connector
  : Get the property value of each external connector
  : Get the property capability of each external connector

- extcon-provider.h includes the extcon API and data structure for extcon
  provider device driver. This header file contains the following APIs:
  : Include 'include/linux/extcon.h'
  : Allocate the memory for extcon device instance
  : Register/unregister extcon device
  : Set the state of each external connector
  : Set the property value of each external connector
  : Set the property capability of each external connector

Cc: Felipe Balbi 
Cc: Kishon Vijay Abraham I 
Cc: Greg Kroah-Hartman 
Cc: Sebastian Reichel 
Cc: Lee Jones 
Signed-off-by: Chanwoo Choi 
---
 drivers/extcon/extcon-adc-jack.c  |   2 +-
 drivers/extcon/extcon-arizona.c   |   2 +-
 drivers/extcon/extcon-axp288.c|   2 +-
 drivers/extcon/extcon-gpio.c  |   2 +-
 drivers/extcon/extcon-intel-cht-wc.c  |   2 +-
 drivers/extcon/extcon-intel-int3496.c |   2 +-
 drivers/extcon/extcon-max14577.c  |   2 +-
 drivers/extcon/extcon-max3355.c   |   2 +-
 drivers/extcon/extcon-max77693.c  |   2 +-
 drivers/extcon/extcon-max77843.c  |   2 +-
 drivers/extcon/extcon-max8997.c   |   2 +-
 drivers/extcon/extcon-qcom-spmi-misc.c|   2 +-
 drivers/extcon/extcon-rt8973a.c   |   2 +-
 drivers/extcon/extcon-sm5502.c|   2 +-
 drivers/extcon/extcon-usb-gpio.c  |   2 +-
 drivers/extcon/extcon-usbc-cros-ec.c  |   2 +-
 drivers/extcon/extcon.h   |   2 +-
 drivers/phy/allwinner/phy-sun4i-usb.c |   2 +-
 drivers/phy/broadcom/phy-bcm-ns2-usbdrd.c |   2 +-
 drivers/phy/renesas/phy-rcar-gen3-usb2.c  |   2 +-
 drivers/phy/rockchip/phy-rockchip-inno-usb2.c |   2 +-
 drivers/power/supply/qcom_smbb.c  |   2 +-
 drivers/usb/gadget/udc/renesas_usb3.c |   2 +-
 drivers/usb/phy/phy-tahvo.c   |   2 +-
 drivers/usb/renesas_usbhs/common.h|   2 +-
 include/linux/extcon-provider.h   | 142 ++
 include/linux/extcon.h| 109 +---
 include/linux/mfd/palmas.h|   2 +-
 28 files changed, 173 insertions(+), 130 deletions(-)
 create mode 100644 include/linux/extcon-provider.h

diff --git a/drivers/extcon/extcon-adc-jack.c b/drivers/extcon/extcon-adc-jack.c
index 6f6537ab0a79..3877d86c746a 100644
--- a/drivers/extcon/extcon-adc-jack.c
+++ b/drivers/extcon/extcon-adc-jack.c
@@ -26,7 +26,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 /**
  * struct adc_jack_data - internal data for adc_jack device driver
diff --git a/drivers/extcon/extcon-arizona.c b/drivers/extcon/extcon-arizona.c
index f84da4a17724..da0e9bc4262f 100644
--- a/drivers/extcon/extcon-arizona.c
+++ b/drivers/extcon/extcon-arizona.c
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 
diff --git a/drivers/extcon/extcon-axp288.c b/drivers/extcon/extcon-axp288.c
index f4fd03e58e37..981fba56bc18 100644
--- a/drivers/extcon/extcon-axp288.c
+++ b/drivers/extcon/extcon-axp288.c
@@ -22,7 +22,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/extcon/extcon-gpio.c b/drivers/extcon/extcon-gpio.c
index ebed22f22d75..ab770adcca7e 100644
--- a/drivers/extcon/extcon-gpio.c
+++ b/drivers/extcon/extcon-gpio.c
@@ -17,7 +17,7 @@
  * GNU General Public License for more details.
  */
 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/extcon/extcon-intel-cht-wc.c 
b/drivers/extcon/extcon-intel-cht-wc.c
index 91a0023074af..7c4bc8c44c3f 100644
--- a/drivers/extcon/extcon-intel-cht-wc.c
+++ b/drivers/extcon/extcon-intel-cht-wc.c
@@ -15,7 +15,7 @@
  * more details.
  */
 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/extcon/extcon-intel-int3496.c 
b/drivers/extcon/e

Re: [PATCH V4] r8152: add Linksys USB3GIGV1 id

2017-09-28 Thread Doug Anderson
Grant,

On Thu, Sep 28, 2017 at 11:35 AM, Grant Grundler  wrote:
> This linksys dongle by default comes up in cdc_ether mode.
> This patch allows r8152 to claim the device:
>Bus 002 Device 002: ID 13b1:0041 Linksys
>
> Signed-off-by: Grant Grundler 
> ---
>  drivers/net/usb/cdc_ether.c | 10 ++
>  drivers/net/usb/r8152.c |  2 ++
>  2 files changed, 12 insertions(+)

This seems nice to me now.  Thanks for all the fixes!  I'm no expert
in this area, but as far as I know this is ready to go now, so FWIW:

Reviewed-by: Douglas Anderson 


Re: [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held

2017-09-28 Thread Govindarajulu Varadarajan

On Thu, 28 Sep 2017, Sinan Kaya wrote:

> On 9/27/2017 5:42 PM, Govindarajulu Varadarajan wrote:
>> +void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
>> +   void *userdata);
>
> pci_walk_bus_locked would be a better name as you are assuming that caller is
> holding the lock.

Will change and resubmit for v2.



Re: [PATCH] x86/asm: Fix inline asm call constraints for GCC 4.4

2017-09-28 Thread Linus Torvalds
On Thu, Sep 28, 2017 at 2:58 PM, Josh Poimboeuf  wrote:
>
> Reported-by: kernel test robot 
> Fixes: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang")
> Signed-off-by: Josh Poimboeuf 

Side note: it's not like I personally need the credit, but in general
I really want people to pick up on who debugged the code and pointed
to the solution. That's often more of the work than the fix itself.

The kernel test robot report looked to be ignored as a "gcc-4.4 is too
old to worry about" thing. People who then step up and analyze the
problem are rare as it is. They need to be credited in the commit
logs.

We don't have any fixed format for that, but it's pretty free-form. So
we have tags like

  Root-caused-by:
  Diagnosed-by:
  Analyzed-by:
  Debugged-by:
  Bisected-by:
  Fix-suggested-by:

etc for giving credit to people who figured out some part of a bug
(and, having grepped for this, we also have a _shitload_ of miss-spellings
of various things ;)

 Linus


Re: [PATCH V2] r8152: add Linksys USB3GIGV1 id

2017-09-28 Thread Doug Anderson
Hi,

On Thu, Sep 28, 2017 at 3:28 PM, Rustad, Mark D  wrote:
>
>> On Sep 27, 2017, at 9:39 AM, Grant Grundler  wrote:
>>
>> On Wed, Sep 27, 2017 at 12:15 AM, Oliver Neukum  wrote:
>>> On Tuesday, 26.09.2017, 08:19 -0700, Doug Anderson wrote:
>>>>
>>>> I know that for at least some of the adapters in the CDC Ethernet
>>>> blacklist it was claimed that the CDC Ethernet support in the adapter
>>>> was kinda broken anyway so the blacklist made sense.  ...but for the
>>>> Linksys Gigabit adapter the CDC Ethernet driver seems to work OK, it's
>>>> just not quite as full featured / efficient as the R8152 driver.
>>>>
>>>> Is that not a concern?  I guess you could tell people in this
>>>> situation that they simply need to enable the R8152 driver to get
>>>> continued support for their Ethernet adapter?
>>>
>>> Hi,
>>>
>>> yes, it is a valid concern. An #ifdef will be needed.
>>
>> Good idea - I will post V3 shortly.
>>
>> I'm assuming you mean to add #ifdef CONFIG_USB_RTL8152 around the
>> blacklist entry in cdc_ether driver.
>
> Shouldn't that be an #if IS_ENABLED(...) test, since that seems to be the
> proper way to check configured drivers?

Yes, I had the same feedback on v3.  See my comments at
.  Grant has fixed it in
v4.  Please see .  :)
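
The difference between the two checks can be reproduced outside the kernel tree. The sketch below simulates a configuration where the driver is built as a module (for `=m`, kconfig defines only the `_MODULE` symbol) and uses a simplified copy of the helper macros from include/linux/kconfig.h; `ifdef_sees_driver()` and `is_enabled_sees_driver()` are hypothetical names used only for this illustration.

```c
/* Simulate CONFIG_USB_RTL8152=m: only the _MODULE symbol is defined. */
#define CONFIG_USB_RTL8152_MODULE 1

/* Simplified copy of the helpers in include/linux/kconfig.h. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x) ___is_defined(x)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define IS_BUILTIN(option) __is_defined(option)
#define IS_MODULE(option) __is_defined(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* What a plain #ifdef on the base symbol sees for a modular driver. */
static int ifdef_sees_driver(void)
{
#ifdef CONFIG_USB_RTL8152
	return 1;
#else
	return 0;
#endif
}

/* What IS_ENABLED() sees: true for both built-in and modular. */
static int is_enabled_sees_driver(void)
{
#if IS_ENABLED(CONFIG_USB_RTL8152)
	return 1;
#else
	return 0;
#endif
}
```

With the driver configured as a module, the `#ifdef` version compiles out the guarded code while the `IS_ENABLED()` version keeps it, which is why the latter is the better fit for a blacklist entry that depends on another driver being available.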

-Doug


Re: [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig

2017-09-28 Thread Govindarajulu Varadarajan

On Thu, 28 Sep 2017, Peter Zijlstra wrote:

> On Wed, Sep 27, 2017 at 02:42:20PM -0700, Govindarajulu Varadarajan wrote:
>> Make MAX_LOCK_DEPTH configurable. It is set to 48 right now. Number of
>> VFs under a PCI pf bus can exceed 48 and this disables lockdep.
>>
>> lockdep currently allows max of 63 held_locks.
>
> But why a config knob? Why not just raise the number to 64
> unconditionally? And is that sufficient; you only state 48 is
> insufficient, you don't actually state the VF limit.

I did not want to change the default configuration for everyone.

I will change it to 63 unconditionally in v2 and resubmit the series.


Re: [PATCH INTERNAL 1/1] ASoC: cygnus: Remove set_fmt from SPDIF dai ops

2017-09-28 Thread Lori Hikichi


On 9/28/2017 3:29 PM, Lori Hikichi wrote:
> The SPDIF port cannot modify its format so a set_fmt function is not
> needed. Previously, we used a generic set_fmt for all ports and returned
> an error code for the SPDIF port. It is cleaner to not populate the
> set_fmt field.
>
> Signed-off-by: Lori Hikichi 
> ---
>  sound/soc/bcm/cygnus-ssp.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/sound/soc/bcm/cygnus-ssp.c b/sound/soc/bcm/cygnus-ssp.c
> index 15c438f..b9a3bb8d 100644
> --- a/sound/soc/bcm/cygnus-ssp.c
> +++ b/sound/soc/bcm/cygnus-ssp.c
> @@ -1136,6 +1136,13 @@ static int cygnus_ssp_resume(struct snd_soc_dai 
> *cpu_dai)
>   .set_tdm_slot   = cygnus_set_dai_tdm_slot,
>  };
>  
> +static const struct snd_soc_dai_ops cygnus_spdif_dai_ops = {
> + .startup= cygnus_ssp_startup,
> + .shutdown   = cygnus_ssp_shutdown,
> + .trigger= cygnus_ssp_trigger,
> + .hw_params  = cygnus_ssp_hw_params,
> + .set_sysclk = cygnus_ssp_set_sysclk,
> +};
>  
>  #define INIT_CPU_DAI(num) { \
>   .name = "cygnus-ssp" #num, \
> @@ -1174,7 +1181,7 @@ static int cygnus_ssp_resume(struct snd_soc_dai 
> *cpu_dai)
>   .formats = SNDRV_PCM_FMTBIT_S16_LE |
>   SNDRV_PCM_FMTBIT_S32_LE,
>   },
> - .ops = &cygnus_ssp_dai_ops,
> + .ops = &cygnus_spdif_dai_ops,
>   .suspend = cygnus_ssp_suspend,
>   .resume = cygnus_ssp_resume,
>  };
Please ignore this patch. It was accidentally included in the patch set and is a
duplicate of another patch already in the patch set.


Re: [PATCH 3/4] pci aer: fix deadlock in do_recovery

2017-09-28 Thread Govindarajulu Varadarajan

On Thu, 28 Sep 2017, Sinan Kaya wrote:

> On 9/27/2017 5:42 PM, Govindarajulu Varadarajan wrote:
>> CPU0                                    CPU1
>> ---------------------------------------------------------------------
>> __driver_attach()
>> device_lock(&dev->mutex) <--- device mutex lock here
>> driver_probe_device()
>> pci_enable_sriov()
>> pci_iov_add_virtfn()
>> pci_device_add()
>>                                         aer_isr()               <--- pci aer error
>>                                         do_recovery()
>>                                         broadcast_error_message()
>>                                         pci_walk_bus()
>>                                         down_read(&pci_bus_sem) <--- rd sem
>
> How about releasing the device_lock here on CPU0?

pci_device_add() is called by driver's pci probe function. device_lock(dev)
should be held before calling pci driver probe function.

> or in other words keep device_lock as short as possible?

The problem is not the duration device_lock is held. It is the order in which
the two locks are acquired. We cannot control or implement a restriction that,
while device_lock() is held, driver probe should not call a pci function which
acquires pci_bus_sem. And in the case of pci aer, the aer handler needs to call
the driver err_handler(), for which we need to hold device_lock() before
calling err_handler(). In order to find all the devices on a pci bus, we should
hold pci_bus_sem to do pci_walk_bus().

>> down_write(&pci_bus_sem) <-- stuck on wr sem
>>                                         report_error_detected()
>>                                         device_lock(&dev->mutex)<--- DEAD LOCK




Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively

2017-09-28 Thread Linus Torvalds
On Thu, Sep 28, 2017 at 3:02 PM, Dave Chinner  wrote:
> On Thu, Sep 28, 2017 at 08:39:33AM -0400, Mimi Zohar wrote:
>> Don't attempt to take the i_rwsem, if it has already been taken
>> exclusively.
>>
>> Signed-off-by:  Mimi Zohar 
>
> That's bloody awful.
>
> The locking in filesystem IO paths is already complex enough without
> adding a new IO path semantic that says "caller has already locked
> the i_rwsem in some order and some dependencies that we have no idea
> about".

I do have to admit that I never got a satisfactory answer on why IMA
doesn't just use its own private per-inode lock for this all.

It isn't using the i_rwsem for file consistency reasons anyway, so it
seems to be purely about serializing the actual signature generation
with the xattr writing, but since IMA does those both, why isn't IMA
just using its own lock (not the filesystem lock) to do that?

 Linus


Re: [REGRESSION] Warning in tcp_fastretrans_alert() of net/ipv4/tcp_input.c

2017-09-28 Thread Yuchung Cheng
On Thu, Sep 28, 2017 at 1:14 AM, Oleksandr Natalenko
 wrote:
> Hi.
>
> Won't tell about panic in tcp_sacktag_walk() since I cannot trigger it
> intentionally, but setting net.ipv4.tcp_retrans_collapse to 0 *does not* fix
> warning in tcp_fastretrans_alert() for me.

Hi Oleksandr: no retrans_collapse should not matter for that warning
in tcp_fastretrans_alert(). The warning, as I explained earlier, is
likely false. Neal and I are more concerned the panic in
tcp_sacktag_walk. This is just a blind shot but thx for retrying.

We can submit a one-liner to remove the fast retrans warning but want
to nail the bigger issue first.

>
> On Wednesday, 27 September 2017 2:18:32 CEST Yuchung Cheng wrote:
>> On Tue, Sep 26, 2017 at 5:12 PM, Yuchung Cheng  wrote:
>> > On Tue, Sep 26, 2017 at 6:10 AM, Roman Gushchin  wrote:
>> >>> On Wed, Sep 20, 2017 at 6:46 PM, Roman Gushchin  wrote:
>> >>> > > Hello.
>> >>> > >
>> >>> > > Since, IIRC, v4.11, there is some regression in TCP stack resulting
>> >>> > > in the
>> >>> > > warning shown below. Most of the time it is harmless, but rarely it
>> >>> > > just
>> >>> > > causes either freeze or (I believe, this is related too) panic in
>> >>> > > tcp_sacktag_walk() (because sk_buff passed to this function is
>> >>> > > NULL).
>> >>> > > Unfortunately, I still do not have proper stacktrace from panic, but
>> >>> > > will try to capture it if possible.
>> >>> > >
>> >>> > > Also, I have custom settings regarding TCP stack, shown below as
>> >>> > > well. ifb is used to shape traffic with tc.
>> >>> > >
>> >>> > > Please note this regression was already reported as BZ [1] and as a
>> >>> > > letter to ML [2], but got neither attention nor resolution. It is
>> >>> > > reproducible for (not only) me on my home router since v4.11 till
>> >>> > > v4.13.1 incl.
>> >>> > >
>> >>> > > Please advise on how to deal with it. I'll provide any additional
>> >>> > > info if
>> >>> > > necessary, also ready to test patches if any.
>> >>> > >
>> >>> > > Thanks.
>> >>> > >
>> >>> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=195835
>> >>> > > [2] https://www.spinics.net/lists/netdev/msg436158.html
>> >>> >
>> >>> > We're experiencing the same problems on some machines in our fleet.
>> >>> > Exactly the same symptoms: tcp_fastretrans_alert() warnings and
>> >>> > sometimes panics in tcp_sacktag_walk().
>> >>
>> >>> > Here is an example of a backtrace with the panic log:
>> >> Hi Yuchung!
>> >>
>> >>> do you still see the panics if you disable RACK?
>> >>> sysctl net.ipv4.tcp_recovery=0?
>> >>
>> >> No, we haven't seen any crash since that.
>> >
>> > I am out of ideas how RACK can potentially cause tcp_sacktag_walk to
>> > take an empty skb :-( Do you have stack trace or any hint on which call
>> > to tcp_sacktag_walk triggered the panic? Internally at Google we never
>> > see that.
>>
>> hmm something just struck me: could you try
>> sysctl net.ipv4.tcp_recovery=1 net.ipv4.tcp_retrans_collapse=0
>> and see if kernel still panics on sack processing?
>>
>> >>> also have you experience any sack reneg? could you post the output of
>> >>> ' nstat |grep -i TCP' thanks
>> >>
>> >> hostname TcpActiveOpens          2289680      0.0
>> >> hostname TcpPassiveOpens         3592758      0.0
>> >> hostname TcpAttemptFails         746910       0.0
>> >> hostname TcpEstabResets          154988       0.0
>> >> hostname TcpInSegs               16258678255  0.0
>> >> hostname TcpOutSegs              46967011611  0.0
>> >> hostname TcpRetransSegs          13724310     0.0
>> >> hostname TcpInErrs               2            0.0
>> >> hostname TcpOutRsts              9418798      0.0
>> >> hostname TcpExtEmbryonicRsts     2303         0.0
>> >> hostname TcpExtPruneCalled       90192        0.0
>> >> hostname TcpExtOfoPruned         57274        0.0
>> >> hostname TcpExtOutOfWindowIcmps  3            0.0
>> >> hostname TcpExtTW                1164705      0.0
>> >> hostname TcpExtTWRecycled        2            0.0
>> >> hostname TcpExtPAWSEstab         159          0.0
>> >> hostname TcpExtDelayedACKs       209207209    0.0
>> >> hostname TcpExtDelayedACKLocked  508571       0.0
>> >> hostname TcpExtDelayedACKLost    1713248      0.0
>> >> hostname TcpExtListenOverflows   625          0.0
>> >> hostname TcpExtListenDrops       625          0.0
>> >> hostnam

linux-next: Signed-off-by misspelt for commit in the akpm-current tree

2017-09-28 Thread Stephen Rothwell
Hi Andrew,

Commit

  0be0a6eba9e3 ("z3fold: fix stale list handling")

has a misspelt Signed-off-by for its author.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH v4 2/2] ACPI / CPPC: Make cppc acpi driver aware of pcc subspace ids

2017-09-28 Thread Prakash, Prashanth
Hi George,

On 9/19/2017 11:24 PM, George Cherian wrote:
> Based on ACPI 6.2 Section 8.4.7.1.9 If the PCC register space is used,
> all PCC registers, for all processors in the same performance
> domain (as defined by _PSD), must be defined to be in the same subspace.
> Based on Section 14.1 of ACPI specification, it is possible to have a
> maximum of 256 PCC subspace ids. Add support of multiple PCC subspace id
> instead of using a single global pcc_data structure.
>
> While at that fix the time_delta check in send_pcc_cmd() so that
> last_mpar_reset and mpar_count is initialized properly.
>
> Signed-off-by: George Cherian 
> ---
>  drivers/acpi/cppc_acpi.c | 243 
> +--
>  1 file changed, 153 insertions(+), 90 deletions(-)
>
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index e5b47f0..3ae79ef 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -75,13 +75,16 @@ struct cppc_pcc_data {
>  
>   /* Wait queue for CPUs whose requests were batched */
>   wait_queue_head_t pcc_write_wait_q;
> + ktime_t last_cmd_cmpl_time;
> + ktime_t last_mpar_reset;
> + int mpar_count;
> + int refcount;
>  };
>  
> -/* Structure to represent the single PCC channel */
> -static struct cppc_pcc_data pcc_data = {
> - .pcc_subspace_idx = -1,
> - .platform_owns_pcc = true,
> -};
> +/* Array  to represent the PCC channel per subspace id */
> +static struct cppc_pcc_data *pcc_data[MAX_PCC_SUBSPACES];
> +/* The cpu_pcc_subspace_idx containsper CPU subspace id */
> +static DEFINE_PER_CPU(int, cpu_pcc_subspace_idx);
>  
>  /*
>   * The cpc_desc structure contains the ACPI register details
> @@ -93,7 +96,8 @@ static struct cppc_pcc_data pcc_data = {
>  static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr);
>  
>  /* pcc mapped address + header size + offset within PCC subspace */
> -#define GET_PCC_VADDR(offs) (pcc_data.pcc_comm_addr + 0x8 + (offs))
> +#define GET_PCC_VADDR(offs, pcc_ss_id) (pcc_data[pcc_ss_id]->pcc_comm_addr + 
> \
> + 0x8 + (offs))
>  
>  /* Check if a CPC register is in PCC */
>  #define CPC_IN_PCC(cpc) ((cpc)->type == ACPI_TYPE_BUFFER &&  \
> @@ -188,13 +192,16 @@ static struct kobj_type cppc_ktype = {
>   .default_attrs = cppc_attrs,
>  };
>  
> -static int check_pcc_chan(bool chk_err_bit)
> +static int check_pcc_chan(int pcc_ss_id, bool chk_err_bit)
>  {
>   int ret = -EIO, status = 0;
> - struct acpi_pcct_shared_memory __iomem *generic_comm_base = 
> pcc_data.pcc_comm_addr;
> - ktime_t next_deadline = ktime_add(ktime_get(), pcc_data.deadline);
> + struct cppc_pcc_data *pcc_ss_data = pcc_data[pcc_ss_id];
> + struct acpi_pcct_shared_memory __iomem *generic_comm_base =
> + pcc_ss_data->pcc_comm_addr;
> + ktime_t next_deadline = ktime_add(ktime_get(),
> +   pcc_ss_data->deadline);
>  
> - if (!pcc_data.platform_owns_pcc)
> + if (!pcc_ss_data->platform_owns_pcc)
>   return 0;
>  
>   /* Retry in case the remote processor was too slow to catch up. */
> @@ -219,7 +226,7 @@ static int check_pcc_chan(bool chk_err_bit)
>   }
>  
>   if (likely(!ret))
> - pcc_data.platform_owns_pcc = false;
> + pcc_ss_data->platform_owns_pcc = false;
>   else
>   pr_err("PCC check channel failed. Status=%x\n", status);
>  
> @@ -230,13 +237,12 @@ static int check_pcc_chan(bool chk_err_bit)
>   * This function transfers the ownership of the PCC to the platform
>   * So it must be called while holding write_lock(pcc_lock)
>   */
> -static int send_pcc_cmd(u16 cmd)
> +static int send_pcc_cmd(int pcc_ss_id, u16 cmd)
>  {
>   int ret = -EIO, i;
> + struct cppc_pcc_data *pcc_ss_data = pcc_data[pcc_ss_id];
>   struct acpi_pcct_shared_memory *generic_comm_base =
> - (struct acpi_pcct_shared_memory *) pcc_data.pcc_comm_addr;
> - static ktime_t last_cmd_cmpl_time, last_mpar_reset;
> - static int mpar_count;
> + (struct acpi_pcct_shared_memory *)pcc_ss_data->pcc_comm_addr;
>   unsigned int time_delta;
>  
>   /*
> @@ -249,24 +255,25 @@ static int send_pcc_cmd(u16 cmd)
>* before write completion, so first send a WRITE command to
>* platform
>*/
> - if (pcc_data.pending_pcc_write_cmd)
> - send_pcc_cmd(CMD_WRITE);
> + if (pcc_ss_data->pending_pcc_write_cmd)
> + send_pcc_cmd(pcc_ss_id, CMD_WRITE);
>  
> - ret = check_pcc_chan(false);
> + ret = check_pcc_chan(pcc_ss_id, false);
>   if (ret)
>   goto end;
>   } else /* CMD_WRITE */
> - pcc_data.pending_pcc_write_cmd = FALSE;
> + pcc_ss_data->pending_pcc_write_cmd = FALSE;
>  
>   /*
>* Handle the Minimum Request Turnaround Time(MRTT

[PATCH v2 00/10] ARM: bcm: Add support for Broadcom Hurricane 2 SoC

2017-09-28 Thread Florian Fainelli
Hi all,

This patch series adds basic (boot to prompt with essential peripherals
working) support for Broadcom's Hurricane 2 SoC which is found in switching
applications.

This is also an iProc-family chip with a number of variations, including
some in the clock controller that I have not been able to identify yet.

Changes in v2:

- fixed DTC warnings spotted with make dtbs W=1
- added Jon's ack

Florian Fainelli (10):
  MAINTAINERS: Update Broadcom iProc regexp with Hurricane 2
  dt-bindings: Add documentation for Broadcom Hurricane 2 SoCs
  ARM: bcm: Add support for Broadcom Hurricane 2 SoC
  dt-bindings: Document Broadcom Hurricane 2 clocks
  clk: bcm: Add Broadcom Hurricane 2 clock support
  ARM: dts: Add Broadcom Hurricane 2 DTS include file
  ARM: debug: Add Hurricane 2 UART2 debug addresses
  dt-bindings: Add Ubiquiti Networks vendor prefix
  ARM: dts: Hurricane 2: Add basic support for Ubiquiti UniFi Switch 8
  ARM: multi_v7_defconfig: Enable CONFIG_ARCH_BCM_HR2

 .../devicetree/bindings/arm/bcm/brcm,hr2.txt   |  14 +
 .../bindings/clock/brcm,iproc-clocks.txt   |  14 +
 .../devicetree/bindings/vendor-prefixes.txt|   1 +
 MAINTAINERS|   1 +
 arch/arm/Kconfig.debug |  10 +-
 arch/arm/boot/dts/Makefile |   2 +
 arch/arm/boot/dts/bcm-hr2.dtsi | 368 +
 arch/arm/boot/dts/bcm53340-ubnt-unifi-switch8.dts  |  85 +
 arch/arm/configs/multi_v7_defconfig|   1 +
 arch/arm/mach-bcm/Kconfig  |   9 +
 arch/arm/mach-bcm/Makefile |   3 +
 arch/arm/mach-bcm/bcm_hr2.c|  25 ++
 drivers/clk/bcm/Kconfig|   9 +
 drivers/clk/bcm/Makefile   |   1 +
 drivers/clk/bcm/clk-hr2.c  |  27 ++
 15 files changed, 569 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/arm/bcm/brcm,hr2.txt
 create mode 100644 arch/arm/boot/dts/bcm-hr2.dtsi
 create mode 100644 arch/arm/boot/dts/bcm53340-ubnt-unifi-switch8.dts
 create mode 100644 arch/arm/mach-bcm/bcm_hr2.c
 create mode 100644 drivers/clk/bcm/clk-hr2.c

-- 
2.14.1



[PATCH 2/3] Arm: dts: stm32: remove extra compatible string from DT & driver

2017-09-28 Thread Vikas Manocha
This patch removes the extra compatible string "st,stm32-usart" from
the driver & device tree.

Signed-off-by: Vikas Manocha 
Reviewed-by: Patrice Chotard 
---
 arch/arm/boot/dts/stm32f429.dtsi | 12 ++--
 arch/arm/boot/dts/stm32f746.dtsi | 12 ++--
 arch/arm/boot/dts/stm32h743.dtsi |  4 ++--
 drivers/tty/serial/stm32-usart.c |  3 ---
 4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/arch/arm/boot/dts/stm32f429.dtsi b/arch/arm/boot/dts/stm32f429.dtsi
index dd7e99b..5d6bfdf 100644
--- a/arch/arm/boot/dts/stm32f429.dtsi
+++ b/arch/arm/boot/dts/stm32f429.dtsi
@@ -315,7 +315,7 @@
};
 
usart2: serial@40004400 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40004400 0x400>;
interrupts = <38>;
clocks = <&rcc 0 STM32F4_APB1_CLOCK(UART2)>;
@@ -323,7 +323,7 @@
};
 
usart3: serial@40004800 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40004800 0x400>;
interrupts = <39>;
clocks = <&rcc 0 STM32F4_APB1_CLOCK(UART3)>;
@@ -387,7 +387,7 @@
};
 
usart7: serial@40007800 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40007800 0x400>;
interrupts = <82>;
clocks = <&rcc 0 STM32F4_APB1_CLOCK(UART7)>;
@@ -395,7 +395,7 @@
};
 
usart8: serial@40007c00 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40007c00 0x400>;
interrupts = <83>;
clocks = <&rcc 0 STM32F4_APB1_CLOCK(UART8)>;
@@ -445,7 +445,7 @@
};
 
usart1: serial@40011000 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40011000 0x400>;
interrupts = <37>;
clocks = <&rcc 0 STM32F4_APB2_CLOCK(USART1)>;
@@ -456,7 +456,7 @@
};
 
usart6: serial@40011400 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40011400 0x400>;
interrupts = <71>;
clocks = <&rcc 0 STM32F4_APB2_CLOCK(USART6)>;
diff --git a/arch/arm/boot/dts/stm32f746.dtsi b/arch/arm/boot/dts/stm32f746.dtsi
index 5633860..5f94178 100644
--- a/arch/arm/boot/dts/stm32f746.dtsi
+++ b/arch/arm/boot/dts/stm32f746.dtsi
@@ -136,7 +136,7 @@
};
 
usart2: serial@40004400 {
-   compatible = "st,stm32f7-usart", "st,stm32f7-uart";
+   compatible = "st,stm32f7-uart";
reg = <0x40004400 0x400>;
interrupts = <38>;
clocks = <&rcc 1 CLK_USART2>;
@@ -144,7 +144,7 @@
};
 
usart3: serial@40004800 {
-   compatible = "st,stm32f7-usart", "st,stm32f7-uart";
+   compatible = "st,stm32f7-uart";
reg = <0x40004800 0x400>;
interrupts = <39>;
clocks = <&rcc 1 CLK_USART3>;
@@ -177,7 +177,7 @@
};
 
usart7: serial@40007800 {
-   compatible = "st,stm32f7-usart", "st,stm32f7-uart";
+   compatible = "st,stm32f7-uart";
reg = <0x40007800 0x400>;
interrupts = <82>;
clocks = <&rcc 1 CLK_UART7>;
@@ -185,7 +185,7 @@
};
 
usart8: serial@40007c00 {
-   compatible = "st,stm32f7-usart", "st,stm32f7-uart";
+   compatible = "st,stm32f7-uart";
reg = <0x40007c00 0x400>;
interrupts = <83>;
clocks = <&rcc 1 CLK_UART8>;
@@ -193,7 +193,7 @@
};
 
usart1: serial@40011000 {
-   compatible = "st,stm32f7-usart", "st,stm32f7-uart";
+   compatible = "st,stm32f7-uart";
reg = <0x40011000 0x400>;
interrupts = <37>;
clocks = <&rcc 1 CLK_USART1>;
@@ -201,7 +201,7 @@
};
 
usart6: serial@40011400 {
-   compatible = "st,stm32f7-usart", "st,stm32f7-uart";
+   

[PATCH v2 02/10] dt-bindings: Add documentation for Broadcom Hurricane 2 SoCs

2017-09-28 Thread Florian Fainelli
Add binding documentation for the Broadcom Hurricane 2 SoCs used in
switching control planes.

Acked-by: Jon Mason 
Signed-off-by: Florian Fainelli 
---
 Documentation/devicetree/bindings/arm/bcm/brcm,hr2.txt | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/bcm/brcm,hr2.txt

diff --git a/Documentation/devicetree/bindings/arm/bcm/brcm,hr2.txt b/Documentation/devicetree/bindings/arm/bcm/brcm,hr2.txt
new file mode 100644
index ..a124c7fc4dcd
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/bcm/brcm,hr2.txt
@@ -0,0 +1,14 @@
+Broadcom Hurricane 2 device tree bindings
+---
+
+Broadcom Hurricane 2 family of SoCs are used for switching control. These SoCs
+are based on Broadcom's iProc SoC architecture and feature a single-core ARM
+Cortex-A9 CPU, DDR2/DDR3 memory, PCIe GEN-2, USB 2.0 and USB 3.0, serial and
+NAND flash and a PCIe attached integrated switching engine.
+
+Boards with Hurricane SoCs shall have the following properties:
+
+Required root node property:
+
+BCM53342
+compatible = "brcm,bcm53342", "brcm,hr2";
-- 
2.14.1



[PATCH v2 06/10] ARM: dts: Add Broadcom Hurricane 2 DTS include file

2017-09-28 Thread Florian Fainelli
Describe the Broadcom Hurricane 2 SoC comprised of a Cortex-A9 CPU
complex along with standard iProc peripherals:

* timers
* SPI controller
* NAND controller
* a single AMAC (Ethernet MAC controller)
* dual PCIe controllers

The design is largely similar to existing iProc-based SoCs such as
Northstar Plus.

Acked-by: Jon Mason 
Signed-off-by: Florian Fainelli 
---
 arch/arm/boot/dts/bcm-hr2.dtsi | 368 +
 1 file changed, 368 insertions(+)
 create mode 100644 arch/arm/boot/dts/bcm-hr2.dtsi

diff --git a/arch/arm/boot/dts/bcm-hr2.dtsi b/arch/arm/boot/dts/bcm-hr2.dtsi
new file mode 100644
index ..3f9cedd8011f
--- /dev/null
+++ b/arch/arm/boot/dts/bcm-hr2.dtsi
@@ -0,0 +1,368 @@
+/*
+ *  BSD LICENSE
+ *
+ *  Copyright(c) 2017 Broadcom.  All rights reserved.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ ** Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ ** Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in
+ *  the documentation and/or other materials provided with the
+ *  distribution.
+ ** Neither the name of Broadcom Corporation nor the names of its
+ *  contributors may be used to endorse or promote products derived
+ *  from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+/ {
+   compatible = "brcm,hr2";
+   model = "Broadcom Hurricane 2 SoC";
+   interrupt-parent = <&gic>;
+   #address-cells = <1>;
+   #size-cells = <1>;
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   cpu0: cpu@0 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a9";
+   next-level-cache = <&L2>;
+   reg = <0x0>;
+   };
+   };
+
+   pmu {
+   compatible = "arm,cortex-a9-pmu";
+   interrupts = ;
+   interrupt-affinity = <&cpu0>;
+   };
+
+   mpcore@1900 {
+   compatible = "simple-bus";
+   ranges = <0x 0x1900 0x00023000>;
+   #address-cells = <1>;
+   #size-cells = <1>;
+
+   a9pll: arm_clk@0 {
+   #clock-cells = <0>;
+   compatible = "brcm,hr2-armpll";
+   clocks = <&osc>;
+   reg = <0x0 0x1000>;
+   };
+
+   timer@20200 {
+   compatible = "arm,cortex-a9-global-timer";
+   reg = <0x20200 0x100>;
+   interrupts = ;
+   clocks = <&periph_clk>;
+   };
+
+   twd-timer@20600 {
+   compatible = "arm,cortex-a9-twd-timer";
+   reg = <0x20600 0x20>;
+   interrupts = ;
+   clocks = <&periph_clk>;
+   };
+
+   twd-watchdog@20620 {
+   compatible = "arm,cortex-a9-twd-wdt";
+   reg = <0x20620 0x20>;
+   interrupts = ;
+   clocks = <&periph_clk>;
+   };
+
+   gic: interrupt-controller@21000 {
+   compatible = "arm,cortex-a9-gic";
+   #interrupt-cells = <3>;
+   #address-cells = <0>;
+   interrupt-controller;
+   reg = <0x21000 0x1000>,
+ <0x20100 0x100>;
+   };
+
+   L2: l2-cache@22000 {
+   compatible = "arm,pl310-cache";
+   reg = <0x22000 0x1000>;
+   cache-unified;
+   cache-level = <2>;
+   };
+   };
+
+   clocks {
+   #address-cells = <1>;
+   #size-cells = <

[PATCH 3/3] ARM: dts: stm32h7: correct uart nodes compatible string

2017-09-28 Thread Vikas Manocha
With this change, stm32h743 uses its own uart configuration.
The major difference between the stm32f7 and stm32h7 uart configurations
is that the stm32h7 has a FIFO.

Signed-off-by: Vikas Manocha 
Reviewed-by: Patrice Chotard 
---
 arch/arm/boot/dts/stm32h743.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/stm32h743.dtsi b/arch/arm/boot/dts/stm32h743.dtsi
index 26de315..fab637b 100644
--- a/arch/arm/boot/dts/stm32h743.dtsi
+++ b/arch/arm/boot/dts/stm32h743.dtsi
@@ -67,7 +67,7 @@
};
 
usart2: serial@40004400 {
-   compatible = "st,stm32f7-uart";
+   compatible = "st,stm32h7-uart";
reg = <0x40004400 0x400>;
interrupts = <38>;
status = "disabled";
@@ -99,7 +99,7 @@
};
 
usart1: serial@40011000 {
-   compatible = "st,stm32f7-uart";
+   compatible = "st,stm32h7-uart";
reg = <0x40011000 0x400>;
interrupts = <37>;
status = "disabled";
-- 
1.9.1



[PATCH 1/3] Arm: dts: stm32: remove extra compatible string for uart

2017-09-28 Thread Vikas Manocha
This patch removes the extra compatible string "st,stm32-usart" to
avoid confusion and to save some time and space.

Signed-off-by: Vikas Manocha 
Reviewed-by: Patrice Chotard 
---
 Documentation/devicetree/bindings/dma/stm32-dma.txt |  2 +-
 Documentation/devicetree/bindings/serial/st,stm32-usart.txt | 10 +++---
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/stm32-dma.txt b/Documentation/devicetree/bindings/dma/stm32-dma.txt
index 4408af6..6f44df9 100644
--- a/Documentation/devicetree/bindings/dma/stm32-dma.txt
+++ b/Documentation/devicetree/bindings/dma/stm32-dma.txt
@@ -71,7 +71,7 @@ channel: a phandle to the DMA controller plus the following four integer cells:
 Example:
 
usart1: serial@40011000 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40011000 0x400>;
interrupts = <37>;
clocks = <&clk_pclk2>;
diff --git a/Documentation/devicetree/bindings/serial/st,stm32-usart.txt b/Documentation/devicetree/bindings/serial/st,stm32-usart.txt
index 3657f9f..d150b04 100644
--- a/Documentation/devicetree/bindings/serial/st,stm32-usart.txt
+++ b/Documentation/devicetree/bindings/serial/st,stm32-usart.txt
@@ -2,14 +2,10 @@
 
 Required properties:
 - compatible: can be either:
-  - "st,stm32-usart",
   - "st,stm32-uart",
-  - "st,stm32f7-usart",
   - "st,stm32f7-uart",
-  - "st,stm32h7-usart"
   - "st,stm32h7-uart".
-  depending on whether the device supports synchronous mode
-  and is compatible with stm32(f4), stm32f7 or stm32h7.
+  depending on compatibility with stm32(f4), stm32f7 or stm32h7.
 - reg: The address and length of the peripheral registers space
 - interrupts:
   - The interrupt line for the USART instance,
@@ -33,7 +29,7 @@ usart4: serial@40004c00 {
 };
 
 usart2: serial@40004400 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40004400 0x400>;
interrupts = <38>;
clocks = <&clk_pclk1>;
@@ -43,7 +39,7 @@ usart2: serial@40004400 {
 };
 
 usart1: serial@40011000 {
-   compatible = "st,stm32-usart", "st,stm32-uart";
+   compatible = "st,stm32-uart";
reg = <0x40011000 0x400>;
interrupts = <37>;
clocks = <&rcc 0 164>;
-- 
1.9.1



[PATCH v2 05/10] clk: bcm: Add Broadcom Hurricane 2 clock support

2017-09-28 Thread Florian Fainelli
Add support for the Broadcom Hurricane 2 SoC clock controller. We can
re-use the existing iProc clock library since the SoC's architecture is
largely the same as its predecessors. For now, we just initialize the
iProc ARM PLL.

Acked-by: Jon Mason 
Signed-off-by: Florian Fainelli 
---
 drivers/clk/bcm/Kconfig   |  9 +
 drivers/clk/bcm/Makefile  |  1 +
 drivers/clk/bcm/clk-hr2.c | 27 +++
 3 files changed, 37 insertions(+)
 create mode 100644 drivers/clk/bcm/clk-hr2.c

diff --git a/drivers/clk/bcm/Kconfig b/drivers/clk/bcm/Kconfig
index 1d9187df167b..4c4bd85f707c 100644
--- a/drivers/clk/bcm/Kconfig
+++ b/drivers/clk/bcm/Kconfig
@@ -30,6 +30,15 @@ config CLK_BCM_CYGNUS
help
  Enable common clock framework support for the Broadcom Cygnus SoC
 
+config CLK_BCM_HR2
+   bool "Broadcom Hurricane 2 clock support"
+   depends on ARCH_BCM_HR2 || COMPILE_TEST
+   select COMMON_CLK_IPROC
+   default ARCH_BCM_HR2
+   help
+ Enable common clock framework support for the Broadcom Hurricane 2
+ SoC
+
 config CLK_BCM_NSP
bool "Broadcom Northstar/Northstar Plus clock support"
depends on ARCH_BCM_5301X || ARCH_BCM_NSP || COMPILE_TEST
diff --git a/drivers/clk/bcm/Makefile b/drivers/clk/bcm/Makefile
index a0c14fa4aa1e..755144195541 100644
--- a/drivers/clk/bcm/Makefile
+++ b/drivers/clk/bcm/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_BCM2835)  += clk-bcm2835.o
 obj-$(CONFIG_ARCH_BCM2835) += clk-bcm2835-aux.o
 obj-$(CONFIG_ARCH_BCM_53573)   += clk-bcm53573-ilp.o
 obj-$(CONFIG_CLK_BCM_CYGNUS)   += clk-cygnus.o
+obj-$(CONFIG_CLK_BCM_HR2)  += clk-hr2.o
 obj-$(CONFIG_CLK_BCM_NSP)  += clk-nsp.o
 obj-$(CONFIG_CLK_BCM_NS2)  += clk-ns2.o
 obj-$(CONFIG_CLK_BCM_SR)   += clk-sr.o
diff --git a/drivers/clk/bcm/clk-hr2.c b/drivers/clk/bcm/clk-hr2.c
new file mode 100644
index ..f7c5b7379475
--- /dev/null
+++ b/drivers/clk/bcm/clk-hr2.c
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "clk-iproc.h"
+
+static void __init hr2_armpll_init(struct device_node *node)
+{
+   iproc_armpll_setup(node);
+}
+CLK_OF_DECLARE(hr2_armpll, "brcm,hr2-armpll", hr2_armpll_init);
-- 
2.14.1
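
The registration pattern used by clk-hr2.c can be modelled outside the kernel.
The sketch below is a simplified userspace model, not the kernel
implementation: struct node, struct clk_decl, clk_table and clk_init_all() are
hypothetical names introduced for illustration, and the stand-in init function
only sets a flag where the real one calls iproc_armpll_setup().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct device_node. */
struct node {
	const char *compatible;
};

typedef void (*clk_init_fn)(struct node *np);

/* One table entry, roughly what a CLK_OF_DECLARE() line contributes. */
struct clk_decl {
	const char *compatible;
	clk_init_fn init;
};

static int hr2_armpll_ready;

/* Stand-in for hr2_armpll_init(); the real one calls iproc_armpll_setup(). */
static void hr2_armpll_init(struct node *np)
{
	(void)np;
	hr2_armpll_ready = 1;
}

static const struct clk_decl clk_table[] = {
	{ "brcm,hr2-armpll", hr2_armpll_init },
};

/* Run the init callback of every node whose compatible is in the table. */
static size_t clk_init_all(struct node *nodes, size_t count)
{
	size_t hits = 0;

	assert(nodes != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		for (size_t j = 0; j < sizeof(clk_table) / sizeof(clk_table[0]); j++) {
			if (strcmp(nodes[i].compatible, clk_table[j].compatible) == 0) {
				clk_table[j].init(&nodes[i]);
				hits++;
			}
		}
	}
	return hits;
}
```

Calling clk_init_all() on a list that contains a "brcm,hr2-armpll" node runs
the callback once for that node and skips nodes with other compatible strings.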



[PATCH 0/3] Arm: dts: stm32: remove extra compatible uart string

2017-09-28 Thread Vikas Manocha
The stm32 uart driver uses two compatible strings, "st,stm32-usart"
and "st,stm32-uart". One can be removed safely to save some space and time.

Vikas Manocha (3):
  Arm: dts: stm32: remove extra compatible string for uart
  Arm: dts: stm32: remove extra compatible string from DT & driver
  ARM: dts: stm32h7: correct uart nodes compatible string

 Documentation/devicetree/bindings/dma/stm32-dma.txt |  2 +-
 Documentation/devicetree/bindings/serial/st,stm32-usart.txt | 10 +++---
 arch/arm/boot/dts/stm32f429.dtsi| 12 ++--
 arch/arm/boot/dts/stm32f746.dtsi| 12 ++--
 arch/arm/boot/dts/stm32h743.dtsi|  4 ++--
 drivers/tty/serial/stm32-usart.c|  3 ---
 6 files changed, 18 insertions(+), 25 deletions(-)

-- 
1.9.1



[PATCH v2 04/10] dt-bindings: Document Broadcom Hurricane 2 clocks

2017-09-28 Thread Florian Fainelli
Add a Device Tree binding document for the Broadcom Hurricane 2 SoC
which is an iProc based system.

Acked-by: Jon Mason 
Signed-off-by: Florian Fainelli 
---
 .../devicetree/bindings/clock/brcm,iproc-clocks.txt| 14 ++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
index f2c5f0e4a363..f8e4a93466cb 100644
--- a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
+++ b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
@@ -137,6 +137,20 @@ These clock IDs are defined in:
 ch1_audio  audiopll 2   BCM_CYGNUS_AUDIOPLL_CH1
 ch2_audio  audiopll 3   BCM_CYGNUS_AUDIOPLL_CH2
 
+Hurricane 2
+--
+PLL and leaf clock compatible strings for Hurricane 2 are:
+ "brcm,hr2-armpll"
+
+The following table defines the set of PLL/clock for Hurricane 2:
+
+Clock      Source     Index   ID
+-----      ------     -----   --
+crystal    N/A        N/A     N/A
+
+armpll     crystal    N/A     N/A
+
+
 Northstar and Northstar Plus
 --
 PLL and leaf clock compatible strings for Northstar and Northstar Plus are:
-- 
2.14.1



[PATCH v2 07/10] ARM: debug: Add Hurricane 2 UART2 debug addresses

2017-09-28 Thread Florian Fainelli
Broadcom Hurricane 2 SoCs typically use their secondary UART for
debug/console, so provide a known good location for it.

Acked-by: Jon Mason 
Signed-off-by: Florian Fainelli 
---
 arch/arm/Kconfig.debug | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index 6dcea8e8e941..0346805fe33c 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -169,6 +169,11 @@ choice
depends on ARCH_BCM_5301X || ARCH_BCM_NSP
select DEBUG_UART_8250
 
+   config DEBUG_BCM_HR2
+   bool "Kernel low-level debugging on Hurricane 2 UART2"
+   depends on ARCH_BCM_HR2
+   select DEBUG_UART_8250
+
config DEBUG_BCM_KONA_UART
bool "Kernel low-level debugging messages via BCM KONA UART"
depends on ARCH_BCM_MOBILE
@@ -1508,6 +1513,7 @@ config DEBUG_UART_PHYS
default 0x11009000 if DEBUG_MT8135_UART3
default 0x1600 if DEBUG_INTEGRATOR
default 0x18000300 if DEBUG_BCM_5301X
+   default 0x18000400 if DEBUG_BCM_HR2
default 0x1801 if DEBUG_SIRFATLAS7_UART0
default 0x1802 if DEBUG_SIRFATLAS7_UART1
default 0x1c09 if DEBUG_VEXPRESS_UART0_RS1
@@ -1623,6 +1629,7 @@ config DEBUG_UART_VIRT
default 0xf01fb000 if DEBUG_NOMADIK_UART
default 0xf0201000 if DEBUG_BCM2835 || DEBUG_BCM2836
default 0xf1000300 if DEBUG_BCM_5301X
+   default 0xf1000400 if DEBUG_BCM_HR2
default 0xf1002000 if DEBUG_MT8127_UART0
default 0xf1006000 if DEBUG_MT6589_UART0
default 0xf1009000 if DEBUG_MT8135_UART3
@@ -1728,7 +1735,8 @@ config DEBUG_UART_8250_SHIFT
int "Register offset shift for the 8250 debug UART"
depends on DEBUG_LL_UART_8250 || DEBUG_UART_8250
default 0 if DEBUG_FOOTBRIDGE_COM1 || ARCH_IOP32X || DEBUG_BCM_5301X || \
-   DEBUG_OMAP7XXUART1 || DEBUG_OMAP7XXUART2 || DEBUG_OMAP7XXUART3
+   DEBUG_BCM_HR2 || DEBUG_OMAP7XXUART1 || DEBUG_OMAP7XXUART2 || \
+   DEBUG_OMAP7XXUART3
default 2
 
 config DEBUG_UART_8250_WORD
-- 
2.14.1
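
How the base address and the register shift combine can be sketched in a few
lines of C. The base address and the shift of 0 come from the Kconfig defaults
in this patch; uart_reg_addr() and the HR2_* macro names are hypothetical, and
the formula (base plus register index shifted left) is how the 8250 low-level
debug code is understood to address registers.

```c
#include <assert.h>
#include <stdint.h>

#define HR2_DEBUG_UART_PHYS  0x18000400u /* DEBUG_UART_PHYS default in the patch */
#define HR2_DEBUG_UART_SHIFT 0u          /* DEBUG_UART_8250_SHIFT default in the patch */
#define UART_LSR             5u          /* 8250 line status register index */

/* Address of an 8250 register: base plus the index scaled by the shift. */
static uint32_t uart_reg_addr(uint32_t base, unsigned int reg, unsigned int shift)
{
	assert(shift < 32);
	return base + ((uint32_t)reg << shift);
}
```

With a shift of 0 the registers are byte-spaced, so the line status register
sits at 0x18000405; the generic default of 2 would instead place it at
0x18000414, which is why DEBUG_BCM_HR2 is added to the "default 0" list.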


