Re: [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest

2016-07-06 Thread Wanpeng Li
2016-07-07 14:48 GMT+08:00 Haozhong Zhang :
> On 07/07/16 11:46, Wanpeng Li wrote:
>> From: Wanpeng Li 
>>
>> BUG: unable to handle kernel NULL pointer dereference at   (null)
>> IP: [<  (null)>]   (null)
>> PGD 0
>> Oops: 0010 [#1] SMP
>> Call Trace:
>>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
>>  handle_preemption_timer+0xe/0x20 [kvm_intel]
>>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
>>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
>>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>>  ? vcpu_load+0x1c/0x60 [kvm]
>>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
>>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>>  do_vfs_ioctl+0x96/0x6a0
>>  ? __fget_light+0x2a/0x90
>>  SyS_ioctl+0x79/0x90
>>  do_syscall_64+0x68/0x180
>>  entry_SYSCALL64_slow_path+0x25/0x25
>> Code:  Bad RIP value.
>> RIP  [<  (null)>]   (null)
>>  RSP 
>> CR2: 
>> ---[ end trace 9c70c48b1a2bc66e ]---
>>
>> This can be reproduced readily with the preemption timer enabled on L0 and
>> disabled on L1.
>>
>> The preemption timer for nested VMX is emulated by an hrtimer which is started
>> on L2 entry, stopped on L2 exit and evaluated via the check_nested_events hook.
>> However, nested_vmx_exit_handled always returns true for a preemption timer
>> vmexit, so the L1 preemption timer vmexit is captured and treated as an L2
>> preemption timer vmexit, which incurs a nested vmexit that dereferences a NULL
>> pointer.
>>
>> This patch fixes it by depending on check_nested_events to capture the L2
>> preemption timer (emulated hrtimer) expiry and the nested vmexit.
>>
>> Tested-by: Haozhong Zhang 
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Yunhong Jiang 
>> Cc: Jan Kiszka 
>> Cc: Haozhong Zhang 
>> Signed-off-by: Wanpeng Li 
>> ---
>> v2 -> v3:
>>  * update patch subject
>> v1 -> v2:
>>  * fix typo in patch description
>>
>>  arch/x86/kvm/vmx.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 85e2f0a..29c16a8 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>>   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
>>   case EXIT_REASON_PCOMMIT:
>>   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
>> + case EXIT_REASON_PREEMPTION_TIMER:
>> + return false;
>
> If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
> will this one still be needed?

Once "L1 TSC deadline timer to trigger while L2 is running" is complete,
L0's preemption timer firing while L2 is running can result in
(is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) being true, right?
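
To make the condition above concrete, here is a minimal userspace sketch of the routing decision. It is not the vmx.c code: the enum, struct and function names are hypothetical stand-ins, and only the "return false for the preemption timer" case mirrors the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for VMX exit reasons; the values are illustrative. */
enum exit_reason {
	EXIT_XSAVES,
	EXIT_PREEMPTION_TIMER,
	EXIT_OTHER,
};

/* Hypothetical, heavily reduced model of a vcpu. */
struct vcpu_model {
	bool guest_mode;       /* true while L2 is running */
	bool l1_wants_xsaves;  /* models nested_cpu_has2(vmcs12, ...) */
};

/* Models nested_vmx_exit_handled() with the patch applied. */
bool nested_exit_handled(const struct vcpu_model *v, enum exit_reason r)
{
	switch (r) {
	case EXIT_XSAVES:
		return v->l1_wants_xsaves;
	case EXIT_PREEMPTION_TIMER:
		return false;  /* L0 keeps the exit instead of reflecting it */
	default:
		return true;
	}
}

/* True when the exit would be reflected to L1 as a nested vmexit. */
bool reflect_to_l1(const struct vcpu_model *v, enum exit_reason r)
{
	return v->guest_mode && nested_exit_handled(v, r);
}
```

In this model a preemption timer exit taken while guest_mode is set is never reflected to L1, which is the behaviour the patch is after; whether that stays sufficient once the L1 TSC deadline timer work lands is the open question in this thread.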

Regards,
Wanpeng Li


Re: [PATCH V2 02/10] mailbox: tegra-hsp: Add HSP(Hardware Synchronization Primitives) driver

2016-07-06 Thread Joseph Lo

On 07/07/2016 12:50 AM, Stephen Warren wrote:

On 07/06/2016 03:06 AM, Joseph Lo wrote:

On 07/06/2016 03:05 PM, Alexandre Courbot wrote:

On Tue, Jul 5, 2016 at 6:04 PM, Joseph Lo  wrote:

The Tegra HSP mailbox driver currently implements doorbell-based
signaling for interprocessor communication (IPC) with remote processors.
The HSP HW modules support several features for this: shared mailboxes,
shared semaphores, arbitrated semaphores, and doorbells. There are
multiple HSP HW instances on the chip, so the driver is extensible to
support more features for different IPC requirements.

The driver for a remote processor can use it as a mailbox client and
handle the IPC protocol to synchronize data communication.



diff --git a/drivers/mailbox/tegra-hsp.c b/drivers/mailbox/tegra-hsp.c



+static irqreturn_t hsp_db_irq(int irq, void *p)
+{
+   struct tegra_hsp_mbox *hsp_mbox = p;
+   ulong val;
+   int master_id;
+
+   val = (ulong)hsp_readl(hsp_mbox->db_base[HSP_DB_CCPLEX],
+  HSP_DB_REG_PENDING);
+   hsp_writel(hsp_mbox->db_base[HSP_DB_CCPLEX],
HSP_DB_REG_PENDING, val);
+
+   spin_lock(&hsp_mbox->lock);
+   for_each_set_bit(master_id, &val, MAX_NUM_HSP_CHAN) {
+   struct mbox_chan *chan;
+   struct tegra_hsp_mbox_chan *mchan;
+   int i;
+
+   for (i = 0; i < MAX_NUM_HSP_CHAN; i++) {


I wonder if this could not be optimized. You are doing a double loop
on MAX_NUM_HSP_CHAN to look for an identical master_id. Since it seems
like the same master_id cannot be used twice (considering that the
inner loop only processes the first match), couldn't you just select
the free channel in of_hsp_mbox_xlate() by doing
&mbox->chans[master_id] (and returning an error if it is already
used), then simply getting chan as &hsp_mbox->mbox->chans[master_id]
instead of having the inner loop below? That would remove the need for
the second loop.
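
The suggestion above could look roughly like the following userspace sketch. It only models the idea of indexing channels by master_id; MAX_CHAN, struct chan_model and both function names are hypothetical and do not come from the patch.

```c
#include <stddef.h>

#define MAX_CHAN 32  /* hypothetical stand-in for MAX_NUM_HSP_CHAN */

/* Hypothetical, reduced model of a mailbox channel. */
struct chan_model {
	int in_use;    /* set once a client has claimed the channel */
	int rx_count;  /* number of doorbell rings delivered */
};

/*
 * Models the xlate step: pick the channel by master_id and fail if it is
 * out of range or already claimed.
 */
struct chan_model *claim_chan(struct chan_model *chans, unsigned int master_id)
{
	if (master_id >= MAX_CHAN || chans[master_id].in_use)
		return NULL;
	chans[master_id].in_use = 1;
	return &chans[master_id];
}

/*
 * Models the IRQ handler: each pending bit indexes its channel directly,
 * so no inner search loop is needed. Returns the number of rings delivered.
 */
int dispatch_pending(struct chan_model *chans, unsigned long pending)
{
	int delivered = 0;

	for (unsigned int id = 0; id < MAX_CHAN; id++) {
		if (!(pending & (1UL << id)))
			continue;
		if (chans[id].in_use) {
			chans[id].rx_count++;
			delivered++;
		}
	}
	return delivered;
}
```

This only works while a master_id maps to at most one channel, which is the assumption the suggestion relies on.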


That was exactly what I did in the V1, which only supported one HSP
sub-module per HSP HW block. So we can just use the master_id as the
mbox channel ID.

Meanwhile, V2 is intended to support multiple HSP sub-modules running on
the same HSP HW block. The "ID"s of different modules could conflict, so
I dropped the mechanism that used the master_id as the mbox channel ID.


I haven't looked at the code in this patch since I'm mainly concerned
about the DT bindings. However, I will say that nothing in the change to
the mailbox specifier in DT should have required /any/ changes to the
code, except to add a single check to validate that the "mailbox type"
encoded into the top 16 bits of the mailbox ID were 0, and hence
represented a doorbell rather than anything else. Any enhancements to
support other mailbox types could have happened later, and I doubt would
require anything dynamic even then.
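
A minimal sketch of the single check described above, in plain C. The macro and function names are hypothetical, and treating the lower 16 bits as the doorbell index is an assumption made for illustration; the text above only states that the type lives in the top 16 bits and that 0 means doorbell.

```c
#include <errno.h>
#include <stdint.h>

#define MBOX_TYPE_SHIFT    16       /* type is encoded in the top 16 bits */
#define MBOX_INDEX_MASK    0xffffu  /* assumption: index in the low 16 bits */
#define MBOX_TYPE_DOORBELL 0u       /* type 0 represents a doorbell */

/*
 * Returns the doorbell index for a doorbell specifier, or -EINVAL for any
 * other mailbox type.
 */
int decode_doorbell(uint32_t spec)
{
	uint32_t type = spec >> MBOX_TYPE_SHIFT;

	if (type != MBOX_TYPE_DOORBELL)
		return -EINVAL;

	return (int)(spec & MBOX_INDEX_MASK);
}
```

Other mailbox types could later be added as further cases on the decoded type without touching the doorbell path.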


Yes, I only added the code for that change, plus maybe some glue code
for extensibility to support more HSP modules in the future.





+static int tegra_hsp_db_init(struct tegra_hsp_mbox *hsp_mbox,
+struct mbox_chan *mchan, int master_id)
+{
+   struct platform_device *pdev =
to_platform_device(hsp_mbox->mbox->dev);
+   struct tegra_hsp_mbox_chan *hsp_mbox_chan;
+   int ret;
+
+   if (!hsp_mbox->db_irq) {
+   int i;
+
+   hsp_mbox->db_irq = platform_get_irq_byname(pdev,
"doorbell");


Getting the IRQ sounds more like a job for probe() - I don't see the
benefit of doing it lazily?


We only need the IRQ when the client is requesting the DB service. Other
HSP sub-modules use different IRQs, so I didn't do that at probe time.


All resources provided by other devices/drivers must be acquired at
probe time, since that's the only time it's possible to defer probe if
the provider of the resource is not available.

If you don't follow that rule, what happens is:

1) This driver probes.

2) Some other driver calls tegra_hsp_db_init(), and it fails since the
provider of the IRQ is not yet available. This likely ends up returning
something other than -EPROBE_DEFER since the HSP driver was found
successfully (thus there is no deferred probe situation as far as the
mailbox core is concerned), it's just that the mailbox channel
lookup/init/... failed.

3) The other driver's probe() fails due to this, but since the error
wasn't a probe deferral, the other driver's probe() is never retried.
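
The sequence above can be modelled in userspace as follows. This is only a sketch of the rule "acquire resources in probe so a deferral can propagate": every struct and function name is hypothetical, and EPROBE_DEFER_MODEL merely mirrors the kernel-internal EPROBE_DEFER value (517), which userspace errno.h does not define.

```c
#include <stdbool.h>

#define EPROBE_DEFER_MODEL 517  /* mirrors the kernel-internal EPROBE_DEFER */

/* Hypothetical model of the IRQ provider (e.g. an interrupt controller). */
struct provider_model {
	bool ready;
	int irq;
};

/* Hypothetical model of the HSP driver state. */
struct hsp_model {
	int db_irq;
};

/* Models the IRQ lookup: defers while the provider is not ready. */
int get_irq(const struct provider_model *p)
{
	return p->ready ? p->irq : -EPROBE_DEFER_MODEL;
}

/* probe() acquires the IRQ up front and passes any error straight back. */
int hsp_probe(struct hsp_model *hsp, const struct provider_model *p)
{
	int irq = get_irq(p);

	if (irq < 0)
		return irq;

	hsp->db_irq = irq;
	return 0;
}

/* Models the driver core: retry only while probe keeps deferring. */
int probe_with_retry(struct hsp_model *hsp, struct provider_model *p,
		     int max_tries)
{
	int ret = -EPROBE_DEFER_MODEL;

	for (int i = 0; i < max_tries && ret == -EPROBE_DEFER_MODEL; i++) {
		ret = hsp_probe(hsp, p);
		if (ret == -EPROBE_DEFER_MODEL)
			p->ready = true;  /* pretend the provider shows up */
	}
	return ret;
}
```

Because the deferral is returned from probe itself, the retry loop gets a chance to run; if the lookup happened later in a channel init path, the caller would only see a generic failure and nothing would be retried.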


Agree, will fix this.

Thanks,
-Joseph


Re: [PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest

2016-07-06 Thread Haozhong Zhang
On 07/07/16 11:46, Wanpeng Li wrote:
> From: Wanpeng Li 
> 
> BUG: unable to handle kernel NULL pointer dereference at   (null)
> IP: [<  (null)>]   (null)
> PGD 0
> Oops: 0010 [#1] SMP
> Call Trace:
>  ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
>  handle_preemption_timer+0xe/0x20 [kvm_intel]
>  vmx_handle_exit+0x169/0x15a0 [kvm_intel]
>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>  kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
>  ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
>  ? vcpu_load+0x1c/0x60 [kvm]
>  ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
>  kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
>  do_vfs_ioctl+0x96/0x6a0
>  ? __fget_light+0x2a/0x90
>  SyS_ioctl+0x79/0x90
>  do_syscall_64+0x68/0x180
>  entry_SYSCALL64_slow_path+0x25/0x25
> Code:  Bad RIP value.
> RIP  [<  (null)>]   (null)
>  RSP 
> CR2: 
> ---[ end trace 9c70c48b1a2bc66e ]---
> 
> This can be reproduced readily with the preemption timer enabled on L0 and
> disabled on L1.
> 
> The preemption timer for nested VMX is emulated by an hrtimer which is started
> on L2 entry, stopped on L2 exit and evaluated via the check_nested_events hook.
> However, nested_vmx_exit_handled always returns true for a preemption timer
> vmexit, so the L1 preemption timer vmexit is captured and treated as an L2
> preemption timer vmexit, which incurs a nested vmexit that dereferences a NULL
> pointer.
> 
> This patch fixes it by depending on check_nested_events to capture the L2
> preemption timer (emulated hrtimer) expiry and the nested vmexit.
> 
> Tested-by: Haozhong Zhang 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Yunhong Jiang 
> Cc: Jan Kiszka 
> Cc: Haozhong Zhang 
> Signed-off-by: Wanpeng Li 
> ---
> v2 -> v3:
>  * update patch subject
> v1 -> v2:
>  * fix typo in patch description
> 
>  arch/x86/kvm/vmx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 85e2f0a..29c16a8 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
>   case EXIT_REASON_PCOMMIT:
>   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
> + case EXIT_REASON_PREEMPTION_TIMER:
> + return false;

If patch 2 can avoid accidentally enabling preemption timer in vmcs02,
will this one still be needed?

Haozhong


>   default:
>   return true;
>   }
> -- 
> 1.9.1
> 


[PATCH v4 2/5] ARM: dts: vfxxx: Add On-Chip ROM node for Vybrid

2016-07-06 Thread Sanchayan Maity
Add a device tree node for the On-Chip ROM on Vybrid.

Signed-off-by: Sanchayan Maity 
---
 arch/arm/boot/dts/vfxxx.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/vfxxx.dtsi b/arch/arm/boot/dts/vfxxx.dtsi
index 0e34d44..6c5222e 100644
--- a/arch/arm/boot/dts/vfxxx.dtsi
+++ b/arch/arm/boot/dts/vfxxx.dtsi
@@ -91,6 +91,11 @@
interrupt-parent = <&mscm_ir>;
ranges;
 
+   ocrom: ocrom@ {
+   compatible = "fsl,vf610-ocrom", "syscon";
+   reg = <0x 0x18000>;
+   };
+
aips0: aips-bus@4000 {
compatible = "fsl,aips-bus", "simple-bus";
#address-cells = <1>;
-- 
2.9.0



[PATCH v4 4/5] soc: Add SoC driver for Freescale Vybrid platform

2016-07-06 Thread Sanchayan Maity
This adds a SoC driver for Freescale Vybrid SoCs. The driver uses
the syscon and nvmem consumer APIs to get the register values it
needs and exposes the SoC-specific properties via sysfs.

A sample output from Colibri Vybrid VF61 is below:

root@colibri-vf:~# cd /sys/bus/soc/devices/soc0
root@colibri-vf:/sys/bus/soc/devices/soc0# ls
family  machine  power  revision  soc_id  subsystem  uevent
root@colibri-vf:/sys/bus/soc/devices/soc0# cat family
Freescale Vybrid VF610
root@colibri-vf:/sys/bus/soc/devices/soc0# cat machine
Freescale Vybrid
root@colibri-vf:/sys/bus/soc/devices/soc0# cat revision
0013
root@colibri-vf:/sys/bus/soc/devices/soc0# cat soc_id
df6472a6130f29d4

Signed-off-by: Sanchayan Maity 
---
 drivers/soc/Kconfig |   1 +
 drivers/soc/fsl/Kconfig |  10 +++
 drivers/soc/fsl/Makefile|   1 +
 drivers/soc/fsl/soc-vf610.c | 212 
 4 files changed, 224 insertions(+)
 create mode 100644 drivers/soc/fsl/Kconfig
 create mode 100644 drivers/soc/fsl/soc-vf610.c

diff --git a/drivers/soc/Kconfig b/drivers/soc/Kconfig
index cb58ef0..4410eb7 100644
--- a/drivers/soc/Kconfig
+++ b/drivers/soc/Kconfig
@@ -2,6 +2,7 @@ menu "SOC (System On Chip) specific Drivers"
 
 source "drivers/soc/bcm/Kconfig"
 source "drivers/soc/brcmstb/Kconfig"
+source "drivers/soc/fsl/Kconfig"
 source "drivers/soc/fsl/qe/Kconfig"
 source "drivers/soc/mediatek/Kconfig"
 source "drivers/soc/qcom/Kconfig"
diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
new file mode 100644
index 000..746b5e3
--- /dev/null
+++ b/drivers/soc/fsl/Kconfig
@@ -0,0 +1,10 @@
+#
+# Freescale SoC drivers
+
+config SOC_BUS_VF610
+   bool "SoC bus device for the Freescale Vybrid platform"
+   depends on NVMEM && NVMEM_VF610_OCOTP && OF
+   select SOC_BUS
+   help
+Include support for the SoC bus on the Freescale Vybrid platform
+providing sysfs information about the module variant.
diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
index 203307f..afaf092 100644
--- a/drivers/soc/fsl/Makefile
+++ b/drivers/soc/fsl/Makefile
@@ -2,5 +2,6 @@
 # Makefile for the Linux Kernel SOC fsl specific device drivers
 #
 
+obj-$(CONFIG_SOC_VF610)+= soc-vf610.o
 obj-$(CONFIG_QUICC_ENGINE) += qe/
 obj-$(CONFIG_CPM)  += qe/
diff --git a/drivers/soc/fsl/soc-vf610.c b/drivers/soc/fsl/soc-vf610.c
new file mode 100644
index 000..23900ea
--- /dev/null
+++ b/drivers/soc/fsl/soc-vf610.c
@@ -0,0 +1,212 @@
+/*
+ * Copyright (C) 2016 Toradex AG.
+ *
+ * Author: Sanchayan Maity 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MSCM_CPxCOUNT_OFFSET   0x2C
+#define MSCM_CPxCFG1_OFFSET0x14
+#define ROM_REVISION_OFFSET0x80
+
+struct vf610_soc {
+   struct device *dev;
+   struct soc_device_attribute *soc_dev_attr;
+   struct soc_device *soc_dev;
+   struct nvmem_cell *ocotp_cfg0;
+   struct nvmem_cell *ocotp_cfg1;
+};
+
+static int vf610_soc_probe(struct platform_device *pdev)
+{
+   struct vf610_soc *info;
+   struct device *dev = &pdev->dev;
+   struct device_node *soc_node;
+   struct device_node *cfg0_node, *cfg1_node;
+   struct regmap *rom_regmap, *mscm_regmap;
+   char soc_type[] = "xx0";
+   size_t id1_len, id2_len;
+   u32 cpucount, l2size, rom_rev;
+   u8 *socid1, *socid2;
+   int ret;
+
+   info = devm_kzalloc(dev, sizeof(struct vf610_soc), GFP_KERNEL);
+   if (!info)
+   return -ENOMEM;
+   info->dev = dev;
+
+   soc_node = of_find_node_by_path("/soc");
+   if (!soc_node)
+   return -ENODEV;
+
+   cfg0_node = of_find_node_by_name(soc_node, "cfg0");
+   if (!cfg0_node) {
+   ret = -ENODEV;
+   goto out_cfg0_node;
+   }
+
+   cfg1_node = of_find_node_by_name(soc_node, "cfg1");
+   if (!cfg1_node) {
+   ret = -ENODEV;
+   goto out_cfg1_node;
+   }
+
+   info->ocotp_cfg0 = of_nvmem_cell_get_direct(cfg0_node);
+   if (IS_ERR(info->ocotp_cfg0)) {
+   ret = PTR_ERR(info->ocotp_cfg0);
+   goto out_ocotp_cfg0;
+   }
+
+   info->ocotp_cfg1 = of_nvmem_cell_get_direct(cfg1_node);
+   if (IS_ERR(info->ocotp_cfg1)) {
+   ret = PTR_ERR(info->ocotp_cfg1);
+   goto out_ocotp_cfg1;
+   }
+
+   socid1 = nvmem

[PATCH v4 5/5] ARM: dts: vfxxx: Add a compatible binding for Vybrid SoC bus driver

2016-07-06 Thread Sanchayan Maity
Add a compatible binding to the main soc node required by the
Vybrid SoC bus driver to bind on.

Signed-off-by: Sanchayan Maity 
---
 arch/arm/boot/dts/vfxxx.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/vfxxx.dtsi b/arch/arm/boot/dts/vfxxx.dtsi
index 6c5222e..c68bc72 100644
--- a/arch/arm/boot/dts/vfxxx.dtsi
+++ b/arch/arm/boot/dts/vfxxx.dtsi
@@ -87,7 +87,7 @@
soc {
#address-cells = <1>;
#size-cells = <1>;
-   compatible = "simple-bus";
+   compatible = "fsl,vf610-soc", "simple-bus";
interrupt-parent = <&mscm_ir>;
ranges;
 
-- 
2.9.0



[PATCH v4 3/5] nvmem: core: Add consumer API to get nvmem cell from node

2016-07-06 Thread Sanchayan Maity
From: Stefan Agner 

The existing NVMEM consumer APIs do not allow getting a
NVMEM cell directly from a device tree node. This patch
adds a function that provides this functionality.

Assuming the nvmem cell id name is known, this can be used
as follows:

struct device_node *cell_np;
struct nvmem_cell *foo_cell;

cell_np = of_find_node_by_name(parent, "foo");
foo_cell = of_nvmem_cell_get_direct(cell_np);

Parent node can also be the of_node of the main SoC device
node.

Signed-off-by: Sanchayan Maity 
---
 drivers/nvmem/core.c   | 44 +-
 include/linux/nvmem-consumer.h |  1 +
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 965911d..470abee 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -743,29 +743,21 @@ static struct nvmem_cell *nvmem_cell_get_from_list(const char *cell_id)
 
 #if IS_ENABLED(CONFIG_NVMEM) && IS_ENABLED(CONFIG_OF)
 /**
- * of_nvmem_cell_get() - Get a nvmem cell from given device node and cell id
+ * of_nvmem_cell_get_direct() - Get a nvmem cell from given device node
  *
- * @dev node: Device tree node that uses the nvmem cell
- * @id: nvmem cell name from nvmem-cell-names property.
+ * @dev node: Device tree node that uses nvmem cell
  *
  * Return: Will be an ERR_PTR() on error or a valid pointer
  * to a struct nvmem_cell.  The nvmem_cell will be freed by the
  * nvmem_cell_put().
  */
-struct nvmem_cell *of_nvmem_cell_get(struct device_node *np,
-   const char *name)
+struct nvmem_cell *of_nvmem_cell_get_direct(struct device_node *cell_np)
 {
-   struct device_node *cell_np, *nvmem_np;
+   struct device_node *nvmem_np;
struct nvmem_cell *cell;
struct nvmem_device *nvmem;
const __be32 *addr;
-   int rval, len, index;
-
-   index = of_property_match_string(np, "nvmem-cell-names", name);
-
-   cell_np = of_parse_phandle(np, "nvmem-cells", index);
-   if (!cell_np)
-   return ERR_PTR(-EINVAL);
+   int rval, len;
 
nvmem_np = of_get_next_parent(cell_np);
if (!nvmem_np)
@@ -824,6 +816,32 @@ err_mem:
 
return ERR_PTR(rval);
 }
+EXPORT_SYMBOL_GPL(of_nvmem_cell_get_direct);
+
+/**
+ * of_nvmem_cell_get() - Get a nvmem cell from given device node and cell id
+ *
+ * @dev node: Device tree node that uses the nvmem cell
+ * @id: nvmem cell name from nvmem-cell-names property.
+ *
+ * Return: Will be an ERR_PTR() on error or a valid pointer
+ * to a struct nvmem_cell.  The nvmem_cell will be freed by the
+ * nvmem_cell_put().
+ */
+struct nvmem_cell *of_nvmem_cell_get(struct device_node *np,
+   const char *name)
+{
+   struct device_node *cell_np;
+   int index;
+
+   index = of_property_match_string(np, "nvmem-cell-names", name);
+
+   cell_np = of_parse_phandle(np, "nvmem-cells", index);
+   if (!cell_np)
+   return ERR_PTR(-EINVAL);
+
+   return of_nvmem_cell_get_direct(cell_np);
+}
 EXPORT_SYMBOL_GPL(of_nvmem_cell_get);
 #endif
 
diff --git a/include/linux/nvmem-consumer.h b/include/linux/nvmem-consumer.h
index 9bb77d3..bf879fc 100644
--- a/include/linux/nvmem-consumer.h
+++ b/include/linux/nvmem-consumer.h
@@ -136,6 +136,7 @@ static inline int nvmem_device_write(struct nvmem_device *nvmem,
 #endif /* CONFIG_NVMEM */
 
 #if IS_ENABLED(CONFIG_NVMEM) && IS_ENABLED(CONFIG_OF)
+struct nvmem_cell *of_nvmem_cell_get_direct(struct device_node *cell_np);
 struct nvmem_cell *of_nvmem_cell_get(struct device_node *np,
 const char *name);
 struct nvmem_device *of_nvmem_device_get(struct device_node *np,
-- 
2.9.0



[PATCH v4 0/5] Implement SoC driver for Vybrid

2016-07-06 Thread Sanchayan Maity
Hello,

This fourth version of the patch series is rebased on top of Shawn's
for-next branch and tested on Colibri Vybrid VF50 and VF61 modules.

This patchset implements SoC bus support for Freescale Vybrid platform,
implementing the following
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-soc

This is a reworked version of an older patchset series posted in June 2015,
which was at v5 then [1]. Since the NVMEM framework was being introduced
at the time, we decided that it would be better to first have a NVMEM
driver for the OCOTP peripheral in place.

Compared to the older revisions, this driver now relies on NVMEM
consumer API using the NVMEM based vf610_ocotp driver which has
already been in mainline for a while now.

One point we were not sure about is whether we really should introduce
a new Kconfig symbol, as is done here. We could instead just enable the
driver when SOC_VF610 is selected, but that would introduce circular
dependencies.

Feedback is most welcome.

@Rob Herring
Does this patchset address the concerns you had? The only change
to the device tree is now for the compatible property.

@Srinivas
Is this new NVMEM consumer API acceptable? Would you recommend a
different approach?

Changes since v3:
1. Use just a compatible node at the SoC node and do not
use a separate node for the binding
2. Use syscon regmap lookup for getting information from
MSCM and OCROM nodes.
3. Introduce a NVMEM consumer API for getting a NVMEM cell
given a device node containing that cell.
4. Do not introduce any node at the SoC level.

Changes since v2:
1. Remove syscon_regmap_read_from_offset function and use the
available syscon functions
2. Remove fsl,vf610-soc-bus and related bindings at SoC node
level and introduce a fsl,vf610-soc node which is used by the
driver to bind and has all the required phandles plus the NVMEM
consumer handles.
3. Fix memory leak. of_node_put was not called for returned node
of of_parse_phandle and memory allocated by nvmem_cell_read was
not freed explicitly in return error paths.

Changes since v1:
Add device tree binding documentation.

2016: v3 patchset
https://lkml.org/lkml/2016/5/20/200

2016: v2 patchset
https://lkml.org/lkml/2016/5/2/69

2016: v1 patchset
https://lkml.org/lkml/2016/3/11/132

[1] Older v5:
http://lkml.iu.edu/hypermail/linux/kernel/1506.0/03787.html
Even earlier versions:
Version 4 of the patchset can be found here
https://lkml.org/lkml/2015/5/26/199
Version 3 of the patchset can be found here
http://www.spinics.net/lists/arm-kernel/msg420847.html
Version 2 of the patchset can be found here
http://www.spinics.net/lists/devicetree/msg80654.html
Version 1 of the patchset can be found here
http://www.spinics.net/lists/devicetree/msg80257.html
The RFC version can be found here
https://lkml.org/lkml/2015/5/11/13

Regards,
Sanchayan.

Sanchayan Maity (4):
  ARM: dts: vfxxx: Add device tree node for OCOTP
  ARM: dts: vfxxx: Add On-Chip ROM node for Vybrid
  soc: Add SoC driver for Freescale Vybrid platform
  ARM: dts: vfxxx: Add a compatible binding for Vybrid SoC bus driver

Stefan Agner (1):
  nvmem: core: Add consumer API to get nvmem cell from node

 arch/arm/boot/dts/vfxxx.dtsi   |  23 -
 drivers/nvmem/core.c   |  44 ++---
 drivers/soc/Kconfig|   1 +
 drivers/soc/fsl/Kconfig|  10 ++
 drivers/soc/fsl/Makefile   |   1 +
 drivers/soc/fsl/soc-vf610.c| 212 +
 include/linux/nvmem-consumer.h |   1 +
 7 files changed, 278 insertions(+), 14 deletions(-)
 create mode 100644 drivers/soc/fsl/Kconfig
 create mode 100644 drivers/soc/fsl/soc-vf610.c

-- 
2.9.0



Re: [PATCH 31/31] mm, vmstat: Remove zone and node double accounting by approximating retries

2016-07-06 Thread Minchan Kim
On Wed, Jul 06, 2016 at 09:58:50AM +0100, Mel Gorman wrote:
> On Wed, Jul 06, 2016 at 09:02:52AM +0900, Minchan Kim wrote:
> > On Fri, Jul 01, 2016 at 09:01:39PM +0100, Mel Gorman wrote:
> > > The number of LRU pages, dirty pages and writeback pages must be accounted
> > > for on both zones and nodes because of the reclaim retry logic, compaction
> > > retry logic and highmem calculations all depending on per-zone stats.
> > > 
> > > The retry logic is only critical for allocations that can use any zones.
> > 
> > Sorry, I cannot follow this assertion.
> > Could you explain?
> > 
> 
> The patch has been reworked since and I tried clarifying the changelog.
> Does this help?

Thanks. It is surely better than the old one, but it is still not clear to me.

> 
> --- 8<
> mm, vmstat: remove zone and node double accounting by approximating retries
> 
> The number of LRU pages, dirty pages and writeback pages must be accounted
> for on both zones and nodes because of the reclaim retry logic, compaction
> retry logic and highmem calculations all depending on per-zone stats.
> 
> Many lowmem allocations are immune from OOM kill due to a check in
> __alloc_pages_may_oom for (ac->high_zoneidx < ZONE_NORMAL) since commit
> 03668b3ceb0c ("oom: avoid oom killer for lowmem allocations"). The exception
> is costly high-order allocations or allocations that cannot fail. If the
> __alloc_pages_may_oom avoids OOM-kill for low-order lowmem allocations
> then a check in __alloc_pages_slowpath will always retry.

If I read the code correctly, __alloc_pages_slowpath will never retry in that
case, because __alloc_pages_may_oom will leave did_some_progress at 0, so it
would go to warn_alloc_failed unless direct compaction is successful.

> 
> Hence this patch will always retry reclaim for zone-constrained allocations
> in should_reclaim_retry.
> 
> As there is no guarantee enough memory can ever be freed to satisfy
> compaction, this patch avoids retrying compaction for zone-constrained
> allocations.
> 
> In combination, that means that the per-node stats can be used when deciding
> whether to continue reclaim using a rough approximation.  While it is
> possible this will make the wrong decision on occasion, it will not infinite
> loop as the number of reclaim attempts is capped by MAX_RECLAIM_RETRIES.
> 
> The final step is calculating the number of dirtyable highmem pages. As
> those calculations only care about the global count of file pages in
> highmem, this patch uses a global counter instead of per-zone stats
> as it is sufficient.
> 
> In combination, this allows the per-zone LRU and dirty state counters to
> be removed.
> 
> Suggested-by: Michal Hocko 
> Signed-off-by: Mel Gorman 
> Acked-by: Hillf Danton 
> 
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index 9aadcc781857..c68680aac044 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -4,6 +4,22 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_HIGHMEM
> +extern unsigned long highmem_file_pages;
> +
> +static inline void acct_highmem_file_pages(int zid, enum lru_list lru,
> + int nr_pages)
> +{
> + if (is_highmem_idx(zid) && is_file_lru(lru))
> + highmem_file_pages += nr_pages;
> +}
> +#else
> +static inline void acct_highmem_file_pages(int zid, enum lru_list lru,
> + int nr_pages)
> +{
> +}
> +#endif
> +
>  /**
>   * page_is_file_cache - should the page be on a file LRU or anon LRU?
>   * @page: the page to test
> @@ -29,9 +45,7 @@ static __always_inline void __update_lru_size(struct lruvec *lruvec,
>   struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>  
>   __mod_node_page_state(pgdat, NR_LRU_BASE + lru, nr_pages);
> - __mod_zone_page_state(&pgdat->node_zones[zid],
> - NR_ZONE_LRU_BASE + !!is_file_lru(lru),
> - nr_pages);
> + acct_highmem_file_pages(zid, lru, nr_pages);
>  }
>  
>  static __always_inline void update_lru_size(struct lruvec *lruvec,
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index bd33e6f1bed0..a3b7f45aac56 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -110,10 +110,6 @@ struct zone_padding {
>  enum zone_stat_item {
>   /* First 128 byte cacheline (assuming 64 bit words) */
>   NR_FREE_PAGES,
> - NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */
> - NR_ZONE_LRU_ANON = NR_ZONE_LRU_BASE,
> - NR_ZONE_LRU_FILE,
> - NR_ZONE_WRITE_PENDING,  /* Count of dirty, writeback and unstable pages */
>   NR_MLOCK,   /* mlock()ed pages found and moved off LRU */
>   NR_SLAB_RECLAIMABLE,
>   NR_SLAB_UNRECLAIMABLE,
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index b17cc4830fa6..cc753c639e3d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -307,7 +307,6 @@ extern void lru_cache_add_active_or_unevictable(struct 
> 

[PATCH v4 1/5] ARM: dts: vfxxx: Add device tree node for OCOTP

2016-07-06 Thread Sanchayan Maity
Add device tree node for the OCOTP peripheral on Vybrid.

Signed-off-by: Sanchayan Maity 
---
 arch/arm/boot/dts/vfxxx.dtsi | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/arm/boot/dts/vfxxx.dtsi b/arch/arm/boot/dts/vfxxx.dtsi
index 2c13ec6..0e34d44 100644
--- a/arch/arm/boot/dts/vfxxx.dtsi
+++ b/arch/arm/boot/dts/vfxxx.dtsi
@@ -520,6 +520,22 @@
status = "disabled";
};
 
+   ocotp@400a5000 {
+   compatible = "fsl,vf610-ocotp";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   reg = <0x400a5000 0xCF0>;
+   clocks = <&clks VF610_CLK_OCOTP>;
+
+   ocotp_cfg0: cfg0@410 {
+   reg = <0x410 0x4>;
+   };
+
+   ocotp_cfg1: cfg1@420 {
+   reg = <0x420 0x4>;
+   };
+   };
+
snvs0: snvs@400a7000 {
compatible = "fsl,sec-v4.0-mon", "syscon", "simple-mfd";
reg = <0x400a7000 0x2000>;
-- 
2.9.0



Re: [PATCH] thermal: hisilicon: Add dependency on the clock driver to allow frequency scaling

2016-07-06 Thread Amit Kucheria
On Sun, Jun 26, 2016 at 10:02 PM, Amit Kucheria
 wrote:
> On Mon, Jun 20, 2016 at 6:46 PM, Leo Yan  wrote:
>> Hi Amit,
>>
>> On Mon, Jun 20, 2016 at 05:46:36PM +0530, Amit Kucheria wrote:
>>> The Hisilicon clock stub driver is needed to allow the thermal drivers to
>>> actually scale the frequency. Make it an automatic dependency.
>>>
>>> Signed-off-by: Amit Kucheria 
>>> ---
>>>  drivers/thermal/Kconfig | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
>>> index 22ae1f7..4e843f7 100644
>>> --- a/drivers/thermal/Kconfig
>>> +++ b/drivers/thermal/Kconfig
>>> @@ -178,6 +178,7 @@ config THERMAL_EMULATION
>>>  config HISI_THERMAL
>>>   tristate "Hisilicon thermal driver"
>>>   depends on (ARCH_HISI && CPU_THERMAL && OF) || COMPILE_TEST
>>> + select STUB_CLK_HI6220
>>
>> Acked-by: Leo Yan 
>
> Thanks Leo. Eduardo, will you take this through your tree or should
> Wei include this in his tree?

Ping!


Re: [PATCH V2 02/10] mailbox: tegra-hsp: Add HSP(Hardware Synchronization Primitives) driver

2016-07-06 Thread Joseph Lo

On 07/06/2016 08:23 PM, Alexandre Courbot wrote:

On Wed, Jul 6, 2016 at 6:06 PM, Joseph Lo  wrote:

On 07/06/2016 03:05 PM, Alexandre Courbot wrote:


On Tue, Jul 5, 2016 at 6:04 PM, Joseph Lo  wrote:


The Tegra HSP mailbox driver currently implements doorbell-based
signaling for interprocessor communication (IPC) with remote processors.
The HSP HW modules support several features for this: shared mailboxes,
shared semaphores, arbitrated semaphores, and doorbells. There are
multiple HSP HW instances on the chip, so the driver is extensible to
support more features for different IPC requirements.

The driver for a remote processor can use it as a mailbox client and
handle the IPC protocol to synchronize data communication.

Signed-off-by: Joseph Lo 
---
Changes in V2:
- Update the driver to support the binding changes in V2
- it's extendable to support multiple HSP sub-modules on the same HSP HW
  block now.
---
   drivers/mailbox/Kconfig |   9 +
   drivers/mailbox/Makefile|   2 +
   drivers/mailbox/tegra-hsp.c | 418

   3 files changed, 429 insertions(+)
   create mode 100644 drivers/mailbox/tegra-hsp.c

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 5305923752d2..fe584cb54720 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -114,6 +114,15 @@ config MAILBOX_TEST
Test client to help with testing new Controller driver
implementations.

+config TEGRA_HSP_MBOX
+   bool "Tegra HSP(Hardware Synchronization Primitives) Driver"



Space missing before the opening parenthesis (same in the patch title
btw).


Okay.




+   depends on ARCH_TEGRA_186_SOC
+   help
+ The Tegra HSP driver is used for the interprocessor
communication
+ between different remote processors and host processors on
Tegra186
+ and later SoCs. Say Y here if you want to have this support.
+ If unsure say N.



Since this option is selected automatically by ARCH_TEGRA_186_SOC, you
should probably drop the last 2 sentences.


Okay.




+
   config XGENE_SLIMPRO_MBOX
  tristate "APM SoC X-Gene SLIMpro Mailbox Controller"
  depends on ARCH_XGENE
diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
index 0be3e742bb7d..26d8f91c7fea 100644
--- a/drivers/mailbox/Makefile
+++ b/drivers/mailbox/Makefile
@@ -25,3 +25,5 @@ obj-$(CONFIG_TI_MESSAGE_MANAGER) += ti-msgmgr.o
   obj-$(CONFIG_XGENE_SLIMPRO_MBOX) += mailbox-xgene-slimpro.o

   obj-$(CONFIG_HI6220_MBOX)  += hi6220-mailbox.o
+
+obj-${CONFIG_TEGRA_HSP_MBOX}   += tegra-hsp.o
diff --git a/drivers/mailbox/tegra-hsp.c b/drivers/mailbox/tegra-hsp.c
new file mode 100644
index ..93c3ef58f29f
--- /dev/null
+++ b/drivers/mailbox/tegra-hsp.c
@@ -0,0 +1,418 @@
+/*
+ * Copyright (c) 2016, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for
+ * more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define HSP_INT_DIMENSIONING   0x380
+#define HSP_nSM_OFFSET 0
+#define HSP_nSS_OFFSET 4
+#define HSP_nAS_OFFSET 8
+#define HSP_nDB_OFFSET 12
+#define HSP_nSI_OFFSET 16



Would be nice to have comments to understand what SM, SS, AS, etc.
stand for (Shared Mailboxes, Shared Semaphores, Arbitrated Semaphores
but you need to look at the patch description to understand that). A
top-of-file comment explaining the necessary concepts to read this code
would do the trick.


Yes, will fix that.




+#define HSP_nINT_MASK  0xf
+
+#define HSP_DB_REG_TRIGGER 0x0
+#define HSP_DB_REG_ENABLE  0x4
+#define HSP_DB_REG_RAW 0x8
+#define HSP_DB_REG_PENDING 0xc
+
+#define HSP_DB_CCPLEX  1
+#define HSP_DB_BPMP    3



Maybe turn this into enum and use that type for
tegra_hsp_db_chan::db_id? Also have MAX_NUM_HSP_DB here, since it is
related to these values?


Okay.
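
For illustration only, a minimal sketch of what the suggested enum could
look like. It assumes HSP_DB_BPMP is 3 and keeps MAX_NUM_HSP_DB next to
the doorbell IDs as proposed; the enum tag tegra_hsp_db_id and the helper
hsp_db_id_valid() are made-up names and are not taken from the patch.

```c
#include <stdbool.h>

/*
 * Doorbell IDs from the patch's defines, with the array bound kept
 * beside the values it relates to. The enum tag is hypothetical.
 */
enum tegra_hsp_db_id {
	HSP_DB_CCPLEX = 1,
	HSP_DB_BPMP = 3,
	MAX_NUM_HSP_DB = 7,
};

struct tegra_hsp_db_chan {
	int master_id;
	enum tegra_hsp_db_id db_id;
};

/* Hypothetical helper: true if db_id can index an array of MAX_NUM_HSP_DB entries. */
static bool hsp_db_id_valid(int db_id)
{
	return db_id >= 0 && db_id < MAX_NUM_HSP_DB;
}
```

Using the enum type for db_id lets the compiler and the reader see which
values are expected, while the bound check stays a single comparison.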




+
+#define MAX_NUM_HSP_CHAN 32
+#define MAX_NUM_HSP_DB 7
+
+#define hsp_db_offset(i, d) \
+   (d->base + ((1 + (d->nr_sm >> 1) + d->nr_ss + d->nr_as) << 16) +
\
+   (i) * 0x100)
+
+struct tegra_hsp_db_chan {
+   int master_id;
+   int db_id;
+};
+
+struct tegra_hsp_mbox_chan {
+   int type;
+   union {
+   struct tegra_hsp_db_chan db_chan;
+   };
+};
+
+struct tegra_hsp_mbox {
+   struct mbox_controller *mbox;
+   void __iomem *base;
+   void __iomem *db_base[MAX_NUM_HSP_DB];
+   int db_irq;
+   int nr_sm;
+   int nr_as;
+   int
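
As a rough worked example of the hsp_db_offset() arithmetic above, the
sketch below decodes the per-feature counts and computes the byte offset
of one doorbell. It assumes the dimensioning register packs each count as
a 4-bit field at the HSP_n*_OFFSET shifts, which the defines suggest but
the visible part of the patch does not show; hsp_count(),
hsp_db_byte_offset() and the register value used with them are invented
for illustration.

```c
#include <stdint.h>

#define HSP_nSM_OFFSET	0
#define HSP_nSS_OFFSET	4
#define HSP_nAS_OFFSET	8
#define HSP_nDB_OFFSET	12
#define HSP_nSI_OFFSET	16
#define HSP_nINT_MASK	0xf

/* Extract one 4-bit count from a dimensioning register value (assumed layout). */
static unsigned int hsp_count(uint32_t dim, unsigned int shift)
{
	return (dim >> shift) & HSP_nINT_MASK;
}

/*
 * Byte offset of doorbell i from the HSP base, using the same arithmetic
 * as the hsp_db_offset() macro: one 64 KiB block for the common area,
 * one per pair of shared mailboxes, one per shared semaphore and one per
 * arbitrated semaphore, then 0x100 bytes per doorbell.
 */
static uint32_t hsp_db_byte_offset(uint32_t dim, unsigned int i)
{
	unsigned int nr_sm = hsp_count(dim, HSP_nSM_OFFSET);
	unsigned int nr_ss = hsp_count(dim, HSP_nSS_OFFSET);
	unsigned int nr_as = hsp_count(dim, HSP_nAS_OFFSET);

	return ((1 + (nr_sm >> 1) + nr_ss + nr_as) << 16) + i * 0x100;
}
```

With a made-up register value of 0x87228 (8 shared mailboxes, 2 shared
semaphores, 2 arbitrated semaphores), doorbell 3 lands at
(1 + 4 + 2 + 2) << 16 plus 3 * 0x100, i.e. 0x90300 from the base.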

Re: [PATCH v2 1/2] drivers: led: is31fl319x: 1/3/6/9-channel light effect led driver

2016-07-06 Thread kbuild test robot
Hi,

[auto build test WARNING on j.anaszewski-leds/for-next]
[also build test WARNING on v4.7-rc6 next-20160706]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/H-Nikolaus-Schaller/driver-leds-is31fl319x-dimmable-LED-driver/20160706-180838
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds.git 
for-next
config: mn10300-allmodconfig (attached as .config)
compiler: am33_2.0-linux-gcc (GCC) 4.9.0
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=mn10300 

All warnings (new ones prefixed by >>):

   In file included from include/linux/printk.h:289:0,
from include/linux/kernel.h:13,
from include/linux/list.h:8,
from include/linux/kobject.h:20,
from include/linux/device.h:17,
from include/linux/i2c.h:30,
from drivers/leds/leds-is31fl319x.c:18:
   drivers/leds/leds-is31fl319x.c: In function 'is31fl319x_brightness_set':
>> include/linux/dynamic_debug.h:64:16: warning: format '%d' expects argument 
>> of type 'int', but argument 5 has type 'long int' [-Wformat=]
 static struct _ddebug  __aligned(8)   \
   ^
   include/linux/dynamic_debug.h:84:2: note: in expansion of macro 
'DEFINE_DYNAMIC_DEBUG_METADATA'
 DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt);  \
 ^
   include/linux/device.h:1197:2: note: in expansion of macro 'dynamic_dev_dbg'
 dynamic_dev_dbg(dev, format, ##__VA_ARGS__); \
 ^
   drivers/leds/leds-is31fl319x.c:104:2: note: in expansion of macro 'dev_dbg'
 dev_dbg(&is31->client->dev, "%s %d: %d\n", __func__, (led - is31->leds),
 ^
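
The warning above comes from printing a pointer difference with %d:
subtracting two pointers yields a ptrdiff_t, which the compiler reports
as 'long int' on this target. A minimal user-space sketch of the usual
remedy, the %td length modifier, follows. The struct and function names
are placeholders, and applying the same change to the driver's dev_dbg()
call is an assumption about the intended fix, not something the report
states.

```c
#include <stddef.h>
#include <stdio.h>

/* Placeholder for the driver's per-LED state. */
struct led {
	int brightness;
};

/*
 * Format "index: brightness" for one element of an LED array.
 * The index is a pointer difference (ptrdiff_t), so it is printed
 * with %td rather than %d.
 */
static int format_led(char *buf, size_t len,
		      const struct led *leds, const struct led *led)
{
	ptrdiff_t idx = led - leds;

	return snprintf(buf, len, "%td: %d", idx, led->brightness);
}
```

Casting the difference to int would also silence the warning; %td avoids
the cast and stays correct where ptrdiff_t is wider than int.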

vim +64 include/linux/dynamic_debug.h

b48420c1 Jim Cromie  2012-04-27  48 const 
char *modname);
b48420c1 Jim Cromie  2012-04-27  49  
cbc46635 Joe Perches 2011-08-11  50  struct device;
cbc46635 Joe Perches 2011-08-11  51  
b9075fa9 Joe Perches 2011-10-31  52  extern __printf(3, 4)
906d2015 Joe Perches 2014-09-24  53  void __dynamic_dev_dbg(struct _ddebug 
*descriptor, const struct device *dev,
b9075fa9 Joe Perches 2011-10-31  54const char *fmt, ...);
cbc46635 Joe Perches 2011-08-11  55  
ffa10cb4 Jason Baron 2011-08-11  56  struct net_device;
ffa10cb4 Jason Baron 2011-08-11  57  
b9075fa9 Joe Perches 2011-10-31  58  extern __printf(3, 4)
906d2015 Joe Perches 2014-09-24  59  void __dynamic_netdev_dbg(struct _ddebug 
*descriptor,
ffa10cb4 Jason Baron 2011-08-11  60   const struct 
net_device *dev,
b9075fa9 Joe Perches 2011-10-31  61   const char *fmt, ...);
ffa10cb4 Jason Baron 2011-08-11  62  
07613b0b Jason Baron 2011-10-04  63  #define 
DEFINE_DYNAMIC_DEBUG_METADATA(name, fmt)   \
c0d2af63 Joe Perches 2012-10-18 @64 static struct _ddebug  __aligned(8) 
\
07613b0b Jason Baron 2011-10-04  65 __attribute__((section("__verbose"))) 
name = {  \
07613b0b Jason Baron 2011-10-04  66 .modname = KBUILD_MODNAME,  
\
07613b0b Jason Baron 2011-10-04  67 .function = __func__,   
\
07613b0b Jason Baron 2011-10-04  68 .filename = __FILE__,   
\
07613b0b Jason Baron 2011-10-04  69 .format = (fmt),
\
07613b0b Jason Baron 2011-10-04  70 .lineno = __LINE__, 
\
07613b0b Jason Baron 2011-10-04  71 .flags =  
_DPRINTK_FLAGS_DEFAULT,   \
07613b0b Jason Baron 2011-10-04  72 }

:: The code at line 64 was first introduced by commit
:: c0d2af637863940b1a4fb208224ca7acb905c39f dynamic_debug: Remove 
unnecessary __used

:: TO: Joe Perches 
:: CC: Greg Kroah-Hartman 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[PATCH] PM / hibernate: Introduce snapshot test mode for hibernation

2016-07-06 Thread Chen Yu
This mode verifies that the snapshot data written to the
swap device can be successfully restored to memory. It
eases debugging of hibernation, since it bypasses not only
the BIOS/bootloader but also the system re-initialization.

For example:
$ echo snapshot | sudo tee /sys/power/disk
$ echo disk | sudo tee /sys/power/state

/* manual resume */
$ echo 8:3 | sudo tee /sys/power/resume

Signed-off-by: Chen Yu 
---
 kernel/power/hibernate.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index fca9254..667d926 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -52,6 +52,7 @@ enum {
 #ifdef CONFIG_SUSPEND
HIBERNATION_SUSPEND,
 #endif
+   HIBERNATION_SNAPSHOT,
/* keep last */
__HIBERNATION_AFTER_LAST
 };
@@ -631,6 +632,9 @@ static void power_down(void)
"Try swapon -a.\n");
return;
 #endif
+   case HIBERNATION_SNAPSHOT:
+   /* Do nothing. */
+   return;
}
kernel_halt();
/*
@@ -878,6 +882,7 @@ static const char * const hibernation_modes[] = {
 #ifdef CONFIG_SUSPEND
[HIBERNATION_SUSPEND]   = "suspend",
 #endif
+   [HIBERNATION_SNAPSHOT]  = "snapshot",
 };
 
 /*
@@ -924,6 +929,7 @@ static ssize_t disk_show(struct kobject *kobj, struct 
kobj_attribute *attr,
 #ifdef CONFIG_SUSPEND
case HIBERNATION_SUSPEND:
 #endif
+   case HIBERNATION_SNAPSHOT:
break;
case HIBERNATION_PLATFORM:
if (hibernation_ops)
@@ -970,6 +976,7 @@ static ssize_t disk_store(struct kobject *kobj, struct 
kobj_attribute *attr,
 #ifdef CONFIG_SUSPEND
case HIBERNATION_SUSPEND:
 #endif
+   case HIBERNATION_SNAPSHOT:
hibernation_mode = mode;
break;
case HIBERNATION_PLATFORM:
-- 
2.7.4
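
For readers unfamiliar with the pattern the patch extends, here is a
simplified user-space sketch of a mode enum paired with a name table
built from designated initializers, plus a lookup by name. The enum is
reduced to three entries, HIBERNATION_AFTER_LAST stands in for the
kernel's __HIBERNATION_AFTER_LAST, and lookup_mode() is a hypothetical
helper; none of this is the kernel's actual disk_store() code.

```c
#include <stddef.h>
#include <string.h>

/* Reduced mode list; the new entry goes just before the sentinel. */
enum {
	HIBERNATION_PLATFORM,
	HIBERNATION_SUSPEND,
	HIBERNATION_SNAPSHOT,
	/* keep last */
	HIBERNATION_AFTER_LAST
};

/* Name table indexed by mode, so enum and strings cannot drift apart. */
static const char * const hibernation_modes[] = {
	[HIBERNATION_PLATFORM]	= "platform",
	[HIBERNATION_SUSPEND]	= "suspend",
	[HIBERNATION_SNAPSHOT]	= "snapshot",
};

/* Return the mode whose name matches buf exactly, or -1 if none does. */
static int lookup_mode(const char *buf)
{
	int i;

	for (i = 0; i < HIBERNATION_AFTER_LAST; i++) {
		if (hibernation_modes[i] &&
		    strcmp(buf, hibernation_modes[i]) == 0)
			return i;
	}
	return -1;
}
```

Adding a mode then means touching the enum, the table, and each switch
that must accept it, which is the shape of the four hunks in the patch.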



Re: [PATCH v2 2/2] clk: hi6220: initialize UART1 clock to 150MHz

2016-07-06 Thread Jorge Ramirez

On 07/06/2016 11:43 PM, Michael Turquette wrote:

Quoting Guodong Xu (2016-06-29 01:45:55)

>From: Jorge Ramirez-Ortiz
>
>Early at boot, during the sys_clk initialization, make sure UART1 uses
>the higher frequency clock, 150MHz.
>
>This enables support for higher baud rates (up to 3Mbps) in UART1, which
>is required by faster bluetooth transfers.
>
>v2: use clk_set_rate() to propagate clock settings.
>
>Signed-off-by: Jorge Ramirez-Ortiz
>Signed-off-by: Guodong Xu
>---
>  drivers/clk/hisilicon/clk-hi6220.c | 4 
>  1 file changed, 4 insertions(+)
>
>diff --git a/drivers/clk/hisilicon/clk-hi6220.c 
b/drivers/clk/hisilicon/clk-hi6220.c
>index a36ffcb..631c56f 100644
>--- a/drivers/clk/hisilicon/clk-hi6220.c
>+++ b/drivers/clk/hisilicon/clk-hi6220.c
>@@ -12,6 +12,7 @@
>  
>  #include 

>  #include 
>+#include 
>  #include 
>  #include 
>  #include 
>@@ -192,6 +193,9 @@ static void __init hi6220_clk_sys_init(struct device_node 
*np)
>  
> hi6220_clk_register_divider(hi6220_div_clks_sys,

> ARRAY_SIZE(hi6220_div_clks_sys), clk_data);
>+
>+   if (clk_set_rate(clk_data->clk_data.clks[HI6220_UART1_SRC], 15000))
>+   pr_err("failed to set uart1 clock rate\n");

Why doesn't the UART driver call clk_get and then clk_set_rate on this
clock? Why do it in the clk provider driver?


Yes, that was my initial choice as well. In the end I opted to do it in
the clock driver because the value will never have to change for this
SoC and, maybe more importantly, because there is no DT property for the
PrimeCell PL011 UART in which to specify the value (so I thought this
was the less intrusive implementation).
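
To make the link between the 150MHz clock and the 3Mbps figure concrete,
here is a small sketch of the baud divisor arithmetic. It assumes the
usual PL011 relation, divisor = UARTCLK / (16 * baud), split into a
16-bit integer part and a 6-bit fraction in 1/64 steps; that relation is
not stated in this thread, and struct pl011_div and pl011_divisor() are
names invented for the example.

```c
#include <stdint.h>

struct pl011_div {
	uint32_t ibrd;	/* integer part of the divisor */
	uint32_t fbrd;	/* fractional part, in 1/64 steps */
};

/* Compute uartclk / (16 * baud), rounded to the nearest 1/64. */
static struct pl011_div pl011_divisor(uint32_t uartclk, uint32_t baud)
{
	/* quot is the divisor scaled by 64: round(uartclk * 4 / baud) */
	uint64_t quot = ((uint64_t)uartclk * 4 + baud / 2) / baud;
	struct pl011_div d = {
		.ibrd = (uint32_t)(quot >> 6),
		.fbrd = (uint32_t)(quot & 0x3f),
	};

	return d;
}
```

For 150000000 Hz and 3000000 baud the divisor is 3.125, so the integer
part is 3 and the fraction is 8/64, which is exact. Under the same
assumption the integer part must be at least 1, so the clock has to be at
least 16 times the baud rate.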





linux-next: Tree for Jul 7

2016-07-06 Thread Stephen Rothwell
Hi all,

Changes since 20160706:

The mac80211-next tree gained a conflict against the wireless-drivers-next
tree.

The clockevents tree gained a conflict against the arm-soc tree.

Non-merge commits (relative to Linus' tree): 7149
 6726 files changed, 338614 insertions(+), 141091 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 236 trees (counting Linus' and 34 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (bc86765181aa Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging fixes/master (5edb56491d48 Linux 4.7-rc3)
Merging kbuild-current/rc-fixes (b36fad65d61f kbuild: Initialize exported 
variables)
Merging arc-current/for-curr (9bd54517ee86 arc: unwind: warn only once if 
DW2_UNWIND is disabled)
Merging arm-current/fixes (56530f5d2ddc ARM: 8579/1: mm: Fix definition of 
pmd_mknotpresent)
Merging m68k-current/for-linus (9a6462763b17 m68k/mvme16x: Include generic 
)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached 
build errors)
Merging powerpc-fixes/fixes (eb584b3ee4d3 powerpc/tm: Fix stack pointer 
corruption in __tm_recheckpoint())
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (6b15d6650c53 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging net/master (bc86765181aa Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (370a8107e788 ipvs: fix bind to link-local mcast IPv6 
address in backup)
Merging wireless-drivers/master (034fdd4a17ff Merge ath-current from ath.git)
Merging mac80211/master (16a910a6722b cfg80211: handle failed skb allocation)
Merging sound-current/for-linus (9cd25743765c ALSA: hda/realtek: Add Lenovo 
L460 to docking unit fixup)
Merging pci-current/for-linus (ef0dab4aae14 PCI: Fix unaligned accesses in VC 
code)
Merging driver-core.current/driver-core-linus (33688abb2802 Linux 4.7-rc4)
Merging tty.current/tty-linus (a99cde438de0 Linux 4.7-rc6)
Merging usb.current/usb-linus (a99cde438de0 Linux 4.7-rc6)
Merging usb-gadget-fixes/fixes (50c763f8c1ba usb: dwc3: Set the ClearPendIN bit 
on Clear Stall EP command)
Merging usb-serial-fixes/usb-linus (4c2e07c6a29e Linux 4.7-rc5)
Merging usb-chipidea-fixes/ci-for-usb-stable (ea1d39a31d3b usb: common: 
otg-fsm: add license to usb-otg-fsm)
Merging staging.current/staging-linus (a99cde438de0 Linux 4.7-rc6)
Merging char-misc.current/char-misc-linus (33688abb2802 Linux 4.7-rc4)
Merging input-current/for-linus (caca925fca4f Input: xpad - validate USB 
endpoint count during probe)
Merging crypto-current/master (055ddaace035 crypto: user - re-add size check 
for CRYPTO_MSG_GETALG)
Merging ide/master (1993b176a822 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide)
Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding /proc/kallsyms 
vs module insertion race.)
Merging vfio-fixes/for-linus (ce7585f3c4d7 vfio/pci: Allow VPD short read)
Merging kselftest-fixes/fixes (f80eb4289491 selftests/exec: Makefile is a 
run-time dependency, add it to the install list)
Merging backlight-fixes/for-backlight-fixes (68feaca0b13e backlight: pwm: 
Handle EPROBE_DEFER while requesting the PWM)
Merging ftrace-fixes/for-next-urgent (6224beb12e19 tracing: Have branch tracer 
use recursive fie

Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-06 Thread Roland Dreier
On Thu, Jan 7, 2016 at 3:00 AM, Konstantin Khlebnikov  wrote:
> Or just shift GSO CB and add couple checks like
> BUILD_BUG_ON(sizeof(SKB_GSO_CB(skb)->room) < sizeof(*IPCB(skb)));

Resurrecting this old thread, because the patch that ultimately went
upstream (commit 9207f9d45b0a / net: preserve IP control block during
GSO segmentation) causes a huge IPoIB performance regression (to the
point of being unusable):
https://bugzilla.kernel.org/show_bug.cgi?id=111921

I don't think anyone has explained what goes wrong or why IPoIB works
the way it does.  The underlying difference that IPoIB has from other
drivers is that there are two levels of address resolution.  First,
normal ARP / ND resolves an IP address to a "hardware" address.  The
difference is that in IPoIB, the hardware address is an IB GID (plus a
QPN, but we can ignore that).  To actually send data to that GID, the
IPoIB driver has to do a second lookup - it needs to ask the IB subnet
manager for a path record that tells it how to reach that GID.

In particular this means that "destination address" (as the IP / ARP
layer understands it) actually isn't in the packet anywhere - there's
nothing like an ethernet header as there is for "normal" network
drivers.  Instead, the driver stashes the address in skb->cb during
hard_header_ops->create() and then looks at it in the xmit routine -
this was designed way back around when commit a0417fa3a18a / net: Make
qdisc_skb_cb upper size bound explicit. was merged.  The expectation
was that the part of the cb after sizeof (struct qdisc_skb_cb) would
be preserved.

The problem with commit 9207f9d45b0a is that GSO operations now access
cb after SKB_SGO_CB_OFFSET==32, which lands right in the middle of
where IPoIB stashes its hwaddr.

It seems that the intent of the commit is to preserve the IP control
block - struct inet_skb_parm (and presumably struct inet6_skb_parm) -
even when using SKB_GSO_CB().  Seems like both inet_skb_parm and
inet6_skb_parm are 20 bytes.  IPoIB uses the part of cb after 28
bytes, so if we could squeeze struct skb_gso_cb down to 8 bytes and
set SKB_SGO_CB_OFFSET to 20, then everything would work.  The struct
is

struct skb_gso_cb {
int mac_offset;
int encap_level;
__u16   csum_start;
};

is it feasible to make encap_level a __u16 (which would make the
overall struct exactly 8 bytes)?  If I understand this correctly, 64K
nested encapsulations seems like quite a bit for a packet...
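
A quick user-space check of the size arithmetic above, as a sketch: it
copies the struct as quoted in this message (with kernel types replaced
by stdint ones) and compares it with the proposed narrower layout. The
struct tags, the two constants and gso_cb_fits() are names made up for
the example, and the sizes assume a typical ABI with 4-byte int.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as quoted in the message: 10 bytes of fields, padded to 12. */
struct skb_gso_cb_now {
	int mac_offset;
	int encap_level;
	uint16_t csum_start;
};

/* Proposed layout with encap_level narrowed to 16 bits: exactly 8 bytes. */
struct skb_gso_cb_small {
	int mac_offset;
	uint16_t encap_level;
	uint16_t csum_start;
};

#define IP_CB_SIZE	20	/* inet_skb_parm / inet6_skb_parm, per the message */
#define IPOIB_CB_START	28	/* IPoIB uses cb[] from this offset, per the message */

/* Nonzero if a GSO cb of this size, placed right after the IP cb, stays clear of IPoIB's area. */
static int gso_cb_fits(size_t gso_cb_size)
{
	return IP_CB_SIZE + gso_cb_size <= IPOIB_CB_START;
}
```

With the quoted layout the GSO block would end at byte 32 and overlap the
IPoIB area; with the narrowed one it ends at byte 28 and does not.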

Or, earlier in this thread, having the GSO in ip_output and other gso
paths save and restore the IP/IP6 control block was suggested as an
alternate approach.  I don't know if there are performance
implications to that.

What is the best way to keep the crash fix but not kill IPoIB performance?

Thanks!
 - R.


Re: [PATCH V2 03/10] Documentation: dt-bindings: firmware: tegra: add bindings of the BPMP

2016-07-06 Thread Joseph Lo

On 07/07/2016 01:03 AM, Stephen Warren wrote:

On 07/05/2016 03:04 AM, Joseph Lo wrote:

The BPMP is a specific processor in Tegra chip, which is designed for
booting process handling and offloading the power management, clock
management, and reset control tasks from the CPU. The binding document
defines the resources that would be used by the BPMP firmware driver,
which can create the interprocessor communication (IPC) between the CPU
and BPMP.


Acked-by: Stephen Warren 


Thanks,
-Joseph


Re: [PATCH 11/31] mm: vmscan: do not reclaim from kswapd if there is any eligible zone

2016-07-06 Thread Minchan Kim
On Wed, Jul 06, 2016 at 09:42:00AM +0100, Mel Gorman wrote:

> > > > 
> > > > If buffer_head is over limit, old logic force to reclaim highmem but
> > > > this zone_balanced logic will prevent it.
> > > > 
> > > 
> > > The old logic was always busted on 64-bit because is_highmem would always
> > > be 0. The original intent appears to be that buffer_heads_over_limit
> > > would release the buffers when pages went inactive. There are a number
> > 
> > Yes, but the difference is that the old logic handled it in both direct
> > and background reclaim once buffer_heads was over the limit, whereas your
> > change alters that slightly: kswapd can't reclaim the high zone if any
> > eligible zone is balanced. I don't know how big a difference it makes, but
> > we saw highmem buffer_head problems several times, IIRC. So I just wanted
> > to bring it to your attention. Whether it's handled or not is up to you.
> > 
> 
> The last time I remember buffer_heads_over_limit was an NTFS filesystem
> using small sub-page block sizes with a large highmem:lowmem ratio. If a
> similar situation is encountered then a test patch would be something like;
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index dc12af938a8d..a8ebd1871f16 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3151,7 +3151,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, 
> int classzone_idx)
>* zone was balanced even under extreme pressure when the
>* overall node may be congested.
>*/
> - for (i = sc.reclaim_idx; i >= 0; i--) {
> + for (i = sc.reclaim_idx; i >= 0 && !buffer_heads_over_limit; 
> i--) {
>   zone = pgdat->node_zones + i;
>   if (!populated_zone(zone))
>   continue;
> 
> I'm not going to go with it for now because buffer_heads_over_limit is not
> necessarily a problem unless lowmem is a factor. We don't want background
> reclaim to go ahead unnecessarily just because buffer_heads_over_limit.
> It could be distinguished by only forcing reclaim to go ahead on systems
> with highmem.

If you don't think it's a problem, I don't want to insist on it because I don't
have any report/workload right now. Instead, please write some comment in there
for others to understand why kswapd is okay to ignore buffer_heads_over_limit
unlike direct reclaim. Such non-symmetric behavior is really hard to follow
without any description.


Re: [PATCH V2 03/10] Documentation: dt-bindings: firmware: tegra: add bindings of the BPMP

2016-07-06 Thread Joseph Lo

On 07/06/2016 07:42 PM, Alexandre Courbot wrote:

On Tue, Jul 5, 2016 at 6:04 PM, Joseph Lo  wrote:

The BPMP is a specific processor in Tegra chip, which is designed for
booting process handling and offloading the power management, clock
management, and reset control tasks from the CPU. The binding document
defines the resources that would be used by the BPMP firmware driver,
which can create the interprocessor communication (IPC) between the CPU
and BPMP.

Signed-off-by: Joseph Lo 
---
Changes in V2:
- update the message that the BPMP is clock and reset control provider
- add tegra186-clock.h and tegra186-reset.h header files
- revise the description of the required properties
---
  .../bindings/firmware/nvidia,tegra186-bpmp.txt |  77 ++
  include/dt-bindings/clock/tegra186-clock.h | 940 +
  include/dt-bindings/reset/tegra186-reset.h | 217 +
  3 files changed, 1234 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/firmware/nvidia,tegra186-bpmp.txt
  create mode 100644 include/dt-bindings/clock/tegra186-clock.h
  create mode 100644 include/dt-bindings/reset/tegra186-reset.h

diff --git 
a/Documentation/devicetree/bindings/firmware/nvidia,tegra186-bpmp.txt 
b/Documentation/devicetree/bindings/firmware/nvidia,tegra186-bpmp.txt
new file mode 100644
index ..4d0b6eba56c5
--- /dev/null
+++ b/Documentation/devicetree/bindings/firmware/nvidia,tegra186-bpmp.txt
@@ -0,0 +1,77 @@
+NVIDIA Tegra Boot and Power Management Processor (BPMP)
+
+The BPMP is a specific processor in Tegra chip, which is designed for
+booting process handling and offloading the power management, clock
+management, and reset control tasks from the CPU. The binding document
+defines the resources that would be used by the BPMP firmware driver,
+which can create the interprocessor communication (IPC) between the CPU
+and BPMP.
+
+Required properties:
+- name : Should be bpmp
+- compatible
+Array of strings
+One of:
+- "nvidia,tegra186-bpmp"
+- mboxes : The phandle of mailbox controller and the mailbox specifier.
+- shmem : List of the phandle of the TX and RX shared memory area that
+ the IPC between CPU and BPMP is based on.
+- #clock-cells : Should be 1.
+- #reset-cells : Should be 1.
+
+This node is a mailbox consumer. See the following files for details of
+the mailbox subsystem, and the specifiers implemented by the relevant
+provider(s):
+
+- Documentation/devicetree/bindings/mailbox/mailbox.txt
+- Documentation/devicetree/bindings/mailbox/nvidia,tegra186-hsp.txt
+
+This node is a clock and reset provider. See the following files for
+general documentation of those features, and the specifiers implemented
+by this node:
+
+- Documentation/devicetree/bindings/clock/clock-bindings.txt
+- include/dt-bindings/clock/tegra186-clock.h
+- Documentation/devicetree/bindings/reset/reset.txt
+- include/dt-bindings/reset/tegra186-reset.h
+
+The shared memory bindings for BPMP
+---
+
+The shared memory area for the IPC TX and RX between CPU and BPMP are
+predefined and work on top of sysram, which is an SRAM inside the chip.
+
+See "Documentation/devicetree/bindings/sram/sram.txt" for the bindings.
+
+Example:
+
+hsp_top0: hsp@03c0 {
+   ...
+   #mbox-cells = <1>;
+};
+
+sysram@3000 {
+   compatible = "nvidia,tegra186-sysram", "mmio-ram";


Shouldn't the second compatible be "mmio-sram"?

If so, then you have the same typo in tegra186.dtsi as well.



Good catch, will fix.

Thanks,
-Joseph


Re: [PATCH V2 01/10] Documentation: dt-bindings: mailbox: tegra: Add binding for HSP mailbox

2016-07-06 Thread Joseph Lo

On 07/07/2016 01:02 AM, Stephen Warren wrote:

On 07/05/2016 03:04 AM, Joseph Lo wrote:

Add DT binding for the Hardware Synchronization Primitives (HSP). The
HSP is designed for the processors to share resources and communicate
together. It provides a set of hardware synchronization primitives for
interprocessor communication. So the interprocessor communication (IPC)
protocols can use hardware synchronization primitive, when operating
between two processors not in an SMP relationship.


Acked-by: Stephen Warren 


Thanks,
-Joseph




Re: [RFC] arm64: kexec_file_load support

2016-07-06 Thread Dave Young
On 07/05/16 at 05:03pm, AKASHI Takahiro wrote:
> Hi Dave,
> 
> On Tue, Jul 05, 2016 at 09:25:56AM +0800, Dave Young wrote:
> > On 07/04/16 at 03:58pm, AKASHI Takahiro wrote:
> > > Hi,
> > > 
> > > On Fri, Jul 01, 2016 at 12:46:31PM -0300, Thiago Jung Bauermann wrote:
> > > > Am Freitag, 01 Juli 2016, 14:11:12 schrieb AKASHI Takahiro:
> > > > > I'm not sure whether there is any demand for kexec_file_load
> > > > > support on arm64, but anyhow I'm working on this and now
> > > > > my early prototype code does work fine.
> > > > 
> > > > It is necessary if you want to support loading only signed kernels, and 
> > > > also 
> > > > if you want IMA to measure the kernel in its event log.
> > > > 
> > > > > There is, however, one essential issue:
> > > > > While arm64 kernel requires a device tree blob to be set up
> > > > > correctly at boot time, the current system call API doesn't
> > > > > have this parameter.
> > > > > int kexec_file_load(int kernel_fd, int initrd_fd,
> > > > > unsigned long cmdline_len, const char
> > > > > *cmdline_ptr, unsigned long flags);
> > > > > 
> > > > > Should we invent a new system call, like kexec_file_load2,
> > > > > and, if so, what kind of interface would be desired?
> > > > 
> > > > I'm facing the same issue on powerpc. What I'm doing is taking the 
> > > > device 
> > > > tree that was used to boot the current kernel and modifying it as 
> > > > necessary 
> > > > to pass it to the next kernel.
> > > 
> > > That is exactly what I do.
> > > 
> > > > I agree that it would be better if we could have a system call where a 
> > > > custom device tree could be passed. One suggestion is:
> > > 
> > > For powerpc, you might be able to use dtbImage instead of Image
> > > without changing the kernel interfaces.
> > > > 
> > > > kexec_file_load2(int fds[], int fd_types[], int nr_fds,
> > > >  unsigned long cmdline_len, const char *cmdline_ptr,
> > > > unsigned long flags);
> > > 
> > > You don't want to simply add one more argument, i.e. dtb_fd, do you?
> > > 
> > > I prefer a slightly-simpler interface:
> > > struct kexec_file_fd {
> > > enum kexec_file_type;
> > > int fd;
> > > }
> > > 
> > > int kexec_file_load2(struct kexec_file_fd[], int nr_fds, int 
> > > flags);
> > > 
> > > Or if you want to keep the compatibility with the existing system call,
> > > 
> > > int kexec_file_load(int kernel_fd, int initrd_fd,
> > > unsigned long cmdline_len, const char 
> > > *cmdline_ptr,
> > > unsigned long flags,
> > > int struct kexec_file_fd[], int nr_fds);
> > > 
> > > Here SYSCALL_DEFINE7() have to be defined, and I'm not sure that we will 
> > > not
> > > have a problem in adding a system call with more than 6 arguments.
> > > 
> > > > Where fds is an array with nr_fds file descriptors and fd_types is an 
> > > > array 
> > > > specifying what each fd in fds is. So for example, if fds[i] is the 
> > > > kernel, 
> > > > then fd_types[i] would have the value KEXEC_FILE_KERNEL_FD. If fds[i] 
> > > > is the 
> > > > device tree blob, fd_types[i], would have the value KEXEC_FILE_DTB and 
> > > > so 
> > > > on. That way, the syscall can be extended for an arbitrary number and 
> > > > types 
> > > > of segments that have to be loaded, just like kexec_load.
> > > > 
> > > > Another option is to have a struct:
> > > > 
> > > > kexec_file_load2(struct kexec_file_params *params, unsigned long 
> > > > params_sz);
> > > 
> > > Wow, we can add any number of new parameters with this interface.
> > > 
> > > Thanks,
> > > -Takahiro AKASHI
> > > 
> > > > Where:
> > > > 
> > > > struct kexec_file_params {
> > > > int version;/* allows struct to be extended in the future */
> > > > int fds[];
> > > > int fd_types[];
> > > > int nr_fds;
> > > > unsigned long cmdline_len;
> > > > const char *cmdline_ptr;
> > > > unsigned long flags;
> > > > };
> > > > 
> > > > This is even more flexible.
> > 
> > I would like to vote for this one, and use kexec_file_fd fds[] in the 
> > struct 
> 
> If we take this approach, we'd better take "flags" out of struct,
> and my preference would be:
> 
> enum kexec_file_type {
> KEXEC_FILE_TYPE_KERNEL,
> KEXEC_FILE_TYPE_INITRD,
> KEXEC_FILE_TYPE_DTB,
> };
> 
> struct kexec_file_fd {
> enum kexec_file_type type;
> int fd;
> };
> 
> struct kexec_file_params {
> int version;
> unsigned char *cmdline;
> unsigned long cmdline_len;
> int nr_fds;
> struct kexec_file_fd fds[0];
> };
> 
> int kexec_file_load2(int kernel_fd, unsigned long flags,
> struct kexec_file_params extra);
> 
> So we don't 
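
To show how a caller might build the variable-length parameter block
discussed above, here is a user-space sketch. Everything in it is
hypothetical: kexec_file_load2 does not exist, the struct layout follows
the proposal quoted in this message (with a C99 flexible array member and
a named type field), and kexec_params_alloc() is a helper invented for
the example. No system call is made.

```c
#include <stdlib.h>

enum kexec_file_type {
	KEXEC_FILE_TYPE_KERNEL,
	KEXEC_FILE_TYPE_INITRD,
	KEXEC_FILE_TYPE_DTB,
};

struct kexec_file_fd {
	enum kexec_file_type type;
	int fd;
};

struct kexec_file_params {
	int version;			/* lets the struct grow later */
	unsigned char *cmdline;
	unsigned long cmdline_len;
	int nr_fds;
	struct kexec_file_fd fds[];	/* nr_fds entries follow the header */
};

/*
 * Allocate a zeroed params block with room for nr_fds entries.
 * Returns NULL on bad input or allocation failure; release with free().
 */
static struct kexec_file_params *kexec_params_alloc(int nr_fds)
{
	struct kexec_file_params *p;

	if (nr_fds < 0)
		return NULL;

	p = calloc(1, sizeof(*p) + (size_t)nr_fds * sizeof(p->fds[0]));
	if (p) {
		p->version = 1;
		p->nr_fds = nr_fds;
	}
	return p;
}
```

Because each entry carries its own type, a device tree blob becomes just
one more element of fds[] rather than a new system call argument.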

[PATCH v2] arm64: dts: berlin4ct: Add L2 cache topology

2016-07-06 Thread Jisheng Zhang
This patch adds the L2 cache topology for berlin4ct which has 1MB L2
cache.

Signed-off-by: Jisheng Zhang 
---
Since V1:
 - use lower case for node label
 - remove useless "0" in node label and node name

 arch/arm64/boot/dts/marvell/berlin4ct.dtsi | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/berlin4ct.dtsi 
b/arch/arm64/boot/dts/marvell/berlin4ct.dtsi
index 099ad93..deea38b 100644
--- a/arch/arm64/boot/dts/marvell/berlin4ct.dtsi
+++ b/arch/arm64/boot/dts/marvell/berlin4ct.dtsi
@@ -68,6 +68,7 @@
device_type = "cpu";
reg = <0x0>;
enable-method = "psci";
+   next-level-cache = <&l2>;
cpu-idle-states = <&CPU_SLEEP_0>;
};
 
@@ -76,6 +77,7 @@
device_type = "cpu";
reg = <0x1>;
enable-method = "psci";
+   next-level-cache = <&l2>;
cpu-idle-states = <&CPU_SLEEP_0>;
};
 
@@ -84,6 +86,7 @@
device_type = "cpu";
reg = <0x2>;
enable-method = "psci";
+   next-level-cache = <&l2>;
cpu-idle-states = <&CPU_SLEEP_0>;
};
 
@@ -92,9 +95,14 @@
device_type = "cpu";
reg = <0x3>;
enable-method = "psci";
+   next-level-cache = <&l2>;
cpu-idle-states = <&CPU_SLEEP_0>;
};
 
+   l2: l2-cache {
+   compatible = "cache";
+   };
+
idle-states {
entry-method = "psci";
CPU_SLEEP_0: cpu-sleep-0 {
-- 
2.8.1



Re: [PATCH] arm64: dts: berlin4ct: Add L2 cache topology

2016-07-06 Thread Jisheng Zhang
Dear Sebastian,

On Wed, 6 Jul 2016 19:49:01 +0200 Sebastian Hesselbarth wrote:

> On 16.06.2016 10:40, Jisheng Zhang wrote:
> > This patch adds the L2 cache topology for berlin4ct which has 1MB L2
> > cache.
> > 
> > Signed-off-by: Jisheng Zhang 
> > ---
> >  arch/arm64/boot/dts/marvell/berlin4ct.dtsi | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/marvell/berlin4ct.dtsi 
> > b/arch/arm64/boot/dts/marvell/berlin4ct.dtsi
> > index 099ad93..c9e3a98 100644
> > --- a/arch/arm64/boot/dts/marvell/berlin4ct.dtsi
> > +++ b/arch/arm64/boot/dts/marvell/berlin4ct.dtsi  
> [...]
> > @@ -92,9 +95,14 @@
> > device_type = "cpu";
> > reg = <0x3>;
> > enable-method = "psci";
> > +   next-level-cache = <&L2_0>;
> > cpu-idle-states = <&CPU_SLEEP_0>;
> > };
> >  
> > +   L2_0: l2-cache0 {  
> 
> Jisheng,
> 
> The node name should just have a generic name that reflects
> the purpose of the unit it represents, i.e.
> s/l2-cache0/cache/

IMHO, "cache" is too generic: this is the L2 cache topology, so in v2 I use
"l2-cache" instead. What do you think?

PS: I found other arm64 SoCs also use "l2-cache" as the node name.

> 
> nits:
> - What is that "0" for? Please remove if there is no good reason.
> - Does the node label need to be upper-case? Please make it lower case.
> 

Oh yeah, thanks for the hints! Will do in v2.

Thanks for reviewing,
Jisheng

> Sebastian
> 
> > +   compatible = "cache";
> > +   };
> > +
> > idle-states {
> > entry-method = "psci";
> > CPU_SLEEP_0: cpu-sleep-0 {
> >   
> 



Re: [PATCH 08/31] mm, vmscan: simplify the logic deciding whether kswapd sleeps

2016-07-06 Thread Minchan Kim
On Wed, Jul 06, 2016 at 09:31:21AM +0100, Mel Gorman wrote:
> On Wed, Jul 06, 2016 at 09:30:54AM +0900, Minchan Kim wrote:
> > On Tue, Jul 05, 2016 at 11:26:39AM +0100, Mel Gorman wrote:
> > 
> > 
> > 
> > > > > @@ -3418,10 +3426,10 @@ void wakeup_kswapd(struct zone *zone, int 
> > > > > order, enum zone_type classzone_idx)
> > > > >   if (!cpuset_zone_allowed(zone, GFP_KERNEL | __GFP_HARDWALL))
> > > > >   return;
> > > > >   pgdat = zone->zone_pgdat;
> > > > > - if (pgdat->kswapd_max_order < order) {
> > > > > - pgdat->kswapd_max_order = order;
> > > > > - pgdat->classzone_idx = min(pgdat->classzone_idx, 
> > > > > classzone_idx);
> > > > > - }
> > > > > + if (pgdat->kswapd_classzone_idx == -1)
> > > > > + pgdat->kswapd_classzone_idx = classzone_idx;
> > > > 
> > > > It's tricky. Couldn't we change kswapd_classzone_idx to integer type
> > > > and remove if above if condition?
> > > > 
> > > 
> > > It's tricky and not necessarily better overall. It's perfectly possible
> > > to be woken up for zone index 0 so it's changing -1 to another magic
> > > value.
> > 
> > I don't get it. What is a problem with this?
> > 
> 
> It becomes difficult to tell the difference between "no wakeup and init to
> zone 0" and "wakeup and reclaim for zone 0". At least that's the problem
> I ran into when I tried before settling on -1.

Sorry for bothering you several times, but I cannot parse what you mean.
I didn't mean that -1 is a problem here; my question is why we need the
two lines that I removed below.

IOW, what's the problem if we apply the patch below?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c538a8c..6eb23f5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3413,9 +3413,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
if (!cpuset_zone_allowed(zone, GFP_KERNEL | __GFP_HARDWALL))
return;
pgdat = zone->zone_pgdat;
-   if (pgdat->kswapd_classzone_idx == -1)
-   pgdat->kswapd_classzone_idx = classzone_idx;
-   pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx, classzone_idx);
+   pgdat->kswapd_classzone_idx = max_t(int, pgdat->kswapd_classzone_idx, classzone_idx);
pgdat->kswapd_order = max(pgdat->kswapd_order, order);
if (!waitqueue_active(&pgdat->kswapd_wait))
return;  

> 
> -- 
> Mel Gorman
> SUSE Labs
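
For readers following the -1 discussion, the two forms of the update can be compared in userspace. This is only a sketch under assumptions, not kernel code: NO_WAKEUP stands in for the -1 sentinel, max_int() stands in for a signed max_t(int, ...), and all function names are hypothetical. The last helper shows what would happen if the comparison were done on an unsigned type instead.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define NO_WAKEUP (-1)	/* stands in for the "-1 means no wakeup yet" sentinel */

/* Stand-in for max_t(int, a, b): both operands compared as signed int. */
int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Form with the explicit sentinel check followed by max(). */
int update_with_check(int cur, int requested)
{
	assert(requested >= 0);
	if (cur == NO_WAKEUP)
		cur = requested;
	return max_int(cur, requested);
}

/* Form with the signed max alone: -1 is smaller than any valid index. */
int update_with_signed_max(int cur, int requested)
{
	assert(requested >= 0);
	return max_int(cur, requested);
}

/* Same max done on unsigned values: the sentinel becomes the largest value. */
unsigned int update_with_unsigned_max(unsigned int cur, unsigned int requested)
{
	return cur > requested ? cur : requested;
}
```

In this model the signed max gives the same result as the explicit check for every non-negative request, while the unsigned comparison keeps the sentinel forever. Whether that equivalence carries over to the real field depends on its declared type, which this sketch does not settle.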


Re: linux-next: manual merge of the mac80211-next tree with the wireless-drivers-next tree

2016-07-06 Thread Coelho, Luciano
On Thu, 2016-07-07 at 11:56 +1000, Stephen Rothwell wrote:
> Hi Johannes,
> 
> Today's linux-next merge of the mac80211-next tree got a conflict in:
> 
>   drivers/net/wireless/marvell/mwifiex/cmdevt.c
> 
> between commit:
> 
>   a9c790ba23eb ("mwifiex: factor out mwifiex_cancel_scan")
> 
> from the wireless-drivers-next tree and commit:
> 
>   1d76250bd34a ("nl80211: support beacon report scanning")
> 
> from the mac80211-next tree.
> 
> I fixed it up (I used the wireless-drivers-next tree version of this
> file
> and then added the following merge fix patch) and can carry the fix
> as
> necessary. This is now fixed as far as linux-next is concerned, but
> any
> non trivial conflicts should be mentioned to your upstream maintainer
> when your tree is submitted for merging.  You may also want to
> consider
> cooperating with the maintainer of the conflicting tree to minimise
> any
> particularly complex conflicts.
> 
> From: Stephen Rothwell 
> Date: Thu, 7 Jul 2016 11:51:35 +1000
> Subject: [PATCH] mwifiex: fixup for "nl80211: support beacon report
> scanning"
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  drivers/net/wireless/marvell/mwifiex/scan.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c
> b/drivers/net/wireless/marvell/mwifiex/scan.c
> index 4d21ca9744c1..ed3de0754a08 100644
> --- a/drivers/net/wireless/marvell/mwifiex/scan.c
> +++ b/drivers/net/wireless/marvell/mwifiex/scan.c
> @@ -2026,9 +2026,13 @@ void mwifiex_cancel_scan(struct
> mwifiex_adapter *adapter)
>   if (!priv)
>   continue;
>   if (priv->scan_request) {
> + struct cfg80211_scan_info info = {
> + .aborted = true,
> + };
> +
>   mwifiex_dbg(adapter, INFO,
>   "info: aborting
> scan\n");
> - cfg80211_scan_done(priv-
> >scan_request, 1);
> + cfg80211_scan_done(priv-
> >scan_request, &info);
>   priv->scan_request = NULL;
>   }
>   }

The fix looks good to me.  Thanks!

--
Luca.

Re: [GIT PULL] STi SoC changes for v4.8

2016-07-06 Thread Olof Johansson
On Fri, Jul 01, 2016 at 04:47:46PM +0200, Patrice Chotard wrote:
> Hi Olof, Arnd and Kevin,
> 
> Please consider this first round of STi SoC updates for v4.8:
> 
> The following changes since commit 4c2e07c6a29e0129e975727b9f57eede813eea85:
> 
>   Linux 4.7-rc5 (2016-06-26 17:52:03 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pchotard/sti.git
> sti-soc_for_v4.8

Again, that's a branch and not a tag.

> for you to fetch changes up to 55aa35180c57d82f3db23e5aabce97acb0d36681:
> 
>   ARM: sti: Implement dummy L2 cache's write_sec (2016-07-01 16:23:44 +0200)
> 
> 
> Highlights:
> ---
> _ add a dummy L2 cache's write_sec callback as in non secure mode execution,
>   we can't get access to L2 cache secure registers
> _ cosmetics change, in case of dump_stack, update the hardware name with a
>   more generic one for the STi SoCs family

Minor nit: This is a somewhat odd format to write a list in. Please use
'-' or '*' instead, and feel free to use capital letters, etc. :)


-Olof


Re: [PATCH -v3.2 1/2] ratelimit: Extend to print suppressed messages on release

2016-07-06 Thread Borislav Petkov
On Wed, Jul 06, 2016 at 09:17:52PM -0400, Steven Rostedt wrote:
> Hmm, should this clear the missed flag? Especially since it isn't
> cleared below.

The expectation is that after you call exit on something, you don't need
it anymore. But I know exactly why you're asking for this so I'll do the
change. :-)

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH 1/9] mm: Hardened usercopy

2016-07-06 Thread Baruch Siach
Hi Kees,

On Wed, Jul 06, 2016 at 03:25:20PM -0700, Kees Cook wrote:
> +#ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR

Should be CONFIG_HARDENED_USERCOPY to match the slab/slub implementation 
condition.

> +const char *__check_heap_object(const void *ptr, unsigned long n,
> + struct page *page);
> +#else
> +static inline const char *__check_heap_object(const void *ptr,
> +   unsigned long n,
> +   struct page *page)
> +{
> + return NULL;
> +}
> +#endif

baruch

-- 
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -


[PATCH v14 6/8] perf tools: Enable overwrite settings

2016-07-06 Thread Wang Nan
This patch adds the following option and config terms:

Set all events to overwrite globally:

 # perf record --overwrite ...

Set specific events to overwrite or no-overwrite:

 # perf record --event cycles/overwrite/ ...
 # perf record --event cycles/no-overwrite/ ...

Add the missing config terms and update the config term array size, because
the longest string length has changed.

For overwritable events, automatically select attr.write_backward, since
perf requires the ring buffer to be written backward in order to read it.
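
The flight-recorder behaviour that overwrite mode relies on can be illustrated with a small userspace model. This is a sketch under assumptions only: struct flight_ring, ring_put() and ring_snapshot() are hypothetical names, and the real perf ring buffer holds variable-sized records in an mmap'd area written backward, not fixed-size ints in an array.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 4	/* illustrative capacity, not a perf constant */

/* Hypothetical fixed-size ring that overwrites its oldest record when full. */
struct flight_ring {
	int slot[RING_SLOTS];
	size_t head;	/* total number of records ever written */
};

/* Store a record; once the ring is full this silently replaces the oldest one. */
void ring_put(struct flight_ring *r, int rec)
{
	r->slot[r->head % RING_SLOTS] = rec;
	r->head++;
}

/* Copy the surviving records to out[], oldest first, and return their count. */
size_t ring_snapshot(const struct flight_ring *r, int *out, size_t cap)
{
	size_t n = r->head < RING_SLOTS ? r->head : RING_SLOTS;
	size_t start = r->head - n;
	size_t i;

	assert(cap >= n);
	for (i = 0; i < n; i++)
		out[i] = r->slot[(start + i) % RING_SLOTS];
	return n;
}
```

Writing six records into this four-slot ring and then taking a snapshot yields only the last four; the first two never reach the output, which is the trade-off overwrite mode accepts.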

Test result:
 # perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
 [ perf record: Woken up 2 times to write data ]
 [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
 # perf evlist -v
 syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
 # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: Nilay Vaish 
Cc: pi3or...@163.com
---
 tools/perf/Documentation/perf-record.txt | 14 ++
 tools/perf/builtin-record.c  |  1 +
 tools/perf/perf.h|  1 +
 tools/perf/tests/backward-ring-buffer.c  | 14 ++
 tools/perf/util/evsel.c  |  4 
 tools/perf/util/evsel.h  |  2 ++
 tools/perf/util/parse-events.c   | 20 ++--
 tools/perf/util/parse-events.h   |  2 ++
 tools/perf/util/parse-events.l   |  2 ++
 9 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 5b46b1d..384c630 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -367,6 +367,20 @@ options.
 'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj
 in config file is set to true.
 
+--overwrite::
+Makes all events use an overwritable ring buffer. An overwritable ring
+buffer works like a flight recorder: when it gets full, the kernel will
+overwrite the oldest records, that thus will never make it to the
+perf.data file.
+
+When '--overwrite' and '--switch-output' are used perf records and drops
+events until it receives a signal, meaning that something unusual was
+detected that warrants taking a snapshot of the most current events,
+those fitting in the ring buffer at that moment.
+
+'overwrite' attribute can also be set or canceled for an event using
+config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3a83472..8d4c6bb 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1561,6 +1561,7 @@ struct option __record_options[] = {
OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
&record.opts.no_inherit_set,
"child tasks do not inherit counters"),
+   OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
 "number of mmap data pages and AUX area tracing mmap pages",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index cd8f1b1..608b42b 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -59,6 +59,7 @@ struct record_opts {
bool record_switch_events;
bool all_kernel;
bool all_user;
+   bool overwrite;
unsigned int freq;
unsigned int mmap_pages;
unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index db7393c..c0618c7 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -132,26 +132,24 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
}
 
bzero(&parse_error, sizeof(parse_error));
-   err = parse_events(evlist, "syscalls:sys_enter_prctl", &parse_error);
+   /*
+* Set backward bit, ring buffer should be writing from end. Record
+* it in aux evlist
+*/
+   err = parse_events(evlist, "syscalls:sys_enter_prctl/overwrite/", &parse_error);
if (err) {
pr_debug("Failed to parse tracepoint event, try use root\n");
ret = TEST_SKIP;
goto out_delete_evlist;
}
 
-   /*
-* Set backward bit, ring buffer should be writing from end. Record
-  

[PATCH v14 1/8] perf tools: Drop redundant evsel->overwrite indicator

2016-07-06 Thread Wang Nan
From: Arnaldo Carvalho de Melo 

The evsel->overwrite indicator means that an event should be put into an
overwritable ring buffer. In the current implementation it is equal to
evsel->attr.write_backward. To reduce complexity, remove
evsel->overwrite and use evsel->attr.write_backward instead.

In addition, in __perf_evsel__open(), if the kernel doesn't support
write_backward and the user explicitly set it in the evsel, don't fall back
as for other missing features: falling back to a forward ring buffer is
meaningless in this case, since we are unable to stably read from a
forward overwritable ring buffer.

Signed-off-by: Arnaldo Carvalho de Melo 
Signed-off-by: Wang Nan 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: Nilay Vaish 
Cc: pi3or...@163.com
---
 tools/perf/tests/backward-ring-buffer.c |  1 +
 tools/perf/util/evlist.c|  4 ++--
 tools/perf/util/evsel.c | 12 +---
 tools/perf/util/evsel.h |  1 -
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index e70313f..1750ef2 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -101,6 +101,7 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
return TEST_FAIL;
}
 
+   evlist->backward = true;
err = perf_evlist__create_maps(evlist, &opts.target);
if (err < 0) {
pr_debug("Not enough memory to create thread/cpu maps\n");
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1135077..7228596 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1003,7 +1003,7 @@ static bool
 perf_evlist__should_poll(struct perf_evlist *evlist __maybe_unused,
 struct perf_evsel *evsel)
 {
-   if (evsel->overwrite)
+   if (evsel->attr.write_backward)
return false;
return true;
 }
@@ -1018,7 +1018,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
evlist__for_each_entry(evlist, evsel) {
int fd;
 
-   if (evsel->overwrite != (evlist->overwrite && evlist->backward))
+   if (!!evsel->attr.write_backward != (evlist->overwrite && evlist->backward))
continue;
 
if (evsel->system_wide && thread)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0fea724..3abe519 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1359,6 +1359,9 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
int pid = -1, err;
enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;
 
+   if (perf_missing_features.write_backward && evsel->attr.write_backward)
+   return -EINVAL;
+
if (evsel->system_wide)
nthreads = 1;
else
@@ -1389,11 +1392,6 @@ fallback_missing_features:
if (perf_missing_features.lbr_flags)
evsel->attr.branch_sample_type &= ~(PERF_SAMPLE_BRANCH_NO_FLAGS |
 PERF_SAMPLE_BRANCH_NO_CYCLES);
-   if (perf_missing_features.write_backward) {
-   if (evsel->overwrite)
-   return -EINVAL;
-   evsel->attr.write_backward = false;
-   }
 retry_sample_id:
if (perf_missing_features.sample_id_all)
evsel->attr.sample_id_all = 0;
@@ -1495,7 +1493,7 @@ try_fallback:
 */
if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
perf_missing_features.write_backward = true;
-   goto fallback_missing_features;
+   goto out_close;
} else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
perf_missing_features.clockid_wrong = true;
goto fallback_missing_features;
@@ -2404,7 +2402,7 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
"We found oprofile daemon running, please stop it and try again.");
break;
case EINVAL:
-   if (evsel->overwrite && perf_missing_features.write_backward)
+   if (evsel->attr.write_backward && perf_missing_features.write_backward)
return scnprintf(msg, size, "Reading from overwrite event is not supported by this kernel.");
if (perf_missing_features.clockid)
return scnprintf(msg, size, "clockid feature not supported.");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 86fed7a..a31ee2d 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -114,7 +114,6 @@ struct perf_evsel {
booltracking;
boolper_pkg;
boolprecise_max;
-   bool

[PATCH v14 5/8] perf record: Read from overwritable ring buffer

2016-07-06 Thread Wang Nan
overwrite_evt_state is introduced to reflect the state of overwritable
ring buffers. It is a state machine with the following states:

                    ._______________(forbid)______________.
                    |                                     V
 NOTREADY -(0)-> RUNNING --(1)--> DATA_PENDING --(2)--> EMPTY
                  ^  ^              |   ^                 |
                  |  |__(forbid)____/   |____(forbid)____/|
                  |                                       |
                   \_________________(3)_________________/

 NOTREADY : Overwritable evlist is not ready
 RUNNING  : Overwritable ring buffers are recording
 DATA_PENDING : We are required to collect overwritable ring buffers
 EMPTY: We have collected data from those ring buffers.

 (0): Create overwritable evlist
 (1): Pause ring buffers for reading
 (2): Read from ring buffers
 (3): Resume ring buffers for recording

We can't avoid this complexity. Since we deliberately drop records from an
overwritable ring buffer, there is no way to tell from the ring buffer itself
(by checking the head and old pointers) whether unread data remains.
Therefore, we need the DATA_PENDING and EMPTY states to record what we have
done to the ring buffer.

With the above state machine, this patch improves record__mmap_read_all()
so that it reads from the overwritable ring buffers when the DATA_PENDING
state is observed.

The state machine governs overwritable ring buffers only; if there is no
such ring buffer, it does nothing.
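
The transitions (0) to (3) and the forbidden edges in the diagram can be written down as a small table-like check. The sketch below is an illustration under assumptions, not the code from this patch: the EVT_* names, evt_transition_allowed() and evt_set_state() are hypothetical, and the pause/resume actions the real function performs are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the four states in the diagram. */
enum evt_state {
	EVT_NOTREADY,
	EVT_RUNNING,
	EVT_DATA_PENDING,
	EVT_EMPTY,
};

/* Only the numbered edges are allowed; everything else is "forbid". */
bool evt_transition_allowed(enum evt_state from, enum evt_state to)
{
	switch (from) {
	case EVT_NOTREADY:
		return to == EVT_RUNNING;	/* (0) evlist created */
	case EVT_RUNNING:
		return to == EVT_DATA_PENDING;	/* (1) pause for reading */
	case EVT_DATA_PENDING:
		return to == EVT_EMPTY;		/* (2) data has been read */
	case EVT_EMPTY:
		return to == EVT_RUNNING;	/* (3) resume recording */
	default:
		return false;
	}
}

/* Apply a transition if it is legal; leave the state untouched otherwise. */
bool evt_set_state(enum evt_state *state, enum evt_state to)
{
	assert(state);
	if (!evt_transition_allowed(*state, to))
		return false;
	*state = to;
	return true;
}
```

Walking NOTREADY, RUNNING, DATA_PENDING, EMPTY and back to RUNNING succeeds in this model, while a shortcut such as RUNNING to EMPTY is rejected and leaves the state unchanged.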

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: Nilay Vaish 
Cc: pi3or...@163.com
---
 tools/perf/builtin-record.c | 158 +++-
 1 file changed, 157 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 3b62295..3a83472 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -43,6 +43,33 @@
 #include 
 #include 
 
+/*
+ * State machine of overwrite_evt_state:
+ *
+ *                    ._______________(forbid)______________.
+ *                    |                                     V
+ * NOTREADY -(0)-> RUNNING --(1)--> DATA_PENDING --(2)--> EMPTY
+ *                  ^  ^              |   ^                 |
+ *                  |  |__(forbid)____/   |____(forbid)____/|
+ *                  |                                       |
+ *                   \_________________(3)_________________/
+ *
+ * NOTREADY : Overwritable evlist is not ready
+ * RUNNING  : Overwritable ring buffers are recording
+ * DATA_PENDING : We are required to collect overwritable ring buffers
+ * EMPTY: We have collected data from those ring buffers.
+ *
+ * (0): Create overwritable evlist
+ * (1): Pause ring buffers for reading
+ * (2): Read from ring buffers
+ * (3): Resume ring buffers for recording
+ */
+enum overwrite_evt_state {
+   OVERWRITE_EVT_NOTREADY,
+   OVERWRITE_EVT_RUNNING,
+   OVERWRITE_EVT_DATA_PENDING,
+   OVERWRITE_EVT_EMPTY,
+};
 
 struct record {
struct perf_tooltool;
@@ -62,6 +89,7 @@ struct record {
boolbuildid_all;
booltimestamp_filename;
boolswitch_output;
+   enum overwrite_evt_state overwrite_evt_state;
unsigned long long  samples;
 };
 
@@ -343,6 +371,96 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
 
 #endif
 
+static void
+record__toggle_overwrite_evlist(struct record *rec,
+   enum overwrite_evt_state state)
+{
+   struct perf_evlist *evlist = rec->overwrite_evlist;
+   enum overwrite_evt_state old_state = rec->overwrite_evt_state;
+   enum action {
+   NONE,
+   PAUSE,
+   RESUME,
+   } action = NONE;
+
+   if (!evlist)
+   return;
+
+   switch (old_state) {
+   case OVERWRITE_EVT_RUNNING: {
+   switch (state) {
+   case OVERWRITE_EVT_DATA_PENDING:
+   action = PAUSE;
+   break;
+   case OVERWRITE_EVT_RUNNING:
+   case OVERWRITE_EVT_EMPTY:
+   case OVERWRITE_EVT_NOTREADY:
+   default:
+   goto state_err;
+   }
+   break;
+   }
+   case OVERWRITE_EVT_DATA_PENDING: {
+   switch (state) {
+   case OVERWRITE_EVT_EMPTY:
+   break;
+   case OVERWRITE_EVT_RUNNING:
+   case OVERWRITE_EVT_DATA_PENDING:
+   case OVERWRITE_EVT_NOTREADY:
+   default:
+   goto state_err;
+   }
+   break;
+   }
+   case OVERWRITE_EVT_EMPTY: {
+   switch (state) {
+   case OVERWRITE_EVT_RUNNING:
+   action = RESUME;
+ 

Re: [dm-devel] [RFC] block: fix blk_queue_split() resource exhaustion

2016-07-06 Thread NeilBrown
On Wed, Jun 22 2016, Lars Ellenberg wrote:

> For a long time, generic_make_request() converts recursion into
> iteration by queuing recursive arguments on current->bio_list.
>
> This is convenient for stacking drivers,
> the top-most driver would take the originally submitted bio,
> and re-submit a re-mapped version of it, or one or more clones,
> or one or more new allocated bios to its backend(s). Which
> are then simply processed in turn, and each can again queue
> more "backend-bios" until we reach the bottom of the driver stack,
> and actually dispatch to the real backend device.
>
> Any stacking driver ->make_request_fn() could expect that,
> once it returns, any backend-bios it submitted via recursive calls
> to generic_make_request() would now be processed and dispatched, before
> the current task would call into this driver again.
>
> This is changed by commit
>   54efd50 block: make generic_make_request handle arbitrarily sized bios
>
> Drivers may call blk_queue_split() inside their ->make_request_fn(),
> which may split the current bio into a front-part to be dealt with
> immediately, and a remainder-part, which may need to be split even
> further. That remainder-part will simply also be pushed to
> current->bio_list, and would end up being head-of-queue, in front
> of any backend-bios the current make_request_fn() might submit during
> processing of the front-part.
>
> Which means the current task would immediately end up back in the same
> make_request_fn() of the same driver again, before any of its backend
> bios have even been processed.
>
> This can lead to resource starvation deadlock.
> Drivers could avoid this by learning to not need blk_queue_split(),
> or by submitting their backend bios in a different context (dedicated
> kernel thread, work_queue context, ...). Or by playing funny re-ordering
> games with entries on current->bio_list.
>
> Instead, I suggest to distinguish between recursive calls to
> generic_make_request(), and pushing back the remainder part in
> blk_queue_split(), by pointing current->bio_lists to a
>   struct recursion_to_iteration_bio_lists {
>   struct bio_list recursion;
>   struct bio_list remainder;
>   }
>
> To have all bios targeted to drivers lower in the stack processed before
> processing the next piece of a bio targeted at the higher levels,
> as long as queued bios resulting from recursion are available,
> they will continue to be processed in FIFO order.
> Pushed back bio-parts resulting from blk_queue_split() will be processed
> in LIFO order, one-by-one, whenever the recursion list becomes empty.

I really like this change.  It seems to precisely address the problem.
The "problem" being that requests for "this" device are potentially
mixed up with requests from underlying devices.
However I'm not sure it is quite general enough.

The "remainder" list is a stack of requests aimed at "this" level or
higher, and I think it will always exactly fit that description.
The "recursion" list needs to be a queue of requests aimed at the next
level down, and that doesn't quite work, because once you start acting
on the first entry in that list, all the rest become "this" level.

I think you can address this by always calling ->make_request_fn with an
empty "recursion", then after the call completes, splice the "recursion"
list that resulted (if any) on top of the "remainder" stack.

This way, the "remainder" stack is always "requests for lower-level
devices before requests for upper-level devices" and the "recursion"
queue is always "requests for devices below the current level".

I also really *don't* like the idea of punting to a separate thread - it
seems to be just delaying the problem.

Can you try moving the bio_list_init(->recursion) call to just before
the ->make_request_fn() call, and adding
bio_list_merge_head(->remainder, ->recursion)
just after?
(or something like that) and confirm it makes sense, and works?

Thanks!

NeilBrown
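
To make the proposed ordering concrete, here is a toy userspace model of the loop: each request is handed an empty "recursion" list, and whatever it queued is then spliced in front of "remainder". Everything in it is an assumption for illustration: struct level_list is a stand-in for struct bio_list that stores only a stack level per entry, the driver submits two requests one level down, and list_merge_head() only loosely models bio_list_merge_head() (it also empties the source list). The remainder pushed back by blk_queue_split() is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BIOS 32

/* Toy stand-in for struct bio_list: each entry is just a device level. */
struct level_list {
	int item[MAX_BIOS];
	size_t len;
};

void list_add_tail(struct level_list *l, int level)
{
	assert(l->len < MAX_BIOS);
	l->item[l->len++] = level;
}

int list_pop_head(struct level_list *l)
{
	int level;
	size_t i;

	assert(l->len > 0);
	level = l->item[0];
	for (i = 1; i < l->len; i++)
		l->item[i - 1] = l->item[i];
	l->len--;
	return level;
}

/* Put all of src in front of dst, keeping src's order; src ends up empty. */
void list_merge_head(struct level_list *dst, struct level_list *src)
{
	size_t i;

	assert(dst->len + src->len <= MAX_BIOS);
	for (i = dst->len; i > 0; i--)
		dst->item[i - 1 + src->len] = dst->item[i - 1];
	for (i = 0; i < src->len; i++)
		dst->item[i] = src->item[i];
	dst->len += src->len;
	src->len = 0;
}

/* Hypothetical stacking driver: level N > 0 submits two requests to N - 1. */
static void make_request(int level, struct level_list *recursion)
{
	if (level > 0) {
		list_add_tail(recursion, level - 1);
		list_add_tail(recursion, level - 1);
	}
}

/* Handle one top-level request; order[] records the levels as processed. */
size_t submit(int top_level, int *order, size_t cap)
{
	struct level_list remainder = { {0}, 0 };
	size_t n = 0;

	list_add_tail(&remainder, top_level);
	while (remainder.len > 0) {
		struct level_list recursion = { {0}, 0 };	/* always starts empty */
		int level = list_pop_head(&remainder);

		assert(n < cap);
		order[n++] = level;
		make_request(level, &recursion);
		list_merge_head(&remainder, &recursion);
	}
	return n;
}
```

For a request entering at level 2, the model handles levels in the order 2, 1, 0, 0, 1, 0, 0: everything queued below the first level-1 request is dispatched before the second level-1 request is looked at.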




[PATCH v14 8/8] perf tools: Add --tail-synthesize option

2016-07-06 Thread Wang Nan
When working with an overwritable ring buffer there is an inconvenience:
if perf dumps data a long time after it starts, non-sample events may be
lost, which leaves the following 'perf report' unable to identify process
names and mmap layout. For example:

 # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
dd if=/dev/zero of=/dev/null

Send SIGUSR2 after dd has run long enough. The resulting perf.data has lost
the correct comm and mmap events:

 # perf script -i perf.data.2016061522374354
 perf 24478 [004] 2581325.601789:  raw_syscalls:sys_exit: NR 0 = 512
 
 Should be 'dd'
   27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
   203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
   b18d83 return_from_SYSCALL_64+0xfe20 (/lib/modules/4.6.0-rc3+/build/vmlinux)
 7f47c417edf0 [unknown] ([unknown])
 
 Fail to unwind

This patch provides a '--tail-synthesize' option, which allows perf to collect
system status when finalizing the output file. In the resulting output file,
the non-sample events reflect the system status at the time the data is dumped.

After this patch:
 # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
dd if=/dev/zero of=/dev/null

 # perf script -i perf.data.2016061600544998
 dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
 ^^
 Correct comm
   203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
   203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
   203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
   b18d83 return_from_SYSCALL_64+0xfe20 ([kernel.kallsyms])
d8e50 __GI___libc_write+0x01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
^
Correct unwind

This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lose its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has already terminated, so its
'/proc//mmap' becomes empty. However, this is a cheaper choice. To
solve this problem completely we would need to output non-sample events
continuously. To satisfy the requirements of daemonization, we would need
to merge them periodically. That is possible but requires much more code
and many more cycles.

Automatically select --tail-synthesize when --overwrite is provided.

Signed-off-by: Wang Nan 
Cc: He Kuang 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: Nilay Vaish 
Cc: pi3or...@163.com
---
 tools/perf/Documentation/perf-record.txt |  8 
 tools/perf/builtin-record.c  | 31 +--
 tools/perf/perf.h|  1 +
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 384c630..69966ab 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -367,6 +367,12 @@ options.
 'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj
 in config file is set to true.
 
+--tail-synthesize::
+Instead of collecting non-sample events (for example, fork, comm, mmap) at
+the beginning of record, collect them during finalizing an output file.
+The collected non-sample events reflects the status of the system when
+record is finished.
+
 --overwrite::
 Makes all events use an overwritable ring buffer. An overwritable ring
 buffer works like a flight recorder: when it gets full, the kernel will
@@ -381,6 +387,8 @@ those fitting in the ring buffer at that moment.
 'overwrite' attribute can also be set or canceled for an event using
 config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
 
+Implies --tail-synthesize.
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8d4c6bb..65e4f40 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -763,13 +763,16 @@ record__finish_output(struct record *rec)
return;
 }
 
-static int record__synthesize_workload(struct record *rec)
+static int record__synthesize_workload(struct record *rec, bool tail)
 {
struct {
struct thread_map map;
struct thread_map_data map_data;
} thread_map;
 
+   if (rec->opts.tail_synthesize != tail)
+   return 0;
+
thread_map.map.nr = 1;
thread_map.map.map[0].pid = rec->evlist->workload.pid;
thread_map.map.map[0].comm = NULL;
@@ -780,7 +783,7 @@ static int record__synthesize_workload(struct record *rec)
   

[PATCH v14 3/8] perf tests: Add testcase for auxiliary evlist

2016-07-06 Thread Wang Nan
Improve the backward-ring-buffer test: trace both the enter and exit events
of the prctl() syscall, and use an auxiliary evlist to mmap the enter and
exit events into separate mmaps.

Signed-off-by: Wang Nan 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: Nilay Vaish 
Cc: He Kuang 
---
 tools/perf/tests/backward-ring-buffer.c | 85 ++---
 tools/perf/util/evlist.h|  8 
 2 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index 1750ef2..db7393c 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -31,7 +31,11 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
for (i = 0; i < evlist->nr_mmaps; i++) {
union perf_event *event;
 
-   perf_evlist__mmap_read_catchup(evlist, i);
+   /*
+* Before calling count_samples(), ring buffers in backward
+* evlist should have catched up with newest record
+* using perf_evlist__mmap_read_catchup_all().
+*/
while ((event = perf_evlist__mmap_read_backward(evlist, i)) != NULL) {
const u32 type = event->header.type;
 
@@ -51,34 +55,54 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
return TEST_OK;
 }
 
-static int do_test(struct perf_evlist *evlist, int mmap_pages,
-  int *sample_count, int *comm_count)
+static int do_test(struct perf_evlist *evlist,
+  struct perf_evlist *aux_evlist,
+  int mmap_pages,
+  int *enter_sample_count,
+  int *exit_sample_count,
+  int *comm_count)
 {
-   int err;
+   int err, dummy;
char sbuf[STRERR_BUFSIZE];
 
-   err = perf_evlist__mmap(evlist, mmap_pages, true);
+   err = perf_evlist__mmap(evlist, mmap_pages, false);
if (err < 0) {
pr_debug("perf_evlist__mmap: %s\n",
 strerror_r(errno, sbuf, sizeof(sbuf)));
return TEST_FAIL;
}
 
+   err = perf_evlist__mmap(aux_evlist, mmap_pages, true);
+   if (err < 0) {
+   pr_debug("perf_evlist__mmap for aux_evlist: %s\n",
+strerror_r(errno, sbuf, sizeof(sbuf)));
+   return TEST_FAIL;
+   }
+
perf_evlist__enable(evlist);
testcase();
perf_evlist__disable(evlist);
 
-   err = count_samples(evlist, sample_count, comm_count);
+   perf_evlist__mmap_read_catchup_all(aux_evlist);
+   err = count_samples(aux_evlist, exit_sample_count, comm_count);
+   if (err)
+   goto errout;
+   err = count_samples(evlist, enter_sample_count, &dummy);
+   if (err)
+   goto errout;
+errout:
perf_evlist__munmap(evlist);
+   perf_evlist__munmap(aux_evlist);
return err;
 }
 
 
 int test__backward_ring_buffer(int subtest __maybe_unused)
 {
-   int ret = TEST_SKIP, err, sample_count = 0, comm_count = 0;
+   int ret = TEST_SKIP, err, dummy;
+   int enter_sample_count = 0, exit_sample_count = 0, comm_count = 0;
char pid[16], sbuf[STRERR_BUFSIZE];
-   struct perf_evlist *evlist;
+   struct perf_evlist *evlist, *aux_evlist = NULL;
struct perf_evsel *evsel __maybe_unused;
struct parse_events_error parse_error;
struct record_opts opts = {
@@ -101,7 +125,6 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
return TEST_FAIL;
}
 
-   evlist->backward = true;
err = perf_evlist__create_maps(evlist, &opts.target);
if (err < 0) {
pr_debug("Not enough memory to create thread/cpu maps\n");
@@ -116,11 +139,21 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
goto out_delete_evlist;
}
 
-   perf_evlist__config(evlist, &opts, NULL);
+   /*
+* Set backward bit, ring buffer should be writing from end. Record
+* it in aux evlist
+*/
+   perf_evlist__last(evlist)->attr.write_backward = 1;
 
-   /* Set backward bit, ring buffer should be writing from end */
-   evlist__for_each_entry(evlist, evsel)
-   evsel->attr.write_backward = 1;
+   err = parse_events(evlist, "syscalls:sys_exit_prctl", &parse_error);
+   if (err) {
+   pr_debug("Failed to parse tracepoint event, try use root\n");
+   ret = TEST_SKIP;
+   goto out_delete_evlist;
+   }
+   /* Don't set backward bit for exit event. Record it in main evlist */
+
+   perf_evlist__config(evlist, &opts, NULL);
 
err = perf_evlist__open(evlist);
if (err < 0) {
@@ -129,24 +162,40 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
goto ou

[PATCH v14 4/8] perf record: Introduce rec->overwrite_evlist for overwritable events

2016-07-06 Thread Wang Nan
Create an auxiliary evlist for overwritable events.

Before mmap, build this evlist and set its 'overwrite' and 'backward'
attributes. Since perf_evlist__mmap_ex() only maps events whose
evsel->attr.write_backward matches the evlist's corresponding attributes,
with these two evlists an event goes to either rec->evlist or
rec->overwrite_evlist.

Signed-off-by: Wang Nan 
Cc: He Kuang 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: Nilay Vaish 
Cc: pi3or...@163.com
---
 tools/perf/builtin-record.c | 59 ++---
 1 file changed, 56 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b2b3b60..3b62295 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -51,6 +51,7 @@ struct record {
struct perf_data_file   file;
struct auxtrace_record  *itr;
struct perf_evlist  *evlist;
+   struct perf_evlist  *overwrite_evlist;
struct perf_session *session;
const char  *progname;
int realtime_prio;
@@ -342,13 +343,41 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
 
 #endif
 
+static int record__create_overwrite_evlist(struct record *rec)
+{
+   struct perf_evlist *evlist = rec->evlist;
+   struct perf_evsel *pos;
+
+   evlist__for_each_entry(evlist, pos) {
+   if (!pos->attr.write_backward)
+   continue;
+
+   if (!rec->overwrite_evlist) {
+   rec->overwrite_evlist = perf_evlist__new_aux(evlist);
+   if (rec->overwrite_evlist) {
+   rec->overwrite_evlist->backward = true;
+   rec->overwrite_evlist->overwrite = true;
+   return 0;
+   } else
+   return -ENOMEM;
+   }
+   }
+   return 0;
+}
+
 static int record__mmap_evlist(struct record *rec,
-  struct perf_evlist *evlist)
+  struct perf_evlist *evlist,
+  bool overwrite)
 {
struct record_opts *opts = &rec->opts;
char msg[512];
 
-   if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
+   /*
+* Don't use evlist->overwrite because it is logically an
+* internal attribute and is set by perf_evlist__mmap_ex().
+* Avoid circular dependency.
+*/
+   if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, overwrite,
 opts->auxtrace_mmap_pages,
 opts->auxtrace_snapshot_mode) < 0) {
if (errno == EPERM) {
@@ -373,7 +402,23 @@ static int record__mmap_evlist(struct record *rec,
 
 static int record__mmap(struct record *rec)
 {
-   return record__mmap_evlist(rec, rec->evlist);
+   int err;
+
+   err = record__create_overwrite_evlist(rec);
+   if (err)
+   return err;
+
+   err = record__mmap_evlist(rec, rec->evlist, false);
+   if (err)
+   return err;
+
+   if (!rec->overwrite_evlist)
+   return 0;
+
+   err = record__mmap_evlist(rec, rec->overwrite_evlist, true);
+   if (err)
+   return err;
+   return 0;
 }
 
 static int record__open(struct record *rec)
@@ -698,9 +743,14 @@ static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
 {
const struct perf_event_mmap_page *pc;
 
+   /* Change it to a loop if a new aux evlist is added */
pc = perf_evlist__pick_pc(rec->evlist);
if (pc)
return pc;
+   pc = perf_evlist__pick_pc(rec->overwrite_evlist);
+   if (pc)
+   return pc;
+
return NULL;
 }
 
@@ -1311,6 +1361,7 @@ static struct record record = {
.mmap2  = perf_event__process_mmap2,
.ordered_events = true,
},
+   .overwrite_evlist = NULL,
 };
 
 const char record_callchain_help[] = CALLCHAIN_RECORD_HELP
@@ -1614,6 +1665,8 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
err = __cmd_record(&record, argc, argv);
 out_symbol_exit:
perf_evlist__delete(rec->evlist);
+   if (rec->overwrite_evlist)
+   perf_evlist__delete(rec->overwrite_evlist);
symbol__exit();
auxtrace_record__free(rec->itr);
return err;
-- 
1.8.3.4



[PATCH v14 0/8] perf tools: Support overwritable ring buffer

2016-07-06 Thread Wang Nan
This patch set enables daemonized perf recording by utilizing an
overwritable backward ring buffer. With this feature one can
put perf in the background and dump ring buffer records by sending
SIGUSR2 when something unusual is found. For example, the following
command records system calls, schedule events and samples on CPU cycles
continuously:

 # perf record -g -e cycles -e raw_syscalls:*/call-graph=no/ \
  -e sched:sched_switch/call-graph=no/ \
  --switch-output --overwrite -a

Then, by sending SIGUSR2 to perf whenever lag occurs, we get multiple
perf.data outputs, each of which corresponds to an abnormal event, and the
data size is reasonable:

 # ls -l ./perf.data*
 -rw--- 1 root root 5122165 May 13 23:51 ./perf.data.2016051323511683
 -rw--- 1 root root 5135093 May 13 23:51 ./perf.data.2016051323512107
 -rw--- 1 root root 5135213 May 13 23:51 ./perf.data.2016051323512215
 -rw--- 1 root root 5135157 May 13 23:51 ./perf.data.2016051323512387

v1 -> v2: Totally redesign: drop the principle of 'channel', use
  auxiliary evlist instead. Fix missing documentation.

v2 -> v3: Rename perf_evlist__toggle_paused() to perf_evlist__pause/resume.

v3 -> v4: Update commit message to describe auxiliary evlist more clearly.

v4 -> v5: Reorder commits, ensure '--overwrite' works right after perf
  supports the option.
  Add test cases for auxiliary evlist.
  Avoid bug if main evlist is empty.

v5 -> v6: Improve filter pollfd related code.

v6 -> v7: Rebase to newest perf/core.

v7 -> v8: Unmap mmaps from parent and children in
  perf_evlist__munmap_filtered(), hide more detail of aux evlist.
  Add --tail-synthesize, do synthesize at the end of perf.data.

v8 -> v9: Beautify code of test case, make patch set more granular,
  improve documentation.

v9 -> v10: Make patch set more granular: extract preparation code to
   patch 1-3.

v10 -> v11: Rebase to newest perf/core: solve conflicts caused by commit
e5cadb93d08 ("perf evlist: Rename for_each() macros to
for_each_entry()").

v11 -> v12: Improve 'perf test backward': skip this test on old kernel,
resolve conflicts.

v12 -> v13: Drop evsel->overwrite, use evsel->attr.write_backward instead.

v13 -> v14: Follow Jiri Olsa's suggestion: Improve commit message,
add OVERWRITE_EVT_NOTREADY state, stop the state machine if
overwrite_evlist is not generated.

Arnaldo Carvalho de Melo (1):
  perf tools: Drop redundant evsel->overwrite indicator

Wang Nan (7):
  perf evlist: Introduce aux evlist
  perf tests: Add testcase for auxiliary evlist
  perf record: Introduce rec->overwrite_evlist for overwritable events
  perf record: Read from overwritable ring buffer
  perf tools: Enable overwrite settings
  perf tools: Don't warn about out of order event if write_backward is
used
  perf tools: Add --tail-synthesize option

 tools/perf/Documentation/perf-record.txt |  22 +++
 tools/perf/builtin-record.c  | 249 +--
 tools/perf/perf.h|   2 +
 tools/perf/tests/backward-ring-buffer.c  |  84 ---
 tools/perf/util/evlist.c |  53 +--
 tools/perf/util/evlist.h |  20 +++
 tools/perf/util/evsel.c  |  16 +-
 tools/perf/util/evsel.h  |   3 +-
 tools/perf/util/parse-events.c   |  20 ++-
 tools/perf/util/parse-events.h   |   2 +
 tools/perf/util/parse-events.l   |   2 +
 tools/perf/util/session.c|  22 ++-
 12 files changed, 441 insertions(+), 54 deletions(-)

-- 
1.8.3.4



[PATCH v14 2/8] perf evlist: Introduce aux evlist

2016-07-06 Thread Wang Nan
An auxiliary evlist is created by perf_evlist__new_aux() using an
existing evlist as its parent. An auxiliary evlist can have its own
'struct perf_mmap', but can't have any other data. Users should use its
parent instead when accessing other data.

Auxiliary evlists are containers of 'struct perf_mmap'. They are introduced
to allow the parent evlist to map different events into separate mmaps.

The following commits create an auxiliary evlist for overwritable
events, because overwritable events need a read-only, backward ring
buffer, which is different from normal events.

To achieve this goal, this patch carefully changes 'evlist' to
'evlist->parent' in all functions in the path of 'perf_evlist__mmap_ex',
except for 'evlist->mmap' related operations, to make sure all evlist
modifications (like pollfd and event id hash tables) go to the original
evlist.

An 'evlist->parent' pointer is added to 'struct perf_evlist'. For normal
evlists it points to the evlist itself.

Children of one evlist are linked into it so one can find all children
from its parent.

To avoid potential complexity, forbid creating aux evlist from another
aux evlist.

Improve perf_evlist__munmap_filtered() so that, when recording, if an event
is terminated, its mmaps are unmapped from both the parent and its children.
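
The parent/child relationship described above can be sketched in a few
lines of plain C. This is only an illustration, assuming a simplified
stand-in structure: 'struct toy_evlist' and the toy_evlist_*() helpers
are hypothetical names, not the real perf API, and a singly linked child
list replaces the kernel-style list_head used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct perf_evlist. */
struct toy_evlist {
	struct toy_evlist *parent;       /* points to itself for a normal evlist */
	struct toy_evlist *first_child;  /* head of a singly linked child list */
	struct toy_evlist *next_sibling;
	int nr_pollfd;                   /* shared state, only used in the parent */
};

static void toy_evlist_init(struct toy_evlist *e)
{
	e->parent = e;
	e->first_child = NULL;
	e->next_sibling = NULL;
	e->nr_pollfd = 0;
}

/* Create an aux evlist; creating one from another aux evlist is refused. */
static struct toy_evlist *toy_evlist_new_aux(struct toy_evlist *parent)
{
	struct toy_evlist *aux;

	if (parent->parent != parent)
		return NULL;

	aux = malloc(sizeof(*aux));
	if (!aux)
		return NULL;

	toy_evlist_init(aux);
	aux->parent = parent;
	/* Link the child into its parent so it can be found later. */
	aux->next_sibling = parent->first_child;
	parent->first_child = aux;
	return aux;
}

/* Shared bookkeeping always goes through ->parent. */
static void toy_evlist_add_pollfd(struct toy_evlist *e)
{
	e->parent->nr_pollfd++;
}
```

Because a normal evlist is its own parent, callers can write
e->parent->... unconditionally, without checking whether e is an aux
evlist.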

Signed-off-by: Wang Nan 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: Nilay Vaish 
Cc: pi3or...@163.com
---
 tools/perf/util/evlist.c | 49 +---
 tools/perf/util/evlist.h | 12 
 2 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7228596..7000fe2 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -41,10 +41,12 @@ void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
for (i = 0; i < PERF_EVLIST__HLIST_SIZE; ++i)
INIT_HLIST_HEAD(&evlist->heads[i]);
INIT_LIST_HEAD(&evlist->entries);
+   INIT_LIST_HEAD(&evlist->children);
perf_evlist__set_maps(evlist, cpus, threads);
fdarray__init(&evlist->pollfd, 64);
evlist->workload.pid = -1;
evlist->backward = false;
+   evlist->parent = evlist;
 }
 
 struct perf_evlist *perf_evlist__new(void)
@@ -490,13 +492,17 @@ static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd,
 void *arg __maybe_unused)
 {
struct perf_evlist *evlist = container_of(fda, struct perf_evlist, pollfd);
+   struct perf_evlist *child;
 
perf_evlist__mmap_put(evlist, fda->priv[fd].idx);
+   list_for_each_entry(child, &evlist->children, list)
+   perf_evlist__mmap_put(child, fda->priv[fd].idx);
+
 }
 
 int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask)
 {
-   return fdarray__filter(&evlist->pollfd, revents_and_mask,
+   return fdarray__filter(&evlist->parent->pollfd, revents_and_mask,
   perf_evlist__munmap_filtered, NULL);
 }
 
@@ -1015,7 +1021,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
struct perf_evsel *evsel;
int revent;
 
-   evlist__for_each_entry(evlist, evsel) {
+   evlist__for_each_entry(evlist->parent, evsel) {
int fd;
 
if (!!evsel->attr.write_backward != (evlist->overwrite && evlist->backward))
@@ -1047,16 +1053,16 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 * Therefore don't add it for polling.
 */
if (!evsel->system_wide &&
-   __perf_evlist__add_pollfd(evlist, fd, idx, revent) < 0) {
+   __perf_evlist__add_pollfd(evlist->parent, fd, idx, revent) < 0) {
perf_evlist__mmap_put(evlist, idx);
return -1;
}
 
if (evsel->attr.read_format & PERF_FORMAT_ID) {
-   if (perf_evlist__id_add_fd(evlist, evsel, cpu, thread,
+   if (perf_evlist__id_add_fd(evlist->parent, evsel, cpu, thread,
   fd) < 0)
return -1;
-   perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
+   perf_evlist__set_sid_idx(evlist->parent, evsel, idx, cpu,
 thread);
}
}
@@ -1097,13 +1103,13 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
struct mmap_params *mp)
 {
int thread;
-   int nr_threads = thread_map__nr(evlist->threads);
+   int nr_threads = thread_map__nr(evlist->parent->threads);
 
pr_debug2("perf event ring buffer mmapped per thread\n");
for (thread = 0; thread < nr_threads; thread++) {
int output = -1;
 
-  

[PATCH v14 7/8] perf tools: Don't warn about out of order event if write_backward is used

2016-07-06 Thread Wang Nan
If the write_backward attribute is set, records are written into the kernel
ring buffer from end to beginning, but read from beginning to end.
To avoid the 'XX out of order events recorded' warning message (timestamps
of records are in reverse order when using write_backward), suppress the
warning message if write_backward is selected by at least one event.
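
Why a backward-written buffer looks "out of order" to a forward reader
can be shown with a toy model. The sketch below is an assumption-laden
illustration, not the kernel ring buffer: 'struct toy_ring' and its
helpers are hypothetical, records are fixed-size timestamps, and
wrap-around (overwrite) is left out.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8

/* Hypothetical toy ring: one timestamp per slot. */
struct toy_ring {
	unsigned long ts[RING_SLOTS];
	size_t head;    /* index of the newest record; RING_SLOTS when empty */
};

static void toy_ring_init(struct toy_ring *r)
{
	r->head = RING_SLOTS;
}

/* Write backward: each new record lands just before the previous one. */
static int toy_ring_write(struct toy_ring *r, unsigned long ts)
{
	if (r->head == 0)
		return -1;  /* full; a real overwritable ring would wrap here */
	r->ts[--r->head] = ts;
	return 0;
}

/*
 * Read forward from head and count records that are older than the
 * record read just before them.
 */
static size_t toy_ring_count_unordered(const struct toy_ring *r)
{
	size_t i, unordered = 0;

	for (i = r->head + 1; i < RING_SLOTS; i++)
		if (r->ts[i] < r->ts[i - 1])
			unordered++;
	return unordered;
}
```

Writing records with increasing timestamps and then reading forward
from the head yields the newest record first, so every record after the
first one counts as unordered. That is expected behaviour for
write_backward, not a sign of lost ordering.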

Result:

Before this patch:
 # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
-e raw_syscalls:sys_enter \
dd if=/dev/zero of=/dev/null count=300
 300+0 records in
 300+0 records out
 153600 bytes (154 kB) copied, 0.000601617 s, 255 MB/s
 [ perf record: Woken up 5 times to write data ]
 Warning:
 40 out of order events recorded.
 [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

After this patch:
 # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
-e raw_syscalls:sys_enter \
dd if=/dev/zero of=/dev/null count=300
 300+0 records in
 300+0 records out
 153600 bytes (154 kB) copied, 0.000644873 s, 238 MB/s
 [ perf record: Woken up 5 times to write data ]
 [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: Nilay Vaish 
Cc: pi3or...@163.com
---
 tools/perf/util/session.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 078d496..5d61242 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1499,10 +1499,27 @@ int perf_session__register_idle_thread(struct perf_session *session)
return err;
 }
 
+static void
+perf_session__warn_order(const struct perf_session *session)
+{
+   const struct ordered_events *oe = &session->ordered_events;
+   struct perf_evsel *evsel;
+   bool should_warn = true;
+
+   evlist__for_each_entry(session->evlist, evsel) {
+   if (evsel->attr.write_backward)
+   should_warn = false;
+   }
+
+   if (!should_warn)
+   return;
+   if (oe->nr_unordered_events != 0)
+   ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+}
+
 static void perf_session__warn_about_errors(const struct perf_session *session)
 {
const struct events_stats *stats = &session->evlist->stats;
-   const struct ordered_events *oe = &session->ordered_events;
 
if (session->tool->lost == perf_event__process_lost &&
stats->nr_events[PERF_RECORD_LOST] != 0) {
@@ -1559,8 +1576,7 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
stats->nr_unprocessable_samples);
}
 
-   if (oe->nr_unordered_events != 0)
-   ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+   perf_session__warn_order(session);
 
events_stats__auxtrace_error_warn(stats);
 
-- 
1.8.3.4



Re: [PATCH v4] gpio: add Intel WhiskeyCove GPIO driver

2016-07-06 Thread Bin Gao
On Wed, Jul 06, 2016 at 01:07:15PM +0300, Mika Westerberg wrote:
> On Wed, Jul 06, 2016 at 10:57:19AM +0200, Linus Walleij wrote:
> > On Tue, Jun 28, 2016 at 1:56 AM, Bin Gao  wrote:
> > 
> > > This patch introduces a separate GPIO driver for Intel WhiskeyCove PMIC.
> > > This driver is based on gpio-crystalcove.c.
> > >
> > > Signed-off-by: Ajay Thomas 
> > > Signed-off-by: Bin Gao 
> > > ---
> > > Changes in v4:
> > >  - Converted CTLI_INTCNT_XX macros to less verbose ones INT_DETECT_XX.
> > >  - Add comments about why there is no .pm for the driver.
> > >  - Header files re-ordered.
> > >  - Various coding style change to address Andy's comments.
> > 
> > Mika can I have your ACK/review tag on this driver so I can merge it?
> > I prefer to have all Intel stuff bearing your seal of approval.
> 
> Thanks for your trust :)
> 
> I don't have much comments in addition to what you already pointed out.
> I'll just wait for the next revision and give my ack then.
> 
> > > +static irqreturn_t wcove_gpio_irq_handler(int irq, void *data)
> > > +{
> > > +   int pending;
> > > +   unsigned int p0, p1, virq, gpio;
> > > +   struct wcove_gpio *wg = data;
> 
> Bin,
> 
> Since you are going to make another iteration, please arrange the
> declarations like:
> 
>   unsigned int p0, p1, virq, gpio;
>   struct wcove_gpio *wg = data;
>   int pending;

Yes, will do. Thanks.

-Bin


Re: [PATCH v4] gpio: add Intel WhiskeyCove GPIO driver

2016-07-06 Thread Bin Gao
On Wed, Jul 06, 2016 at 10:57:19AM +0200, Linus Walleij wrote:
> > +static irqreturn_t wcove_gpio_irq_handler(int irq, void *data)
> > +{
> > +   int pending;
> > +   unsigned int p0, p1, virq, gpio;
> > +   struct wcove_gpio *wg = data;
> > +
> > +   if (regmap_read(wg->regmap, IRQ_STATUS_OFFSET + 0, &p0) ||
> > +   regmap_read(wg->regmap, IRQ_STATUS_OFFSET + 1, &p1)) {
> 
> Why can't you use regmap_bulk_read() here?

Will fix this in v5.

> 
> > +   dev_err(wg->chip.parent, "%s(): regmap_read() failed.\n",
> > +   __func__);
> > +   return IRQ_NONE;
> > +   }
> > +
> > +   pending = p0 | (p1 << 8);
> > +
> > +   for (gpio = 0; gpio < WCOVE_GPIO_NUM; gpio++) {
> > +   if (pending & BIT(gpio)) {
> > +   virq = irq_find_mapping(wg->chip.irqdomain, gpio);
> > +   handle_nested_irq(virq);
> > +   }
> > +   }
> > +
> > +   regmap_write(wg->regmap, IRQ_STATUS_OFFSET + 0, p0);
> > +   regmap_write(wg->regmap, IRQ_STATUS_OFFSET + 1, p1);
> 
> Use regmap_bulk_write()?

Will fix this in v5.

> 
> Also you're ignoring the return error code. Check it and dev_err() if
> it fails.

Yes, will fix.

> 
> This loop seems like it could miss interrupts happening while
> processing. Especially edge interrupts, and that will lead to serious
> bugs later.
> 
> Please consider the following construction:
> 
> 1. Read status register
> 2. Any IRQs active?
>   2.1 No IRQs active: if this is the FIRST iteration, exit with IRQ_NONE
>   2.2 No IRQs active: if this is the second iteration or later, exit with
> IRQ_HANDLED
>   2.3 IRQs active, continue
> 3. Find first active IRQ
> 4. Handle first active IRQ
> 5. ACK the first active IRQ by writing the status register
> 6. Reiterate from 1
> 
> This way, if two IRQs happen at the same time, or if a new IRQ appears
> while you're inside the interrupt handler, it gets served.

I agree. Writing to status register should be done bit by bit, instead of
one write for all bits. Will fix this in v5.
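
For reference, the loop construction Linus describes could look roughly
like the user-space sketch below. Everything in it is an assumption made
for illustration: 'struct fake_chip' simulates the status register in
memory, 'enum irq_result' stands in for irqreturn_t, and serve_line()
stands in for handle_nested_irq(). It is not the WhiskeyCove driver
code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the driver's real API. */
enum irq_result { IRQ_RESULT_NONE, IRQ_RESULT_HANDLED };

struct fake_chip {
	uint16_t status;       /* simulated 16-bit IRQ status register */
	unsigned int handled;  /* number of interrupts served so far */
};

/* Serve one line; a real handler would call handle_nested_irq() here. */
static void serve_line(struct fake_chip *chip, unsigned int line)
{
	(void)line;
	chip->handled++;
}

static enum irq_result handle_pending(struct fake_chip *chip)
{
	enum irq_result ret = IRQ_RESULT_NONE;

	for (;;) {
		uint16_t pending = chip->status;  /* read the status register */
		unsigned int line = 0;

		/*
		 * Nothing active: IRQ_RESULT_NONE on the first iteration,
		 * IRQ_RESULT_HANDLED on any later one.
		 */
		if (!pending)
			return ret;

		/* Find the first active line (pending is known to be non-zero). */
		while (!(pending & (1u << line)))
			line++;

		serve_line(chip, line);

		/* ACK only the bit that was just served, then re-read. */
		chip->status &= (uint16_t)~(1u << line);
		ret = IRQ_RESULT_HANDLED;
	}
}
```

Since the status register is re-read on every iteration, an interrupt
that arrives while another one is being served is picked up before the
handler returns.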

> 
> > +static void wcove_gpio_dbg_show(struct seq_file *s, struct gpio_chip *chip)
> > +{
> > +   struct wcove_gpio *wg = gpiochip_get_data(chip);
> > +   int gpio, offset, group;
> > +   unsigned int ctlo, ctli, irq_mask, irq_status;
> > +
> > +   for (gpio = 0; gpio < WCOVE_GPIO_NUM; gpio++) {
> > +   group = gpio < GROUP0_NR_IRQS ? 0 : 1;
> > +   regmap_read(wg->regmap, to_reg(gpio, CTRL_OUT), &ctlo);
> > +   regmap_read(wg->regmap, to_reg(gpio, CTRL_IN), &ctli);
> > +   regmap_read(wg->regmap, IRQ_MASK_OFFSET + group, &irq_mask);
> > +   regmap_read(wg->regmap, IRQ_STATUS_OFFSET + group, &irq_status);
> 
> Ignoring error codes. Fix this.

Will Fix in v5.

> 
> > +   gpiochip_irqchip_add(&wg->chip, &wcove_irqchip, 0,
> > +handle_simple_irq, IRQ_TYPE_NONE);
> 
> Reexamine the use of handle_simple_irq() here. We have two kinds of
> irq hardware: those with one register for ACKing and reading the status
> of an IRQ, and those with two registers for it: one where you ACK the
> IRQ (so it can immediately re-trigger) and one to read the status of
> whether it happened. Sometimes different handling is needed for
> level and edge IRQs even (c.f. gpio-pl061.c).
> 
> Only the hardware with just one register for both things should use
> handle_simple_irq(). This seems to be the case here but I want you
> to verify.

I will check and fix if it's needed.

> 
> Yours,
> Linus Walleij

Thanks for your review.


[PATCH] arm64: Enable workaround for Cavium erratum 27456 on thunderx-81xx

2016-07-06 Thread Ganapatrao Kulkarni
The workaround for Cavium erratum 27456, added in commit 104a0c02e8b1
("arm64: Add workaround for Cavium erratum 27456"),
also applies to the thunderx-81xx pass 1.0 SoC.
Add code to enable it on 81xx.

Signed-off-by: Ganapatrao Kulkarni 
Reviewed-by: Andrew Pinski 
---
 arch/arm64/include/asm/cputype.h | 2 ++
 arch/arm64/kernel/cpu_errata.c   | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 87e1985..9d9fd4b 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -80,12 +80,14 @@
 #define APM_CPU_PART_POTENZA   0x000
 
 #define CAVIUM_CPU_PART_THUNDERX   0x0A1
+#define CAVIUM_CPU_PART_THUNDERX_81XX  0x0A2
 
 #define BRCM_CPU_PART_VULCAN   0x516
 
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
 #define MIDR_THUNDERX  MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
+#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index d427894..af716b6 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -98,6 +98,12 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
MIDR_RANGE(MIDR_THUNDERX, 0x00,
   (1 << MIDR_VARIANT_SHIFT) | 1),
},
+   {
+   /* Cavium ThunderX, T81 pass 1.0 */
+   .desc = "Cavium erratum 27456",
+   .capability = ARM64_WORKAROUND_CAVIUM_27456,
+   MIDR_RANGE(MIDR_THUNDERX_81XX, 0x00, 0x00),
+   },
 #endif
{
}
-- 
1.8.1.4



Re: Portable Device Tree Connector -- conceptual

2016-07-06 Thread David Gibson
On Fri, Jul 01, 2016 at 01:59:58PM +0300, Pantelis Antoniou wrote:
> Hi Frank,
> 
> Comments inline.
> 
> > On Jul 1, 2016, at 03:02 , Frank Rowand  wrote:
> > 
> > Hi All,
> > 
> > I've been trying to wrap my head around what Pantelis and Rob have written
> > on the subject of a device tree representation of a connector for a
> > daughter board to connect to (eg a cape or a shield) and the representation
> > of the daughter board.  (Or any other physically pluggable object.)
> > 
> > After trying to make sense of what had been written (or presented via slides
> > at a conference - thanks Pantelis!), I decided to go back to first principles
> > of what we are trying to accomplish.  I came up with some really simple bogus
> > examples to try to explain what my thought process is.
> > 
> > To start with, assume that the device that will eventually be on a daughter
> > board is first soldered onto the main board.  Then the device tree will
> > look like:
> > 
> > $ cat board.dts
> > /dts-v1/;
> > 
> > / {
> >#address-cells = < 1 >;
> >#size-cells = < 1 >;
> > 
> >tree_1: soc@0 {
> >reg = <0x0 0x0>;
> > 
> >spi_1: spi1 {
> >};
> >};
> > 
> > };
> > 
> > &spi_1 {
> >ethernet-switch@0 {
> >compatible = "micrel,ks8995m";
> >};
> > };
> > 
> > #include "spi_codec.dtsi"
> > 
> > $ cat spi_codec.dtsi
> > &spi_1 {
> > codec@1 {
> > compatible = "ti,tlv320aic26";
> > };
> > };
> > 
> > 
> > #- codec chip on cape
> > 
> > Then suppose I move the codec chip to a cape.  Then I will have the same
> > exact .dts and .dtsi and everything still works.
> > 
> > 
> > @- codec chip on cape, overlay
> > 
> > If I want to use overlays, I only have to add the version and "/plugin/",
> > then use the '-@' flag for dtc (both for the previous board.dts and
> > this spi_codec_overlay.dts):
> > 
> > $ cat spi_codec_overlay.dts
> > /dts-v1/;
> > 
> > /plugin/;
> > 
> > &spi_1 {
> > codec@1 {
> > compatible = "ti,tlv320aic26";
> > };
> > };
> > 
> 
> The correct form now for the /plugin/ declaration should be like
> 
> /dts-v1/ /plugin/;
> 
> The old method still works for backward compatibility.
> 
> In fact with the new patches you don't even need /plugin/ since when
> compiling an overlay we can turn on the plugin flag by looking
> at the output type (dtbo).

I'd prefer to see the dtbo option go away however, in favour of the
/plugin/ flag.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Nouveau] [PATCH v2 0/7] lib: string: add functions to case-convert strings

2016-07-06 Thread Alexandre Courbot
On Wed, Jul 6, 2016 at 7:56 AM, Joe Perches  wrote:
> On Tue, 2016-07-05 at 15:36 -0700, Markus Mayer wrote:
>> On 5 July 2016 at 15:14, Joe Perches  wrote:
>> > On Tue, 2016-07-05 at 13:47 -0700, Markus Mayer wrote:
>> > > This series introduces a family of generic string case conversion
>> > > functions. This kind of functionality is needed in several places in
>> > > the kernel. Right now, everybody seems to be implementing their own
>> > > copy of this functionality.
>> > >
>> > > Based on the discussion of the previous version of this series[1] and
>> > > the use cases found in the kernel, it does look like having several
>> > > flavours of case conversion functions is beneficial. The use cases fall
>> > > into three categories:
>> > > - copying a string and converting the case while specifying a
>> > >   maximum length to mimic strncpy()
>> > > - copying a string and converting the case without specifying a
>> > >   length to mimic strcpy()
>> > > - converting the case of a string in-place (i.e. modifying the
>> > >   string that was passed in)
>> > >
>> > > Consequently, I am proposing these new functions:
>> > > char *strncpytoupper(char *dst, const char *src, size_t len);
>> > > char *strncpytolower(char *dst, const char *src, size_t len);
>> > > char *strcpytoupper(char *dst, const char *src);
>> > > char *strcpytolower(char *dst, const char *src);
>> > > char *strtoupper(char *s);
>> > > char *strtolower(char *s);
>> > I think there isn't much value in anything other than strto.
>> >
>> > Using str[n]cpy followed by strto is pretty obvious and rarely
>> > used anyway.
>> First time around, folks were proposing the "copy" variants when I
>> submitted just strtolower() by itself[1]. They just asked for source
>> and destination parameters to strtolower(), but looking at the use
>> cases that wouldn't have worked so well. Hence it evolved into these 6
>> functions.
>>
>> Here's a breakdown of how the functions are being used (patches 2-7),
>> see also [2]:
>>
>> Patch 2: strncpytolower()
>> Patch 3: strtolower()
>> Patch 4: strncpytolower() and strtolower()
>> Patch 5: strtolower()
>> Patch 6: strcpytoupper()
>> Patch 7: strcpytoupper()
>>
>> So it does look like the copy + change case variant is more frequently
>> used than just strto.
>
> Are these functions useful?   Not to me, not so much.
>
> None of the functions would have the strcpy performance of
> the arch / asm versions of strcpy and the savings in overall
> code isn't significant (or measured?).
>
> Of course none of the uses are runtime performance important.

I tend to agree. strcpy is better left to architecture-specific code
when it exists. Then doing a strcpy() followed by strtolower() is not
exactly unintuitive. An explosion of closely related functions is
certainly more confusing to me.

I'd just keep strtolower()/strtoupper() because they are commonly done
operations and we can probably save some space by having a unique
implementation. But going beyond that is overthinking the problem
IMHO.
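
To make the "strcpy() followed by strtolower()" suggestion concrete,
here is a minimal user-space sketch. The names str_to_lower() and
copy_lower() are hypothetical and chosen for this example only; they are
not the functions from the series, and the in-kernel versions may be
written differently.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical in-place conversion; returns its argument for chaining. */
static char *str_to_lower(char *s)
{
	char *p;

	for (p = s; *p; p++)
		*p = (char)tolower((unsigned char)*p);
	return s;
}

/*
 * Copy first, so any optimized strcpy() is still used, then convert
 * in place. The caller must make sure dst is large enough for src.
 */
static char *copy_lower(char *dst, const char *src)
{
	strcpy(dst, src);
	return str_to_lower(dst);
}
```

With only the in-place helper exported, a copy-and-convert variant stays
a two-line composition at the call site.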


High rate of touch_softlockup makes Soft Lockup detector useless

2016-07-06 Thread Joel Fernandes
Hi,

In a system running a recent kernel, I am trying to use soft lockup
detector to detect soft lockups in the system.
During this exercise, I see that even with real soft lockups, the
kernel is unable to detect them.

Digging in further, I found that the softlockup watchdog is touched
1000s of times per second by the NOHZ code.
Prints revealed the following two functions calling touch_softlockup_watchdog:
[  165.960292] CPU0 touch: tick_nohz_restart_sched_tick
[  165.960309] CPU1 touch: tick_nohz_update_jiffies

I am wondering, do we really need to touch the softlockup watchdog
from the tick_nohz code?
From the code comments it looks like the watchdog is touched because
the tick was off and is being turned back on, so the watchdog may
not have been touched for a long time.
BUT, wouldn't the hrtimer interrupt for the watchdog timer cause the
watchdog thread to be scheduled even though the tick was off for a
long time? Then in that case do we really need to touch the softlockup
watchdog from the tick_nohz code?

In any case, looks like the softlockup detection is broken and doesn't
work with nohz.

BTW, commenting out the touch_softlockup seems to make soft lockup
detection work again. Any suggestions for a real fix and the right way
forward?

Thanks,

Joel


Re: [PATCH v13 5/8] perf record: Read from overwritable ring buffer

2016-07-06 Thread Wangnan (F)



On 2016/7/6 20:34, Jiri Olsa wrote:

On Wed, Jul 06, 2016 at 08:03:28PM +0800, Wangnan (F) wrote:


On 2016/7/6 19:38, Jiri Olsa wrote:

On Mon, Jul 04, 2016 at 06:20:06AM +, Wang Nan wrote:

SNIP


+static void
+record__toggle_overwrite_evsels(struct record *rec,
+   enum overwrite_evt_state state)
+{
+   struct perf_evlist *evlist = rec->overwrite_evlist;
+   enum overwrite_evt_state old_state = rec->overwrite_evt_state;
+   enum action {
+   NONE,
+   PAUSE,
+   RESUME,
+   } action = NONE;
+
+   switch (old_state) {
+   case OVERWRITE_EVT_RUNNING: {
+   switch (state) {
+   case OVERWRITE_EVT_DATA_PENDING:
+   action = PAUSE;
+   break;
+   case OVERWRITE_EVT_RUNNING:
+   case OVERWRITE_EVT_EMPTY:
+   default:
+   goto state_err;
+   }
+   break;
+   }
+   case OVERWRITE_EVT_DATA_PENDING: {
+   switch (state) {
+   case OVERWRITE_EVT_EMPTY:
+   break;
+   case OVERWRITE_EVT_RUNNING:
+   case OVERWRITE_EVT_DATA_PENDING:
+   default:
+   goto state_err;
+   }
+   break;
+   }
+   case OVERWRITE_EVT_EMPTY: {
+   switch (state) {
+   case OVERWRITE_EVT_RUNNING:
+   action = RESUME;
+   break;
+   case OVERWRITE_EVT_EMPTY:
+   case OVERWRITE_EVT_DATA_PENDING:
+   default:
+   goto state_err;
+   }
+   break;
+   }
+   default:
+   WARN_ONCE(1, "Shouldn't get there\n");
+   }
+
+   rec->overwrite_evt_state = state;
+
+   if (!evlist)
+   return;

I'd expect this check at the beginning

I think the state change is still required even if evlist is NULL.
Actually, the state machine is independent of the aux evlist. Even
without overwritable evsels the state machine is still valid.
So let the state machine run unconditionally.

hum, can't see that.. it's state machine to govern overwrite evlist, right?
if there's no overwrite evlist we should keep the current processing


Not as easy as I thought. Look at the following code:


@@ -1006,8 +1122,27 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
}

if (trigger_is_hit(&switch_output_trigger)) {
+   /*
+* If switch_output_trigger is hit, the data in
+* overwritable ring buffer should have been collected,
+* so overwrite_evt_state should be set to
+* OVERWRITE_EVT_EMPTY.
+*
+* If SIGUSR2 raise after or during 
record__mmap_read_all(),
+* record__mmap_read_all() didn't collect data from
+* overwritable ring buffer. Read again.
+*/
+   if (rec->overwrite_evt_state == OVERWRITE_EVT_RUNNING)
+   continue;
trigger_ready(&switch_output_trigger);

+   /*
+* Reenable events in overwrite ring buffer after
+* record__mmap_read_all(): we should have collected
+* data from it.
+*/
+   record__toggle_overwrite_evsels(rec, 
OVERWRITE_EVT_RUNNING);
+
if (!quiet)
fprintf(stderr, "[ perf record: dump data: Woken up 
%ld times ]\n",
waking);


Here perf tests whether reading from the overwritable ring buffer is required.
If SIGUSR2 is received just before the above trigger_is_hit(), we should read
from the overwrite ring buffer again. The OVERWRITE_EVT_RUNNING check is there
for this reason.

Now if we stop the state machine, the state stays at OVERWRITE_EVT_RUNNING,
which makes perf loop forever.

We could check rec->overwrite_evlist first, but that is ugly, since I believe
the overwritable state is independent of the overwrite evlist. So I have
decided to introduce a new state indicating that the overwrite evlist is not
ready.
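
The state machine discussed here can be sketched as a small transition table. This is only a minimal userspace sketch, not the perf code: the RUNNING, DATA_PENDING and EMPTY states and the pause/resume actions come from the quoted patch, while OVERWRITE_EVT_NOTREADY, overwrite_evt_transition() and the allowed[][] table are hypothetical names standing in for the proposed "evlist not ready" state.

```c
#include <stdbool.h>

/* State names follow the patch; OVERWRITE_EVT_NOTREADY is hypothetical. */
enum overwrite_evt_state {
	OVERWRITE_EVT_NOTREADY,
	OVERWRITE_EVT_RUNNING,
	OVERWRITE_EVT_DATA_PENDING,
	OVERWRITE_EVT_EMPTY,
	OVERWRITE_EVT_NR_STATES,
};

enum overwrite_action { ACT_NONE, ACT_PAUSE, ACT_RESUME };

/* allowed[old][new] is true when old -> new is a legal transition. */
static const bool allowed[OVERWRITE_EVT_NR_STATES][OVERWRITE_EVT_NR_STATES] = {
	[OVERWRITE_EVT_RUNNING][OVERWRITE_EVT_DATA_PENDING] = true,
	[OVERWRITE_EVT_DATA_PENDING][OVERWRITE_EVT_EMPTY]   = true,
	[OVERWRITE_EVT_EMPTY][OVERWRITE_EVT_RUNNING]        = true,
};

/*
 * Returns 0 on a legal request and stores the action to apply to the
 * overwrite evlist in *act; returns -1 on an illegal transition.
 * In the NOTREADY state every request is accepted as a no-op and the
 * state is left untouched.
 */
static int overwrite_evt_transition(enum overwrite_evt_state *cur,
				    enum overwrite_evt_state next,
				    enum overwrite_action *act)
{
	*act = ACT_NONE;

	if (*cur == OVERWRITE_EVT_NOTREADY)
		return 0;	/* no overwrite evlist: nothing to toggle */

	if ((unsigned int)next >= OVERWRITE_EVT_NR_STATES || !allowed[*cur][next])
		return -1;

	if (next == OVERWRITE_EVT_DATA_PENDING)
		*act = ACT_PAUSE;
	else if (next == OVERWRITE_EVT_RUNNING)
		*act = ACT_RESUME;

	*cur = next;
	return 0;
}
```

With a state like this, a check of the form overwrite_evt_state == OVERWRITE_EVT_RUNNING would presumably never be true when there is no overwrite evlist, so the main loop would not keep re-reading.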

Thank you.



if it's meant to govern the mmap reading in general
we should at least rename it
jirka





Re: [PATCH 14/27] HID: wacom: EKR: add a worker to add/remove resources on addition/removal (fwd)

2016-07-06 Thread Julia Lawall
The lack of unlock looks suspicious, although I haven't studied the
complete context.  Also, is 0 correct as a return value in an error case?

julia

-- Forwarded message --
Date: Wed, 6 Jul 2016 00:27:34 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: Re: [PATCH 14/27] HID: wacom: EKR: add a worker to add/remove resources
 on addition/removal

CC: kbuild-...@01.org
In-Reply-To: <1467729563-23318-15-git-send-email-benjamin.tissoi...@redhat.com>
TO: Benjamin Tissoires 
CC: Jiri Kosina , Ping Cheng , Jason 
Gerecke , Aaron Skomra , Peter Hutterer 

CC: linux-kernel@vger.kernel.org, linux-in...@vger.kernel.org

Hi,

[auto build test WARNING on hid/for-next]
[also build test WARNING on v4.7-rc6 next-20160705]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Benjamin-Tissoires/HID-wacom-cleanup-EKR-LED/20160705-225431
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid.git for-next
:: branch date: 2 hours ago
:: commit date: 2 hours ago

>> drivers/hid/wacom_wac.c:851:2-8: preceding lock on line 846

git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout cbd7a16fa28311673663974b9edc48bd00d83815
vim +851 drivers/hid/wacom_wac.c

72b236d6 Aaron Skomra       2015-08-20  840      bool connected = data[j+2];
72b236d6 Aaron Skomra       2015-08-20  841
cbd7a16f Benjamin Tissoires 2016-07-05  842      remote_data.remote[i].serial = serial;
cbd7a16f Benjamin Tissoires 2016-07-05  843      remote_data.remote[i].connected = connected;
72b236d6 Aaron Skomra       2015-08-20  844  }
72b236d6 Aaron Skomra       2015-08-20  845
cbd7a16f Benjamin Tissoires 2016-07-05 @846  spin_lock_irqsave(&wacom->remote_lock, flags);
72b236d6 Aaron Skomra       2015-08-20  847
cbd7a16f Benjamin Tissoires 2016-07-05  848  ret = kfifo_in(&wacom->remote_fifo, &remote_data, sizeof(remote_data));
cbd7a16f Benjamin Tissoires 2016-07-05  849  if (ret != sizeof(remote_data)) {
cbd7a16f Benjamin Tissoires 2016-07-05  850      hid_err(wacom->hdev, "Can't queue Remote status event.\n");
cbd7a16f Benjamin Tissoires 2016-07-05 @851      return 0;
72b236d6 Aaron Skomra       2015-08-20  852  }
72b236d6 Aaron Skomra       2015-08-20  853
cbd7a16f Benjamin Tissoires 2016-07-05  854  spin_unlock_irqrestore(&wacom->remote_lock, flags);
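
The pattern the warning points at (an early return between lock and unlock) can be illustrated with a minimal userspace sketch. It is not the wacom driver code: it uses a pthread mutex instead of a spinlock, and struct remote_fifo, fifo_in(), queue_remote_status() and FIFO_CAPACITY are hypothetical stand-ins. The -1 return is only one possible way to signal the failure; whether the driver should return an error there is the open question raised above.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define FIFO_CAPACITY 64	/* hypothetical size in bytes */

struct remote_fifo {
	pthread_mutex_t lock;
	unsigned char buf[FIFO_CAPACITY];
	size_t used;
};

/* Copy len bytes in if they fit; return the number of bytes queued (0 or len). */
static size_t fifo_in(struct remote_fifo *f, const void *data, size_t len)
{
	if (len > FIFO_CAPACITY - f->used)
		return 0;
	memcpy(f->buf + f->used, data, len);
	f->used += len;
	return len;
}

/*
 * Queue one event. Returns 0 on success and -1 if it could not be queued.
 * The error is recorded in a local variable so that the single unlock
 * below runs on both the success and the failure path.
 */
static int queue_remote_status(struct remote_fifo *f, const void *data, size_t len)
{
	int err = 0;

	pthread_mutex_lock(&f->lock);
	if (fifo_in(f, data, len) != len)
		err = -1;
	pthread_mutex_unlock(&f->lock);

	return err;
}
```

Keeping a single unlock after the critical section avoids having to remember an unlock in front of every early return.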

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


[PATCH v3] f2fs: fix to avoid data update racing between GC and DIO

2016-07-06 Thread Chao Yu
From: Chao Yu 

Data in a file can be operated on by GC and DIO simultaneously, so we will
face races such as the ones below:

For write case:
Thread A                                Thread B
- generic_file_direct_write
 - invalidate_inode_pages2_range
 - f2fs_direct_IO
  - do_blockdev_direct_IO
   - do_direct_IO
    - get_more_blocks
                                        - f2fs_gc
                                         - do_garbage_collect
                                          - gc_data_segment
                                           - move_data_page
                                            - do_write_data_page
                                            migrate data block to new block address
   - dio_bio_submit
   update user data to old block address

For read case:
Thread A                                Thread B
- generic_file_direct_write
 - invalidate_inode_pages2_range
 - f2fs_direct_IO
  - do_blockdev_direct_IO
   - do_direct_IO
    - get_more_blocks
                                        - f2fs_balance_fs
                                         - f2fs_gc
                                          - do_garbage_collect
                                           - gc_data_segment
                                            - move_data_page
                                             - do_write_data_page
                                             migrate data block to new block address
                                          - write_checkpoint
                                           - do_checkpoint
                                            - clear_prefree_segments
                                             - f2fs_issue_discard
                                             discard old block address
   - dio_bio_submit
   update user buffer from obsolete block address

To fix this, DIO and GC on the same file must be mutually exclusive.

Signed-off-by: Chao Yu 
---
v3: use semaphore to avoid racing in between read dio and write dio.
 fs/f2fs/data.c  |  4 
 fs/f2fs/f2fs.h  |  1 +
 fs/f2fs/gc.c| 13 +
 fs/f2fs/super.c |  1 +
 4 files changed, 19 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b6fd5bd..19197bb 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1712,6 +1712,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
+   struct f2fs_inode_info *fi = F2FS_I(inode);
size_t count = iov_iter_count(iter);
loff_t offset = iocb->ki_pos;
int err;
@@ -1727,7 +1728,10 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
trace_f2fs_direct_IO_enter(inode, offset, count, iov_iter_rw(iter));
 
+   down_read(&fi->dio_rwsem);
err = blockdev_direct_IO(iocb, inode, iter, get_data_block_dio);
+   up_read(&fi->dio_rwsem);
+
if (iov_iter_rw(iter) == WRITE) {
if (err > 0)
set_inode_flag(inode, FI_UPDATE_WRITE);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index bf9a13a..2e439ec 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -474,6 +474,7 @@ struct f2fs_inode_info {
struct list_head inmem_pages;   /* inmemory pages managed by f2fs */
struct mutex inmem_lock;/* lock for inmemory pages */
struct extent_tree *extent_tree;/* cached extent_tree entry */
+   struct rw_semaphore dio_rwsem;  /* avoid racing between dio and gc */
 };
 
 static inline void get_extent_info(struct extent_info *ext,
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index c612137..a9bfb8d 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -755,12 +755,25 @@ next_step:
/* phase 3 */
inode = find_gc_inode(gc_list, dni.ino);
if (inode) {
+   struct f2fs_inode_info *fi = F2FS_I(inode);
+   bool locked = false;
+
+   if (S_ISREG(inode->i_mode)) {
+   if (!down_write_trylock(&fi->dio_rwsem))
+   continue;
+   locked = true;
+   }
+
start_bidx = start_bidx_of_node(nofs, inode)
+ ofs_in_node;
if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
move_encrypted_block(inode, start_bidx);
else
move_data_page(inode, start_bidx, gc_type);
+
+   if (locked)
+   up_write(&fi->dio_rwsem);
+
stat_inc_data_blk_count(sbi, 1, gc_type);
}
}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index edd1b35..dde57fb 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -579,6 +579,7

[PATCH] arm64: dts: rockchip: support the usb2phy for rk3399 evb

2016-07-06 Thread Caesar Wang
From: Frank Wang 

This patch adds the usb2phy dts node information needed for rk3399.

The USB2.0 PHY comprises one host port and one OTG port. The host port
is for the USB2.0 host controller; the OTG port is for the USB2.0 part of
the USB3.0 OTG controller, and is one of the parts that make up a fully
featured TypeC subsystem.

The USB2.0 vbus gpio is board specific and is not the same on all rk3399
boards, so move it into the evb board file.

Signed-off-by: Frank Wang 
Signed-off-by: Caesar Wang 
---

 arch/arm64/boot/dts/rockchip/rk3399-evb.dts |  4 
 arch/arm64/boot/dts/rockchip/rk3399.dtsi| 19 +++
 2 files changed, 23 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
index d33aa06..9be3715 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
@@ -105,6 +105,10 @@
status = "okay";
 };
 
+&usb2phy {
+   vbus_drv-gpio = <&gpio4 25 GPIO_ACTIVE_HIGH>;
+};
+
 &usb_host0_ehci {
status = "okay";
 };
diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 4c84229..21d147f 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -242,6 +242,25 @@
status = "disabled";
};
 
+   usb2phy: usb2phy {
+   compatible = "rockchip,rk3399-usb-phy";
+   rockchip,grf = <&grf>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   usb2phy0: usb2-phy0 {
+   #phy-cells = <0>;
+   #clock-cells = <0>;
+   reg = <0xe458>;
+   };
+
+   usb2phy1: usb2-phy1 {
+   #phy-cells = <0>;
+   #clock-cells = <0>;
+   reg = <0xe468>;
+   };
+   };
+
usb_host0_ehci: usb@fe38 {
compatible = "generic-ehci";
reg = <0x0 0xfe38 0x0 0x2>;
-- 
1.9.1



Re: [PATCH v6 01/18] remoteproc: st_slim_rproc: add a slimcore rproc driver

2016-07-06 Thread kbuild test robot
Hi,

[auto build test ERROR on robh/for-next]
[also build test ERROR on v4.7-rc6 next-20160706]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Peter-Griffin/Add-support-for-FDMA-DMA-controller-and-slim-core-rproc-found-on-STi-chipsets/20160706-170304
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
config: m32r-allmodconfig (attached as .config)
compiler: m32r-linux-gcc (GCC) 4.9.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=m32r 

All errors (new ones prefixed by >>):

   ERROR: "bad_dma_ops" [sound/soc/fsl/snd-soc-fsl-asrc.ko] undefined!
   ERROR: "bad_dma_ops" [sound/soc/atmel/snd-soc-atmel-pcm-pdc.ko] undefined!
   ERROR: "bad_dma_ops" [sound/core/snd-pcm.ko] undefined!
   ERROR: "dma_common_mmap" [sound/core/snd-pcm.ko] undefined!
   ERROR: "__ucmpdi2" [lib/842/842_decompress.ko] undefined!
   ERROR: "__ucmpdi2" [fs/btrfs/btrfs.ko] undefined!
>> ERROR: "vring_del_virtqueue" [drivers/remoteproc/remoteproc.ko] undefined!
>> ERROR: "register_virtio_device" [drivers/remoteproc/remoteproc.ko] undefined!
>> ERROR: "bad_dma_ops" [drivers/remoteproc/remoteproc.ko] undefined!
>> ERROR: "unregister_virtio_device" [drivers/remoteproc/remoteproc.ko] undefined!
>> ERROR: "vring_new_virtqueue" [drivers/remoteproc/remoteproc.ko] undefined!
>> ERROR: "vring_interrupt" [drivers/remoteproc/remoteproc.ko] undefined!
>> ERROR: "vring_transport_features" [drivers/remoteproc/remoteproc.ko] undefined!
   ERROR: "__ucmpdi2" [drivers/media/i2c/adv7842.ko] undefined!
   ERROR: "__ucmpdi2" [drivers/md/bcache/bcache.ko] undefined!
   ERROR: "__ucmpdi2" [drivers/iio/imu/inv_mpu6050/inv-mpu6050.ko] undefined!
   ERROR: "bad_dma_ops" [drivers/fpga/zynq-fpga.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


RE: [RFC PATCH] ACPI / EC: Fix an order issue in ec_remove_handlers()

2016-07-06 Thread Zheng, Lv
This one and the previous one contain problems.
A patch marked as [UPDATE RFC v2] is the correct fix.
Sorry for the noise.

Thanks and best regards
-Lv

> From: Zheng, Lv
> Subject: [RFC PATCH] ACPI / EC: Fix an order issue in ec_remove_handlers()
> 
> There is an order issue in ec_remove_handlers() that the functions invoked
> in it are not invoked in the reversed order of their appearance in
> ec_install_handlers(). This existing issue has been triggered by the
> following commit:
>   Commit: dcf15cbded656a12335bc4151f3f75f10080a375
>   Subject: ACPI / EC: Fix a boot EC regresion by restoring boot EC
> The commit invokes ec_remove_handlers() during runtime, thus uncovers this
> issue. This patch fixes this regression.
> 
> Fixes: dcf15cbded65 ("ACPI / EC: Fix a boot EC regresion by restoring boot EC")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=102421
> Reported-by: Wolfram Sang 
> Reported-by: Nicholas 
> Cc: Wolfram Sang 
> Cc: Nicholas 
> Signed-off-by: Lv Zheng 
> ---
>  drivers/acpi/ec.c |   14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> Index: linux-acpica/drivers/acpi/ec.c
> ===================================================================
> --- linux-acpica.orig/drivers/acpi/ec.c
> +++ linux-acpica/drivers/acpi/ec.c
> @@ -1331,8 +1331,6 @@ static int ec_install_handlers(struct ac
> 
>  static void ec_remove_handlers(struct acpi_ec *ec)
>  {
> - acpi_ec_stop(ec, false);
> -
>   if (test_bit(EC_FLAGS_EC_HANDLER_INSTALLED, &ec->flags)) {
>   if (ACPI_FAILURE(acpi_remove_address_space_handler(ec-
> >handle,
>   ACPI_ADR_SPACE_EC,
> &acpi_ec_space_handler)))
> @@ -1340,6 +1338,17 @@ static void ec_remove_handlers(struct ac
>   clear_bit(EC_FLAGS_EC_HANDLER_INSTALLED, &ec->flags);
>   }
> 
> + /*
> +  * Disabling EC (transactions) after removing the operation region
> +  * handler. This order is required because _REG(DISCONNECT) may
> +  * access the EmbeddedControl operation regions.
> +  *
> +  * Flushing transactions before removing the GPE handler. This is
> +  * required by the current ACPICA GPE design. ACPICA GPE will
> block
> +  * a GPE if there is no way to handle it.
> +  */
> + acpi_ec_stop(ec, false);
> +
>   if (test_bit(EC_FLAGS_GPE_HANDLER_INSTALLED, &ec->flags)) {
>   if (ACPI_FAILURE(acpi_remove_gpe_handler(NULL, ec->gpe,
>   &acpi_ec_gpe_handler)))


Re: [f2fs-dev] [PATCH 1/2] f2fs: fix to return correct trimmed block number in FITRIM interface

2016-07-06 Thread Chao Yu
Hi all,

I think this patch is wrong: during fstrim, we should not issue redundant
discards for prefree segments.

So, Jaegeuk, could you please drop this patch in your branch?

Sorry for the noise.

Thanks,

On 2016/6/30 16:42, Chao Yu wrote:
> When fstrim is triggered and discards are issued for prefree segments,
> we fail to accumulate the trimmed block count that is returned to the user.
> Fix it.
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/segment.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 6d16ecf..5dc14d6 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -732,15 +732,20 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, 
> struct cp_control *cpc)
>   if (!test_opt(sbi, LFS) || sbi->segs_per_sec == 1) {
>   f2fs_issue_discard(sbi, START_BLOCK(sbi, start),
>   (end - start) << sbi->log_blocks_per_seg);
> + cpc->trimmed +=
> + (end - start) << sbi->log_blocks_per_seg;
>   continue;
>   }
>  next:
>   secno = GET_SECNO(sbi, start);
>   start_segno = secno * sbi->segs_per_sec;
>   if (!IS_CURSEC(sbi, secno) &&
> - !get_valid_blocks(sbi, start, sbi->segs_per_sec))
> + !get_valid_blocks(sbi, start, sbi->segs_per_sec)) {
>   f2fs_issue_discard(sbi, START_BLOCK(sbi, start_segno),
>   sbi->segs_per_sec << sbi->log_blocks_per_seg);
> + cpc->trimmed +=
> + sbi->segs_per_sec << sbi->log_blocks_per_seg;
> + }
>  
>   start = start_segno + sbi->segs_per_sec;
>   if (start < end)
> 


Re: [PATCH v3] powerpc: Export thread_struct.used_vr/used_vsr to user space

2016-07-06 Thread kbuild test robot
Hi,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.7-rc6 next-20160706]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/wei-guo-simon-gmail-com/powerpc-Export-thread_struct-used_vr-used_vsr-to-user-space/20160707-103044
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/built-in.o: In function `arch_ptrace':
>> (.text+0xad4): undefined reference to `__get_user_bad'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


Re: [RFC PATCH v2 1/9] mailbox: Add Amlogic Meson Message-Handling-Unit

2016-07-06 Thread Jassi Brar
On Wed, Jul 6, 2016 at 6:47 PM, Neil Armstrong  wrote:
> 2016-07-04 17:38 GMT+02:00 Jassi Brar :
>> On Tue, Jun 21, 2016 at 3:32 PM, Neil Armstrong  wrote:
>>> Add Amlogic Meson SoCs Message-Handling-Unit as mailbox controller
>>> with 2 independent channels/links to communicate with a remote processor.
>>>
>>> Signed-off-by: Neil Armstrong 
>>> ---
>>>  drivers/mailbox/Makefile|   2 +
>>>  drivers/mailbox/meson_mhu.c | 199 
>>> 
>>>
>> Can we call it pdev_mhu.c or similar so that some other platform using
>> the MHU as a platform_device wouldn't have to embarrassingly call it
>> 'Meson's MHU'?  And also replace 'meson' with that prefix in the
>> code.
>
> Yes, it may deserve a more generic naming, but pdev_mhu is not good
> looking at all !
> What about platform_mhu ?
>
OK


Re: [f2fs-dev] [PATCH 2/2] f2fs: fix to avoid data update racing between GC and DIO

2016-07-06 Thread Chao Yu
On 2016/7/7 6:37, Jaegeuk Kim wrote:
> On Wed, Jul 06, 2016 at 10:10:57AM +0800, Chao Yu wrote:
>> On 2016/7/6 8:24, Jaegeuk Kim wrote:
>>> On Fri, Jul 01, 2016 at 02:03:17PM +0800, Chao Yu wrote:
 Hi Jaegeuk,

 On 2016/7/1 8:03, Jaegeuk Kim wrote:
> Hi Chao,
>
> On Thu, Jun 30, 2016 at 04:42:48PM +0800, Chao Yu wrote:
>> Datas in file can be operated by GC and DIO simultaneously, so we will
>> face race case as below:
>>
>> For write case:
>> Thread A Thread B
>> - generic_file_direct_write
>>  - invalidate_inode_pages2_range
>>  - f2fs_direct_IO
>>   - do_blockdev_direct_IO
>>- do_direct_IO
>> - get_more_blocks
>>  - f2fs_gc
>>   - do_garbage_collect
>>- gc_data_segment
>> - move_data_page
>>  - do_write_data_page
>>  migrate data block to new block 
>> address
>>- dio_bio_submit
>>update user data to old block address
>>
>> For read case:
>> Thread AThread B
>> - generic_file_direct_write
>>  - invalidate_inode_pages2_range
>>  - f2fs_direct_IO
>>   - do_blockdev_direct_IO
>>- do_direct_IO
>> - get_more_blocks
>>  - f2fs_balance_fs
>>   - f2fs_gc
>>- do_garbage_collect
>> - gc_data_segment
>>  - move_data_page
>>   - do_write_data_page
>>   migrate data block to new block 
>> address
>>- write_checkpoint
>> - do_checkpoint
>>  - clear_prefree_segments
>>   - f2fs_issue_discard
>>  discard old block adress
>>- dio_bio_submit
>>update user buffer from obsolete block address
>>
>> In order to fix this, for one file, we should let DIO and GC getting 
>> exclusion
>> against with each other.
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/data.c  |  2 ++
>>  fs/f2fs/f2fs.h  |  1 +
>>  fs/f2fs/gc.c| 14 +-
>>  fs/f2fs/super.c |  1 +
>>  4 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index ba4963f..08dc060 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -1716,7 +1716,9 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, 
>> struct iov_iter *iter)
>>  
>>  trace_f2fs_direct_IO_enter(inode, offset, count, 
>> iov_iter_rw(iter));
>>  
>> +mutex_lock(&F2FS_I(inode)->dio_mutex);
>>  err = blockdev_direct_IO(iocb, inode, iter, get_data_block_dio);
>> +mutex_unlock(&F2FS_I(inode)->dio_mutex);
>
> This means we need to sacrifice all parallelism even in the normal cases?
> Can we find another way?

 1. For dio write vs dio write, writer will grab i_mutex before dio_mutex, so
 anyway, concurrent dio writes will be exclusive.

 2. For dio write vs gc, keep using dio_mutex for making them exclusive.

 3. For dio read vs dio read, and dio read vs gc, what about adding dio_rwsem to
 control the parallelism?

 4. For dio write vs dio read, we grab different lock (write grabs dio_mutex,
 read grabs dio_rwsem), so there is no race condition.
>>>
>>> How about adding a flag in a dio inode and avoiding GCs for there-in blocks?
>>
>> Hmm.. IMO, without a lock, it's hard to guarantee the ordering that lets GC
>> check the flag after it has been set, right?
> 
> Okay, could you add dio_rwsem for now?
> Later, we may need to take a look at dio_overwrite case to mitigate inode_lock
> contention like xfs does. :)

Sounds good if we can support concurrent overwrite dio! :)

Let me send v3.

Thanks,

> 
> Thanks,
> 
>
>>  if (iov_iter_rw(iter) == WRITE) {
>>  if (err > 0)
>>  set_inode_flag(inode, FI_UPDATE_WRITE);
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index bd82b6d..a241576 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -474,6 +474,7 @@ struct f2fs_inode_info {
>>  struct list_head inmem_pages;   /* inmemory pages managed by 
>> f2fs */
>>  struct mutex inmem_lock;/* lock for inmemory pages */
>>  struct extent_tree *extent_tree;/* cached extent_tree 
>> entry */
>> +  

[PATCH] f2fs: fix to avoid redundant discard during fstrim

2016-07-06 Thread Chao Yu
From: Chao Yu 

With the test steps below, f2fs issues redundant discards during fstrim.
The reason is that we issue discards both for prefree segments and for the
consecutive freed regions the user wants to trim, and the regions they cover
partly overlap. Change this so that, during fstrim, no discard is issued for
prefree segments, whether or not they fall inside the trimmed range.

1. mount -t f2fs -o discard /dev/zram0 /mnt/f2fs
2. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/
3. dd if=/dev/zero  of=/mnt/f2fs/a bs=2M count=1
4. dd if=/dev/zero  of=/mnt/f2fs/b bs=1M count=1
5. sync
6. rm /mnt/f2fs/a /mnt/f2fs/b
7. fstrim -o 0 -l 3221225472 -m 2097152 -v /mnt/f2fs/

Before:
<...>-5428  [001] ...1  9511.052125: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x200
<...>-5428  [001] ...1  9511.052787: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

After:
<...>-6764  [000] ...1  9720.382504: f2fs_issue_discard: dev = (251,0), blkstart = 0x2200, blklen = 0x300

Signed-off-by: Chao Yu 
---
 fs/f2fs/segment.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6d16ecf..5dafc56 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -710,21 +710,33 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
unsigned long *prefree_map = dirty_i->dirty_segmap[PRE];
unsigned int start = 0, end = -1;
unsigned int secno, start_segno;
+   unsigned int trim_start = cpc->trim_start;
+   unsigned int trim_end = cpc->trim_end;
+   bool force = (cpc->reason == CP_DISCARD);
 
mutex_lock(&dirty_i->seglist_lock);
 
while (1) {
int i;
+   unsigned int trimmed = 0;
+
start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1);
if (start >= MAIN_SEGS(sbi))
break;
end = find_next_zero_bit(prefree_map, MAIN_SEGS(sbi),
start + 1);
 
-   for (i = start; i < end; i++)
-   clear_bit(i, prefree_map);
+   for (i = start; i < end; i++) {
+   if (!force || (i >= trim_start && i <= trim_end)) {
+   clear_bit(i, prefree_map);
+   trimmed++;
+   }
+   }
+
+   dirty_i->nr_dirty[PRE] -= trimmed;
 
-   dirty_i->nr_dirty[PRE] -= end - start;
+   if (force)
+   continue;
 
if (!test_opt(sbi, DISCARD))
continue;
@@ -750,7 +762,7 @@ next:
 
/* send small discards */
list_for_each_entry_safe(entry, this, head, list) {
-   if (cpc->reason == CP_DISCARD && entry->len < cpc->trim_minlen)
+   if (force && entry->len < cpc->trim_minlen)
goto skip;
f2fs_issue_discard(sbi, entry->blkaddr, entry->len);
cpc->trimmed += entry->len;
-- 
2.7.2
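
The range check this patch adds to the prefree-clearing loop can be shown in isolation with a minimal sketch. It is not the f2fs implementation: a plain bool array replaces the prefree bitmap, and clear_prefree_range() is a hypothetical helper that mirrors only the "!force || (i >= trim_start && i <= trim_end)" condition from the diff.

```c
#include <stdbool.h>

/*
 * Clear the prefree flag of the segments in [start, end) and return how
 * many were cleared. When force is set (the fstrim case in the patch),
 * only segments inside [trim_start, trim_end] are cleared; the others
 * stay prefree so that a later checkpoint can still handle them.
 */
static unsigned int clear_prefree_range(bool *prefree, unsigned int start,
					unsigned int end, bool force,
					unsigned int trim_start,
					unsigned int trim_end)
{
	unsigned int i, cleared = 0;

	for (i = start; i < end; i++) {
		if (!force || (i >= trim_start && i <= trim_end)) {
			prefree[i] = false;
			cleared++;
		}
	}
	return cleared;
}
```

The returned count plays the role of the "trimmed" counter in the diff, which is what gets subtracted from nr_dirty[PRE] instead of the full end - start.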



Re: [PATCH] drm/radeon: Remove deprecated create_singlethread_workqueue

2016-07-06 Thread Michel Dänzer
On 06.07.2016 22:45, Tejun Heo wrote:
> On Wed, Jul 06, 2016 at 12:12:52PM +0900, Michel Dänzer wrote:
> 
>> Not being very familiar with the workqueue APIs, I'll describe how it's
>> supposed to work from a driver POV, which will hopefully help you guys
>> decide on the most appropriate alloc_workqueue parameters.
>>
>> There is one flip work queue for each hardware CRTC. At most one
>> radeon_flip_work_func item can be queued for any of them at any time.
>> When a radeon_flip_work_func item is queued, it should be executed ASAP
>> (so WQ_HIGHPRI might be appropriate?).
> 
> Hmmm... the only time WQ_HIGHPRI should be used is when it'd otherwise
> require a kthread w/ nice value at -20.  Would that be the case here?
> What are the consequences of the work item getting delayed?

A page flip may be delayed to a later display refresh cycle.


> Also, what kind of delays matter here?  Is it millisec range or micro?

It can be the latter in theory, but normally rather the former.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v3 01/21] backports: move legacy and SmPL patch application into helper

2016-07-06 Thread Johannes Berg
On Thu, 2016-07-07 at 02:10 +0200, Luis R. Rodriguez wrote:
> On Mon, Jul 04, 2016 at 11:33:03AM +0200, Johannes Berg wrote:
> > On Tue, 2014-11-11 at 00:14 -0800, Luis R. Rodriguez wrote:
> > > From: "Luis R. Rodriguez" 
> > > 
> > > This allows us to extend how backports uses patches for
> > > different types of applications. This will later be used
> > > for kernel integration support, for example.
> > > 
> > > This should have no functional change.
> > 
> > Obviously this patch was applied a long time ago,
> 
> Geesh yes over 2 years ago.

:)

> > but you lied - it has a functional change:
> > 
> > > +if process.returncode != 0:
> > > +if not args.verbose:
> > > +logwrite("Failed to apply changes from %s" %
> > > print_name)
> > > +for line in output:
> > > +logwrite('> %s' % line)
> > > +raise Exception('Patch failed')
> > 
> > vs.
> > 
> > > -if process.returncode != 0:
> > > -if not args.verbose:
> > > -logwrite("Failed to apply changes from %s" %
> > > print_name)
> > > -for line in output:
> > > -logwrite('> %s' % line)
> > > -return 2
> > 
> > This had a major impact on the devel/git-tracker.py tool.
> 
> Sorry about that, is there an easy fix for it? Is there a test
> we can do to avoid further regressions against the tracker ?
> 

Luca has a fix.

johannes


Re: [PATCH v2] cpufreq: powernv: Replacing pstate_id with frequency table index

2016-07-06 Thread Akshay Adiga



On 06/30/2016 11:53 AM, Akshay Adiga wrote:

Refactor the code to use the frequency table index instead of pstate_id.
This abstraction makes the code independent of the pstate values.

- No functional changes
- The highest frequency is at frequency table index 0 and the frequency
   decreases as the index increases.
- Macros pstate_to_idx() and idx_to_pstate() can be used for conversion
   between pstate_id and index.
- powernv_pstate_info now contains the frequency table indices of the min,
   max and nominal frequencies (instead of pstate_ids).
- global_pstate_info now stores index values instead of pstate ids.
- Variables that now store an index instead of a pstate are renamed to *_idx.

Signed-off-by: Akshay Adiga 
Reviewed-by: Gautham R. Shenoy 
---
Changes from v1:
   - changed macro names from get_pstate()/ get_index() to
idx_to_pstate()/ pstate_to_idx()
   - Renamed variables that store index instead of pstate_id to *_idx
   - Retained previous printk's

v1 : http://marc.info/?l=linux-pm&m=146677701501225&w=1

  drivers/cpufreq/powernv-cpufreq.c | 177 ++
  1 file changed, 102 insertions(+), 75 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index b29c5c2..72c91d8 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -64,12 +64,14 @@
  /**
   * struct global_pstate_info -Per policy data structure to maintain 
history of
   *global pstates
- * @highest_lpstate:   The local pstate from which we are ramping down
+ * @highest_lpstate_idx:   The local pstate index from which we are
+ * ramping down
   * @elapsed_time: Time in ms spent in ramping down from
- * highest_lpstate
+ * highest_lpstate_idx
   * @last_sampled_time:Time from boot in ms when global 
pstates were
   *last set
- * @last_lpstate,last_gpstate: Last set values for local and global pstates
+ * @last_lpstate_idx,  Last set value of local pstate and global
+ * last_gpstate_idxpstate in terms of cpufreq table index
   * @timer:Is used for ramping down if cpu goes idle for
   *a long time with global pstate held high
   * @gpstate_lock: A spinlock to maintain synchronization between
@@ -77,11 +79,11 @@
   *governer's target_index calls
   */
  struct global_pstate_info {
-   int highest_lpstate;
+   int highest_lpstate_idx;
unsigned int elapsed_time;
unsigned int last_sampled_time;
-   int last_lpstate;
-   int last_gpstate;
+   int last_lpstate_idx;
+   int last_gpstate_idx;
spinlock_t gpstate_lock;
struct timer_list timer;
  };
@@ -124,29 +126,47 @@ static int nr_chips;
  static DEFINE_PER_CPU(struct chip *, chip_info);
  
  /*

- * Note: The set of pstates consists of contiguous integers, the
- * smallest of which is indicated by powernv_pstate_info.min, the
- * largest of which is indicated by powernv_pstate_info.max.
+ * Note:
+ * The set of pstates consists of contiguous integers.
+ * powernv_pstate_info stores the index of the frequency table for
+ * max, min and nominal frequencies. It also stores number of
+ * available frequencies.
   *
- * The nominal pstate is the highest non-turbo pstate in this
- * platform. This is indicated by powernv_pstate_info.nominal.
+ * powernv_pstate_info.nominal indicates the index to the highest
+ * non-turbo frequency.
   */
  static struct powernv_pstate_info {
-   int min;
-   int max;
-   int nominal;
-   int nr_pstates;
+   unsigned int min;
+   unsigned int max;
+   unsigned int nominal;
+   unsigned int nr_pstates;
  } powernv_pstate_info;
  
+/* Use following macros for conversions between pstate_id and index */

+static inline int idx_to_pstate(unsigned int i)
+{
+   return powernv_freqs[i].driver_data;
+}
+
+static inline unsigned int pstate_to_idx(int pstate)
+{
+   /*
+* abs() is deliberately used so that is works with
+* both monotonically increasing and decreasing
+* pstate values
+*/
+   return abs(pstate - idx_to_pstate(powernv_pstate_info.max));
+}
+
  static inline void reset_gpstates(struct cpufreq_policy *policy)
  {
struct global_pstate_info *gpstates = policy->driver_data;
  
-	gpstates->highest_lpstate = 0;

+   gpstates->highest_lpstate_idx = 0;
gpstates->elapsed_time = 0;
gpstates->last_sampled_time = 0;
-   gpstates->last_lpstate = 0;
-   gpstates->last_gpstate = 0;
+   gpstates->last_lpstate_idx = 0;
+   gpstates->last_gpstate_idx = 0;
  }
  
  /*

@@ -156,9 +176,10 @@ static inline void reset_gpstates(struct cpufreq_policy 
*policy)
  static int init_powernv_pstates(void)
  {
struct device_node *power_mg
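
The index/pstate conversion introduced above can be tried out with a small standalone sketch. It is an illustration under assumptions, not the driver code: the frequency table is reduced to a plain array of pstate ids passed in as a parameter, and index 0 is assumed to hold the highest frequency, as the commit message states.

```c
#include <stdlib.h>

/* Pstate id stored at a given frequency table index. */
static int idx_to_pstate(const int *pstate_ids, unsigned int idx)
{
	return pstate_ids[idx];
}

/*
 * Index of a pstate id, computed as its distance from the id at index 0
 * (assumed here to be the highest frequency). abs() makes this work
 * whether the contiguous ids decrease or increase along the table.
 */
static unsigned int pstate_to_idx(const int *pstate_ids, int pstate)
{
	return (unsigned int)abs(pstate - idx_to_pstate(pstate_ids, 0));
}
```

For a table of ids { 0, -1, -2, -3 }, pstate -2 maps to index 2; for { 10, 11, 12, 13 }, pstate 13 maps to index 3. The conversion relies on the ids being contiguous integers, as the comment in the patch notes.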

linux-next: manual merge of the clockevents tree with the arm-soc tree

2016-07-06 Thread Stephen Rothwell
Hi Daniel,

Today's linux-next merge of the clockevents tree got a conflict in:

  arch/arm/Kconfig

between commit:

  c86f51737f8d ("ARM: clps711x: Switch to MULTIPLATFORM")

from the arm-soc tree and commit:

  250e46aa3bb3 ("clocksource/drivers/clps_711x: Add the COMPILE_TEST option")

from the clockevents tree.

I fixed it up (I used the arm-soc version of this file and then added
the following merge fix patch) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

From: Stephen Rothwell 
Date: Thu, 7 Jul 2016 13:59:06 +1000
Subject: [PATCH] clocksource/drivers/clps_711x: fixup for "ARM: clps711x: Switch to MULTIPLATFORM"

Signed-off-by: Stephen Rothwell 
---
 arch/arm/mach-clps711x/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-clps711x/Kconfig b/arch/arm/mach-clps711x/Kconfig
index 3b56197ccfd0..dc7c6edeab39 100644
--- a/arch/arm/mach-clps711x/Kconfig
+++ b/arch/arm/mach-clps711x/Kconfig
@@ -3,8 +3,8 @@ menuconfig ARCH_CLPS711X
depends on ARCH_MULTI_V4T
select ARCH_REQUIRE_GPIOLIB
select AUTO_ZRELADDR
-   select CLKSRC_MMIO
select CLKSRC_OF
+   select CLPS711X_TIMER
select COMMON_CLK
select CPU_ARM720T
select GENERIC_CLOCKEVENTS
-- 
2.8.1

-- 
Cheers,
Stephen Rothwell


Re: [PATCH] net/ixgbe: Allow resetting VF admin mac to zero

2016-07-06 Thread Stephen Hemminger
On Fri,  1 Jul 2016 11:19:38 +0200
Juerg Haefliger  wrote:

> The VF administrative mac addresses (stored in the PF driver) are
> initialized to zero when the PF driver starts up.
> 
> These addresses may be modified in the PF driver through ndo calls
> initiated by iproute2 or libvirt.
> 
> While we allow the PF/host to change the VF admin mac address from zero
> to a valid unicast mac, we do not allow restoring the VF admin mac to
> zero. We currently only allow changing this mac to a different unicast mac.
> 
> This leads to problems when libvirt scripts are used to deal with
> VF mac addresses, and libvirt attempts to revoke the mac so this
> host will not use it anymore.
> 
> Fix this by allowing resetting a VF administrative MAC back to zero.
> 
> Implementation and commit message shamelessly stolen from:
> commit 6e5224224faa ("net/mlx4_core: Allow resetting VF admin mac to zero")
> 
> Signed-off-by: Juerg Haefliger 

Since set mac is allowed at any time, even when the device is up, you must
prevent a device that is in the UP state from having an all-zero MAC address.
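
The constraint described in this reply can be sketched as a small stand-alone check. This is not ixgbe code: the function names, the `vf_is_up` flag and the return convention are hypothetical, and whether the PF driver can actually observe the VF's state this way is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

static bool mac_is_zero(const uint8_t mac[MAC_LEN])
{
	uint8_t acc = 0;

	for (int i = 0; i < MAC_LEN; i++)
		acc |= mac[i];
	return acc == 0;
}

static bool mac_is_valid_unicast(const uint8_t mac[MAC_LEN])
{
	/* Bit 0 of the first octet marks a multicast/broadcast address. */
	return !mac_is_zero(mac) && !(mac[0] & 0x01);
}

/*
 * Hypothetical policy check: returns 0 if the new admin MAC may be
 * applied, -1 otherwise. An all-zero MAC is accepted only while the
 * device is down.
 */
static int check_vf_admin_mac(const uint8_t mac[MAC_LEN], bool vf_is_up)
{
	assert(mac != NULL);

	if (mac_is_zero(mac))
		return vf_is_up ? -1 : 0;
	return mac_is_valid_unicast(mac) ? 0 : -1;
}
```

A real driver would return a proper errno value and take whatever lock protects the VF state; the sketch only shows the ordering of the checks.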


Re: [PATCH v3] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest

2016-07-06 Thread Wanpeng Li
2016-07-07 1:11 GMT+08:00 Paolo Bonzini :
>
>
> On 06/07/2016 18:03, Haozhong Zhang wrote:
>>>> This patch also fixed the crash of L1 Xen with L2 HVM guest. Xen does
>>>> not enable preemption timer for HVM guests, and will get panic if it
>>>> receives a preemption timer vmexit.
>>>
>>> Thanks!  I'm still not sure why the bit is set in the vmcs02 though...
>>
>> Yes, it looks really weird.
>>
>> I replaced "return false" in Wanpeng's patch by
>>
>> pr_info("VMCS: preemption timer enabled = %d\n",
>> !!(vmcs_read32(PIN_BASED_VM_EXEC_CONTROL) & 
>> PIN_BASED_VMX_PREEMPTION_TIMER));
>>
>> and redid my test. As expected, L1 Xen crashed due to the unexpected
>> preemption timer vmexit. I got a log from above statement just before crash:
>>
>> VMCS: preemption timer enabled = 1
>>
>> which is expected to be 0, because preemption timer is disabled in
>> vmcs02. I also modified L1 Xen to dump VMCS at crash, and it says
>> preemption timer is disabled.
>>
>> I noticed Jim Mattson recently sent a patch "KVM: nVMX: Fix memory
>> corruption when using VMCS shadowing" to fix the inconsistency between
>> vmcs12 and its shadow. Is it relevant here? I'll test his patch
>> tomorrow.
>
> No, it shouldn't have any effect.
>
> I think it happens when the post_block hook switches back from sw_timer

Please review my other patch, 'KVM: nVMX: Fix preemption timer bit
set in vmcs02 even if L1 doesn't enable it', which fixes the vmcs02
bit being set.

> to hv_timer, and L2 is running.  So the right fix should be along the
> lines of what I posted earlier.  If you don't beat me to it, I'll take
> another look tomorrow.

Maybe you can continue the "L1 TSC deadline timer to trigger while L2 is
running" work based on my two bugfixes. However, your patch still
triggers a call trace on top of my two fixes.

Regards,
Wanpeng Li


Re: Hang due to nfs letting tasks freeze with locked inodes

2016-07-06 Thread Seth Forshee
On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote:
> On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote:
> > We're seeing a hang when freezing a container with an nfs bind mount while
> > running iozone. Two iozone processes were hung with this stack trace.
> > 
> >  [] schedule+0x35/0x80
> >  [] schedule_preempt_disabled+0xe/0x10
> >  [] __mutex_lock_slowpath+0xb9/0x130
> >  [] mutex_lock+0x1f/0x30
> >  [] do_unlinkat+0x12b/0x2d0
> >  [] SyS_unlink+0x16/0x20
> >  [] entry_SYSCALL_64_fastpath+0x16/0x71
> > 
> > This seems to be due to another iozone thread frozen during unlink with
> > this stack trace:
> > 
> >  [] __refrigerator+0x7a/0x140
> >  [] nfs4_handle_exception+0x118/0x130 [nfsv4]
> >  [] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
> >  [] nfs_unlink+0x149/0x350 [nfs]
> >  [] vfs_unlink+0xf1/0x1a0
> >  [] do_unlinkat+0x279/0x2d0
> >  [] SyS_unlink+0x16/0x20
> >  [] entry_SYSCALL_64_fastpath+0x16/0x71
> > 
> > Since nfs is allowing the thread to be frozen with the inode locked it's
> > preventing other threads trying to lock the same inode from freezing. It
> > seems like a bad idea for nfs to be doing this.
> > 
> 
> Yeah, known problem. Not a simple one to fix though.
> 
> > Can nfs do something different here to prevent this? Maybe use a
> > non-freezable sleep and let the operation complete, or else abort the
> > operation and return ERESTARTSYS?
> 
> The problem with letting the op complete is that often by the time you
> get to the point of trying to freeze processes, the network interfaces
> are already shut down. So the operation you're waiting on might never
> complete. Stuff like suspend operations on your laptop fail, leading to
> fun bug reports like: "Oh, my laptop burned to crisp inside my bag
> because the suspend never completed."
> 
> You could (in principle) return something like -ERESTARTSYS iff the
> call has not yet been transmitted. If it has already been transmitted,
> then you might end up sending the call a second time (but not as an RPC
> retransmission of course). If that call was non-idempotent then you end
> up with all of _those_ sorts of problems.
> 
> Also, -ERESTARTSYS is not quite right as it doesn't always cause the
> call to be restarted. It depends on the syscall. I think this would
> probably need some other sort of syscall-restart machinery plumbed in.

I don't really know much at all about how NFS works, so I hope you don't
mind indulging me in some questions.

What happens then if you suspend waiting for an op to complete and then
resume an hour later? Will it actually succeed or end up returning some
sort of "timed out" error?

If it's going to be an error (or even likely to be one) could the op
just be aborted immediately with an error code? It just seems like there
must be something better than potentially deadlocking the kernel.

Thanks,
Seth



RE: [PATCH v2] phy: add phy-hisi-inno-usb2

2016-07-06 Thread Lipengcheng
Hi, Kishon
I am sorry. Please ignore the patch. The maintainer list the mail was sent
to is incomplete and the patch version is not correct.
I will send the patch again.

Thanks,
Pengcheng Li

> -Original Message-
> From: Lipengcheng
> Sent: Thursday, July 07, 2016 11:12 AM
> To: 'Kishon Vijay Abraham I'; linux-kernel@vger.kernel.org; Lidongpo
> Cc: Xuejiancheng; Zhangzhenxing (Christian, Device ChipSet)
> Subject: RE: [PATCH v2] phy: add phy-hisi-inno-usb2
> 
> Hi, Kishon
> 
> > -Original Message-
> > From: Kishon Vijay Abraham I [mailto:kis...@ti.com]
> > Sent: Monday, July 04, 2016 6:59 PM
> > To: Lipengcheng; linux-kernel@vger.kernel.org; Lidongpo
> > Subject: Re: [PATCH v2] phy: add phy-hisi-inno-usb2
> >
> > Hi,
> >
> > On Sunday 03 July 2016 12:20 PM, l00229106 wrote:
> > > Add support for inno usb2 phy integrated on some hisilicon SOCs.
> > >
> > > Signed-off-by: Pengcheng Li 
> > > ---
> > >  .../devicetree/bindings/phy/phy-hisi-inno-usb2.txt |  48 
> > >  drivers/phy/Kconfig|  10 +
> > >  drivers/phy/Makefile   |   1 +
> > >  drivers/phy/phy-hisi-inno-usb2.c   | 298 
> > > +
> > >  4 files changed, 357 insertions(+)
> > >  create mode 100644
> > > Documentation/devicetree/bindings/phy/phy-hisi-inno-usb2.txt
> > >  create mode 100644 drivers/phy/phy-hisi-inno-usb2.c
> > >
> > > diff --git
> > > a/Documentation/devicetree/bindings/phy/phy-hisi-inno-usb2.txt
> > > b/Documentation/devicetree/bindings/phy/phy-hisi-inno-usb2.txt
> > > new file mode 100644
> > > index 000..59eaf73
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/phy/phy-hisi-inno-usb2.txt
> > > @@ -0,0 +1,48 @@
> > > +HiSilicon INNO USB2 PHY
> > > +---
> > > +Required properties:
> > > +- compatible: Should be "hisilicon,inno_usb2_phy"
> > > +- #phy-cells: Must be 0
> > > +- hisilicon,peripheral-syscon: Phandle of syscon used to control phy.
> > > +- hisilicon,reg-num: Number of phy registers which should be
> > > +configured at phy intialization stage
> > > +- hisilicon,reg-seq: Sequence of triplets of (address, value, delay-us).
> > > +The number of triplets is equal to "hisilicon,reg-num". Each
> > > +triplet is used to write one phy register. The delay-us cell
> > > +represents the delay time in microseconds to be applied after each write.
> > > +- clocks: Phandle and clock specifier pair for reference clock 
> > > utmi_refclk.
> > > +- resets: List of phandle and reset specifier pairs for each reset
> > > +signal in reset-names.
> > > +- reset-names: Should be "por_rst" and "test_rst". The test_rst
> > > +only exists in some of SOCs, so it is optional.
> > > +
> > > +Phy node can includes up to four subnodes. Each subnode represents one 
> > > port.
> > > +The required properties of port node are as follows:
> > > +- clocks: Phandle and clock specifier pair for utmi_clock.
> > > +- resets: List of phandle and reset specifier pairs for port reset and 
> > > utmi reset.
> > > +- reset-names: List of reset signal names. Should be "port_rst" and 
> > > "utmi_rst"
> > > +
> > > +Refer to phy/phy-bindings.txt for the generic PHY binding
> > > +properties
> > > +
> > > +Example:
> > > +usb_phy: phy {
> > > +  compatible = "hisilicon,inno_usb2_phy";
> > > +  #phy-cells = <0>;
> > > +  hisilicon,peripheral-syscon = <&peri_ctrl>;
> > > +  hisilicon,reg-num = <7>;
> > > +  hisilicon,reg-seq = <0x80 0x80 20>,
> > > +  <0x80 0xa0060c 200>,
> > > +  <0x80 0x80001c 20>,
> > > +  <0x80 0xa0001c 20>,
> > > +  <0x80 0x80060f 20>,
> > > +  <0x80 0xa0060f 20>,
> > > +  <0x80 0x800a4b 20>;
> > > +  clocks = <&crg USB2_REF_CLK>;
> > > +  resets = <&crg 0xb4 2>;
> > > +  reset-names = "por_rst";
> > > +  port0 {
> > > +  clocks = <&crg USB2_UTMI0_CLK>;
> > > +  resets = <&crg 0xb4 5>, <&crg 0xb4 1>;
> > > +  reset-names = "port_rst", "utmi_rst";
> > > +  };
> > > +  };
> > > diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig index
> > > 26566db..8f043ca 100644
> > > --- a/drivers/phy/Kconfig
> > > +++ b/drivers/phy/Kconfig
> > > @@ -205,6 +205,16 @@ config PHY_EXYNOS5250_SATA
> > > SATA 3.0 Gb/s, SATA 6.0 Gb/s speeds. It supports one SATA host
> > > port to accept one SATA device.
> > >
> > > +config PHY_HISI_INNO_USB2
> > > + tristate "Hisilicon Inno USB2 PHY support"
> > > + depends on (ARCH_HISI) || COMPILE_TEST
> > > + select GENERIC_PHY
> > > + select MFD_SYSCON
> > > + help
> > > +   Support for INNO PHY on Hisilicon Socs. This Phy supports
> > > +   USB 1.5Mb/s, USB 12Mb/s, USB 480Mb/s speeds. It suppots one
> > > +   USB host port to accept one USB device.
> > > +
> > >  config PHY_HIX5HD2_SATA
> > >   tristate "HIX5HD2 SATA PHY Driver"
> > >   depends on ARCH_HIX5HD

[PATCH] MAINTAINERS: Update hwspinlock paths

2016-07-06 Thread Bjorn Andersson
Include all files in drivers/hwspinlock and hwlock related dt bindings
in the hw spinlock section of MAINTAINERS.

Cc: Peter Chen 
Signed-off-by: Bjorn Andersson 
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 87b956d492c8..697f797f9e06 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5268,8 +5268,9 @@ M:Bjorn Andersson 
 L: linux-remotep...@vger.kernel.org
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock.git
+F: Documentation/devicetree/bindings/hwlock/
 F: Documentation/hwspinlock.txt
-F: drivers/hwspinlock/hwspinlock_*
+F: drivers/hwspinlock/
 F: include/linux/hwspinlock.h
 
 HARMONY SOUND DRIVER
-- 
2.5.0



[PATCH v3 1/2] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest

2016-07-06 Thread Wanpeng Li
From: Wanpeng Li 

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<  (null)>]   (null)
PGD 0
Oops: 0010 [#1] SMP
Call Trace:
 ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
 handle_preemption_timer+0xe/0x20 [kvm_intel]
 vmx_handle_exit+0x169/0x15a0 [kvm_intel]
 ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
 kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
 ? vcpu_load+0x1c/0x60 [kvm]
 ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
 kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
 do_vfs_ioctl+0x96/0x6a0
 ? __fget_light+0x2a/0x90
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x68/0x180
 entry_SYSCALL64_slow_path+0x25/0x25
Code:  Bad RIP value.
RIP  [<  (null)>]   (null)
 RSP 
CR2: 
---[ end trace 9c70c48b1a2bc66e ]---

This can be reproduced readily with the preemption timer enabled on L0 and
disabled on L1.

The preemption timer for nested VMX is emulated by an hrtimer which is started
on L2 entry, stopped on L2 exit and evaluated via the check_nested_events hook.
However, nested_vmx_exit_handled always returns true for a preemption timer
vmexit, so the L1 preemption timer vmexit is captured and treated as an L2
preemption timer vmexit, which causes a nested vmexit that dereferences a NULL
pointer.

This patch fixes it by depending on check_nested_events to capture the expiry
of the L2 preemption timer (the emulated hrtimer) and perform the nested vmexit.

Tested-by: Haozhong Zhang 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Yunhong Jiang 
Cc: Jan Kiszka 
Cc: Haozhong Zhang 
Signed-off-by: Wanpeng Li 
---
v2 -> v3:
 * update patch subject
v1 -> v2:
 * fix typo in patch description

 arch/x86/kvm/vmx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85e2f0a..29c16a8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8041,6 +8041,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
return nested_cpu_has2(vmcs12, SECONDARY_EXEC_XSAVES);
case EXIT_REASON_PCOMMIT:
return nested_cpu_has2(vmcs12, SECONDARY_EXEC_PCOMMIT);
+   case EXIT_REASON_PREEMPTION_TIMER:
+   return false;
default:
return true;
}
-- 
1.9.1
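
The effect of the two added lines can be modelled outside the kernel. The sketch below is a reduced, hypothetical model of the dispatch in nested_vmx_exit_handled: the enum, the struct and the function name are invented for illustration and cover only three cases. A return value of true means "reflect the exit to L1", false means "L0 handles it itself".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of exit reasons (values are illustrative). */
enum exit_reason {
	EXIT_PREEMPTION_TIMER,
	EXIT_XSAVES,
	EXIT_OTHER,
};

/* Reduced model of what L1 requested in vmcs12. */
struct l1_controls {
	bool xsaves_exiting;
};

static bool exit_reflected_to_l1(enum exit_reason reason,
				 const struct l1_controls *l1)
{
	assert(l1 != NULL);

	switch (reason) {
	case EXIT_XSAVES:
		/* Reflected only if L1 enabled the matching control. */
		return l1->xsaves_exiting;
	case EXIT_PREEMPTION_TIMER:
		/*
		 * A hardware preemption timer exit belongs to L0. L2's
		 * timer is an emulated hrtimer handled elsewhere.
		 */
		return false;
	default:
		return true;
	}
}
```

Without the EXIT_PREEMPTION_TIMER case the switch falls through to the default and reports the exit to L1, which is the behaviour the patch description identifies as the bug.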



[PATCH v3 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it

2016-07-06 Thread Wanpeng Li
From: Wanpeng Li 

After L0 emulates VMRESUME we go back to the vcpu_run() loop, which can incur
kvm_sched_out and kvm_sched_in operations, since cond_resched() will be
called when a reschedule is needed. The preemption timer will be reprogrammed
if the vCPU is scheduled to a different pCPU. The preemption timer bit of
vmcs02 will then be set if L0 enables the preemption timer to run L1, even if
L1 doesn't enable the preemption timer to run L2.

This patch fixes it by not reprogramming the preemption timer of vmcs02 if
L1's vCPU is scheduled on a different pCPU while we are on the way to
VMRESUME the nested guest.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Yunhong Jiang 
Cc: Jan Kiszka 
Cc: Haozhong Zhang 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0cc6cf8..e8fe16a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2742,7 +2742,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (tsc_delta < 0)
mark_tsc_unstable("KVM discovered backwards TSC");
 
-   if (kvm_lapic_hv_timer_in_use(vcpu) &&
+   if (!is_guest_mode(vcpu) &&
+   kvm_lapic_hv_timer_in_use(vcpu) &&
kvm_x86_ops->set_hv_timer(vcpu,
kvm_get_lapic_tscdeadline_msr(vcpu)))
kvm_lapic_switch_to_sw_timer(vcpu);
-- 
1.9.1
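
The changed condition in kvm_arch_vcpu_load can be modelled with a few booleans. This is a hypothetical user-space sketch, not KVM code: `struct vcpu_model` and its fields are invented, and `hw_program_fails` stands in for a non-zero return from the set_hv_timer hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-vCPU timer state touched on vcpu load. */
struct vcpu_model {
	bool guest_mode;	/* vCPU is currently running L2 */
	bool hv_timer_in_use;	/* LAPIC timer backed by the preemption timer */
	bool hw_program_fails;	/* stand-in for set_hv_timer() failing */
	bool using_sw_timer;	/* fell back to the software timer */
	int reprogram_count;	/* how often the hardware timer was touched */
};

static void vcpu_load_timer(struct vcpu_model *v)
{
	assert(v != NULL);

	/* With the fix, the timer is left alone while L2 is active. */
	if (v->guest_mode || !v->hv_timer_in_use)
		return;

	v->reprogram_count++;
	if (v->hw_program_fails) {
		/* Mirrors the fallback to the software timer. */
		v->hv_timer_in_use = false;
		v->using_sw_timer = true;
	}
}
```

The early return for guest_mode is the only behavioural change; the fallback path is unchanged from the original condition.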



[PATCH v4] acpi, nfit: treat virtual ramdisk SPA as pmem region

2016-07-06 Thread Lee, Chun-Yi
This patch adds logic to treat a virtual ramdisk SPA as a pmem region, so that
the ramdisk's /dev/pmem* device can be mounted with iso9660.

It's useful to work with the httpboot in EFI firmware to pull a remote
ISO file to the local memory region for booting and installation.

Wiki page of UEFI HTTPBoot with OVMF:
https://en.opensuse.org/UEFI_HTTPBoot_with_OVMF

The ramdisk function in EDK2/OVMF generates an ACPI0012 root device that
contains an empty _STA but no _DSM:

DefinitionBlock ("ssdt2.aml", "SSDT", 2, "INTEL ", "RamDisk ", 0x1000)
{
Scope (\_SB)
{
Device (NVDR)
{
Name (_HID, "ACPI0012")  // _HID: Hardware ID
Name (_STR, Unicode ("NVDIMM Root Device"))  // _STR: Description String
Method (_STA, 0, NotSerialized)  // _STA: Status
{
Return (0x0F)
}
}
}
}

Section 5.2.25.2 of the ACPI 6.1 spec states that the "SPA Range
Structure Index" of a virtual SPA shall be set to zero. That means a virtual
SPA will not be associated with any NVDIMM region mapping.

The VCD's SPA Range Structure in NFIT is similar to that of a virtual disk
region, as follows:

[028h 0040   2]Subtable Type :  [System Physical Address Range]
[02Ah 0042   2]   Length : 0038

[02Ch 0044   2]  Range Index : 
[02Eh 0046   2]Flags (decoded below) : 
   Add/Online Operation Only : 0
  Proximity Domain Valid : 0
[030h 0048   4] Reserved : 
[034h 0052   4] Proximity Domain : 
[038h 0056  16]   Address Range GUID : 77AB535A-45FC-624B-5560-F7B281D1F96E
[048h 0072   8]   Address Range Base : B6ABD018
[050h 0080   8] Address Range Length : 0550
[058h 0088   8] Memory Map Attribute : 

The way to not associate a SPA range is to never reference it from a
"flush hint", "interleave", or "control region" table.

Testing on OVMF shows that the pmem driver can support a region that is not
associated with any NVDIMM mapping. So treating VCD like pmem is one way to
get a pmem block device that contains the ISO.

v4:
Introduce nfit_spa_is_virtual() to check for a virtual ramdisk SPA and create
a pmem region.

v3:
To simplify patch, removed useless VCD region in libnvdimm.
 
v2:
Removed the code for setting VCD to a read-only region.

Cc: Gary Lin 
Cc: Dan Williams 
Cc: Ross Zwisler 
Cc: "Rafael J. Wysocki" 
Cc: Linda Knippers 
Signed-off-by: Lee, Chun-Yi 
---
 drivers/acpi/nfit.c  |  8 +++-
 drivers/nvdimm/region_devs.c | 26 +-
 include/linux/libnvdimm.h|  2 ++
 3 files changed, 34 insertions(+), 2 deletions(-)

Index: linux/drivers/acpi/nfit.c
===
--- linux.orig/drivers/acpi/nfit.c
+++ linux/drivers/acpi/nfit.c
@@ -1980,6 +1980,14 @@ static int acpi_nfit_init_mapping(struct
return 0;
 }
 
+static bool nfit_spa_is_virtual(struct acpi_nfit_system_address *spa)
+{
+   return (nfit_spa_type(spa) == NFIT_SPA_VDISK ||
+   nfit_spa_type(spa) == NFIT_SPA_VCD   ||
+   nfit_spa_type(spa) == NFIT_SPA_PDISK ||
+   nfit_spa_type(spa) == NFIT_SPA_PCD);
+}
+
 static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
struct nfit_spa *nfit_spa)
 {
@@ -1995,7 +2003,7 @@ static int acpi_nfit_register_region(str
if (nfit_spa->nd_region)
return 0;
 
-   if (spa->range_index == 0) {
+   if (spa->range_index == 0 && !nfit_spa_is_virtual(spa)) {
dev_dbg(acpi_desc->dev, "%s: detected invalid spa index\n",
__func__);
return 0;
@@ -2059,6 +2067,11 @@ static int acpi_nfit_register_region(str
ndr_desc);
if (!nfit_spa->nd_region)
rc = -ENOMEM;
+   } else if (nfit_spa_is_virtual(spa)) {
+   nfit_spa->nd_region = nvdimm_pmem_region_create(nvdimm_bus,
+   ndr_desc);
+   if (!nfit_spa->nd_region)
+   rc = -ENOMEM;
}
 
  out:
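
The new acceptance rule for range_index can be sketched on its own. This is a reduced model rather than the nfit driver: the enum values and function names below are hypothetical, and the real driver derives the SPA type from the Address Range GUID instead of receiving it as an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced set of SPA range types. */
enum spa_type {
	SPA_PM,		/* ordinary persistent memory */
	SPA_VDISK,	/* virtual disk, volatile */
	SPA_VCD,	/* virtual CD, volatile */
	SPA_PDISK,	/* virtual disk, persistent */
	SPA_PCD,	/* virtual CD, persistent */
	SPA_OTHER,
};

static bool spa_is_virtual(enum spa_type t)
{
	assert(t <= SPA_OTHER);

	return t == SPA_VDISK || t == SPA_VCD ||
	       t == SPA_PDISK || t == SPA_PCD;
}

/*
 * A range index of zero is normally rejected as invalid. For virtual
 * ramdisk ranges zero is the expected value, so it is let through.
 */
static bool spa_index_acceptable(enum spa_type t, unsigned int range_index)
{
	return range_index != 0 || spa_is_virtual(t);
}
```

Ranges that pass this check and are virtual would then go down the pmem region creation path, as in the second hunk of the patch.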


[PATCH] xen/apic: Update the comment for apic_id_mask

2016-07-06 Thread Wei Jiangang
verify_local_APIC() was removed by
commit 4399c03c6780 ("x86/apic: Remove verify_local_APIC()"),
so apic_id_mask is no longer used by it.

Signed-off-by: Wei Jiangang 
---
 arch/x86/xen/apic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index db52a7fafcc2..9cbb1f48381b 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -177,7 +177,7 @@ static struct apic xen_pv_apic = {
 
.get_apic_id= xen_get_apic_id,
.set_apic_id= xen_set_apic_id, /* Can be NULL on 32-bit. */
-   .apic_id_mask   = 0xFF << 24, /* Used by verify_local_APIC. Match with what xen_get_apic_id does. */
+   .apic_id_mask   = 0xFF << 24, /* Match with what xen_get_apic_id does. */
 
.cpu_mask_to_apicid_and = flat_cpu_mask_to_apicid_and,
 
-- 
1.9.3





RE: [PATCH v2] phy: add phy-hisi-inno-usb2

2016-07-06 Thread Lipengcheng
Hi, Kishon

> -Original Message-
> From: Kishon Vijay Abraham I [mailto:kis...@ti.com]
> Sent: Monday, July 04, 2016 6:59 PM
> To: Lipengcheng; linux-kernel@vger.kernel.org; Lidongpo
> Subject: Re: [PATCH v2] phy: add phy-hisi-inno-usb2
> 
> Hi,
> 
> On Sunday 03 July 2016 12:20 PM, l00229106 wrote:
> > Add support for inno usb2 phy integrated on some hisilicon SOCs.
> >
> > Signed-off-by: Pengcheng Li 
> > ---
> >  .../devicetree/bindings/phy/phy-hisi-inno-usb2.txt |  48 
> >  drivers/phy/Kconfig|  10 +
> >  drivers/phy/Makefile   |   1 +
> >  drivers/phy/phy-hisi-inno-usb2.c   | 298 
> > +
> >  4 files changed, 357 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/phy/phy-hisi-inno-usb2.txt
> >  create mode 100644 drivers/phy/phy-hisi-inno-usb2.c
> >
> > diff --git
> > a/Documentation/devicetree/bindings/phy/phy-hisi-inno-usb2.txt
> > b/Documentation/devicetree/bindings/phy/phy-hisi-inno-usb2.txt
> > new file mode 100644
> > index 000..59eaf73
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/phy/phy-hisi-inno-usb2.txt
> > @@ -0,0 +1,48 @@
> > +HiSilicon INNO USB2 PHY
> > +---
> > +Required properties:
> > +- compatible: Should be "hisilicon,inno_usb2_phy"
> > +- #phy-cells: Must be 0
> > +- hisilicon,peripheral-syscon: Phandle of syscon used to control phy.
> > +- hisilicon,reg-num: Number of phy registers which should be
> > +configured at phy intialization stage
> > +- hisilicon,reg-seq: Sequence of triplets of (address, value, delay-us).
> > +The number of triplets is equal to "hisilicon,reg-num". Each triplet
> > +is used to write one phy register. The delay-us cell represents the
> > +delay time in microseconds to be applied after each write.
> > +- clocks: Phandle and clock specifier pair for reference clock utmi_refclk.
> > +- resets: List of phandle and reset specifier pairs for each reset
> > +signal in reset-names.
> > +- reset-names: Should be "por_rst" and "test_rst". The test_rst only
> > +exists in some of SOCs, so it is optional.
> > +
> > +Phy node can includes up to four subnodes. Each subnode represents one 
> > port.
> > +The required properties of port node are as follows:
> > +- clocks: Phandle and clock specifier pair for utmi_clock.
> > +- resets: List of phandle and reset specifier pairs for port reset and 
> > utmi reset.
> > +- reset-names: List of reset signal names. Should be "port_rst" and 
> > "utmi_rst"
> > +
> > +Refer to phy/phy-bindings.txt for the generic PHY binding properties
> > +
> > +Example:
> > +usb_phy: phy {
> > +compatible = "hisilicon,inno_usb2_phy";
> > +#phy-cells = <0>;
> > +hisilicon,peripheral-syscon = <&peri_ctrl>;
> > +hisilicon,reg-num = <7>;
> > +hisilicon,reg-seq = <0x80 0x80 20>,
> > +<0x80 0xa0060c 200>,
> > +<0x80 0x80001c 20>,
> > +<0x80 0xa0001c 20>,
> > +<0x80 0x80060f 20>,
> > +<0x80 0xa0060f 20>,
> > +<0x80 0x800a4b 20>;
> > +clocks = <&crg USB2_REF_CLK>;
> > +resets = <&crg 0xb4 2>;
> > +reset-names = "por_rst";
> > +port0 {
> > +clocks = <&crg USB2_UTMI0_CLK>;
> > +resets = <&crg 0xb4 5>, <&crg 0xb4 1>;
> > +reset-names = "port_rst", "utmi_rst";
> > +};
> > +};
> > diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig index
> > 26566db..8f043ca 100644
> > --- a/drivers/phy/Kconfig
> > +++ b/drivers/phy/Kconfig
> > @@ -205,6 +205,16 @@ config PHY_EXYNOS5250_SATA
> >   SATA 3.0 Gb/s, SATA 6.0 Gb/s speeds. It supports one SATA host
> >   port to accept one SATA device.
> >
> > +config PHY_HISI_INNO_USB2
> > +   tristate "Hisilicon Inno USB2 PHY support"
> > +   depends on (ARCH_HISI) || COMPILE_TEST
> > +   select GENERIC_PHY
> > +   select MFD_SYSCON
> > +   help
> > + Support for INNO PHY on Hisilicon Socs. This Phy supports
> > + USB 1.5Mb/s, USB 12Mb/s, USB 480Mb/s speeds. It suppots one
> > + USB host port to accept one USB device.
> > +
> >  config PHY_HIX5HD2_SATA
> > tristate "HIX5HD2 SATA PHY Driver"
> > depends on ARCH_HIX5HD2 && OF && HAS_IOMEM diff --git
> > a/drivers/phy/Makefile b/drivers/phy/Makefile index 24596a9..ef6a24b
> > 100644
> > --- a/drivers/phy/Makefile
> > +++ b/drivers/phy/Makefile
> > @@ -24,6 +24,7 @@ obj-$(CONFIG_TI_PIPE3)+= 
> > phy-ti-pipe3.o
> >  obj-$(CONFIG_TWL4030_USB)  += phy-twl4030-usb.o
> >  obj-$(CONFIG_PHY_EXYNOS5250_SATA)  += phy-exynos5250-sata.o
> >  obj-$(CONFIG_PHY_HIX5HD2_SATA) += phy-hix5hd2-sata.o
> > +obj-$(CONFIG_PHY_HISI_INNO_USB2)   += phy-hisi-inno-usb2.o
> >  obj-$(CONFIG_PHY_HI6220_USB)   += phy-hi6220-usb.o
> >  obj-$(CONFIG_PHY_MT65XX

[PATCH v6 5/5] usb: dwc3: rockchip: add devicetree bindings documentation

2016-07-06 Thread William Wu
This patch adds the devicetree documentation required for Rockchip
USB3.0 core wrapper consisting of USB3.0 IP from Synopsys.

It supports DRD mode, and could operate in device mode (SS, HS, FS)
and host mode (SS, HS, FS, LS).

Signed-off-by: William Wu 
---
Changes in v6:
- rename bus_clk, and add usbdrd3_1 node as an example (Heiko)

Changes in v5:
- rename clock-names, and remove unnecessary clocks (Heiko)

Changes in v4:
- modify commit log, and add phy documentation location (Sergei)

Changes in v3:
- add dwc3 address (balbi)

Changes in v2:
- add rockchip,dwc3.txt to Documentation/devicetree/bindings/ (balbi, Brian)

 .../devicetree/bindings/usb/rockchip,dwc3.txt  | 59 ++
 1 file changed, 59 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/usb/rockchip,dwc3.txt

diff --git a/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt 
b/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt
new file mode 100644
index 000..0536a93
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/rockchip,dwc3.txt
@@ -0,0 +1,59 @@
+Rockchip SuperSpeed DWC3 USB SoC controller
+
+Required properties:
+- compatible:  should contain "rockchip,rk3399-dwc3" for rk3399 SoC
+- clocks:  A list of phandle + clock-specifier pairs for the
+   clocks listed in clock-names
+- clock-names: Should contain the following:
+  "ref_clk"Controller reference clk, have to be 24 MHz
+  "suspend_clk"Controller suspend clk, have to be 24 MHz or 32 KHz
+  "bus_clk"Master/Core clock, have to be >= 62.5 MHz for SS
+   operation and >= 30MHz for HS operation
+  "grf_clk"Controller grf clk
+
+Required child node:
+A child node must exist to represent the core DWC3 IP block. The name of
+the node is not important. The content of the node is defined in dwc3.txt.
+
+Phy documentation is provided in the following places:
+Documentation/devicetree/bindings/phy/rockchip,dwc3-usb-phy.txt
+
+Example device nodes:
+
+   usbdrd3_0: usb@fe80 {
+   compatible = "rockchip,rk3399-dwc3";
+   clocks = <&cru SCLK_USB3OTG0_REF>, <&cru SCLK_USB3OTG0_SUSPEND>,
+<&cru ACLK_USB3OTG0>, <&cru ACLK_USB3_GRF>;
+   clock-names = "ref_clk", "suspend_clk",
+ "bus_clk", "grf_clk";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+   status = "disabled";
+   usbdrd_dwc3_0: dwc3@fe80 {
+   compatible = "snps,dwc3";
+   reg = <0x0 0xfe80 0x0 0x10>;
+   interrupts = ;
+   dr_mode = "otg";
+   status = "disabled";
+   };
+   };
+
+   usbdrd3_1: usb@fe90 {
+   compatible = "rockchip,rk3399-dwc3";
+   clocks = <&cru SCLK_USB3OTG1_REF>, <&cru SCLK_USB3OTG1_SUSPEND>,
+<&cru ACLK_USB3OTG1>, <&cru ACLK_USB3_GRF>;
+   clock-names = "ref_clk", "suspend_clk",
+ "bus_clk", "grf_clk";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+   status = "disabled";
+   usbdrd_dwc3_1: dwc3@fe90 {
+   compatible = "snps,dwc3";
+   reg = <0x0 0xfe90 0x0 0x10>;
+   interrupts = ;
+   dr_mode = "otg";
+   status = "disabled";
+   };
+   };
-- 
1.9.1




[PATCH v6 3/5] usb: dwc3: add phyif_utmi_quirk

2016-07-06 Thread William Wu
Add a quirk to configure the core to support the
UTMI+ PHY with an 8- or 16-bit interface. The UTMI+ PHY
interface is a hardware property, and it's platform
dependent. Normally, the PHYIf can be configured
during coreconsultant. But for some specific usb
cores (e.g. rk3399 soc dwc3), the default PHYIf
configuration value is wrong, so we need to
reconfigure it by software.

Also, according to the dwc3 databook, GUSB2PHYCFG.USBTRDTIM
must be set to the value corresponding to
the UTMI+ PHY interface.

Signed-off-by: William Wu 
---
Changes in v6:
- use '-' instead of '_' in dts (Rob Herring)

Changes in v5:
- None

Changes in v4:
- rebase on top of balbi testing/next, remove pdata (balbi)

Changes in v3:
- None

Changes in v2:
- add a quirk for phyif_utmi (balbi)

 Documentation/devicetree/bindings/usb/dwc3.txt |  4 
 drivers/usb/dwc3/core.c| 19 +++
 drivers/usb/dwc3/core.h| 12 
 3 files changed, 35 insertions(+)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt 
b/Documentation/devicetree/bindings/usb/dwc3.txt
index 020b0e9..8d7317d 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -42,6 +42,10 @@ Optional properties:
  - snps,dis-u2-freeclk-exists-quirk: when set, clear the u2_freeclk_exists
in GUSB2PHYCFG, specify that USB2 PHY doesn't provide
a free-running PHY clock.
+ - snps,phyif-utmi-quirk: when set core will set phyif UTMI+ interface.
+ - snps,phyif-utmi: the value to configure the core to support a UTMI+ PHY
+   with an 8- or 16-bit interface. Value 0 select 8-bit
+   interface, value 1 select 16-bit interface.
  - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
utmi_l1_suspend_n, false when asserts utmi_sleep_n
  - snps,hird-threshold: HIRD threshold
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 0b7bfd2..94036b1 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -408,6 +408,7 @@ static void dwc3_cache_hwparams(struct dwc3 *dwc)
 static int dwc3_phy_setup(struct dwc3 *dwc)
 {
u32 reg;
+   u32 usbtrdtim;
int ret;
 
reg = dwc3_readl(dwc->regs, DWC3_GUSB3PIPECTL(0));
@@ -503,6 +504,15 @@ static int dwc3_phy_setup(struct dwc3 *dwc)
if (dwc->dis_u2_freeclk_exists_quirk)
reg &= ~DWC3_GUSB2PHYCFG_U2_FREECLK_EXISTS;
 
+   if (dwc->phyif_utmi_quirk) {
+   reg &= ~(DWC3_GUSB2PHYCFG_PHYIF_MASK |
+  DWC3_GUSB2PHYCFG_USBTRDTIM_MASK);
+   usbtrdtim = dwc->phyif_utmi ? USBTRDTIM_UTMI_16_BIT :
+   USBTRDTIM_UTMI_8_BIT;
+   reg |= DWC3_GUSB2PHYCFG_PHYIF(dwc->phyif_utmi) |
+  DWC3_GUSB2PHYCFG_USBTRDTIM(usbtrdtim);
+   }
+
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
 
return 0;
@@ -834,6 +844,7 @@ static int dwc3_probe(struct platform_device *pdev)
struct resource *res;
struct dwc3 *dwc;
u8  lpm_nyet_threshold;
+   u8  phyif_utmi;
u8  tx_de_emphasis;
u8  hird_threshold;
 
@@ -880,6 +891,9 @@ static int dwc3_probe(struct platform_device *pdev)
/* default to highest possible threshold */
lpm_nyet_threshold = 0xff;
 
+   /* default to UTMI+ 8-bit interface */
+   phyif_utmi = 0;
+
/* default to -3.5dB de-emphasis */
tx_de_emphasis = 1;
 
@@ -929,6 +943,10 @@ static int dwc3_probe(struct platform_device *pdev)
"snps,dis_rxdet_inp3_quirk");
dwc->dis_u2_freeclk_exists_quirk = device_property_read_bool(dev,
"snps,dis-u2-freeclk-exists-quirk");
+   dwc->phyif_utmi_quirk = device_property_read_bool(dev,
+   "snps,phyif-utmi-quirk");
+device_property_read_u8(dev, "snps,phyif-utmi",
+&phyif_utmi);
 
dwc->tx_de_emphasis_quirk = device_property_read_bool(dev,
"snps,tx_de_emphasis_quirk");
@@ -940,6 +958,7 @@ static int dwc3_probe(struct platform_device *pdev)
 &dwc->fladj);
 
dwc->lpm_nyet_threshold = lpm_nyet_threshold;
+   dwc->phyif_utmi = phyif_utmi;
dwc->tx_de_emphasis = tx_de_emphasis;
 
dwc->hird_threshold = hird_threshold
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index f321a5c..cf6696c 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -203,6 +203,12 @@
 #define DWC3_GUSB2PHYCFG_SUSPHY(1 << 6)
 #define DWC3_GUSB2PHYCFG_ULPI_UTMI (1 << 4)
 #define DWC3_GUSB2PHYCFG_ENBLSLPM  (1 << 8)
+#define DWC3_GUSB2PHYCFG_PHYIF(n)  (n << 3)
+#define DWC3
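
The read-modify-write step in the dwc3_phy_setup() hunk above can be tried out in user space. What follows is only a sketch, not the driver code: the PHYIF shift comes from the patch, while the USBTRDTIM field position (bits 13:10) and the turnaround values 9 and 5 are assumptions, and phyif_utmi_apply() is a hypothetical helper name.

```c
#include <stdint.h>

/* PHYIF shift as in the patch; the USBTRDTIM layout and values are assumed. */
#define GUSB2PHYCFG_PHYIF(n)        ((uint32_t)(n) << 3)
#define GUSB2PHYCFG_PHYIF_MASK      GUSB2PHYCFG_PHYIF(1)
#define GUSB2PHYCFG_USBTRDTIM(n)    ((uint32_t)(n) << 10)
#define GUSB2PHYCFG_USBTRDTIM_MASK  GUSB2PHYCFG_USBTRDTIM(0xf)
#define USBTRDTIM_UTMI_8_BIT        9   /* assumed turnaround time, 8-bit UTMI+ */
#define USBTRDTIM_UTMI_16_BIT       5   /* assumed turnaround time, 16-bit UTMI+ */

/*
 * Hypothetical helper: clear both fields first, then set them from
 * phyif_utmi (0 selects 8-bit, non-zero selects 16-bit). All other
 * bits of the register value are left untouched.
 */
static uint32_t phyif_utmi_apply(uint32_t reg, unsigned int phyif_utmi)
{
	uint32_t usbtrdtim;

	phyif_utmi = !!phyif_utmi;
	reg &= ~(GUSB2PHYCFG_PHYIF_MASK | GUSB2PHYCFG_USBTRDTIM_MASK);
	usbtrdtim = phyif_utmi ? USBTRDTIM_UTMI_16_BIT : USBTRDTIM_UTMI_8_BIT;
	reg |= GUSB2PHYCFG_PHYIF(phyif_utmi) |
	       GUSB2PHYCFG_USBTRDTIM(usbtrdtim);
	return reg;
}
```

Clearing both masks before OR-ing in the new values matters: a stale turnaround value left by the bootloader would otherwise be merged with the new one.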

[PATCH v6 1/5] usb: dwc3: of-simple: add compatible for rockchip rk3399

2016-07-06 Thread William Wu
The Rockchip platform merely enables the usb3 clocks and
populates its children, so we can use this generic
glue layer to support Rockchip dwc3.

Signed-off-by: William Wu 
---
Changes in v6:
- None

Changes in v5:
- change compatible from "rockchip,dwc3" to "rockchip,rk3399-dwc3" (Heiko)

Changes in v4:
- None

Changes in v3:
- None

Changes in v2:
- sort the list of_dwc3_simple_match (Doug)

 drivers/usb/dwc3/dwc3-of-simple.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/dwc3/dwc3-of-simple.c b/drivers/usb/dwc3/dwc3-of-simple.c
index 9743353..05c9349 100644
--- a/drivers/usb/dwc3/dwc3-of-simple.c
+++ b/drivers/usb/dwc3/dwc3-of-simple.c
@@ -161,6 +161,7 @@ static const struct dev_pm_ops dwc3_of_simple_dev_pm_ops = {
 
 static const struct of_device_id of_dwc3_simple_match[] = {
{ .compatible = "qcom,dwc3" },
+   { .compatible = "rockchip,rk3399-dwc3" },
{ .compatible = "xlnx,zynqmp-dwc3" },
{ /* Sentinel */ }
 };
-- 
1.9.1




[PATCH v6 2/5] usb: dwc3: add dis_u2_freeclk_exists_quirk

2016-07-06 Thread William Wu
Add a quirk to clear the GUSB2PHYCFG.U2_FREECLK_EXISTS bit,
which specifies whether the USB2.0 PHY provides a free-running
PHY clock, which is active when the clock control input is active.

Signed-off-by: William Wu 
---
Changes in v6:
- use '-' instead of '_' in dts (Rob Herring)

Changes in v5:
- None

Changes in v4:
- rebase on top of balbi testing/next, remove pdata (balbi)

Changes in v3:
- None

Changes in v2:
- None

 Documentation/devicetree/bindings/usb/dwc3.txt | 3 +++
 drivers/usb/dwc3/core.c| 5 +
 drivers/usb/dwc3/core.h| 5 +
 3 files changed, 13 insertions(+)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
index 7d7ce08..020b0e9 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -39,6 +39,9 @@ Optional properties:
disabling the suspend signal to the PHY.
  - snps,dis_rxdet_inp3_quirk: when set core will disable receiver detection
in PHY P3 power state.
+ - snps,dis-u2-freeclk-exists-quirk: when set, clear the u2_freeclk_exists
+   in GUSB2PHYCFG, specify that USB2 PHY doesn't provide
+   a free-running PHY clock.
  - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
utmi_l1_suspend_n, false when asserts utmi_sleep_n
  - snps,hird-threshold: HIRD threshold
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 9466431..0b7bfd2 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -500,6 +500,9 @@ static int dwc3_phy_setup(struct dwc3 *dwc)
if (dwc->dis_enblslpm_quirk)
reg &= ~DWC3_GUSB2PHYCFG_ENBLSLPM;
 
+   if (dwc->dis_u2_freeclk_exists_quirk)
+   reg &= ~DWC3_GUSB2PHYCFG_U2_FREECLK_EXISTS;
+
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
 
return 0;
@@ -924,6 +927,8 @@ static int dwc3_probe(struct platform_device *pdev)
"snps,dis_enblslpm_quirk");
dwc->dis_rxdet_inp3_quirk = device_property_read_bool(dev,
"snps,dis_rxdet_inp3_quirk");
+   dwc->dis_u2_freeclk_exists_quirk = device_property_read_bool(dev,
+   "snps,dis-u2-freeclk-exists-quirk");
 
dwc->tx_de_emphasis_quirk = device_property_read_bool(dev,
"snps,tx_de_emphasis_quirk");
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 45d6de5..f321a5c 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -199,6 +199,7 @@
 
 /* Global USB2 PHY Configuration Register */
 #define DWC3_GUSB2PHYCFG_PHYSOFTRST(1 << 31)
+#define DWC3_GUSB2PHYCFG_U2_FREECLK_EXISTS (1 << 30)
 #define DWC3_GUSB2PHYCFG_SUSPHY(1 << 6)
 #define DWC3_GUSB2PHYCFG_ULPI_UTMI (1 << 4)
 #define DWC3_GUSB2PHYCFG_ENBLSLPM  (1 << 8)
@@ -799,6 +800,9 @@ struct dwc3_scratchpad_array {
  * @dis_u2_susphy_quirk: set if we disable usb2 suspend phy
  * @dis_enblslpm_quirk: set if we clear enblslpm in GUSB2PHYCFG,
  *  disabling the suspend signal to the PHY.
+ * @dis_u2_freeclk_exists_quirk : set if we clear u2_freeclk_exists
+ * in GUSB2PHYCFG, specify that USB2 PHY doesn't
+ * provide a free-running PHY clock.
  * @tx_de_emphasis_quirk: set if we enable Tx de-emphasis quirk
  * @tx_de_emphasis: Tx de-emphasis value
  * 0   - -6dB de-emphasis
@@ -942,6 +946,7 @@ struct dwc3 {
unsigneddis_u2_susphy_quirk:1;
unsigneddis_enblslpm_quirk:1;
unsigneddis_rxdet_inp3_quirk:1;
+   unsigneddis_u2_freeclk_exists_quirk:1;
 
unsignedtx_de_emphasis_quirk:1;
unsignedtx_de_emphasis:2;
-- 
1.9.1




[PATCH v6 4/5] usb: dwc3: add dis_del_phy_power_chg_quirk

2016-07-06 Thread William Wu
Add a quirk to clear the GUSB3PIPECTL.DELAYP1TRANS bit,
which controls whether the PHY power change from P0 to
P1/P2/P3 is delayed when the link state changes from U0
to U1/U2/U3 respectively.

Signed-off-by: William Wu 
---
Changes in v6:
- use '-' instead of '_' in dts (Rob Herring)

Changes in v5:
- None

Changes in v4:
- rebase on top of balbi testing/next, remove pdata (balbi)

Changes in v3:
- None

Changes in v2:
- None

 Documentation/devicetree/bindings/usb/dwc3.txt | 2 ++
 drivers/usb/dwc3/core.c| 5 +
 drivers/usb/dwc3/core.h| 3 +++
 3 files changed, 10 insertions(+)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
index 8d7317d..1c140df 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -42,6 +42,8 @@ Optional properties:
  - snps,dis-u2-freeclk-exists-quirk: when set, clear the u2_freeclk_exists
in GUSB2PHYCFG, specify that USB2 PHY doesn't provide
a free-running PHY clock.
+ - snps,dis-del-phy-power-chg-quirk: when set core will change PHY power
+   from P0 to P1/P2/P3 without delay.
  - snps,phyif-utmi-quirk: when set core will set phyif UTMI+ interface.
  - snps,phyif-utmi: the value to configure the core to support a UTMI+ PHY
with an 8- or 16-bit interface. Value 0 select 8-bit
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 94036b1..e79d6a4 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -449,6 +449,9 @@ static int dwc3_phy_setup(struct dwc3 *dwc)
if (dwc->dis_u3_susphy_quirk)
reg &= ~DWC3_GUSB3PIPECTL_SUSPHY;
 
+   if (dwc->dis_del_phy_power_chg_quirk)
+   reg &= ~DWC3_GUSB3PIPECTL_DEPOCHANGE;
+
dwc3_writel(dwc->regs, DWC3_GUSB3PIPECTL(0), reg);
 
reg = dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0));
@@ -943,6 +946,8 @@ static int dwc3_probe(struct platform_device *pdev)
"snps,dis_rxdet_inp3_quirk");
dwc->dis_u2_freeclk_exists_quirk = device_property_read_bool(dev,
"snps,dis-u2-freeclk-exists-quirk");
+   dwc->dis_del_phy_power_chg_quirk = device_property_read_bool(dev,
+   "snps,dis-del-phy-power-chg-quirk");
dwc->phyif_utmi_quirk = device_property_read_bool(dev,
"snps,phyif-utmi-quirk");
 device_property_read_u8(dev, "snps,phyif-utmi",
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index cf6696c..55e136d 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -809,6 +809,8 @@ struct dwc3_scratchpad_array {
  * @dis_u2_freeclk_exists_quirk : set if we clear u2_freeclk_exists
  * in GUSB2PHYCFG, specify that USB2 PHY doesn't
  * provide a free-running PHY clock.
+ * @dis_del_phy_power_chg_quirk: set if we disable delay phy power
+ * change quirk.
  * @phyif_utmi_quirk: set if we enable phyif UTMI+ quirk
  * @phyif_utmi: UTMI+ PHY interface value
  * 0   - 8 bits
@@ -957,6 +959,7 @@ struct dwc3 {
unsigneddis_enblslpm_quirk:1;
unsigneddis_rxdet_inp3_quirk:1;
unsigneddis_u2_freeclk_exists_quirk:1;
+   unsigneddis_del_phy_power_chg_quirk:1;
 
unsignedphyif_utmi_quirk:1;
unsignedphyif_utmi:1;
-- 
1.9.1




[PATCH v6 0/5] support rockchip dwc3 driver

2016-07-06 Thread William Wu
This series adds support for the Rockchip dwc3 driver
and adds additional optional properties for specific
platforms (e.g., the Rockchip rk3399 platform).

William Wu (5):
  usb: dwc3: of-simple: add compatible for rockchip rk3399
  usb: dwc3: add dis_u2_freeclk_exists_quirk
  usb: dwc3: add phyif_utmi_quirk
  usb: dwc3: add dis_del_phy_power_chg_quirk
  usb: dwc3: rockchip: add devicetree bindings documentation

 Documentation/devicetree/bindings/usb/dwc3.txt |  9 
 .../devicetree/bindings/usb/rockchip,dwc3.txt  | 59 ++
 drivers/usb/dwc3/core.c| 29 +++
 drivers/usb/dwc3/core.h| 20 
 drivers/usb/dwc3/dwc3-of-simple.c  |  1 +
 5 files changed, 118 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/usb/rockchip,dwc3.txt

-- 
1.9.1




RE: [PATCH][RFC v3] x86, hotplug: Use hlt instead of mwait if invoked from disable_nonboot_cpus

2016-07-06 Thread Chen, Yu C

> -----Original Message-----
> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Sent: Thursday, July 07, 2016 8:33 AM
> To: Chen, Yu C; James Morse
> Cc: linux...@vger.kernel.org; Thomas Gleixner; H. Peter Anvin; Pavel Machek;
> Borislav Petkov; Peter Zijlstra; Ingo Molnar; Len Brown; x...@kernel.org; 
> linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH][RFC v3] x86, hotplug: Use hlt instead of mwait if invoked
> from disable_nonboot_cpus
> 
> On Tuesday, June 28, 2016 05:16:43 PM Chen Yu wrote:
> > A stress test from Varun Koyyalagunta reports that a nonboot CPU
> > occasionally hangs when resuming from hibernation. Further
> > investigation shows that the hang occurs when the nonboot CPU has
> > been woken up incorrectly and tries to monitor the mwait_ptr for
> > the second time; an exception is then triggered by an illegal vaddr
> > access, something like 'Unable to handle kernel address of
> > 0x8800ba800010...'
> >
> > Further investigation shows that this exception is caused by
> > accessing a page without the PRESENT flag, because the pte entry for
> > this vaddr is zero. Here is how the problem happens: the page table
> > for the direct mapping is allocated dynamically by
> > kernel_physical_mapping_init. During resume, while the boot CPU is
> > writing pages back to their original addresses, it may happen to
> > write to the monitored mwait_ptr and thereby wake up one of the
> > nonboot CPUs. Since the page table currently used by the nonboot CPU
> > might not be the same as it was before hibernation, an exception
> > might occur due to the inconsistent page table.
> >
> > The first attempt was to get rid of this problem by changing the
> > monitor address from task.flag to the zero page, because no one
> > would write data to the zero page. But there is still a problem
> > because of a ping-pong wake up scenario in mwait_play_dead:
> >
> > One possible implementation of a clflush is a read-invalidate snoop,
> > which is what a store might look like, so clflush might break the mwait.
> >
> > 1. CPU1 waits at the zero page.
> > 2. CPU2 clflushes the zero page, waking CPU1 up, then CPU2 waits at
> >    the zero page.
> > 3. CPU1 is woken up and clflushes the zero page, thus waking up CPU2
> >    again.
> > As a result, the nonboot CPUs never sleep for long.
> >
> > So it's better to monitor a different address for each nonboot CPU.
> > However, since there is only one zero page, at most
> > PAGE_SIZE/L1_CACHE_LINE CPUs can be satisfied, which is usually 64 on
> > x86_64. That is apparently not enough for servers; maybe more zero
> > pages are required.
> >
> > So choose a new solution as Brian suggested, to put the nonboot CPUs
> > into hlt before resume, without touching any memory during s/r.
> > Theoretically there might still be some problems if some of the CPUs
> > have already been put offline, but since the case is very rare and
> > users can work around it, we do not deal with this special case in
> > kernel for now.
> >
> > BTW, as James mentioned, he might want to encapsulate
> > disable_nonboot_cpus into arch-specific code, so this patch might
> > need a small change after that.
> >
> > Comments and suggestions would be appreciated.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
> > Reported-and-tested-by: Varun Koyyalagunta 
> > Signed-off-by: Chen Yu 
> 
> Below is my sort of version of this (untested) and I did it this way,
> because the issue is specific to resume from hibernation (the
> workaround need not be applied anywhere else) and the
> hibernate_resume_nonboot_cpu_disable() thing may be useful to arm64
> too if I'm not mistaken (James?).

James might want a flag to distinguish whether it is called from suspend or
resume in his arch-specific disable_nonboot_cpus?

And this patch works on my Xeon.
Tested-by: Chen Yu 

> 
> Actually, if arm64 uses it too, the __weak implementation can be dropped,
> because it will be possible to make it depend on ARCH_HIBERNATION_HEADER
> (x86 and arm64 are the only users of that).
> 
> Thanks,
> Rafael
> 
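
The limit of 64 CPUs mentioned in the quoted changelog is plain division: one monitor address per cache line within a single page. A minimal sketch of that arithmetic follows. monitor_slots() is a hypothetical helper, and the 4096-byte page and 64-byte line used with it are typical x86_64 values assumed here, not taken from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many CPUs can each get a private monitor
 * address if every CPU needs its own cache line inside one region.
 */
static size_t monitor_slots(size_t region_bytes, size_t line_bytes)
{
	if (line_bytes == 0)
		return 0;	/* avoid dividing by zero on bad input */
	return region_bytes / line_bytes;
}
```

With the assumed values, monitor_slots(4096, 64) gives 64 slots, which is why a single zero page cannot cover larger servers.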


Re: [PATCH 2/2] ARM: BCM5301X: Fix NAND ECC parameters for D-Link DIR-885L

2016-07-06 Thread Florian Fainelli
On 06/06/2016 00:43, Rafał Miłecki wrote:
> This device uses BCH-1 instead of BCH-8. This fixes ECC errors and makes
> NAND usable with brcmnand.
> 
> Signed-off-by: Rafał Miłecki 

And also applied, thanks!
-- 
Florian


Re: [PATCH 1/2] ARM: BCM5301X: Specify NAND chip select and ECC in separated files

2016-07-06 Thread Florian Fainelli
On 06/06/2016 00:43, Rafał Miłecki wrote:
> Using a separate file with common chip select parameters will allow us
> to add other ECC setups without code duplication.
> 
> Signed-off-by: Rafał Miłecki 

Applied, thanks!
-- 
Florian


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-06 Thread Xiao Guangrong



On 07/06/2016 07:48 PM, Paolo Bonzini wrote:



On 06/07/2016 06:02, Xiao Guangrong wrote:




May I ask you what the exact issue you have with this interface for
Intel to support
your own GPU virtualization?


Intel's vGPU can work with this framework. We really appreciate your
/ nvidia's
contribution.


Then, I don't think we should embargo Paolo's patch.


This patchset is specific to the framework design, i.e., mapping memory when
a fault happens rather than at mmap(), and this design is exactly what we have
been discussing for nearly two days.


I disagree, this patch fixes a bug because what Neo is doing is legal.
It may not be the design that will be committed, but the bug they found
in KVM is real.



I am just worried about whether we really need fault-on-demand for device
memory, i.e., whether device memory overcommit is safe enough.

It lacks a graceful way to recover the workload if the resource is really
overloaded. Unlike with normal memory, the host kernel and guest kernel cannot
do anything except kill the VM in this case. So the VM crashes due to device
emulation. That is not safe, as the device can be accessed from userspace even
by an unprivileged user, which is a vulnerability in a data center.



Re: [PATCH v3 13/14] clk: sunxi-ng: Add H3 clocks

2016-07-06 Thread Michael Turquette
Hi Maxime,

Quoting Maxime Ripard (2016-06-29 12:05:34)
> +static void __init sun8i_h3_ccu_setup(struct device_node *node)
> +{
> +   void __iomem *reg;
> +   u32 val;
> +
> +   reg = of_io_request_and_map(node, 0, of_node_full_name(node));
> +   if (IS_ERR(reg)) {
> +   pr_err("%s: Could not map the clock registers\n",
> +  of_node_full_name(node));
> +   return;
> +   }
> +
> +   /* Force the PLL-Audio-1x divider to 4 */
> +   val = readl(reg + SUN8I_H3_PLL_AUDIO_REG);
> +   val &= ~GENMASK(4, 0);
> +   writel(val | 3, reg + SUN8I_H3_PLL_AUDIO_REG);
> +
> +   sunxi_ccu_probe(node, reg, &sun8i_h3_ccu_desc);
> +}
> +CLK_OF_DECLARE(sun8i_h3_ccu, "allwinner,sun8i-h3-ccu",
> +  sun8i_h3_ccu_setup);

There are several examples of drivers that split the clocks between
"early" CLK_OF_DECLARE clocks and "late" module clocks. If you really
need early clocks (which is less likely on a 64-bit platform with
architected timers), it would be nice to pair that with a proper
platform_driver (using builtin_platform_driver most likely).

Otherwise that is my only nitpick with this series. Looks good!

Best regards,
Mike


[PATCH v3] powerpc: Export thread_struct.used_vr/used_vsr to user space

2016-07-06 Thread wei . guo . simon
From: Simon Guo 

These 2 fields track whether the user process has used Altivec/VSX
registers or not. They are used by the kernel to set up the signal frame
on the user stack correctly with regard to the vector part.

CRIU (Checkpoint and Restore In User space) builds the signal frame
for the restored process. It needs this exported information to
set up the signal frame correctly, and it also needs to restore these
2 fields for the restored process.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Kees Cook 
Cc: Rashmica Gupta 
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: Laurent Dufour 
Signed-off-by: Simon Guo 
Reviewed-by: Laurent Dufour 
--

v2 -> v3:
- enlarge reg_usage from 32 to 64 bits
- prefix ptrace API with PPC_

v1 -> v2:
- minor change for coding style
---
 arch/powerpc/include/uapi/asm/ptrace.h | 11 ++
 arch/powerpc/kernel/ptrace.c   | 39 ++
 arch/powerpc/kernel/ptrace32.c |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/ptrace.h b/arch/powerpc/include/uapi/asm/ptrace.h
index 8036b38..b357677 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -176,6 +176,17 @@ struct pt_regs {
 #define PTRACE_GETREGS64 0x16
 #define PTRACE_SETREGS64 0x17
 
+/*
+ * Get or set some register used bit.
+ * The flags will be saved in a 64 bit data.
+ * Currently it is only used for VR/VSR usage.
+ */
+#define PPC_PTRACE_GET_REGS_USAGE0x97
+#define PPC_PTRACE_SET_REGS_USAGE0x96
+
+#define PPC_PTRACE_REGS_USAGE_VR_BIT  0x01UL
+#define PPC_PTRACE_REGS_USAGE_VSR_BIT 0x02UL
+
 /* Calls to trace a 64bit program from a 32bit program */
 #define PPC_PTRACE_PEEKTEXT_3264 0x95
 #define PPC_PTRACE_PEEKDATA_3264 0x94
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index a9aa2a5..d1431bb 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -3018,6 +3018,45 @@ long arch_ptrace(struct task_struct *child, long request,
 REGSET_SPE, 0, 35 * sizeof(u32),
 datavp);
 #endif
+   case PPC_PTRACE_GET_REGS_USAGE:
+   {
+   u64 *u64_datap = (u64 *)datavp;
+   u64 reg_usage = 0;
+
+   if (addr != sizeof(u64))
+   return -EINVAL;
+
+#ifdef CONFIG_ALTIVEC
+   if (child->thread.used_vr)
+   reg_usage |= PPC_PTRACE_REGS_USAGE_VR_BIT;
+#endif
+#ifdef CONFIG_VSX
+   if (child->thread.used_vsr)
+   reg_usage |= PPC_PTRACE_REGS_USAGE_VSR_BIT;
+#endif
+   return put_user(reg_usage, u64_datap);
+   }
+   case PPC_PTRACE_SET_REGS_USAGE:
+   {
+   u64 *u64_datap = (u64 *)datavp;
+   u64 reg_usage = 0;
+
+   if (addr != sizeof(u64))
+   return -EINVAL;
+
+   ret = get_user(reg_usage, u64_datap);
+   if (ret)
+   return ret;
+#ifdef CONFIG_ALTIVEC
+   child->thread.used_vr =
+   !!(reg_usage & PPC_PTRACE_REGS_USAGE_VR_BIT);
+#endif
+#ifdef CONFIG_VSX
+   child->thread.used_vsr =
+   !!(reg_usage & PPC_PTRACE_REGS_USAGE_VSR_BIT);
+#endif
+   break;
+   }
 
default:
ret = ptrace_request(child, request, addr, data);
diff --git a/arch/powerpc/kernel/ptrace32.c b/arch/powerpc/kernel/ptrace32.c
index f52b7db..3aaa773 100644
--- a/arch/powerpc/kernel/ptrace32.c
+++ b/arch/powerpc/kernel/ptrace32.c
@@ -305,6 +305,8 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
case PPC_PTRACE_GETHWDBGINFO:
case PPC_PTRACE_SETHWDEBUG:
case PPC_PTRACE_DELHWDEBUG:
+   case PPC_PTRACE_GET_REGS_USAGE:
+   case PPC_PTRACE_SET_REGS_USAGE:
ret = arch_ptrace(child, request, addr, data);
break;
 
-- 
1.8.3.1
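
For readers following the GET/SET pair above, here is a user-space sketch of how the two flags fold into the 64-bit word and back. The bit values are copied from the patch; struct vec_usage, regs_usage_pack() and regs_usage_unpack() are hypothetical names used only for illustration and are not part of the proposed ptrace interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in the patch. */
#define REGS_USAGE_VR_BIT   0x01UL
#define REGS_USAGE_VSR_BIT  0x02UL

/* Hypothetical stand-in for the two thread_struct fields. */
struct vec_usage {
	bool used_vr;
	bool used_vsr;
};

/* Mirrors the GET side: fold the two flags into one 64-bit word. */
static uint64_t regs_usage_pack(bool used_vr, bool used_vsr)
{
	uint64_t reg_usage = 0;

	if (used_vr)
		reg_usage |= REGS_USAGE_VR_BIT;
	if (used_vsr)
		reg_usage |= REGS_USAGE_VSR_BIT;
	return reg_usage;
}

/* Mirrors the SET side: unknown bits are ignored, as in the patch. */
static struct vec_usage regs_usage_unpack(uint64_t reg_usage)
{
	struct vec_usage u;

	u.used_vr = !!(reg_usage & REGS_USAGE_VR_BIT);
	u.used_vsr = !!(reg_usage & REGS_USAGE_VSR_BIT);
	return u;
}
```

The `!!` keeps the stored value at 0 or 1 regardless of which bit was tested, matching the SET case in the hunk above.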



Re: [PATCH V2 05/10] firmware: tegra: add BPMP support

2016-07-06 Thread Alexandre Courbot
On Thu, Jul 7, 2016 at 1:47 AM, Matt Longnecker  wrote:
> Alex,
>
>
> On 07/06/2016 04:39 AM, Alexandre Courbot wrote:
>>>
>>> diff --git a/include/soc/tegra/bpmp_abi.h b/include/soc/tegra/bpmp_abi.h
>>> >new file mode 100644
>>> >index ..0aaef5960e29
>>> >--- /dev/null
>>> >+++ b/include/soc/tegra/bpmp_abi.h
>>> >@@ -0,0 +1,1601 @@
>>> >+/*
>>> >+ * Copyright (c) 2014-2016, NVIDIA CORPORATION.  All rights reserved.
>>> >+ *
>>> >+ * This program is free software; you can redistribute it and/or modify
>>> > it
>>> >+ * under the terms and conditions of the GNU General Public License,
>>> >+ * version 2, as published by the Free Software Foundation.
>>> >+ *
>>> >+ * This program is distributed in the hope it will be useful, but
>>> > WITHOUT
>>> >+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
>>> > or
>>> >+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
>>> > License for
>>> >+ * more details.
>>> >+ *
>>> >+ * You should have received a copy of the GNU General Public License
>>> >+ * along with this program.  If not, see.
>>> >+ */
>>> >+
>>> >+#ifndef _ABI_BPMP_ABI_H_
>>> >+#define _ABI_BPMP_ABI_H_
>>> >+
>>>
>> ...
>>
>> There is a lot of stuff in this file, most of which we are not using
>> now - this is ok, but unless this is a file synced from an outside
>> resource maybe we should trim the structures we don't need and add
>> them as we make use of them? It helps dividing the work in bite-size
>> chunks.
>>
>> Regarding the documentation format of this file, is this valid kernel
>> documentation since the adoption of Sphinx? Or is it whatever the
>> origin is using?
>
> bpmp_abi.h is meant to be delivered as is from an NVIDIA internal repo to a
> variety of OS'es. Each of them has a different documentation standard and
> coding standard.
>
> I'd like to avoid trimming parts from the file (or even worse modifying
> parts of the file) so that future deliveries are trivial.

Makes sense, thanks to you and Stephen for the clarification.


2% loan offer

2016-07-06 Thread David Rogers
Attention;

This is to inform you that we are a private lending company based in 
the United Kingdom (Um Financial Ltd). We give loans ranging from $10,000 to 
$300,000,000 at a 2% interest rate to all interested parties. Country is not a 
barrier, so feel free to contact us at this e-mail: i...@umafh.co.uk


[PATCH -next v2] memory: atmel-ebi: use PTR_ERR_OR_ZERO() to simplify the code

2016-07-06 Thread weiyj_lk
From: Wei Yongjun 

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR.

Generated by coccinelle.

Signed-off-by: Wei Yongjun 
---
 drivers/memory/atmel-ebi.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/memory/atmel-ebi.c b/drivers/memory/atmel-ebi.c
index f87ad6f..b5ed3bd 100644
--- a/drivers/memory/atmel-ebi.c
+++ b/drivers/memory/atmel-ebi.c
@@ -410,10 +410,7 @@ static int at91sam9_ebi_init(struct at91_ebi *ebi)
 
field.reg = AT91SAM9_SMC_MODE(AT91SAM9_SMC_GENERIC);
fields->mode = devm_regmap_field_alloc(ebi->dev, ebi->smc, field);
-   if (IS_ERR(fields->mode))
-   return PTR_ERR(fields->mode);
-
-   return 0;
+   return PTR_ERR_OR_ZERO(fields->mode);
 }
 
 static int sama5d3_ebi_init(struct at91_ebi *ebi)
@@ -441,10 +438,7 @@ static int sama5d3_ebi_init(struct at91_ebi *ebi)
 
field.reg = SAMA5_SMC_MODE(SAMA5_SMC_GENERIC);
fields->mode = devm_regmap_field_alloc(ebi->dev, ebi->smc, field);
-   if (IS_ERR(fields->mode))
-   return PTR_ERR(fields->mode);
-
-   return 0;
+   return PTR_ERR_OR_ZERO(fields->mode);
 }
 
 static int at91_ebi_dev_setup(struct at91_ebi *ebi, struct device_node *np,
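
The equivalence behind this cleanup can be checked in user space. The sketch below uses simplified models of the kernel's error-pointer helpers: err_ptr(), is_err() and ptr_err() are assumed stand-ins written for this example, not the kernel's own definitions, and init_verbose()/init_compact() are hypothetical names for the before and after forms.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space models of the error-pointer helpers (assumed). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The form removed by the patch. */
static int init_verbose(const void *mode)
{
	if (is_err(mode))
		return (int)ptr_err(mode);

	return 0;
}

/* The one-expression form, modelling what PTR_ERR_OR_ZERO() computes. */
static int init_compact(const void *mode)
{
	return is_err(mode) ? (int)ptr_err(mode) : 0;
}
```

Both functions return 0 for a valid pointer and the negative errno for an error pointer, so the change is purely cosmetic.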




Re: [PATCH v3] KVM: nVMX: Fix incorrect preemption timer vmexit in nested guest

2016-07-06 Thread Wanpeng Li
2016-07-07 1:11 GMT+08:00 Paolo Bonzini :
>
>
> On 06/07/2016 18:03, Haozhong Zhang wrote:
>>>> This patch also fixed the crash of L1 Xen with L2 HVM guest. Xen does
>>>> not enable preemption timer for HVM guests, and will get panic if it
>>>> receives a preemption timer vmexit.
>>>
>>> Thanks!  I'm still not sure why the bit is set in the vmcs02 though...
>>
>> Yes, it looks really weird.
>>
>> I replaced "return false" in Wanpeng's patch by
>>
>> pr_info("VMCS: preemption timer enabled = %d\n",
>> !!(vmcs_read32(PIN_BASED_VM_EXEC_CONTROL) & 
>> PIN_BASED_VMX_PREEMPTION_TIMER));
>>
>> and redid my test. As expected, L1 Xen crashed due to the unexpected
>> preemption timer vmexit. I got a log from above statement just before crash:
>>
>> VMCS: preemption timer enabled = 1
>>
>> which is expected to be 0, because preemption timer is disabled in
>> vmcs02. I also modified L1 Xen to dump VMCS at crash, and it says
>> preemption timer is disabled.
>>
>> I noticed Jim Mattson recently sent a patch "KVM: nVMX: Fix memory
>> corruption when using VMCS shadowing" to fix the inconsistency between
>> vmcs12 and its shadow. Is it relevant here? I'll test his patch
>> tomorrow.
>
> No, it shouldn't have any effect.
>
> I think it happens when the post_block hook switches back from sw_timer
> to hv_timer, and L2 is running.  So the right fix should be along the
> lines of what I posted earlier.  If you don't beat me to it, I'll take
> another look tomorrow.

I think I just figured out the root cause; I will send out a patch to fix it.

Regards,
Wanpeng Li


Re: [RESEND RFC PATCH 5/5] platform: x86: add platform driver for UP Board

2016-07-06 Thread Bryan O'Donoghue
On Mon, 2016-07-04 at 17:07 +0100, Dan O'Donovan wrote:
> This platform driver instantiates a platform device relevant to the
> UP board, in particular a device representing the unique I/O pin CPLD
> controller on the UP board.
> 
> In addition, this driver registers pin maps to configure
> appropriately the underlying SoC GPIO pins for use with the
> UP Board I/O pin header.
> 
> Signed-off-by: Dan O'Donovan 
> ---
>  drivers/platform/x86/Kconfig|  13 
>  drivers/platform/x86/Makefile   |   5 ++
>  drivers/platform/x86/up_board.c | 167
> 
>  3 files changed, 185 insertions(+)
>  create mode 100644 drivers/platform/x86/up_board.c
> 
> diff --git a/drivers/platform/x86/Kconfig
> b/drivers/platform/x86/Kconfig
> index 3ec0025..b579adb 100644
> --- a/drivers/platform/x86/Kconfig
> +++ b/drivers/platform/x86/Kconfig
> @@ -1011,4 +1011,17 @@ config INTEL_TELEMETRY
>     used to get various SoC events and parameters
>     directly via debugfs files. Various tools may use
>     this interface for SoC state monitoring.
> +
> +config UP_BOARD
> + bool "UP Board Platform I/O Driver"

Addressing your question in the cover letter - I'm not sure where
up_board.c should go but, I reckon up_board_leds.c should go into
drivers/leds, up_board_gpio.c should go into drivers/gpio etc.

My highly scientific feeling is that up_board.c and up_board_cpld.c
shouldn't be in this directory but, that they would fit well in
drivers/mfd.

> + depends on ACPI && PINCTRL_CHERRYVIEW
> + select GPIOLIB_IRQCHIP
> + select LEDS_CLASS
> + select NEW_LEDS
> + ---help---
> +   This driver provides support for the platform functions on
> the UP
> +   board.  It includes platform, pinctrl and gpio drivers for
> the CPLD
> +   that manages the external pin header, as well as a driver
> for the
> +   built-in LEDs.
> +
>  endif # X86_PLATFORM_DEVICES
> diff --git a/drivers/platform/x86/Makefile
> b/drivers/platform/x86/Makefile
> index 9b11b40..687c583 100644
> --- a/drivers/platform/x86/Makefile
> +++ b/drivers/platform/x86/Makefile
> @@ -70,3 +70,8 @@ obj-$(CONFIG_INTEL_TELEMETRY)   +=
> intel_telemetry_core.o \
>      intel_telemetry_pltdrv.o \
>      intel_telemetry_debugfs.o
>  obj-$(CONFIG_INTEL_PMC_CORE)+= intel_pmc_core.o
> +obj-$(CONFIG_UP_BOARD)   += up_board.o \
> +    up_board_cpld.o \
> +    up_board_pinctrl.o \
> +    up_board_gpio.o \
> +    up_board_leds.o
> diff --git a/drivers/platform/x86/up_board.c
> b/drivers/platform/x86/up_board.c
> new file mode 100644
> index 000..8635759
> --- /dev/null
> +++ b/drivers/platform/x86/up_board.c
> @@ -0,0 +1,167 @@
> +

Dangling extra line.

> +
> +/*
> + * On the UP board, if the ODEn bit 

Do you mean the Open Drain Enable bit? I think this comment will parse
better if it's called out explicitly.

> is set on the pad configuration
> + * it seems to impair some functions on the I/O header such as UART,
> SPI
> + * and I2C.  So we disable it for all header pins by default.
> + */

Seems to impair is a bit vague... what does impair mean - not working
or just wrong ?

> +static struct up_board_info *up_board;
> +
> +static int __init
> +up_board_init_devices(void)

Why are you breaking up the __init and the function name onto separate
lines ?

> +{
> + const struct dmi_system_id *system_id;
> + int ret;
> +
> + system_id = dmi_first_match(up_board_id_table);
> + if (!system_id)
> + return -ENXIO;

How about -ENODEV

> +
> +
> + up_board->vreg_pdev =
> + regulator_register_always_on(0, "fixed-3.3V",
> +  vref3v3_consumers,
> +  ARRAY_SIZE(vref3v3_cons
> umers),
> +  330);
> + if (!up_board->vreg_pdev) {
> + pr_err("Failed to register UP Board ADC vref
> regulator");

dev_err(&up_board->cpld_pdev.dev) ?

> + platform_device_unregister(up_board->cpld_pdev);
> + return -ENODEV;
> + }
> +
> + return 0;
> +}
> +
> +static void __exit
> +up_board_exit(void)

Same comment on multiple line declarations.

---
bod



linux-next: manual merge of the mac80211-next tree with the wireless-drivers-next tree

2016-07-06 Thread Stephen Rothwell
Hi Johannes,

Today's linux-next merge of the mac80211-next tree got a conflict in:

  drivers/net/wireless/marvell/mwifiex/cmdevt.c

between commit:

  a9c790ba23eb ("mwifiex: factor out mwifiex_cancel_scan")

from the wireless-drivers-next tree and commit:

  1d76250bd34a ("nl80211: support beacon report scanning")

from the mac80211-next tree.

I fixed it up (I used the wireless-drivers-next tree version of this file
and then added the following merge fix patch) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

From: Stephen Rothwell 
Date: Thu, 7 Jul 2016 11:51:35 +1000
Subject: [PATCH] mwifiex: fixup for "nl80211: support beacon report scanning"

Signed-off-by: Stephen Rothwell 
---
 drivers/net/wireless/marvell/mwifiex/scan.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c
index 4d21ca9744c1..ed3de0754a08 100644
--- a/drivers/net/wireless/marvell/mwifiex/scan.c
+++ b/drivers/net/wireless/marvell/mwifiex/scan.c
@@ -2026,9 +2026,13 @@ void mwifiex_cancel_scan(struct mwifiex_adapter *adapter)
if (!priv)
continue;
if (priv->scan_request) {
+   struct cfg80211_scan_info info = {
+   .aborted = true,
+   };
+
mwifiex_dbg(adapter, INFO,
"info: aborting scan\n");
-   cfg80211_scan_done(priv->scan_request, 1);
+   cfg80211_scan_done(priv->scan_request, &info);
priv->scan_request = NULL;
}
}
-- 
2.8.1

-- 
Cheers,
Stephen Rothwell


Re: [PATCH] x86: remove LTO flags

2016-07-06 Thread Andi Kleen
On Wed, Jul 06, 2016 at 05:22:50PM -0700, Luis R. Rodriguez wrote:
> The setup for LTO never made it upstream, and although this has
> some users, this is now really old stuff for a gcc 4.7 LTO problem.

Sorry, the LTO flags are still needed. Please don't remove them.

The only 4.7-specific workaround was the LTO_REFERENCE_INITCALL macro,
which can be removed (if LTO does not support 4.7 anymore).
-Andi


[RFC PATCH] ACPI / EC: Fix an order issue in ec_remove_handlers()

2016-07-06 Thread Lv Zheng
There is an ordering issue in ec_remove_handlers(): the functions it
invokes are not called in the reverse order of their appearance in
ec_install_handlers(). This existing issue has been triggered by the
following commit:
  Commit: dcf15cbded656a12335bc4151f3f75f10080a375
  Subject: ACPI / EC: Fix a boot EC regresion by restoring boot EC
The commit invokes ec_remove_handlers() during runtime, thus uncovers this
issue. This patch fixes this regression.

Fixes: dcf15cbded65 ("ACPI / EC: Fix a boot EC regresion by restoring boot EC")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=102421
Reported-by: Wolfram Sang 
Reported-by: Nicholas 
Cc: Wolfram Sang 
Cc: Nicholas 
Signed-off-by: Lv Zheng 
---
 drivers/acpi/ec.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index b1050a0..9ff3d4b 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -1331,7 +1331,12 @@ static int ec_install_handlers(struct acpi_ec *ec)
 
 static void ec_remove_handlers(struct acpi_ec *ec)
 {
-   acpi_ec_stop(ec, false);
+   if (test_bit(EC_FLAGS_GPE_HANDLER_INSTALLED, &ec->flags)) {
+   if (ACPI_FAILURE(acpi_remove_gpe_handler(NULL, ec->gpe,
+   &acpi_ec_gpe_handler)))
+   pr_err("failed to remove gpe handler\n");
+   clear_bit(EC_FLAGS_GPE_HANDLER_INSTALLED, &ec->flags);
+   }
 
if (test_bit(EC_FLAGS_EC_HANDLER_INSTALLED, &ec->flags)) {
if (ACPI_FAILURE(acpi_remove_address_space_handler(ec->handle,
@@ -1340,12 +1345,7 @@ static void ec_remove_handlers(struct acpi_ec *ec)
clear_bit(EC_FLAGS_EC_HANDLER_INSTALLED, &ec->flags);
}
 
-   if (test_bit(EC_FLAGS_GPE_HANDLER_INSTALLED, &ec->flags)) {
-   if (ACPI_FAILURE(acpi_remove_gpe_handler(NULL, ec->gpe,
-   &acpi_ec_gpe_handler)))
-   pr_err("failed to remove gpe handler\n");
-   clear_bit(EC_FLAGS_GPE_HANDLER_INSTALLED, &ec->flags);
-   }
+   acpi_ec_stop(ec, false);
 }
 
 static struct acpi_ec *acpi_ec_alloc(void)
-- 
1.7.10
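
The ordering rule described in the commit message above (tear down in the
reverse order of setup) can be sketched in plain user-space C. This is only an
illustration, not the ACPI EC code: the step names loosely mirror the three
operations touched by the diff, the install order shown is an assumption
inferred from the new removal order, and `struct device_state` and its `log`
array are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical setup steps, listed in the order they are installed. */
enum step { STEP_START, STEP_ADDR_SPACE_HANDLER, STEP_GPE_HANDLER, STEP_COUNT };

struct device_state {
	int installed[STEP_COUNT];     /* 1 if the step is currently set up */
	enum step log[2 * STEP_COUNT]; /* order in which steps were touched */
	size_t log_len;
};

/* Set up every step in declaration order and record the order. */
static void install_all(struct device_state *d)
{
	for (int s = 0; s < STEP_COUNT; s++) {
		d->installed[s] = 1;
		d->log[d->log_len++] = (enum step)s;
	}
}

/* Undo the steps last-in, first-out, skipping any that were never set up. */
static void remove_all(struct device_state *d)
{
	for (int s = STEP_COUNT - 1; s >= 0; s--) {
		if (!d->installed[s])
			continue;
		d->installed[s] = 0;
		d->log[d->log_len++] = (enum step)s;
	}
}
```

Calling install_all() and then remove_all() on a zeroed state leaves a log in
which the second half is the mirror image of the first, which is the property
the patch restores for ec_remove_handlers().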



[PATCH] [RFC V1]s390/perf: fix 'start' address of module's map

2016-07-06 Thread Song Shan Gong
At present, when creating a module's map, perf gets the 'start' address by
parsing '/proc/modules', but that is the module base address, not the start
address of the '.text' section. On most archs this is OK. But s390 places
the 'GOT' and 'PLT' relocations before the '.text' section, so there is an
offset between the module base address and the '.text' section, which causes
wrong symbol resolution for modules.

Fix this bug by getting the 'start' address of the module's map from
'/sys/module/[module name]/sections/.text' instead of '/proc/modules'.

Signed-off-by: Song Shan Gong 
---
 tools/perf/arch/s390/util/Build  |  2 ++
 tools/perf/arch/s390/util/sym-handling.c | 49 
 tools/perf/util/machine.c|  6 
 tools/perf/util/machine.h|  2 ++
 4 files changed, 59 insertions(+)
 create mode 100644 tools/perf/arch/s390/util/sym-handling.c

diff --git a/tools/perf/arch/s390/util/Build b/tools/perf/arch/s390/util/Build
index 8a61372..5e322ed 100644
--- a/tools/perf/arch/s390/util/Build
+++ b/tools/perf/arch/s390/util/Build
@@ -2,3 +2,5 @@ libperf-y += header.o
 libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
+
+libperf-y += sym-handling.o
diff --git a/tools/perf/arch/s390/util/sym-handling.c b/tools/perf/arch/s390/util/sym-handling.c
new file mode 100644
index 000..efe2a50
--- /dev/null
+++ b/tools/perf/arch/s390/util/sym-handling.c
@@ -0,0 +1,49 @@
+#include 
+#include 
+#include 
+#include "symbol.h"
+#include "map.h"
+#include "util.h"
+#include "machine.h"
+
+int arch__fix_module_baseaddr(struct machine *machine,
+   u64 *start, const char *name)
+{
+   char path[PATH_MAX];
+   char *module_name = strdup(name);
+   int len = strlen(module_name);
+   FILE *file;
+   int err = 0;
+   u64 text_start;
+   char *line = NULL;
+   size_t n;
+   char *sep;
+
+   module_name[len - 1] = '\0';
+   module_name += 1;
+   snprintf(path, PATH_MAX, "%s/sys/module/%s/sections/.text",
+   machine->root_dir, module_name);
+   file = fopen(path, "r");
+   if (file == NULL)
+   return -1;
+
+   len = getline(&line, &n, file);
+   if (len < 0) {
+   err = -1;
+   goto out;
+   }
+   line[--len] = '\0'; /* \n */
+   sep = strrchr(line, 'x');
+   if (sep == NULL) {
+   err = -1;
+   goto out;
+   }
+   hex2u64(sep + 1, &text_start);
+
+   *start = text_start;
+out:
+   free(line);
+   fclose(file);
+   free(module_name - 1);
+   return err;
+}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b177218..e5c2721 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1091,12 +1091,18 @@ static int machine__set_modules_path(struct machine *machine)
 
return map_groups__set_modules_path_dir(&machine->kmaps, modules_path, 0);
 }
+int __weak arch__fix_module_baseaddr(struct machine *machine __maybe_unused,
+   u64 *start __maybe_unused, const char *name __maybe_unused)
+{
+   return 0;
+}
 
 static int machine__create_module(void *arg, const char *name, u64 start)
 {
struct machine *machine = arg;
struct map *map;
 
+   arch__fix_module_baseaddr(machine, &start, name);
map = machine__findnew_module_map(machine, start, name);
if (map == NULL)
return -1;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 41ac9cf..da7b6c0 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -216,6 +216,8 @@ struct symbol *machine__find_kernel_function_by_name(struct machine *machine,
 
 struct map *machine__findnew_module_map(struct machine *machine, u64 start,
const char *filename);
+int arch__fix_module_baseaddr(struct machine *machine, u64 *start,
+   const char *name);
 
 int __machine__load_kallsyms(struct machine *machine, const char *filename,
 enum map_type type, bool no_kcore, symbol_filter_t filter);
-- 
2.3.0
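
The two parsing steps the patch relies on (stripping the brackets from a
module name such as "[ext4]" and reading a hex address from the sections
file) can be sketched as standalone C. This is a hedged illustration, not the
code from the patch: `strip_brackets()` and `parse_text_start()` are
hypothetical helpers, and the line format "0x<hex digits>\n" is an assumption
based on the patch's search for the 'x' separator.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy "name" out of "[name]" into buf. Returns 0 on success, -1 if the
 * input is not bracketed or does not fit into buf.
 */
static int strip_brackets(const char *in, char *buf, size_t buflen)
{
	size_t len = strlen(in);

	if (len < 2 || in[0] != '[' || in[len - 1] != ']')
		return -1;
	if (len - 2 >= buflen)
		return -1;
	memcpy(buf, in + 1, len - 2);
	buf[len - 2] = '\0';
	return 0;
}

/*
 * Parse one line of the form "0x000003ff80001000\n".
 * Returns 0 and stores the address on success, -1 on malformed input.
 */
static int parse_text_start(const char *line, uint64_t *start)
{
	char *end;
	unsigned long long val;

	errno = 0;
	val = strtoull(line, &end, 16); /* base 16 accepts an optional "0x" */
	if (errno != 0 || end == line)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;
	*start = (uint64_t)val;
	return 0;
}
```

Unlike the in-place edit of a strdup()'d string, this variant copies into a
caller-supplied buffer, so there is no allocation to check or free; whether
that trade-off suits perf is left open.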



Re: [PATCH v3 3/7] sparc: support static_key usage in non-module __exit sections

2016-07-06 Thread David Miller
From: Jason Baron 
Date: Wed,  6 Jul 2016 17:42:32 -0400

> The jump table can reference text found in an __exit section. Thus,
> instead of discarding it at build/link time, include EXIT_TEXT as part
> of __init and release it at system boot time.
> 
> Without this patch the link fails with:
> 
> `.exit.text' referenced in section `__jump_table' of xxx.o:
> defined in discarded section `.exit.text' of xxx.o
> 
> Cc: "David S. Miller" 
> Cc: sparcli...@vger.kernel.org
> Signed-off-by: Jason Baron 

Acked-by: David S. Miller 


[PATCH v2] irqchip/qeic: move qeic driver from drivers/soc/fsl/qe

2016-07-06 Thread Zhao Qiang
The driver stays the same.

Signed-off-by: Zhao Qiang 
---
Changes for v2:
- modify the subject and commit msg

 drivers/irqchip/Makefile| 1 +
 drivers/{soc/fsl/qe => irqchip}/qe_ic.c | 0
 drivers/{soc/fsl/qe => irqchip}/qe_ic.h | 0
 drivers/soc/fsl/qe/Makefile | 2 +-
 4 files changed, 2 insertions(+), 1 deletion(-)
 rename drivers/{soc/fsl/qe => irqchip}/qe_ic.c (100%)
 rename drivers/{soc/fsl/qe => irqchip}/qe_ic.h (100%)

diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index 38853a1..cef999d 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -69,3 +69,4 @@ obj-$(CONFIG_PIC32_EVIC)  += irq-pic32-evic.o
 obj-$(CONFIG_MVEBU_ODMI)   += irq-mvebu-odmi.o
 obj-$(CONFIG_LS_SCFG_MSI)  += irq-ls-scfg-msi.o
 obj-$(CONFIG_EZNPS_GIC)+= irq-eznps.o
+obj-$(CONFIG_QUICC_ENGINE) += qe_ic.o
diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/irqchip/qe_ic.c
similarity index 100%
rename from drivers/soc/fsl/qe/qe_ic.c
rename to drivers/irqchip/qe_ic.c
diff --git a/drivers/soc/fsl/qe/qe_ic.h b/drivers/irqchip/qe_ic.h
similarity index 100%
rename from drivers/soc/fsl/qe/qe_ic.h
rename to drivers/irqchip/qe_ic.h
diff --git a/drivers/soc/fsl/qe/Makefile b/drivers/soc/fsl/qe/Makefile
index 2031d38..51e4726 100644
--- a/drivers/soc/fsl/qe/Makefile
+++ b/drivers/soc/fsl/qe/Makefile
@@ -1,7 +1,7 @@
 #
 # Makefile for the linux ppc-specific parts of QE
 #
-obj-$(CONFIG_QUICC_ENGINE)+= qe.o qe_common.o qe_ic.o qe_io.o
+obj-$(CONFIG_QUICC_ENGINE)+= qe.o qe_common.o qe_io.o
 obj-$(CONFIG_CPM)  += qe_common.o
 obj-$(CONFIG_UCC)  += ucc.o
 obj-$(CONFIG_UCC_SLOW) += ucc_slow.o
-- 
2.1.0.27.g96db324



Re: [PATCH] ipv6: Fix soft lockup for ipv6 network notifier.

2016-07-06 Thread Ding Tianhong
On 2016/7/6 16:44, Eric Dumazet wrote:
> On Wed, 2016-07-06 at 16:15 +0800, Ding Tianhong wrote:
>> Hi Eric:
>>
>> I had found out that the patch aaf92f(netfilter: conntrack: resched in
>> nf_ct_iterate_cleanup) solve the problem, 
>> this patch add cond_sched() in the nf_ct_iterate_cleanup() which will
>> be called in the net notifier chain every time,
>> and I revert this patch at kernel 4.7-rc4 , it will panic for soft
>> lockup, so I am not sure whether our patch is need,
>> it looks like if I disable the CONFIG for netfilter that would
>> register the nf_ct_iterate_cleanup as notifier, the problem still be
>> exist.
> 
> Well, I do not have conntrack on my kernels, and I can not reproduce the
> issue.
> 
> So I am guessing other patches also solved a scalability issue, between
> 4.1 and 4.7
> 
> I am aware of something that David did for IPv4, but this might help as
> well for IPv6.
> 
> commit fbd40ea0180a2d328c5adc61414dc8bab9335ce2
> ipv4: Don't do expensive useless work during inetdev destroy.
> 

Hi Eric:

I checked this patch:
[root@localhost linux]# git name-rev fbd40ea0180a2d328c5adc61414dc8bab9335ce2
fbd40ea0180a2d328c5adc61414dc8bab9335ce2 tags/v4.6-rc1~91^2~63

So kernel 4.7-rc4 already has this patch, but it has no effect if I revert
the commit aaf92f (netfilter: conntrack: resched in nf_ct_iterate_cleanup),
so I don't think David's patch fixes this problem.

Thanks
Ding





Re: [PATCH 09/31] mm, vmscan: by default have direct reclaim only shrink once per node

2016-07-06 Thread Joonsoo Kim
On Fri, Jul 01, 2016 at 09:01:17PM +0100, Mel Gorman wrote:
> Direct reclaim iterates over all zones in the zonelist and shrinking them
> but this is in conflict with node-based reclaim.  In the default case,
> only shrink once per node.
> 
> Signed-off-by: Mel Gorman 
> Acked-by: Johannes Weiner 
> Acked-by: Vlastimil Babka 
> ---
>  mm/vmscan.c | 19 +++
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b524d3b72527..34656173a670 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2552,14 +2552,6 @@ static inline bool compaction_ready(struct zone *zone, int order, int classzone_
>   * try to reclaim pages from zones which will satisfy the caller's allocation
>   * request.
>   *
> - * We reclaim from a zone even if that zone is over high_wmark_pages(zone).
> - * Because:
> - * a) The caller may be trying to free *extra* pages to satisfy a higher-order
> - *allocation or
> - * b) The target zone may be at high_wmark_pages(zone) but the lower zones
> - *must go *over* high_wmark_pages(zone) to satisfy the `incremental min'
> - *zone defense algorithm.
> - *
>   * If a zone is deemed to be full of pinned pages then just give it a light
>   * scan then give up on it.
>   */
> @@ -2571,6 +2563,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>   unsigned long nr_soft_scanned;
>   gfp_t orig_mask;
>   enum zone_type classzone_idx;
> + pg_data_t *last_pgdat = NULL;
>  
>   /*
>* If the number of buffer_heads in the machine exceeds the maximum
> @@ -2600,6 +2593,16 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>   classzone_idx--;
>  
>   /*
> +  * Shrink each node in the zonelist once. If the zonelist is
> +  * ordered by zone (not the default) then a node may be
> +  * shrunk multiple times but in that case the user prefers
> +  * lower zones being preserved
> +  */
> + if (zone->zone_pgdat == last_pgdat)
> + continue;
> + last_pgdat = zone->zone_pgdat;
> +
> + /*

After this change, compaction_ready(), which uses zone information,
would be called with the highest zone in the node. So, if some lower zone
in that node is compaction-ready, we cannot stop the reclaim.

Thanks.
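
The concern above can be shown with a toy model of the once-per-node walk.
This is a sketch under stated assumptions, not the shrink_zones() code:
`struct toy_zone`, `struct walk_result` and `walk_zones()` are hypothetical,
and the array is assumed to be ordered from the highest zone to the lowest
within each node.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel's zone and node; field names are hypothetical. */
struct toy_zone {
	int node_id;          /* which node the zone belongs to */
	int compaction_ready; /* 1 if this zone alone would pass the check */
};

struct walk_result {
	int nodes_visited;    /* zones that got past the per-node skip */
	int ready_zones_seen; /* compaction-ready zones that were inspected */
};

/*
 * Walk the zones in array order and only process the first zone seen for
 * each node, mirroring the last_pgdat comparison in the quoted hunk.
 */
static struct walk_result walk_zones(const struct toy_zone *zones, size_t n)
{
	struct walk_result r = { 0, 0 };
	int last_node = -1;

	for (size_t i = 0; i < n; i++) {
		if (zones[i].node_id == last_node)
			continue; /* node already handled via a higher zone */
		last_node = zones[i].node_id;

		/* Only this zone's readiness is ever checked for the node. */
		if (zones[i].compaction_ready)
			r.ready_zones_seen++;
		r.nodes_visited++;
	}
	return r;
}
```

With a node whose highest zone is not ready but whose lower zone is, the walk
visits the node once and never sees the ready zone, which is the case the
reply points out.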


[PATCH 4.6 17/31] USB: uas: Fix slave queue_depth not being set

2016-07-06 Thread Greg Kroah-Hartman
4.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Hans de Goede 

commit 593224ea77b1ca842f45cf76f4deeef44dfbacd1 upstream.

Commit 198de51dbc34 ("USB: uas: Limit qdepth at the scsi-host level")
removed the scsi_change_queue_depth() call from uas_slave_configure()
assuming that the slave would inherit the host's queue_depth, which
that commit sets to the same value.

This is incorrect, without the scsi_change_queue_depth() call the slave's
queue_depth defaults to 1, introducing a performance regression.

This commit restores the call, fixing the performance regression.

Fixes: 198de51dbc34 ("USB: uas: Limit qdepth at the scsi-host level")
Reported-by: Tom Yan 
Signed-off-by: Hans de Goede 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/usb/storage/uas.c |1 +
 1 file changed, 1 insertion(+)

--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -835,6 +835,7 @@ static int uas_slave_configure(struct sc
if (devinfo->flags & US_FL_BROKEN_FUA)
sdev->broken_fua = 1;
 
+   scsi_change_queue_depth(sdev, devinfo->qdepth - 2);
return 0;
 }
 




[PATCH 4.6 14/31] crypto: vmx - Increase priority of aes-cbc cipher

2016-07-06 Thread Greg Kroah-Hartman
4.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Anton Blanchard 

commit 12d3f49e1ffbbf8cbbb60acae5a21103c5c841ac upstream.

All of the VMX AES ciphers (AES, AES-CBC and AES-CTR) are set at
priority 1000. Unfortunately this means we never use AES-CBC and
AES-CTR, because the base AES-CBC cipher that is implemented on
top of AES inherits its priority.

To fix this, AES-CBC and AES-CTR have to be a higher priority. Set
them to 2000.

Testing on a POWER8 with:

cryptsetup benchmark --cipher aes --key-size 256

Shows decryption speed increase from 402.4 MB/s to 3069.2 MB/s,
over 7x faster. Thanks to Mike Strosaker for helping me debug
this issue.

Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module")
Signed-off-by: Anton Blanchard 
Signed-off-by: Herbert Xu 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/crypto/vmx/aes_cbc.c |2 +-
 drivers/crypto/vmx/aes_ctr.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/crypto/vmx/aes_cbc.c
+++ b/drivers/crypto/vmx/aes_cbc.c
@@ -182,7 +182,7 @@ struct crypto_alg p8_aes_cbc_alg = {
.cra_name = "cbc(aes)",
.cra_driver_name = "p8_aes_cbc",
.cra_module = THIS_MODULE,
-   .cra_priority = 1000,
+   .cra_priority = 2000,
.cra_type = &crypto_blkcipher_type,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_NEED_FALLBACK,
.cra_alignmask = 0,
--- a/drivers/crypto/vmx/aes_ctr.c
+++ b/drivers/crypto/vmx/aes_ctr.c
@@ -166,7 +166,7 @@ struct crypto_alg p8_aes_ctr_alg = {
.cra_name = "ctr(aes)",
.cra_driver_name = "p8_aes_ctr",
.cra_module = THIS_MODULE,
-   .cra_priority = 1000,
+   .cra_priority = 2000,
.cra_type = &crypto_blkcipher_type,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_NEED_FALLBACK,
.cra_alignmask = 0,
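
Why equal priorities hide the dedicated implementation can be illustrated
with a toy registry. This is a sketch, not the kernel crypto API lookup:
`struct toy_alg` and `pick_alg()` are hypothetical, and the rule that the
earlier entry wins a tie is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Toy registry entry; only loosely modelled on the fields in the diff. */
struct toy_alg {
	const char *name;        /* algorithm name, e.g. "cbc(aes)" */
	const char *driver_name; /* implementation name */
	int priority;
};

/*
 * Return the entry with the highest priority for a given name, or NULL if
 * no entry matches. On a tie the earlier entry wins; this tie rule is an
 * assumption of the sketch.
 */
static const struct toy_alg *pick_alg(const struct toy_alg *algs, size_t n,
				      const char *name)
{
	const struct toy_alg *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(algs[i].name, name) != 0)
			continue;
		if (best == NULL || algs[i].priority > best->priority)
			best = &algs[i];
	}
	return best;
}
```

With a template-built "cbc(aes)" entry and "p8_aes_cbc" both at 1000, the toy
lookup returns whichever came first; raising "p8_aes_cbc" to 2000 makes the
choice independent of order, which matches the intent of the patch.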




[PATCH 4.6 01/31] net_sched: fix pfifo_head_drop behavior vs backlog

2016-07-06 Thread Greg Kroah-Hartman
4.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Eric Dumazet 

[ Upstream commit 6c0d54f1897d229748d4f41ef919078db6db2123 ]

When the qdisc is full, we drop a packet at the head of the queue,
queue the current skb and return NET_XMIT_CN

Now we track backlog on upper qdiscs, we need to call
qdisc_tree_reduce_backlog(), even if the qlen did not change.

Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: Eric Dumazet 
Cc: WANG Cong 
Cc: Jamal Hadi Salim 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/sched/sch_fifo.c |4 
 1 file changed, 4 insertions(+)

--- a/net/sched/sch_fifo.c
+++ b/net/sched/sch_fifo.c
@@ -37,14 +37,18 @@ static int pfifo_enqueue(struct sk_buff
 
 static int pfifo_tail_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
+   unsigned int prev_backlog;
+
if (likely(skb_queue_len(&sch->q) < sch->limit))
return qdisc_enqueue_tail(skb, sch);
 
+   prev_backlog = sch->qstats.backlog;
/* queue full, remove one skb to fulfill the limit */
__qdisc_queue_drop_head(sch, &sch->q);
qdisc_qstats_drop(sch);
qdisc_enqueue_tail(skb, sch);
 
+   qdisc_tree_reduce_backlog(sch, 0, prev_backlog - sch->qstats.backlog);
return NET_XMIT_CN;
 }
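
The point that the byte backlog can change while the packet count stays the
same can be checked with a small worked example. This is a user-space toy,
not the qdisc code: `struct toy_fifo`, `TOY_LIMIT` and
`toy_head_drop_enqueue()` are hypothetical names.

```c
#include <stddef.h>

#define TOY_LIMIT 4

/* Toy FIFO that tracks packet count and byte backlog. */
struct toy_fifo {
	unsigned int len[TOY_LIMIT]; /* packet sizes, index 0 is the head */
	unsigned int qlen;           /* number of queued packets */
	unsigned int backlog;        /* sum of queued packet sizes in bytes */
};

/*
 * Enqueue at the tail. When the queue is full, drop the head first and
 * return by how many bytes the backlog shrank (previous minus new), which
 * is the amount a parent would have to be told about even though qlen did
 * not change. Returns 0 when nothing had to be dropped.
 */
static int toy_head_drop_enqueue(struct toy_fifo *q, unsigned int pkt_len)
{
	unsigned int prev_backlog;

	if (q->qlen < TOY_LIMIT) {
		q->len[q->qlen++] = pkt_len;
		q->backlog += pkt_len;
		return 0;
	}

	prev_backlog = q->backlog;

	/* Queue full: remove the head, then shift the rest forward. */
	q->backlog -= q->len[0];
	for (size_t i = 1; i < TOY_LIMIT; i++)
		q->len[i - 1] = q->len[i];
	q->qlen--;

	q->len[q->qlen++] = pkt_len;
	q->backlog += pkt_len;

	return (int)prev_backlog - (int)q->backlog;
}
```

Filling the queue with four 1500-byte packets and then enqueueing a 100-byte
packet leaves qlen at 4 but lowers the backlog from 6000 to 4600 bytes, so
the reported reduction is 1400.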
 




[PATCH 4.6 15/31] crypto: ux500 - memmove the right size

2016-07-06 Thread Greg Kroah-Hartman
4.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Linus Walleij 

commit 19ced623db2fe91604d69f7d86b03144c5107739 upstream.

The hash buffer is really HASH_BLOCK_SIZE bytes, someone
must have thought that memmove takes n*u32 words by mistake.
Tests work as good/bad as before after this patch.

Cc: Joakim Bech 
Reported-by: David Binderman 
Signed-off-by: Linus Walleij 
Signed-off-by: Herbert Xu 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/crypto/ux500/hash/hash_core.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/crypto/ux500/hash/hash_core.c
+++ b/drivers/crypto/ux500/hash/hash_core.c
@@ -781,7 +781,7 @@ static int hash_process_data(struct hash
&device_data->state);
memmove(req_ctx->state.buffer,
device_data->state.buffer,
-   HASH_BLOCK_SIZE / sizeof(u32));
+   HASH_BLOCK_SIZE);
if (ret) {
dev_err(device_data->dev,
"%s: hash_resume_state() failed!\n",
@@ -832,7 +832,7 @@ static int hash_process_data(struct hash
 
memmove(device_data->state.buffer,
req_ctx->state.buffer,
-   HASH_BLOCK_SIZE / sizeof(u32));
+   HASH_BLOCK_SIZE);
if (ret) {
dev_err(device_data->dev, "%s: hash_save_state() failed!\n",
__func__);
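
The effect of passing a word count where memmove() expects a byte count can
be demonstrated in isolation. This is a minimal sketch: `TOY_BLOCK_SIZE` is a
hypothetical 64-byte block size chosen for the example, and
`copied_prefix()` is a hypothetical helper, not part of the driver.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 64 /* hypothetical block size in bytes */

/*
 * Copy n bytes from a filled source block into a zeroed destination block
 * and return how many leading bytes of the destination now match the
 * source. n must not exceed TOY_BLOCK_SIZE.
 */
static size_t copied_prefix(size_t n)
{
	uint8_t src[TOY_BLOCK_SIZE];
	uint8_t dst[TOY_BLOCK_SIZE];
	size_t i;

	memset(src, 0xAB, sizeof(src));
	memset(dst, 0x00, sizeof(dst));
	memmove(dst, src, n); /* n is a byte count, not a word count */

	for (i = 0; i < sizeof(dst) && dst[i] == src[i]; i++)
		;
	return i;
}
```

With these assumptions, copied_prefix(TOY_BLOCK_SIZE / sizeof(uint32_t))
covers only the first 16 bytes, while copied_prefix(TOY_BLOCK_SIZE) covers
the whole block.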




[PATCH 1/1] mfd: Use gpio-ich driver for 8-series and 9-series Intel PCH devices

2016-07-06 Thread Dan Gora


The Intel 8-series and 9-series PCH devices, described by the descriptors
LPC_LPT and LPC_9S, do not use the GPIO register layout used by the
gpio-lynxpoint driver, although they are codenamed 'lynxpoint'. They use the
same ICH_V5_GPIO layout as the gpio-ich driver.

See:
http://www.intel.com/content/www/us/en/chipsets/8-series-chipset-pch-datasheet.html
http://www.intel.com/content/www/us/en/chipsets/9-series-chipset-pch-datasheet.html

The devices described by "Mobile 4th Generation Intel Core Processor
Family I/O" manual use the gpio-lynxpoint driver and are described by the
LPC_LPT_LP descriptor.

See:
http://www.intel.com/content/www/us/en/processors/core/4th-gen-core-family-mobile-i-o-datasheet.html

Signed-off-by: Dan Gora 
---
 drivers/mfd/lpc_ich.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mfd/lpc_ich.c b/drivers/mfd/lpc_ich.c
index bd3aa45..66f72ad 100644
--- a/drivers/mfd/lpc_ich.c
+++ b/drivers/mfd/lpc_ich.c
@@ -493,6 +493,7 @@ static struct lpc_ich_info lpc_chipset_info[] = {
[LPC_LPT] = {
.name = "Lynx Point",
.iTCO_version = 2,
+   .gpio_version = ICH_V5_GPIO,
},
[LPC_LPT_LP] = {
.name = "Lynx Point_LP",
@@ -530,6 +531,7 @@ static struct lpc_ich_info lpc_chipset_info[] = {
[LPC_9S] = {
.name = "9 Series",
.iTCO_version = 2,
+   .gpio_version = ICH_V5_GPIO,
},
 };

--
2.8.0

