Re: [PATCH] ARM: dts: omap3-n900: Allow gpio keys to be disabled

2016-01-20 Thread Ivaylo Dimitrov

Ping

On 10.01.2016 19:46, Ivaylo Dimitrov wrote:

Add linux,can-disable; to all gpios exported from gpio-keys driver, so
userspace can disable them

Signed-off-by: Ivaylo Dimitrov 
---
  arch/arm/boot/dts/omap3-n900.dts | 6 ++
  1 file changed, 6 insertions(+)



Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Daniel Vetter
On Thu, Jan 21, 2016 at 03:41:27PM +0900, Michel Dänzer wrote:
> On 21.01.2016 15:38, Michel Dänzer wrote:
> > On 21.01.2016 14:31, Mario Kleiner wrote:
> >> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
> >>> On 21.01.2016 05:32, Mario Kleiner wrote:
> 
>  So the problem is that AMDs hardware frame counters reset to
>  zero during a modeset. The old DRM code dealt with drivers doing that by
>  keeping vblank irqs enabled during modesets and incrementing vblank
>  count by one during each vblank irq, i think that's what
>  drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
> >>>
> >>> Right, looks like there's been a regression breaking this. I suspect the
> >>> problem is that vblank->last isn't getting updated from
> >>> drm_vblank_post_modeset. Not sure which change broke that though, or how
> >>> to fix it. Ville?
> >>>
> >>
> >> The whole logic has changed and the software counter updates are now
> >> driven all the time by the hw counter.
> >>
> >>>
> >>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
> >>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
> >>> vblank counters"). I've been meaning to track that down since then; one
> >>> of these days hopefully, but if anybody has any ideas offhand...
> >>
> >> I spent the last few hours reading through the drm and radeon code and i
> >> think what should probably work is to replace the
> >> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
> >> calls. These are apparently meant for drivers whose hw counters reset
> >> during modeset, [...]
> > 
> > ... just like drm_vblank_pre/post_modeset. That those were broken is a
> > regression which needs to be fixed anyway. I don't think switching to
> > drm_vblank_on/off is suitable for stable trees.
> 
> Even more so since as I mentioned, there is (has been since at least
> about half a year ago) a counter jumping bug with drm_vblank_on/off as well.

Hm, never noticed you reported that. I thought the reason for not picking
up my drm_vblank_on/off patches was that there's a bug in amdgpu userspace
where it tried to use vblank waits on a disabled pipe?

Can you please point me at the vblank on/off jump bug?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
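For readers following the thread, here is a minimal userspace sketch of the accounting problem discussed above. It assumes a simplified model in which the software counter is advanced by the masked difference between hardware counter samples. The struct and function names are hypothetical and are not DRM API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a software vblank counter fed by a wrapping
 * hardware frame counter; names are illustrative, not DRM API. */
struct vbl_model {
	uint32_t count;      /* software counter exposed to userspace */
	uint32_t last;       /* hardware value seen at the last update */
	uint32_t max_count;  /* hardware counter mask, e.g. 0xffffff */
};

/* Add the hardware delta since the last sample, modulo the counter width. */
static uint32_t vbl_update(struct vbl_model *v, uint32_t hw_now)
{
	assert((v->max_count & (v->max_count + 1)) == 0); /* mask must be 2^n - 1 */
	uint32_t diff = (hw_now - v->last) & v->max_count;

	v->count += diff;
	v->last = hw_now;
	return diff;
}

/* Resync 'last' without counting, as one would after a known reset. */
static void vbl_resync(struct vbl_model *v, uint32_t hw_now)
{
	v->last = hw_now;
}
```

In this model, a hardware counter that drops from 1001 to 0 during a modeset is indistinguishable from a counter that wrapped, so the software count jumps by almost the full counter range unless `last` is resynchronized first.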


[RFC PATCH] mmc: dw_mmc: remove redundant num_slots check

2016-01-20 Thread Shawn Lin
num_slots comes from pdata if it exists, otherwise from
dw_mci_parse_dt, which sets it to at least one slot. If
num_slots is less than 1 in the existing-pdata case, the
current code returns -ENODEV. But dw_mci_probe seems to
treat this as an optional case, since it calls
SDMMC_GET_SLOT_NUM if no slot count was assigned.

Signed-off-by: Shawn Lin 

---

 drivers/mmc/host/dw_mmc.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 7128351..a116ec6 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -2949,12 +2949,6 @@ int dw_mci_probe(struct dw_mci *host)
}
}
 
-   if (host->pdata->num_slots < 1) {
-   dev_err(host->dev,
-   "Platform data must supply num_slots.\n");
-   return -ENODEV;
-   }
-
host->biu_clk = devm_clk_get(host->dev, "biu");
if (IS_ERR(host->biu_clk)) {
dev_dbg(host->dev, "biu clock not available\n");
-- 
2.3.7
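The fallback described in the commit message can be sketched in userspace as follows. This is only an illustration: the macro name, the helper name, and the register bit layout (slot count minus one in bits 5:1) are assumptions made for the example, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for illustration: slot count minus one in bits 5:1. */
#define DEMO_GET_SLOT_NUM(hcon) ((((hcon) >> 1) & 0x1F) + 1)

/* Hypothetical helper: prefer the configured value, else ask the hardware. */
static unsigned int demo_resolve_num_slots(unsigned int configured, uint32_t hcon)
{
	unsigned int n = configured ? configured : DEMO_GET_SLOT_NUM(hcon);

	assert(n >= 1); /* the fallback can never yield zero slots */
	return n;
}
```

With a fallback of this shape, a zero value from platform data is not an error, which is the argument for dropping the early -ENODEV return.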




Re: [RFC PATCH 2/2] lightnvm: add non-continuous lun target creation support

2016-01-20 Thread Matias Bjørling
On 01/21/2016 08:44 AM, Wenwei Tao wrote:
> 2016-01-20 21:19 GMT+08:00 Matias Bjørling :
>> On 01/15/2016 12:44 PM, Wenwei Tao wrote:
>>> When create a target, we specify the begin lunid and
>>> the end lunid, and get the corresponding continuous
>>> luns from media manager, if one of the luns is not free,
>>> we failed to create the target, even if the device's
>>> total free luns are enough.
>>>
>>> So add non-continuous lun target creation support,
>>> thus we can improve the backend device's space utilization.
>>
>> A couple of questions:
>>
>> A user inits lun 3-4 and afterwards another 1-6, then only 1,2,5,6 would
>> be initialized?
>>
>> What about the case where init0 uses 3-4, and init1 uses 1-6, and would
>> share 3-4 with init0?
>>
>> Would it be better to give a list of LUNs as a bitmap, and then try to
>> initialize on top of that? with the added functionality of the user may
>> reserve luns (and thereby reject others attempting to use them)
>>
> 
> I don't quite understand the bitmap you mentioned.
> This patch does have a bitmap, dev->lun_map, and the target creation is
> built on top of this bitmap.
> 
> How a target gets its LUNs depends on its creation flags.
> If NVM_C_FIXED is set, the target wants to get its LUNs
> exactly as specified, from lun_begin to lun_end; if any of them are
> occupied by others, the creation fails.
> If NVM_C_FIXED is not set, the target will get its LUNs from the free LUNs
> between 0 and dev->nr_luns, and there is no guarantee that the final LUNs are
> contiguous.
> 
> For the first question, if NVM_C_FIXED is used, the second creation would
> fail since 3-4 are already used; otherwise it will succeed if we
> have enough free LUNs left, but the final LUNs may not be 1 to 6,
> e.g. 1, 2, 5, 6, 7, 11.
> 
> For the second question, from the explanation above we know that sharing
> LUNs would not happen in the current design.

This is an interesting discussion. This could boil down to a device
supporting either a dense or sparse translation map (or none).

With a dense translation map, there is a 1-to-1 relationship between
lbas and ppas.

With a sparse translation map (or no translation map, handled completely
by the host), we may share luns.

For current implementations, a dense mapping is supported. I wonder
whether the cost of implementing a sparse map (e.g. a b-tree structure)
on a device makes it a good design choice.

If the device supports sparse mapping, then we should add another bit to
the extension bitmap, and then allow luns to be shared. In the current
case, we should probably just deny luns to be shared between targets.

How about extending the functionality to take a bitmap of luns, which
defines the luns that we would like to map? Do the necessary check if any
of them is in use, and then proceed if all are available.

That'll remove the ambiguity from selecting luns, and instead enable the
user to make the correct decision up front.
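The bitmap proposal could look roughly like the userspace sketch below. It assumes a device with at most 64 LUNs so that a single word can serve as the map. The struct and function names are hypothetical and do not come from the lightnvm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per LUN, set when a target owns it.
 * A 64-bit word limits this sketch to devices with at most 64 LUNs. */
struct demo_dev {
	uint64_t lun_map;
	unsigned int nr_luns;
};

/* Claim exactly the LUNs in 'want', or claim nothing at all. */
static bool demo_reserve_luns(struct demo_dev *dev, uint64_t want)
{
	assert(dev->nr_luns >= 1 && dev->nr_luns <= 64);
	uint64_t valid = dev->nr_luns == 64 ? UINT64_MAX
					     : (UINT64_C(1) << dev->nr_luns) - 1;

	if (want == 0 || (want & ~valid))
		return false;           /* empty or out-of-range request */
	if (want & dev->lun_map)
		return false;           /* at least one LUN is in use */
	dev->lun_map |= want;
	return true;
}

/* Give LUNs back to the free pool. */
static void demo_release_luns(struct demo_dev *dev, uint64_t mask)
{
	dev->lun_map &= ~mask;
}
```

With this shape, a request for LUNs 1-6 after 3-4 are taken fails as a whole, and the user has to ask for 1, 2, 5 and 6 explicitly, so there is no ambiguity about which LUNs a target ends up with.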






[PATCH] PM / devfreq: tegra: Set freq in rate callback

2016-01-20 Thread Tomeu Vizoso
As per the documentation of the devfreq_dev_profile.target callback, set
the freq argument to the new frequency before returning.

This caused endless messages like this after recent changes in the core:

devfreq 6000c800.actmon: Couldn't update frequency transition information.

Signed-off-by: Tomeu Vizoso 
Reported-by: Tyler Baker 
---
 drivers/devfreq/tegra-devfreq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/devfreq/tegra-devfreq.c b/drivers/devfreq/tegra-devfreq.c
index 848b93ee930f..fe9dce0245bf 100644
--- a/drivers/devfreq/tegra-devfreq.c
+++ b/drivers/devfreq/tegra-devfreq.c
@@ -500,6 +500,8 @@ static int tegra_devfreq_target(struct device *dev, 
unsigned long *freq,
clk_set_min_rate(tegra->emc_clock, rate);
clk_set_rate(tegra->emc_clock, 0);
 
+   *freq = rate;
+
return 0;
 }
 
-- 
2.5.0
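To illustrate the contract the commit message refers to, here is a minimal userspace mock of a target-style callback that treats its freq argument as both input and output. The rate table and the demo_* names are hypothetical; this is not the tegra driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of supported rates in Hz, ascending. */
static const unsigned long demo_rates[] = { 100000000UL, 200000000UL, 400000000UL };
#define DEMO_NR_RATES (sizeof(demo_rates) / sizeof(demo_rates[0]))

/* Round the request up to a supported rate, clamping at the maximum. */
static unsigned long demo_pick_rate(unsigned long requested)
{
	for (size_t i = 0; i < DEMO_NR_RATES; i++)
		if (demo_rates[i] >= requested)
			return demo_rates[i];
	return demo_rates[DEMO_NR_RATES - 1];
}

/* Mock target callback: *freq is both the request and the result. */
static int demo_target(unsigned long *freq)
{
	assert(freq != NULL);
	unsigned long rate = demo_pick_rate(*freq);

	/* ... program the clock to 'rate' here ... */
	*freq = rate;   /* report the frequency actually selected */
	return 0;
}
```

If the write-back is omitted, the caller keeps its requested value, which may not match any rate the hardware can actually run at.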



[PATCH v3 3/4] dts/ls2080a: update the DTS for QSPI and DSPI support

2016-01-20 Thread Yuan Yao
Signed-off-by: Yuan Yao 
---
Changed in v3:
No changes.

Changed in v2:
Update my email to 
---
 arch/arm64/boot/dts/freescale/fsl-ls2080a-qds.dts | 9 -
 arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi| 4 ++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a-qds.dts 
b/arch/arm64/boot/dts/freescale/fsl-ls2080a-qds.dts
index 4cb996d..e8801fa 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a-qds.dts
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a-qds.dts
@@ -178,7 +178,14 @@
 
  {
status = "okay";
-   qflash0: s25fl008k {
+   flash0: s25fl256s1@0 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "st,m25p80";
+   spi-max-frequency = <2000>;
+   reg = <0>;
+   };
+   flash2: s25fl256s1@2 {
#address-cells = <1>;
#size-cells = <1>;
compatible = "st,m25p80";
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
index 2b23d03..65e612a 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
@@ -318,7 +318,7 @@
 
dspi: dspi@210 {
status = "disabled";
-   compatible = "fsl,vf610-dspi";
+   compatible = "fsl,ls2080a-dspi", "fsl,ls2085a-dspi";
#address-cells = <1>;
#size-cells = <0>;
reg = <0x0 0x210 0x0 0x1>;
@@ -444,7 +444,7 @@
 
qspi: quadspi@20c {
status = "disabled";
-   compatible = "fsl,vf610-qspi";
+   compatible = "fsl,ls2080a-qspi", "fsl,ls1021a-qspi";
#address-cells = <1>;
#size-cells = <0>;
reg = <0x0 0x20c 0x0 0x1>,
-- 
2.1.0.27.g96db324



[V2 PATCH 1/1] genirq: fix desc->action become NULL error

2016-01-20 Thread zyjzyj2000

Hi, all

According to the suggestions from Thomas Gleixner, I made a new patch
to fix this problem.

Changes:
The commit 71f64340fc0e will not be reverted. And action test is
inserted.

Best Regards!
Zhu Yanjun


[V2 PATCH 1/1] genirq: fix desc->action become NULL error

2016-01-20 Thread zyjzyj2000
From: Zhu Yanjun 

After this commit 71f64340fc0e ("genirq: Remove the second parameter
from handle_irq_event_percpu()") is applied, the variable desc->action is
not protected by raw_spin_lock. The following calltrace will pop up.

BUG: unable to handle kernel NULL pointer dereference at 0008
IP: [] handle_irq_event_percpu+0x31/0x1c0
...
Call Trace:

[] handle_irq_event+0x3c/0x60
[] handle_edge_irq+0xcf/0x160
[] handle_irq+0x1a/0x30
[] do_IRQ+0x57/0xf0
[] common_interrupt+0x7f/0x7f

[] ? _raw_write_unlock_irq+0x12/0x30
[] _raw_spin_unlock_irq+0xe/0x10
[] finish_task_switch+0x9a/0x1f0
[] __schedule+0x3c5/0xb60
[] schedule+0x3f/0x90
[] schedule_preempt_disabled+0x18/0x30
[] cpu_startup_entry+0x13c/0x320
[] start_secondary+0xf1/0x100
RIP [] handle_irq_event_percpu+0x31/0x1c0
...
The reason is as below:

The variable desc->action is not protected anymore, so it can be set to
NULL concurrently:

CPU 0                           CPU 1

free_irq()                      lock(desc)
  lock(desc)                    handle_edge_irq()
                                  handle_irq_event(desc)
                                    unlock(desc)
  desc->action = NULL               handle_irq_event_percpu(desc)
                                      action = desc->action

Testing action for NULL is sufficient, because we either see a valid
desc->action or NULL. If the action is about to be removed, it is still
valid, as free_irq() is blocked on synchronize_irq():

free_irq()                      lock(desc)
  lock(desc)                    handle_edge_irq()
                                  handle_irq_event(desc)
                                    set(INPROGRESS)
                                    unlock(desc)
                                    handle_irq_event_percpu(desc)
                                      action = desc->action
  desc->action = NULL
  synchronize_irq()
    while(INPROGRESS);              lock(desc)
                                    clr(INPROGRESS)
  free(action)

That's basically the same mechanism as we have for shared
interrupts. The variable action->next can become NULL while
handle_irq_event_percpu() runs. Either it sees the action or
NULL. It does not matter, because action itself cannot go away.

Suggested-by: Thomas Gleixner 
Signed-off-by: Zhu Yanjun 
---
 kernel/irq/handle.c |   12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a302cf9..7510b72 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -136,9 +136,14 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
 {
irqreturn_t retval = IRQ_NONE;
unsigned int flags = 0, irq = desc->irq_data.irq;
-   struct irqaction *action = desc->action;
+   struct irqaction *action;
 
-   do {
+   /*
+* READ_ONCE is not required here. The compiler cannot reload action
+* because it'll be action->next for the second iteration of the loop.
+*/
+   action = desc->action;
+   while (action) {
irqreturn_t res;
 
trace_irq_handler_entry(irq, action);
@@ -173,7 +179,7 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
 
retval |= res;
action = action->next;
-   } while (action);
+   }
 
add_interrupt_randomness(irq, flags);
 
-- 
1.7.9.5
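The effect of switching from a do/while loop to a pre-tested while loop can be shown with a small userspace sketch. The demo_* names and the handler signature are made up for the example; this is not the genirq code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a chain of interrupt handlers. */
struct demo_action {
	int (*handler)(void *data);
	void *data;
	struct demo_action *next;
};

/* Walk the chain with a pre-tested loop so an empty chain is a no-op.
 * 'head' is read once into a local; later steps use action->next. */
static int demo_run_actions(struct demo_action *head)
{
	int handled = 0;
	struct demo_action *action = head;

	while (action) {
		assert(action->handler != NULL);
		handled |= action->handler(action->data);
		action = action->next;
	}
	return handled;
}

/* Example handler: counts how often it was invoked. */
static int demo_count_call(void *data)
{
	++*(int *)data;
	return 1;
}
```

A do/while version would dereference the head before testing it, which is the NULL pointer dereference shown in the call trace above.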



Re: [RFC PATCH 2/2] lightnvm: add non-continuous lun target creation support

2016-01-20 Thread Wenwei Tao
2016-01-20 21:19 GMT+08:00 Matias Bjørling :
> On 01/15/2016 12:44 PM, Wenwei Tao wrote:
>> When create a target, we specify the begin lunid and
>> the end lunid, and get the corresponding continuous
>> luns from media manager, if one of the luns is not free,
>> we failed to create the target, even if the device's
>> total free luns are enough.
>>
>> So add non-continuous lun target creation support,
>> thus we can improve the backend device's space utilization.
>
> A couple of questions:
>
> A user inits lun 3-4 and afterwards another 1-6, then only 1,2,5,6 would
> be initialized?
>
> What about the case where init0 uses 3-4, and init1 uses 1-6, and would
> share 3-4 with init0?
>
> Would it be better to give a list of LUNs as a bitmap, and then try to
> initialize on top of that? with the added functionality of the user may
> reserve luns (and thereby reject others attempting to use them)
>

I don't quite understand the bitmap you mentioned.
This patch does have a bitmap, dev->lun_map, and the target creation is
built on top of this bitmap.

How a target gets its LUNs depends on its creation flags.
If NVM_C_FIXED is set, the target wants to get its LUNs
exactly as specified, from lun_begin to lun_end; if any of them are
occupied by others, the creation fails.
If NVM_C_FIXED is not set, the target will get its LUNs from the free LUNs
between 0 and dev->nr_luns, and there is no guarantee that the final LUNs are
contiguous.

For the first question, if NVM_C_FIXED is used, the second creation would
fail since 3-4 are already used; otherwise it will succeed if we
have enough free LUNs left, but the final LUNs may not be 1 to 6,
e.g. 1, 2, 5, 6, 7, 11.

For the second question, from the explanation above we know that sharing
LUNs would not happen in the current design.
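The two allocation modes described above can be modelled in userspace roughly as follows. This is a sketch under assumptions: DEMO_C_FIXED stands in for NVM_C_FIXED, the map is a single 64-bit word, and the non-fixed path simply scans from LUN 0 upward, which may differ from the order the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_C_FIXED 0x1u   /* hypothetical stand-in for NVM_C_FIXED */

/* Pick (lun_end - lun_begin + 1) LUNs from a 64-bit in-use map.
 * Returns the mask of newly claimed LUNs, or 0 on failure. */
static uint64_t demo_get_luns(uint64_t *lun_map, unsigned int nr_luns,
			      unsigned int lun_begin, unsigned int lun_end,
			      unsigned int flags)
{
	assert(nr_luns <= 64 && lun_begin <= lun_end && lun_end < nr_luns);
	unsigned int need = lun_end - lun_begin + 1;
	uint64_t got = 0;

	if (flags & DEMO_C_FIXED) {
		/* Exact range requested: every LUN in it must be free. */
		for (unsigned int i = lun_begin; i <= lun_end; i++)
			got |= UINT64_C(1) << i;
		if (got & *lun_map)
			return 0;
	} else {
		/* Any free LUNs will do; the result may be non-contiguous. */
		for (unsigned int i = 0; i < nr_luns && need; i++) {
			if (!(*lun_map & (UINT64_C(1) << i))) {
				got |= UINT64_C(1) << i;
				need--;
			}
		}
		if (need)
			return 0;   /* not enough free LUNs */
	}
	*lun_map |= got;
	return got;
}
```

In both modes a LUN is handed out only if its bit is clear, so two targets never share a LUN.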

>>
>> Signed-off-by: Wenwei Tao 
>> ---
>>  drivers/lightnvm/core.c   |  25 ++---
>>  drivers/lightnvm/gennvm.c |  42 -
>>  drivers/lightnvm/rrpc.c   | 212 
>> ++
>>  drivers/lightnvm/rrpc.h   |  10 +-
>>  include/linux/lightnvm.h  |  26 +-
>>  include/uapi/linux/lightnvm.h |   2 +
>>  6 files changed, 216 insertions(+), 101 deletions(-)
>>
>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>> index d938636..fe48434 100644
>> --- a/drivers/lightnvm/core.c
>> +++ b/drivers/lightnvm/core.c
>> @@ -27,7 +27,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>
>>  static LIST_HEAD(nvm_targets);
>>  static LIST_HEAD(nvm_mgrs);
>> @@ -237,6 +236,11 @@ static int nvm_core_init(struct nvm_dev *dev)
>>   dev->luns_per_chnl *
>>   dev->nr_chnls;
>>   dev->total_pages = dev->total_blocks * dev->pgs_per_blk;
>> + dev->lun_map = kcalloc(BITS_TO_LONGS(dev->nr_luns),
>> + sizeof(unsigned long), GFP_KERNEL);
>> + if (!dev->lun_map)
>> + return -ENOMEM;
>> +
>>   INIT_LIST_HEAD(&dev->online_targets);
>>   spin_lock_init(&dev->lock);
>>
>> @@ -369,6 +373,7 @@ void nvm_unregister(char *disk_name)
>>   up_write(_lock);
>>
>>   nvm_exit(dev);
>> + kfree(dev->lun_map);
>>   kfree(dev);
>>  }
>>  EXPORT_SYMBOL(nvm_unregister);
>> @@ -385,6 +390,7 @@ static int nvm_create_target(struct nvm_dev *dev,
>>   struct gendisk *tdisk;
>>   struct nvm_tgt_type *tt;
>>   struct nvm_target *t;
>> + unsigned long flags;
>>   void *targetdata;
>>
>>   if (!dev->mt) {
>> @@ -429,7 +435,8 @@ static int nvm_create_target(struct nvm_dev *dev,
>>   tdisk->fops = _fops;
>>   tdisk->queue = tqueue;
>>
>> - targetdata = tt->init(dev, tdisk, s->lun_begin, s->lun_end);
>> + flags = calc_nvm_create_bits(create->flags);
>> + targetdata = tt->init(dev, tdisk, s->lun_begin, s->lun_end, flags);
>>   if (IS_ERR(targetdata))
>>   goto err_init;
>>
>> @@ -582,16 +589,17 @@ static int nvm_configure_create(const char *val)
>>   struct nvm_ioctl_create create;
>>   char opcode;
>>   int lun_begin, lun_end, ret;
>> + __u32 c_flags;
>>
>> - ret = sscanf(val, "%c %256s %256s %48s %u:%u", &opcode, create.dev,
>> + ret = sscanf(val, "%c %256s %256s %48s %u:%u %u", &opcode, create.dev,
>>   create.tgtname, create.tgttype,
>> - &lun_begin, &lun_end);
>> - if (ret != 6) {
>> + &lun_begin, &lun_end, &c_flags);
>> + if (ret != 7) {
>> + if (ret != 7) {
>>   pr_err("nvm: invalid command. Use \"opcode device name tgttype 
>> lun_begin:lun_end\".\n");
>>   return -EINVAL;
>>   }
>>
>> - create.flags = 0;
>> + create.flags = c_flags;
>>   create.conf.type = NVM_CONFIG_TYPE_SIMPLE;
>>   create.conf.s.lun_begin = lun_begin;
>>   create.conf.s.lun_end = lun_end;
>> @@ -761,11 +769,6 @@ static long nvm_ioctl_dev_create(struct file *file, 
>> void __user *arg)
>>   create.tgttype[NVM_TTYPE_NAME_MAX 

Re: [PATCH 2/2] scsi: Fix RCU handling for VPD pages

2016-01-20 Thread Hannes Reinecke
On 01/21/2016 07:35 AM, Alexander Duyck wrote:
> This patch is meant to fix the RCU handling for VPD pages.  The original
> code had a number of issues including the fact that the local variables
> were being declared as __rcu, the RCU variable being directly accessed
> outside of the RCU locked region, and the fact that length was not
> associated with the data so it would be possible to get a mix and match of
> the length for one VPD page with the data from another.
> 
> Fixes: 09e2b0b14690 ("scsi: rescan VPD attributes")
> Signed-off-by: Alexander Duyck 
> ---
>  drivers/scsi/scsi.c|   52 
> +++-
>  drivers/scsi/scsi_lib.c|   12 +-
>  drivers/scsi/scsi_sysfs.c  |   14 +++-
>  include/scsi/scsi_device.h |   14 
>  4 files changed, 50 insertions(+), 42 deletions(-)
> 
Thanks for fixing this up. I didn't really like the two distinct
variables for vpd buffer and length either, but hadn't thought of
using a struct here.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH 1/2] scsi: Do not attach VPD to devices that don't support it

2016-01-20 Thread Hannes Reinecke
On 01/21/2016 07:35 AM, Alexander Duyck wrote:
> The patch "scsi: rescan VPD attributes" introduced a regression in which
> devices that don't support VPD were being scanned for VPD attributes
> anyway.  This could cause issues for these parts and should be avoided so
> the check for scsi_level has been moved out of scsi_add_lun and into
> scsi_attach_vpd so that all callers will not scan VPD for devices that
> don't support it.
> 
> Fixes: 09e2b0b14690 ("scsi: rescan VPD attributes")
> Signed-off-by: Alexander Duyck 
> ---
>  drivers/scsi/scsi.c  |3 +++
>  drivers/scsi/scsi_scan.c |3 +--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index b1bf42b93fcc..ed085e78c893 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -784,6 +784,9 @@ void scsi_attach_vpd(struct scsi_device *sdev)
>   int pg83_supported = 0;
>   unsigned char __rcu *vpd_buf, *orig_vpd_buf = NULL;
>  
> + if (sdev->scsi_level < SCSI_3)
> + return;
> +
>   if (sdev->skip_vpd_pages)
>   return;
>  retry_pg0:
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 6a820668d442..1b16c89e0cf9 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -986,8 +986,7 @@ static int scsi_add_lun(struct scsi_device *sdev, 
> unsigned char *inq_result,
>   }
>   }
>  
> - if (sdev->scsi_level >= SCSI_3)
> - scsi_attach_vpd(sdev);
> + scsi_attach_vpd(sdev);
>  
>   sdev->max_queue_depth = sdev->queue_depth;
>  
> 
Isn't this slightly pointless, given that we're testing the inverse
condition in scsi_attach_vpd()?

And in any case, I guess we should be using the same logic sd.c is
using. Please see the attached patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
From bc662c5a0255e868746ef317e2eff04dc3fcfac5 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke 
Date: Thu, 21 Jan 2016 08:18:49 +0100
Subject: [PATCH] scsi: Do not attach VPD to devices that don't support it

The patch "scsi: rescan VPD attributes" introduced a regression in which
devices that don't support VPD were being scanned for VPD attributes
anyway.  This could cause issues for these parts and should be avoided so
the check for scsi_level has been moved out of scsi_add_lun and into
scsi_attach_vpd so that all callers will not scan VPD for devices that
don't support it.

Fixes: 09e2b0b14690 ("scsi: rescan VPD attributes")

Suggested-by: Alexander Duyck 
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi.c|  3 ++-
 drivers/scsi/sd.c  | 19 +--
 include/scsi/scsi_device.h | 25 +
 3 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index b1bf42b..1deb6ad 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -784,8 +784,9 @@ void scsi_attach_vpd(struct scsi_device *sdev)
 	int pg83_supported = 0;
 	unsigned char __rcu *vpd_buf, *orig_vpd_buf = NULL;
 
-	if (sdev->skip_vpd_pages)
+	if (!scsi_device_supports_vpd(sdev))
 		return;
+
 retry_pg0:
 	vpd_buf = kmalloc(vpd_len, GFP_KERNEL);
 	if (!vpd_buf)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5451980..868d58c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2789,23 +2789,6 @@ static void sd_read_write_same(struct scsi_disk *sdkp, unsigned char *buffer)
 		sdkp->ws10 = 1;
 }
 
-static int sd_try_extended_inquiry(struct scsi_device *sdp)
-{
-	/* Attempt VPD inquiry if the device blacklist explicitly calls
-	 * for it.
-	 */
-	if (sdp->try_vpd_pages)
-		return 1;
-	/*
-	 * Although VPD inquiries can go to SCSI-2 type devices,
-	 * some USB ones crash on receiving them, and the pages
-	 * we currently ask for are for SPC-3 and beyond
-	 */
-	if (sdp->scsi_level > SCSI_SPC_2 && !sdp->skip_vpd_pages)
-		return 1;
-	return 0;
-}
-
 /**
  *	sd_revalidate_disk - called the first time a new disk is seen,
  *	performs disk spin up, read_capacity, etc.
@@ -2844,7 +2827,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	if (sdkp->media_present) {
 		sd_read_capacity(sdkp, buffer);
 
-		if (sd_try_extended_inquiry(sdp)) {
+		if (scsi_device_supports_vpd(sdp)) {
 			sd_read_block_provisioning(sdkp);
 			sd_read_block_limits(sdkp);
 			sd_read_block_characteristics(sdkp);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index a5fc682..d9aea6c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -515,6 +515,31 @@ static inline int scsi_device_tpgs(struct scsi_device *sdev)
 	return sdev->inquiry ? (sdev->inquiry[5] >> 4) & 0x3 : 0;
 }
 
+/**
+ * scsi_device_supports_vpd - test if a device supports VPD pages
+ * @sdev: the  
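The body of the new helper is cut off in this archive. As a rough userspace sketch, assuming the helper mirrors the logic of the removed sd_try_extended_inquiry(), the test could look like the following. The mock struct, the DEMO_* level constants and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock SCSI level ordering used only for this sketch. */
enum { DEMO_SCSI_2 = 3, DEMO_SCSI_3 = 4, DEMO_SCSI_SPC_2 = 5, DEMO_SCSI_SPC_3 = 6 };

/* Mock of the few device fields the removed sd.c helper looked at. */
struct demo_sdev {
	int scsi_level;
	bool try_vpd_pages;   /* blacklist flag: force VPD inquiries */
	bool skip_vpd_pages;  /* blacklist flag: never send VPD inquiries */
};

/* VPD is attempted if explicitly requested, or if the device is newer
 * than SPC-2 and has not been flagged to skip VPD pages. */
static bool demo_supports_vpd(const struct demo_sdev *sdev)
{
	assert(sdev != NULL);
	if (sdev->try_vpd_pages)
		return true;
	return sdev->scsi_level > DEMO_SCSI_SPC_2 && !sdev->skip_vpd_pages;
}
```

Sharing one predicate between the VPD attach path and the disk revalidation path keeps the two from disagreeing about which devices may be sent VPD inquiries.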

[PATCH kernel] powerpc: Make vmalloc_to_phys() public

2016-01-20 Thread Alexey Kardashevskiy
This makes vmalloc_to_phys() public as there will be another user
(in-kernel VFIO acceleration) for it soon.

As part of a small future optimization, this changes the helper to call
vmalloc_to_pfn() instead of vmalloc_to_page(), as the size of
struct page may not be a power of two, which makes gcc use
multiply instructions instead of shifts.

Signed-off-by: Alexey Kardashevskiy 
---

A couple of notes:

1. real_vmalloc_addr() will be reworked later by Paul separately;

2. the optimization note is not valid at the moment, as
vmalloc_to_pfn() calls vmalloc_to_page(), which does the actual
search; the functionality of these helpers will be swapped later
(also by Paul).

---
 arch/powerpc/include/asm/pgtable.h | 3 +++
 arch/powerpc/mm/pgtable.c  | 8 
 arch/powerpc/perf/hv-24x7.c| 8 
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index ac9fb11..47897a3 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -78,6 +78,9 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, 
unsigned long ea,
}
return __find_linux_pte_or_hugepte(pgdir, ea, is_thp, shift);
 }
+
+unsigned long vmalloc_to_phys(void *vmalloc_addr);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 83dfd79..de37ff4 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -243,3 +243,11 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long 
addr)
 }
 #endif /* CONFIG_DEBUG_VM */
 
+unsigned long vmalloc_to_phys(void *va)
+{
+   unsigned long pfn = vmalloc_to_pfn(va);
+
+   BUG_ON(!pfn);
+   return __pa(pfn_to_kaddr(pfn)) + offset_in_page(va);
+}
+EXPORT_SYMBOL_GPL(vmalloc_to_phys);
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9f9dfda..3b09ecf 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -493,14 +493,6 @@ static size_t event_to_attr_ct(struct hv_24x7_event_data 
*event)
}
 }
 
-static unsigned long vmalloc_to_phys(void *v)
-{
-   struct page *p = vmalloc_to_page(v);
-
-   BUG_ON(!p);
-   return page_to_phys(p) + offset_in_page(v);
-}
-
 /* */
 struct event_uniq {
struct rb_node node;
-- 
2.5.0.rc3
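The address arithmetic behind the new helper can be illustrated in userspace as follows. The sketch assumes 4 KiB pages and takes the page frame number as an input instead of looking it up; the DEMO_* macros and demo_* functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12u                      /* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)

/* Byte offset of an address within its page. */
static uint64_t demo_offset_in_page(uint64_t addr)
{
	return addr & (DEMO_PAGE_SIZE - 1);
}

/* Physical address = frame base (pfn << shift) + offset within the page. */
static uint64_t demo_to_phys(uint64_t pfn, uint64_t vaddr)
{
	assert(pfn != 0);   /* mirrors the BUG_ON(!pfn) in the patch */
	return (pfn << DEMO_PAGE_SHIFT) + demo_offset_in_page(vaddr);
}
```

Starting from a frame number needs only a shift and an add, which is the property the commit message is after.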



Re: [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-20 Thread Ingo Molnar

Cc:-ed other gents who touched the mutex code recently. Mail quoted below.

Thanks,

Ingo

* Ding Tianhong  wrote:

> I build a script to create several process for ioctl loop calling,
> the ioctl will calling the kernel function just like:
> xx_ioctl {
> ...
> rtnl_lock();
> function();
> rtnl_unlock();
> ...
> }
> The function may sleep several ms, but will not halt, at the same time
> another user service may calling ifconfig to change the state of the
> ethernet, and after several hours, the hung task thread report this problem:
> 
> 
> [149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
> [149738.042284] 88052449bd40 0082 88053a33f300 
> 88052449bfd8
> [149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 
> 819e6240
> [149738.042288] 819e6244 88053a33f300  
> 819e6248
> [149738.042290] Call Trace:
> [149738.042300] [] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [] mutex_lock+0x1f/0x2f
> [149738.042309] [] rtnl_lock+0x15/0x20
> [149738.042311] [] dev_ioctl+0xda/0x590
> [149738.042314] [] ? __do_page_fault+0x21c/0x560
> [149738.042318] [] sock_do_ioctl+0x45/0x50
> [149738.042320] [] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [] ? fget_light+0xa0/0xd0
> 
>  cut here 
> 
> I got the vmcore and found that the ifconfig is already in the wait_list of 
> the
> rtnl_lock for 120 second, but my process could get and release the rtnl_lock
> normally several times in one second, so it means that my process jump the
> queue and the ifconfig couldn't get the rtnl all the time, I check the mutex 
> lock
> slow path and found that the mutex may spin on owner ignore whether the  wait 
> list
> is empty, it will cause the task in the wait list always be cut in line, so 
> add
> test for wait list in the mutex_can_spin_on_owner and avoid this problem.
> 
> Signed-off-by: Ding Tianhong 
> ---
>  kernel/locking/mutex.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
> *lock)
>   struct task_struct *owner;
>   int retval = 1;
>  
> - if (need_resched())
> + if (need_resched() || atomic_read(&lock->count) == -1)
>   return 0;
>  
>   rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex 
> *lock)
>  /*
>   * Optimistic spinning.
>   *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>   *
>   * Since this needs the lock owner, and this mutex implementation
>   * doesn't track the owner atomically in the lock field, we need to
> -- 
> 2.5.0
> 
> 
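The check proposed in the quoted patch can be modelled in userspace with C11 atomics. The sketch assumes the lock-word convention of that mutex implementation (1 unlocked, 0 locked, negative locked with possible waiters). The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock word using the convention the quoted patch tests against:
 * 1 = unlocked, 0 = locked, negative = locked with possible waiters. */
struct demo_mutex {
	atomic_int count;
};

/* Spin only when no reschedule is pending and nobody is queued. */
static bool demo_can_spin(struct demo_mutex *lock, bool need_resched)
{
	assert(lock != NULL);
	if (need_resched || atomic_load(&lock->count) == -1)
		return false;
	return true;
}
```

Refusing to spin while waiters are queued trades some throughput for fairness: a newcomer can no longer repeatedly take the lock ahead of a task that is already sleeping on the wait list.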


Re: [PATCH] MAINTAINERS: Update mailing list for Renesas ARM64 SoC Development

2016-01-20 Thread Magnus Damm
Hi Simon, Linus, everyone,

On Thu, Jan 21, 2016 at 3:21 PM, Simon Horman
 wrote:
> Update the mailing list used for development of support for
> ARM64 Renesas SoCs.
>
> This is a follow-up for a similar change for other Renesas SoCs and
> drivers uses by Renesas SoCs. The ARM64 SoC entry was not updated in
> that patch as it was not yet present in mainline.
>
> The motivation for the mailing list update is that Renesas SoCs are now
> much wider than the SH architecture and there is some desire from some for
> the linux-sh list to refocus on discussion of the work on the SH
> architecture.
>
> Cc: Magnus Damm 
> Signed-off-by: Simon Horman 
> ---
>  MAINTAINERS | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks, looking good!

Acked-by: Magnus Damm 

Cheers,

/ magnus


RE: [Resend PATCH V5 1/1] NTB: Add support for AMD PCI-Express Non-Transparent Bridge

2016-01-20 Thread Yu, Xiangliang
> From: Xiangliang Yu 
> 
> > Signed-off-by: Xiangliang Yu 
> 
> Yes.
> 
> > Reviewed-by: Jon Mason 
> 
> Maybe, but that's for Jon to decide.  If he accepts it, he will add 
> signed-off-by,
> but again, that's for Jon to decide.

Jon also spent a lot of time reviewing the code; I think the patch should
show his work.

> > Reviewed-by: Allen Hubbe 
> 
> Adding my reviewed-by is hardly a reason to resend the whole patch.

It will make merging easier.



[PATCH v3] clk: rockchip: Add support for multiple clock providers

2016-01-20 Thread Xing Zheng
Support for multiple CRUs is likely to be needed in the future, but
the current Rockchip clock framework does not support it.

Therefore, this patch adds a provider as a context parameter to
the clock registration functions, so that they can be called per CRU.

Signed-off-by: Xing Zheng 
---

Changes in v3:
- Remove panic, use the normal error codes for returning.
- The ctx struct include the cru_node and grf members for pll tracking.

Changes in v2:
- Fix the missing call to rockchip_clk_common_cru_init during SoC clock init.

 drivers/clk/rockchip/clk-pll.c|   30 
 drivers/clk/rockchip/clk-rk3036.c |   17 +++--
 drivers/clk/rockchip/clk-rk3188.c |   48 +
 drivers/clk/rockchip/clk-rk3228.c |   17 +++--
 drivers/clk/rockchip/clk-rk3288.c |   19 +++--
 drivers/clk/rockchip/clk-rk3368.c |   21 --
 drivers/clk/rockchip/clk.c|  144 +++--
 drivers/clk/rockchip/clk.h|   49 +
 8 files changed, 229 insertions(+), 116 deletions(-)

diff --git a/drivers/clk/rockchip/clk-pll.c b/drivers/clk/rockchip/clk-pll.c
index b7e66c9..649db7d 100644
--- a/drivers/clk/rockchip/clk-pll.c
+++ b/drivers/clk/rockchip/clk-pll.c
@@ -46,6 +46,8 @@ struct rockchip_clk_pll {
const struct rockchip_pll_rate_table *rate_table;
unsigned intrate_count;
spinlock_t  *lock;
+
+   struct rockchip_clk_provider *ctx;
 };
 
 #define to_rockchip_clk_pll(_hw) container_of(_hw, struct rockchip_clk_pll, hw)
@@ -90,7 +92,7 @@ static long rockchip_pll_round_rate(struct clk_hw *hw,
  */
 static int rockchip_pll_wait_lock(struct rockchip_clk_pll *pll)
 {
-   struct regmap *grf = rockchip_clk_get_grf();
+   struct regmap *grf = rockchip_clk_get_grf(pll->ctx);
unsigned int val;
int delay = 2400, ret;
 
@@ -246,7 +248,7 @@ static int rockchip_rk3036_pll_set_rate(struct clk_hw *hw, 
unsigned long drate,
struct rockchip_clk_pll *pll = to_rockchip_clk_pll(hw);
const struct rockchip_pll_rate_table *rate;
unsigned long old_rate = rockchip_rk3036_pll_recalc_rate(hw, prate);
-   struct regmap *grf = rockchip_clk_get_grf();
+   struct regmap *grf = rockchip_clk_get_grf(pll->ctx);
 
if (IS_ERR(grf)) {
pr_debug("%s: grf regmap not available, aborting rate change\n",
@@ -485,7 +487,7 @@ static int rockchip_rk3066_pll_set_rate(struct clk_hw *hw, 
unsigned long drate,
struct rockchip_clk_pll *pll = to_rockchip_clk_pll(hw);
const struct rockchip_pll_rate_table *rate;
unsigned long old_rate = rockchip_rk3066_pll_recalc_rate(hw, prate);
-   struct regmap *grf = rockchip_clk_get_grf();
+   struct regmap *grf = rockchip_clk_get_grf(pll->ctx);
 
if (IS_ERR(grf)) {
pr_debug("%s: grf regmap not available, aborting rate change\n",
@@ -558,7 +560,7 @@ static void rockchip_rk3066_pll_init(struct clk_hw *hw)
 rate->no, cur.no, rate->nf, cur.nf, rate->nb, cur.nb);
if (rate->nr != cur.nr || rate->no != cur.no || rate->nf != cur.nf
 || rate->nb != cur.nb) {
-   struct regmap *grf = rockchip_clk_get_grf();
+   struct regmap *grf = rockchip_clk_get_grf(pll->ctx);
 
if (IS_ERR(grf))
return;
@@ -590,12 +592,13 @@ static const struct clk_ops rockchip_rk3066_pll_clk_ops = 
{
  * Common registering of pll clocks
  */
 
-struct clk *rockchip_clk_register_pll(enum rockchip_pll_type pll_type,
+struct clk *rockchip_clk_register_pll(struct rockchip_clk_provider *ctx,
+   enum rockchip_pll_type pll_type,
const char *name, const char *const *parent_names,
-   u8 num_parents, void __iomem *base, int con_offset,
-   int grf_lock_offset, int lock_shift, int mode_offset,
-   int mode_shift, struct rockchip_pll_rate_table *rate_table,
-   u8 clk_pll_flags, spinlock_t *lock)
+   u8 num_parents, int con_offset, int grf_lock_offset,
+   int lock_shift, int mode_offset, int mode_shift,
+   struct rockchip_pll_rate_table *rate_table,
+   u8 clk_pll_flags)
 {
const char *pll_parents[3];
struct clk_init_data init;
@@ -619,11 +622,11 @@ struct clk *rockchip_clk_register_pll(enum 
rockchip_pll_type pll_type,
/* create the mux on top of the real pll */
pll->pll_mux_ops = &clk_mux_ops;
pll_mux = &pll->pll_mux;
-   pll_mux->reg = base + mode_offset;
+   pll_mux->reg = ctx->reg_base + mode_offset;
pll_mux->shift = mode_shift;
pll_mux->mask = PLL_MODE_MASK;
pll_mux->flags = 0;
-   pll_mux->lock = lock;
+   pll_mux->lock = &ctx->lock;
pll_mux->hw.init = &init;
 
if (pll_type == pll_rk3036 || pll_type == pll_rk3066)
@@ -690,11 +693,12 @@ struct clk *rockchip_clk_register_pll(enum 
rockchip_pll_type pll_type,
 

co-maintainance of remoteproc/rpmsg/hwspinlock

2016-01-20 Thread Ohad Ben-Cohen
Hi everyone,

Due to lack of time in the next few months, I've asked Bjorn Andersson for
help with the remoteproc/rpmsg/hwspinlock maintainership.

Bjorn has kindly agreed to step up and co-maintain
remoteproc/rpmsg/hwspinlock with me, and we expect that Bjorn will start
picking up patches as soon as the next development cycle begins.

Bjorn, thanks so much and please feel free to send a MAINTAINERS patch
adding yourself as a maintainer for these three subsystems.

I will still be available for any assistance or questions,

Best wishes,
Ohad.


[PATCH v3 2/4] Documentation: fsl-quadspi: Add fsl, ls2080a-qspi compatible string

2016-01-20 Thread Yuan Yao
new compatible string: "fsl,ls2080a-qspi".

Signed-off-by: Yuan Yao 
---
Changed in v3:
Add the modifier for new compatible string like:
"fsl,ls2080a-qspi" followed by "fsl,ls1021a-qspi"

Changed in v2:
Update my email to 
---
 Documentation/devicetree/bindings/mtd/fsl-quadspi.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt 
b/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
index 00c587b..0df2f3a 100644
--- a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
+++ b/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
@@ -3,7 +3,9 @@
 Required properties:
   - compatible : Should be "fsl,vf610-qspi", "fsl,imx6sx-qspi",
 "fsl,imx7d-qspi", "fsl,imx6ul-qspi",
-"fsl,ls1021-qspi"
+"fsl,ls1021a-qspi"
+or
+"fsl,ls2080a-qspi" followed by "fsl,ls1021a-qspi"
   - reg : the first contains the register location and length,
   the second contains the memory mapping address and length
   - reg-names: Should contain the reg names "QuadSPI" and "QuadSPI-memory"
-- 
2.1.0.27.g96db324



[PATCH v3 4/4] Documentation: fsl-quadspi: Add optional properties

2016-01-20 Thread Yuan Yao
Add optional properties for QSPI:
big-endian
if the register is big endian on this platform.

Signed-off-by: Yuan Yao 
---
Changed in v3:
No changes.

Changed in v2:
Update my email to 
---
 Documentation/devicetree/bindings/mtd/fsl-quadspi.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt 
b/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
index 0df2f3a..0333ec8 100644
--- a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
+++ b/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
@@ -21,6 +21,7 @@ Optional properties:
  But if there are two NOR flashes connected to the
  bus, you should enable this property.
  (Please check the board's schematic.)
+  - big-endian : That means the IP register is big endian
 
 Example:
 
-- 
2.1.0.27.g96db324



[PATCH v3 1/4] Documentation: fsl-quadspi: Add fsl,ls2080a-dspi compatible string

2016-01-20 Thread Yuan Yao
new compatible string: "fsl,ls2080a-dspi".

Signed-off-by: Yuan Yao 
---
Changed in v3:
Add the modifier for new compatible string like:
"fsl,ls2080a-dspi" followed by "fsl,ls2085a-dspi"

Changed in v2:
Update my email to 
---
 Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt 
b/Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt
index fa77f87..1ad0fe3 100644
--- a/Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt
+++ b/Documentation/devicetree/bindings/spi/spi-fsl-dspi.txt
@@ -1,7 +1,10 @@
 ARM Freescale DSPI controller
 
 Required properties:
-- compatible : "fsl,vf610-dspi", "fsl,ls1021a-v1.0-dspi", "fsl,ls2085a-dspi"
+- compatible : "fsl,vf610-dspi", "fsl,ls1021a-v1.0-dspi",
+   "fsl,ls2085a-dspi"
+   or
+   "fsl,ls2080a-dspi" followed by "fsl,ls2085a-dspi"
 - reg : Offset and length of the register set for the device
 - interrupts : Should contain SPI controller interrupt
 - clocks: from common clock binding: handle to dspi clock.
-- 
2.1.0.27.g96db324



Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup

2016-01-20 Thread Jeff Merkey
Ok, here's what I found after several hours of debugging and reviewing
this subsystem:

This subsystem plays pretty loose in doing its math on 64 bit
registers.  I traced through ktime_get_ts64 hundreds of times and
sampled data running through it, and from what I saw, just normal
operation comes dangerously close to causing the RAX register to
wrap.   If the delta gets too big it does wrap, and I observed it
happening with the debugger tracing through the code.  It wraps
because of a sar instruction generated from the inline macros.

The wrap happens in this inline function.

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
        cycle_t delta;
        s64 nsec;

        delta = timekeeping_get_delta(tkr);

        nsec = delta * tkr->mult + tkr->xtime_nsec;
        nsec >>= tkr->shift;    << wrap caused here

        /* If arch requires, add in get_arch_timeoffset() */
        return nsec + arch_gettimeoffset();
}

You only have 64 bits of register and the numbers being calculated
here are big.   By way of example, I observed the following during
normal operations:

delta (RAX)     | tkr->mult (RDX)

0x157876          0x65ee27
0xf1855           0x65f158
0x16cf05          0x65f408
0x303bc3          0x65f154

When this bug occurs, it is a different story.

delta (RAX)     | tkr->mult (RDX)

0x243283994b8     0x65233

So it goes like this:

nsec = delta * tkr->mult + tkr->xtime_nsec;
0x243283994b8 * 0x65233
imul   rax,rdx = 0xE6A2Ce1f1ea690a8

nsec >>= tkr->shift;    << wrap caused here
sar    rax,cl  =  0xFFE6BFB3B7C3

the sar instruction doesn't just shift, it backfills the signedness of
the value, so this instruction is not doing what the C code is asking
it to do.  I am guessing that somewhere in this mass of macros,
something may have gotten declared wrong or incomplete (declared
signed ?).
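
The sign-fill behaviour described above can be reproduced in user space. The
following is only a minimal sketch: the function names are made up for the
illustration, and the values used with it are chosen to set the sign bit, they
are not taken from the trace. It assumes gcc on x86-64, where converting an
out-of-range value to int64_t wraps and right-shifting a negative value is an
arithmetic shift; both are implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Arithmetic right shift, as sar does: the sign bit is copied into the
 * vacated high bits. The conversion to int64_t and the shift of a negative
 * value are implementation-defined in C; this sketch assumes gcc on x86-64.
 */
int64_t shift_as_signed(uint64_t bits, unsigned int shift)
{
	assert(shift < 64);
	return (int64_t)bits >> shift;
}

/* Logical right shift, as shr does: the vacated high bits become zero. */
uint64_t shift_as_unsigned(uint64_t bits, unsigned int shift)
{
	assert(shift < 64);
	return bits >> shift;
}

/*
 * Hypothetical stand-in for the delta * mult step. The multiply is done on
 * unsigned operands so that an overflow wraps instead of being undefined.
 */
uint64_t scaled_cycles(uint64_t delta, uint32_t mult)
{
	return delta * mult;
}
```

With delta = 1 << 40, mult = 0xC00000 and shift = 24, the product has its top
bit set, so shift_as_signed() returns a negative number while
shift_as_unsigned() returns a small positive one for the same bit pattern.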

The assembler output for this section that calls the macro to
calculate nsecs shows the sar instruction:

delta = timekeeping_get_delta(tkr);

nsec = delta * tkr->mult + tkr->xtime_nsec;
 29b:   48 0f af c2             imul   %rdx,%rax
 29f:   48 03 05 00 00 00 00    add    0x0(%rip),%rax        # 2a6

nsec >>= tkr->shift;
 2a6:   48 d3 f8                sar    %cl,%rax


There is another problem: when this bug occurs, tkr->read returns an
unchanging, unclearable number for the delta value.  It appears that,
for whatever reason, the clock has gone to sleep or gone away
and is no longer updating its counters.

static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
{
        cycle_t cycle_now, delta;

        /* read clocksource */
        cycle_now = tkr->read(tkr->clock);    << returns the same value after this bug happens

        /* calculate the delta since the last update_wall_time */
        delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);    << cycle_last is also the same value

        return delta;
}

This problem appears to have several things happening at once.
Probably the most concerning is that the assembler output is making
some assumptions about the SIGNEDNESS of the values being shifted and
using sar instead of shr instructions.

I am also concerned about the tkr->read function returning an
unchanging value when this problem shows up.

This subsystem plays it fast and loose with its math, and if the clock
gets delayed or out of sync, it will wrap in the above function and it
will trigger the Hard Lockup detector if the value is large enough in
RAX.  The sanity check for CONFIG_DEBUG_TIMEKEEPER does not catch the
code path where this delta value gets set, because the function that
updates the delta is called from more than just the function that
checks for an overflow, and the wrap case happens underneath it.

I would check how these structs are defined and the vars in them to
see if somewhere they are declared as signed values to the compiler,
because that's what it thinks it was given to compile.

I am still debugging the tkr->read issue.  I have determined the cause
of the wrap in the assembler.  Why the gcc compiler is outputting
this instruction here is still to be determined.

Jeff


Re: [PATCH 3.10 00/35] 3.10.95-stable review

2016-01-20 Thread Willy Tarreau
On Wed, Jan 20, 2016 at 04:14:51PM -0700, Shuah Khan wrote:
> On 01/20/2016 03:00 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.10.95 release.
> > There are 35 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Fri Jan 22 21:19:15 UTC 2016.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.10.95-rc1.gz
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

And running fine on my laptop FWIW.

Willy



Re: [PATCH 1/2] regulator: ltc3589: make IRQ optional

2016-01-20 Thread Lothar Waßmann
Hi,

> On Wed, Jan 20, 2016 at 01:29:51PM +0100, Lothar Waßmann wrote:
> 
> > This pin is used as IRQ pin for the LTC3589 PMIC on the Ka-Ro
> > electronics TX48 module. Make the IRQ optional in the driver and use a
> > polling routine instead if no IRQ is specified in DT.
> > Otherwise the driver will continuously generate interrupts and make
> > the system unusable.
> 
> How will the driver generate interrupts if there is no interrupt
> physically present in the system?
>
It's using timer interrupts to poll the LTC3589 state.


Lothar Waßmann


Re: [RFC PATCH 1/2] lightnvm: specify target's logical address area

2016-01-20 Thread Wenwei Tao
2016-01-20 21:03 GMT+08:00 Matias Bjørling :
> On 01/15/2016 12:44 PM, Wenwei Tao wrote:
>> We can create more than one target on a lightnvm
>> device by specifying its begin lun and end lun.
>>
>> But only specify the physical address area is not
>> enough, we need to get the corresponding non-
>> intersection logical address area division from
>> the backend device's logcial address space.
>> Otherwise the targets on the device might use
>> the same logical addresses and this will cause
>> incorrect information in the device's l2p table.
>>
>> Signed-off-by: Wenwei Tao 
>> ---
>>  drivers/lightnvm/core.c   |  1 +
>>  drivers/lightnvm/gennvm.c | 57 
>> +++
>>  drivers/lightnvm/gennvm.h |  7 ++
>>  drivers/lightnvm/rrpc.c   | 44 
>>  drivers/lightnvm/rrpc.h   |  1 +
>>  include/linux/lightnvm.h  |  8 +++
>>  6 files changed, 114 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>> index 8f41b24..d938636 100644
>> --- a/drivers/lightnvm/core.c
>> +++ b/drivers/lightnvm/core.c
>> @@ -238,6 +238,7 @@ static int nvm_core_init(struct nvm_dev *dev)
>>   dev->nr_chnls;
>>   dev->total_pages = dev->total_blocks * dev->pgs_per_blk;
>>   INIT_LIST_HEAD(&dev->online_targets);
>> + spin_lock_init(&dev->lock);
>>
>>   return 0;
>>  }
>> diff --git a/drivers/lightnvm/gennvm.c b/drivers/lightnvm/gennvm.c
>> index 62c6f4d..f7c4495 100644
>> --- a/drivers/lightnvm/gennvm.c
>> +++ b/drivers/lightnvm/gennvm.c
>> @@ -20,6 +20,59 @@
>>
>>  #include "gennvm.h"
>>
>> +static sector_t gennvm_get_area(struct nvm_dev *dev, sector_t size)
>> +{
>> + struct gen_nvm *gn = dev->mp;
>> + struct gennvm_area *area, *prev;
>> + sector_t start = 0;
>
> Rename to begin?

okay with me.

>
>> + int page_size = dev->sec_size * dev->sec_per_pg;
>> + sector_t max = page_size * dev->total_pages >> 9;
>
> Can we put parentheses around this, just for clarity. Maybe also rename
> the variable to max_sect/max_sectors?
>

okay with me, I will add these changes in v2.

>> +
>> + if (size > max)
>> + return -EINVAL;
>> + area = kmalloc(sizeof(*area), GFP_KERNEL);
>
> I prefer sizeof(struct gennvm_area)
>

I use sizeof(*area) because it's short; sizeof(struct gennvm_area)
is also okay with me.

>> + if (!area)
>> + return -ENOMEM;
>> +
>> + spin_lock(&dev->lock);
>> + list_for_each_entry(prev, &gn->area_list, list) {
>> + if (start + size > prev->start) {
>> + start = prev->end;
>> + continue;
>> + }
>> + break;
>> + }
>> +
>> + if (start + size > max) {
>
> Same with parentheses here. Just for clarity.
>

okay

>> + spin_unlock(&dev->lock);
>> + kfree(area);
>> + return -EINVAL;
>> + }
>> +
>> + area->start = start;
>> + area->end = start + size;
>> + list_add(&area->list, &prev->list);
>> + spin_unlock(&dev->lock);
>> + return start;
>> +}
>> +
>> +static void gennvm_put_area(struct nvm_dev *dev, sector_t start)
>> +{
>> + struct gen_nvm *gn = dev->mp;
>> + struct gennvm_area *area;
>> +
>> + spin_lock(&dev->lock);
>> + list_for_each_entry(area, &gn->area_list, list) {
>> + if (area->start == start) {
>> + list_del(&area->list);
>> + spin_unlock(&dev->lock);
>> + kfree(area);
>> + return;
>> + }
>> + }
>> + spin_unlock(&dev->lock);
>> +}
>> +
>>  static void gennvm_blocks_free(struct nvm_dev *dev)
>>  {
>>   struct gen_nvm *gn = dev->mp;
>> @@ -228,6 +281,7 @@ static int gennvm_register(struct nvm_dev *dev)
>>
>>   gn->dev = dev;
>>   gn->nr_luns = dev->nr_luns;
>> + INIT_LIST_HEAD(&gn->area_list);
>>   dev->mp = gn;
>>
>>   ret = gennvm_luns_init(dev, gn);
>> @@ -506,6 +560,9 @@ static struct nvmm_type gennvm = {
>>
>>   .get_lun= gennvm_get_lun,
>>   .lun_info_print = gennvm_lun_info_print,
>> +
>> + .get_area   = gennvm_get_area,
>> + .put_area   = gennvm_put_area,
>>  };
>>
>>  static int __init gennvm_module_init(void)
>> diff --git a/drivers/lightnvm/gennvm.h b/drivers/lightnvm/gennvm.h
>> index 9c24b5b..b51813a 100644
>> --- a/drivers/lightnvm/gennvm.h
>> +++ b/drivers/lightnvm/gennvm.h
>> @@ -39,6 +39,13 @@ struct gen_nvm {
>>
>>   int nr_luns;
>>   struct gen_lun *luns;
>> + struct list_head area_list;
>> +};
>> +
>> +struct gennvm_area {
>> + struct list_head list;
>> + sector_t start;
>
> Begin/end fits better.
>

I will change it in v2.

>> + sector_t end;   /* end is excluded */
>>  };
>>
>>  #define gennvm_for_each_lun(bm, lun, i) \
>> diff --git a/drivers/lightnvm/rrpc.c b/drivers/lightnvm/rrpc.c
>> index 8628a5d..ab1d17a 100644
>> --- a/drivers/lightnvm/rrpc.c
>> +++ b/drivers/lightnvm/rrpc.c
>> @@ -1017,7 +1017,17 @@ 

Re: [PATCH v2 5/5] perf/x86/amd/power: Add AMD accumulated power reporting mechanism

2016-01-20 Thread Huang Rui
On Wed, Jan 20, 2016 at 10:22:44AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 20, 2016 at 12:48:24PM +0800, Huang Rui wrote:
> > Hi Peter,
> > 
> > Thanks so much to your comments.
> > 
> > On Tue, Jan 19, 2016 at 01:12:50PM +0100, Peter Zijlstra wrote:
> > > On Thu, Jan 14, 2016 at 10:50:08AM +0800, Huang Rui wrote:
> > > > +struct power_pmu {
> > > > +   spinlock_t  lock;
> > > 
> > > This should be a raw_spinlock_t, as it'll be nested under other
> > > raw_spinlock_t's.
> > > 
> > 
> > Do you mean the following spinlock operations are in hardware
> > interrupts disabled case, so I need use raw_spinlock_t instead, right?
> 
> 
>                   mainline      -rt
> 
> raw_spinlock_t    spin-waits    spin-waits
> spinlock_t        spin-waits    blocks (rt-mutex)
> struct mutex      blocks        blocks (rt-mutex)
> 
> 
> since these functions are themselves called with raw_spinlock_t held
> (perf_event_context::lock for example, but also rq::lock), any lock
> nested inside them must also be raw_spinlock_t.
> 

I see, thank you. :-)

I just took a quick look at how spinlocks work on -rt. The realtime
Linux kernel provides two kinds of spinlock: the original spinlock_t is
replaced by one that is able to sleep, much like a mutex, while the other
one (raw_spinlock_t, which you mentioned here) keeps the non-sleeping
behavior, i.e. it is the real spinlock.

And my lock here will also be nested under perf_event_context::lock,
right?

> I have a lockdep patch somewhere that checks these ordering things; I
> should rebase and post that again.
> 

Can you CC me when you post that patch next time?

> > Use raw_spin_lock_irqsave/raw_spin_unlock_irqrestore?
> 
> pmu::{start,stop,add,del} will be called with IRQs already disabled.
> 
> > > > +static int power_cpu_init(int cpu)
> > > > +{
> > > > +   int i, cu, ret = 0;
> > > > +   cpumask_var_t mask, dummy_mask;
> > > > +
> > > > +   cu = cpu / cores_per_cu;
> > > > +
> > > > +   if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
> > > > +   return -ENOMEM;
> > > > +
> > > > +   if (!zalloc_cpumask_var(&dummy_mask, GFP_KERNEL)) {
> > > > +   ret = -ENOMEM;
> > > > +   goto out;
> > > > +   }
> > > > +
> > > > +   for (i = 0; i < cores_per_cu; i++)
> > > > +   cpumask_set_cpu(i, mask);
> > > > +
> > > > +   cpumask_shift_left(mask, mask, cu * cores_per_cu);
> > > > +
> > > > +   if (!cpumask_and(dummy_mask, mask, _mask))
> > > > +   cpumask_set_cpu(cpu, _mask);
> > > > +
> > > > +   free_cpumask_var(dummy_mask);
> > > > +out:
> > > > +   free_cpumask_var(mask);
> > > > +
> > > > +   return ret;
> > > > +}
> > > 
> > > > +static int power_cpu_notifier(struct notifier_block *self,
> > > > + unsigned long action, void *hcpu)
> > > > +{
> > > > +   unsigned int cpu = (long)hcpu;
> > > > +
> > > > +   switch (action & ~CPU_TASKS_FROZEN) {
> > > > +   case CPU_UP_PREPARE:
> > > > +   if (power_cpu_prepare(cpu))
> > > > +   return NOTIFY_BAD;
> > > > +   break;
> > > > +   case CPU_STARTING:
> > > > +   if (power_cpu_init(cpu))
> > > > +   return NOTIFY_BAD;
> > > 
> > > this is called with IRQs disabled, which makes those GFP_KERNEL allocs
> > > above a pretty bad idea.
> > > 
> > 
> > Right, so should I use GFP_ATOMIC to allocate cpumask here?
> 
> One should not use GFP_ATOMIC if at all possible, also no, -rt cannot do
> _any_ allocations from this site.
> 

OK, that's because the allocation might sleep while IRQs are disabled,
which is incorrect.

> > > Also, note that -rt cannot actually do _any_ allocations/frees from
> > > STARTING.
> > > 
> > > Please move the allocs/frees to PREPARE/ONLINE.
> > > 
> > 
> > How about add two cpumask_var_t at power_pmu structure? Then allocate
> > the two cpumask_var_t (pmu->mask, pmu->dummy_mask), and they can be
> > also used on power_cpu_init.
> 
> That would work.

I drafted an updated diff based on the original patch; please take a
look.

8<--

diff --git a/arch/x86/kernel/cpu/perf_event_amd_power.c 
b/arch/x86/kernel/cpu/perf_event_amd_power.c
index 69ef234..e71d993 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_power.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_power.c
@@ -46,10 +46,17 @@ static unsigned int cu_num;
 static u64 max_cu_acc_power;
 
 struct power_pmu {
-   spinlock_t  lock;
+   raw_spinlock_t  lock;
struct list_headactive_list;
struct pmu  *pmu; /* pointer to power_pmu_class */
local64_t   cpu_sw_pwr_ptsc;
+   /*
+* These two cpumasks is used for avoiding the allocations on
+* CPU_STARTING phase. Because power_cpu_prepare will be
+* called on IRQs disabled 

RE: [Resend PATCH V5 1/1] NTB: Add support for AMD PCI-Express Non-Transparent Bridge

2016-01-20 Thread Hubbe, Allen
From: Xiangliang Yu 

> Signed-off-by: Xiangliang Yu 

Yes.

> Reviewed-by: Jon Mason 

Maybe, but that's for Jon to decide.  If he accepts it, he will add 
signed-off-by, but again, that's for Jon to decide.

> Reviewed-by: Allen Hubbe 

Adding my reviewed-by is hardly a reason to resend the whole patch.


[PATCH v2] mmc: dw_mmc: remove DW_MCI_QUIRK_BROKEN_CARD_DETECTION quirk

2016-01-20 Thread Shawn Lin
dw_mmc already uses mmc_of_parse to get the "broken-cd" property,
but it also treats "broken-cd" as a quirk in its driver. We
don't need this quirk here; just take what we need from
mmc->caps.

Signed-off-by: Shawn Lin 

---

Changes in v2:
- fix incorrect use of cur_slot

 drivers/mmc/host/dw_mmc.c  | 35 ++-
 include/linux/mmc/dw_mmc.h |  4 +---
 2 files changed, 11 insertions(+), 28 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 7128351..96f173b 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1450,12 +1450,11 @@ static int dw_mci_get_cd(struct mmc_host *mmc)
 {
int present;
struct dw_mci_slot *slot = mmc_priv(mmc);
-   struct dw_mci_board *brd = slot->host->pdata;
struct dw_mci *host = slot->host;
int gpio_cd = mmc_gpio_get_cd(mmc);
 
/* Use platform get_cd function, else try onboard card detect */
-   if ((brd->quirks & DW_MCI_QUIRK_BROKEN_CARD_DETECTION) ||
+   if ((mmc->caps & MMC_CAP_NEEDS_POLL) ||
(mmc->caps & MMC_CAP_NONREMOVABLE))
present = 1;
else if (!IS_ERR_VALUE(gpio_cd))
@@ -2840,23 +2839,13 @@ static void dw_mci_dto_timer(unsigned long arg)
 }
 
 #ifdef CONFIG_OF
-static struct dw_mci_of_quirks {
-   char *quirk;
-   int id;
-} of_quirks[] = {
-   {
-   .quirk  = "broken-cd",
-   .id = DW_MCI_QUIRK_BROKEN_CARD_DETECTION,
-   },
-};
-
 static struct dw_mci_board *dw_mci_parse_dt(struct dw_mci *host)
 {
struct dw_mci_board *pdata;
struct device *dev = host->dev;
struct device_node *np = dev->of_node;
const struct dw_mci_drv_data *drv_data = host->drv_data;
-   int idx, ret;
+   int ret;
u32 clock_frequency;
 
pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
@@ -2871,11 +2860,6 @@ static struct dw_mci_board *dw_mci_parse_dt(struct 
dw_mci *host)
pdata->num_slots = 1;
}
 
-   /* get quirks */
-   for (idx = 0; idx < ARRAY_SIZE(of_quirks); idx++)
-   if (of_get_property(np, of_quirks[idx].quirk, NULL))
-   pdata->quirks |= of_quirks[idx].id;
-
if (of_property_read_u32(np, "fifo-depth", &pdata->fifo_depth))
dev_info(dev,
 "fifo-depth property not found, using value of FIFOTH 
register as default\n");
@@ -2908,18 +2892,19 @@ static struct dw_mci_board *dw_mci_parse_dt(struct 
dw_mci *host)
 
 static void dw_mci_enable_cd(struct dw_mci *host)
 {
-   struct dw_mci_board *brd = host->pdata;
unsigned long irqflags;
u32 temp;
int i;
+   struct dw_mci_slot *slot;
 
-   /* No need for CD if broken card detection */
-   if (brd->quirks & DW_MCI_QUIRK_BROKEN_CARD_DETECTION)
-   return;
-
-   /* No need for CD if all slots have a non-error GPIO */
+   /*
+* No need for CD if all slots have a non-error GPIO
+* as well as broken card detection is found.
+*/
for (i = 0; i < host->num_slots; i++) {
-   struct dw_mci_slot *slot = host->slot[i];
+   slot = host->slot[i];
+   if (slot->mmc->caps & MMC_CAP_NEEDS_POLL)
+   return;
 
if (IS_ERR_VALUE(mmc_gpio_get_cd(slot->mmc)))
break;
diff --git a/include/linux/mmc/dw_mmc.h b/include/linux/mmc/dw_mmc.h
index 89df7ab..250d822 100644
--- a/include/linux/mmc/dw_mmc.h
+++ b/include/linux/mmc/dw_mmc.h
@@ -235,10 +235,8 @@ struct dw_mci_dma_ops {
 };
 
 /* IP Quirks/flags. */
-/* Unreliable card detection */
-#define DW_MCI_QUIRK_BROKEN_CARD_DETECTION BIT(0)
 /* Timer for broken data transfer over scheme */
-#define DW_MCI_QUIRK_BROKEN_DTO    BIT(1)
+#define DW_MCI_QUIRK_BROKEN_DTO    BIT(0)
 
 struct dma_pdata;
 
-- 
2.3.7




[PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-20 Thread Ding Tianhong
I wrote a script that creates several processes calling an ioctl in a loop.
The ioctl calls a kernel function like this:
xx_ioctl {
...
rtnl_lock();
function();
rtnl_unlock();
...
}
The function may sleep for several ms, but it does not hang. At the same time,
another user service may call ifconfig to change the state of the
ethernet device, and after several hours the hung task thread reports this problem:


[149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
[149738.042284] 88052449bd40 0082 88053a33f300 
88052449bfd8
[149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 
819e6240
[149738.042288] 819e6244 88053a33f300  
819e6248
[149738.042290] Call Trace:
[149738.042300] [] schedule_preempt_disabled+0x29/0x70
[149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [] mutex_lock+0x1f/0x2f
[149738.042309] [] rtnl_lock+0x15/0x20
[149738.042311] [] dev_ioctl+0xda/0x590
[149738.042314] [] ? __do_page_fault+0x21c/0x560
[149738.042318] [] sock_do_ioctl+0x45/0x50
[149738.042320] [] sock_ioctl+0x1f0/0x2c0
[149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [] ? fget_light+0xa0/0xd0

 cut here 

I got the vmcore and found that ifconfig had already been in the wait_list of
the rtnl_lock for 120 seconds, while my processes could acquire and release the
rtnl_lock normally several times per second. This means my processes jump the
queue and ifconfig never gets the rtnl_lock. I checked the mutex lock
slow path and found that the mutex may spin on the owner regardless of whether
the wait list is empty, so a task in the wait list can be cut in line
indefinitely. Add a test for the wait list in mutex_can_spin_on_owner to avoid
this problem.

Signed-off-by: Ding Tianhong 
---
 kernel/locking/mutex.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
*lock)
struct task_struct *owner;
int retval = 1;
 
-   if (need_resched())
+   if (need_resched() || atomic_read(&lock->count) == -1)
return 0;
 
rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex 
*lock)
 /*
  * Optimistic spinning.
  *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
  *
  * Since this needs the lock owner, and this mutex implementation
  * doesn't track the owner atomically in the lock field, we need to
-- 
2.5.0
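
The effect of the added check can be modelled in user space. What follows is
only a sketch under stated assumptions, not the kernel implementation:
struct model_mutex and model_can_spin_on_owner() are hypothetical names, the
counter follows the convention 1 = unlocked, 0 = locked, negative = locked
with possible waiters, and the need_resched and owner-running conditions are
passed in as plain flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct mutex; only the counter is modelled. */
struct model_mutex {
	/* 1: unlocked, 0: locked, negative: locked, waiters may be queued */
	atomic_int count;
};

/*
 * Decide whether a new contender may spin optimistically.
 * In the kernel the two flags would come from need_resched() and from the
 * lock owner's on_cpu state; here they are parameters of the model.
 */
bool model_can_spin_on_owner(struct model_mutex *lock, bool need_resched,
			     bool owner_running)
{
	assert(lock != NULL);

	if (need_resched)
		return false;

	/* The check proposed by the patch: queued waiters go first. */
	if (atomic_load(&lock->count) == -1)
		return false;

	return owner_running;
}
```

In this model a contender that finds the counter at -1 falls through to the
slow path and queues behind the existing waiters instead of spinning and
taking the lock ahead of them.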




[Resend PATCH V5 0/1] AMD NTB V5 changes

2016-01-20 Thread Xiangliang Yu
Resending V5 so that it is more convenient to pick up.
Main change in V5:
Only changed Signed-off-by to Reviewed-by.

Xiangliang Yu (1):
  [Resend patch V5] NTB: Add support for AMD PCI-Express Non-Transparent Bridge

 MAINTAINERS |6 +
 drivers/ntb/hw/Kconfig  |1 +
 drivers/ntb/hw/Makefile |1 +
 drivers/ntb/hw/amd/Kconfig  |7 +
 drivers/ntb/hw/amd/Makefile |1 +
 drivers/ntb/hw/amd/ntb_hw_amd.c | 1143 +++
 drivers/ntb/hw/amd/ntb_hw_amd.h |  217 
 7 files changed, 1376 insertions(+)
 create mode 100644 drivers/ntb/hw/amd/Kconfig
 create mode 100644 drivers/ntb/hw/amd/Makefile
 create mode 100644 drivers/ntb/hw/amd/ntb_hw_amd.c
 create mode 100644 drivers/ntb/hw/amd/ntb_hw_amd.h

-- 
1.9.1



Re: [PATCH 0/6] perf core: Read from overwrite ring buffer

2016-01-20 Thread Wangnan (F)



On 2016/1/20 10:20, Alexei Starovoitov wrote:

On Wed, Jan 20, 2016 at 09:37:42AM +0800, Wangnan (F) wrote:


On 2016/1/20 1:42, Alexei Starovoitov wrote:

On Tue, Jan 19, 2016 at 11:16:44AM +, Wang Nan wrote:

This patchset introduces two methods to support reading from overwrite.

  1) Tailsize: write the size of an event at the end of it
  2) Backward writing: write the ring buffer from the end of it to the
 beginning.

What happened to your other idea of moving the whole header to the end?
That felt better than either of these options.

I'll try it today. However, putting all of the three together is
not as easy as this patchset.

I'm missing something. Why all three in one set?


I can't implement all three in one, but implementing two of them makes
benchmarking simpler :)

Here are some numbers.

I attach a target program at the end of this mail. It calls
close(-1) 3000000 times and uses gettimeofday to check
how many us it takes.

Following cases are tested:


 BASE    : ./a.out
 RAWPERF : ./perf record -o /dev/null -e raw_syscalls:* ./a.out
 WRTBKWRD: ./perf record -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out
 TAILSIZE: ./perf record --no-has-write-backward -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out
 RAWOVWRT: ./perf record --no-has-write-backward --no-has-tailsize -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out


With this script:

func() {
for x in `seq 1 100` ; do $1; done | tee data_$2
}

func ./a.out base
func "./perf record -o /dev/null -e raw_syscalls:* ./a.out" rawperf
func "./perf record -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out" wrtbkwrd
func "./perf record -o /dev/null --no-has-write-backward -e raw_syscalls:*/overwrite/ ./a.out" tailsize
func "./perf record -o /dev/null --no-has-write-backward --no-has-tailsize -o /dev/null -e raw_syscalls:*/overwrite/ ./a.out" rawovwrt


Result:

           MEAN          STDVAR
BASE    :   879870.81     11913.13
RAWPERF :  2603854.7     706658.4
WRTBKWRD:  2313301.220     6727.957
TAILSIZE:  2383051.860     5248.061
RAWOVWRT:  2315273.180     5221.025

So it seems the backward writing method is good enough. We don't need to
consider the tailsize method.
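
To put the means above on a per-call basis, the difference to the BASE run
can be divided by the number of close(-1) calls in the test program
(1000 * 1000 * 3). The helper below is only a sketch for that arithmetic;
the function names are made up here and the figures are the MEAN values
from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of close(-1) calls made by the test program. */
#define ITERATIONS (1000L * 1000L * 3L)

/* Extra microseconds per call relative to the untraced BASE run. */
double per_call_overhead_us(double mean_us, double base_us)
{
	assert(mean_us >= base_us);
	return (mean_us - base_us) / (double)ITERATIONS;
}

/* Print the per-call overhead for each traced configuration. */
void print_overheads(void)
{
	static const struct {
		const char *name;
		double mean_us;
	} runs[] = {
		{ "RAWPERF",  2603854.7 },
		{ "WRTBKWRD", 2313301.220 },
		{ "TAILSIZE", 2383051.860 },
		{ "RAWOVWRT", 2315273.180 },
	};
	const double base_us = 879870.81;
	size_t i;

	for (i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%-9s %.3f us/call\n", runs[i].name,
		       per_call_overhead_us(runs[i].mean_us, base_us));
}
```

On these numbers WRTBKWRD costs roughly 0.48 us per call over BASE and
TAILSIZE roughly 0.50 us, which matches the conclusion above.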

Code for this benchmark can be found at:

https://git.kernel.org/cgit/linux/kernel/git/pi3orama/linux.git/ 
perf/overwrite-benchmark


Thank you.

 Test program --
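#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

int main()
{
	int i;
	struct timeval tv1, tv2;
	long long us1, us2;

	gettimeofday(&tv1, NULL);
	/* 3,000,000 failing close() calls: pure syscall entry/exit cost */
	for (i = 0; i < 1000 * 1000 * 3; i++) {
		close(-1);
	}
	gettimeofday(&tv2, NULL);

	/* convert both timestamps to microseconds */
	us1 = tv1.tv_sec * 1000000LL + tv1.tv_usec;
	us2 = tv2.tv_sec * 1000000LL + tv2.tv_usec;
	printf("%lld\n", us2 - us1);

	return 0;
}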



Re: [PATCH] mmc: dw_mmc: remove DW_MCI_QUIRK_BROKEN_CARD_DETECTION quirk

2016-01-20 Thread Shawn Lin

On 2016/1/21 9:43, Jaehoon Chung wrote:

Hi, Shawn.

After applying this patch to my dw-mmc git tree, I found a problem.
So I will revert it until the problem is fixed.



Oops, this patch was based on some unsubmitted ones which simplify
the probe flow. Although I did remember to pick only this one out of
my Gerrit and rebase it on Ulf's next, somehow I sent the wrong version.

Thanks for pointing this out, I will respin v2.


On 01/18/2016 05:50 PM, Shawn Lin wrote:

dw_mmc already uses mmc_of_parse to get the "broken-cd" property,
but it still treats "broken-cd" as a quirk in its driver. We
don't need this quirk here; just take what we need from
mmc->caps.

Signed-off-by: Shawn Lin 
---

  drivers/mmc/host/dw_mmc.c  | 23 +++
  include/linux/mmc/dw_mmc.h |  4 +---
  2 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 7128351..bbf9ca6 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1450,12 +1450,11 @@ static int dw_mci_get_cd(struct mmc_host *mmc)
  {
int present;
struct dw_mci_slot *slot = mmc_priv(mmc);
-   struct dw_mci_board *brd = slot->host->pdata;
struct dw_mci *host = slot->host;
int gpio_cd = mmc_gpio_get_cd(mmc);

/* Use platform get_cd function, else try onboard card detect */
-   if ((brd->quirks & DW_MCI_QUIRK_BROKEN_CARD_DETECTION) ||
+   if ((mmc->caps & MMC_CAP_NEEDS_POLL) ||
(mmc->caps & MMC_CAP_NONREMOVABLE))
present = 1;
else if (!IS_ERR_VALUE(gpio_cd))
@@ -2840,23 +2839,13 @@ static void dw_mci_dto_timer(unsigned long arg)
  }

  #ifdef CONFIG_OF
-static struct dw_mci_of_quirks {
-   char *quirk;
-   int id;
-} of_quirks[] = {
-   {
-   .quirk  = "broken-cd",
-   .id = DW_MCI_QUIRK_BROKEN_CARD_DETECTION,
-   },
-};
-
  static struct dw_mci_board *dw_mci_parse_dt(struct dw_mci *host)
  {
struct dw_mci_board *pdata;
struct device *dev = host->dev;
struct device_node *np = dev->of_node;
const struct dw_mci_drv_data *drv_data = host->drv_data;
-   int idx, ret;
+   int ret;
u32 clock_frequency;

pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
@@ -2871,11 +2860,6 @@ static struct dw_mci_board *dw_mci_parse_dt(struct 
dw_mci *host)
pdata->num_slots = 1;
}

-   /* get quirks */
-   for (idx = 0; idx < ARRAY_SIZE(of_quirks); idx++)
-   if (of_get_property(np, of_quirks[idx].quirk, NULL))
-   pdata->quirks |= of_quirks[idx].id;
-
if (of_property_read_u32(np, "fifo-depth", &pdata->fifo_depth))
dev_info(dev,
 "fifo-depth property not found, using value of FIFOTH 
register as default\n");
@@ -2908,13 +2892,12 @@ static struct dw_mci_board *dw_mci_parse_dt(struct 
dw_mci *host)

  static void dw_mci_enable_cd(struct dw_mci *host)
  {
-   struct dw_mci_board *brd = host->pdata;
unsigned long irqflags;
u32 temp;
int i;

/* No need for CD if broken card detection */
-   if (brd->quirks & DW_MCI_QUIRK_BROKEN_CARD_DETECTION)
+   if (host->cur_slot->mmc->caps & MMC_CAP_NEEDS_POLL)


dw_mci_enable_cd() is called from dw_mci_probe(),
so host->cur_slot has not been assigned yet at this point.

Best Regards,
Jaehoon Chung


return;

/* No need for CD if all slots have a non-error GPIO */
diff --git a/include/linux/mmc/dw_mmc.h b/include/linux/mmc/dw_mmc.h
index 89df7ab..250d822 100644
--- a/include/linux/mmc/dw_mmc.h
+++ b/include/linux/mmc/dw_mmc.h
@@ -235,10 +235,8 @@ struct dw_mci_dma_ops {
  };

  /* IP Quirks/flags. */
-/* Unreliable card detection */
-#define DW_MCI_QUIRK_BROKEN_CARD_DETECTION BIT(0)
  /* Timer for broken data transfer over scheme */
-#define DW_MCI_QUIRK_BROKEN_DTO BIT(1)
+#define DW_MCI_QUIRK_BROKEN_DTO BIT(0)

  struct dma_pdata;










--
Best Regards
Shawn Lin



Re: [PATCH v2] clk: rockchip: Add support for multiple clock providers

2016-01-20 Thread Xing Zheng

Hi, Heiko,
Thank you for your reply.

On 2016年01月21日 07:38, Heiko Stuebner wrote:

Hi,

Am Mittwoch, 20. Januar 2016, 17:06:49 schrieb Xing Zheng:

We will probably need to support multiple CRUs in the future, but
this is not supported by the current Rockchip clock framework.

Therefore, this patch adds a provider context that is passed as a
parameter to the clock registration functions, one per CRU.

Signed-off-by: Xing Zheng 

overall this looks really nice. Thanks for following up on our talk so
quickly :-) .


---

Changes in v2:
- Fix the missing call to rockchip_clk_common_cru_init in the SoC clock init functions.

  drivers/clk/rockchip/clk-rk3036.c |   15 +++--
  drivers/clk/rockchip/clk-rk3188.c |   40 -
  drivers/clk/rockchip/clk-rk3228.c |   15 +++--
  drivers/clk/rockchip/clk-rk3288.c |   17 --
  drivers/clk/rockchip/clk-rk3368.c |   19 +++---
  drivers/clk/rockchip/clk.c|  120 +++--
  drivers/clk/rockchip/clk.h|   35 ---
  7 files changed, 170 insertions(+), 91 deletions(-)

diff --git a/drivers/clk/rockchip/clk-rk3036.c
b/drivers/clk/rockchip/clk-rk3036.c index 483913b..050ad13 100644
--- a/drivers/clk/rockchip/clk-rk3036.c
+++ b/drivers/clk/rockchip/clk-rk3036.c
@@ -434,6 +434,7 @@ static const char *const rk3036_critical_clocks[]
__initconst = {

  static void __init rk3036_clk_init(struct device_node *np)
  {
+   struct rockchip_clk_provider *ctx;
void __iomem *reg_base;
struct clk *clk;

@@ -443,7 +444,9 @@ static void __init rk3036_clk_init(struct device_node
*np) return;
}

-   rockchip_clk_init(np, reg_base, CLK_NR_CLKS);
+   ctx = rockchip_clk_init(reg_base, CLK_NR_CLKS);
+
+   rockchip_clk_common_cru_init(np);

/* xin12m is created by an cru-internal divider */
clk = clk_register_fixed_factor(NULL, "xin12m", "xin24m", 0, 1, 2);
@@ -473,15 +476,15 @@ static void __init rk3036_clk_init(struct
device_node *np) pr_warn("%s: could not register clock sclk_macref_out:
%ld\n", __func__, PTR_ERR(clk));

-   rockchip_clk_register_plls(rk3036_pll_clks,
+   rockchip_clk_register_plls(ctx, rk3036_pll_clks,
   ARRAY_SIZE(rk3036_pll_clks),
   RK3036_GRF_SOC_STATUS0);
-   rockchip_clk_register_branches(rk3036_clk_branches,
+   rockchip_clk_register_branches(ctx, rk3036_clk_branches,
  ARRAY_SIZE(rk3036_clk_branches));
rockchip_clk_protect_critical(rk3036_critical_clocks,
  ARRAY_SIZE(rk3036_critical_clocks));

-   rockchip_clk_register_armclk(ARMCLK, "armclk",
+   rockchip_clk_register_armclk(ctx, ARMCLK, "armclk",
mux_armclk_p, ARRAY_SIZE(mux_armclk_p),
&rk3036_cpuclk_data, rk3036_cpuclk_rates,
ARRAY_SIZE(rk3036_cpuclk_rates));
@@ -489,6 +492,8 @@ static void __init rk3036_clk_init(struct device_node
*np) rockchip_register_softrst(np, 9, reg_base + RK2928_SOFTRST_CON(0),
ROCKCHIP_SOFTRST_HIWORD_MASK);

-   rockchip_register_restart_notifier(RK2928_GLB_SRST_FST, NULL);
+   rockchip_register_restart_notifier(ctx, RK2928_GLB_SRST_FST, NULL);
+
+   rockchip_clk_of_add_provider(np, ctx);
  }
  CLK_OF_DECLARE(rk3036_cru, "rockchip,rk3036-cru", rk3036_clk_init);
diff --git a/drivers/clk/rockchip/clk-rk3188.c
b/drivers/clk/rockchip/clk-rk3188.c index 7f7444c..338a22e 100644
--- a/drivers/clk/rockchip/clk-rk3188.c
+++ b/drivers/clk/rockchip/clk-rk3188.c
@@ -750,18 +750,19 @@ static const char *const rk3188_critical_clocks[]
__initconst = { "pclk_peri",
  };

-static void __init rk3188_common_clk_init(struct device_node *np)
+static struct rockchip_clk_provider *__init rk3188_common_clk_init(struct
device_node *np) {
+   struct rockchip_clk_provider *ctx;
void __iomem *reg_base;
struct clk *clk;

reg_base = of_iomap(np, 0);
-   if (!reg_base) {
-   pr_err("%s: could not map cru region\n", __func__);
-   return;
-   }
+   if (!reg_base)
+   panic("%s: could not map cru region\n", __func__);

I don't believe in doing panics everywhere :-), please do
return ERR_PTR(-ENOMEM);
here and in similar locations and let the top-most caller handle errors.

Done.
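
For anyone unfamiliar with the convention Heiko is asking for, here is a small userspace sketch of returning an encoded error pointer instead of panicking and letting the top-most caller decide what to do. The helpers err_ptr/is_err/ptr_err are stand-ins written for this example (they imitate the kernel's ERR_PTR/IS_ERR/PTR_ERR but are not the kernel implementation), and struct clk_provider and both init functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *err_ptr(long err)
{
	return (void *)err;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *p)
{
	assert(is_err(p));
	return (long)p;
}

/* Hypothetical per-CRU context. */
struct clk_provider {
	void *reg_base;
};

/* Low-level init: report failure to the caller instead of panicking. */
struct clk_provider *provider_init(void *reg_base)
{
	struct clk_provider *ctx;

	if (!reg_base)
		return err_ptr(-ENOMEM);

	ctx = malloc(sizeof(*ctx));
	if (!ctx)
		return err_ptr(-ENOMEM);

	ctx->reg_base = reg_base;
	return ctx;
}

/* Top-most caller: the only place that decides how to handle the error. */
int soc_clk_init(void *reg_base)
{
	struct clk_provider *ctx = provider_init(reg_base);

	if (is_err(ctx))
		return (int)ptr_err(ctx);

	/* ... register PLLs and branches against ctx here ... */
	free(ctx);
	return 0;
}
```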

@@ -260,27 +261,49 @@ static struct clk
*rockchip_clk_register_frac_branch(const char *name, return clk;
  }

-static DEFINE_SPINLOCK(clk_lock);
-static struct clk **clk_table;
-static void __iomem *reg_base;
-static struct clk_onecell_data clk_data;
  static struct device_node *cru_node;

please also include the devicetree node into the ctx struct. That way we can
keep track of all of them (when there are multiple).
I am a bit worried about the function rockchip_clk_get_grf: it only
refers to the common CRU node to acquire the global GRF regmap, and it
takes no parameters,

so I think I will add ctx as a parameter in

[Resend PATCH V5 1/1] NTB: Add support for AMD PCI-Express Non-Transparent Bridge

2016-01-20 Thread Xiangliang Yu
This adds support for AMD's PCI-Express Non-Transparent Bridge
(NTB) device on the Zeppelin platform. The driver connects to the
standard NTB sub-system interface, with modification to add hooks
for power management in a separate patch. The AMD NTB device has 3
memory windows, 16 doorbells, 16 scratch-pad registers, and supports
up to 16 PCIe lanes running at Gen3 speeds.

Signed-off-by: Xiangliang Yu 
Reviewed-by: Jon Mason 
Reviewed-by: Allen Hubbe 
---
 MAINTAINERS |6 +
 drivers/ntb/hw/Kconfig  |1 +
 drivers/ntb/hw/Makefile |1 +
 drivers/ntb/hw/amd/Kconfig  |7 +
 drivers/ntb/hw/amd/Makefile |1 +
 drivers/ntb/hw/amd/ntb_hw_amd.c | 1143 +++
 drivers/ntb/hw/amd/ntb_hw_amd.h |  217 
 7 files changed, 1376 insertions(+)
 create mode 100644 drivers/ntb/hw/amd/Kconfig
 create mode 100644 drivers/ntb/hw/amd/Makefile
 create mode 100644 drivers/ntb/hw/amd/ntb_hw_amd.c
 create mode 100644 drivers/ntb/hw/amd/ntb_hw_amd.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5192700..908941a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7585,6 +7585,12 @@ W:   https://github.com/jonmason/ntb/wiki
 T: git git://github.com/jonmason/ntb.git
 F: drivers/ntb/hw/intel/
 
+NTB AMD DRIVER
+M: Xiangliang Yu 
+L: linux-...@googlegroups.com
+S: Supported
+F: drivers/ntb/hw/amd/
+
 NTFS FILESYSTEM
 M: Anton Altaparmakov 
 L: linux-ntfs-...@lists.sourceforge.net
diff --git a/drivers/ntb/hw/Kconfig b/drivers/ntb/hw/Kconfig
index 4d5535c..7116472 100644
--- a/drivers/ntb/hw/Kconfig
+++ b/drivers/ntb/hw/Kconfig
@@ -1 +1,2 @@
+source "drivers/ntb/hw/amd/Kconfig"
 source "drivers/ntb/hw/intel/Kconfig"
diff --git a/drivers/ntb/hw/Makefile b/drivers/ntb/hw/Makefile
index 175d7c9..532e085 100644
--- a/drivers/ntb/hw/Makefile
+++ b/drivers/ntb/hw/Makefile
@@ -1 +1,2 @@
+obj-$(CONFIG_NTB_AMD)  += amd/
 obj-$(CONFIG_NTB_INTEL)+= intel/
diff --git a/drivers/ntb/hw/amd/Kconfig b/drivers/ntb/hw/amd/Kconfig
new file mode 100644
index 000..cfe903c
--- /dev/null
+++ b/drivers/ntb/hw/amd/Kconfig
@@ -0,0 +1,7 @@
+config NTB_AMD
+   tristate "AMD Non-Transparent Bridge support"
+   depends on X86_64
+   help
+This driver supports AMD NTB on capable Zeppelin hardware.
+
+If unsure, say N.
diff --git a/drivers/ntb/hw/amd/Makefile b/drivers/ntb/hw/amd/Makefile
new file mode 100644
index 000..ad54da9
--- /dev/null
+++ b/drivers/ntb/hw/amd/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_NTB_AMD) += ntb_hw_amd.o
diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
new file mode 100644
index 000..d484829
--- /dev/null
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@@ -0,0 +1,1143 @@
+/*
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ *   redistributing this file, you may do so under either license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright (C) 2016 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   BSD LICENSE
+ *
+ *   Copyright (C) 2016 Advanced Micro Devices, Inc. All Rights Reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copy
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of AMD Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * AMD PCIe NTB Linux driver
+ *
+ * Contact Information:
+ * Xiangliang 

Re: [lkp] [mm/vmstat] 6cdb18ad98: -8.5% will-it-scale.per_thread_ops

2016-01-20 Thread Huang, Ying
Heiko Carstens  writes:

> On Wed, Jan 06, 2016 at 11:20:55AM +0800, kernel test robot wrote:
>> FYI, we noticed the below changes on
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> commit 6cdb18ad98a49f7e9b95d538a0614cde827404b8 ("mm/vmstat: fix overflow in 
>> mod_zone_page_state()")
>> 
>> 
>> =
>> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
>>   
>> gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/ivb42/pread1/will-it-scale
>> 
>> commit: 
>>   cc28d6d80f6ab494b10f0e2ec949eacd610f66e3
>>   6cdb18ad98a49f7e9b95d538a0614cde827404b8
>> 
>> cc28d6d80f6ab494 6cdb18ad98a49f7e9b95d538a0 
>>  -- 
>>  %stddev %change %stddev
>>  \  |\  
>>2733943 .  0%  -8.5%2502129 .  0%  will-it-scale.per_thread_ops
>>   3410 .  0%  -2.0%   3343 .  0%  will-it-scale.time.system_time
>> 340.08 .  0% +19.7% 406.99 .  0%  will-it-scale.time.user_time
>>   69882822 .  2% -24.3%   52926191 .  5%  cpuidle.C1-IVT.time
>> 340.08 .  0% +19.7% 406.99 .  0%  time.user_time
>> 491.25 .  6% -17.7% 404.25 .  7%  
>> numa-vmstat.node0.nr_alloc_batch
>>   2799 . 20% -36.6%   1776 .  0%  numa-vmstat.node0.nr_mapped
>> 630.00 .140%+244.4%   2169 .  1%  
>> numa-vmstat.node1.nr_inactive_anon
>
> Hmm... this is odd. I did review all callers of mod_zone_page_state() and
> couldn't find anything obvious that would go wrong after the int -> long
> change.
>
> I also tried the "pread1_threads" test case from
> https://github.com/antonblanchard/will-it-scale.git
>
> However the results seem to vary a lot after a reboot(!), at least on s390.
>
> So I'm not sure if this is really a regression.

Most of the regression is recovered in v4.4.  But because the change is
V-shaped (a drop followed by a recovery), it is hard to bisect.

=
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  
gcc-4.9/performance/x86_64-rhel/thread/24/debian-x86_64-2015-02-07.cgz/ivb42/pread1/will-it-scale

commit: 
  cc28d6d80f6ab494b10f0e2ec949eacd610f66e3
  6cdb18ad98a49f7e9b95d538a0614cde827404b8
  v4.4

cc28d6d80f6ab494 6cdb18ad98a49f7e9b95d538a0                       v4.4
---------------- -------------------------- --------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3083436 ±  0%      -9.6%    2788374 ±  0%      -3.7%    2970130 ±  0%  will-it-scale.per_thread_ops
      6447 ±  0%      -2.2%       6308 ±  0%      -0.3%       6425 ±  0%  will-it-scale.time.system_time
    776.90 ±  0%     +17.9%     915.71 ±  0%      +2.9%     799.12 ±  0%  will-it-scale.time.user_time
    316177 ±  4%      -4.6%     301616 ±  3%     -10.3%     283563 ±  3%  softirqs.RCU
    776.90 ±  0%     +17.9%     915.71 ±  0%      +2.9%     799.12 ±  0%  time.user_time
    777.33 ±  7%     +20.8%     938.67 ±  7%      +7.5%     836.00 ±  8%  slabinfo.blkdev_requests.active_objs
    777.33 ±  7%     +20.8%     938.67 ±  7%      +7.5%     836.00 ±  8%  slabinfo.blkdev_requests.num_objs
  74313962 ± 44%     -16.5%   62053062 ± 41%     -49.9%   37246967 ±  8%  cpuidle.C1-IVT.time
  43381614 ± 79%     +24.4%   53966568 ±111%    +123.9%   97135791 ± 33%  cpuidle.C1E-IVT.time
     97.67 ± 36%     +95.2%     190.67 ± 63%    +122.5%     217.33 ± 41%  cpuidle.C3-IVT.usage
   3679437 ± 69%    -100.0%       0.00 ± -1%    -100.0%       0.00 ± -1%  latency_stats.avg.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
   5177475 ± 82%    -100.0%       0.00 ± -1%    -100.0%       0.00 ± -1%  latency_stats.max.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
  11726393 ±112%    -100.0%       0.00 ± -1%    -100.0%       0.00 ± -1%  latency_stats.sum.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
    178.07 ±  0%      -1.3%     175.79 ±  0%      -0.8%     176.62 ±  0%  turbostat.CorWatt
      0.20 ±  2%     -16.9%       0.16 ± 18%     -11.9%       0.17 ± 17%  turbostat.Pkg%pc6
    207.38 ±  0%      -1.1%     205.13 ±  0%      -0.7%     205.99 ±  0%  turbostat.PkgWatt
      6889 ± 33%     -49.2%       3497 ± 86%     -19.4%       5552 ± 27%  numa-vmstat.node0.nr_active_anon
483.33 ± 29% -32.3% 327.00 ± 48%  +0.1% 

Re: [PATCH 0/4] support for text-relative kallsyms table

2016-01-20 Thread Ard Biesheuvel
On 21 January 2016 at 06:10, Rusty Russell  wrote:
> Ard Biesheuvel  writes:
>> This implements text-relative kallsyms address tables. This was developed
>> as part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but
>> I think it may be beneficial to other architectures as well, so I am
>> presenting it as a separate series.
>
> Nice work!
>

Thanks

> AFAICT this should work for every arch, as long as they start with _text
> (esp: data and init must be > _text).  In addition, it's not harmful on
> 32 bit archs.
>
> IOW, I'd like to turn it on for everyone and discard some code.  But
> it's easier to roll in like you've done first.
>
> Should we enable it by default for every arch for now, and see what
> happens?
>

As you say, this only works if every symbol >= _text, which is
obviously not the case per the conditional in scripts/kallsyms.c,
which emits _text + n or _text - n depending on whether the symbol
precedes or follows _text. The git log tells me for which arch this
was originally implemented, but it does not tell me which other archs
have come to rely on it in the meantime.

On top of that, ia64 fails to build with this option, since it has
some whitelisted absolute symbols that look suspiciously like they
could be emitted as _text relative (and it does not even matter in the
absence of CONFIG_RELOCATABLE on ia64, afaict) but I don't know
whether we can just override their types as T, since it would also
change the type in the contents of /proc/kallsyms. So some guidance
would be appreciated here.

So I agree that it would be preferred to have a single code path, but
I would need some help validating it on architectures I don't have
access to.

Thanks,
Ard.
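
To illustrate the idea from the cover letter (store offsets relative to _text rather than absolute addresses), here is a rough userspace sketch. It assumes signed 32-bit offsets so that symbols on either side of the base can be represented, as discussed above; the function names and table layout are invented for this example and are not what kernel/kallsyms.c or scripts/kallsyms.c actually contain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build a table of 32-bit offsets relative to text_base.
 * Returns 0 on success, -1 if some symbol is too far from the base
 * to be encoded in 32 bits.
 */
int build_rel_table(const uint64_t *abs_addr, size_t n, uint64_t text_base,
		    int32_t *rel)
{
	size_t i;

	assert(abs_addr != NULL && rel != NULL);
	for (i = 0; i < n; i++) {
		int64_t delta = (int64_t)(abs_addr[i] - text_base);

		if (delta < INT32_MIN || delta > INT32_MAX)
			return -1;
		rel[i] = (int32_t)delta;
	}
	return 0;
}

/* Recover the absolute address at lookup time by adding the runtime base. */
uint64_t sym_address(const int32_t *rel, size_t idx, uint64_t text_base)
{
	return text_base + (uint64_t)(int64_t)rel[idx];
}
```

Each entry takes 4 bytes instead of 8, and if the image is relocated only text_base changes; the table itself needs no fixups.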


>> The idea is that on 64-bit builds, it is rather wasteful to use absolute
>> addressing for kernel symbols since they are all within a couple of MBs
>> of each other. On top of that, the absolute addressing implies that, when
>> the kernel is relocated at runtime, each address in the table needs to be
>> fixed up individually.
>>
>> Since all section-relative addresses are already emitted relative to _text,
>> it is quite straight-forward to record only the offset, and add the absolute
>> address of _text at runtime when referring to the address table.
>>
>> The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
>> compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter 
>> case,
>> the reduction in uncompressed size is primarily __init data)
>>
>> Kees Cook was so kind to test these against x86_64, and confirmed that KASLR
>> still operates as expected.
>>
>> Ard Biesheuvel (4):
>>   kallsyms: add support for relative offsets in kallsyms address table
>>   powerpc: enable text relative kallsyms for ppc64
>>   s390: enable text relative kallsyms for 64-bit targets
>>   x86_64: enable text relative kallsyms for 64-bit targets
>>
>>  arch/powerpc/Kconfig|  1 +
>>  arch/s390/Kconfig   |  1 +
>>  arch/x86/Kconfig|  1 +
>>  init/Kconfig| 14 
>>  kernel/kallsyms.c   | 35 +-
>>  scripts/kallsyms.c  | 38 +---
>>  scripts/link-vmlinux.sh |  4 +++
>>  scripts/namespace.pl|  1 +
>>  8 files changed, 82 insertions(+), 13 deletions(-)
>>
>> --
>> 2.5.0


Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Michel Dänzer
On 21.01.2016 15:38, Michel Dänzer wrote:
> On 21.01.2016 14:31, Mario Kleiner wrote:
>> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
>>> On 21.01.2016 05:32, Mario Kleiner wrote:

 So the problem is that AMDs hardware frame counters reset to
 zero during a modeset. The old DRM code dealt with drivers doing that by
 keeping vblank irqs enabled during modesets and incrementing vblank
 count by one during each vblank irq, i think that's what
 drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>>>
>>> Right, looks like there's been a regression breaking this. I suspect the
>>> problem is that vblank->last isn't getting updated from
>>> drm_vblank_post_modeset. Not sure which change broke that though, or how
>>> to fix it. Ville?
>>>
>>
>> The whole logic has changed and the software counter updates are now
>> driven all the time by the hw counter.
>>
>>>
>>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
>>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
>>> vblank counters"). I've been meaning to track that down since then; one
>>> of these days hopefully, but if anybody has any ideas offhand...
>>
>> I spent the last few hours reading through the drm and radeon code and i
>> think what should probably work is to replace the
>> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
>> calls. These are apparently meant for drivers whose hw counters reset
>> during modeset, [...]
> 
> ... just like drm_vblank_pre/post_modeset. That those were broken is a
> regression which needs to be fixed anyway. I don't think switching to
> drm_vblank_on/off is suitable for stable trees.

Even more so since as I mentioned, there is (has been since at least
about half a year ago) a counter jumping bug with drm_vblank_on/off as well.
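
To show why a hardware counter that resets to zero produces a jump when the software counter is driven purely by the hardware counter, here is a toy model. It is only a sketch: the 24-bit counter width, the struct, and the function names are assumptions for illustration and do not reproduce the DRM core code.

```c
#include <assert.h>
#include <stdint.h>

#define HW_COUNTER_MASK 0xffffffu /* assumed 24-bit hardware frame counter */

struct vblank_state {
	uint32_t count; /* software vblank count reported to userspace */
	uint32_t last;  /* hardware counter value at the last update */
};

/* Advance the software count by the (modular) hardware counter delta. */
void vblank_update(struct vblank_state *vb, uint32_t hw_now)
{
	uint32_t diff;

	assert((hw_now & ~HW_COUNTER_MASK) == 0);
	diff = (hw_now - vb->last) & HW_COUNTER_MASK;
	vb->count += diff;
	vb->last = hw_now;
}

/*
 * After an event that resets the hardware counter (e.g. a modeset),
 * resample 'last' so the next delta is computed from the new origin.
 */
void vblank_resync(struct vblank_state *vb, uint32_t hw_now)
{
	assert((hw_now & ~HW_COUNTER_MASK) == 0);
	vb->last = hw_now;
}
```

If the hardware counter drops from 1010 to 5 without a resync, the modular delta is close to the full counter range and the software count jumps forward by millions of frames; with a resync at the reset point it advances by 5.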


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH V2 3/3] vhost_net: basic polling support

2016-01-20 Thread Yang Zhang

On 2016/1/21 13:13, Michael S. Tsirkin wrote:

On Thu, Jan 21, 2016 at 10:11:35AM +0800, Yang Zhang wrote:

On 2016/1/20 22:35, Michael S. Tsirkin wrote:

On Tue, Dec 01, 2015 at 02:39:45PM +0800, Jason Wang wrote:

This patch tries to poll for newly added tx buffers or the socket receive
queue for a while at the end of tx/rx processing. The maximum time
spent on polling is specified through a new kind of vring ioctl.

Signed-off-by: Jason Wang 
---
  drivers/vhost/net.c| 72 ++
  drivers/vhost/vhost.c  | 15 ++
  drivers/vhost/vhost.h  |  1 +
  include/uapi/linux/vhost.h | 11 +++
  4 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9eda69e..ce6da77 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -287,6 +287,41 @@ static void vhost_zerocopy_callback(struct ubuf_info 
*ubuf, bool success)
rcu_read_unlock_bh();
  }

+static inline unsigned long busy_clock(void)
+{
+   return local_clock() >> 10;
+}
+
+static bool vhost_can_busy_poll(struct vhost_dev *dev,
+   unsigned long endtime)
+{
+   return likely(!need_resched()) &&
+  likely(!time_after(busy_clock(), endtime)) &&
+  likely(!signal_pending(current)) &&
+  !vhost_has_work(dev) &&
+  single_task_running();
+}
+
+static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
+   struct vhost_virtqueue *vq,
+   struct iovec iov[], unsigned int iov_size,
+   unsigned int *out_num, unsigned int *in_num)
+{
+   unsigned long uninitialized_var(endtime);
+
+   if (vq->busyloop_timeout) {
+   preempt_disable();
+   endtime = busy_clock() + vq->busyloop_timeout;
+   while (vhost_can_busy_poll(vq->dev, endtime) &&
+  !vhost_vq_more_avail(vq->dev, vq))
+   cpu_relax();
+   preempt_enable();
+   }


Isn't there a way to call all this after vhost_get_vq_desc?
First, this will reduce the good path overhead as you
won't have to play with timers and preemption.

Second, this will reduce the chance of a pagefault on avail ring read.


+
+   return vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+out_num, in_num, NULL, NULL);
+}
+
  /* Expects to be always run from workqueue - which acts as
   * read-size critical section for our kind of RCU. */
  static void handle_tx(struct vhost_net *net)
@@ -331,10 +366,9 @@ static void handle_tx(struct vhost_net *net)
  % UIO_MAXIOV == nvq->done_idx))
break;

-   head = vhost_get_vq_desc(vq, vq->iov,
-ARRAY_SIZE(vq->iov),
-&out, &in,
-NULL, NULL);
+   head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
+   ARRAY_SIZE(vq->iov),
+   &out, &in);
/* On error, stop handling until the next kick. */
if (unlikely(head < 0))
break;
@@ -435,6 +469,34 @@ static int peek_head_len(struct sock *sk)
return len;
  }

+static int vhost_net_peek_head_len(struct vhost_net *net, struct sock *sk)


Need a hint that it's rx related in the name.


+{
+   struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+   struct vhost_virtqueue *vq = &nvq->vq;
+   unsigned long uninitialized_var(endtime);
+
+   if (vq->busyloop_timeout) {
+   mutex_lock(&vq->mutex);


This appears to be called under vq mutex in handle_rx.
So how does this work then?



+   vhost_disable_notify(&net->dev, vq);


This appears to be called after disable notify
in handle_rx - so why disable here again?


+
+   preempt_disable();
+   endtime = busy_clock() + vq->busyloop_timeout;
+
+   while (vhost_can_busy_poll(&net->dev, endtime) &&
+  skb_queue_empty(&sk->sk_receive_queue) &&
+  !vhost_vq_more_avail(&net->dev, vq))
+   cpu_relax();


This seems to mix in several items.
RX queue is normally not empty. I don't think
we need to poll for that.


I have seen the RX queue become empty easily under some extreme conditions,
such as lots of small packets. So maybe the check is useful here.


It's not useful *here*.
If you have an rx packet but no space in the ring,
this will exit immediately.


Indeed!



It might be useful elsewhere but I doubt it -
if rx ring is out of buffers, you are better off
backing out and giving guest some breathing space.
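
The ordering suggested earlier in this review (try to fetch a descriptor first, and only start the timed busy loop when nothing is available) could look roughly like the userspace sketch below. Everything here is hypothetical: Q_EMPTY, get_with_busy_poll and the demo queue are invented for illustration, and clock_gettime stands in for the patch's busy_clock(). The kernel-side conditions (need_resched, signal_pending, and so on) are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define Q_EMPTY (-1) /* hypothetical "nothing available" return value */

/* Coarse monotonic clock in microseconds (stand-in for busy_clock()). */
static uint64_t now_us(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/*
 * Fast path first: if an item is already there, return it without
 * reading the clock. Only an empty queue pays for the timed busy loop.
 */
int get_with_busy_poll(int (*try_get)(void *), void *q, uint64_t timeout_us)
{
	int item = try_get(q);
	uint64_t end;

	if (item != Q_EMPTY || timeout_us == 0)
		return item;

	end = now_us() + timeout_us;
	do {
		item = try_get(q);
	} while (item == Q_EMPTY && now_us() < end);

	return item;
}

/* Toy queue for demonstration: empty for the first 'empty_polls' attempts. */
struct demo_queue {
	int empty_polls;
	int calls;
};

int demo_try_get(void *arg)
{
	struct demo_queue *q = arg;

	q->calls++;
	return q->calls > q->empty_polls ? 42 : Q_EMPTY;
}
```

With this shape the good path costs one extra branch, and the polling window is bounded by timeout_us.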


--
best regards
yang



--
best regards
yang


Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Michel Dänzer
On 21.01.2016 14:31, Mario Kleiner wrote:
> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
>> On 21.01.2016 05:32, Mario Kleiner wrote:
>>>
>>> So the problem is that AMDs hardware frame counters reset to
>>> zero during a modeset. The old DRM code dealt with drivers doing that by
>>> keeping vblank irqs enabled during modesets and incrementing vblank
>>> count by one during each vblank irq, i think that's what
>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>>
>> Right, looks like there's been a regression breaking this. I suspect the
>> problem is that vblank->last isn't getting updated from
>> drm_vblank_post_modeset. Not sure which change broke that though, or how
>> to fix it. Ville?
>>
> 
> The whole logic has changed and the software counter updates are now
> driven all the time by the hw counter.
> 
>>
>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
>> vblank counters"). I've been meaning to track that down since then; one
>> of these days hopefully, but if anybody has any ideas offhand...
> 
> I spent the last few hours reading through the drm and radeon code and i
> think what should probably work is to replace the
> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
> calls. These are apparently meant for drivers whose hw counters reset
> during modeset, [...]

... just like drm_vblank_pre/post_modeset. That those were broken is a
regression which needs to be fixed anyway. I don't think switching to
drm_vblank_on/off is suitable for stable trees.

Looking at Vlastimil's original post again, I'd say the most likely
culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many
vblanks were missed").


> Once drm_vblank_off is called, drm_vblank_get will no-op and return an
> error, so clients can't enable vblank irqs during the modeset - pageflip
> ioctl and waitvblank ioctl would fail while a modeset happens -
> hopefully userspace handles this correctly everywhere.

We've fixed xf86-video-ati for this.


> I'll hack up a patch for demonstration now.

You're a bit late to that party. :)

http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html
http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[PATCH 1/2] scsi: Do not attach VPD to devices that don't support it

2016-01-20 Thread Alexander Duyck
The patch "scsi: rescan VPD attributes" introduced a regression in which
devices that don't support VPD were being scanned for VPD attributes
anyway.  This could cause issues for these parts and should be avoided, so
the check for scsi_level has been moved out of scsi_add_lun and into
scsi_attach_vpd so that no caller will scan VPD for devices that
don't support it.

Fixes: 09e2b0b14690 ("scsi: rescan VPD attributes")
Signed-off-by: Alexander Duyck 
---
 drivers/scsi/scsi.c  |3 +++
 drivers/scsi/scsi_scan.c |3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index b1bf42b93fcc..ed085e78c893 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -784,6 +784,9 @@ void scsi_attach_vpd(struct scsi_device *sdev)
int pg83_supported = 0;
unsigned char __rcu *vpd_buf, *orig_vpd_buf = NULL;
 
+   if (sdev->scsi_level < SCSI_3)
+   return;
+
if (sdev->skip_vpd_pages)
return;
 retry_pg0:
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 6a820668d442..1b16c89e0cf9 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -986,8 +986,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
}
}
 
-   if (sdev->scsi_level >= SCSI_3)
-   scsi_attach_vpd(sdev);
+   scsi_attach_vpd(sdev);
 
sdev->max_queue_depth = sdev->queue_depth;
 



[PATCH 2/2] scsi: Fix RCU handling for VPD pages

2016-01-20 Thread Alexander Duyck
This patch is meant to fix the RCU handling for VPD pages.  The original
code had a number of issues including the fact that the local variables
were being declared as __rcu, the RCU variable being directly accessed
outside of the RCU locked region, and the fact that length was not
associated with the data so it would be possible to get a mix and match of
the length for one VPD page with the data from another.
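
The last point (length and data must travel together) can be illustrated outside the kernel. The sketch below is only an approximation under stated assumptions: it uses a C11 atomic pointer exchange in place of rcu_assign_pointer(), it hands the old page back to the caller instead of deferring the free the way kfree_rcu() does, and struct vpd_pg and struct device here are simplified stand-ins, not the definitions from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Length and data live in one allocation, so readers see a matched pair. */
struct vpd_pg {
	int len;
	unsigned char buf[];
};

/* Simplified stand-in for the device: one published VPD page pointer. */
struct device {
	_Atomic(struct vpd_pg *) vpd_pg80;
};

/* Allocate a page object and copy 'len' bytes of inquiry data into it. */
struct vpd_pg *vpd_alloc(const unsigned char *data, int len)
{
	struct vpd_pg *vpd;

	assert(data != NULL && len >= 0);
	vpd = malloc(sizeof(*vpd) + (size_t)len);
	if (!vpd)
		return NULL;

	vpd->len = len;
	memcpy(vpd->buf, data, (size_t)len);
	return vpd;
}

/*
 * Publish a new page with a single pointer swap and return the old one.
 * The caller must not free the old page until no reader can still be
 * using it (the patch relies on kfree_rcu() for that).
 */
struct vpd_pg *vpd_publish(struct device *dev, struct vpd_pg *new_vpd)
{
	return atomic_exchange(&dev->vpd_pg80, new_vpd);
}
```

Because a reader dereferences one pointer and then reads both len and buf from the same object, it can never pair the length of one page with the bytes of another.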

Fixes: 09e2b0b14690 ("scsi: rescan VPD attributes")
Signed-off-by: Alexander Duyck 
---
 drivers/scsi/scsi.c|   52 +++-
 drivers/scsi/scsi_lib.c|   12 +-
 drivers/scsi/scsi_sysfs.c  |   14 +++-
 include/scsi/scsi_device.h |   14 
 4 files changed, 50 insertions(+), 42 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ed085e78c893..143b384fd145 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -782,7 +782,7 @@ void scsi_attach_vpd(struct scsi_device *sdev)
int vpd_len = SCSI_VPD_PG_LEN;
int pg80_supported = 0;
int pg83_supported = 0;
-   unsigned char __rcu *vpd_buf, *orig_vpd_buf = NULL;
+   unsigned char *vpd_buf;
 
if (sdev->scsi_level < SCSI_3)
return;
@@ -816,58 +816,60 @@ retry_pg0:
vpd_len = SCSI_VPD_PG_LEN;
 
if (pg80_supported) {
+   struct scsi_vpd_pg *vpd, *orig_vpd;
 retry_pg80:
-   vpd_buf = kmalloc(vpd_len, GFP_KERNEL);
-   if (!vpd_buf)
+   vpd = kmalloc(sizeof(*vpd) + vpd_len, GFP_KERNEL);
+   if (!vpd)
return;
 
-   result = scsi_vpd_inquiry(sdev, vpd_buf, 0x80, vpd_len);
+   result = scsi_vpd_inquiry(sdev, vpd->buf, 0x80, vpd_len);
if (result < 0) {
-   kfree(vpd_buf);
+   kfree(vpd);
return;
}
if (result > vpd_len) {
vpd_len = result;
-   kfree(vpd_buf);
+   kfree(vpd);
goto retry_pg80;
}
+   vpd->len = result;
+
mutex_lock(&sdev->inquiry_mutex);
-   orig_vpd_buf = sdev->vpd_pg80;
-   sdev->vpd_pg80_len = result;
-   rcu_assign_pointer(sdev->vpd_pg80, vpd_buf);
+   orig_vpd = rcu_dereference_protected(sdev->vpd_pg80, 1);
+   rcu_assign_pointer(sdev->vpd_pg80, vpd);
mutex_unlock(&sdev->inquiry_mutex);
-   synchronize_rcu();
-   if (orig_vpd_buf) {
-   kfree(orig_vpd_buf);
-   orig_vpd_buf = NULL;
-   }
+
+   if (orig_vpd)
+   kfree_rcu(orig_vpd, rcu);
vpd_len = SCSI_VPD_PG_LEN;
}
 
if (pg83_supported) {
+   struct scsi_vpd_pg *vpd, *orig_vpd;
 retry_pg83:
-   vpd_buf = kmalloc(vpd_len, GFP_KERNEL);
-   if (!vpd_buf)
+   vpd = kmalloc(sizeof(*vpd) + vpd_len, GFP_KERNEL);
+   if (!vpd)
return;
 
-   result = scsi_vpd_inquiry(sdev, vpd_buf, 0x83, vpd_len);
+   result = scsi_vpd_inquiry(sdev, vpd->buf, 0x83, vpd_len);
if (result < 0) {
-   kfree(vpd_buf);
+   kfree(vpd);
return;
}
if (result > vpd_len) {
vpd_len = result;
-   kfree(vpd_buf);
+   kfree(vpd);
goto retry_pg83;
}
+   vpd->len = result;
+
mutex_lock(&sdev->inquiry_mutex);
-   orig_vpd_buf = sdev->vpd_pg83;
-   sdev->vpd_pg83_len = result;
-   rcu_assign_pointer(sdev->vpd_pg83, vpd_buf);
+   orig_vpd = rcu_dereference_protected(sdev->vpd_pg83, 1);
+   rcu_assign_pointer(sdev->vpd_pg83, vpd);
mutex_unlock(&sdev->inquiry_mutex);
-   synchronize_rcu();
-   if (orig_vpd_buf)
-   kfree(orig_vpd_buf);
+
+   if (orig_vpd)
+   kfree_rcu(orig_vpd, rcu);
}
 }
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index fa6b2c4eb7a2..e44f66bc4c90 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3175,7 +3175,7 @@ int scsi_vpd_lun_id(struct scsi_device *sdev, char *id, size_t id_len)
u8 cur_id_type = 0xff;
u8 cur_id_size = 0;
unsigned char *d, *cur_id_str;
-   unsigned char __rcu *vpd_pg83;
+   struct scsi_vpd_pg *vpd_pg83;
int id_size = -EINVAL;
 
rcu_read_lock();
@@ -3205,8 +3205,8 @@ int scsi_vpd_lun_id(struct scsi_device *sdev, char *id, size_t id_len)
}
 
memset(id, 0, id_len);
-   d = vpd_pg83 + 4;
-   

[PATCH 0/2] scsi: Fix endless loop of ATA hard resets due to VPD reads

2016-01-20 Thread Alexander Duyck
Recent changes to the kernel pulled in during the merge window have
resulted in my system generating an endless loop of the following type of
errors:

[  318.965756] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[  318.968457] ata14.00: configured for UDMA/66
[  318.970656] ata14: EH complete
[  318.984366] ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[  318.986854] ata14.00: irq_stat 0x4001
[  318.989138] ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 22 dma 16640 in
         Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
[  318.995986] ata14: hard resetting link

I bisected the issue and found the patch responsible for the issue was
commit 09e2b0b14690 "scsi: rescan VPD attributes".  This commit contained
several issues.

First, the commit changed which devices we called scsi_attach_vpd() for.
As a result we were calling it for devices that report a scsi_level below
SCSI_3, so VPD accesses could result in errors.

Second, the commit as well as a follow-on patch for it contained a number
of RCU errors.  Specifically the code was structured such that we had
accesses outside of RCU locked regions, and repeated use of the RCU
protected pointer without using the proper accessors.  As such it was
possible to get into a serious corruption situation should a pointer be
updated.

Ultimately neither of these bugs was my root cause.  It turns out the
Marvell Console SCSI device in my system needed to have a flag set to
disable VPD access in order to keep things from looping through the error
repeatedly.  In order to resolve it I had to add the kernel parameter
"scsi_mod.dev_flags=Marvell:Console:0x400".  This allowed my system to
boot without any errors; however, the first two issues described above are
still relevant, so I thought I would provide the patches since I had already
written them up.

---

Alexander Duyck (2):
  scsi: Do not attach VPD to devices that don't support it
  scsi: Fix RCU handling for VPD pages


 drivers/scsi/scsi.c|   55 
 drivers/scsi/scsi_lib.c|   12 +-
 drivers/scsi/scsi_scan.c   |3 +-
 drivers/scsi/scsi_sysfs.c  |   14 ++-
 include/scsi/scsi_device.h |   14 +++
 5 files changed, 54 insertions(+), 44 deletions(-)

--


[PATCH v5 2/5] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-01-20 Thread Tang Chen
From: Gu Zheng 

[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue
caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When a node goes online/offline, the cpuid <-> nodeid mapping is
established/destroyed, which means the cpuid <-> nodeid mapping will change if
node hotplug happens. But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
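The stale-cache effect in the tables above can be reproduced with a small userspace model. This is only an illustrative sketch: the arrays live_cpu_to_node, cached_cpu_to_node and node_online, and the helpers boot(), hot_replace() and cached_node_is_stale(), are hypothetical names standing in for the kernel's per-cpu node mapping and for wq_numa_possible_cpumask; they are not kernel APIs.

```c
#include <stdbool.h>
#include <string.h>

#define NR_CPUS  120
#define NR_NODES 6

static int live_cpu_to_node[NR_CPUS];	/* updated on every hotplug event */
static int cached_cpu_to_node[NR_CPUS];	/* snapshot taken once at "boot" */
static bool node_online[NR_NODES];

/* Map the inclusive cpu range [first, last] to node in the live table. */
static void map_range(int first, int last, int node)
{
	int cpu;

	for (cpu = first; cpu <= last; cpu++)
		live_cpu_to_node[cpu] = node;
}

/* Initial layout from the first table; the cache is filled here, once. */
static void boot(void)
{
	int n;

	map_range(0, 14, 0);   map_range(60, 74, 0);
	map_range(15, 29, 1);  map_range(75, 89, 1);
	map_range(30, 44, 2);  map_range(90, 104, 2);
	map_range(45, 59, 3);  map_range(105, 119, 3);
	for (n = 0; n < 4; n++)
		node_online[n] = true;
	memcpy(cached_cpu_to_node, live_cpu_to_node, sizeof(live_cpu_to_node));
}

/* Hot-remove node2/node3, then hot-add node4/node5 as in the last table. */
static void hot_replace(void)
{
	node_online[2] = node_online[3] = false;
	map_range(30, 59, 4);
	map_range(90, 119, 5);
	node_online[4] = node_online[5] = true;
}

/* True when the cached mapping points the cpu at a node that is offline. */
static bool cached_node_is_stale(int cpu)
{
	return !node_online[cached_cpu_to_node[cpu]];
}
```

After boot() followed by hot_replace(), the live table maps cpu30 to node 4 while the cache still says node 2, which is offline. Any allocation keyed on the cached node would then target a node that no longer exists.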

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
   wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}

Since wq_numa_possible_cpumask is not updated, the pool could be mapped to an
offline node, which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
struct worker *worker;

worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

..

return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the
   nodeid <-> pxm mapping is set up at boot time. This mapping is persistent
   and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. The mapping is
   set up at boot time and CPU hotadd time, and cleared at CPU hotremove time.
   This mapping is also persistent.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time.
   cpuids are allocated lowest first, released at CPU hotremove time, and
   reused for other hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd
   time, and cleared at CPU hotremove time. As a result of 3, this mapping is
   not persistent.

To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it persistent. According to init_cpu_to_node(), the
cpuid <-> nodeid mapping is based on the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.

An apicid can be obtained by the _MAT (Multiple APIC Table Entry) method or
found in the MADT (Multiple APIC Description Table). So we finish the job in the
following steps:

1. Enable the apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control whether disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mappings, and
   modify the way cpuid is calculated. Establish all possible cpuid <-> apicid
   mappings when registering the local apic, and store them in this array.

3. Enable the _MAT and MADT related APIs to return the apicids of non-present
   or disabled cpus. This is also done by introducing an extra parameter to
   these APIs to let the caller control whether disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mappings.
   This is done via an additional acpi namespace walk for processors.

This patch finishes step 1.

Signed-off-by: Gu Zheng 
Signed-off-by: Tang Chen 
---
 arch/x86/kernel/apic/apic.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8a5cdda..1625778 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 

RE: [PATCH V5 1/1] NTB: Add support for AMD PCI-Express Non-Transparent Bridge

2016-01-20 Thread Yu, Xiangliang
> From: Yu, Xiangliang [mailto:xiangliang...@amd.com]
> > > > Signed-off-by: Jon Mason 
> > > > Signed-off-by: Allen Hubbe 
> > >
> > > NO.
> >
> > Ok, I'll change it if you don't want to change it.
> 
> Nah, just remember it for next time...
> 
> I'm satisfied with this v5.
> 
> Reviewed-by: Allen Hubbe 

Ok, I'll change it and resend V5

> > I don't think so. Here, the I/O memory access only happens after
> > pci_iomap returns success, so the registers can't be accessed through
> > I/O ports. And ioread* will check whether the memory type is MMIO or
> > I/O port (please see the definition).
> > I don't think we need that check, so I use read* because it is more
> > efficient.
> > I think we need to think about actual usage, not only follow the book.
> > And I said this for the previous version; I don't like explaining it
> > again and again.
> > If you have any concerns, please tell me after my comment.
> 
> It's not more efficient, on this platform it's the same.
> 
> If it were my driver I would change it... but you can keep it this way.

Because of my previous SATA experience, I'd like to keep it this way.

> > > This is different from v4.  It used to be:
> >
> > Because peer_sta is changed to 0, amd_link_is_up will return 0
> > (offline) and will not check the hardware link status, so the link
> > may stay offline forever.
> 
> It fixed a bug?  Great!
> 
> > > I'm nervous about ndev->peer_sta, the behavior of link_is_up,
> > > timers...
> >
> > Actually, the code is designed according to Atom NTB, except for the
> > peer_sta.
> 
> Except for peer_sta, and that's a pretty critical design change.  I'm still
> nervous, but I'll trust that you have been able to test this behavior
> thoroughly.

Yes, that part of the code will be changed in the future because the hardware
design is being changed too.

> 
> > I'll add the explanation when making changes.



[PATCH v5 3/5] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.

2016-01-20 Thread Tang Chen
From: Gu Zheng 

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return the apicids of non-present
   or disabled cpus.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.

And finally we will use this array to make cpuid <-> nodeid persistent.

The cpuid <-> apicid mapping is established at local apic registration time.
But non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.
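The allocation scheme described above can be tried in userspace. The following is a simplified sketch modelled on the allocate_logical_cpuid() helper this patch adds, not a copy of it: MAX_CPUS and reset_mapping() are hypothetical, the warning is omitted, and slot 0 is filled with the boot CPU's apicid up front as an assumption made for the demo.

```c
#define MAX_CPUS 8	/* hypothetical stand-in for nr_cpu_ids */

static int cpuid_to_apicid[MAX_CPUS];
static int nr_logical_cpuids;

/* Clear the table and reserve logical cpuid 0 for the boot processor. */
static void reset_mapping(int boot_apicid)
{
	int i;

	for (i = 0; i < MAX_CPUS; i++)
		cpuid_to_apicid[i] = -1;
	cpuid_to_apicid[0] = boot_apicid;
	nr_logical_cpuids = 1;
}

/*
 * Return the logical cpuid for apicid. An apicid that was seen before gets
 * its old cpuid back; a new one gets the next free slot; -1 means full.
 */
static int allocate_logical_cpuid(int apicid)
{
	int i;

	for (i = 0; i < nr_logical_cpuids; i++) {
		if (cpuid_to_apicid[i] == apicid)
			return i;
	}

	if (nr_logical_cpuids >= MAX_CPUS)
		return -1;

	cpuid_to_apicid[nr_logical_cpuids] = apicid;
	return nr_logical_cpuids++;
}
```

Entries are never released, so a CPU that is hot-removed and later hot-added with the same apicid is handed the same logical cpuid again. That is the property the later patches rely on to keep the cpuid <-> nodeid mapping stable.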

Signed-off-by: Gu Zheng 
Signed-off-by: Tang Chen 
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  6 ++---
 arch/x86/kernel/apic/apic.c   | 61 ---
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE  BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index e759076..0ce06ee 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -174,15 +174,13 @@ static int acpi_register_lapic(int id, u8 enabled)
return -EINVAL;
}
 
-   if (!enabled) {
+   if (!enabled)
++disabled_cpus;
-   return -EINVAL;
-   }
 
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];
 
-   return generic_processor_info(id, ver);
+   return __generic_processor_info(id, ver, enabled);
 }
 
 static int __init
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 1625778..4822cda 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals to current allocated max logical CPU ID plus 1.
+ * All allocated CPU IDs should be in [0, nr_logical_cpuids), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+   int i;
+
+   /*
+* cpuid <-> apicid mapping is persistent, so when a cpu is up,
+* check if the kernel has allocated a cpuid for it.
+*/
+   for (i = 0; i < nr_logical_cpuids; i++) {
+   if (cpuid_to_apicid[i] == apicid)
+   return i;
+   }
+
+   /* Allocate a new cpuid. */
+   if (nr_logical_cpuids >= nr_cpu_ids) {
+   WARN_ONCE(1, "Only %d processors supported."
+"Processor %d/0x%x and the rest are ignored.\n",
+nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+   return -1;
+   }
+
+   cpuid_to_apicid[nr_logical_cpuids] = apicid;
+   return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2079,8 +2125,17 @@ static int __generic_processor_info(int apicid, int version, bool enabled)
 * for BSP.
 */
cpu = 0;
-   } else
-   cpu = cpumask_next_zero(-1, cpu_present_mask);
+
+   /* Logical cpuid 0 is reserved for BSP. */
+   

[PATCH v5 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.

2016-01-20 Thread Tang Chen
From: Gu Zheng 

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return the apicids of non-present
   or disabled cpus.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm    (persistent)
2. apicid (physical cpu id)   <->   nodeid (persistent)
3. cpuid (logical cpu id)     <->   apicid (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid (not persistent)

So, in order to set up a persistent cpuid <-> nodeid mapping for all possible
CPUs, we should:
1. Set up the cpuid <-> apicid mapping for all possible CPUs, which has been
   done in steps 1 and 2.
2. Set up the cpuid <-> nodeid mapping for all possible CPUs. But before that,
   we should obtain all apicids from the MADT.

All processors' apicids can be obtained by _MAT method or from MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter will be added to MADT APIs so that caller
is able to control if disabled processors are ignored.
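The effect of the new parameter can be shown with a small standalone sketch. It follows the shape of map_lapic_id() in the diff below, but the struct lapic_entry type, the ENTRY_ENABLED flag and the map_entry() name are hypothetical userspace stand-ins, not the ACPI definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_ENABLED 0x1u	/* hypothetical stand-in for ACPI_MADT_ENABLED */

/* Hypothetical, trimmed-down view of a MADT local APIC entry. */
struct lapic_entry {
	uint32_t processor_id;
	uint32_t apic_id;
	uint32_t flags;
};

/*
 * Look up the apic id for acpi_id in one entry. Disabled entries are
 * rejected only when the caller asks for that via ignore_disabled.
 */
static int map_entry(const struct lapic_entry *e, uint32_t acpi_id,
		     uint32_t *apic_id, bool ignore_disabled)
{
	if (ignore_disabled && !(e->flags & ENTRY_ENABLED))
		return -ENODEV;

	if (e->processor_id != acpi_id)
		return -EINVAL;

	*apic_id = e->apic_id;
	return 0;
}
```

Passing true keeps the old behaviour for existing callers, while a boot-time walk that wants every possible processor can pass false and still learn the apicid of a disabled entry.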

Signed-off-by: Gu Zheng 
Signed-off-by: Tang Chen 
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 57 +++
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 6979186..d30111a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 *  Extra Processor objects may be enumerated on MP systems with
 *  less than the max # of CPUs. They should be ignored _iff
 *  they are physically not present.
+*
+*  NOTE: Even if the processor has a cpuid, it may not be present because
+*  cpuid <-> apicid mapping is persistent now.
 */
-   if (invalid_logical_cpuid(pr->id)) {
+   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id)
+u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration) {
@@ -87,12 +89,13 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
  * Retrieve the ARM CPU physical identifier (MPIDR)
  */
 static int map_gicc_mpidr(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
+   int device_declaration, u32 acpi_id, 

[PATCH v5 5/5] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.

2016-01-20 Thread Tang Chen
From: Gu Zheng 

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return the apicids of non-present
   or disabled cpus.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 4.

This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional acpi namespace walk for processors.

Signed-off-by: Gu Zheng 
Signed-off-by: Tang Chen 
---
 arch/ia64/kernel/acpi.c   |  2 +-
 arch/x86/kernel/acpi/boot.c   |  2 +-
 drivers/acpi/bus.c|  3 ++
 drivers/acpi/processor_core.c | 65 +++
 include/linux/acpi.h  |  2 ++
 5 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..7db5563 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
/*
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 0ce06ee..7d45261 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -696,7 +696,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include 
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 891c42d..d92f45f 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1096,6 +1096,9 @@ static int __init acpi_init(void)
acpi_sleep_proc_init();
acpi_wakeup_device_init();
acpi_debugger_init();
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+   acpi_set_processor_mapping();
+#endif
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 824b98b..45580ff 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -261,6 +261,71 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, int *phys_id, int *cpuid)
+{
+   int type;
+   u32 acpi_id;
+   acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long tmp;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_get_type(handle, &acpi_type);
+   if (ACPI_FAILURE(status))
+   return false;
+
+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = object.processor.proc_id;
+   break;
+   case ACPI_TYPE_DEVICE:
+   status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = tmp;
+   break;
+   default:
+   return false;
+   }
+
+   type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
+
+   *phys_id = __acpi_get_phys_id(handle, type, acpi_id, false);
+   *cpuid = acpi_map_cpuid(*phys_id, acpi_id);
+   if (*cpuid == -1)
+   return false;
+
+   return true;
+}
+
+static acpi_status __init
+set_processor_node_mapping(acpi_handle handle, u32 lvl, void *context,
+  void **rv)
+{
+   u32 apic_id;
+   int cpu_id;
+
+   if (!map_processor(handle, &apic_id, &cpu_id))
+   return AE_ERROR;
+
+   acpi_map_cpu2node(handle, cpu_id, apic_id);
+   return AE_OK;
+}
+
+void __init acpi_set_processor_mapping(void)
+{
+   /* Set persistent cpu <-> node mapping for all processors. */
+   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+   ACPI_UINT32_MAX, set_processor_node_mapping,
+   NULL, NULL, NULL);
+}
+#endif
+
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
 static int get_ioapic_id(struct acpi_subtable_header *entry, u32 gsi_base,
 u64 *phys_addr, int *ioapic_id)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 06ed7e5..080755a 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -265,6 

[PATCH v5 0/5] Make cpuid <-> nodeid mapping persistent.

2016-01-20 Thread Tang Chen
[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue
caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When a node goes online/offline, the cpuid <-> nodeid mapping is
established/destroyed, which means the cpuid <-> nodeid mapping will change if
node hotplug happens. But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
   wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}

Since wq_numa_possible_cpumask is not updated, the pool could be mapped to an
offline node, which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
struct worker *worker;

worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

..

return worker;
}


[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the
   nodeid <-> pxm mapping is set up at boot time. This mapping is persistent
   and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. The mapping is
   set up at boot time and CPU hotadd time, and cleared at CPU hotremove time.
   This mapping is also persistent.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time.
   cpuids are allocated lowest first, released at CPU hotremove time, and
   reused for other hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd
   time, and cleared at CPU hotremove time. As a result of 3, this mapping is
   not persistent.

To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it persistent. According to init_cpu_to_node(), the
cpuid <-> nodeid mapping is based on the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.

An apicid can be obtained by the _MAT (Multiple APIC Table Entry) method or
found in the MADT (Multiple APIC Description Table). So we finish the job in the
following steps:

1. Enable the apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control whether disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mappings, and
   modify the way cpuid is calculated. Establish all possible cpuid <-> apicid
   mappings when registering the local apic, and store them in this array.

3. Enable the _MAT and MADT related APIs to return the apicids of non-present
   or disabled cpus. This is also done by introducing an extra parameter to
   these APIs to let the caller control whether disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mappings.
   This is done via an additional acpi namespace walk for processors.


For previous discussion, please refer to:
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989
https://lkml.org/lkml/2015/5/14/244
https://lkml.org/lkml/2015/7/7/200
https://lkml.org/lkml/2015/9/27/209

Change log v4 -> v5:
1. Remove useless code in patch 1.
2. Small improvement of commit message.

Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that 

[PATCH v5 1/5] x86, memhp, numa: Online memory-less nodes at boot time.

2016-01-20 Thread Tang Chen
For now, x86 does not support memory-less nodes. A node without memory
will not be onlined, and the cpus on it will be mapped to the other
online nodes with memory in init_cpu_to_node(). The reason for doing this
is to ensure each cpu is mapped to a node with memory, so that it will
be able to allocate local memory for that cpu.

But we don't have to do it in this way.

In this series of patches, we are going to construct cpu <-> node mapping
for all possible cpus at boot time, which is a 1-1 mapping. It means the
cpu will be mapped to the node it belongs to, and will never be changed.
If a node has only cpus but no memory, the cpus on it will be mapped to
a memory-less node. And the memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time, then builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node, the
allocation will automatically fall back to the proper zones in the zonelists.
---
 arch/x86/mm/numa.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)
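
Note on the fallback described in the changelog: the userspace sketch below
is only a rough model of the idea, not the kernel's zonelist code. The names
toy_node, build_fallback_order and alloc_from are hypothetical; the only part
taken from the changelog is that a memory-less node stays online and its
allocations fall through to another node that has memory. Ordering the
fallback list by node distance is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

/* Hypothetical stand-in for a pgdat: free pages plus a fallback order. */
struct toy_node {
	unsigned long free_pages;     /* 0 for a memory-less node */
	int fallback[TOY_MAX_NODES];  /* node ids, nearest first */
};

/*
 * Order all nodes by distance from 'nid', nearest first. This assumes the
 * distance table gives a node its smallest distance to itself, so the node
 * itself ends up first in its own list.
 */
void build_fallback_order(struct toy_node *nodes, int nid,
			  const int distance[TOY_MAX_NODES][TOY_MAX_NODES])
{
	int i, j;

	for (i = 0; i < TOY_MAX_NODES; i++)
		nodes[nid].fallback[i] = i;

	/* Insertion sort by distance from nid. */
	for (i = 1; i < TOY_MAX_NODES; i++) {
		int cand = nodes[nid].fallback[i];

		for (j = i; j > 0 &&
		     distance[nid][nodes[nid].fallback[j - 1]] >
		     distance[nid][cand]; j--)
			nodes[nid].fallback[j] = nodes[nid].fallback[j - 1];
		nodes[nid].fallback[j] = cand;
	}
}

/* Take one page on behalf of 'nid'; returns the serving node, or -1. */
int alloc_from(struct toy_node *nodes, int nid)
{
	int i;

	assert(nid >= 0 && nid < TOY_MAX_NODES);
	for (i = 0; i < TOY_MAX_NODES; i++) {
		int cand = nodes[nid].fallback[i];

		if (nodes[cand].free_pages > 0) {
			nodes[cand].free_pages--;
			return cand;
		}
	}
	return -1;
}
```

With node 1 memory-less, alloc_from(nodes, 1) is served by the nearest node
that still has free pages, which is the behaviour the cpus on a memory-less
node are expected to see.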

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index c3b3f65..010edb4 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -704,22 +704,19 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-   int n, val;
-   int min_val = INT_MAX;
-   int best_node = -1;
+   unsigned long zones_size[MAX_NR_ZONES] = {0};
+   unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-   for_each_online_node(n) {
-   val = node_distance(node, n);
+   /* Allocate and initialize node data. Memory-less node is now online.*/
+   alloc_node_data(nid);
+   free_area_init_node(nid, zones_size, 0, zholes_size);
 
-   if (val < min_val) {
-   min_val = val;
-   best_node = n;
-   }
-   }
-
-   return best_node;
+   /*
+* All zonelists will be built later in start_kernel() after per cpu
+* areas are initialized.
+*/
 }
 
 /*
@@ -748,8 +745,10 @@ void __init init_cpu_to_node(void)
 
if (node == NUMA_NO_NODE)
continue;
+
if (!node_online(node))
-   node = find_near_online_node(node);
+   init_memory_less_node(node);
+
numa_set_node(cpu, node);
}
 }
-- 
1.9.3





[PATCH] x86: use enum cpuid_leafs instead of magic numbers

2016-01-20 Thread Huaitong Han
Signed-off-by: Huaitong Han 
---
 arch/x86/include/asm/elf.h | 2 +-
 arch/x86/kernel/mpparse.c  | 2 +-
 arch/x86/lguest/boot.c | 2 +-
 arch/x86/xen/enlighten.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
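
For readers unfamiliar with the capability array, a small userspace sketch of
why the named index reads better than a bare 0. This is not kernel code:
toy_cpuinfo, toy_hwcap and TOY_NCAPINTS are hypothetical, and the only fact
taken from the patch is that CPUID_1_EDX names word 0 of x86_capability[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only CPUID_1_EDX == 0 follows from the patch; the rest is illustrative. */
enum toy_cpuid_leafs {
	CPUID_1_EDX = 0,   /* CPUID leaf 1, register EDX */
	TOY_NCAPINTS = 4,  /* hypothetical number of capability words */
};

struct toy_cpuinfo {
	uint32_t x86_capability[TOY_NCAPINTS];
};

/* Same shape as ELF_HWCAP after the patch: index by name, not by 0. */
uint32_t toy_hwcap(const struct toy_cpuinfo *c)
{
	assert(c != NULL);
	return c->x86_capability[CPUID_1_EDX];
}
```

The generated code is the same either way; the enum only documents which
CPUID leaf and register the word came from.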

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 1514753..15340e3 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -256,7 +256,7 @@ extern int force_personality32;
instruction set this CPU supports.  This could be done in user space,
but it's not easy, and we've already done it here.  */
 
-#define ELF_HWCAP  (boot_cpu_data.x86_capability[0])
+#define ELF_HWCAP  (boot_cpu_data.x86_capability[CPUID_1_EDX])
 
 /* This yields a string that ld.so will use to load implementation
specific libraries for optimization.  This is more specific in
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 30ca760..97340f2 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -408,7 +408,7 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
processor.cpuflag = CPU_ENABLED;
processor.cpufeature = (boot_cpu_data.x86 << 8) |
(boot_cpu_data.x86_model << 4) | boot_cpu_data.x86_mask;
-   processor.featureflag = boot_cpu_data.x86_capability[0];
+   processor.featureflag = boot_cpu_data.x86_capability[CPUID_1_EDX];
processor.reserved[0] = 0;
processor.reserved[1] = 0;
for (i = 0; i < 2; i++) {
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 4ba229a..a9033ae 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1535,7 +1535,7 @@ __init void lguest_init(void)
 */
cpu_detect(&new_cpu_data);
/* head.S usually sets up the first capability word, so do it here. */
-   new_cpu_data.x86_capability[0] = cpuid_edx(1);
+   new_cpu_data.x86_capability[CPUID_1_EDX] = cpuid_edx(1);
 
/* Math is always hard! */
set_cpu_cap(&new_cpu_data, X86_FEATURE_FPU);
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index d09e4c9..2c26108 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1654,7 +1654,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
cpu_detect(&new_cpu_data);
set_cpu_cap(&new_cpu_data, X86_FEATURE_FPU);
new_cpu_data.wp_works_ok = 1;
-   new_cpu_data.x86_capability[0] = cpuid_edx(1);
+   new_cpu_data.x86_capability[CPUID_1_EDX] = cpuid_edx(1);
 #endif
 
if (xen_start_info->mod_start) {
-- 
2.4.3



Re: [PATCH v2 1/4] dt-bindings: power: reset: add document for reboot-mode driver

2016-01-20 Thread Andy Yan

Hi Rob:
   thanks for your review.
On 2016年01月21日 02:28, Rob Herring wrote:

On Tue, Jan 12, 2016 at 07:29:49PM +0800, Andy Yan wrote:

add device tree binding document for reboot-mode driver

Signed-off-by: Andy Yan 

---

Changes in v2: None
Changes in v1: None

  .../bindings/power/reset/reboot-mode.txt   | 41 +
  .../bindings/power/reset/syscon-reboot-mode.txt| 52 ++
  2 files changed, 93 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/power/reset/reboot-mode.txt
  create mode 100644 Documentation/devicetree/bindings/power/reset/syscon-reboot-mode.txt

diff --git a/Documentation/devicetree/bindings/power/reset/reboot-mode.txt b/Documentation/devicetree/bindings/power/reset/reboot-mode.txt
new file mode 100644
index 000..81d9f66
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/reset/reboot-mode.txt
@@ -0,0 +1,41 @@
+Generic reboot mode core map driver
+
+This driver gets reboot mode arguments and calls the write
+interface to store the magic value in a special register
+or RAM. Then the bootloader can read it and take different
+action according to the argument stored.
+
+Required properties:
+- compatible: only support "syscon-reboot-mode" now.
+
+Each mode is represented as a sub-node of reboot_mode:
+
+Subnode required properties:
+- linux,mode: reboot mode command,such as "loader", "recovery", "fastboot".
+- loader,magic: magic number for the mode, this is vendor specific.
+
+Example:
+   reboot_mode {

reboot-mode instead please.



Sorry, I have already corrected it in the DT file but forgot it here. It 
will be changed in the next version.



+   compatible = "syscon-reboot-mode";
+   offset = <0x40>;

This doc by itself is a little confusing. For example, is a child of the
syscon node? I would remove offset (and perhaps compatible) from this
example.


   Yes, it is a child of a syscon mapped node. For example, the Rockchip 
platform uses a register of PMU (rk3066/rk3288) or GRF (rk3036); PMU and 
GRF are already mapped by syscon.
   offset and compatible are used by a write interface driver like 
syscon-reboot-mode.c. If you don't like them appearing in the core map doc, 
should I move them to syscon-reboot-mode.txt?

+
+   loader {
+   linux,mode = "loader";
+   loader,magic = ;
+   };

Sorry, my previous suggestion was not clear. I'm suggesting get rid of
the subnodes and just do properties like this:

loader = ;
maskrom = ;

That's the same amount of information unless node names and linux,mode
values are going to diverge. Do they need to? I can't see a reason.


Because the command "linux,mode" and the value "loader,magic" are vendor 
specific. I don't know what commands and how many modes other platforms 
will use. So, as John says in his reply, this sort of flexibility helps us 
adapt the driver to different hardware/system environments.


We need to be clear what loader means. More specifically, it is boot
into bootloader shell.

Actually, the Rockchip platform will reboot into a bootloader download 
mode with this command. This mode can download faster than maskrom 
download mode.

+
+   maskrom {

In theory, the bootrom could have multiple modes. This typically means a
USB download mode. So perhaps a more precise name would be
"rom-download".

In chips I'm familiar with the bootrom mode is selected via a different
mechanism than the secondary bootloader modes, but I suppose the same
mechanism could be used.

   Yes , they use the same mechanism.

+   linux,mode = "maskrom";
+   loader,magic = ;
+   };
+
+   recovery {
+   linux,mode = "recovery";
+   loader,magic = ;
+   };
+
+   fastboot {
+   linux,mode = "fastboot";
+   loader,magic = ;
+   };









Re: vmstat: make vmstat_updater deferrable again and shut down on idle

2016-01-20 Thread Shiraz Hashim
On Wed, Jan 20, 2016 at 8:42 PM, Christoph Lameter  wrote:
> On Wed, 20 Jan 2016, Shiraz Hashim wrote:
>
>> The patch makes vmstat_shepherd deferrable, which, if it is quiesced,
>> would not schedule vmstat update on other cpus. Wouldn't this
>> aggravate the problem of vmstat for the rest of the cpus not getting updated?
>
> It's only "deferred" in order to make it at the next tick and not cause an
> extra event. This means that vmstat will run periodically from tick
> processing. It merely causes a synching so that we have one interruption
> that does both.
>
> On idle we fold counters immediately. So there is no loss of accuracy.
>

vmstat is scheduled by shepherd or by itself (conditionally). In case shepherd
is deferred and vmstat doesn't schedule itself, then vmstat needs to wait
for shepherd to be up and then schedule it. This may end up in a delayed status
update for all live cpus. Isn't that so?

-- 
regards
Shiraz Hashim


[PATCH] MAINTAINERS: Update mailing list for Renesas ARM64 SoC Development

2016-01-20 Thread Simon Horman
Update the mailing list used for development of support for
ARM64 Renesas SoCs.

This is a follow-up for a similar change for other Renesas SoCs and
drivers used by Renesas SoCs. The ARM64 SoC entry was not updated in
that patch as it was not yet present in mainline.

The motivation for the mailing list update is that Renesas SoCs are now
much wider than the SH architecture and there is some desire from some for
the linux-sh list to refocus on discussion of the work on the SH
architecture.

Cc: Magnus Damm 
Signed-off-by: Simon Horman 
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 293874ca4d4e..0331ce2a6dc6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1439,8 +1439,8 @@ S:Maintained
 ARM/RENESAS ARM64 ARCHITECTURE
 M: Simon Horman 
 M: Magnus Damm 
-L: linux...@vger.kernel.org
-Q: http://patchwork.kernel.org/project/linux-sh/list/
+L: linux-renesas-...@vger.kernel.org
+Q: http://patchwork.kernel.org/project/linux-renesas-soc/list/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas.git next
 S: Supported
 F: arch/arm64/boot/dts/renesas/
-- 
2.1.4



RE: [PATCH V5 1/1] NTB: Add support for AMD PCI-Express Non-Transparent Bridge

2016-01-20 Thread Allen Hubbe
From: Yu, Xiangliang [mailto:xiangliang...@amd.com]
> > > Signed-off-by: Jon Mason 
> > > Signed-off-by: Allen Hubbe 
> >
> > NO.
> 
> Ok, I'll change it if you don't want to change it.

Nah, just remember it for next time...

I'm satisfied with this v5.

Reviewed-by: Allen Hubbe 

> I don't think so. Here, the I/O memory only exists when pci_iomap returns
> success, so the registers can't be accessed through I/O ports. And ioread*
> will check whether the memory type is MMIO or I/O port (please see the
> definition).
> I don't think we need that check, so I use read* because it can be more
> efficient.
> I think we need to think about actual usage, not only follow the book.
> And I have said this in a previous version; I don't like explaining it
> again and again.
> If you have any concern, please tell me after my comment.

It's not more efficient, on this platform it's the same.

If it were my driver I would change it... but you can keep it this way.

> > This is different from v4.  It used to be:
> 
> Because peer_sta is changed to 0, amd_link_is_up will return 0
> (offline) and will not check the hardware link status. So it may stay
> offline forever.

It fixed a bug?  Great!

> > I'm nervous about ndev->peer_sta, the behavior of link_is_up,
> > timers...
>
> Actually, the code is designed according to Atom NTB, except for the
> peer_sta.

Except for peer_sta, and that's a pretty critical design change.  I'm still 
nervous, but I'll trust that you have been able to test this behavior 
thoroughly.

> I'll add the explanation when making changes.

Thanks.

Allen



Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-20 Thread Yang Zhang

On 2016/1/21 14:02, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Thursday, January 21, 2016 1:58 PM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-
priority interrupts



I remember we have discussed that even the LAPIC is software disabled,
it still can respond to some interrupts like INIT, NMI, SMI, and SIPI
messages. Isn't the current logic still problematic?


I don't think there are problems, here we only cover lowest-priority mode.


Does the Intel SDM say those interrupts cannot be delivered in
lowest-priority mode?


Fixed, Lowest-priority, SMI, NMI, INIT are all "Delivery Mode", once it is
Lowest-priority, it cannot be other type, afaik.


You are correct, I mixed it up with physical and logical mode. Also, I 
noticed you have the check at the beginning:


+   if (!kvm_lowest_prio_delivery(irq))
+   goto set_irq;
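
For context, the selection step under discussion can be sketched in userspace
roughly as follows. This is a guess at the idea, not the code from the patch:
toy_hweight16 and toy_vector_to_index are hypothetical names, and the hash
("vector modulo number of candidate vcpus") is an assumption, since the
quoted hunk does not show the body of kvm_vector_2_index(). The sketch
returns a 0-based bit position, whereas the quoted hunk indexes dst[idx-1],
so the real helper may well count differently.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits in a 16-bit map (the job hweight16() does in the kernel). */
unsigned int toy_hweight16(uint16_t map)
{
	unsigned int n = 0;

	for (; map; map &= (uint16_t)(map - 1))
		n++;
	return n;
}

/*
 * Pick one destination from 'bitmap' by hashing the vector.
 * Assumption: hash = vector % number of candidates. The return value is
 * the bit position (0..15) of the chosen candidate.
 */
int toy_vector_to_index(uint8_t vector, uint16_t bitmap)
{
	unsigned int dest_vcpus = toy_hweight16(bitmap);
	unsigned int skip, bit;

	assert(dest_vcpus != 0);     /* caller bails out early on an empty map */
	skip = vector % dest_vcpus;  /* number of set bits to pass over */

	for (bit = 0; bit < 16; bit++) {
		if (!(bitmap & (1u << bit)))
			continue;
		if (skip-- == 0)
			return (int)bit;
	}
	return -1;                   /* not reached when dest_vcpus != 0 */
}
```

The point of hashing is that a given vector always lands on the same
candidate for a given destination set, instead of comparing priorities on
every delivery.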

--
best regards
yang


Re: [LKP] [lkp] [spi] 2baed30cb3: BUG: scheduling while atomic: systemd-udevd/134/0x00000002

2016-01-20 Thread Sudip Mukherjee
On Thu, Jan 21, 2016 at 01:47:10PM +0800, Huang, Ying wrote:
> Sudip Mukherjee  writes:
> 
> > On Wed, Jan 20, 2016 at 01:00:40PM +0800, Huang, Ying wrote:
> >> Sudip Mukherjee  writes:
> >> 
> >> > On Wed, Jan 20, 2016 at 08:44:37AM +0800, kernel test robot wrote:

> >
> > I am not able to reproduce this. Tested just with the kernel and
> > yocto-minimal-i386.cgz filesystem and it booted properly.
> >
> > I guess I need at least your job file to reproduce this.
> 
> This is a boot test so I did not attach the job file.  But the test
> result may depend on the specific root file system.  For example, the
> process in the BUG report is always systemd-udevd.  Maybe you need a
> systemd based root file system.

So silly of me. Since you said 2baed30cb3, I kept looking at that
patch.
Can you please test again after reverting:
ebd43516d387 ("Staging: panel: usleep_range is preferred over udelay")

If it solves the problem then I will submit a formal patch.

regards
sudip



RE: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-20 Thread Wu, Feng


> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Thursday, January 21, 2016 1:58 PM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> >>
> >> I remember we have discussed that even the LAPIC is software disabled,
> >> it still can respond to some interrupts like INIT, NMI, SMI, and SIPI
> >> messages. Isn't the current logic still problematic?
> >
> > I don't think there are problems, here we only cover lowest-priority mode.
> 
> Does the Intel SDM say those interrupts cannot be delivered in
> lowest-priority mode?

Fixed, Lowest-priority, SMI, NMI, INIT are all "Delivery Mode", once it is
Lowest-priority, it cannot be other type, afaik.

Thanks,
Feng

> 
> CC Jun.
> 
> Hi Jun,
> 
> Do you know whether INIT, NMI, SMI, and SIPI can be delivered through
> lowest-priority mode? I didn't find where the SDM says no.
> 
> --
> best regards
> yang


RE: [PATCH V5 1/1] NTB: Add support for AMD PCI-Express Non-Transparent Bridge

2016-01-20 Thread Yu, Xiangliang


> From: Xiangliang Yu 
> > This adds support for AMD's PCI-Express Non-Transparent Bridge
> > (NTB) device on the Zeppelin platform. The driver connects to the
> > standard NTB sub-system interface, with modification to add hooks for
> > power management in a separate patch. The AMD NTB device has 3 memory
> > windows, 16 doorbells, 16 scratch-pad registers, and supports up to 16
> > PCIe lanes running at Gen3 speeds.
> >
> > Signed-off-by: Xiangliang Yu 
> 
> > Signed-off-by: Jon Mason 
> > Signed-off-by: Allen Hubbe 
> 
> NO.

Ok, I'll change it if you don't want to change it.

> 
> > +   /* set and verify setting the translation address */
> > +   write64(addr, peer_mmio + xlat_reg);
> > +   reg_val = read64(peer_mmio + xlat_reg);
> > +   if (reg_val != addr) {
> > +   write64(0, peer_mmio + xlat_reg);
> > +   return -EIO;
> > +   }
> > +
> > +   /* set and verify setting the limit */
> > +   writel(limit, mmio + limit_reg);
> > +   reg_val = readl(mmio + limit_reg);
> > +   if (reg_val != limit) {
> > +   writel(base_addr, mmio + limit_reg);
> > +   writel(0, peer_mmio + xlat_reg);
> > +   return -EIO;
> > +   }
> 
> I see what you did there, change iowrite64 to write64.
> 
> What I meant was:
>  - change readl to ioread32.
>  - change writel to iowrite32.
>  - change readb, readw, writeb, writew (if there are any)
>  - leave ioread64 and iowrite64 as they were.
> 
> Why: http://www.makelinux.net/ldd3/chp-9-sect-4
> 
> Quote: "If you read through the kernel source, you see many calls to an older
> set of functions when I/O memory is being used. These functions still work,
> but their use in new code is discouraged. Among other things, they are less
> safe because they do not perform the same sort of type checking."
> 
> The "older set of functions" are read[bwl], write[bwl].  This is a new driver,
> with all new code.  Please use the ioread/iowrite variants.

I don't think so. Here, the I/O memory only exists when pci_iomap returns
success, so the registers can't be accessed through I/O ports. And ioread*
will check whether the memory type is MMIO or I/O port (please see the
definition).
I don't think we need that check, so I use read* because it can be more
efficient.
I think we need to think about actual usage, not only follow the book.
And I have said this in a previous version; I don't like explaining it
again and again.
If you have any concern, please tell me after my comment.

> > +static int amd_link_is_up(struct amd_ntb_dev *ndev) {
> > +   if (!ndev->peer_sta)
> > +   return NTB_LNK_STA_ACTIVE(ndev->cntl_sta);
> > +
> > +   /* If peer_sta is reset or D0 event, the ISR has
> > +* started a timer to check link status of hardware.
> > +* So here just clear status bit. And if peer_sta is
> > +* D3 or PME_TO, D0/reset event will be happened when
> > +* system wakeup/poweron, so do nothing here.
> > +*/
> > +   if (ndev->peer_sta & AMD_PEER_RESET_EVENT)
> > +   ndev->peer_sta &= ~AMD_PEER_RESET_EVENT;
> > +   else if (ndev->peer_sta & AMD_PEER_D0_EVENT)
> > +   ndev->peer_sta = 0;
> > +
> > +   return 0;
> > +}
> 
> Thanks.  This is much better.
> 
> > +static void amd_handle_event(struct amd_ntb_dev *ndev, int vec)
> ...
> > +   case AMD_PEER_D0_EVENT:
> ...
> > +   /* start a timer to poll link status */
> > +   schedule_delayed_work(&ndev->hb_timer,
> > + AMD_LINK_HB_TIMEOUT);
> 
> This is different from v4.  It used to be:
> 
> if (amd_link_is_up())
>   ntb_link_event();
> else
>   schedule_delayed_work();
> 
> Why is v5 correct?
> Why was v4 incorrect?

Because peer_sta is changed to 0, amd_link_is_up will return 0 (offline)
and will not check the hardware link status. So it may stay offline forever.

> I'm nervous about ndev->peer_sta, the behavior of link_is_up, timers...
> unexplained changes to a fragile bit of code - not just this code, but any 
> code
> that deals with parallel or asynchronous behaviors.  With the comment in
> link_is_up, this code is much better, but any changes to this whole link state
> mechanism need to be explained.
Actually, the code is designed according to Atom NTB, except for the peer_sta. 
I'll add the explanation when making changes.



Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-20 Thread Yang Zhang

On 2016/1/21 13:46, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Thursday, January 21, 2016 1:43 PM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-
priority interrupts

On 2016/1/21 13:33, Wu, Feng wrote:




-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
ow...@vger.kernel.org] On Behalf Of Yang Zhang
Sent: Thursday, January 21, 2016 1:24 PM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver
lowest-priority interrupts

On 2016/1/20 9:42, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts. As an
example, modern Intel CPUs in server platforms use this method to
handle lowest-priority interrupts.

Signed-off-by: Feng Wu 
---
bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
{
@@ -727,21 +743,51 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,

dst = map->logical_map[cid];

-   if (kvm_lowest_prio_delivery(irq)) {
+   if (!kvm_lowest_prio_delivery(irq))
+   goto set_irq;
+
+   if (!kvm_vector_hashing_enabled()) {
int l = -1;
for_each_set_bit(i, &bitmap, 16) {
if (!dst[i])
continue;
if (l < 0)
l = i;
-   else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0)
+   else if (kvm_apic_compare_prio(dst[i]->vcpu,
+   dst[l]->vcpu) < 0)
l = i;
}
-
bitmap = (l >= 0) ? 1 << l : 0;
+   } else {
+   int idx = 0;
+   unsigned int dest_vcpus = 0;
+
+   dest_vcpus = hweight16(bitmap);
+   if (dest_vcpus == 0)
+   goto out;
+
+   idx = kvm_vector_2_index(irq->vector,
+   dest_vcpus, &bitmap, 16);
+
+   /*
+* We may find a hardware disabled LAPIC here, if that
+* is the case, print out a error message once for each
+* guest and return.
+*/
+   if (!dst[idx-1] &&
+   (kvm->arch.disabled_lapic_found == 0)) {
+   kvm->arch.disabled_lapic_found = 1;
+   printk(KERN_ERR
+   "Disabled LAPIC found during irq injection\n");
+   goto out;


What does "goto out" mean? Did the injection succeed or fail? According to the
value of ret, which is set to true here, it means the injection succeeded, but
i = -1.



Oh, I didn't notice 'ret' is initialized to true; I thought it was initialized
to false like in another function. I should add a 'ret = false' here. We should
fail to inject the interrupt since a hardware disabled LAPIC is found.


I remember we have discussed that even the LAPIC is software disabled,
it still can respond to some interrupts like INIT, NMI, SMI, and SIPI
messages. Isn't the current logic still problematic?


I don't think there are problems, here we only cover lowest-priority mode.


Does the Intel SDM say those interrupts cannot be delivered in 
lowest-priority mode?


CC Jun.

Hi Jun,

Do you know whether INIT, NMI, SMI, and SIPI can be delivered through 
lowest-priority mode? I didn't find where the SDM says no.


--
best regards
yang


[PATCH, REGRESSION v3] mm: make apply_to_page_range more robust

2016-01-20 Thread Mika Penttilä
Recent changes (4.4.0+) in the module loader triggered an oops on ARM:

The module in question is in-tree module :
drivers/misc/ti-st/st_drv.ko

The BUG is here :

[ 53.638335] [ cut here ]
[ 53.642967] kernel BUG at mm/memory.c:1878!
[ 53.647153] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 53.652987] Modules linked in:
[ 53.656061] CPU: 0 PID: 483 Comm: insmod Not tainted 4.4.0 #3
[ 53.661808] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 53.668338] task: a989d400 ti: 9e6a2000 task.ti: 9e6a2000
[ 53.673751] PC is at apply_to_page_range+0x204/0x224
[ 53.678723] LR is at change_memory_common+0x90/0xdc
[ 53.683604] pc : [<800ca0ec>] lr : [<8001d668>] psr: 600b0013
[ 53.683604] sp : 9e6a3e38 ip : 8001d6b4 fp : 7f0042fc
[ 53.695082] r10:  r9 : 9e6a3e90 r8 : 0080
[ 53.700309] r7 :  r6 : 7f008000 r5 : 7f008000 r4 : 7f008000
[ 53.706837] r3 : 8001d5a4 r2 : 7f008000 r1 : 7f008000 r0 : 80b8d3c0
[ 53.713368] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 53.720504] Control: 10c5387d Table: 2e6b804a DAC: 0055
[ 53.726252] Process insmod (pid: 483, stack limit = 0x9e6a2210)
[ 53.732173] Stack: (0x9e6a3e38 to 0x9e6a4000)
[ 53.736532] 3e20: 7f007fff 7f008000
[ 53.744714] 3e40: 80b8d3c0 80b8d3c0  7f007000 7f00426c 7f008000 
 7f008000
[ 53.752895] 3e60: 7f004140 7f008000  0080   
7f0042fc 8001d668
[ 53.761076] 3e80: 9e6a3e90  8001d6b4 7f00426c 0080  
9e6a3f58 7f004140
[ 53.769257] 3ea0: 7f004240 7f00414c  8008bbe0  7f00 
 
[ 53.777438] 3ec0: a8b12f00 0001cfd4 7f004250 7f004240 80b8159c  
00e0 7f0042fc
[ 53.785619] 3ee0: c183d000 74f8 18fd  0b3c  
 7f002024
[ 53.793800] 3f00: 0002      
 
[ 53.801980] 3f20:     0040  
0003 0001cfd4
[ 53.810161] 3f40: 017b 8000f7e4 9e6a2000  0002 8008c498 
c183d000 74f8
[ 53.818342] 3f60: c1841588 c1841409 c1842950 5000 52a0  
 
[ 53.826523] 3f80: 0023 0024 001a 001e 0016  
 
[ 53.834703] 3fa0: 003e3d60 8000f640   0003 0001cfd4 
 003e3d60
[ 53.842884] 3fc0:   003e3d60 017b 003e3d20 7eabc9d4 
76f2c000 0002
[ 53.851065] 3fe0: 7eabc990 7eabc980 00016320 76e81d00 600b0010 0003 
 
[ 53.859256] [<800ca0ec>] (apply_to_page_range) from [<8001d668>] 
(change_memory_common+0x90/0xdc)
[ 53.868139] [<8001d668>] (change_memory_common) from [<8008bbe0>] 
(load_module+0x194c/0x2068)
[ 53.876671] [<8008bbe0>] (load_module) from [<8008c498>] 
(SyS_finit_module+0x64/0x74)
[ 53.884512] [<8008c498>] (SyS_finit_module) from [<8000f640>] 
(ret_fast_syscall+0x0/0x34)
[ 53.892694] Code: e0834104 eabc e51a1008 eaac (e7f001f2)
[ 53.898792] ---[ end trace fe43fc78ebde29a3 ]---


The call path is SyS_init_module()->set_memory_xx()->apply_to_page_range(),
and apply_to_page_range gets zero length resulting in triggering :
   
  BUG_ON(addr >= end)

This is a regression and a consequence of changes in module section handling 
(Rusty CC:ed).
This may be triggerable only with certain modules and/or gcc versions. 

Plus, I think the spirit of the BUG_ON is to catch overflows,
not to bug on zero length legitimate callers. So whatever the
reason for this triggering, some day we have another caller with
zero length. And, as Rusty mentioned, he expected a zero-length range 
to do nothing, which is what intuition says. 

Fix by letting call with zero size succeed. 
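
To make the arithmetic concrete, here is a small userspace model of the
check. It is only an illustration of the reasoning above, not the kernel
function: range_check_would_bug and toy_apply_to_page_range are hypothetical
names, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the unconditional BUG_ON(addr >= end) would fire.
 * With size == 0, end equals addr, so the check trips even though
 * nothing overflowed.
 */
bool range_check_would_bug(unsigned long addr, unsigned long size)
{
	unsigned long end = addr + size;  /* wraps around on overflow */

	return addr >= end;
}

/* Model of the patched entry: a zero-length range is a successful no-op. */
int toy_apply_to_page_range(unsigned long addr, unsigned long size)
{
	if (!size)
		return 0;                   /* the fix: nothing to do */
	assert(!range_check_would_bug(addr, size));  /* stand-in for BUG_ON() */
	return 0;  /* the real function walks the page tables here */
}
```

So the check still catches a wrapped range (addr + size overflowing), while
a zero-length call returns 0 before reaching it.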

v2: add more explanation
v3: added even more explanation and stack trace, tagged as regression

Signed-off-by: Mika Penttilä mika.pentt...@nextfour.com
Reviewed-by: Pekka Enberg 
---

diff --git a/mm/memory.c b/mm/memory.c
index c387430..c3d1a2e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1884,6 +1884,9 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
unsigned long end = addr + size;
int err;
 
+   if (!size)
+   return 0;
+
BUG_ON(addr >= end);
pgd = pgd_offset(mm, addr);
do {



Re: [PATCH v3 1/6] clk: mediatek: Refine the makefile to support multiple clock drivers

2016-01-20 Thread James Liao
Hi Yingjoe,

On Thu, 2016-01-21 at 10:45 +0800, Yingjoe Chen wrote:
> On Thu, 2016-01-21 at 10:28 +0800, Yingjoe Chen wrote:
> > On Tue, 2016-01-12 at 16:31 +0800, James Liao wrote:
> > > Add a Kconfig to define clock configuration for each SoC, and
> > > modify the Makefile to build only the drivers selected in config.
> > > 
> > > Signed-off-by: Shunli Wang 
> > > Signed-off-by: James Liao 
> > > ---
> > >  drivers/clk/Kconfig   |  1 +
> > >  drivers/clk/mediatek/Kconfig  | 23 +++
> > >  drivers/clk/mediatek/Makefile |  6 +++---
> > >  3 files changed, 27 insertions(+), 3 deletions(-)
> > >  create mode 100644 drivers/clk/mediatek/Kconfig
> > > 
> > > diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
> > > index c3e3a02..b7a37dc 100644
> > > --- a/drivers/clk/Kconfig
> > > +++ b/drivers/clk/Kconfig
> > > @@ -198,3 +198,4 @@ source "drivers/clk/mvebu/Kconfig"
> > >  
> > >  source "drivers/clk/samsung/Kconfig"
> > >  source "drivers/clk/tegra/Kconfig"
> > > +source "drivers/clk/mediatek/Kconfig"
> > 
> > 
> > Hi James,
> > 
> > drivers/clk/mediatek/Kconfig adds user selectable options; menuconfig
> > will list them outside of the "Common Clock Framework" sub-menu if you
> > source the file here. The Kconfig files for samsung & tegra don't have any,
> > so it is OK for them to stay here.
> > 
> > Please move it inside the menu, also it seems the source lines are
> > sorted now, so let's keep them sorted.
> 
> 
> After looking at drivers/clk/Kconfig history, it seems we had a similar
> issue before. I think we should move all sources under the clk menu to
> prevent this from happening.

I may provide a separate patch to move the other Kconfig sources into the menu section.


Best regards,

James




Re: [PATCH v4 20/21] usb: dwc2: host: Totally redo the microframe scheduler

2016-01-20 Thread kbuild test robot
Hi Douglas,

[auto build test ERROR on next-20160120]
[cannot apply to v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Douglas-Anderson/usb-dwc2-host-Fix-and-speed-up-all-the-stuff-especially-with-splits/20160121-131414
config: x86_64-randconfig-x019-01201142 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from drivers/usb/dwc2/core.c:57:0:
>> drivers/usb/dwc2/hcd.h:345:44: error: 'DWC2_HS_SCHEDULE_UFRAMES' undeclared here (not in a function)
 struct dwc2_hs_transfer_time hs_transfers[DWC2_HS_SCHEDULE_UFRAMES];
   ^
   drivers/usb/dwc2/core.c: In function 'dwc2_hc_start_transfer':
   drivers/usb/dwc2/core.c:1964:17: error: 'struct dwc2_hsotg' has no member named 'split_order'
  >split_order);
^
--
   In file included from drivers/usb/dwc2/core_intr.c:54:0:
>> drivers/usb/dwc2/hcd.h:345:44: error: 'DWC2_HS_SCHEDULE_UFRAMES' undeclared here (not in a function)
 struct dwc2_hs_transfer_time hs_transfers[DWC2_HS_SCHEDULE_UFRAMES];
   ^

vim +/DWC2_HS_SCHEDULE_UFRAMES +345 drivers/usb/dwc2/hcd.h

   339  u16 device_us;
   340  u16 host_interval;
   341  u16 device_interval;
   342  u16 next_active_frame;
   343  u16 start_active_frame;
   344  s16 num_hs_transfers;
 > 345  struct dwc2_hs_transfer_time hs_transfers[DWC2_HS_SCHEDULE_UFRAMES];
   346  u32 ls_start_schedule_slice;
   347  u16 ntd;
   348  struct list_head qtd_list;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [RFC][PATCH -next 2/2] printk: set may_schedule for some of console_trylock callers

2016-01-20 Thread Sergey Senozhatsky
On (01/21/16 10:25), Sergey Senozhatsky wrote:
[..]
> > First, the message "This stops the holder of console_sem just where we
> > want him" is suspicious.
> 
> this comment is irrelevant, as of today. it was, a long time ago, because
> the entire thing was a bit different (linux-2.4.21 kernel/printk.c)
> 
> /* This stops the holder of console_sem just where we want him */
> spin_lock_irqsave(&logbuf_lock, flags);
> 
> logbuf_lock does stop the holder, local_irq_save() does not, you are right.

I meant 'irrelevant on its current place'.

[..]
> > As a result, I think that we do not need the extra checks
> > for the safe context in printk(). IMHO, it is safe to remove
> > all the console_may_schedule stuff and also remove the extra
> > preempt_disable/preempt_enable() in vprintk_emit().
> > 
> > Or did I miss anything?
> 
> hm... I suspect the reason we have console_may_schedule is
> console_conditional_schedule() - console_sem owner may want
> to have an internal logic to re-schedule [fwiw], while still
> holding the console_sem. tty/vt/vt.c or video/console/fbcon.c
> for example. (in 2.4 kernel: video/fbcon.c and char/console.c).
> 
> cond_resched() helps in console_unlock(); console_conditional_schedule()
> is called after console_lock() and _before_ console_unlock()

for CONFIG_PREEMPT_COUNT kernel we can do something like

+void __sched console_conditional_schedule(void)
+{
+   if (!oops_in_progress && preemptible() && !rcu_preempt_depth())
+   cond_resched();
+}

and in console_unlock()

-   if (do_cond_resched)
-   cond_resched();
+   console_conditional_schedule();



but for !CONFIG_PREEMPT_COUNT we can't. because of currently held spin_locks/etc
that we don't know about.

`console_may_schedule' carries a bit of important information for
console_conditional_schedule() caller. if it has acquired console_sem
via console_lock() - then it can schedule, if via console_trylock() - it cannot.

the last `if via console_trylock() - it cannot' rule is not always true,
we clearly can have printk()->console_unlock() from non-atomic contexts
(if we know that it's non-atomic, which is not the case with !PREEMPT_COUNT).
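
to make the rule above concrete, here is a minimal userspace sketch, not kernel code. it assumes a single-threaded model in which `console_may_schedule` is a plain flag; every `model_*` name and `resched_count` are made up for illustration and do not exist in kernel/printk.

```c
#include <stdbool.h>

/* Userspace model only: the names mirror the discussion, not kernel code. */
static bool console_locked;
static bool console_may_schedule;
static int resched_count;	/* how many times the model "rescheduled" */

/* Sleeping acquire: the caller is known to be in a schedulable context. */
static void model_console_lock(void)
{
	console_locked = true;
	console_may_schedule = true;
}

/* Non-sleeping acquire: the caller's context is unknown, so assume atomic. */
static bool model_console_trylock(void)
{
	if (console_locked)
		return false;
	console_locked = true;
	console_may_schedule = false;
	return true;
}

/* Reschedule only if the lock was taken through the sleeping path. */
static void model_console_conditional_schedule(void)
{
	if (console_may_schedule)
		resched_count++;	/* stands in for cond_resched() */
}

static void model_console_unlock(void)
{
	console_may_schedule = false;
	console_locked = false;
}
```

the point of the flag is that the owner cannot tell, on !PREEMPT_COUNT, whether it is atomic, so the acquisition path is the only information it has.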

-ss


Re: [PATCH v3 1/6] clk: mediatek: Refine the makefile to support multiple clock drivers

2016-01-20 Thread James Liao
Hi Yingjoe,

On Thu, 2016-01-21 at 10:28 +0800, Yingjoe Chen wrote:
> On Tue, 2016-01-12 at 16:31 +0800, James Liao wrote:
> > Add a Kconfig to define clock configuration for each SoC, and
> > modify the Makefile to build drivers that only selected in config.
> > 
> > Signed-off-by: Shunli Wang 
> > Signed-off-by: James Liao 
> > ---
> >  drivers/clk/Kconfig   |  1 +
> >  drivers/clk/mediatek/Kconfig  | 23 +++
> >  drivers/clk/mediatek/Makefile |  6 +++---
> >  3 files changed, 27 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/clk/mediatek/Kconfig
> > 
> > diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
> > index c3e3a02..b7a37dc 100644
> > --- a/drivers/clk/Kconfig
> > +++ b/drivers/clk/Kconfig
> > @@ -198,3 +198,4 @@ source "drivers/clk/mvebu/Kconfig"
> >  
> >  source "drivers/clk/samsung/Kconfig"
> >  source "drivers/clk/tegra/Kconfig"
> > +source "drivers/clk/mediatek/Kconfig"
> 
> 
> Hi James,
> 
> drivers/clk/mediatek/Kconfig add user selectable options, menuconfig
> will list them outside of "Common Clock Framework" sub-menu if you
> source the file here. Kconfig for samsung & tegra doesn't have any, so
> it is OK for them to stay here.
> 
> Please move it inside the menu, also it seems the source lines are
> sorted now, so let's keep them sorted.

OK. I'll move mediatek/Kconfig into the menu section in the next patch.


Best regards,

James



Re: linux-next: build failure after merge of the akpm-current tree

2016-01-20 Thread Sudip Mukherjee
Hi Stephen,

On Thu, Jan 21, 2016 at 04:25:45PM +1100, Stephen Rothwell wrote:
> Hi Sudip,
> 
> On Thu, 21 Jan 2016 10:47:09 +0530 Sudip Mukherjee 
>  wrote:
> >
> > On Thu, Jan 21, 2016 at 04:11:56PM +1100, Stephen Rothwell wrote:
> > > Hi Andrew,
> > > 
> > > After merging the akpm-current tree, today's linux-next build (arm
> > > efm32_defconfig) failed like this:
> > > 
> > > fs/proc/task_nommu.c:132:28: error: 'mm' undeclared (first use in this 
> > > function)
> > > 
> > > Caused by commit
> > > 
> > >   e87d4fd02f40 ("proc: revert /proc//maps [stack:TID] annotation")  
> > 
> > posted a patch for it few minutes ago.
> > https://patchwork.kernel.org/patch/8077421/
> 
> Thanks, I have added that to the akpm-current tree for tomorrow in case
> Andrew does not get around to it.

Adding that will uncover one more build failure, and the patch for that
is at:
https://patchwork.kernel.org/patch/8077891/

regards
sudip


Re: [LKP] [lkp] [spi] 2baed30cb3: BUG: scheduling while atomic: systemd-udevd/134/0x00000002

2016-01-20 Thread Huang, Ying
Sudip Mukherjee  writes:

> On Wed, Jan 20, 2016 at 01:00:40PM +0800, Huang, Ying wrote:
>> Sudip Mukherjee  writes:
>> 
>> > On Wed, Jan 20, 2016 at 08:44:37AM +0800, kernel test robot wrote:
>> >> FYI, we noticed the below changes on
>> >> 
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> >> commit 2baed30cb30727b2637d26eac5a8887875a13420 ("spi: lm70llp: use new 
>> >> parport device model")
>> >> 
>> >> 
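>> >> +----------------+------------+------------+
>> >> |                | 74bdced4b4 | 2baed30cb3 |
>> >> +----------------+------------+------------+
>> >> | boot_successes | 0          | 0          |
>> >> +----------------+------------+------------+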
>> >> 
>> >> 
>> >> 
>> >> [    6.358390] i6300esb: Intel 6300ESB WatchDog Timer Driver v0.05
>> >> [    6.358540] i6300esb: cannot register miscdev on minor=130 (err=-16)
>> >> [    6.358555] i6300ESB timer: probe of 0000:00:06.0 failed with error -16
>> >> [    6.363357] BUG: scheduling while atomic: systemd-udevd/134/0x00000002
>> >> [    6.363366] Modules linked in: crc32c_intel pcspkr evdev i6300esb
>> >> ide_cd_mod cdrom intel_agp intel_gtt i2c_piix4 i2c_core virtio_pci
>> >> virtio virtio_ring agpgart rtc_cmos(+) parport_pc(+) autofs4
>> >> [    6.363369] CPU: 1 PID: 134 Comm: systemd-udevd Not tainted 4.4.0-rc1-6-g2baed30 #1
>> >> [    6.363370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
>> >
>> > Can you please let me know how do i reproduce this on qemu? what command
>> > line you used?
>> 
>> The command line can be found in the last line of dmesg file, as below.
>> 
>> qemu-system-x86_64 -enable-kvm -cpu host -kernel
>> /pkg/linux/x86_64-randconfig-a0-01191454/gcc-5/2baed30cb30727b2637d26eac5a8887875a13420/vmlinuz-4.4.0-rc1-6-g2baed30
>> -append 'root=/dev/ram0 user=lkp
>> job=/lkp/scheduled/vm-lkp-wsx03-2G-2/bisect_boot-1-debian-x86_64-2015-02-07.cgz-x86_64-randconfig-a0-01191454-2baed30cb30727b2637d26eac5a8887875a13420-20160119-71002-198dtgm-0.yaml
>> ARCH=x86_64 kconfig=x86_64-randconfig-a0-01191454
>> branch=linux-devel/devel-spot-201601191442
>> commit=2baed30cb30727b2637d26eac5a8887875a13420
>> BOOT_IMAGE=/pkg/linux/x86_64-randconfig-a0-01191454/gcc-5/2baed30cb30727b2637d26eac5a8887875a13420/vmlinuz-4.4.0-rc1-6-g2baed30
>> max_uptime=600
>> RESULT_ROOT=/result/boot/1/vm-lkp-wsx03-2G/debian-x86_64-2015-02-07.cgz/x86_64-randconfig-a0-01191454/gcc-5/2baed30cb30727b2637d26eac5a8887875a13420/0
>> LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug
>> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
>> panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic
>> load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0
>> vga=normal rw ip=vm-lkp-wsx03-2G-2::dhcp' -initrd
>> /fs/sda1/initrd-vm-lkp-wsx03-2G-2 -m 2048 -smp 2 -device
>> e1000,netdev=net0 -netdev user,id=net0,hostfwd=tcp::23621-:22 -boot
>> order=nc -no-reboot -watchdog i6300esb -rtc base=localtime -drive
>> file=/fs/sda1/disk0-vm-lkp-wsx03-2G-2,media=disk,if=virtio -drive
>> file=/fs/sda1/disk1-vm-lkp-wsx03-2G-2,media=disk,if=virtio -pidfile
>> /dev/shm/kboot/pid-vm-lkp-wsx03-2G-2 -serial
>> file:/dev/shm/kboot/serial-vm-lkp-wsx03-2G-2 -daemonize -display
>> none -monitor null
>
> I am not able to reproduce this. Tested just with the kernel and
> yocto-minimal-i386.cgz filesystem and it booted properly.
>
> I guess I need atleast your job file to reproduce this.

This is a boot test, so I did not attach the job file.  But the test
result may depend on the specific root file system.  For example, the
process named in the BUG report is always systemd-udevd.  Maybe you need a
systemd-based root file system.

Best Regards,
Huang, Ying



RE: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-20 Thread Wu, Feng


> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Thursday, January 21, 2016 1:43 PM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> On 2016/1/21 13:33, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> >> ow...@vger.kernel.org] On Behalf Of Yang Zhang
> >> Sent: Thursday, January 21, 2016 1:24 PM
> >> To: Wu, Feng ; pbonz...@redhat.com;
> >> rkrc...@redhat.com
> >> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> >> Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver
> lowest-
> >> priority interrupts
> >>
> >> On 2016/1/20 9:42, Feng Wu wrote:
> >>> Use vector-hashing to deliver lowest-priority interrupts, As an
> >>> example, modern Intel CPUs in server platform use this method to
> >>> handle lowest-priority interrupts.
> >>>
> >>> Signed-off-by: Feng Wu 
> >>> ---
> >>>bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic
> >> *src,
> >>>   struct kvm_lapic_irq *irq, int *r, unsigned long 
> >>> *dest_map)
> >>>{
> >>> @@ -727,21 +743,51 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm
> >> *kvm, struct kvm_lapic *src,
> >>>
> >>>   dst = map->logical_map[cid];
> >>>
> >>> - if (kvm_lowest_prio_delivery(irq)) {
> >>> + if (!kvm_lowest_prio_delivery(irq))
> >>> + goto set_irq;
> >>> +
> >>> + if (!kvm_vector_hashing_enabled()) {
> >>>   int l = -1;
> >>>   for_each_set_bit(i, &bitmap, 16) {
> >>>   if (!dst[i])
> >>>   continue;
> >>>   if (l < 0)
> >>>   l = i;
> >>> - else if (kvm_apic_compare_prio(dst[i]->vcpu,
> >> dst[l]->vcpu) < 0)
> >>> + else if (kvm_apic_compare_prio(dst[i]->vcpu,
> >>> + dst[l]->vcpu) < 0)
> >>>   l = i;
> >>>   }
> >>> -
> >>>   bitmap = (l >= 0) ? 1 << l : 0;
> >>> + } else {
> >>> + int idx = 0;
> >>> + unsigned int dest_vcpus = 0;
> >>> +
> >>> + dest_vcpus = hweight16(bitmap);
> >>> + if (dest_vcpus == 0)
> >>> + goto out;
> >>> +
> >>> + idx = kvm_vector_2_index(irq->vector,
> >>> + dest_vcpus, &bitmap, 16);
> >>> +
> >>> + /*
> >>> +  * We may find a hardware disabled LAPIC here, if
> >> that
> >>> +  * is the case, print out a error message once for each
> >>> +  * guest and return.
> >>> +  */
> >>> + if (!dst[idx-1] &&
> >>> + (kvm->arch.disabled_lapic_found == 0)) {
> >>> + kvm->arch.disabled_lapic_found = 1;
> >>> + printk(KERN_ERR
> >>> + "Disabled LAPIC found during irq
> >> injection\n");
> >>> + goto out;
> >>
> >> What does "goto out" mean? Inject successfully or fail? According to the
> >> value of ret, which is set to true here, it means inject successfully but
> >> i = -1.
> >>
> >
> > Oh, I didn't notice 'ret' is initialized to true; I thought it was
> > initialized to false like in another function. I should add 'ret = false'
> > here. We should fail to inject the interrupt since a hardware-disabled
> > LAPIC is found.
> 
> I remember we have discussed that even when the LAPIC is software disabled,
> it can still respond to some interrupts like INIT, NMI, SMI, and SIPI
> messages. Isn't the current logic still problematic?

I don't think there are problems, here we only cover lowest-priority mode.

Thanks,
Feng

> 
> --
> best regards
> yang
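
For readers following the thread, here is a minimal userspace sketch of the vector-hashing selection being discussed: hash the vector onto the number of candidate destinations, then pick the matching set bit in the destination bitmap. It is an assumption-laden illustration, not the patch's code. `vector_to_index()` and `popcount16()` are hypothetical names (the latter stands in for hweight16()), and the sketch returns a 0-based bit position, whereas the quoted hunk indexes `dst[idx-1]`, which suggests the patch's helper may be 1-based.

```c
#include <stdint.h>

/* Population count of a 16-bit mask; stands in for hweight16(). */
static unsigned int popcount16(uint16_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Return the 0-based position of the (vector % dest_vcpus)-th set bit in
 * bitmap, counting from bit 0, or -1 if no such bit exists.
 */
static int vector_to_index(uint32_t vector, uint32_t dest_vcpus,
			   uint16_t bitmap)
{
	uint32_t target, seen = 0;
	int bit;

	if (dest_vcpus == 0)
		return -1;

	target = vector % dest_vcpus;
	for (bit = 0; bit < 16; bit++) {
		if (!(bitmap & (1u << bit)))
			continue;
		if (seen == target)
			return bit;
		seen++;
	}
	return -1;
}
```

A caller would pass `popcount16(bitmap)` as `dest_vcpus` and treat a negative result, or a selected destination that turns out to be disabled, as a failed delivery.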


Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

2016-01-20 Thread Yang Zhang

On 2016/1/21 13:41, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Thursday, January 21, 2016 1:36 PM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
interrupt is not single-destination

On 2016/1/21 13:07, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Thursday, January 21, 2016 1:00 PM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
interrupt is not single-destination

On 2016/1/21 12:42, Wu, Feng wrote:




-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]

On

Behalf Of Yang Zhang
Sent: Thursday, January 21, 2016 11:35 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if

the

interrupt is not single-destination

On 2016/1/21 11:14, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Thursday, January 21, 2016 11:06 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if

the

interrupt is not single-destination

On 2016/1/20 9:42, Feng Wu wrote:

When the interrupt is not single destination any more, we need
to change back IRTE to remapped mode explicitly.

Signed-off-by: Feng Wu 
---
  arch/x86/kvm/vmx.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e2951b6..13d14d4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct

kvm

*kvm, unsigned int host_irq,

 */

kvm_set_msi_irq(e, &irq);
-   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
+   if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
+   /*
+* Make sure the IRTE is in remapped mode if
+* we don't handle it in posted mode.
+*/
+   pi_set_sn(vcpu_to_pi_desc(vcpu));
+   ret = irq_set_vcpu_affinity(host_irq, NULL);
+   pi_clear_sn(vcpu_to_pi_desc(vcpu));
+
continue;
+   }

vcpu_info.pi_desc_addr =

__pa(vcpu_to_pi_desc(vcpu));

vcpu_info.vector = irq.vector;



>>>>>>>> I am still feel weird with this change: according the semantic of VT-d
>>>>>>>> posted interrupt, the interrupt will injected to guest through posted
>>>>>>>> notification and /proc/interrupts shows the same meaning. But now,
>>>>>>>> without being aware of user, the interrupt changes to legacy way and it
>>>>>>>> appears on different entry on /proc/interrupts. It looks weird.
>>>>>>>
>>>>>>> I don't think it has problem here, IMO, this is exactly how it works.
>>>>>>> There should be different entry for the interrupts in VT-d PI mode
>>>>>>> and leagcy mode.
>>>>>>
>>>>>> I am not saying any problem here. Just feel weird. From a normal user's
>>>>>> point, he has turned on the VT-d pi and according the semantic of VT-d
>>>>>> pi, he should not observe the interrupt through legacy mode, but now he
>>>>>> do see it. Maybe print out a message here will be helpful, like what you
>>>>>> did for disabled lapic found during irq injection.
>>>>>
>>>>> Even VT-d PI is on, not all interrupts can be handled by it, the reason the
>>>>
>>>> No, we can handle it but we don't do it due to the complexity.For
>>>> example, we can use wake up vector to delivery the interrupt which still
>>>> is in PI mode but doesn't require any mode change.
>>>
>>> I mean, multi-cast and broadcast interrupts cannot be handled in PI mode.
>>
>> We may have different understanding on PI mode. My understanding is if
>> we set the IRTE to PI format, than the subsequent interrupt will be
>> handled in PI mode. multi-cast and broadcast interrupts cannot be
>> injected to guest directly but it doesn't mean cannot be handled in PI
>> mode. As i said, we can handle it in wake up vector or via other
>> approach.But it is much complexity.
>
> For the multicast/broastcast, we cannot set the related IRTE in PI
> mode, since we cannot set only one destination in IRTE. If an interrupt
> is for multiple destination, how can you use VT-d PI to injection it
> to all the destinations?


You may still not get my point. Anyway, it doesn't matter. Rollback to 
legacy mode still is the best choice so far.


--
best regards
yang


Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-20 Thread Yang Zhang

On 2016/1/21 13:33, Wu, Feng wrote:




-Original Message-
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
ow...@vger.kernel.org] On Behalf Of Yang Zhang
Sent: Thursday, January 21, 2016 1:24 PM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-
priority interrupts

On 2016/1/20 9:42, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts, As an
example, modern Intel CPUs in server platform use this method to
handle lowest-priority interrupts.

Signed-off-by: Feng Wu 
---
   bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic

*src,

struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
   {
@@ -727,21 +743,51 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm

*kvm, struct kvm_lapic *src,


dst = map->logical_map[cid];

-   if (kvm_lowest_prio_delivery(irq)) {
+   if (!kvm_lowest_prio_delivery(irq))
+   goto set_irq;
+
+   if (!kvm_vector_hashing_enabled()) {
int l = -1;
for_each_set_bit(i, &bitmap, 16) {
if (!dst[i])
continue;
if (l < 0)
l = i;
-   else if (kvm_apic_compare_prio(dst[i]->vcpu,

dst[l]->vcpu) < 0)

+   else if (kvm_apic_compare_prio(dst[i]->vcpu,
+   dst[l]->vcpu) < 0)
l = i;
}
-
bitmap = (l >= 0) ? 1 << l : 0;
+   } else {
+   int idx = 0;
+   unsigned int dest_vcpus = 0;
+
+   dest_vcpus = hweight16(bitmap);
+   if (dest_vcpus == 0)
+   goto out;
+
+   idx = kvm_vector_2_index(irq->vector,
+   dest_vcpus, &bitmap, 16);
+
+   /*
+* We may find a hardware disabled LAPIC here, if

that

+* is the case, print out a error message once for each
+* guest and return.
+*/
+   if (!dst[idx-1] &&
+   (kvm->arch.disabled_lapic_found == 0)) {
+   kvm->arch.disabled_lapic_found = 1;
+   printk(KERN_ERR
+   "Disabled LAPIC found during irq

injection\n");

+   goto out;


What does "goto out" mean? Inject successfully or fail? According to the
value of ret, which is set to true here, it means inject successfully but
i = -1.



Oh, I didn't notice 'ret' is initialized to true; I thought it was initialized
to false like in another function. I should add 'ret = false' here. We should
fail to inject the interrupt since a hardware-disabled LAPIC is found.


I remember we have discussed that even when the LAPIC is software disabled,
it can still respond to some interrupts like INIT, NMI, SMI, and SIPI
messages. Isn't the current logic still problematic?


--
best regards
yang


RE: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

2016-01-20 Thread Wu, Feng


> -Original Message-
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Thursday, January 21, 2016 1:36 PM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> interrupt is not single-destination
> 
> On 2016/1/21 13:07, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> >> Sent: Thursday, January 21, 2016 1:00 PM
> >> To: Wu, Feng ; pbonz...@redhat.com;
> >> rkrc...@redhat.com
> >> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> >> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> >> interrupt is not single-destination
> >>
> >> On 2016/1/21 12:42, Wu, Feng wrote:
> >>>
> >>>
>  -Original Message-
>  From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
> >> On
>  Behalf Of Yang Zhang
>  Sent: Thursday, January 21, 2016 11:35 AM
>  To: Wu, Feng ; pbonz...@redhat.com;
>  rkrc...@redhat.com
>  Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
>  Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if
> the
>  interrupt is not single-destination
> 
>  On 2016/1/21 11:14, Wu, Feng wrote:
> >
> >
> >> -Original Message-
> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> >> Sent: Thursday, January 21, 2016 11:06 AM
> >> To: Wu, Feng ; pbonz...@redhat.com;
> >> rkrc...@redhat.com
> >> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> >> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if
> >> the
> >> interrupt is not single-destination
> >>
> >> On 2016/1/20 9:42, Feng Wu wrote:
> >>> When the interrupt is not single destination any more, we need
> >>> to change back IRTE to remapped mode explicitly.
> >>>
> >>> Signed-off-by: Feng Wu 
> >>> ---
> >>>  arch/x86/kvm/vmx.c | 11 ++-
> >>>  1 file changed, 10 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>> index e2951b6..13d14d4 100644
> >>> --- a/arch/x86/kvm/vmx.c
> >>> +++ b/arch/x86/kvm/vmx.c
> >>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct
> kvm
> >> *kvm, unsigned int host_irq,
> >>>*/
> >>>
> >>>   kvm_set_msi_irq(e, &irq);
> >>> - if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> >>> + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> >>> + /*
> >>> +  * Make sure the IRTE is in remapped mode if
> >>> +  * we don't handle it in posted mode.
> >>> +  */
> >>> + pi_set_sn(vcpu_to_pi_desc(vcpu));
> >>> + ret = irq_set_vcpu_affinity(host_irq, NULL);
> >>> + pi_clear_sn(vcpu_to_pi_desc(vcpu));
> >>> +
> >>>   continue;
> >>> + }
> >>>
> >>>   vcpu_info.pi_desc_addr =
> >> __pa(vcpu_to_pi_desc(vcpu));
> >>>   vcpu_info.vector = irq.vector;
> >>>
> >>
> >> I am still feel weird with this change: according the semantic of VT-d
> >> posted interrupt, the interrupt will injected to guest through posted
> >> notification and /proc/interrupts shows the same meaning. But now,
> >> without being aware of user, the interrupt changes to legacy way and
> it
> >> appears on different entry on /proc/interrupts. It looks weird.
> >
> > I don't think it has problem here, IMO, this is exactly how it works.
> > There should be different entry for the interrupts in VT-d PI mode
> > and leagcy mode.
> 
>  I am not saying any problem here. Just feel weird. From a normal user's
>  point, he has turned on the VT-d pi and according the semantic of VT-d
>  pi, he should not observe the interrupt through legacy mode, but now
> he
>  do see it. Maybe print out a message here will be helpful, like what you
>  did for disabled lapic found during irq injection.
> >>>
> >>> Even VT-d PI is on, not all interrupts can be handled by it, the reason 
> >>> the
> >>
> >> No, we can handle it but we don't do it due to the complexity.For
> >> example, we can use wake up vector to delivery the interrupt which still
> >> is in PI mode but doesn't require any mode change.
> >
> > I mean, multi-cast and broadcast interrupts cannot be handled in PI mode.
> 
> We may have different understanding on PI mode. My understanding is if
> we set the IRTE to PI format, than the subsequent interrupt will be
> handled in PI mode. multi-cast and broadcast interrupts cannot be
> injected to guest directly but it doesn't mean cannot be handled in PI
> mode. As i said, we can handle it in wake up vector or via 

[PATCH RFC 00/15] mmc: sunxi: Support vqmmc regulator and eMMC DDR modes

2016-01-20 Thread Chen-Yu Tsai
Hi everyone,

This series adds support for vqmmc regulator and eMMC DDR modes for
sunxi-mmc. Allwinner's MMC controller supports eMMC 4.41 on earlier
SoCs, and up to 5.0 on latest ones. UHS-1 modes are also supported
by the hardware, but these are not covered in this series, as no
boards have dedicated regulators for vqmmc.

To support these faster modes, these patches add vqmmc regulator
support, which is used by the mmc core to switch to faster modes,
even if the signaling voltage is fixed. Signal voltage switching
support is also added, but not tested, as no available hardware has
a dedicated vqmmc regulator.

Support for eMMC reset in the controller, vs a GPIO and pwrseq, is
also added where applicable.

Patch 1 documents the mmc host init sequence. When the driver was
ported, this part was copied verbatim and not documented. With inline
comments from later SDKs and datasheet register definitions, this part
is now clearer.

Patch 2 makes the .set_ios callback return on errors from
mmc_regulator_set_ocr.

Patch 3 adds support (enabling/disabling, and voltage control) for the vqmmc
regulator to sunxi-mmc.

Patch 4 adds signal voltage switch support for the mmc controller. The
Allwinner MMC controller uses a special bit for sending signal voltage
switching command.

Patch 5 adds timing delays for MMC_DDR52 mode.

Patch 6 adds support for 8 bit eMMC DDR52 mode. Under this mode, the
controller must run at twice the card clock, and different timing delays
are needed.

Patch 7 enables eMMC HS-DDR for sunxi-mmc.

Patch 8 adds mmc3 pins for 8 bit emmc for A31/A31s.

Patch 9 switches from mmc2 to mmc3 for the onboard eMMC on Sinlinx
SinA31s. According to Allwinner, only mmc3 supports eMMC DDR52 on
A31/A31s.

Patch 10 adds the eMMC reset pin to the emmc pingroup for A23/A33.

Patch 11 enables eMMC hardware reset and eMMC DDR52 mode for SinA33.

Patch 12 switches A80 to sun9i specific mmc compatible. A80 has different
timing delays, and a larger FIFO (TODO).

Patch 13 adds the eMMC reset pin to the emmc pingroup for A80.

Patch 14 enables eMMC hardware reset and eMMC DDR52 mode for A80 Optimus.

Patch 15 enables eMMC hardware reset and eMMC DDR52 mode for Cubieboard4.

Chen-Yu Tsai (15):
  mmc: sunxi: Document host init sequence
  mmc: sunxi: Return error on mmc_regulator_set_ocr() fail in .set_ios
op
  mmc: sunxi: Block signal voltage switching (CMD11)
  mmc: sunxi: Support vqmmc regulator
  mmc: sunxi: Support MMC_DDR52 timing modes
  mmc: sunxi: Support 8 bit eMMC DDR transfer modes
  mmc: sunxi: Enable eMMC HS-DDR (MMC_CAP_1_8V_DDR) support
  ARM: dts: sun6i: Add mmc3 pins for 8 bit emmc
  ARM: dts: sun6i: sina31s: Switch to mmc3 for onboard eMMC
  ARM: dts: sun8i: Include SDC2_RST pin in mmc2_8bit_pins
  ARM: dts: sun8i: sina33: Enable hardware reset and HS-DDR for eMMC
  ARM: dts: sun9i: Use sun9i specific mmc compatible
  ARM: dts: sun9i: Include SDC2_RST pin in mmc2_8bit_pins
  ARM: dts: sun9i: a80-optimus: Enable hardware reset and HS-DDR for
eMMC
  ARM: dts: sun9i: cubieboard4: Enable hardware reset and HS-DDR for
eMMC

 arch/arm/boot/dts/sun6i-a31.dtsi   | 10 +++
 arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi |  6 +-
 arch/arm/boot/dts/sun8i-a23-a33.dtsi   |  2 +-
 arch/arm/boot/dts/sun8i-a33-sinlinx-sina33.dts |  3 +
 arch/arm/boot/dts/sun9i-a80-cubieboard4.dts|  6 ++
 arch/arm/boot/dts/sun9i-a80-optimus.dts|  6 ++
 arch/arm/boot/dts/sun9i-a80.dtsi   | 11 +--
 drivers/mmc/host/sunxi-mmc.c   | 98 +++---
 8 files changed, 126 insertions(+), 16 deletions(-)

-- 
2.7.0.rc3



[PATCH RFC 13/15] ARM: dts: sun9i: Include SDC2_RST pin in mmc2_8bit_pins

2016-01-20 Thread Chen-Yu Tsai
mmc2_8bit_pins is used with eMMC chips, which also have a reset pin.
The MMC controller also has a reset output that is supported.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun9i-a80.dtsi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/sun9i-a80.dtsi b/arch/arm/boot/dts/sun9i-a80.dtsi
index f4f61b02be1a..f68b3242b33a 100644
--- a/arch/arm/boot/dts/sun9i-a80.dtsi
+++ b/arch/arm/boot/dts/sun9i-a80.dtsi
@@ -704,7 +704,8 @@
mmc2_8bit_pins: mmc2_8bit {
allwinner,pins = "PC6", "PC7", "PC8", "PC9",
 "PC10", "PC11", "PC12",
-"PC13", "PC14", "PC15";
+"PC13", "PC14", "PC15",
+"PC16";
allwinner,function = "mmc2";
allwinner,drive = ;
allwinner,pull = ;
-- 
2.7.0.rc3



[PATCH RFC 15/15] ARM: dts: sun9i: cubieboard4: Enable hardware reset and HS-DDR for eMMC

2016-01-20 Thread Chen-Yu Tsai
mmc2 has a special pin for eMMC hardware reset, which is controllable
from the controller. Add the "cap-mmc-hw-reset" property to denote that
this controller supports this function, and the pins are actually used.

Also increase the signal drive strength for mmc2 pins, for HS-DDR mode
support.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun9i-a80-cubieboard4.dts | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/boot/dts/sun9i-a80-cubieboard4.dts 
b/arch/arm/boot/dts/sun9i-a80-cubieboard4.dts
index 382bd9fc5647..eb2ccd0a3bd5 100644
--- a/arch/arm/boot/dts/sun9i-a80-cubieboard4.dts
+++ b/arch/arm/boot/dts/sun9i-a80-cubieboard4.dts
@@ -111,9 +111,15 @@
vmmc-supply = <_vcc3v0>;
bus-width = <8>;
non-removable;
+   cap-mmc-hw-reset;
status = "okay";
 };
 
+_8bit_pins {
+   /* Increase drive strength for DDR modes */
+   allwinner,drive = ;
+};
+
 _ir {
status = "okay";
 };
-- 
2.7.0.rc3



[PATCH RFC 14/15] ARM: dts: sun9i: a80-optimus: Enable hardware reset and HS-DDR for eMMC

2016-01-20 Thread Chen-Yu Tsai
mmc2 has a special pin for eMMC hardware reset, which is controllable
from the controller. Add the "cap-mmc-hw-reset" property to denote that
this controller supports this function, and the pins are actually used.

Also increase the signal drive strength for mmc2 pins, for HS-DDR mode
support.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun9i-a80-optimus.dts | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/boot/dts/sun9i-a80-optimus.dts 
b/arch/arm/boot/dts/sun9i-a80-optimus.dts
index c0060e4f7379..958160e40fd0 100644
--- a/arch/arm/boot/dts/sun9i-a80-optimus.dts
+++ b/arch/arm/boot/dts/sun9i-a80-optimus.dts
@@ -174,9 +174,15 @@
vmmc-supply = <_vcc3v0>;
bus-width = <8>;
non-removable;
+   cap-mmc-hw-reset;
status = "okay";
 };
 
+_8bit_pins {
+   /* Increase drive strength for DDR modes */
+   allwinner,drive = ;
+};
+
 _usb1_vbus {
pinctrl-0 = <_vbus_pin_optimus>;
gpio = < 7 4 GPIO_ACTIVE_HIGH>; /* PH4 */
-- 
2.7.0.rc3



[PATCH RFC 04/15] mmc: sunxi: Support vqmmc regulator

2016-01-20 Thread Chen-Yu Tsai
eMMC chips require 2 power supplies, vmmc for internal logic, and vqmmc
for driving output buffers. vqmmc also controls signaling voltage. Most
boards we've seen use the same regulator for both, nevertheless the 2
have different usages, and should be set separately.

This patch adds support for vqmmc regulator supply, including voltage
switching. The MMC core can use this to try different signaling voltages
for eMMC.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/mmc/host/sunxi-mmc.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c
index 0495ae7da6d6..4bec87458317 100644
--- a/drivers/mmc/host/sunxi-mmc.c
+++ b/drivers/mmc/host/sunxi-mmc.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -256,6 +257,9 @@ struct sunxi_mmc_host {
struct mmc_request *mrq;
struct mmc_request *manual_stop_mrq;
int ferror;
+
+   /* vqmmc */
+   boolvqmmc_enabled;
 };
 
 static int sunxi_mmc_reset_host(struct sunxi_mmc_host *host)
@@ -716,6 +720,16 @@ static void sunxi_mmc_set_ios(struct mmc_host *mmc, struct 
mmc_ios *ios)
if (host->ferror)
return;
 
+   if (!IS_ERR(mmc->supply.vqmmc)) {
+   host->ferror = regulator_enable(mmc->supply.vqmmc);
+   if (host->ferror) {
+   dev_err(mmc_dev(mmc),
+   "failed to enable vqmmc\n");
+   return;
+   }
+   host->vqmmc_enabled = true;
+   }
+
host->ferror = sunxi_mmc_init_host(mmc);
if (host->ferror)
return;
@@ -727,6 +741,9 @@ static void sunxi_mmc_set_ios(struct mmc_host *mmc, struct 
mmc_ios *ios)
dev_dbg(mmc_dev(mmc), "power off!\n");
sunxi_mmc_reset_host(host);
mmc_regulator_set_ocr(mmc, mmc->supply.vmmc, 0);
+   if (!IS_ERR(mmc->supply.vqmmc) && host->vqmmc_enabled)
+   regulator_disable(mmc->supply.vqmmc);
+   host->vqmmc_enabled = false;
break;
}
 
@@ -758,6 +775,19 @@ static void sunxi_mmc_set_ios(struct mmc_host *mmc, struct 
mmc_ios *ios)
}
 }
 
+static int sunxi_mmc_volt_switch(struct mmc_host *mmc, struct mmc_ios *ios)
+{
+   /* vqmmc regulator is available */
+   if (!IS_ERR(mmc->supply.vqmmc))
+   return mmc_regulator_set_vqmmc(mmc, ios);
+
+   /* no vqmmc regulator, assume fixed regulator at 3/3.3V */
+   if (mmc->ios.signal_voltage == MMC_SIGNAL_VOLTAGE_330)
+   return 0;
+
+   return -EINVAL;
+}
+
 static void sunxi_mmc_enable_sdio_irq(struct mmc_host *mmc, int enable)
 {
struct sunxi_mmc_host *host = mmc_priv(mmc);
@@ -923,6 +953,7 @@ static struct mmc_host_ops sunxi_mmc_ops = {
.get_ro  = mmc_gpio_get_ro,
.get_cd  = mmc_gpio_get_cd,
.enable_sdio_irq = sunxi_mmc_enable_sdio_irq,
+   .start_signal_voltage_switch = sunxi_mmc_volt_switch,
.hw_reset= sunxi_mmc_hw_reset,
.card_busy   = sunxi_mmc_card_busy,
 };
-- 
2.7.0.rc3



[PATCH RFC 12/15] ARM: dts: sun9i: Use sun9i specific mmc compatible

2016-01-20 Thread Chen-Yu Tsai
sun9i/A80 MMC controllers have a larger FIFO, and the FIFO DMA
trigger levels can be increased. Also, the mmc module clock parent
has a higher clock rate, and the sample and output delay phases
are different.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun9i-a80.dtsi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/sun9i-a80.dtsi b/arch/arm/boot/dts/sun9i-a80.dtsi
index e838f206f2a0..f4f61b02be1a 100644
--- a/arch/arm/boot/dts/sun9i-a80.dtsi
+++ b/arch/arm/boot/dts/sun9i-a80.dtsi
@@ -543,7 +543,7 @@
};
 
mmc0: mmc@01c0f000 {
-   compatible = "allwinner,sun5i-a13-mmc";
+   compatible = "allwinner,sun9i-a80-mmc";
reg = <0x01c0f000 0x1000>;
clocks = <_config_clk 0>, <_clk 0>,
 <_clk 1>, <_clk 2>;
@@ -557,7 +557,7 @@
};
 
mmc1: mmc@01c1 {
-   compatible = "allwinner,sun5i-a13-mmc";
+   compatible = "allwinner,sun9i-a80-mmc";
reg = <0x01c1 0x1000>;
clocks = <_config_clk 1>, <_clk 0>,
 <_clk 1>, <_clk 2>;
@@ -571,7 +571,7 @@
};
 
mmc2: mmc@01c11000 {
-   compatible = "allwinner,sun5i-a13-mmc";
+   compatible = "allwinner,sun9i-a80-mmc";
reg = <0x01c11000 0x1000>;
clocks = <_config_clk 2>, <_clk 0>,
 <_clk 1>, <_clk 2>;
@@ -585,7 +585,7 @@
};
 
mmc3: mmc@01c12000 {
-   compatible = "allwinner,sun5i-a13-mmc";
+   compatible = "allwinner,sun9i-a80-mmc";
reg = <0x01c12000 0x1000>;
clocks = <_config_clk 3>, <_clk 0>,
 <_clk 1>, <_clk 2>;
-- 
2.7.0.rc3



[PATCH RFC 06/15] mmc: sunxi: Support 8 bit eMMC DDR transfer modes

2016-01-20 Thread Chen-Yu Tsai
Allwinner's MMC controller needs to run at double the card clock rate
for 8 bit DDR transfer modes. Interestingly, this is not needed for
4 bit DDR transfers.

Different clock delays are needed for 8 bit eMMC DDR, due to the
increased module clock rate. For the A80 though, the same values for
4 bit and 8 bit are shared. The new values for the other SoCs were from
A83T user manual's "new timing mode" default values, which describes
them in clock phase, rather than delay periods. These values were used
without any modification. They may not be correct, but they work.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/mmc/host/sunxi-mmc.c | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c
index b403a2433eec..d05928091b34 100644
--- a/drivers/mmc/host/sunxi-mmc.c
+++ b/drivers/mmc/host/sunxi-mmc.c
@@ -215,6 +215,7 @@
 #define SDXC_CLK_25M   1
 #define SDXC_CLK_50M   2
 #define SDXC_CLK_50M_DDR   3
+#define SDXC_CLK_50M_DDR_8BIT  4
 
 struct sunxi_mmc_clk_delay {
u32 output;
@@ -656,11 +657,17 @@ static int sunxi_mmc_clk_set_rate(struct sunxi_mmc_host *host,
  struct mmc_ios *ios)
 {
u32 rate, oclk_dly, rval, sclk_dly;
+   u32 clock = ios->clock;
int ret;
 
-   rate = clk_round_rate(host->clk_mmc, ios->clock);
+   /* 8 bit DDR requires a higher module clock */
+   if (ios->timing == MMC_TIMING_MMC_DDR52 &&
+   ios->bus_width == MMC_BUS_WIDTH_8)
+   clock <<= 1;
+
+   rate = clk_round_rate(host->clk_mmc, clock);
dev_dbg(mmc_dev(host->mmc), "setting clk to %d, rounded %d\n",
-   ios->clock, rate);
+   clock, rate);
 
/* setting clock rate */
ret = clk_set_rate(host->clk_mmc, rate);
@@ -677,6 +684,12 @@ static int sunxi_mmc_clk_set_rate(struct sunxi_mmc_host *host,
/* clear internal divider */
rval = mmc_readl(host, REG_CLKCR);
rval &= ~0xff;
+   /* set internal divider for 8 bit eMMC DDR, so card clock is right */
+   if (ios->timing == MMC_TIMING_MMC_DDR52 &&
+   ios->bus_width == MMC_BUS_WIDTH_8) {
+   rval |= 1;
+   rate >>= 1;
+   }
mmc_writel(host, REG_CLKCR, rval);
 
/* determine delays */
@@ -687,13 +700,16 @@ static int sunxi_mmc_clk_set_rate(struct sunxi_mmc_host *host,
oclk_dly = host->clk_delays[SDXC_CLK_25M].output;
sclk_dly = host->clk_delays[SDXC_CLK_25M].sample;
} else if (rate <= 5000) {
-   if (ios->timing == MMC_TIMING_UHS_DDR50 ||
-   ios->timing == MMC_TIMING_MMC_DDR52) {
-   oclk_dly = host->clk_delays[SDXC_CLK_50M_DDR].output;
-   sclk_dly = host->clk_delays[SDXC_CLK_50M_DDR].sample;
-   } else {
+   if (ios->timing != MMC_TIMING_UHS_DDR50 &&
+   ios->timing != MMC_TIMING_MMC_DDR52) {
oclk_dly = host->clk_delays[SDXC_CLK_50M].output;
sclk_dly = host->clk_delays[SDXC_CLK_50M].sample;
+   } else if (ios->bus_width == MMC_BUS_WIDTH_8) {
+   oclk_dly = host->clk_delays[SDXC_CLK_50M_DDR_8BIT].output;
+   sclk_dly = host->clk_delays[SDXC_CLK_50M_DDR_8BIT].sample;
+   } else {
+   oclk_dly = host->clk_delays[SDXC_CLK_50M_DDR].output;
+   sclk_dly = host->clk_delays[SDXC_CLK_50M_DDR].sample;
}
} else {
return -EINVAL;
@@ -965,6 +981,8 @@ static const struct sunxi_mmc_clk_delay sunxi_mmc_clk_delays[] = {
[SDXC_CLK_25M]  = { .output = 180, .sample =  75 },
[SDXC_CLK_50M]  = { .output =  90, .sample = 120 },
[SDXC_CLK_50M_DDR]  = { .output =  60, .sample = 120 },
+   /* Value from A83T "new timing mode". Works but might not be right. */
+   [SDXC_CLK_50M_DDR_8BIT] = { .output =  90, .sample = 180 },
 };
 
 static const struct sunxi_mmc_clk_delay sun9i_mmc_clk_delays[] = {
@@ -972,6 +990,7 @@ static const struct sunxi_mmc_clk_delay sun9i_mmc_clk_delays[] = {
[SDXC_CLK_25M]  = { .output = 180, .sample =  75 },
[SDXC_CLK_50M]  = { .output = 150, .sample = 120 },
[SDXC_CLK_50M_DDR]  = { .output =  90, .sample = 120 },
+   [SDXC_CLK_50M_DDR_8BIT] = { .output =  90, .sample = 120 },
 };
 
 static int sunxi_mmc_resource_request(struct sunxi_mmc_host *host,
-- 
2.7.0.rc3
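
The clock handling described in the commit message above (double the module clock for 8 bit DDR, then halve it again with the internal divider so the card clock is unchanged) can be sketched as follows. This is a minimal illustration in plain C, not the driver code: struct clk_plan and plan_clock() are hypothetical names, and the sketch ignores the rounding done by clk_round_rate().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
struct clk_plan {
	uint32_t module_hz;     /* rate requested from the module clock */
	uint32_t divider_field; /* value OR'ed into the low bits of CLKCR */
	uint32_t card_hz;       /* resulting clock seen by the card */
};

/*
 * For 8 bit DDR52 the module clock runs at twice the card clock and the
 * divider field is set to 1, which (as in the patch) halves the rate
 * again. All other modes run the module clock at the card clock rate.
 */
struct clk_plan plan_clock(uint32_t card_hz, bool ddr52, bool bus_width_8)
{
	struct clk_plan p = { card_hz, 0, card_hz };

	if (ddr52 && bus_width_8) {
		p.module_hz = card_hz << 1;
		p.divider_field = 1;
		p.card_hz = p.module_hz >> 1;
	}

	return p;
}
```

For a 52 MHz 8 bit DDR request this yields a 104 MHz module clock with the divider field set to 1, while a 4 bit DDR request at the same rate leaves the module clock at 52 MHz.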



[PATCH RFC 11/15] ARM: dts: sun8i: sina33: Enable hardware reset and HS-DDR for eMMC

2016-01-20 Thread Chen-Yu Tsai
mmc2 has a special pin for eMMC hardware reset, which is controllable
from the controller. Add the "cap-mmc-hw-reset" property to denote that
this controller supports this function, and the pins are actually used.

Also increase the signal drive strength for mmc2 pins, for HS-DDR mode
support.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun8i-a33-sinlinx-sina33.dts | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/boot/dts/sun8i-a33-sinlinx-sina33.dts b/arch/arm/boot/dts/sun8i-a33-sinlinx-sina33.dts
index 13ce68f06dd6..bd2a3beb4629 100644
--- a/arch/arm/boot/dts/sun8i-a33-sinlinx-sina33.dts
+++ b/arch/arm/boot/dts/sun8i-a33-sinlinx-sina33.dts
@@ -109,10 +109,13 @@
vmmc-supply = <_vcc3v0>;
bus-width = <8>;
non-removable;
+   cap-mmc-hw-reset;
status = "okay";
 };
 
 _8bit_pins {
+   /* Increase drive strength for DDR modes */
+   allwinner,drive = ;
/* eMMC is missing pull-ups */
allwinner,pull = ;
 };
-- 
2.7.0.rc3



[PATCH RFC 09/15] ARM: dts: sun6i: sina31s: Switch to mmc3 for onboard eMMC

2016-01-20 Thread Chen-Yu Tsai
According to Allwinner, only mmc3 supports 8 bit DDR transfers for eMMC.
Switch to mmc3 for the onboard eMMC, and also assign vqmmc for signal
voltage sensing/switching, and "cap-mmc-hw-reset" to denote this
instance can use eMMC hardware reset.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi b/arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi
index ea69fb8ad4d8..4ec0c8679b2e 100644
--- a/arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi
+++ b/arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi
@@ -61,12 +61,14 @@
 };
 
 /* eMMC on core board */
- {
+ {
pinctrl-names = "default";
-   pinctrl-0 = <_8bit_emmc_pins>;
+   pinctrl-0 = <_8bit_emmc_pins>;
vmmc-supply = <_dcdc1>;
+   vqmmc-supply = <_dcdc1>;
bus-width = <8>;
non-removable;
+   cap-mmc-hw-reset;
status = "okay";
 };
 
-- 
2.7.0.rc3



[PATCH RFC 10/15] ARM: dts: sun8i: Include SDC2_RST pin in mmc2_8bit_pins

2016-01-20 Thread Chen-Yu Tsai
mmc2_8bit_pins is used with eMMC chips, which also have a reset pin.
The MMC controller also has a reset output that is supported.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun8i-a23-a33.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/sun8i-a23-a33.dtsi b/arch/arm/boot/dts/sun8i-a23-a33.dtsi
index 6f88fb0ddbc7..7e05e09e61c7 100644
--- a/arch/arm/boot/dts/sun8i-a23-a33.dtsi
+++ b/arch/arm/boot/dts/sun8i-a23-a33.dtsi
@@ -381,7 +381,7 @@
allwinner,pins = "PC5", "PC6", "PC8",
 "PC9", "PC10", "PC11",
 "PC12", "PC13", "PC14",
-"PC15";
+"PC15", "PC16";
allwinner,function = "mmc2";
allwinner,drive = ;
allwinner,pull = ;
-- 
2.7.0.rc3



[PATCH] net:mac80211:mesh_plink: remove redundant sta_info check

2016-01-20 Thread Sunil Shahu
Remove the unnecessary "if" statement and merge it into the previous "if" block.

Signed-off-by: Sunil Shahu 
---
 net/mac80211/mesh_plink.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c
index bd3d55e..e5851ae 100644
--- a/net/mac80211/mesh_plink.c
+++ b/net/mac80211/mesh_plink.c
@@ -976,6 +976,9 @@ mesh_plink_get_event(struct ieee80211_sub_if_data *sdata,
mpl_dbg(sdata, "Mesh plink error: no more free plinks\n");
goto out;
}
+   /* new matching peer */
+   event = OPN_ACPT;
+   goto out;
} else {
if (!test_sta_flag(sta, WLAN_STA_AUTH)) {
mpl_dbg(sdata, "Mesh plink: Action frame from non-authed peer\n");
@@ -985,12 +988,6 @@ mesh_plink_get_event(struct ieee80211_sub_if_data *sdata,
goto out;
}
 
-   /* new matching peer */
-   if (!sta) {
-   event = OPN_ACPT;
-   goto out;
-   }
-
switch (ftype) {
case WLAN_SP_MESH_PEERING_OPEN:
if (!matches_local)
-- 
1.9.1



[PATCH RFC 08/15] ARM: dts: sun6i: Add mmc3 pins for 8 bit emmc

2016-01-20 Thread Chen-Yu Tsai
mmc2 and mmc3 are available on the same pins, with different mux values.
However, only mmc3 supports 8 bit DDR transfer modes.

Since preference for mmc3 over mmc2 is due to DDR transfer modes, just
set the drive strength to 40mA, which is needed for DDR.

This pinmux setting also includes the hardware reset pin for emmc.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun6i-a31.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/boot/dts/sun6i-a31.dtsi b/arch/arm/boot/dts/sun6i-a31.dtsi
index b6ad7850fac6..1867af24ff52 100644
--- a/arch/arm/boot/dts/sun6i-a31.dtsi
+++ b/arch/arm/boot/dts/sun6i-a31.dtsi
@@ -709,6 +709,16 @@
allwinner,pull = ;
};
 
+   mmc3_8bit_emmc_pins: mmc3@1 {
+   allwinner,pins = "PC6", "PC7", "PC8", "PC9",
+"PC10", "PC11", "PC12",
+"PC13", "PC14", "PC15",
+"PC24";
+   allwinner,function = "mmc3";
+   allwinner,drive = ;
+   allwinner,pull = ;
+   };
+
gmac_pins_mii_a: gmac_mii@0 {
allwinner,pins = "PA0", "PA1", "PA2", "PA3",
"PA8", "PA9", "PA11",
-- 
2.7.0.rc3



Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

2016-01-20 Thread Yang Zhang

On 2016/1/21 13:07, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Thursday, January 21, 2016 1:00 PM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
interrupt is not single-destination

On 2016/1/21 12:42, Wu, Feng wrote:




-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]

On

Behalf Of Yang Zhang
Sent: Thursday, January 21, 2016 11:35 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
interrupt is not single-destination

On 2016/1/21 11:14, Wu, Feng wrote:




-Original Message-
From: Yang Zhang [mailto:yang.zhang...@gmail.com]
Sent: Thursday, January 21, 2016 11:06 AM
To: Wu, Feng ; pbonz...@redhat.com;
rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if

the

interrupt is not single-destination

On 2016/1/20 9:42, Feng Wu wrote:

When the interrupt is not single destination any more, we need
to change back IRTE to remapped mode explicitly.

Signed-off-by: Feng Wu 
---
 arch/x86/kvm/vmx.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e2951b6..13d14d4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm

*kvm, unsigned int host_irq,

 */

kvm_set_msi_irq(e, );
-   if (!kvm_intr_is_single_vcpu(kvm, , ))
+   if (!kvm_intr_is_single_vcpu(kvm, , )) {
+   /*
+* Make sure the IRTE is in remapped mode if
+* we don't handle it in posted mode.
+*/
+   pi_set_sn(vcpu_to_pi_desc(vcpu));
+   ret = irq_set_vcpu_affinity(host_irq, NULL);
+   pi_clear_sn(vcpu_to_pi_desc(vcpu));
+
continue;
+   }

vcpu_info.pi_desc_addr =

__pa(vcpu_to_pi_desc(vcpu));

vcpu_info.vector = irq.vector;



I still feel weird about this change: according to the semantics of VT-d
posted interrupts, the interrupt will be injected into the guest through posted
notification, and /proc/interrupts shows the same meaning. But now,
without the user being aware of it, the interrupt changes to the legacy way and
appears under a different entry in /proc/interrupts. It looks weird.


I don't think there is a problem here; IMO, this is exactly how it works.
There should be different entries for the interrupts in VT-d PI mode
and legacy mode.


I am not saying there is any problem here. It just feels weird. From a normal
user's point of view, he has turned on VT-d PI, and according to the semantics
of VT-d PI he should not observe the interrupt in legacy mode, but now he
does see it. Maybe printing a message here would be helpful, like what you
did for the disabled LAPIC found during irq injection.


Even if VT-d PI is on, not all interrupts can be handled by it; the reason the


No, we can handle it, but we don't because of the complexity. For
example, we can use the wakeup vector to deliver the interrupt, which is
still PI mode and doesn't require any mode change.


I mean, multi-cast and broadcast interrupts cannot be handled in PI mode.


We may have a different understanding of PI mode. My understanding is that if 
we set the IRTE to PI format, then subsequent interrupts will be 
handled in PI mode. Multicast and broadcast interrupts cannot be 
injected into the guest directly, but that doesn't mean they cannot be handled 
in PI mode. As I said, we can handle them with the wakeup vector or via another 
approach. But it is much more complex.


I agree that rolling back to legacy mode is the best choice, but we may need 
some additional messages to tell the user (host administrator) why we 
changed to legacy mode. I think not all of them are familiar with the 
details of VT-d PI. If they find that some interrupts still go to 
legacy mode even though they have turned on PI, they may get confused.







interrupt is changed back to legacy mode is that the user changed
the affinity, so it cannot be handled in PI mode and legacy mode
is used. It is the user's behavior that causes this mode change, so it does
not seem so weird to me. But adding a message here is a good idea, just like


Why can the user's behavior change the mode?


Like you mentioned before, if the interrupt is changed from single-destination
to multiple-destination by guest. And this is the reason of adding the rollback
logic here, right?


The user means the host administrator.



Thanks,
Feng


According to the current design,
there is no way for the user to turn it on/off dynamically. Why do we need to
rollback to 

[PATCH RFC 03/15] mmc: sunxi: Block signal voltage switching (CMD11)

2016-01-20 Thread Chen-Yu Tsai
Allwinner's mmc controller supports signal voltage switching. This is
supported in code in Allwinner's kernel. However, publicly available
boards all tie it to a fixed 3.0/3.3V regulator, with options to tie
it to 1.8V for eMMC on some.

Since Allwinner's kernel is an ancient 3.4, it is hard to say whether
adapting its code to a modern mainline kernel would work. Block signal
voltage switching until someone has proper hardware to implement and
test this.

This only affects SD UHS-1 modes, as eMMC switches the voltage directly
without any signaling.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/mmc/host/sunxi-mmc.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c
index 790f01662b4e..0495ae7da6d6 100644
--- a/drivers/mmc/host/sunxi-mmc.c
+++ b/drivers/mmc/host/sunxi-mmc.c
@@ -816,6 +816,20 @@ static void sunxi_mmc_request(struct mmc_host *mmc, struct mmc_request *mrq)
}
}
 
+   /*
+* TODO Support signal voltage switching
+*
+* Compared to Allwinner's kernel, recent updates in the mmc core
+* mean this should be as easy as setting the flags in cmd_val and
+* imask, and waiting for it to finish. However no boards support
+* this so this cannot be tested. Block it for now.
+*/
+   if (cmd->opcode == SD_SWITCH_VOLTAGE) {
+   mrq->cmd->error = -EPERM;
+   mmc_request_done(mmc, mrq);
+   return;
+   }
+
if (cmd->opcode == MMC_GO_IDLE_STATE) {
cmd_val |= SDXC_SEND_INIT_SEQUENCE;
imask |= SDXC_COMMAND_DONE;
-- 
2.7.0.rc3



[PATCH RFC 01/15] mmc: sunxi: Document host init sequence

2016-01-20 Thread Chen-Yu Tsai
sunxi_mmc_init_host() originated from Allwinner kernel sources. The
magic numbers written to various registers were never documented.

Add comments for values found in Allwinner user manuals.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/mmc/host/sunxi-mmc.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c
index 83de82bceafc..cce5ca540857 100644
--- a/drivers/mmc/host/sunxi-mmc.c
+++ b/drivers/mmc/host/sunxi-mmc.c
@@ -284,16 +284,28 @@ static int sunxi_mmc_init_host(struct mmc_host *mmc)
if (sunxi_mmc_reset_host(host))
return -EIO;
 
+   /*
+* Burst 8 transfers, RX trigger level: 7, TX trigger level: 8
+*
+* TODO: sun9i has a larger FIFO and supports higher trigger values
+*/
mmc_writel(host, REG_FTRGL, 0x20070008);
+   /* Maximum timeout value */
mmc_writel(host, REG_TMOUT, 0x);
+   /* Unmask SDIO interrupt if needed */
mmc_writel(host, REG_IMASK, host->sdio_imask);
+   /* Clear all pending interrupts */
mmc_writel(host, REG_RINTR, 0x);
+   /* Debug register? undocumented */
mmc_writel(host, REG_DBGC, 0xdeb);
+   /* Enable CEATA support */
mmc_writel(host, REG_FUNS, SDXC_CEATA_ON);
+   /* Set DMA descriptor list base address */
mmc_writel(host, REG_DLBA, host->sg_dma);
 
rval = mmc_readl(host, REG_GCTRL);
rval |= SDXC_INTERRUPT_ENABLE_BIT;
+   /* Undocumented, but found in Allwinner code */
rval &= ~SDXC_ACCESS_DONE_DIRECT;
mmc_writel(host, REG_GCTRL, rval);
 
-- 
2.7.0.rc3
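
The comment added for REG_FTRGL above ("Burst 8 transfers, RX trigger level: 7, TX trigger level: 8") can be checked against the magic value 0x20070008 with a small packing helper. This is only a sketch: the field positions (burst code at bit 28, RX level at bit 16, TX level at bit 0) and the reading of burst code 2 as "8 transfers" are assumptions inferred from the value and the comment, not taken from a register manual, and ftrgl_pack() is a hypothetical name.

```c
#include <stdint.h>

/* Assumed field positions, inferred only from 0x20070008 and the comment. */
#define FTRGL_BURST_SHIFT 28
#define FTRGL_RX_SHIFT    16
#define FTRGL_TX_SHIFT    0

/*
 * Pack the three fields into one register value. The caller is expected
 * to pass values that fit their fields; no range checking is done here.
 */
uint32_t ftrgl_pack(uint32_t burst_code, uint32_t rx_level, uint32_t tx_level)
{
	return (burst_code << FTRGL_BURST_SHIFT) |
	       (rx_level << FTRGL_RX_SHIFT) |
	       (tx_level << FTRGL_TX_SHIFT);
}
```

Under these assumptions, ftrgl_pack(2, 7, 8) reproduces the value written in sunxi_mmc_init_host().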



RE: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-20 Thread Wu, Feng


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Yang Zhang
> Sent: Thursday, January 21, 2016 1:24 PM
> To: Wu, Feng ; pbonz...@redhat.com;
> rkrc...@redhat.com
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
> Subject: Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> On 2016/1/20 9:42, Feng Wu wrote:
> > Use vector-hashing to deliver lowest-priority interrupts, As an
> > example, modern Intel CPUs in server platform use this method to
> > handle lowest-priority interrupts.
> >
> > Signed-off-by: Feng Wu 
> > ---
> >   bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic
> *src,
> > struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
> >   {
> > @@ -727,21 +743,51 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm
> *kvm, struct kvm_lapic *src,
> >
> > dst = map->logical_map[cid];
> >
> > -   if (kvm_lowest_prio_delivery(irq)) {
> > +   if (!kvm_lowest_prio_delivery(irq))
> > +   goto set_irq;
> > +
> > +   if (!kvm_vector_hashing_enabled()) {
> > int l = -1;
> > for_each_set_bit(i, , 16) {
> > if (!dst[i])
> > continue;
> > if (l < 0)
> > l = i;
> > -   else if (kvm_apic_compare_prio(dst[i]->vcpu,
> dst[l]->vcpu) < 0)
> > +   else if (kvm_apic_compare_prio(dst[i]->vcpu,
> > +   dst[l]->vcpu) < 0)
> > l = i;
> > }
> > -
> > bitmap = (l >= 0) ? 1 << l : 0;
> > +   } else {
> > +   int idx = 0;
> > +   unsigned int dest_vcpus = 0;
> > +
> > +   dest_vcpus = hweight16(bitmap);
> > +   if (dest_vcpus == 0)
> > +   goto out;
> > +
> > +   idx = kvm_vector_2_index(irq->vector,
> > +   dest_vcpus, , 16);
> > +
> > +   /*
> > +* We may find a hardware disabled LAPIC here, if
> that
> > +* is the case, print out a error message once for each
> > +* guest and return.
> > +*/
> > +   if (!dst[idx-1] &&
> > +   (kvm->arch.disabled_lapic_found == 0)) {
> > +   kvm->arch.disabled_lapic_found = 1;
> > +   printk(KERN_ERR
> > +   "Disabled LAPIC found during irq
> injection\n");
> > +   goto out;
> 
> What does "goto out" mean? Inject successfully or fail? According the
> value of ret which is set to ture here, it means inject successfully but
> i = -1.
> 

Oh, I didn't notice 'ret' is initialized to true; I thought it was initialized
to false like in another function. I should add a 'ret = false' here. We should
fail to inject the interrupt, since a hardware-disabled LAPIC was found.

Thanks,
Feng
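
For readers following the thread, the vector-hashing idea under discussion can be sketched in plain C: count the candidate destinations in the bitmap, reduce the vector modulo that count, and pick the matching set bit. This is a minimal illustration, not the patch's kvm_vector_2_index(); the name vector_hash_pick() and the convention of returning -1 for an empty bitmap are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Pick one destination out of a 16-bit candidate bitmap by hashing the
 * interrupt vector. Returns the bit index of the chosen destination,
 * or -1 if the bitmap is empty.
 */
int vector_hash_pick(uint32_t vector, uint16_t bitmap)
{
	unsigned int dest_vcpus = 0;
	unsigned int target, seen = 0;
	int i;

	/* Count candidate destinations (what hweight16() does in the patch). */
	for (i = 0; i < 16; i++)
		if (bitmap & (1u << i))
			dest_vcpus++;

	if (dest_vcpus == 0)
		return -1;

	/* The hash: the vector selects the n-th set bit, counting from 0. */
	target = vector % dest_vcpus;

	for (i = 0; i < 16; i++) {
		if (!(bitmap & (1u << i)))
			continue;
		if (seen == target)
			return i;
		seen++;
	}

	return -1;
}
```

With candidates at bits 1 and 3, an even vector selects bit 1 and an odd vector selects bit 3, so a given vector always lands on the same destination for as long as the candidate set is unchanged.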


[PATCH RFC 05/15] mmc: sunxi: Support MMC_DDR52 timing modes

2016-01-20 Thread Chen-Yu Tsai
DDR transfer modes include UHS-1 DDR50 and MMC HS-DDR (or MMC_DDR52).
Consider MMC_DDR52 when setting clock delays.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/mmc/host/sunxi-mmc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c
index 4bec87458317..b403a2433eec 100644
--- a/drivers/mmc/host/sunxi-mmc.c
+++ b/drivers/mmc/host/sunxi-mmc.c
@@ -687,7 +687,8 @@ static int sunxi_mmc_clk_set_rate(struct sunxi_mmc_host *host,
oclk_dly = host->clk_delays[SDXC_CLK_25M].output;
sclk_dly = host->clk_delays[SDXC_CLK_25M].sample;
} else if (rate <= 5000) {
-   if (ios->timing == MMC_TIMING_UHS_DDR50) {
+   if (ios->timing == MMC_TIMING_UHS_DDR50 ||
+   ios->timing == MMC_TIMING_MMC_DDR52) {
oclk_dly = host->clk_delays[SDXC_CLK_50M_DDR].output;
sclk_dly = host->clk_delays[SDXC_CLK_50M_DDR].sample;
} else {
@@ -762,7 +763,8 @@ static void sunxi_mmc_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
 
/* set ddr mode */
rval = mmc_readl(host, REG_GCTRL);
-   if (ios->timing == MMC_TIMING_UHS_DDR50)
+   if (ios->timing == MMC_TIMING_UHS_DDR50 ||
+   ios->timing == MMC_TIMING_MMC_DDR52)
rval |= SDXC_DDR_MODE;
else
rval &= ~SDXC_DDR_MODE;
-- 
2.7.0.rc3



[PATCH RFC 02/15] mmc: sunxi: Return error on mmc_regulator_set_ocr() fail in .set_ios op

2016-01-20 Thread Chen-Yu Tsai
Let .set_ios() fail if mmc_regulator_set_ocr() fails to enable and set a
proper voltage for vmmc.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/mmc/host/sunxi-mmc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c
index cce5ca540857..790f01662b4e 100644
--- a/drivers/mmc/host/sunxi-mmc.c
+++ b/drivers/mmc/host/sunxi-mmc.c
@@ -711,7 +711,10 @@ static void sunxi_mmc_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
break;
 
case MMC_POWER_UP:
-   mmc_regulator_set_ocr(mmc, mmc->supply.vmmc, ios->vdd);
+   host->ferror = mmc_regulator_set_ocr(mmc, mmc->supply.vmmc,
+ios->vdd);
+   if (host->ferror)
+   return;
 
host->ferror = sunxi_mmc_init_host(mmc);
if (host->ferror)
-- 
2.7.0.rc3



Re: powerpc: Simplify module TOC handling

2016-01-20 Thread Michael Ellerman
On Mon, 2016-18-01 at 00:44:27 UTC, Michael Ellerman wrote:
> From: Alan Modra 
> 
> PowerPC64 uses the symbol .TOC. much as other targets use
> _GLOBAL_OFFSET_TABLE_. It identifies the value of the GOT pointer (or in
> powerpc parlance, the TOC pointer). Global offset tables are generally
> local to an executable or shared library, or in the kernel, module. Thus
> it does not make sense for a module to resolve a relocation against
> .TOC. to the kernel's .TOC. value. A module has its own .TOC., and
> indeed the powerpc64 module relocation processing ignores the kernel
> value of .TOC. and instead calculates a module-local value.
> 
> This patch removes code involved in exporting the kernel .TOC., tweaks
> modpost to ignore an undefined .TOC., and the module loader to twiddle
> the section symbol so that .TOC. isn't seen as undefined.
> 
> Note that if the kernel was compiled with -msingle-pic-base then ELFv2
> would not have function global entry code setting up r2. In that case
> the module call stubs would need to be modified to set up r2 using the
> kernel .TOC. value, requiring some of this code to be reinstated.
> 
> mpe: Furthermore a change in binutils master (not yet released) causes
> the current way we handle the TOC to no longer work when building with
> MODVERSIONS=y and RELOCATABLE=n. The symptom is that modules can not be
> loaded due to there being no version found for TOC.
> 
> Cc: sta...@vger.kernel.org # 3.16+
> Signed-off-by: Alan Modra 
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/c153693d7eb9eeb28478aa2dea

cheers


[PATCH RFC 07/15] mmc: sunxi: Enable eMMC HS-DDR (MMC_CAP_1_8V_DDR) support

2016-01-20 Thread Chen-Yu Tsai
Now that clock delay settings for 8 bit DDR are correct, and vqmmc
support is available, we can enable MMC_CAP_1_8V_DDR support. This
enables MMC HS-DDR at up to 52 MHz, even if signal voltage switching
is not available.

Signed-off-by: Chen-Yu Tsai 
---
 drivers/mmc/host/sunxi-mmc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/host/sunxi-mmc.c b/drivers/mmc/host/sunxi-mmc.c
index d05928091b34..f3a7f36e38c2 100644
--- a/drivers/mmc/host/sunxi-mmc.c
+++ b/drivers/mmc/host/sunxi-mmc.c
@@ -1145,6 +1145,7 @@ static int sunxi_mmc_probe(struct platform_device *pdev)
mmc->f_min  =   40;
mmc->f_max  = 5000;
mmc->caps  |= MMC_CAP_MMC_HIGHSPEED | MMC_CAP_SD_HIGHSPEED |
+ MMC_CAP_1_8V_DDR |
  MMC_CAP_ERASE | MMC_CAP_SDIO_IRQ;
 
ret = mmc_of_parse(mmc);
-- 
2.7.0.rc3



Re: [LKP] [lkp] [spi] 2baed30cb3: BUG: scheduling while atomic: systemd-udevd/134/0x00000002

2016-01-20 Thread Sudip Mukherjee
On Wed, Jan 20, 2016 at 01:00:40PM +0800, Huang, Ying wrote:
> Sudip Mukherjee  writes:
> 
> > On Wed, Jan 20, 2016 at 08:44:37AM +0800, kernel test robot wrote:
> >> FYI, we noticed the below changes on
> >> 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >> commit 2baed30cb30727b2637d26eac5a8887875a13420 ("spi: lm70llp: use new 
> >> parport device model")
> >> 
> >> 
> >> ++++
> >> || 74bdced4b4 | 2baed30cb3 |
> >> ++++
> >> | boot_successes | 0  | 0  |
> >> ++++
> >> 
> >> 
> >> 
> >> [6.358390] i6300esb: Intel 6300ESB WatchDog Timer Driver v0.05
> >> [6.358540] i6300esb: cannot register miscdev on minor=130 (err=-16)
> >> [6.358555] i6300ESB timer: probe of :00:06.0 failed with error -16
> >> [6.363357] BUG: scheduling while atomic: systemd-udevd/134/0x0002
> >> [ 6.363366] Modules linked in: crc32c_intel pcspkr evdev i6300esb
> >> ide_cd_mod cdrom intel_agp intel_gtt i2c_piix4 i2c_core virtio_pci
> >> virtio virtio_ring agpgart rtc_cmos(+) parport_pc(+) autofs4
> >> [6.363369] CPU: 1 PID: 134 Comm: systemd-udevd Not tainted 
> >> 4.4.0-rc1-6-g2baed30 #1
> >> [6.363370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> >> Debian-1.8.2-1 04/01/2014
> >
> > Can you please let me know how do i reproduce this on qemu? what command
> > line you used?
> 
> The command line can be found in the last line of dmesg file, as below.
> 
> qemu-system-x86_64 -enable-kvm -cpu host -kernel 
> /pkg/linux/x86_64-randconfig-a0-01191454/gcc-5/2baed30cb30727b2637d26eac5a8887875a13420/vmlinuz-4.4.0-rc1-6-g2baed30
>  -append 'root=/dev/ram0 user=lkp 
> job=/lkp/scheduled/vm-lkp-wsx03-2G-2/bisect_boot-1-debian-x86_64-2015-02-07.cgz-x86_64-randconfig-a0-01191454-2baed30cb30727b2637d26eac5a8887875a13420-20160119-71002-198dtgm-0.yaml
>  ARCH=x86_64 kconfig=x86_64-randconfig-a0-01191454 
> branch=linux-devel/devel-spot-201601191442 
> commit=2baed30cb30727b2637d26eac5a8887875a13420 
> BOOT_IMAGE=/pkg/linux/x86_64-randconfig-a0-01191454/gcc-5/2baed30cb30727b2637d26eac5a8887875a13420/vmlinuz-4.4.0-rc1-6-g2baed30
>  max_uptime=600 
> RESULT_ROOT=/result/boot/1/vm-lkp-wsx03-2G/debian-x86_64-2015-02-07.cgz/x86_64-randconfig-a0-01191454/gcc-5/2baed30cb30727b2637d26eac5a8887875a13420/0
>  LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug 
> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 
> softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
> prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw 
> ip=vm-lkp-wsx03-2G-2::dhcp'  -initrd /fs/sda1/initrd-vm-lkp-wsx03-2G-2 -m 
> 2048 -smp 2 -device e1000,netdev=net0 -netdev 
> user,id=net0,hostfwd=tcp::23621-:22 -boot order=nc -no-reboot -watchdog 
> i6300esb -rtc base=localtime -drive 
> file=/fs/sda1/disk0-vm-lkp-wsx03-2G-2,media=disk,if=virtio -drive 
> file=/fs/sda1/disk1-vm-lkp-wsx03-2G-2,media=disk,if=virtio -pidfile 
> /dev/shm/kboot/pid-vm-lkp-wsx03-2G-2 -serial 
> file:/dev/shm/kboot/serial-vm-lkp-wsx03-2G-2 -daemonize -display none 
> -monitor null

I am not able to reproduce this. Tested just with the kernel and
yocto-minimal-i386.cgz filesystem and it booted properly.

I guess I need at least your job file to reproduce this.

regards
sudip


Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Mario Kleiner

On 01/21/2016 04:43 AM, Michel Dänzer wrote:

On 21.01.2016 05:32, Mario Kleiner wrote:


So the problem is that AMDs hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by
keeping vblank irqs enabled during modesets and incrementing vblank
count by one during each vblank irq, i think that's what
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.


Right, looks like there's been a regression breaking this. I suspect the
problem is that vblank->last isn't getting updated from
drm_vblank_post_modeset. Not sure which change broke that though, or how
to fix it. Ville?



The whole logic has changed and the software counter updates are now 
driven all the time by the hw counter.




BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
vblank counters"). I've been meaning to track that down since then; one
of these days hopefully, but if anybody has any ideas offhand...




I spent the last few hours reading through the drm and radeon code and I 
think what should probably work is to replace the 
drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on 
calls. These are apparently meant for drivers whose hw counters reset 
during modeset, and seem to reinitialize stuff properly and release 
clients' queued vblank events to avoid blocking - not tested so far, just 
looked at the code.


Once drm_vblank_off is called, drm_vblank_get will no-op and return an 
error, so clients can't enable vblank irqs during the modeset - pageflip 
ioctl and waitvblank ioctl would fail while a modeset happens - 
hopefully userspace handles this correctly everywhere.


It would also cause radeon's power management to not sync its actions to 
vblank if it would get invoked during a modeset, but that seems to be 
handled by a 200 msec timeout and hopefully only cause visual glitches - 
or invisible glitches while the crtc is blanked during modeset?


There could be another tiny race with the new "vblank counter bumping" 
logic from commit 5b5561b ("drm/radeon: Fixup hw vblank counters/ts 
...") if drm_update_vblank_counter() would be called multiple times in 
quick succession within the "radeon_crtc->lb_vblank_lead_lines" 
scanlines before start of real vblank iff at the same time a modeset 
would happen and set radeon_crtc->lb_vblank_lead_lines to a smaller 
value due to a change in horizontal mode resolution. That needs a 
modeset to happen to a higher horizontal resolution just exactly when 
the scanout is in exactly the right 5 or so scanlines and some client is 
calling drm_vblank_get() to enable vblank irqs at the same time, but it 
would cause the same hang if it happened - not that likely to happen 
often, but still not nice, also Murphy's law... If we could switch to 
drm_vblank_off/on instead of drm_vblank_pre/post_modeset we could remove 
that race as well by forbidding any vblank irq related activity during 
a modeset.


I'll hack up a patch for demonstration now.
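
As an aside for readers of the archive, the failure mode described in this message (a software vblank count driven purely by the hardware counter, combined with a hardware counter that resets to zero during a modeset) can be illustrated with a minimal sketch. This is not the promised patch and not the DRM core code: struct vblank_state, vblank_update() and the 24-bit counter width in HW_COUNTER_MASK are assumptions made for the example.

```c
#include <stdint.h>

#define HW_COUNTER_MASK 0x00ffffffu /* assumed 24-bit hardware counter */

/* Hypothetical per-crtc state, for illustration only. */
struct vblank_state {
	uint32_t count; /* software vblank count seen by clients */
	uint32_t last;  /* hardware counter value at the last update */
};

/*
 * Advance the software count by the wrapped difference between the
 * current and the previously sampled hardware counter. Returns the
 * difference that was applied.
 */
uint32_t vblank_update(struct vblank_state *v, uint32_t hw_now)
{
	uint32_t diff = (hw_now - v->last) & HW_COUNTER_MASK;

	v->count += diff;
	v->last = hw_now;

	return diff;
}
```

In normal operation the difference is a small number of frames. If the hardware counter drops from 1002 to 1 because of a reset, the masked difference becomes nearly the full counter range, so the software count jumps far ahead of anything a client is waiting for.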


Re: [PATCH v4 1/4] soc: mediatek: Refine scpsys to support multiple platform

2016-01-20 Thread James Liao
On Wed, 2016-01-20 at 17:14 +0800, Yingjoe Chen wrote:
> On Wed, 2016-01-20 at 14:08 +0800, James Liao wrote:
> > Refine scpsys driver common code to support multiple SoC / platform.
> > 
> > Signed-off-by: James Liao 
> <...>
> > diff --git a/drivers/soc/mediatek/mtk-scpsys.h 
> > b/drivers/soc/mediatek/mtk-scpsys.h
> > new file mode 100644
> > index 000..e435bc3
> > --- /dev/null
> > +++ b/drivers/soc/mediatek/mtk-scpsys.h
> > @@ -0,0 +1,55 @@
> > +#ifndef __DRV_SOC_MTK_H
> > +#define __DRV_SOC_MTK_H
> > +
> > +enum clk_id {
> > +   CLK_NONE,
> > +   CLK_MM,
> > +   CLK_MFG,
> > +   CLK_VENC,
> > +   CLK_VENC_LT,
> > +   CLK_MAX,
> > +};
> > +
> > +#define MAX_CLKS   2
> > +
> > +struct scp_domain_data {
> > +   const char *name;
> > +   u32 sta_mask;
> > +   int ctl_offs;
> > +   u32 sram_pdn_bits;
> > +   u32 sram_pdn_ack_bits;
> > +   u32 bus_prot_mask;
> > +   enum clk_id clk_id[MAX_CLKS];
> > +   bool active_wakeup;
> > +};
> > +
> > +struct scp;
> > +
> > +struct scp_domain {
> > +   struct generic_pm_domain genpd;
> > +   struct scp *scp;
> > +   struct clk *clk[MAX_CLKS];
> > +   u32 sta_mask;
> > +   void __iomem *ctl_addr;
> > +   u32 sram_pdn_bits;
> > +   u32 sram_pdn_ack_bits;
> > +   u32 bus_prot_mask;
> > +   bool active_wakeup;
> > +   struct regulator *supply;
> > +};
> > +
> > +struct scp {
> > +   struct scp_domain *domains;
> > +   struct genpd_onecell_data pd_data;
> > +   struct device *dev;
> > +   void __iomem *base;
> > +   struct regmap *infracfg;
> > +};
> > +
> > +struct scp *init_scp(struct platform_device *pdev,
> > +   const struct scp_domain_data *scp_domain_data, int num);
> > +
> > +void mtk_register_power_domains(struct platform_device *pdev,
> > +   struct scp *scp, int num);
> > +
> > +#endif /* __DRV_SOC_MTK_H */
> 
> After the merge, only mtk-scpsys.c will use this file. I think it makes sense
> to just include its contents in mtk-scpsys.c. Also init_scp and
> mtk_register_power_domains can be static now.

Hi Yingjoe,

OK. I can merge this header file into mtk-scpsys.c once we have confirmed
that the MT8173 + MT2701 implementation is OK.


Hi Matthias,

Do you have suggestions for this implementation, which merges MT8173 and
MT2701 support into the same file?


Best regards,

James




Re: [PATCH v4 07/21] usb: dwc2: hcd: fix split transfer schedule sequence

2016-01-20 Thread kbuild test robot
Hi Douglas,

[auto build test ERROR on next-20160120]
[cannot apply to v4.4-rc8 v4.4-rc7 v4.4-rc6 v4.4]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Douglas-Anderson/usb-dwc2-host-Fix-and-speed-up-all-the-stuff-especially-with-splits/20160121-131414
config: x86_64-randconfig-x019-01201142 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/usb/dwc2/core.c: In function 'dwc2_hc_start_transfer':
>> drivers/usb/dwc2/core.c:1876:17: error: 'struct dwc2_hsotg' has no member 
>> named 'split_order'
  >split_order);
^
--
>> /bin/bash: line 0: [: -ge: unary operator expected

vim +1876 drivers/usb/dwc2/core.c

  1870  ec_mc = 3;
  1871  else
  1872  ec_mc = 1;
  1873  
  1874  /* Put ourselves on the list to keep order straight */
  1875  list_move_tail(>split_order_list_entry,
> 1876 >split_order);
  1877  } else {
  1878  if (dbg_hc(chan))
  1879  dev_vdbg(hsotg->dev, "no split\n");

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: linux-next: build failure after merge of the akpm-current tree

2016-01-20 Thread Stephen Rothwell
Hi Sudip,

On Thu, 21 Jan 2016 10:47:09 +0530 Sudip Mukherjee  
wrote:
>
> On Thu, Jan 21, 2016 at 04:11:56PM +1100, Stephen Rothwell wrote:
> > Hi Andrew,
> > 
> > After merging the akpm-current tree, today's linux-next build (arm
> > efm32_defconfig) failed like this:
> > 
> > fs/proc/task_nommu.c:132:28: error: 'mm' undeclared (first use in this 
> > function)
> > 
> > Caused by commit
> > 
> >   e87d4fd02f40 ("proc: revert /proc//maps [stack:TID] annotation")  
> 
> posted a patch for it a few minutes ago.
> https://patchwork.kernel.org/patch/8077421/

Thanks, I have added that to the akpm-current tree for tomorrow in case
Andrew does not get around to it.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


Re: [PATCH v3 2/4] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2016-01-20 Thread Yang Zhang

On 2016/1/20 9:42, Feng Wu wrote:

Use vector-hashing to deliver lowest-priority interrupts. As an
example, modern Intel CPUs in server platforms use this method to
handle lowest-priority interrupts.
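
For readers unfamiliar with the idea, a minimal userspace sketch of vector
hashing follows. It is not the code from this patch: the function name
pick_dest_by_vector_hash and the 64-bit mask are illustrative assumptions.
The principle is the same, though: with N candidate destinations, the
interrupt goes to the (vector % N)-th candidate, so a given vector always
lands on the same destination for a given candidate set.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the bit position of the (vector % N)-th set
 * bit in dest_mask, where N is the number of set bits, or -1 if the mask is
 * empty. Each set bit stands for one candidate destination.
 */
int pick_dest_by_vector_hash(uint32_t vector, uint64_t dest_mask)
{
    uint32_t ndest = 0;

    /* Count the candidate destinations. */
    for (int bit = 0; bit < 64; bit++)
        if (dest_mask & (UINT64_C(1) << bit))
            ndest++;

    if (ndest == 0)
        return -1;

    /* Number of candidates to skip before taking one. */
    uint32_t skip = vector % ndest;

    for (int bit = 0; bit < 64; bit++) {
        if (!(dest_mask & (UINT64_C(1) << bit)))
            continue;
        if (skip == 0)
            return bit;
        skip--;
    }

    return -1; /* not reached when ndest > 0 */
}
```

With candidates at bit positions 2, 3 and 5 (mask 0x2c), vector 49 gives
49 % 3 = 1 and therefore selects position 3. Unlike this sketch, which returns
a zero-based position, the helper in the patch returns a one-based index and
the caller subtracts one.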

Signed-off-by: Feng Wu 
---
v3:
- Fix a bug for sparse topologies: in that case, vcpu_id is not equal
to the return value obtained from kvm_get_vcpu().
- Remove unnecessary check in fast irq delivery patch.
- Print an error message only once for each guest when we find a
   hardware-disabled LAPIC during interrupt injection.

  arch/x86/include/asm/kvm_host.h |  2 ++
  arch/x86/kvm/irq_comm.c | 27 +
  arch/x86/kvm/lapic.c| 52 ++---
  arch/x86/kvm/lapic.h|  2 ++
  arch/x86/kvm/x86.c  |  9 +++
  arch/x86/kvm/x86.h  |  1 +
  6 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 44adbb8..5054810 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -754,6 +754,8 @@ struct kvm_arch {

bool irqchip_split;
u8 nr_reserved_ioapic_pins;
+
+   int disabled_lapic_found;
  };

  struct kvm_vm_stat {
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 8fc89ef..062e907 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -34,6 +34,7 @@
  #include "lapic.h"

  #include "hyperv.h"
+#include "x86.h"

  static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
   struct kvm *kvm, int irq_source_id, int level,
@@ -55,8 +56,10 @@ static int kvm_set_ioapic_irq(struct 
kvm_kernel_irq_routing_entry *e,
  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map)
  {
-   int i, r = -1;
+   int i, r = -1, idx = 0;
struct kvm_vcpu *vcpu, *lowest = NULL;
+   unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
+   unsigned int dest_vcpus = 0;

if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
kvm_lowest_prio_delivery(irq)) {
@@ -67,6 +70,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, , dest_map))
return r;

+   memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
+
kvm_for_each_vcpu(i, vcpu, kvm) {
if (!kvm_apic_present(vcpu))
continue;
@@ -80,13 +85,25 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
r = 0;
r += kvm_apic_set_irq(vcpu, irq, dest_map);
} else if (kvm_lapic_enabled(vcpu)) {
-   if (!lowest)
-   lowest = vcpu;
-   else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
-   lowest = vcpu;
+   if (!kvm_vector_hashing_enabled()) {
+   if (!lowest)
+   lowest = vcpu;
+   else if (kvm_apic_compare_prio(vcpu, lowest) < 
0)
+   lowest = vcpu;
+   } else {
+   __set_bit(i, dest_vcpu_bitmap);
+   dest_vcpus++;
+   }
}
}

+   if (dest_vcpus != 0) {
+   idx = kvm_vector_2_index(irq->vector, dest_vcpus,
+dest_vcpu_bitmap, KVM_MAX_VCPUS);
+
+   lowest = kvm_get_vcpu(kvm, idx - 1);
+   }
+
if (lowest)
r = kvm_apic_set_irq(lowest, irq, dest_map);

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 36591fa..e1a449da 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -675,6 +675,22 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct 
kvm_lapic *source,
}
  }

+int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
+  const unsigned long *bitmap, u32 bitmap_size)
+{
+   u32 mod;
+   int i, idx = 0;
+
+   mod = vector % dest_vcpus;
+
+   for (i = 0; i <= mod; i++) {
+   idx = find_next_bit(bitmap, bitmap_size, idx) + 1;
+   BUG_ON(idx > bitmap_size);
+   }
+
+   return idx;
+}
+
  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map)
  {
@@ -727,21 +743,51 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, 
struct kvm_lapic *src,

dst = map->logical_map[cid];

-   if (kvm_lowest_prio_delivery(irq)) {
+   if (!kvm_lowest_prio_delivery(irq))
+   goto set_irq;
+
+   if (!kvm_vector_hashing_enabled()) {
int l = -1;
  

Re: tags: Unify emacs and exuberant rules

2016-01-20 Thread Dave Jones
On Wed, Jan 20, 2016 at 06:22:04PM +, Linux Kernel wrote:
 > Web:
 > https://git.kernel.org/torvalds/c/93209d65c1d38f86ffb3f61a1214130b581a9709
 > Commit: 93209d65c1d38f86ffb3f61a1214130b581a9709
 > Parent: a1ccdb63b5535dc3446b0a9efc6d97aca82c72ef
 > Refname:refs/heads/master
 > Author: Michal Marek 
 > AuthorDate: Wed Oct 14 11:48:06 2015 +0200
 > Committer:  Michal Marek 
 > CommitDate: Tue Jan 5 22:18:48 2016 +0100
 > 
 > tags: Unify emacs and exuberant rules
 > 
 > The emacs rules were constantly lagging behind the exuberant ones. Use a
 > single set of rules for both, to make the script easier to maintain.
 > The language understood by both tools is basic regular expression with
 > some limitations, which are documented in a comment. To be able to store
 > the rules in an array and easily iterate over it, the script requires
 > bash now. In the exuberant case, the change fixes some false matches in
 >  and also some too greedy matches in the arguments
 > of the DECLARE_*/DEFINE_* macros. In the emacs case, several previously
 > not working rules are matching now. Tested with these versions of the
 > tools:
 > 
 >   Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
 >   etags (GNU Emacs 24.5)
 > 
 > Signed-off-by: Michal Marek 

Since today, make tags got a lot more noisy for me on Debian unstable
(exuberant-ctags 1:5.9~svn20110310-10)

$ make tags
GEN tags
ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name 
pattern "\1"
ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name 
pattern "\1"
ctags: Warning: kernel/locking/lockdep.c:153: null expansion of name pattern 
"\1"
ctags: Warning: kernel/workqueue.c:307: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"

Looks like it's choking on DEFINE_PER_CPU definitions?

Dave


Re: [PATCH v4 3/4] soc: mediatek: Add MT2701 power dt-bindings

2016-01-20 Thread James Liao
Hi Yingjoe, Rob,

On Wed, 2016-01-20 at 10:35 -0600, Rob Herring wrote:
> On Wed, Jan 20, 2016 at 05:29:21PM +0800, Yingjoe Chen wrote:
> > On Wed, 2016-01-20 at 14:08 +0800, James Liao wrote:
> > > From: Shunli Wang 
> > > 
> > > Add power dt-bindings for MT2701.
> > > 
> > > Signed-off-by: Shunli Wang 
> > > Signed-off-by: James Liao 
> > > ---
> > >  .../devicetree/bindings/soc/mediatek/scpsys.txt|  6 +++--
> > >  include/dt-bindings/power/mt2701-power.h   | 27 
> > > ++
> > >  2 files changed, 31 insertions(+), 2 deletions(-)
> > >  create mode 100644 include/dt-bindings/power/mt2701-power.h
> > > 
> > > diff --git a/Documentation/devicetree/bindings/soc/mediatek/scpsys.txt 
> > > b/Documentation/devicetree/bindings/soc/mediatek/scpsys.txt
> > > index a6c8afc..807d87f 100644
> > > --- a/Documentation/devicetree/bindings/soc/mediatek/scpsys.txt
> > > +++ b/Documentation/devicetree/bindings/soc/mediatek/scpsys.txt
> > > @@ -9,10 +9,12 @@ domain control.
> > >  
> > >  The driver implements the Generic PM domain bindings described in
> > >  power/power_domain.txt. It provides the power domains defined in
> > > -include/dt-bindings/power/mt8173-power.h.
> > > +include/dt-bindings/power/mt8173-power.h and mt2701-power.h.
> > >  
> > >  Required properties:
> > > -- compatible: Must be "mediatek,mt8173-scpsys"
> > > +- compatible: Should be:
> > > + - "mediatek,mt8173-scpsys"
> > > + - "mediatek,mt2701-scpsys"
> > >  - #power-domain-cells: Must be 1
> > >  - reg: Address range of the SCPSYS unit
> > >  - infracfg: must contain a phandle to the infracfg controller
> > 
> > Please sort the list.
> 
> And s/Should be/Should be one of/

OK. I'll modify it in the next patch.


Best regards,

James



[lkp] [rpm] 1b018e0756: INFO: rcu_preempt detected stalls on CPUs/tasks:

2016-01-20 Thread kernel test robot
FYI, we noticed the below changes on

https://github.com/0day-ci/linux 
Zhaoyang-Huang/rpm-refining-the-rpm_suspend-function/20160120-160501
commit 1b018e07564eb530a5f19f1acab2a926f84b ("rpm: refining the rpm_suspend 
function")


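+------------------------------------------------+------------+------------+
|                                                | dc7a021ccc | 1b018e0756 |
+------------------------------------------------+------------+------------+
| boot_successes                                 | 40         | 0          |
| boot_failures                                  | 0          | 37         |
| INFO:rcu_preempt_detected_stalls_on_CPUs/tasks | 0          | 13         |
| BUG:kernel_boot_hang                           | 0          | 37         |
| backtrace:schedule_timeout                     | 0          | 2          |
+------------------------------------------------+------------+------------+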


[6.040069] usb usb1: SerialNumber: dummy_hcd.0
[6.041207] hub 1-0:1.0: USB hub found
[6.041745] hub 1-0:1.0: 1 port detected
[  106.049840] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  106.053605] Task dump for CPU 1:
[  106.054024] kworker/1:1 R  running task1436829  2 0x
[  106.054968] Workqueue: pm pm_runtime_work
[  106.055509]  8348aec8 82d568f0  
822b037b
[  106.056527]  0001  8800121029c0 
880012bd50c0
[  106.057553]  88001211c440 88001211c440 8800121029f0 
880012153e48
[  106.058570] Call Trace:
[  106.058897]  [] ? worker_thread+0x28c/0x37e
[  106.059640]  [] ? process_scheduled_works+0x2e/0x2e
[  106.060467]  [] ? kthread+0xf6/0xfe
[  106.061117]  [] ? __kthread_parkme+0x82/0x82
[  106.061869]  [] ? ret_from_fork+0x3f/0x70
[  106.062589]  [] ? __kthread_parkme+0x82/0x82
[  106.063337] rcu_preempt kthread starved for 25002 jiffies! 
g18446744073709551426 c18446744073709551425 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[  106.064918] rcu_preempt S 8812fd08 14696 7  2 0x
[  106.065868]  8812fd08 88068000 881216c0 
8813
[  106.066886]  880012bcd980 8812fd40 881216c0 
0001
[  106.067911]  8812fd20 81c4d5d7 fffee08c 
8812fdc0
[  106.068923] Call Trace:
[  106.069247]  [] schedule+0x83/0x98
[  106.069890]  [] schedule_timeout+0x144/0x173
[  106.070645]  [] ? cascade+0x47/0x47
[  106.071299]  [] rcu_gp_kthread+0x574/0x8d7
[  106.072033]  [] ? rcu_gp_kthread+0x574/0x8d7
[  106.072787]  [] ? __wake_up_common+0x7c/0x7c
[  106.073537]  [] ? force_qs_rnp+0x164/0x164
[  106.074261]  [] kthread+0xf6/0xfe
[  106.074895]  [] ? __kthread_parkme+0x82/0x82
[  106.075645]  [] ret_from_fork+0x3f/0x70
[  106.076337]  [] ? __kthread_parkme+0x82/0x82
[  406.069839] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  406.073733] Task dump for CPU 1:
[  406.074151] kworker/1:1 R  running task1436829  2 0x
[  406.075103] Workqueue: pm pm_runtime_work
[  406.075641]  8348aec8 82d568f0  
822b037b
[  406.076652]  0001  8800121029c0 
880012bd50c0
[  406.077674]  88001211c440 88001211c440 8800121029f0 
880012153e48
[  406.078677] Call Trace:
[  406.079001]  [] ? worker_thread+0x28c/0x37e
[  406.079741]  [] ? process_scheduled_works+0x2e/0x2e
[  406.080565]  [] ? kthread+0xf6/0xfe
[  406.081710]  [] ? __kthread_parkme+0x82/0x82
[  406.082449]  [] ? ret_from_fork+0x3f/0x70
[  406.083162]  [] ? __kthread_parkme+0x82/0x82
[  406.083914] rcu_preempt kthread starved for 17 jiffies! 
g18446744073709551426 c18446744073709551425 f0x2 RCU_GP_WAIT_FQS(3) 
->state=0x100
[  406.085516] rcu_preempt W 8812fd08 14696 7  2 0x
[  406.086455]  8812fd08 88068000 881216c0 
8813
[  406.087461]  880012bcd980 8812fd40 881216c0 
0001
[  406.088471]  8812fd20 81c4d5d7 fffee08c 
8812fdc0
[  406.089480] Call Trace:
[  406.089804]  [] schedule+0x83/0x98
[  406.090435]  [] schedule_timeout+0x144/0x173
[  406.091180]  [] ? cascade+0x47/0x47
[  406.091832]  [] rcu_gp_kthread+0x574/0x8d7
[  406.092558]  [] ? rcu_gp_kthread+0x574/0x8d7
[  406.093300]  [] ? __wake_up_common+0x7c/0x7c
[  406.094047]  [] ? force_qs_rnp+0x164/0x164
[  406.094779]  [] kthread+0xf6/0xfe
[  406.095398]  [] ? __kthread_parkme+0x82/0x82
[  406.096144]  [] ret_from_fork+0x3f/0x70
[  406.096834]  [] ? __kthread_parkme+0x82/0x82

Elapsed time: 440
BUG: kernel boot hang
qemu-system-x86_64 -enable-kvm -cpu Nehalem -kernel 
/pkg/linux/x86_64-acpi-redef/gcc-5/1b018e07564eb530a5f19f1acab2a926f84b/vmlinuz-4.4.0-03451-g1b018e0
 -append 'root=/dev/ram0 user=lkp 
job=/lkp/scheduled/vm-intel12-yocto-x86_64-7/bisect_boot-1-yocto-minimal-x86_64.cgz-x

[lkp] [mm, vmscan] ab543c4e6c: BUG: unable to handle kernel

2016-01-20 Thread kernel test robot
FYI, we noticed the below changes on

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma 
mm-vmscan-node-lru-v2r16
commit ab543c4e6ce43e2225a598e2d96f5b0ec04dbd73 ("mm, vmscan: Move LRU lists to 
node")


[7.976872] Loading compiled-in X.509 certificates
[8.057169] Unregister pv shared memory for cpu 0
[8.057169] Unregister pv shared memory for cpu 0
[8.121704] BUG: unable to handle kernel 
[8.121704] BUG: unable to handle kernel paging requestpaging request at 
001d7661
 at 001d7661
[8.123958] IP:
[8.123958] IP: [] cpu_vm_stats_fold+0xb7/0x110
 [] cpu_vm_stats_fold+0xb7/0x110
[8.125952] PGD 0 
[8.125952] PGD 0 

[8.126619] Oops:  [#1] 
[8.126619] Oops:  [#1] PREEMPT PREEMPT SMP SMP DEBUG_PAGEALLOC 
DEBUG_PAGEALLOC 

[8.128395] Modules linked in:
[8.128395] Modules linked in:

[8.129442] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.4.0-08955-gab543c4 #1
[8.129442] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.4.0-08955-gab543c4 #1
[8.131672] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[8.131672] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[8.134471] task: 8829c040 ti: 882a task.ti: 
882a
[8.134471] task: 8829c040 ti: 882a task.ti: 
882a
[8.136803] RIP: 0010:[] 
[8.136803] RIP: 0010:[]  [] 
cpu_vm_stats_fold+0xb7/0x110
 [] cpu_vm_stats_fold+0xb7/0x110
[8.139518] RSP: :882a3bf8  EFLAGS: 00010282
[8.139518] RSP: :882a3bf8  EFLAGS: 00010282
[8.141221] RAX: 8de8e180 RBX: 001d7660 RCX: 001d7661
[8.141221] RAX: 8de8e180 RBX: 001d7660 RCX: 001d7661
[8.143435] RDX: 8de8f848 RSI: 8de8e808 RDI: 8de8ef00
[8.143435] RDX: 8de8f848 RSI: 8de8e808 RDI: 8de8ef00
[8.145694] RBP: 882a3c70 R08: 8de8f888 R09: 0002
[8.145694] RBP: 882a3c70 R08: 8de8f888 R09: 0002
[8.147921] R10:  R11:  R12: 882a3bfc
[8.147921] R10:  R11:  R12: 882a3bfc
[8.150179] R13:  R14: 0001 R15: 8de42f00
[8.150179] R13:  R14: 0001 R15: 8de42f00
[8.152400] FS:  () GS:88001440() 
knlGS:
[8.152400] FS:  () GS:88001440() 
knlGS:
[8.154919] CS:  0010 DS:  ES:  CR0: 8005003b
[8.154919] CS:  0010 DS:  ES:  CR0: 8005003b
[8.156709] CR2: 001d7661 CR3: 0ddbb000 CR4: 06a0
[8.156709] CR2: 001d7661 CR3: 0ddbb000 CR4: 06a0
[8.158977] Stack:
[8.158977] Stack:
[8.159619]  fffe002a3c18
[8.159619]  fffe002a3c18 fffe fffe 
   

[8.164052]  
[8.164052]   0002 0002 
   

[8.166493]  
[8.166493]     
   

[8.168808] Call Trace:
[8.168808] Call Trace:
[8.169543]  [] page_alloc_cpu_notify+0x2d/0x40
[8.169543]  [] page_alloc_cpu_notify+0x2d/0x40
[8.171243]  [] notifier_call_chain+0x92/0xc0
[8.171243]  [] notifier_call_chain+0x92/0xc0
[8.173006]  [] __raw_notifier_call_chain+0x9/0x10
[8.173006]  [] __raw_notifier_call_chain+0x9/0x10
[8.176126]  [] cpu_notify_nofail+0x1e/0x50
[8.176126]  [] cpu_notify_nofail+0x1e/0x50
[8.177753]  [] _cpu_down+0x245/0x330
[8.177753]  [] _cpu_down+0x245/0x330
[8.179201]  [] ? __call_rcu+0x460/0x460
[8.179201]  [] ? __call_rcu+0x460/0x460
[8.182691]  [] ? call_rcu_bh+0x20/0x20
[8.182691]  [] ? call_rcu_bh+0x20/0x20
[8.184357]  [] ? __rcu_read_unlock+0x120/0x120
[8.184357]  [] ? __rcu_read_unlock+0x120/0x120
[8.186075]  [] ? wait_for_common+0x103/0x1b0
[8.186075]  [] ? wait_for_common+0x103/0x1b0
[8.187693]  [] ? __rcu_read_unlock+0x120/0x120
[8.187693]  [] ? __rcu_read_unlock+0x120/0x120
[8.189594]  [] ? wait_for_common+0x103/0x1b0
[8.189594]  [] ? wait_for_common+0x103/0x1b0
[8.191439]  [] cpu_down+0x31/0x50
[8.191439]  [] cpu_down+0x31/0x50
[8.192908]  [] _debug_hotplug_cpu+0x78/0xb0
[8.192908]  [] _debug_hotplug_cpu+0x78/0xb0
[8.194545]  [] ? topology_init+0x39/0x39
[8.194545]  [] ? topology_init+0x39/0x39
[8.196089]  [] debug_hotplug_cpu+0xd/0x11
[8.196089]  [] debug_hotplug_cpu+0xd/0x11
[8.197902]  [] 

Re: [PATCH v4 2/6] dt-bindings: ARM: Mediatek: Document bindings for MT2701

2016-01-20 Thread James Liao
Hi Rob,

On Wed, 2016-01-20 at 10:32 -0600, Rob Herring wrote:
> On Wed, Jan 20, 2016 at 02:35:43PM +0800, James Liao wrote:
> > This patch adds the binding documentation for apmixedsys, bdpsys,
> > ethsys, hifsys, imgsys, infracfg, mmsys, pericfg, topckgen and
> > vdecsys for Mediatek MT2701.
> > 
> > Signed-off-by: James Liao 
> > Tested-by: John Crispin 
> > ---
> >  .../bindings/arm/mediatek/mediatek,apmixedsys.txt  |  1 +
> >  .../bindings/arm/mediatek/mediatek,bdpsys.txt  | 22 
> > ++
> >  .../bindings/arm/mediatek/mediatek,ethsys.txt  | 22 
> > ++
> >  .../bindings/arm/mediatek/mediatek,hifsys.txt  | 22 
> > ++
> >  .../bindings/arm/mediatek/mediatek,imgsys.txt  |  1 +
> >  .../bindings/arm/mediatek/mediatek,infracfg.txt|  1 +
> >  .../bindings/arm/mediatek/mediatek,mmsys.txt   |  1 +
> >  .../bindings/arm/mediatek/mediatek,pericfg.txt |  1 +
> >  .../bindings/arm/mediatek/mediatek,topckgen.txt|  1 +
> >  .../bindings/arm/mediatek/mediatek,vdecsys.txt |  1 +
> >  10 files changed, 73 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
> >  create mode 100644 
> > Documentation/devicetree/bindings/arm/mediatek/mediatek,ethsys.txt
> >  create mode 100644 
> > Documentation/devicetree/bindings/arm/mediatek/mediatek,hifsys.txt
> > 
> > diff --git 
> > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt 
> > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
> > index 936166f..a701e19 100644
> > --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
> > +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,apmixedsys.txt
> > @@ -6,6 +6,7 @@ The Mediatek apmixedsys controller provides the PLLs to the 
> > system.
> >  Required Properties:
> >  
> >  - compatible: Should be:
> > +   - "mediatek,mt2701-apmixedsys"
> > - "mediatek,mt8135-apmixedsys"
> > - "mediatek,mt8173-apmixedsys"
> >  - #clock-cells: Must be 1
> > diff --git 
> > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt 
> > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
> > new file mode 100644
> > index 000..4137196
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,bdpsys.txt
> > @@ -0,0 +1,22 @@
> > +Mediatek bdpsys controller
> > +
> > +
> > +The Mediatek bdpsys controller provides various clocks to the system.
> 
> As you clarified, these blocks provide more than just clocks. Please list 
> all the functions here and on the others.

Some blocks may provide clock and reset controllers at the same time, but
most of them do not provide functions directly. Instead, some DT
blocks that provide specific functions may refer to these controller
nodes because they need to access the same register space.

For example, scpsys (the power domain provider) refers to infracfg
because it needs to control infracfg registers when powering domains on/off:

scpsys: scpsys@10006000 {
compatible = "mediatek,mt2701-scpsys";
#power-domain-cells = <1>;
reg = <0 0x10006000 0 0x1000>;
infracfg = <>;
};

So I think there is no need to list all the functions of each block
here.

> > +
> > +Required Properties:
> > +
> > +- compatible: Should be:
> > +   - "mediatek,mt2701-bdpsys", "syscon"
> > +- #clock-cells: Must be 1
> > +
> > +The bdpsys controller uses the common clk binding from
> > +Documentation/devicetree/bindings/clock/clock-bindings.txt
> > +The available clocks are defined in dt-bindings/clock/mt*-clk.h.
> > +



