date:20160725

Re: [PATCH v2 06/14] ARM: sun8i: clk: Add clk-factor rate application method

2016-07-25 Thread Maxime Ripard

On Thu, Jul 21, 2016 at 11:52:15AM +0200, Ondřej Jirman wrote:
> >>> If so, then yes, trying to switch to the 24MHz oscillator before
> >>> applying the factors, and then switching back when the PLL is stable
> >>> would be a nice solution.
> >>>
> >>> I just checked, and all the SoCs we've had so far have that
> >>> possibility, so if it works, for now, I'd like to stick to that.
> >>
> >> It would need to be tested. U-boot does the change only once, while the
> >> kernel would be doing it all the time and between various frequencies
> >> and PLL settings. So the issues may show up with this solution too.
> > 
> > That would have the benefit of being quite easy to document, not be a
> > huge amount of code and it would work on all the CPU PLLs we have so
> > far, so still, a pretty big win. If it doesn't, of course, we don't
> > really have the choice.
> 
> It's probably more code though. It has to access a different register from
> the one that is already defined in dts, which would add a lot of code
> and require dts changes. The original patch I sent is simpler than that.

Why?

You can use container_of to retrieve the parent structure of the clock
notifier, and then you get a ccu_common structure pointer, with the
CCU base address, the clock register, its lock, etc.

Look at what is done in drivers/clk/meson/clk-cpu.c. It's like 20 LoC.

I don't really get why anything should be changed in the DT, or why it
would add a lot of code. Or maybe we're not talking about the same
thing?
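
For readers following along, here is a minimal userspace sketch of the
container_of pattern mentioned above. The structure and field names
(demo_notifier, demo_ccu_common) are hypothetical stand-ins, not the actual
sunxi-ng or meson definitions, and the macro is a simplified version of the
kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct notifier_block. */
struct demo_notifier {
	int (*call)(struct demo_notifier *nb, unsigned long event);
};

/* Hypothetical stand-in for a CCU structure that embeds its notifier. */
struct demo_ccu_common {
	unsigned long base;	/* stands in for the CCU base address */
	unsigned int reg;	/* clock register offset */
	struct demo_notifier nb;
};

/* Recover the enclosing structure from the embedded notifier pointer. */
static struct demo_ccu_common *to_demo_ccu(struct demo_notifier *nb)
{
	assert(nb != NULL);
	return container_of(nb, struct demo_ccu_common, nb);
}

/* The callback only receives nb, yet it can reach base and reg. */
static int demo_clk_notifier_cb(struct demo_notifier *nb, unsigned long event)
{
	struct demo_ccu_common *common = to_demo_ccu(nb);

	(void)event;
	return (int)(common->base + common->reg);
}
```

Because the notifier is embedded in the larger structure, the callback needs
no extra lookup table and no additional DT property to find its register.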

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



Re: [PATCH] staging: ks7010: declare private functions static

2016-07-25 Thread Nicholas Mc Guire

On Tue, Jul 26, 2016 at 08:51:14AM +0200, Wolfram Sang wrote:
> On Tue, Jul 26, 2016 at 06:48:00AM +, Nicholas Mc Guire wrote:
> > On Mon, Jul 25, 2016 at 11:04:18PM +0200, Wolfram Sang wrote:
> > > On Mon, Jul 25, 2016 at 09:22:27PM +0200, Nicholas Mc Guire wrote:
> > > > Private functions in ks_hostif.c can be declared static. 
> > > > 
> > > > Fixes: 13a9930d15b4 ("staging: ks7010: add driver from Nanonote 
> > > > extra-repository")
> > > > 
> > > > Signed-off-by: Nicholas Mc Guire 
> > > 
> > > Reviewed-by: Wolfram Sang 
> > > 
> > > drivers/staging/ks7010/ks7010_sdio.c and
> > > drivers/staging/ks7010/ks_wlan_net.c have similar warnings in case you'd
> > > like to fix those, too.
> > > 
> > the cases found regarding completion were:
> > ./drivers/staging/ks7010/ks_hostif.c:80 treating signal case as success
> > ./drivers/staging/ks7010/ks_wlan_net.c:109 treating signal case as success
> > ./drivers/staging/ks7010/ks7010_sdio.c:901 treating signal case as success
> > ./drivers/staging/ks7010/ks7010_sdio.c:929 treating signal case as success
> > ./drivers/video/fbdev/exynos/exynos_mipi_dsi_common.c:383 treating signal 
> > case as success
> > ./drivers/video/fbdev/exynos/exynos_mipi_dsi_common.c:247 treating signal 
> > case as success
> > 
> > will be going through all of them in the next days. 
> 
> Awesome, thanks!
> 
> I meant the "should it be static?" sparse warnings here, though :)
> 
well I do run sparse on all the cleanups and if that triggers 
and it is sufficiently clear from context, patches will follow.

thx!
hofrat

Re: [PATCH] s390/perf: fix 'start' address of module's map

2016-07-25 Thread Jiri Olsa

On Fri, Jul 22, 2016 at 10:47:34AM +0800, Songshan Gong wrote:
> Has the patch been accepted by upstream?
> 
> On 7/21/2016 11:10 AM, Song Shan Gong wrote:
> > At present, when creating a module's map, perf gets the 'start' address by
> > parsing '/proc/modules', but that is the module base address, not the start
> > address of the '.text' section. On most archs this is OK. But s390 places
> > the 'GOT' and 'PLT' relocations before the '.text' section, so there is an
> > offset between the module base address and the '.text' section, which leads
> > to wrong symbol resolution for modules.
> > 
> > Fix this bug by getting 'start' address of module's map from parsing
> > '/sys/module/[module name]/sections/.text', not from '/proc/modules'.
> > 
> > Signed-off-by: Song Shan Gong 
> > Acked-by: Jiri Olsa 

I think it's good to go, Arnaldo, could you please take this one?
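
As an illustration of the lookup described in the quoted commit message, here
is a small sketch that parses the contents of a sections file into an address.
It assumes the file holds a single hexadecimal address, and the helper name
demo_parse_section_addr is made up for this example; it is not the code from
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a hexadecimal address as read from a file such as
 * /sys/module/<name>/sections/.text (assumed format: "0x..." plus newline).
 * Returns 0 on success, -1 if no number could be parsed.
 */
static int demo_parse_section_addr(const char *text, uint64_t *addr)
{
	char *end;
	unsigned long long val;

	assert(text != NULL && addr != NULL);

	errno = 0;
	val = strtoull(text, &end, 16);	/* base 16 accepts an optional 0x prefix */
	if (errno != 0 || end == text)
		return -1;

	*addr = (uint64_t)val;
	return 0;
}
```

The difference between this address and the base address from /proc/modules
is the offset that caused the wrong symbol resolution on s390.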

thanks,
jirka

Re: [PATCH] staging: ks7010: declare private functions static

2016-07-25 Thread Wolfram Sang

On Tue, Jul 26, 2016 at 06:48:00AM +, Nicholas Mc Guire wrote:
> On Mon, Jul 25, 2016 at 11:04:18PM +0200, Wolfram Sang wrote:
> > On Mon, Jul 25, 2016 at 09:22:27PM +0200, Nicholas Mc Guire wrote:
> > > Private functions in ks_hostif.c can be declared static. 
> > > 
> > > Fixes: 13a9930d15b4 ("staging: ks7010: add driver from Nanonote 
> > > extra-repository")
> > > 
> > > Signed-off-by: Nicholas Mc Guire 
> > 
> > Reviewed-by: Wolfram Sang 
> > 
> > drivers/staging/ks7010/ks7010_sdio.c and
> > drivers/staging/ks7010/ks_wlan_net.c have similar warnings in case you'd
> > like to fix those, too.
> > 
> the cases found regarding completion were:
> ./drivers/staging/ks7010/ks_hostif.c:80 treating signal case as success
> ./drivers/staging/ks7010/ks_wlan_net.c:109 treating signal case as success
> ./drivers/staging/ks7010/ks7010_sdio.c:901 treating signal case as success
> ./drivers/staging/ks7010/ks7010_sdio.c:929 treating signal case as success
> ./drivers/video/fbdev/exynos/exynos_mipi_dsi_common.c:383 treating signal 
> case as success
> ./drivers/video/fbdev/exynos/exynos_mipi_dsi_common.c:247 treating signal 
> case as success
> 
> will be going through all of them in the next days. 

Awesome, thanks!

I meant the "should it be static?" sparse warnings here, though :)




Re: [PATCH] i2c: i801: use IS_ENABLED() instead of checking for built-in or module

2016-07-25 Thread Wolfram Sang

On Thu, Jul 21, 2016 at 12:11:01PM -0400, Javier Martinez Canillas wrote:
> The IS_ENABLED() macro checks if a Kconfig symbol has been enabled either
> built-in or as a module, use that macro instead of open coding the same.
> 
> Using the macro makes the code more readable by helping abstract away some
> of the Kconfig built-in and module enable details.
> 
> Signed-off-by: Javier Martinez Canillas 

Applied to for-next, thanks!
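
For context, a userspace sketch of how an IS_ENABLED()-style macro can replace
the open-coded check. The DEMO_ macros mirror the preprocessor trick used in
the kernel's kconfig.h but are renamed here, and the CONFIG_DEMO_* options are
hypothetical.

```c
#include <assert.h>

/* Hypothetical options: one built-in, one modular, one (DEMO_OFF) unset. */
#define CONFIG_DEMO_BUILTIN 1
#define CONFIG_DEMO_MOD_MODULE 1

/* Expands to 1 if the option is defined as 1, to 0 if it is undefined. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_2(x)
#define DEMO_IS_DEFINED_2(val) DEMO_IS_DEFINED_3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED_3(arg1_or_junk) \
	DEMO_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define DEMO_IS_BUILTIN(option) DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option) DEMO_IS_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

/* The open-coded form that the macro replaces. */
#if defined(CONFIG_DEMO_MOD) || defined(CONFIG_DEMO_MOD_MODULE)
#define DEMO_OPEN_CODED 1
#else
#define DEMO_OPEN_CODED 0
#endif

/* Sanity checks for the three cases: built-in, module, disabled. */
static void demo_check_is_enabled(void)
{
	assert(DEMO_IS_ENABLED(CONFIG_DEMO_BUILTIN));
	assert(DEMO_IS_ENABLED(CONFIG_DEMO_MOD));
	assert(!DEMO_IS_ENABLED(CONFIG_DEMO_OFF));
}
```

Unlike the #if form, the macro yields an ordinary 0/1 expression, so it can
also be used inside a plain C if statement.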




Re: [PATCH] iio: adc: rockchip_saradc: Explicitly disable ADC on probe

2016-07-25 Thread Caesar Wang



On 2016-07-26 11:22, Guenter Roeck wrote:

On 07/25/2016 07:51 PM, Caesar Wang wrote:

Hi Guenter,

Thanks for fixing it.

On 2016-07-26 03:39, Guenter Roeck wrote:

If the ADC is read for the first time, the caller gets a timeout error,
and the kernel log shows

read channel() error: -110

The ADC may be enabled on boot, and needs to be explicitly disabled
for a read sequence to work (otherwise there is no completion interrupt).

Disable it explicitly in the probe function.

Fixes: 44d6f2ef94f9 ("iio: adc: add driver for Rockchip saradc")
Signed-off-by: Guenter Roeck 
---
  drivers/iio/adc/rockchip_saradc.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/iio/adc/rockchip_saradc.c b/drivers/iio/adc/rockchip_saradc.c
index f9ad6c2d6821..6aa3271d86b5 100644
--- a/drivers/iio/adc/rockchip_saradc.c
+++ b/drivers/iio/adc/rockchip_saradc.c
@@ -280,6 +280,9 @@ static int rockchip_saradc_probe(struct platform_device *pdev)

  goto err_pclk;
  }
+/* Make sure ADC is disabled */
+writel_relaxed(0, info->regs + SARADC_CTRL);


I think we should reset the saradc controller instead.
We cannot be sure the registers still hold their reset value of 0 after the
loader hands over to the kernel, and leftover state may even cause harm, in
my experience with the tsadc (drivers/thermal/rockchip_thermal.c).



e.g.:
/*
 * Reset the SARADC controller; this resets all saradc registers.
 */
static void rockchip_saradc_reset_controller(struct reset_control *reset)
{
	reset_control_assert(reset);
	usleep_range(10, 20);
	reset_control_deassert(reset);
}

..probe()
{
	...
	rockchip_saradc_reset_controller(info->reset);
	...
}



Ok, I'll give it a try.



I posted it on https://patchwork.kernel.org/patch/9247661/



Guenter



-
Caesar


+
  platform_set_drvdata(pdev, indio_dev);
  indio_dev->name = dev_name(&pdev->dev);









--
caesar wang | software engineer | w...@rock-chip.com

Re: [PATCH] staging: ks7010: declare private functions static

2016-07-25 Thread Nicholas Mc Guire

On Mon, Jul 25, 2016 at 11:04:18PM +0200, Wolfram Sang wrote:
> On Mon, Jul 25, 2016 at 09:22:27PM +0200, Nicholas Mc Guire wrote:
> > Private functions in ks_hostif.c can be declared static. 
> > 
> > Fixes: 13a9930d15b4 ("staging: ks7010: add driver from Nanonote 
> > extra-repository")
> > 
> > Signed-off-by: Nicholas Mc Guire 
> 
> Reviewed-by: Wolfram Sang 
> 
> drivers/staging/ks7010/ks7010_sdio.c and
> drivers/staging/ks7010/ks_wlan_net.c have similar warnings in case you'd
> like to fix those, too.
> 
the cases found regarding completion were:
./drivers/staging/ks7010/ks_hostif.c:80 treating signal case as success
./drivers/staging/ks7010/ks_wlan_net.c:109 treating signal case as success
./drivers/staging/ks7010/ks7010_sdio.c:901 treating signal case as success
./drivers/staging/ks7010/ks7010_sdio.c:929 treating signal case as success
./drivers/video/fbdev/exynos/exynos_mipi_dsi_common.c:383 treating signal case 
as success
./drivers/video/fbdev/exynos/exynos_mipi_dsi_common.c:247 treating signal case 
as success

will be going through all of them in the next days. 

thx!
hofrat

Re: [PATCH] ceph: Correctly return NXIO errors from ceph_llseek.

2016-07-25 Thread Yan, Zheng


> On Jul 22, 2016, at 01:43, Phil Turnbull  wrote:
> 
> ceph_llseek does not correctly return NXIO errors because the 'out' path
> always returns 'offset'.
> 
> Fixes: 06222e491e66 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's 
> that define their own llseek")
> Signed-off-by: Phil Turnbull 
> ---
> fs/ceph/file.c | 12 +---
> 1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index ce2f5795e44b..13adb5b2ef29 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1448,16 +1448,14 @@ static loff_t ceph_llseek(struct file *file, loff_t 
> offset, int whence)
> {
>   struct inode *inode = file->f_mapping->host;
>   loff_t i_size;
> - int ret;
> + loff_t ret;
> 
>   inode_lock(inode);
> 
>   if (whence == SEEK_END || whence == SEEK_DATA || whence == SEEK_HOLE) {
>   ret = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
> - if (ret < 0) {
> - offset = ret;
> + if (ret < 0)
>   goto out;
> - }
>   }
> 
>   i_size = i_size_read(inode);
> @@ -1473,7 +1471,7 @@ static loff_t ceph_llseek(struct file *file, loff_t 
> offset, int whence)
>* write() or lseek() might have altered it
>*/
>   if (offset == 0) {
> - offset = file->f_pos;
> + ret = file->f_pos;
>   goto out;
>   }
>   offset += file->f_pos;
> @@ -1493,11 +1491,11 @@ static loff_t ceph_llseek(struct file *file, loff_t 
> offset, int whence)
>   break;
>   }
> 
> - offset = vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
> + ret = vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
> 
> out:
>   inode_unlock(inode);
> - return offset;
> + return ret;
> }
> 
> static inline void ceph_zero_partial_page(

applied, thanks

Yan, Zheng

> -- 
> 2.9.0.rc2
>

Re: [GIT PULL] perf changes for v4.8

2016-07-25 Thread Ingo Molnar


* Stephen Rothwell  wrote:

> > > That is why I sent this without mentioning the conflict. Is there any other
> > > complication that I missed?
> > 
> > Actually, the perf tree on its own was enough to trigger the build problem,
> > the luto-next tree was just what initially triggered the build failure in
> > linux-next (I guess there is some missing dependency). After the build failed,
> > I started including the perf tree directly before the tip tree and the build
> > would fail when I merged that ...
> 
> Now that this is fixed and merged into the tip tree, I have removed the perf
> tree from linux-next.

Ok, thanks - and sorry about this - I'll get the tooling fixes to Linus ASAP.

Thanks,

Ingo

Re: [PATCH 1/1 linux-next] kbuild: add make force=1 for testing

2016-07-25 Thread Robert Jarzmik

Andrew Morton  writes:

> On Sun, 24 Jul 2016 15:28:18 +0200 Fabian Frederick  wrote:
>
>> Commit 51193b76bfff
>> ("kbuild: forbid kernel directory to contain spaces and colons")
>> 
>> makes it impossible to build kernel on default SD labels like
>> "SD Card" for instance.
>> 
>> Makefile:133: *** main directory cannot contain spaces nor colons.  Stop.
>> 
>> User could rename directories but volume name is not always writable.
>> 
>> This patch adds ability to do make force=1 for people
>> not interested in modules_install in this case but only testing.
>> 
>> (Note that other options could go under ifndef force)
>
> That's a bit of a hack on a hack.
>
> 51193b76bfff said:
>
> :When the kernel path contains a space or a colon somewhere in the path
> :name, the modules_install target doesn't work anymore, as the path names
> :are not enclosed in double quotes. It is also supposed that an O= build
> :will suffer from the same weakness as modules_install.
> :
> :Instead of checking and improving kbuild to resist to directories
> :including these characters, error out early to prevent any build if the
> :kernel's main directory contains a space.
>
> What's involved in fixing this properly?  Make the whole kbuild
> system operate correctly when there are spaces/colons in the
> pathname?

I was originally thinking of fixing it with:
http://www.spinics.net/lists/linux-kbuild/msg12036.html

I think this "properly" fixed make modules_install.
Marek then pointed out that there were other cases where it would also break,
such as (but not limited to) O=/my dir/, hence this patch.

I'm not a kbuild expert so I'd like someone else (Marek) to enumerate the
remaining cases not covered by the original patch.

Cheers.

-- 
Robert

Re: [PATCH] staging: ks7010: fix wait_for_completion_interruptible_timeout return handling

2016-07-25 Thread Nicholas Mc Guire

On Mon, Jul 25, 2016 at 10:54:03PM +0200, Wolfram Sang wrote:
> On Mon, Jul 25, 2016 at 09:21:50PM +0200, Nicholas Mc Guire wrote:
> > wait_for_completion_interruptible_timeout returns 0 on timeout and 
> > -ERESTARTSYS if interrupted. The check for 
> > !wait_for_completion_interruptible_timeout() would report an interrupt
> > as timeout. Further, while HZ/50 will work most of the time it could 
> 
> Wouldn't it interpret -ERESTARTSYS as *no timeout*?
>

yup - actually the current code just treats the -ERESTARTSYS 
case as success.
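
To make the three return cases concrete, here is a small userspace sketch. The
classification follows the convention stated in the patch description (0 on
timeout, -ERESTARTSYS if interrupted, a positive value otherwise). The helper
names are hypothetical, and demo_ms_to_ticks() is only a round-up conversion
for illustration, not the kernel's msecs_to_jiffies().

```c
#include <assert.h>

#define DEMO_ERESTARTSYS 512	/* same value as the kernel's ERESTARTSYS */

enum demo_wait_result {
	DEMO_WAIT_DONE,
	DEMO_WAIT_TIMEOUT,
	DEMO_WAIT_INTERRUPTED,
};

/* Distinguish all three outcomes of an interruptible timed wait. */
static enum demo_wait_result demo_classify_wait(long ret)
{
	if (ret == 0)
		return DEMO_WAIT_TIMEOUT;
	if (ret < 0)
		return DEMO_WAIT_INTERRUPTED;
	return DEMO_WAIT_DONE;
}

/* The pattern under discussion: only "!ret" is treated as a failure. */
static int demo_not_check_reports_failure(long ret)
{
	return !ret;
}

/* Convert milliseconds to ticks, rounding up so 20 ms never becomes 0. */
static unsigned long demo_ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999) / 1000;
}
```

With the "!ret" check, a return of -DEMO_ERESTARTSYS is not reported as a
failure, which is the "treated as success" case. Separately, an integer
expression like HZ / 50 evaluates to 0 for any HZ below 50, while the round-up
conversion still yields at least one tick.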
 
> Anyway, the plain !0 comparison for me clearly shows that
> 'interruptible' was more copy&pasted than really planned or supported.
> If it was, it would need to cancel something. Also, 20ms is pretty hard
> to cancel for a user ;) Given all that and the troubles we had with
> 'interruptible' in the I2C subsystem, I'd much vote for dropping
> interruptible here.
> 
> > fail for HZ < 50, so this is switched to msecs_to_jiffies(20).
> 
> Rest looks good, thanks!
> 

thx!
hofrat

Re: [PATCH] clocksource: sun4i: Clear interrupts after stopping timer in probe function

2016-07-25 Thread Chen-Yu Tsai

On Tue, Jul 26, 2016 at 1:49 PM, Maxime Ripard
 wrote:
> On Tue, Jul 26, 2016 at 11:01:59AM +0800, Chen-Yu Tsai wrote:
>> The bootloader (U-boot) sometimes uses this timer for various delays.
>> It uses it as an ongoing counter, and does comparisons on the current
>> counter value. The timer counter is never stopped.
>>
>> In some cases when the user interacts with the bootloader, or lets
>> it idle for some time before loading Linux, the timer may expire,
>> and an interrupt will be pending. This results in an unexpected
>> interrupt when the timer interrupt is enabled by the kernel, at
>> which point the event_handler isn't set yet. This results in a NULL
>> pointer dereference exception, panic, and no way to reboot.
>>
>> Clear any pending interrupts after we stop the timer in the probe
>> function to avoid this.
>>
>> Signed-off-by: Chen-Yu Tsai 
>
> Awesome, thanks!
>
> You should put stable in Cc though for this kind of patch.

AFAIK some maintainers prefer to add it themselves. Not sure about
clocksource so I left it out.

ChenYu

[PATCH 1/2] ARM: dts: imx7d: move ARM platform peripherals inside soc node

2016-07-25 Thread Stefan Agner

Since we have a SoC level node we should make use of it and have
all nodes which are within the SoC, inside that node. This also
saves an extra interrupt-parent property. While at it, also
order the Coresight nodes according to register addresses.

Signed-off-by: Stefan Agner 
---
Hi Shawn,

Not sure if there was a reasoning behind having all these nodes
not within the soc subnode, but it seems to me somewhat uncommon
in the i.MX world...

If possible this patchset should go into v4.8 since 2/2 is a fix,
however, I understand that 1/2 is not really post rc1 material...
What do you think?

--
Stefan

 arch/arm/boot/dts/imx7d.dtsi |  32 ++---
 arch/arm/boot/dts/imx7s.dtsi | 301 +--
 2 files changed, 167 insertions(+), 166 deletions(-)

diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi
index 51c13cb..3d77d95 100644
--- a/arch/arm/boot/dts/imx7d.dtsi
+++ b/arch/arm/boot/dts/imx7d.dtsi
@@ -52,23 +52,25 @@
};
};
 
-   etm@3007d000 {
-   compatible = "arm,coresight-etm3x", "arm,primecell";
-   reg = <0x3007d000 0x1000>;
+   soc {
+   etm@3007d000 {
+   compatible = "arm,coresight-etm3x", "arm,primecell";
+   reg = <0x3007d000 0x1000>;
 
-   /*
-* System will hang if added nosmp in kernel command line
-* without arm,primecell-periphid because amba bus try to
-* read id and core1 power off at this time.
-*/
-   arm,primecell-periphid = <0xbb956>;
-   cpu = <&cpu1>;
-   clocks = <&clks IMX7D_MAIN_AXI_ROOT_CLK>;
-   clock-names = "apb_pclk";
+   /*
+* System will hang if added nosmp in kernel command line
+* without arm,primecell-periphid because amba bus try to
+* read id and core1 power off at this time.
+*/
+   arm,primecell-periphid = <0xbb956>;
+   cpu = <&cpu1>;
+   clocks = <&clks IMX7D_MAIN_AXI_ROOT_CLK>;
+   clock-names = "apb_pclk";
 
-   port {
-   etm1_out_port: endpoint {
-   remote-endpoint = <&ca_funnel_in_port1>;
+   port {
+   etm1_out_port: endpoint {
+   remote-endpoint = <&ca_funnel_in_port1>;
+   };
};
};
};
diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
index 1e90bdb..d89587a 100644
--- a/arch/arm/boot/dts/imx7s.dtsi
+++ b/arch/arm/boot/dts/imx7s.dtsi
@@ -95,16 +95,6 @@
};
};
 
-   intc: interrupt-controller@31001000 {
-   compatible = "arm,cortex-a7-gic";
-   #interrupt-cells = <3>;
-   interrupt-controller;
-   reg = <0x31001000 0x1000>,
- <0x31002000 0x1000>,
- <0x31004000 0x2000>,
- <0x31006000 0x2000>;
-   };
-
ckil: clock-cki {
compatible = "fixed-clock";
#clock-cells = <0>;
@@ -119,195 +109,204 @@
clock-output-names = "osc";
};
 
-   timer {
-   compatible = "arm,armv7-timer";
-   interrupts = ,
-,
-,
-;
+   soc {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   compatible = "simple-bus";
interrupt-parent = <&intc>;
-   };
+   ranges;
 
-   etr@30086000 {
-   compatible = "arm,coresight-tmc", "arm,primecell";
-   reg = <0x30086000 0x1000>;
-   clocks = <&clks IMX7D_MAIN_AXI_ROOT_CLK>;
-   clock-names = "apb_pclk";
+   funnel@30041000 {
+   compatible = "arm,coresight-funnel", "arm,primecell";
+   reg = <0x30041000 0x1000>;
+   clocks = <&clks IMX7D_MAIN_AXI_ROOT_CLK>;
+   clock-names = "apb_pclk";
+
+   ca_funnel_ports: ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   /* funnel input ports */
+   port@0 {
+   reg = <0>;
+   ca_funnel_in_port0: endpoint {
+   slave-mode;
+   remote-endpoint = 
<&etm0_out_port>;
+   };
+   };
+
+   /* funnel output

[PATCH 2/2] ARM: dts: imx7d: fix GIC nodes interrupt and register specification

2016-07-25 Thread Stefan Agner

The i.MX 7 has a GICv2, hence its CPU interface register map (the
second register region) is 8kB long. Add the VGIC maintenance
interrupt, which allows the new VGIC driver to be used.

Signed-off-by: Stefan Agner 
Suggested-by: Marc Zyngier 
---
 arch/arm/boot/dts/imx7s.dtsi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
index d89587a..c63591c 100644
--- a/arch/arm/boot/dts/imx7s.dtsi
+++ b/arch/arm/boot/dts/imx7s.dtsi
@@ -292,10 +292,11 @@
 
intc: interrupt-controller@31001000 {
compatible = "arm,cortex-a7-gic";
+   interrupts = ;
#interrupt-cells = <3>;
interrupt-controller;
reg = <0x31001000 0x1000>,
- <0x31002000 0x1000>,
+ <0x31002000 0x2000>,
  <0x31004000 0x2000>,
  <0x31006000 0x2000>;
};
-- 
2.9.0

Re: [GIT PULL] perf changes for v4.8

2016-07-25 Thread Ingo Molnar


* Stephen Rothwell  wrote:

> Hi Linus,
> 
> On Mon, 25 Jul 2016 14:45:53 -0700 Linus Torvalds 
>  wrote:
> >
> > On Mon, Jul 25, 2016 at 2:21 PM, Stephen Rothwell  
> > wrote:
> > >
> > > Actually, the perf tree on its own was enough to trigger the build
> > > problem, the luto-next tree was just what initially triggered the build
> > > failure in linux-next (I guess there is some missing dependency).
> > > After the build failed, I started including the perf tree directly
> > > before the tip tree and the build would fail when I merged that ...  
> > 
> > Ugh. It's merged in my tree now, because I thought it was ok. Can
> > somebody point me to the fix?
> 
> It only affects cross building of the objtool and vdso2c tools (which is
> how I work).  The latest version of the perf/core branch in the tip
> tree now has all the fixes, so I assume that Ingo will send another
> pull request.

Yes, I'll send this ASAP.

> Unfortunately, that means that your tree is broken for me this
> morning ... but I will cope, I guess.

That's weird, I pushed out the fix from Arnaldo yesterday (about 8 hours ago) 
which should merge fine with Linus's tree and make your tooling combination 
work.

Thanks,

Ingo

Re: staging: wilc1000: Reduce scope for a few variables in mac_ioctl()

2016-07-25 Thread SF Markus Elfring

>> -if (strncasecmp(buff, "RSSI", length) == 0) {
>> +if (strncasecmp(buff, "RSSI", 0) == 0) {
>> +s8 rssi;
>> +
> 
> Um, please think a second about if it makes any sense at all to compare 
> zero chars of two strings.

Under which circumstances should the variable "length" contain a value
other than zero?

How can this open issue be fixed better?
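
For reference, a minimal sketch of what a zero-length comparison does. The
helper demo_is_rssi_cmd() is hypothetical and only wraps the POSIX
strncasecmp() call from the quoted hunk so the effect of the length argument
can be seen in isolation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Non-zero when the first n bytes of buf match "RSSI", ignoring case. */
static int demo_is_rssi_cmd(const char *buf, size_t n)
{
	assert(buf != NULL);
	return strncasecmp(buf, "RSSI", n) == 0;
}
```

With n == 4 the helper accepts "rssi" and rejects "SCAN". With n == 0 no
bytes are compared, so every input, including "SCAN" and the empty string,
is reported as a match.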

Regards,
Markus

RE: [PATCH 4.6 143/203] memory: omap-gpmc: Fix omap gpmc EXTRADELAY timing

2016-07-25 Thread SebastienOcquidant






-----Original Message-----
From: Greg Kroah-Hartman [mailto:gre...@linuxfoundation.org] 
Sent: Monday, July 25, 2016 22:56
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman; sta...@vger.kernel.org; Ocquidant, Sebastien; Roger Quadros
Subject: [PATCH 4.6 143/203] memory: omap-gpmc: Fix omap gpmc EXTRADELAY timing

4.6-stable review patch.  If anyone has any objections, please let me know.

--

From: Ocquidant, Sebastien 

commit 8f50b8e57442d28e41bb736c173d8a2490549a82 upstream.

In the omap gpmc driver it can be noticed that GPMC_CONFIG4_OEEXTRADELAY is 
overwritten by the WEEXTRADELAY value from the device tree and 
GPMC_CONFIG4_WEEXTRADELAY is not updated by the value from the device tree.

As a consequence, the memory accesses cannot be configured properly when 
extra delays are needed for OE and WE.

Fix the update of GPMC_CONFIG4_WEEXTRADELAY with the value from the device tree 
file and prevent GPMC_CONFIG4_OEEXTRADELAY from being overwritten by the 
WEEXTRADELAY value from the device tree.

Signed-off-by: Ocquidant, Sebastien 
Signed-off-by: Roger Quadros 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/memory/omap-gpmc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/memory/omap-gpmc.c
+++ b/drivers/memory/omap-gpmc.c
@@ -394,7 +394,7 @@ static void gpmc_cs_bool_timings(int cs,
gpmc_cs_modify_reg(cs, GPMC_CS_CONFIG4,
   GPMC_CONFIG4_OEEXTRADELAY, p->oe_extra_delay);
gpmc_cs_modify_reg(cs, GPMC_CS_CONFIG4,
-  GPMC_CONFIG4_OEEXTRADELAY, p->we_extra_delay);
+  GPMC_CONFIG4_WEEXTRADELAY, p->we_extra_delay);
gpmc_cs_modify_reg(cs, GPMC_CS_CONFIG6,
   GPMC_CONFIG6_CYCLE2CYCLESAMECSEN,
   p->cycle2cyclesamecsen);



Hi Greg,

OK for me
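
For anyone reviewing the one-line change, here is a userspace sketch of the
overwrite described in the commit message. demo_modify_reg() is a simplified
stand-in for the driver's gpmc_cs_modify_reg(), and the bit positions of the
two DEMO_ masks are illustrative assumptions, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the real masks are defined in the driver. */
#define DEMO_OEEXTRADELAY (1u << 7)
#define DEMO_WEEXTRADELAY (1u << 23)

/* Set or clear one flag in a register value. */
static uint32_t demo_modify_reg(uint32_t reg, uint32_t mask, bool value)
{
	assert(mask != 0);
	return value ? (reg | mask) : (reg & ~mask);
}

/* Before the fix: the WE setting is written to the OE bit. */
static uint32_t demo_apply_before(uint32_t reg, bool oe, bool we)
{
	reg = demo_modify_reg(reg, DEMO_OEEXTRADELAY, oe);
	reg = demo_modify_reg(reg, DEMO_OEEXTRADELAY, we);	/* clobbers OE */
	return reg;
}

/* After the fix: each setting goes to its own bit. */
static uint32_t demo_apply_after(uint32_t reg, bool oe, bool we)
{
	reg = demo_modify_reg(reg, DEMO_OEEXTRADELAY, oe);
	reg = demo_modify_reg(reg, DEMO_WEEXTRADELAY, we);
	return reg;
}
```

With oe = true and we = false, the "before" variant ends up with neither bit
set, while the "after" variant keeps the OE bit as requested.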

Sébastien Ocquidant

[PATCH v2] net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()

2016-07-25 Thread Chunhui He

NUD_STALE is used when the caller (e.g. arp_process()) can't guarantee
neighbour reachability. If the entry was NUD_VALID and lladdr is unchanged,
the entry state should not be changed.

Currently the code puts an extra "NUD_CONNECTED" condition. So if old state
was NUD_DELAY or NUD_PROBE (they are NUD_VALID but not NUD_CONNECTED), the
state can be changed to NUD_STALE.

This may cause a problem. Because a NUD_STALE lladdr doesn't guarantee
reachability, when we send traffic, the state will be changed to
NUD_DELAY. In the normal case, if we get no confirmation (by dst_confirm()),
we will change the state to NUD_PROBE and send probe traffic. But now the
state may be reset to NUD_STALE again (e.g. by broadcast ARP packets),
so the probe traffic will not be sent. This situation may happen again and
again, and packets will be sent to a non-reachable lladdr forever.

The fix is to remove the "NUD_CONNECTED" condition. After that the
"NEIGH_UPDATE_F_WEAK_OVERRIDE" condition (used by IPv6) in that branch will
be redundant, so remove it.

This change may increase probe traffic, but it's essential since NUD_STALE
lladdr is unreliable. To ensure correctness, we prefer to resolve lladdr,
when we can't get confirmation, even while remote packets try to set
NUD_STALE state.

Signed-off-by: Chunhui He 
---
v2:
 - change title from "net: neigh: disallow state transition DELAY->STALE in
   neigh_update()"
 - remove "NUD_CONNECTED" condition instead of "NUD_CONNECTED | NUD_DELAY"
 - remove "NEIGH_UPDATE_F_WEAK_OVERRIDE" condition
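
A userspace sketch of the condition before and after this change, for the
branch where the old state is valid and lladdr is supplied. The DEMO_NUD_*
values mirror the NUD_* bits in include/uapi/linux/neighbour.h; the function
names and the boolean parameters are simplifications made up for this
example.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NUD_REACHABLE 0x02
#define DEMO_NUD_STALE     0x04
#define DEMO_NUD_DELAY     0x08
#define DEMO_NUD_PROBE     0x10
#define DEMO_NUD_NOARP     0x40
#define DEMO_NUD_PERMANENT 0x80

#define DEMO_NUD_CONNECTED \
	(DEMO_NUD_PERMANENT | DEMO_NUD_NOARP | DEMO_NUD_REACHABLE)
#define DEMO_NUD_VALID \
	(DEMO_NUD_CONNECTED | DEMO_NUD_PROBE | DEMO_NUD_STALE | DEMO_NUD_DELAY)

/* Old condition: the state is kept only for connected or weak-override. */
static unsigned char demo_state_before(unsigned char old, unsigned char new_state,
				       bool lladdr_unchanged, bool weak_override)
{
	assert(old & DEMO_NUD_VALID);
	if (lladdr_unchanged && new_state == DEMO_NUD_STALE &&
	    (weak_override || (old & DEMO_NUD_CONNECTED)))
		return old;
	return new_state;
}

/* New condition: any valid state is kept when lladdr is unchanged. */
static unsigned char demo_state_after(unsigned char old, unsigned char new_state,
				      bool lladdr_unchanged)
{
	assert(old & DEMO_NUD_VALID);
	if (lladdr_unchanged && new_state == DEMO_NUD_STALE)
		return old;
	return new_state;
}
```

With old = DEMO_NUD_DELAY, an unchanged lladdr and no weak override, the old
condition moves the entry back to STALE, whereas the new one leaves it in
DELAY so the probe can still be sent.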

---
 net/core/neighbour.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 510cd62..ed8c317e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1060,8 +1060,6 @@ static void neigh_update_hhs(struct neighbour *neigh)
NEIGH_UPDATE_F_WEAK_OVERRIDE will suspect existing "connected"
lladdr instead of overriding it
if it is different.
-   It also allows to retain current state
-   if lladdr is unchanged.
NEIGH_UPDATE_F_ADMIN means that the change is administrative.
 
NEIGH_UPDATE_F_OVERRIDE_ISROUTER allows to override existing
@@ -1150,10 +1148,7 @@ int neigh_update(struct neighbour *neigh, const u8 
*lladdr, u8 new,
} else
goto out;
} else {
-   if (lladdr == neigh->ha && new == NUD_STALE &&
-   ((flags & NEIGH_UPDATE_F_WEAK_OVERRIDE) ||
-(old & NUD_CONNECTED))
-   )
+   if (lladdr == neigh->ha && new == NUD_STALE)
new = old;
}
}
-- 
2.1.4

[PATCH 1/4] iio: adc: rockchip_saradc: reset saradc controller before programming it

2016-07-25 Thread Caesar Wang

SARADC controller needs to be reset before programming it, otherwise
it will not function properly.

Signed-off-by: Caesar Wang 
Cc: Jonathan Cameron 
Cc: Heiko Stuebner 
Cc: Rob Herring 
Cc: linux-...@vger.kernel.org
Cc: linux-rockc...@lists.infradead.org
---

 .../bindings/iio/adc/rockchip-saradc.txt   |  5 +
 drivers/iio/adc/Kconfig|  1 +
 drivers/iio/adc/rockchip_saradc.c  | 22 ++
 3 files changed, 28 insertions(+)

diff --git a/Documentation/devicetree/bindings/iio/adc/rockchip-saradc.txt 
b/Documentation/devicetree/bindings/iio/adc/rockchip-saradc.txt
index bf99e2f..d2258be 100644
--- a/Documentation/devicetree/bindings/iio/adc/rockchip-saradc.txt
+++ b/Documentation/devicetree/bindings/iio/adc/rockchip-saradc.txt
@@ -13,6 +13,9 @@ Required properties:
 - clocks: Must contain an entry for each entry in clock-names.
 - clock-names: Shall be "saradc" for the converter-clock, and "apb_pclk" for
the peripheral clock.
+- resets: Must contain an entry for each entry in reset-names.
+ See ../reset/reset.txt for details.
+- reset-names: Must include the name "saradc-apb".
 - vref-supply: The regulator supply ADC reference voltage.
 - #io-channel-cells: Should be 1, see ../iio-bindings.txt
 
@@ -23,6 +26,8 @@ Example:
interrupts = ;
clocks = <&cru SCLK_SARADC>, <&cru PCLK_SARADC>;
clock-names = "saradc", "apb_pclk";
+   resets = <&cru SRST_SARADC>;
+   reset-names = "saradc-apb";
#io-channel-cells = <1>;
vref-supply = <&vcc18>;
};
diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig
index 1de31bd..7675772 100644
--- a/drivers/iio/adc/Kconfig
+++ b/drivers/iio/adc/Kconfig
@@ -389,6 +389,7 @@ config QCOM_SPMI_VADC
 config ROCKCHIP_SARADC
tristate "Rockchip SARADC driver"
depends on ARCH_ROCKCHIP || (ARM && COMPILE_TEST)
+   depends on RESET_CONTROLLER
help
  Say yes here to build support for the SARADC found in SoCs from
  Rockchip.
diff --git a/drivers/iio/adc/rockchip_saradc.c 
b/drivers/iio/adc/rockchip_saradc.c
index f9ad6c2..2f0e110 100644
--- a/drivers/iio/adc/rockchip_saradc.c
+++ b/drivers/iio/adc/rockchip_saradc.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -53,6 +55,7 @@ struct rockchip_saradc {
struct clk  *clk;
struct completion   completion;
struct regulator*vref;
+   struct reset_control*reset;
const struct rockchip_saradc_data *data;
u16 last_val;
 };
@@ -190,6 +193,16 @@ static const struct of_device_id rockchip_saradc_match[] = 
{
 };
 MODULE_DEVICE_TABLE(of, rockchip_saradc_match);
 
+/**
+ * Reset SARADC Controller.
+ */
+static void rockchip_saradc_reset_controller(struct reset_control *reset)
+{
+   reset_control_assert(reset);
+   usleep_range(10, 20);
+   reset_control_deassert(reset);
+}
+
 static int rockchip_saradc_probe(struct platform_device *pdev)
 {
struct rockchip_saradc *info = NULL;
@@ -218,6 +231,13 @@ static int rockchip_saradc_probe(struct platform_device 
*pdev)
if (IS_ERR(info->regs))
return PTR_ERR(info->regs);
 
+   info->reset = devm_reset_control_get(&pdev->dev, "saradc-apb");
+   if (IS_ERR(info->reset)) {
+   ret = PTR_ERR(info->reset);
+   dev_err(&pdev->dev, "failed to get saradc reset: %d\n", ret);
+   return ret;
+   }
+
init_completion(&info->completion);
 
irq = platform_get_irq(pdev, 0);
@@ -252,6 +272,8 @@ static int rockchip_saradc_probe(struct platform_device 
*pdev)
return PTR_ERR(info->vref);
}
 
+   rockchip_saradc_reset_controller(info->reset);
+
/*
 * Use a default value for the converter clock.
 * This may become user-configurable in the future.
-- 
1.9.1

[PATCH v10 2/7] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.

2016-07-25 Thread Dou Liyang

From: Gu Zheng 

[Problem]

cpuid <-> nodeid mapping is first established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU
-------+----------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
-------+----------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
-------+----------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
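
The stale mapping can be reproduced with a small userspace model. This is only
a sketch under simplifying assumptions: the mapping is a plain array indexed by
cpuid, the CPU ranges are the ones from the tables above, and all demo_ names
are made up for the example.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NR_CPUS 120

/* Assign the inclusive CPU range [first, last] to a node. */
static void demo_map_range(int *cpu_to_node, int first, int last, int node)
{
	assert(0 <= first && first <= last && last < DEMO_NR_CPUS);
	for (int cpu = first; cpu <= last; cpu++)
		cpu_to_node[cpu] = node;
}

/* Boot-time layout from the first table. */
static void demo_boot_layout(int *cpu_to_node)
{
	demo_map_range(cpu_to_node, 0, 14, 0);
	demo_map_range(cpu_to_node, 60, 74, 0);
	demo_map_range(cpu_to_node, 15, 29, 1);
	demo_map_range(cpu_to_node, 75, 89, 1);
	demo_map_range(cpu_to_node, 30, 44, 2);
	demo_map_range(cpu_to_node, 90, 104, 2);
	demo_map_range(cpu_to_node, 45, 59, 3);
	demo_map_range(cpu_to_node, 105, 119, 3);
}

/* Layout after hot-removing node2/node3 and hot-adding node4/node5. */
static void demo_hotplug_layout(int *cpu_to_node)
{
	demo_map_range(cpu_to_node, 30, 59, 4);
	demo_map_range(cpu_to_node, 90, 119, 5);
}

/* One-time snapshot, the way a cache filled at boot would be taken. */
static void demo_snapshot(int *cache, const int *live)
{
	memcpy(cache, live, sizeof(int) * DEMO_NR_CPUS);
}
```

Taking the snapshot after demo_boot_layout() and then applying
demo_hotplug_layout() to the live array leaves the snapshot pointing cpu30 at
node 2 while the live mapping says node 4.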

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
   wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}

Since wq_numa_possible_cpumask is not updated, the pool could be mapped to an
offline node, which leads to a memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
struct worker *worker;

worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

..

return worker;
}
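
The stale-cache effect described above can be reproduced in a small user-space
model. This is only a sketch under stated assumptions: the arrays and helper
names (cpu_to_node_live, cpu_to_node_cached, replay_hotplug, and so on) are
hypothetical stand-ins, not kernel symbols, and the cached array merely plays
the role of wq_numa_possible_cpumask.

```c
#include <stdbool.h>

#define NR_CPUS   120
#define NR_NODES  6
#define NO_NODE   (-1)

/* Live cpuid -> nodeid mapping, updated on every (simulated) hotplug event. */
static int cpu_to_node_live[NR_CPUS];
/* Snapshot taken once at "boot" and never refreshed afterwards. */
static int cpu_to_node_cached[NR_CPUS];
static bool node_online[NR_NODES];

static void map_range(int first, int last, int node)
{
    for (int cpu = first; cpu <= last; cpu++)
        cpu_to_node_live[cpu] = node;
}

/* Set up the initial layout from the example and take the snapshot. */
static void boot(void)
{
    map_range(0, 14, 0);   map_range(60, 74, 0);
    map_range(15, 29, 1);  map_range(75, 89, 1);
    map_range(30, 44, 2);  map_range(90, 104, 2);
    map_range(45, 59, 3);  map_range(105, 119, 3);

    for (int node = 0; node < NR_NODES; node++)
        node_online[node] = node < 4;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_to_node_cached[cpu] = cpu_to_node_live[cpu];
}

static void hot_remove_node(int node)
{
    node_online[node] = false;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_to_node_live[cpu] == node)
            cpu_to_node_live[cpu] = NO_NODE;
}

static void hot_add_node(int node, int first, int last)
{
    node_online[node] = true;
    map_range(first, last, node);
}

/* Replay the sequence from the example: remove node2/node3, add node4/node5. */
static void replay_hotplug(void)
{
    boot();
    hot_remove_node(2);
    hot_remove_node(3);
    hot_add_node(4, 30, 59);
    hot_add_node(5, 90, 119);
}

/* True if the snapshot points at an offline node or disagrees with the live map. */
static bool cached_node_is_stale(int cpu)
{
    int node = cpu_to_node_cached[cpu];

    return !node_online[node] || node != cpu_to_node_live[cpu];
}
```

After replay_hotplug(), cached_node_is_stale(30) is true: the snapshot still
says node 2, which is offline, while the live map says node 4. An allocation
directed at the cached node is the analogue of the kzalloc_node() failure above.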

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the
   nodeid <-> pxm mapping is set up at boot time. This mapping is persistent
   and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. It is set up
   at boot time and CPU hotadd time, and cleared at CPU hotremove time. This
   mapping is also persistent.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time.
   A cpuid is allocated lowest id first, released at CPU hotremove time, and
   reused for other hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd
   time, and cleared at CPU hotremove time. As a result of 3, this mapping is
   not persistent.

To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it persistent. According to init_cpu_to_node(),
the cpuid <-> nodeid mapping is based on the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.
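
To illustrate why the apicids are the key, here is a minimal sketch of the
composition: cpuid -> apicid followed by apicid -> nodeid yields cpuid ->
nodeid. The table names, sizes and sentinel values are assumptions made for
this example only; they are not the kernel's data structures.

```c
#define NR_CPUS      8
#define MAX_APICID   16
#define NODE_NONE    (-1)
#define APICID_NONE  (-1)

/* Hypothetical tables standing in for the two underlying mappings. */
static int cpuid_to_apicid_tbl[NR_CPUS];
static int apicid_to_node_tbl[MAX_APICID];

/* Mark every entry as "unknown" before any mapping is recorded. */
static void tables_init(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpuid_to_apicid_tbl[cpu] = APICID_NONE;
    for (int apicid = 0; apicid < MAX_APICID; apicid++)
        apicid_to_node_tbl[apicid] = NODE_NONE;
}

/*
 * cpuid -> nodeid is derived, not stored: it is only as stable as the
 * two tables it is composed from.
 */
static int cpuid_to_node(int cpu)
{
    int apicid;

    if (cpu < 0 || cpu >= NR_CPUS)
        return NODE_NONE;

    apicid = cpuid_to_apicid_tbl[cpu];
    if (apicid < 0 || apicid >= MAX_APICID)
        return NODE_NONE;

    return apicid_to_node_tbl[apicid];
}
```

If cpuid_to_apicid_tbl[] is filled for every possible cpu at boot and never
reshuffled, the derived cpuid -> nodeid result stays fixed as well, which is
the property the series is after.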

An apicid can be obtained by the _MAT (Multiple APIC Table Entry) method or
found in the MADT (Multiple APIC Description Table). So we finish the job in
the following steps:

1. Enable the apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mappings, and
   also modify the way cpuid is calculated. Establish all possible cpuid <->
   apicid mappings when registering the local apic, and store them in this
   array.

3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicids. This is also done by introducing an extra parameter to these
   APIs to let the caller control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mappings.
   This is done via an additional acpi namespace walk for processors.

This patch finishes step 1.

Signed-off-by: Gu Zheng 
Signed-off-by: Tang Chen 
Signed-off-by: Zhu Guihua 
Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/apic/apic.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 60078a6..8e3c377 100644
--- a/arch/x86/kernel/apic/api

[PATCH resent] w1:omap_hdq: fix regression

2016-07-25 Thread H. Nikolaus Schaller

commit  ("w1: masters: omap_hdq: add support for 1-wire mode")
added a statement to clear the hdq_irqstatus flags in hdq_read_byte().

If the hdq reading process is scheduled slowly or interrupts are disabled
for a while the hardware read activity might already be finished on entry
of hdq_read_byte(). And hdq_isr() already has set the hdq_irqstatus to
0x6 (can be seen in debug mode), denoting that both the TXCOMPLETE
and RXCOMPLETE interrupts have already occurred.

This means there is no need to wait and the hdq_read_byte() can just read
the byte from the hdq controller.

By resetting hdq_irqstatus to 0, the read process is always forced to wait
again (because the if statement always succeeds), but the hardware will not
issue another RXCOMPLETE interrupt. This results in a false timeout.

After such a situation the hdq bus hangs.
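
The failure mode can be modelled outside the driver. The sketch below is an
assumption-laden simplification: struct fake_hdq, read_byte() and the bit
macros are hypothetical names for this example, the bit values are chosen only
so that both flags together give the 0x6 mentioned above, and the wait for the
interrupt is reduced to an immediate timeout because the hardware has already
finished.

```c
#include <stdbool.h>

/* Stand-in flag bits; together they form the 0x6 status from the text. */
#define INT_RXCOMPLETE 0x2u
#define INT_TXCOMPLETE 0x4u

struct fake_hdq {
    unsigned int irqstatus;   /* written by the (simulated) interrupt handler */
    unsigned char rx_reg;     /* byte the hardware has already received */
};

/*
 * Returns 0 and stores the byte on success, -1 on a (simulated) timeout.
 * clear_status_first selects the behaviour before the fix (true) or after
 * the fix (false).
 */
static int read_byte(struct fake_hdq *hdq, bool clear_status_first,
                     unsigned char *val)
{
    if (clear_status_first)
        hdq->irqstatus = 0;   /* the statement removed by the patch */

    if (!(hdq->irqstatus & INT_RXCOMPLETE)) {
        /*
         * The real driver would start a read and wait for the interrupt.
         * In this model the transfer is already complete, so no further
         * RXCOMPLETE arrives and the wait can only time out.
         */
        return -1;
    }

    *val = hdq->rx_reg;
    return 0;
}
```

With irqstatus preset to 0x6, read_byte(&hdq, false, &val) returns the byte,
while read_byte(&hdq, true, &val) throws the completion away and reports a
timeout.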

Signed-off-by: H. Nikolaus Schaller 
Cc: sta...@vger.kernel.org
---
 drivers/w1/masters/omap_hdq.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/w1/masters/omap_hdq.c b/drivers/w1/masters/omap_hdq.c
index a2eec97..bb09de6 100644
--- a/drivers/w1/masters/omap_hdq.c
+++ b/drivers/w1/masters/omap_hdq.c
@@ -390,8 +390,6 @@ static int hdq_read_byte(struct hdq_data *hdq_data, u8 *val)
goto out;
}
 
-   hdq_data->hdq_irqstatus = 0;
-
if (!(hdq_data->hdq_irqstatus & OMAP_HDQ_INT_STATUS_RXCOMPLETE)) {
hdq_reg_merge(hdq_data, OMAP_HDQ_CTRL_STATUS,
OMAP_HDQ_CTRL_STATUS_DIR | OMAP_HDQ_CTRL_STATUS_GO,
-- 
2.7.3

[PATCH v10 4/7] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.

2016-07-25 Thread Dou Liyang

From: Gu Zheng 

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicids.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 3.

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm    (persistent)
2. apicid (physical cpu id)   <->   nodeid (persistent)
3. cpuid (logical cpu id)     <->   apicid (not persistent, now persistent by step 2)
4. cpuid (logical cpu id)     <->   nodeid (not persistent)

So, in order to set up a persistent cpuid <-> nodeid mapping for all possible
CPUs, we should:
1. Set up the cpuid <-> apicid mapping for all possible CPUs, which has been
   done in steps 1 and 2.
2. Set up the cpuid <-> nodeid mapping for all possible CPUs. But before that,
   we should obtain all apicids from the MADT.

All processors' apicids can be obtained by the _MAT method or from the MADT in
ACPI. The current code ignores disabled processors and returns -ENODEV.

After this patch, a new parameter will be added to MADT APIs so that caller
is able to control if disabled processors are ignored.
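
As a rough illustration of what such a parameter changes, here is a
user-space sketch of a table lookup with an ignore_disabled flag. The struct
layout, the ENTRY_ENABLED bit and lookup_apic_id() are hypothetical and only
mirror the shape of the idea; they are not the ACPI structures or the
functions touched by this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define ENTRY_ENABLED 0x1u   /* stand-in for an "enabled" flag bit */

/* Simplified, hypothetical table entry. */
struct lapic_entry {
    unsigned int processor_id;
    unsigned int apic_id;
    unsigned int flags;
};

/*
 * Look up the apic id for acpi_id. When ignore_disabled is true, entries
 * without ENTRY_ENABLED are skipped, so a disabled processor yields -ENODEV.
 * When it is false, disabled entries are matched as well.
 */
static int lookup_apic_id(const struct lapic_entry *tbl, size_t n,
                          unsigned int acpi_id, bool ignore_disabled,
                          unsigned int *apic_id)
{
    for (size_t i = 0; i < n; i++) {
        if (ignore_disabled && !(tbl[i].flags & ENTRY_ENABLED))
            continue;
        if (tbl[i].processor_id != acpi_id)
            continue;

        *apic_id = tbl[i].apic_id;
        return 0;
    }

    return -ENODEV;
}
```

Callers that only care about running processors keep passing true and see the
old behaviour; a caller that builds the mapping for all possible CPUs passes
false to also receive the apicids of disabled entries.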

Signed-off-by: Gu Zheng 
Signed-off-by: Tang Chen 
Signed-off-by: Zhu Guihua 
Signed-off-by: Dou Liyang 
---
 drivers/acpi/acpi_processor.c |  5 +++-
 drivers/acpi/processor_core.c | 57 +++
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c7ba948..e85b19a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
 *  Extra Processor objects may be enumerated on MP systems with
 *  less than the max # of CPUs. They should be ignored _iff
 *  they are physically not present.
+*
+*  NOTE: Even if the processor has a cpuid, it may not be present because
+*  cpuid <-> apicid mapping is persistent now.
 */
-   if (invalid_logical_cpuid(pr->id)) {
+   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
 }
 
 static int map_lapic_id(struct acpi_subtable_header *entry,
-u32 acpi_id, phys_cpuid_t *apic_id)
+u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
 {
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
 
-   if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_x2apic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
 
-   if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
 }
 
 static int map_lsapic_id(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+   int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+   bool ignore_disabled)
 {
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
 
-   if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+   if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
 
if (device_declaration) {
@@ -87,12 +89,13 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
  * Retrieve the ARM CPU physical identifier (MPIDR)
  */
 static int map_gicc_mpidr(struct acpi_subtable_header *entry,
-   int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
+

[PATCH 3/4] arm64: dts: rockchip: add reset saradc node for rk3368 SoCs

2016-07-25 Thread Caesar Wang

SARADC controller needs to be reset before programming it, otherwise
it will not function properly.

Signed-off-by: Caesar Wang 
---

 arch/arm64/boot/dts/rockchip/rk3368.dtsi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3368.dtsi b/arch/arm64/boot/dts/rockchip/rk3368.dtsi
index d02a9003..4f44d11 100644
--- a/arch/arm64/boot/dts/rockchip/rk3368.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3368.dtsi
@@ -270,6 +270,8 @@
#io-channel-cells = <1>;
clocks = <&cru SCLK_SARADC>, <&cru PCLK_SARADC>;
clock-names = "saradc", "apb_pclk";
+   resets = <&cru SRST_SARADC>;
+   reset-names = "saradc-apb";
status = "disabled";
};
 
-- 
1.9.1

[PATCH v10 6/7] acpi: Provide the mechanism to validate processors in the ACPI tables

2016-07-25 Thread Dou Liyang

[Problem]

When we set the cpuid <-> nodeid mapping to be persistent, it will use the DSDT.
As we know, the ACPI tables are just like user input in that respect, and
we should not crash if that input is unreasonable.

For example, the mapping of proc_id to pxm in some machines' ACPI tables looks
like this:

proc_id   |pxm

0   <-> 0
1   <-> 0
2   <-> 1
3   <-> 1
89  <-> 0
89  <-> 0
89  <-> 0
89  <-> 1
89  <-> 1
89  <-> 2
89  <-> 3
.

We can't be sure which entry is correct for proc_id 89, so we may map the wrong
node to a cpu. When pages are allocated, this may cause a kernel panic.

So, we should provide a mechanism to validate the ACPI tables, just like we
validate user input in a web project.

The rule is that processor objects which have duplicate IDs are not valid.

[Solution]

We add a validation function, like this:

foreach Processor in DSDT
    proc_id = get_ACPI_Processor_number(Processor)
    if (the proc_id already exists)
        mark both of them as being unreasonable;

The function will record the unique or duplicate processor IDs.

Duplicate processor IDs such as 89 are regarded as unreasonable IDs, which
means that the processor objects in question are not valid.
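
The bookkeeping can be tried in user space with the proc_id values from the
table above. This is a sketch only: MAX_IDS and the names used here are
assumptions for the example, and the namespace walk is replaced by the caller
feeding IDs one at a time.

```c
#define MAX_IDS 16   /* hypothetical capacity, playing the role of NR_CPUS */

static int unique_ids[MAX_IDS];
static int duplicate_ids[MAX_IDS];
static int nr_unique;
static int nr_duplicate;

/* Record one proc_id as seen during the (simulated) walk. */
static void record_proc_id(int proc_id)
{
    int i;

    /* Stop recording once either list is full. */
    if (nr_unique == MAX_IDS || nr_duplicate == MAX_IDS)
        return;

    /* Already known to be a duplicate: nothing more to do. */
    for (i = 0; i < nr_duplicate; i++) {
        if (duplicate_ids[i] == proc_id)
            return;
    }

    /* Seen exactly once before: it now becomes a duplicate. */
    for (i = 0; i < nr_unique; i++) {
        if (unique_ids[i] == proc_id) {
            duplicate_ids[nr_duplicate++] = proc_id;
            return;
        }
    }

    /* First sighting. */
    unique_ids[nr_unique++] = proc_id;
}
```

Feeding 0, 1, 2, 3 and then 89 seven times leaves five entries in
unique_ids[] (the first 89 stays there) and a single entry, 89, in
duplicate_ids[]. Only the duplicate list needs to be consulted later.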

Signed-off-by: Dou Liyang 
---
 drivers/acpi/acpi_processor.c | 79 +++
 1 file changed, 79 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 0c15828..346fbfc 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -581,8 +581,87 @@ static struct acpi_scan_handler processor_container_handler = {
.attach = acpi_processor_container_attach,
 };
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+static void processor_validated_ids_update(int proc_id)
+{
+   int i;
+
+   if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
+   return;
+
+   /*
+* Firstly, compare the proc_id with duplicate IDs, if the proc_id is
+* already in the IDs, do nothing.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return;
+   }
+
+   /*
+* Secondly, compare the proc_id with unique IDs, if the proc_id is in
+* the IDs, put it in the duplicate IDs.
+*/
+   for (i = 0; i < nr_unique_ids; i++) {
+   if (unique_processor_ids[i] == proc_id) {
+   duplicate_processor_ids[nr_duplicate_ids] = proc_id;
+   nr_duplicate_ids++;
+   return;
+   }
+   }
+
+   /*
+* Lastly, the proc_id is a unique ID, put it in the unique IDs.
+*/
+   unique_processor_ids[nr_unique_ids] = proc_id;
+   nr_unique_ids++;
+}
+
+static acpi_status acpi_processor_ids_walk(acpi_handle handle,
+   u32 lvl,
+   void *context,
+   void **rv)
+{
+   acpi_status status;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   acpi_handle_info(handle, "Not get the processor object\n");
+   else
+   processor_validated_ids_update(object.processor.proc_id);
+
+   return AE_OK;
+}
+
+static void acpi_processor_duplication_valiate(void)
+{
+   /* Search all processor nodes in ACPI namespace */
+   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+   ACPI_UINT32_MAX,
+   acpi_processor_ids_walk,
+   NULL, NULL, NULL);
+}
+
 void __init acpi_processor_init(void)
 {
+   acpi_processor_duplication_valiate();
acpi_scan_add_handler_with_hotplug(&processor_handler, "processor");
acpi_scan_add_handler(&processor_container_handler);
 }
-- 
2.5.5

[PATCH 2/4] arm64: dts: rockchip: add the saradc for rk3399

2016-07-25 Thread Caesar Wang

This patch adds saradc needed information on rk3399 SoCs.

Signed-off-by: Caesar Wang 
---

 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 4c84229..b81f84b 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -299,6 +299,18 @@
};
};
 
+   saradc: saradc@ff10 {
+   compatible = "rockchip,rk3399-saradc";
+   reg = <0x0 0xff10 0x0 0x100>;
+   interrupts = ;
+   #io-channel-cells = <1>;
+   clocks = <&cru SCLK_SARADC>, <&cru PCLK_SARADC>;
+   clock-names = "saradc", "apb_pclk";
+   resets = <&cru SRST_P_SARADC>;
+   reset-names = "saradc-apb";
+   status = "disabled";
+   };
+
i2c1: i2c@ff11 {
compatible = "rockchip,rk3399-i2c";
reg = <0x0 0xff11 0x0 0x1000>;
-- 
1.9.1

[PATCH v10 5/7] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.

2016-07-25 Thread Dou Liyang

From: Gu Zheng 

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicids.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 4.

This patch sets the persistent cpuid <-> nodeid mapping for all enabled/disabled
processors at boot time via an additional acpi namespace walk for processors.

Signed-off-by: Gu Zheng 
Signed-off-by: Tang Chen 
Signed-off-by: Zhu Guihua 
Signed-off-by: Dou Liyang 
---
 arch/ia64/kernel/acpi.c   |  3 +-
 arch/x86/kernel/acpi/boot.c   |  4 ++-
 drivers/acpi/acpi_processor.c |  5 
 drivers/acpi/bus.c|  1 +
 drivers/acpi/processor_core.c | 67 +++
 include/linux/acpi.h  |  3 ++
 6 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..bb36515 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
  *  ACPI based hotplug CPU support
  */
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
/*
@@ -811,6 +811,7 @@ static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 #endif
return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int additional_cpus __initdata = -1;
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 37248c3..0900264f 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 #include 
 
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
 {
 #ifdef CONFIG_ACPI_NUMA
int nid;
@@ -706,7 +706,9 @@ static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
numa_set_node(cpu, nid);
}
 #endif
+   return 0;
 }
+EXPORT_SYMBOL(acpi_map_cpu2node);
 
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu)
 {
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index e85b19a..0c15828 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -182,6 +182,11 @@ int __weak arch_register_cpu(int cpu)
 
 void __weak arch_unregister_cpu(int cpu) {}
 
+int __weak acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+{
+   return -ENODEV;
+}
+
 static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 {
unsigned long long sta;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 262ca31..0fe5f54 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1124,6 +1124,7 @@ static int __init acpi_init(void)
acpi_sleep_proc_init();
acpi_wakeup_device_init();
acpi_debugger_init();
+   acpi_set_processor_mapping();
return 0;
 }
 
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 824b98b..e814cd4 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -261,6 +261,73 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 }
 EXPORT_SYMBOL_GPL(acpi_get_cpuid);
 
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
+{
+   int type;
+   u32 acpi_id;
+   acpi_status status;
+   acpi_object_type acpi_type;
+   unsigned long long tmp;
+   union acpi_object object = { 0 };
+   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+   status = acpi_get_type(handle, &acpi_type);
+   if (ACPI_FAILURE(status))
+   return false;
+
+   switch (acpi_type) {
+   case ACPI_TYPE_PROCESSOR:
+   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = object.processor.proc_id;
+   break;
+   case ACPI_TYPE_DEVICE:
+   status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+   if (ACPI_FAILURE(status))
+   return false;
+   acpi_id = tmp;
+   break;
+   default:
+   return false;
+   }
+
+   type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
+
+   *phys_id = __acpi_get_phys_id(handle, type, acpi_id, false);
+   *cpuid = acpi_map_cpuid(*phys_id, acpi_id);
+   if

[PATCH 4/4] arm: dts: rockchip: add reset node for the exist saradc SoCs

2016-07-25 Thread Caesar Wang

SARADC controller needs to be reset before programming it, otherwise
it will not function properly.

Signed-off-by: Caesar Wang 
---

 arch/arm/boot/dts/rk3066a.dtsi | 2 ++
 arch/arm/boot/dts/rk3288.dtsi  | 2 ++
 arch/arm/boot/dts/rk3xxx.dtsi  | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/arm/boot/dts/rk3066a.dtsi b/arch/arm/boot/dts/rk3066a.dtsi
index c0ba86c..0d0dae3 100644
--- a/arch/arm/boot/dts/rk3066a.dtsi
+++ b/arch/arm/boot/dts/rk3066a.dtsi
@@ -197,6 +197,8 @@
clock-names = "saradc", "apb_pclk";
interrupts = ;
#io-channel-cells = <1>;
+   resets = <&cru SRST_SARADC>;
+   reset-names = "saradc-apb";
status = "disabled";
};
 
diff --git a/arch/arm/boot/dts/rk3288.dtsi b/arch/arm/boot/dts/rk3288.dtsi
index cd33f01..91c4b3c 100644
--- a/arch/arm/boot/dts/rk3288.dtsi
+++ b/arch/arm/boot/dts/rk3288.dtsi
@@ -279,6 +279,8 @@
#io-channel-cells = <1>;
clocks = <&cru SCLK_SARADC>, <&cru PCLK_SARADC>;
clock-names = "saradc", "apb_pclk";
+   resets = <&cru SRST_SARADC>;
+   reset-names = "saradc-apb";
status = "disabled";
};
 
diff --git a/arch/arm/boot/dts/rk3xxx.dtsi b/arch/arm/boot/dts/rk3xxx.dtsi
index 99bbcc2..e2cd683 100644
--- a/arch/arm/boot/dts/rk3xxx.dtsi
+++ b/arch/arm/boot/dts/rk3xxx.dtsi
@@ -399,6 +399,8 @@
#io-channel-cells = <1>;
clocks = <&cru SCLK_SARADC>, <&cru PCLK_SARADC>;
clock-names = "saradc", "apb_pclk";
+   resets = <&cru SRST_SARADC>;
+   reset-names = "saradc-apb";
status = "disabled";
};
 
-- 
1.9.1

[PATCH v10 7/7] acpi: Provide the interface to validate the proc_id

2016-07-25 Thread Dou Liyang

When we want to identify whether a proc_id is unreasonable or not, we
can call the "acpi_processor_validate_proc_id" function. It searches
the duplicate IDs. If the proc_id is found there, it returns true to
the caller; conversely, false means the proc_id is usable.

When we establish all possible cpuid <-> nodeid mappings to handle
cpu hotplug, we use the proc_id from the ACPI table.

We validate the proc_id when we get it. If the result is true, we
stop the mapping.
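
The caller-side convention (true means "duplicate, do not map") is easy to get
backwards, so here is a small sketch of it. Everything in it is hypothetical:
the fixed duplicate list, validate_proc_id() and count_mappable() exist only
for this example and are not the kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the duplicate-ID list that would be built at init time. */
static const int duplicate_ids[] = { 89 };

/* Returns true when proc_id is a known duplicate, i.e. must NOT be used. */
static bool validate_proc_id(int proc_id)
{
    size_t n = sizeof(duplicate_ids) / sizeof(duplicate_ids[0]);

    for (size_t i = 0; i < n; i++) {
        if (duplicate_ids[i] == proc_id)
            return true;
    }

    return false;
}

/* Count how many of the given IDs a caller would go on to map. */
static size_t count_mappable(const int *ids, size_t n)
{
    size_t mapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (validate_proc_id(ids[i]))
            continue;   /* duplicate: skip the mapping for this object */
        mapped++;
    }

    return mapped;
}
```

For the IDs 0, 1, 2, 3, 89, 89 only the first four would be mapped; both
objects claiming 89 are skipped.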

Signed-off-by: Dou Liyang 
---
 drivers/acpi/acpi_processor.c | 16 
 drivers/acpi/processor_core.c |  4 
 include/linux/acpi.h  |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 346fbfc..ae6dae9 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -659,6 +659,22 @@ static void acpi_processor_duplication_valiate(void)
NULL, NULL, NULL);
 }
 
+bool acpi_processor_validate_proc_id(int proc_id)
+{
+   int i;
+
+   /*
+* compare the proc_id with duplicate IDs, if the proc_id is already
+* in the duplicate IDs, return true, otherwise, return false.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return true;
+   }
+
+   return false;
+}
+
 void __init acpi_processor_init(void)
 {
acpi_processor_duplication_valiate();
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index e814cd4..830c7ac 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -282,6 +282,10 @@ static bool map_processor(acpi_handle handle, phys_cpuid_t *phys_id, int *cpuid)
if (ACPI_FAILURE(status))
return false;
acpi_id = object.processor.proc_id;
+
+   /* validate the acpi_id */
+   if(acpi_processor_validate_proc_id(acpi_id))
+   return false;
break;
case ACPI_TYPE_DEVICE:
status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 30df63c..11bc794 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -254,6 +254,9 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
return phys_id == PHYS_CPUID_INVALID;
 }
 
+/* Validate the processor object's proc_id */
+bool acpi_processor_validate_proc_id(int proc_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
-- 
2.5.5

[PATCH v10 0/7] Make cpuid <-> nodeid mapping persistent

2016-07-25 Thread Dou Liyang

[Problem]

The cpuid <-> nodeid mapping is first established at boot time, and workqueue
caches it in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When a node goes online or offline, the cpuid <-> nodeid mapping is
established or destroyed, so the mapping changes whenever node hotplug
happens. But workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and likewise
for the other CPUs that moved.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
   wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}

Since wq_numa_possible_cpumask is not updated, the pool could be mapped to an
offline node, which leads to a memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
struct worker *worker;

worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.

..

return worker;
}


[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id) <->   apicid
4. cpuid (logical cpu id) <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the
   nodeid <-> pxm mapping is set up at boot time. This mapping is persistent
   and won't change.

2. The apicid <-> nodeid mapping is set up using the info in 1. It is set up
   at boot time and CPU hotadd time, and cleared at CPU hotremove time. This
   mapping is also persistent.

3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time.
   A cpuid is allocated lowest id first, released at CPU hotremove time, and
   reused for other hotadded CPUs. So this mapping is not persistent.

4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd
   time, and cleared at CPU hotremove time. As a result of 3, this mapping is
   not persistent.

To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it persistent. According to init_cpu_to_node(),
the cpuid <-> nodeid mapping is based on the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.

An apicid can be obtained by the _MAT (Multiple APIC Table Entry) method or
found in the MADT (Multiple APIC Description Table). So we finish the job in
the following steps:

1. Enable the apic registration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to
   let the caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mappings, and
   also modify the way cpuid is calculated. Establish all possible cpuid <->
   apicid mappings when registering the local apic, and store them in this
   array.

3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicids. This is also done by introducing an extra parameter to these
   APIs to let the caller control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mappings.
   This is done via an additional acpi namespace walk for processors.


For previous discussion, please refer to:
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989
https://lkml.org/lkml/2015/5/14/244
https://lkml.org/lkml/2015/7/7/200
https://lkml.org/lkml/2015/9/27/209
https://lkml.org/lkml/2016/5/19/212
https://lkml.org/lkml/2016/7/19/181
https://lkml.org/lkml/2016/7/25/99

Change log v9 -> v10:
1. Providing an empty definition of acpi_set_

[PATCH v10 3/7] x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store persistent cpuid <-> apicid mapping.

2016-07-25 Thread Dou Liyang

From: Gu Zheng 

The whole patch-set aims at making the cpuid <-> nodeid mapping persistent, so
that when node online/offline happens, caches based on the cpuid <-> nodeid
mapping, such as wq_numa_possible_cpumask, will not cause any problem.
It contains 4 steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mappings.
3. Enable the _MAT and MADT related APIs to return non-present or disabled
   cpus' apicids.
4. Establish all possible cpuid <-> nodeid mappings.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

We then modify the cpuid calculation. Currently generic_processor_info()
simply finds the next unused cpuid, which is also why the cpuid <-> nodeid
mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that the cpuid <-> apicid
mapping will be persistent.

Finally, we will use this array to make the cpuid <-> nodeid mapping persistent.

The cpuid <-> apicid mapping is established at local apic registration time,
but non-present or disabled cpus are currently ignored.

In this patch, we establish all possible cpuid <-> apicid mappings when
registering the local apic.
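
The allocation scheme can be exercised in user space to see the persistence
property. The sketch below follows the description above but is not the patch
itself: NR_CPUS is shrunk to 4 for the example, the warning is dropped, and
the apicid values passed in are arbitrary.

```c
#define NR_CPUS 4   /* deliberately tiny for the example */

/* Number of logical CPU IDs handed out so far; ID 0 is reserved for the BSP. */
static int nr_logical_cpuids = 1;

/* cpuid -> apicid table; -1 marks a slot that has no apicid recorded. */
static int cpuid_to_apicid[NR_CPUS] = { -1, -1, -1, -1 };

/*
 * Return the logical CPU ID for apicid. An apicid that was seen before gets
 * its old ID back; a new one gets the next free ID, or -1 if none is left.
 */
static int allocate_logical_cpuid(int apicid)
{
    for (int i = 0; i < nr_logical_cpuids; i++) {
        if (cpuid_to_apicid[i] == apicid)
            return i;
    }

    if (nr_logical_cpuids >= NR_CPUS)
        return -1;

    cpuid_to_apicid[nr_logical_cpuids] = apicid;
    return nr_logical_cpuids++;
}
```

Because entries are never released, a CPU that is hot-removed and later
hot-added with the same apicid receives the same cpuid again, instead of
whatever happens to be the lowest free ID at that moment.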

Signed-off-by: Gu Zheng 
Signed-off-by: Tang Chen 
Signed-off-by: Zhu Guihua 
Signed-off-by: Dou Liyang 
---
 arch/x86/include/asm/mpspec.h |  1 +
 arch/x86/kernel/acpi/boot.c   |  6 ++---
 arch/x86/kernel/apic/apic.c   | 61 ---
 3 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #endif
 
 int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
 
 #define PHYSID_ARRAY_SIZE  BITS_TO_LONGS(MAX_LOCAL_APIC)
 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9414f84..37248c3 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -174,15 +174,13 @@ static int acpi_register_lapic(int id, u8 enabled)
return -EINVAL;
}
 
-   if (!enabled) {
+   if (!enabled)
++disabled_cpus;
-   return -EINVAL;
-   }
 
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];
 
-   return generic_processor_info(id, ver);
+   return __generic_processor_info(id, ver, enabled);
 }
 
 static int __init
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8e3c377..366fbbc 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1998,7 +1998,53 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
 }
 
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals the currently allocated max logical CPU ID plus 1.
+ * All allocated CPU IDs should be in [0, nr_logical_cpuids), so the maximum of
+ * nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+/*
+ * Used to store mapping between logical CPU IDs and APIC IDs.
+ */
+static int cpuid_to_apicid[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+   int i;
+
+   /*
+* cpuid <-> apicid mapping is persistent, so when a cpu is up,
+* check if the kernel has allocated a cpuid for it.
+*/
+   for (i = 0; i < nr_logical_cpuids; i++) {
+   if (cpuid_to_apicid[i] == apicid)
+   return i;
+   }
+
+   /* Allocate a new cpuid. */
+   if (nr_logical_cpuids >= nr_cpu_ids) {
+   WARN_ONCE(1, "Only %d processors supported. "
+"Processor %d/0x%x and the rest are ignored.\n",
+nr_cpu_ids - 1, nr_logical_cpuids, apicid);
+   return -1;
+   }
+
+   cpuid_to_apicid[nr_logical_cpuids] = apicid;
+   return nr_logical_cpuids++;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
 {
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2079,8 +2125,17 @@ static int __generic_processor_info(int apicid, int version, bool enabled)
 * for BSP.
 */
cpu = 0;
-   } else
-   cpu = cpumask_next_zero(-1, cpu_present_mask);
+
+   /

[PATCH v10 1/7] x86, memhp, numa: Online memory-less nodes at boot time.

2016-07-25 Thread Dou Liyang

From: Tang Chen 

For now, x86 does not support memory-less nodes. A node without memory
will not be onlined, and the cpus on it will be mapped to the other
online nodes with memory in init_cpu_to_node(). The reason for doing this
is to ensure that each cpu is mapped to a node with memory, so that local
memory can be allocated for that cpu.

But we don't have to do it in this way.

In this series of patches, we are going to construct cpu <-> node mapping
for all possible cpus at boot time, which is a persistent mapping. It means
that the cpu will be mapped to the node which it belongs to, and will never
be changed. If a node has only cpus but no memory, the cpus on it will be
mapped to a memory-less node. And the memory-less node should be onlined.

This patch allocates pgdats for all memory-less nodes and onlines them at
boot time. Then it builds zonelists for these nodes. As a result, when cpus
on these memory-less nodes try to allocate memory from the local node, the
allocation will automatically fall back to the proper zones in the zonelists.

Signed-off-by: Zhu Guihua 
Signed-off-by: Dou Liyang 
---
 arch/x86/mm/numa.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 9c086c5..2a87a28 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -723,22 +723,19 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
 }
 
-static __init int find_near_online_node(int node)
+static void __init init_memory_less_node(int nid)
 {
-   int n, val;
-   int min_val = INT_MAX;
-   int best_node = -1;
+   unsigned long zones_size[MAX_NR_ZONES] = {0};
+   unsigned long zholes_size[MAX_NR_ZONES] = {0};
 
-   for_each_online_node(n) {
-   val = node_distance(node, n);
+   /* Allocate and initialize node data. Memory-less node is now online.*/
+   alloc_node_data(nid);
+   free_area_init_node(nid, zones_size, 0, zholes_size);
 
-   if (val < min_val) {
-   min_val = val;
-   best_node = n;
-   }
-   }
-
-   return best_node;
+   /*
+* All zonelists will be built later in start_kernel() after per cpu
+* areas are initialized.
+*/
 }
 
 /*
@@ -767,8 +764,10 @@ void __init init_cpu_to_node(void)
 
if (node == NUMA_NO_NODE)
continue;
+
if (!node_online(node))
-   node = find_near_online_node(node);
+   init_memory_less_node(node);
+
numa_set_node(cpu, node);
}
 }
-- 
2.5.5

Re: [PATCH] power:bq27xxx: 27000/10 read FLAGS register as single

2016-07-25 Thread H. Nikolaus Schaller

ping

> On 18.07.2016 at 18:12, H. Nikolaus Schaller wrote:
> 
> The bq27000 and bq27010 have a single byte FLAGS register.
> Other gauges have 16 bit FLAGS registers.
> 
> For reading the FLAGS register it is sufficient to read the single
> register instead of reading RSOC at the next higher address as
> well and then ignore the high byte.
> 
> This does not change functionality but optimizes i2c and hdq
> traffic.
> 
> Signed-off-by: H. Nikolaus Schaller 
> ---
> drivers/power/bq27xxx_battery.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/power/bq27xxx_battery.c b/drivers/power/bq27xxx_battery.c
> index 45f6ebf..56712b2 100644
> --- a/drivers/power/bq27xxx_battery.c
> +++ b/drivers/power/bq27xxx_battery.c
> @@ -656,8 +656,9 @@ static bool bq27xxx_battery_dead(struct bq27xxx_device_info *di, u16 flags)
> static int bq27xxx_battery_read_health(struct bq27xxx_device_info *di)
> {
>   int flags;
> + bool has_single_flag = di->chip == BQ27000 || di->chip == BQ27010;
> 
> - flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, false);
> + flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, has_single_flag);
>   if (flags < 0) {
>   dev_err(di->dev, "error reading flag register:%d\n", flags);
>   return flags;
> @@ -760,7 +761,7 @@ static int bq27xxx_battery_current(struct bq27xxx_device_info *di,
>   }
> 
>   if (di->chip == BQ27000 || di->chip == BQ27010) {
> - flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, false);
> + flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, true);
>   if (flags & BQ27000_FLAG_CHGS) {
>   dev_dbg(di->dev, "negative current!\n");
>   curr = -curr;
> -- 
> 2.7.3
>
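
For readers skimming the quoted patch, the difference between the two read
modes can be sketched in userspace as follows. This is a hypothetical model,
not the driver's bq27xxx_read(): reg_read() and the regs[] array are
illustrative names, and the 16-bit case assumes the two bytes are combined in
little-endian order.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical register file: regs[] stands in for the gauge's address
 * space. The caller must make sure addr + 1 is in range for 16-bit reads.
 */
static int reg_read(const uint8_t *regs, uint8_t addr, bool single)
{
	if (single)
		return regs[addr];                 /* one byte transferred */

	/* Two bytes transferred; the byte at addr + 1 becomes the high byte. */
	return regs[addr] | (regs[addr + 1] << 8);
}
```

Masking the 16-bit result with 0xff gives the same flags as the single-byte
read, which is why the patch changes bus traffic but not behaviour.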

Re: [PATCH v2 09/10] netns: Add a limit on the number of net namespaces

2016-07-25 Thread Andrei Vagin

On Thu, Jul 21, 2016 at 9:40 AM, Eric W. Biederman wrote:
> Signed-off-by: "Eric W. Biederman" 
> ---
>  include/linux/user_namespace.h |  1 +
>  kernel/user_namespace.c|  1 +
>  net/core/net_namespace.c   | 15 +++
>  3 files changed, 17 insertions(+)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 1a3a9cb93277..f86afa536baf 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -27,6 +27,7 @@ enum ucounts {
> UCOUNT_PID_NAMESPACES,
> UCOUNT_UTS_NAMESPACES,
> UCOUNT_IPC_NAMESPACES,
> +   UCOUNT_NET_NAMESPACES,
> UCOUNT_CGROUP_NAMESPACES,
> UCOUNT_COUNTS,
>  };
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 1cf074cb47e2..e326ca722ae0 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -80,6 +80,7 @@ static struct ctl_table userns_table[] = {
> UCOUNT_ENTRY("max_pid_namespaces"),
> UCOUNT_ENTRY("max_uts_namespaces"),
> UCOUNT_ENTRY("max_ipc_namespaces"),
> +   UCOUNT_ENTRY("max_net_namespaces"),
> UCOUNT_ENTRY("max_cgroup_namespaces"),
> { }
>  };
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 2c2eb1b629b1..a489f192d619 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -266,6 +266,16 @@ struct net *get_net_ns_by_id(struct net *net, int id)
> return peer;
>  }
>
> +static bool inc_net_namespaces(struct user_namespace *ns)
> +{
> +   return inc_ucount(ns, UCOUNT_NET_NAMESPACES);
> +}
> +
> +static void dec_net_namespaces(struct user_namespace *ns)
> +{
> +   dec_ucount(ns, UCOUNT_NET_NAMESPACES);
> +}
> +
>  /*
>   * setup_net runs the initializers for the network namespace object.
>   */
> @@ -276,6 +286,9 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
> int error = 0;
> LIST_HEAD(net_exit_list);
>
> +   if (!inc_net_namespaces(user_ns))
> +   return -ENFILE;

I think you need to move this check after initializing net->passive.
When setup_net returns an error, net_drop_ns is called, and it decrements
ns->passive, which has not been set to 1 yet at this point:

void net_drop_ns(void *p)
{
	struct net *ns = p;
	if (ns && atomic_dec_and_test(&ns->passive))
		net_free(ns);
}

Actually, I think it would be better to make this check before net_alloc().
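
The ordering problem can be modelled in userspace as below. This is a sketch
under stated assumptions, not kernel code: struct obj, drop(),
setup_check_first() and setup_init_first() are hypothetical names, a plain
int replaces the atomic counter, and the object is assumed to be
zero-allocated before setup runs.

```c
#include <stdbool.h>
#include <stdlib.h>

struct obj {
	int passive;	/* stand-in for the atomic net->passive counter */
};

static int freed;	/* counts how many objects drop() released */

/* Same shape as net_drop_ns(): free only when the count reaches zero. */
static void drop(struct obj *o)
{
	if (o && --o->passive == 0) {
		free(o);
		freed++;
	}
}

/* Limit check placed before passive is set, as in the patch under review. */
static int setup_check_first(struct obj *o, bool over_limit)
{
	if (over_limit)
		return -1;	/* passive is still 0 here */
	o->passive = 1;
	return 0;
}

/* Suggested order: initialize passive first, then check the limit. */
static int setup_init_first(struct obj *o, bool over_limit)
{
	o->passive = 1;
	if (over_limit)
		return -1;
	return 0;
}
```

With the first ordering, the error path decrements the counter from 0 to -1,
so the zero test never fires and the object is never freed. Doing the check
before the allocation avoids the question entirely, since there is then
nothing to drop.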

> +
> atomic_set(&net->count, 1);
> atomic_set(&net->passive, 1);
> net->dev_base_seq = 1;
> @@ -372,6 +385,7 @@ struct net *copy_net_ns(unsigned long flags,
> }
> mutex_unlock(&net_mutex);
> if (rv < 0) {
> +   dec_net_namespaces(user_ns);
> put_user_ns(user_ns);
> net_drop_ns(net);
> return ERR_PTR(rv);
> @@ -444,6 +458,7 @@ static void cleanup_net(struct work_struct *work)
> /* Finally it is safe to free my network namespace structure */
> list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {
> list_del_init(&net->exit_list);
> +   dec_net_namespaces(net->user_ns);
> put_user_ns(net->user_ns);
> net_drop_ns(net);
> }
> --
> 2.8.3
>

linux-next: Tree for Jul 26

2016-07-25 Thread Stephen Rothwell

Hi all,

Please do not add material destined for v4.9 to your linux-next included
branches until after v4.8-rc1 has been released.

Changes since 20160725:

New tree: random
Removed tree: perf (problem solved and merged)

My fixes tree contains:

  22065b8b8dc5 Merge branch 'perf/core' of ../../tip
  70ca58970f4a staging: emxx_udc: allow modular build

The powerpc tree still had its build failure for which I applied a fix patch.

The xen-tip tree gained conflicts against the tip tree.

The random tree gained a conflict against the kspp tree.

Non-merge commits (relative to Linus' tree): 9990
 9049 files changed, 523915 insertions(+), 181557 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 240 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (766fd5f6cdaf Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging fixes/master (22065b8b8dc5 Merge branch 'perf/core' of ../../tip)
Merging kbuild-current/rc-fixes (b36fad65d61f kbuild: Initialize exported variables)
Merging arc-current/for-curr (9bd54517ee86 arc: unwind: warn only once if DW2_UNWIND is disabled)
Merging arm-current/fixes (f6492164ecb1 ARM: 8577/1: Fix Cortex-A15 798181 errata initialization)
Merging m68k-current/for-linus (6bd80f372371 m68k/defconfig: Update defconfigs for v4.7-rc2)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging powerpc-fixes/fixes (bfa37087aa04 powerpc: Initialise pci_io_base as early as possible)
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (6b15d6650c53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging net/master (107df03203bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (1ba5bf993c6a xfrm: fix crash in XFRM_MSG_GETSA netlink handler)
Merging netfilter/master (ea43f860d984 Merge branch 'ethoc-fixes')
Merging ipvs/master (ea43f860d984 Merge branch 'ethoc-fixes')
Merging wireless-drivers/master (034fdd4a17ff Merge ath-current from ath.git)
Merging mac80211/master (16a910a6722b cfg80211: handle failed skb allocation)
Merging sound-current/for-linus (cf81d6b58344 Merge branch 'for-next' into for-linus)
Merging pci-current/for-linus (ef0dab4aae14 PCI: Fix unaligned accesses in VC code)
Merging driver-core.current/driver-core-linus (523d939ef98f Linux 4.7)
Merging tty.current/tty-linus (a99cde438de0 Linux 4.7-rc6)
Merging usb.current/usb-linus (b7545b79a169 Merge tag 'usb-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb)
Merging usb-gadget-fixes/fixes (50c763f8c1ba usb: dwc3: Set the ClearPendIN bit on Clear Stall EP command)
Merging usb-serial-fixes/usb-linus (4c2e07c6a29e Linux 4.7-rc5)
Merging usb-chipidea-fixes/ci-for-usb-stable (ea1d39a31d3b usb: common: otg-fsm: add license to usb-otg-fsm)
Merging staging.current/staging-linus (a99cde438de0 Linux 4.7-rc6)
Merging char-misc.current/char-misc-linus (dd9506954539 Merge tag 'hwmon-for-linus-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging)
Merging input-current/for-linus (e9003c9cfaa1 Input: tsc200x - report proper input_dev name)
Merging crypto-

Re: [PATCH] clocksource: sun4i: Clear interrupts after stopping timer in probe function

2016-07-25 Thread Maxime Ripard

On Tue, Jul 26, 2016 at 11:01:59AM +0800, Chen-Yu Tsai wrote:
> The bootloader (U-boot) sometimes uses this timer for various delays.
> It uses it as an ongoing counter, and does comparisons on the current
> counter value. The timer counter is never stopped.
> 
> In some cases when the user interacts with the bootloader, or lets
> it idle for some time before loading Linux, the timer may expire,
> and an interrupt will be pending. This results in an unexpected
> interrupt when the timer interrupt is enabled by the kernel, at
> which point the event_handler isn't set yet. This results in a NULL
> pointer dereference exception, panic, and no way to reboot.
> 
> Clear any pending interrupts after we stop the timer in the probe
> function to avoid this.
> 
> Signed-off-by: Chen-Yu Tsai 

Awesome, thanks!

You should put stable in Cc though for this kind of patch.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



[PATCH] USB: appledisplay: Remove deprecated create_singlethread_workqueue

2016-07-25 Thread Bhaktipriya Shridhar

The workqueue "wq" is involved in controlling the brightness of an
Apple Cinema Display over USB.

It has a single work item (&pdata->work) per appledisplay and hence
doesn't require ordering. Also, it is not being used on a memory
reclaim path.

Hence, the singlethreaded workqueue has been replaced with the use of
system_wq.

System workqueues have been able to handle a high level of concurrency
for a long time now and hence it's not required to have a singlethreaded
workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
created with create_singlethread_workqueue(), system_wq allows multiple
work items to overlap executions even on the same CPU; however, a
per-cpu workqueue doesn't have any CPU locality or global ordering
guarantee unless the target CPU is explicitly specified and thus the
increase of local concurrency shouldn't make any difference.

The work item is self-requeueing and needs to wait for the in-flight
work item to finish before proceeding with destruction.
Hence, it has been sync cancelled in appledisplay_disconnect().
This also ensures that there are no pending tasks while disconnecting the
driver.

Signed-off-by: Bhaktipriya Shridhar 
---
 drivers/usb/misc/appledisplay.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/usb/misc/appledisplay.c b/drivers/usb/misc/appledisplay.c
index a0a3827..c760455 100644
--- a/drivers/usb/misc/appledisplay.c
+++ b/drivers/usb/misc/appledisplay.c
@@ -85,7 +85,6 @@ struct appledisplay {
 };

 static atomic_t count_displays = ATOMIC_INIT(0);
-static struct workqueue_struct *wq;

 static void appledisplay_complete(struct urb *urb)
 {
@@ -122,7 +121,7 @@ static void appledisplay_complete(struct urb *urb)
case ACD_BTN_BRIGHT_UP:
case ACD_BTN_BRIGHT_DOWN:
pdata->button_pressed = 1;
-   queue_delayed_work(wq, &pdata->work, 0);
+   schedule_delayed_work(&pdata->work, 0);
break;
case ACD_BTN_NONE:
default:
@@ -159,7 +158,7 @@ static int appledisplay_bl_update_status(struct backlight_device *bd)
pdata->msgdata, 2,
ACD_USB_TIMEOUT);
mutex_unlock(&pdata->sysfslock);
-
+
return retval;
 }

@@ -344,7 +343,7 @@ static void appledisplay_disconnect(struct usb_interface *iface)

if (pdata) {
usb_kill_urb(pdata->urb);
-   cancel_delayed_work(&pdata->work);
+   cancel_delayed_work_sync(&pdata->work);
backlight_device_unregister(pdata->bd);
usb_free_coherent(pdata->udev, ACD_URB_BUFFER_LEN,
pdata->urbdata, pdata->urb->transfer_dma);
@@ -365,19 +364,11 @@ static struct usb_driver appledisplay_driver = {

 static int __init appledisplay_init(void)
 {
-   wq = create_singlethread_workqueue("appledisplay");
-   if (!wq) {
-   printk(KERN_ERR "appledisplay: Could not create work queue\n");
-   return -ENOMEM;
-   }
-
return usb_register(&appledisplay_driver);
 }

 static void __exit appledisplay_exit(void)
 {
-   flush_workqueue(wq);
-   destroy_workqueue(wq);
usb_deregister(&appledisplay_driver);
 }

--
2.1.4

Re: [PATCH] xen/x86: Define stubs for xen_smp_intr_init/xen_smp_intr_free

2016-07-25 Thread Juergen Gross

On 25/07/16 23:14, Boris Ostrovsky wrote:
> xen_smp_intr_init() and xen_smp_intr_free() are now called from
> enlighten.c and therefore not guaranteed to have CONFIG_SMP.
> 
> Instead of adding multiple ifdefs there provide stubs in smp.h
> 
> Signed-off-by: Boris Ostrovsky 

Reviewed-by: Juergen Gross 


Juergen

Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces

2016-07-25 Thread Andrew Vagin

On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)"  writes:

[snip]

> [snip]
> >>> So, from my point of view, the important piece that was missing from
> >>> your commit message was the note to use readlink("/proc/self/fd/%d")
> >>> on the returned FDs. I think that detail needs to be part of the
> >>> commit message (and also the man page text). I think it would even be
> >>> helpful to include the above program as part of the commit message:
> >>> it helps people more quickly grasp the API.
> >>
> >> Please, please make the standard way to compare these things fstat.
> >> That is much less magic than a symlink, and a little more future proof.
> >> Possibly even kcmp.

I like the idea of using kcmp to compare namespaces. I am going to add this
functionality to kcmp and describe all of this in the man page.

> >
> > As in fstat() to get the st_ino field, right?
> 
> Both the st_ino and st_dev fields.
> 
> The most likely change to support checkpoint/restart in the future is to
> preserve st_ino across migrations and instantiate a different instance
> of nsfs to hold the inode numbers from the previous machine.

It sounds tricky. BTW, this is not the only place where we have
this sort of problem. For example, mount IDs are currently not preserved when
a container is migrated. The same problem applies to tmpfs, where
inode numbers are not preserved for files.

> 
> We would need to handle the preservation carefully or else there is
> a chance that two namespace file descriptors (collected from different
> sources) with different st_dev and st_ino fields may actually refer to
> the same object.
> 
> Which is a long way of saying we have the st_dev field please use it,
> it may matter at some point.
> 
> Eric
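
The fstat()-based comparison discussed above can be sketched as follows.
same_object() is a hypothetical helper name; it only relies on fstat() and
the st_dev and st_ino fields, and it works on any pair of open descriptors.

```c
#include <sys/stat.h>

/*
 * Returns 1 if both descriptors refer to the same object, 0 if they do not,
 * and -1 if fstat() fails on either descriptor.
 */
static int same_object(int fd_a, int fd_b)
{
	struct stat a, b;

	if (fstat(fd_a, &a) < 0 || fstat(fd_b, &b) < 0)
		return -1;

	/* Compare both fields: st_ino alone is only unique within one st_dev. */
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

To compare the namespaces of two processes, one would open the corresponding
/proc/<pid>/ns/ entries and pass the two descriptors to the helper.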

RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-25 Thread Dexuan Cui

> From: David Miller [mailto:da...@davemloft.net]
> ...
> From: Dexuan Cui 
> Date: Tue, 26 Jul 2016 03:09:16 +
> 
> > BTW, during the past month, at least 7 other people also reviewed
> > the patch and gave me quite a few good comments, which have
> > been addressed.
> 
> Correction: Several people gave coding style and simple corrections
> to your patch.
> 
> Very few gave any review of the _SUBSTANCE_ of your changes.
> 
> And the one of the few who did, and suggested you build your
> facilities using the existing S390 hypervisor socket infrastructure,
> you brushed off _IMMEDIATELY_.
>
> That drives me crazy.  The one person who gave you real feedback
> you basically didn't consider seriously at all.

Hi David,
I'm very sorry -- I guess I must have missed something here -- I don't
remember anybody replying about the S390 hypervisor socket
infrastructure... I'm re-reading all the replies, trying to locate the
reply, and I'll find out why I didn't take it seriously. Sorry in advance.

> I know why you don't want to consider alternative implementations,
> and it's because you guys have so much invested in what you've
> implemented already.
This is not true. I'm absolutely open to any possibility to have an
alternative better implementation.
Please allow me to find the "S390 hypervisor socket infrastructure" reply
first and I'll report back ASAP.
 
> But that's tough and not our problem.
> 
> And until this changes, yes, this submission will be stuck in the
> mud and continue slogging on like this.

I definitely agree and understand.

Thanks,
-- Dexuan

[PATCH v2 2/3] xen-blkfront: introduce blkif_set_queue_limits()

2016-07-25 Thread Bob Liu

blk_mq_update_nr_hw_queues() resets all queue limits to their defaults, which
is not what xen-blkfront expects. Introduce blkif_set_queue_limits() to reset
the limits to their correct initial values.

Signed-off-by: Bob Liu 
---
v2: Move blkif_set_queue_limits() after blkfront_gather_backend_features.
---
 drivers/block/xen-blkfront.c |   87 +++---
 1 file changed, 48 insertions(+), 39 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 032fc94..1b4c380 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -189,6 +189,8 @@ struct blkfront_info
struct mutex mutex;
struct xenbus_device *xbdev;
struct gendisk *gd;
+   u16 sector_size;
+   unsigned int physical_sector_size;
int vdevice;
blkif_vdev_t handle;
enum blkif_state connected;
@@ -913,9 +915,45 @@ static struct blk_mq_ops blkfront_mq_ops = {
.map_queue = blk_mq_map_queue,
 };
 
+static void blkif_set_queue_limits(struct blkfront_info *info)
+{
+   struct request_queue *rq = info->rq;
+   struct gendisk *gd = info->gd;
+   unsigned int segments = info->max_indirect_segments ? :
+   BLKIF_MAX_SEGMENTS_PER_REQUEST;
+
+   queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
+
+   if (info->feature_discard) {
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
+   blk_queue_max_discard_sectors(rq, get_capacity(gd));
+   rq->limits.discard_granularity = info->discard_granularity;
+   rq->limits.discard_alignment = info->discard_alignment;
+   if (info->feature_secdiscard)
+   queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, rq);
+   }
+
+   /* Hard sector size and max sectors impersonate the equiv. hardware. */
+   blk_queue_logical_block_size(rq, info->sector_size);
+   blk_queue_physical_block_size(rq, info->physical_sector_size);
+   blk_queue_max_hw_sectors(rq, (segments * XEN_PAGE_SIZE) / 512);
+
+   /* Each segment in a request is up to an aligned page in size. */
+   blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
+   blk_queue_max_segment_size(rq, PAGE_SIZE);
+
+   /* Ensure a merged request will fit in a single I/O ring slot. */
+   blk_queue_max_segments(rq, segments / GRANTS_PER_PSEG);
+
+   /* Make sure buffer addresses are sector-aligned. */
+   blk_queue_dma_alignment(rq, 511);
+
+   /* Make sure we don't use bounce buffers. */
+   blk_queue_bounce_limit(rq, BLK_BOUNCE_ANY);
+}
+
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
-   unsigned int physical_sector_size,
-   unsigned int segments)
+   unsigned int physical_sector_size)
 {
struct request_queue *rq;
struct blkfront_info *info = gd->private_data;
@@ -947,37 +985,11 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
}
 
rq->queuedata = info;
-   queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
-
-   if (info->feature_discard) {
-   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
-   blk_queue_max_discard_sectors(rq, get_capacity(gd));
-   rq->limits.discard_granularity = info->discard_granularity;
-   rq->limits.discard_alignment = info->discard_alignment;
-   if (info->feature_secdiscard)
-   queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, rq);
-   }
-
-   /* Hard sector size and max sectors impersonate the equiv. hardware. */
-   blk_queue_logical_block_size(rq, sector_size);
-   blk_queue_physical_block_size(rq, physical_sector_size);
-   blk_queue_max_hw_sectors(rq, (segments * XEN_PAGE_SIZE) / 512);
-
-   /* Each segment in a request is up to an aligned page in size. */
-   blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
-   blk_queue_max_segment_size(rq, PAGE_SIZE);
-
-   /* Ensure a merged request will fit in a single I/O ring slot. */
-   blk_queue_max_segments(rq, segments / GRANTS_PER_PSEG);
-
-   /* Make sure buffer addresses are sector-aligned. */
-   blk_queue_dma_alignment(rq, 511);
-
-   /* Make sure we don't use bounce buffers. */
-   blk_queue_bounce_limit(rq, BLK_BOUNCE_ANY);
-
-   gd->queue = rq;
-
+   info->rq = gd->queue = rq;
+   info->gd = gd;
+   info->sector_size = sector_size;
+   info->physical_sector_size = physical_sector_size;
+   blkif_set_queue_limits(info);
return 0;
 }
 
@@ -1142,16 +1154,11 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
gd->driverfs_dev = &(info->xbdev->dev);
set_capacity(gd, capacity);
 
-   if (xlvbd_init_blk_queue(gd, sector_size, physical_sector_size,
-info->max_indirect_segments ? :
-BLKIF_

[PATCH v2 3/3] xen-blkfront: dynamic configuration of per-vbd resources

2016-07-25 Thread Bob Liu

The current VBD layer reserves buffer space for each attached device based on
three statically configured settings which are read at boot time.
 * max_indirect_segs: Maximum number of segments.
 * max_ring_page_order: Maximum order of pages to be used for the shared ring.
 * max_queues: Maximum number of queues (rings) to be used.

But the storage backend, workload, and guest memory result in very different
tuning requirements. It's impossible to centrally predict application
characteristics, so it's best to allow the settings to be dynamically
adjusted based on the workload inside the guest.

Usage:
Show current values:
cat /sys/devices/vbd-xxx/max_indirect_segs
cat /sys/devices/vbd-xxx/max_ring_page_order
cat /sys/devices/vbd-xxx/max_queues

Write new values:
echo  > /sys/devices/vbd-xxx/max_indirect_segs
echo  > /sys/devices/vbd-xxx/max_ring_page_order
echo  > /sys/devices/vbd-xxx/max_queues

Signed-off-by: Bob Liu 
--
v2: Rename to max_ring_page_order and rm the waiting code suggested by Roger.
---
 drivers/block/xen-blkfront.c |  275 +-
 1 file changed, 269 insertions(+), 6 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 1b4c380..ff5ebe5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -212,6 +212,11 @@ struct blkfront_info
/* Save uncomplete reqs and bios for migration. */
struct list_head requests;
struct bio_list bio_list;
+   /* For dynamic configuration. */
+   unsigned int reconfiguring:1;
+   int new_max_indirect_segments;
+   int max_ring_page_order;
+   int max_queues;
 };
 
 static unsigned int nr_minors;
@@ -1350,6 +1355,31 @@ static void blkif_free(struct blkfront_info *info, int suspend)
for (i = 0; i < info->nr_rings; i++)
blkif_free_ring(&info->rinfo[i]);
 
+   /* Remove old xenstore nodes. */
+   if (info->nr_ring_pages > 1)
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, "ring-page-order");
+
+   if (info->nr_rings == 1) {
+   if (info->nr_ring_pages == 1) {
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, "ring-ref");
+   } else {
+   for (i = 0; i < info->nr_ring_pages; i++) {
+   char ring_ref_name[RINGREF_NAME_LEN];
+
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, ring_ref_name);
+   }
+   }
+   } else {
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, "multi-queue-num-queues");
+
+   for (i = 0; i < info->nr_rings; i++) {
+   char queuename[QUEUE_NAME_LEN];
+
+   snprintf(queuename, QUEUE_NAME_LEN, "queue-%u", i);
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, queuename);
+   }
+   }
kfree(info->rinfo);
info->rinfo = NULL;
info->nr_rings = 0;
@@ -1763,15 +1793,21 @@ static int talk_to_blkback(struct xenbus_device *dev,
const char *message = NULL;
struct xenbus_transaction xbt;
int err;
-   unsigned int i, max_page_order = 0;
+   unsigned int i, backend_max_order = 0;
unsigned int ring_page_order = 0;
 
err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "max-ring-page-order", "%u", &max_page_order);
+  "max-ring-page-order", "%u", &backend_max_order);
if (err != 1)
info->nr_ring_pages = 1;
else {
-   ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
+   if (info->max_ring_page_order) {
+   /* Dynamic configured through /sys. */
+   BUG_ON(info->max_ring_page_order > backend_max_order);
+   ring_page_order = info->max_ring_page_order;
+   } else
+   /* Default. */
+   ring_page_order = min(xen_blkif_max_ring_order, backend_max_order);
info->nr_ring_pages = 1 << ring_page_order;
}
 
@@ -1894,7 +1930,14 @@ static int negotiate_mq(struct blkfront_info *info)
if (err < 0)
backend_max_queues = 1;
 
-   info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
+   if (info->max_queues) {
+   /* Dynamic configured through /sys */
+   BUG_ON(info->max_queues > backend_max_queues);
+   info->nr_rings = info->max_queues;
+   } else
+   /* Default. */
+   info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
+
/* We need at least one ring. */
if (!info->nr_rings)
info->nr_rings = 1;
@@ -2352,11 +2395,197 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)

[PATCH 1/3] xen-blkfront: fix places not updated after introducing 64KB page granularity

2016-07-25 Thread Bob Liu

Two places didn't get updated when 64KB page granularity was introduced; this
patch fixes them.

Signed-off-by: Bob Liu 
Acked-by: Roger Pau Monné 
---
 drivers/block/xen-blkfront.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index fcc5b4e..032fc94 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1321,7 +1321,7 @@ free_shadow:
rinfo->ring_ref[i] = GRANT_INVALID_REF;
}
}
-   free_pages((unsigned long)rinfo->ring.sring, get_order(info->nr_ring_pages * PAGE_SIZE));
+   free_pages((unsigned long)rinfo->ring.sring, get_order(info->nr_ring_pages * XEN_PAGE_SIZE));
rinfo->ring.sring = NULL;
 
if (rinfo->irq)
@@ -2013,7 +2013,7 @@ static int blkif_recover(struct blkfront_info *info)
 
blkfront_gather_backend_features(info);
segs = info->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
-   blk_queue_max_segments(info->rq, segs);
+   blk_queue_max_segments(info->rq, segs / GRANTS_PER_PSEG);
 
for (r_index = 0; r_index < info->nr_rings; r_index++) {
struct blkfront_ring_info *rinfo = &info->rinfo[r_index];
-- 
1.7.10.4
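
The effect of the first hunk can be seen with a little arithmetic. The following is a minimal userspace sketch, not the driver code: it assumes a 64KB kernel page and a 4KB Xen page, and order_for(), ring_order_old() and ring_order_new() are hypothetical stand-ins for get_order() and the two expressions in the diff. It also assumes the ring was allocated with an order derived from nr_ring_pages * XEN_PAGE_SIZE, so freeing with the old expression would pass a different order.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration: a 64KB kernel page and a 4KB Xen page. */
#define KERNEL_PAGE_SHIFT 16
#define KERNEL_PAGE_SIZE  (1UL << KERNEL_PAGE_SHIFT)
#define XEN_PAGE_SIZE_4K  (1UL << 12)

/*
 * Userspace stand-in for the kernel's get_order(): the smallest order
 * such that (KERNEL_PAGE_SIZE << order) >= size.
 */
static unsigned int order_for(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((KERNEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Order computed the old way, scaling ring pages by the kernel page size. */
static unsigned int ring_order_old(unsigned int nr_ring_pages)
{
	return order_for(nr_ring_pages * KERNEL_PAGE_SIZE);
}

/* Order computed the patched way, scaling ring pages by the Xen page size. */
static unsigned int ring_order_new(unsigned int nr_ring_pages)
{
	return order_for(nr_ring_pages * XEN_PAGE_SIZE_4K);
}
```

Under these assumptions, four ring pages give order 2 with the old expression and order 0 with the patched one. When both page sizes are 4KB the two expressions agree, which would explain why the mismatch went unnoticed on such configurations.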

[PATCH] usb: ftdi-elan: Remove deprecated create_singlethread_workqueue

2016-07-25 Thread Bhaktipriya Shridhar

The status workqueue is involved in initializing the Uxxx and polling
the Uxxx until a supported PCMCIA CardBus device is detected.
It then starts the command and respond workqueues and then loads the
module that handles the device, after which it just polls the Uxxx
looking for card ejects.

The command and respond workqueues are involved in implementing a command
sequencer for communicating with the firmware on the other side of
the FTDI chip in the Uxxx.

These workqueues have only a single work item each and hence they do not
require ordering. Also, none of the above workqueues are being used on a
memory reclaim path. Hence, the singlethreaded workqueues have been
replaced with the use of system_wq.

System workqueues have been able to handle a high level of concurrency
for a long time now and hence it's not required to have a singlethreaded
workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
created with create_singlethread_workqueue(), system_wq allows multiple
work items to overlap executions even on the same CPU; however, a
per-cpu workqueue doesn't have any CPU locality or global ordering
guarantee unless the target CPU is explicitly specified and thus the
increase of local concurrency shouldn't make any difference.

The work items have been sync cancelled because they are self-requeueing
and need to wait for the in-flight work item to finish before proceeding
with destruction. Hence, they have been sync cancelled in
ftdi_status_cancel_work(), ftdi_command_cancel_work() and
ftdi_response_cancel_work(). These functions are called in
ftdi_elan_exit() to ensure that there are no pending work items while
disconnecting the driver.

Signed-off-by: Bhaktipriya Shridhar 
---
 drivers/usb/misc/ftdi-elan.c | 53 +---
 1 file changed, 10 insertions(+), 43 deletions(-)

diff --git a/drivers/usb/misc/ftdi-elan.c b/drivers/usb/misc/ftdi-elan.c
index 52c27ca..59031dc 100644
--- a/drivers/usb/misc/ftdi-elan.c
+++ b/drivers/usb/misc/ftdi-elan.c
@@ -61,9 +61,6 @@ module_param(distrust_firmware, bool, 0);
 MODULE_PARM_DESC(distrust_firmware,
 "true to distrust firmware power/overcurrent setup");
 extern struct platform_driver u132_platform_driver;
-static struct workqueue_struct *status_queue;
-static struct workqueue_struct *command_queue;
-static struct workqueue_struct *respond_queue;
 /*
  * ftdi_module_lock exists to protect access to global variables
  *
@@ -228,56 +225,56 @@ static void ftdi_elan_init_kref(struct usb_ftdi *ftdi)

 static void ftdi_status_requeue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (!queue_delayed_work(status_queue, &ftdi->status_work, delta))
+   if (!schedule_delayed_work(&ftdi->status_work, delta))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_status_queue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (queue_delayed_work(status_queue, &ftdi->status_work, delta))
+   if (schedule_delayed_work(&ftdi->status_work, delta))
kref_get(&ftdi->kref);
 }

 static void ftdi_status_cancel_work(struct usb_ftdi *ftdi)
 {
-   if (cancel_delayed_work(&ftdi->status_work))
+   if (cancel_delayed_work_sync(&ftdi->status_work))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_command_requeue_work(struct usb_ftdi *ftdi, unsigned int 
delta)
 {
-   if (!queue_delayed_work(command_queue, &ftdi->command_work, delta))
+   if (!schedule_delayed_work(&ftdi->command_work, delta))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_command_queue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (queue_delayed_work(command_queue, &ftdi->command_work, delta))
+   if (schedule_delayed_work(&ftdi->command_work, delta))
kref_get(&ftdi->kref);
 }

 static void ftdi_command_cancel_work(struct usb_ftdi *ftdi)
 {
-   if (cancel_delayed_work(&ftdi->command_work))
+   if (cancel_delayed_work_sync(&ftdi->command_work))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_response_requeue_work(struct usb_ftdi *ftdi,
   unsigned int delta)
 {
-   if (!queue_delayed_work(respond_queue, &ftdi->respond_work, delta))
+   if (!schedule_delayed_work(&ftdi->respond_work, delta))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_respond_queue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (queue_delayed_work(respond_queue, &ftdi->respond_work, delta))
+   if (schedule_delayed_work(&ftdi->respond_work, delta))
kref_get(&ftdi->kref);
 }

 static void ftdi_response_cancel_work(struct usb_ftdi *ftdi)
 {
-   if (cancel_delayed_work(&ftdi->respond_work))
+   if (cancel_delayed_work_sync(&ftdi->respond_work))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

@@ -2823,9 +2820,6 @@ static void ftdi_elan_disconnect(stru

[PATCH 1/1] socfpga: defconfig: Enable Altera GPIO driver as module

2016-07-25 Thread thloh

From: Tien Hock Loh 

This patch enables Altera GPIO driver as module in socfpga_defconfig

Signed-off-by: Tien Hock Loh 
---
 arch/arm/configs/socfpga_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/socfpga_defconfig 
b/arch/arm/configs/socfpga_defconfig
index 753f1a5..241ce4ca 100644
--- a/arch/arm/configs/socfpga_defconfig
+++ b/arch/arm/configs/socfpga_defconfig
@@ -108,3 +108,4 @@ CONFIG_DETECT_HUNG_TASK=y
 # CONFIG_SCHED_DEBUG is not set
 CONFIG_ENABLE_DEFAULT_TRACERS=y
 CONFIG_DEBUG_USER=y
+CONFIG_GPIO_ALTERA=m
-- 
1.7.11.GIT

Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-25 Thread Alan Curry

Al Viro wrote:
> On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:
> 
> > > The symptom is that downloaded files (http, ftp, and probably other
> > > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > > random locations. Only downloads that sustain a high speed for at least a
> > > few seconds are corrupted. Anything small enough to be received in less
> > > than about 5 seconds is not affected.
> 
> Can that sucker be reproduced with netcat?  That would eliminate all issues
> with multi-iovec recvmsg(2), narrowing things down quite a bit.

netcat seems to be immune. Comparing strace results, I didn't see any
recvmsg() calls in the other programs that have had the problem, but there
is an interesting difference: netcat calls select() to wait for the socket
to be ready for reading, where my other test programs just call read() and
let it block until ready.

So I wrote a small test program to isolate that difference. It downloads
a file using only read() and write() and a hardcoded HTTP request. It has
a select mode (main loop alternates read() and select() on the TCP socket)
and a noselect mode (main loop just read()s the TCP socket).

The program is included at the bottom of this message.

I ran it several times in both modes and got corruption if and only if the
noselect mode was used.

> 
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:
> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c

Will try that patch soon. Meanwhile, here's my test:

/* Demonstration program "dlbug".
   Usage: dlbug select > outfile
  or
  dlbug noselect > outfile
   outfile will contain the full HTTP response. Edit out the HTTP headers
   and what's left should be a valid gzip if the download worked. */

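#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <netdb.h>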

int main(int argc, char **argv)
{
  const char *request =
    "GET /debian/dists/stable/main/Contents-amd64.gz HTTP/1.0\r\n"
    "Host: ftp.us.debian.org\r\n"
    "\r\n";
  ssize_t request_len = strlen(request), w, r, copied;
  struct addrinfo hints, *host;
  int sock, err, doselect;
  char buf[10240];

  /* The mode must be exactly "select" or "noselect". */
  if(argc!=2 || (strcmp(argv[1], "select") && strcmp(argv[1], "noselect"))) {
    fprintf(stderr, "Usage: %s {select|noselect}\n", argv[0]);
    return 1;
  }

  doselect = !strcmp(argv[1], "select");

  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  err = getaddrinfo("ftp.us.debian.org", 0, &hints, &host);
  if(err) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
    return 1;
  }

  sock = socket(host->ai_family, host->ai_socktype, host->ai_protocol);
  if(sock < 0) {
    perror("socket");
    return 1;
  }

  ((struct sockaddr_in *)host->ai_addr)->sin_port = htons(80);

  if(connect(sock, host->ai_addr, host->ai_addrlen) < 0) {
    perror("connect");
    return 1;
  }

  while(request_len) {
    w = write(sock, request, request_len);
    if(w < 0) {
      perror("write to socket");
      return 1;
    }
    request += w;
    request_len -= w;
  }

  while((r = read(sock, buf, sizeof buf))) {
    if(r < 0) {
      perror("read from socket");
      return 1;
    }

    copied = 0;
    while(copied < r) {
      w = write(1, buf+copied, r-copied);
      if(w < 0) {
        perror("write to stdout");
        return 1;
      }
      copied += w;
    }

    if(doselect) {
      fd_set rfds;
      FD_ZERO(&rfds);
      FD_SET(sock, &rfds);
      select(sock+1, &rfds, 0, 0, 0);
    }
  }

  return 0;
}

-- 
Alan Curry

Re: linux-next: manual merge of the xen-tip tree with the block tree

2016-07-25 Thread Stephen Rothwell

Hi Boris,

On Mon, 25 Jul 2016 18:25:00 -0400 Boris Ostrovsky  
wrote:
>
> > Jeremy Fitzhardinge   
> 
> Jeremy is no longer involved with Xen. However,
> 
> Juergen Gross 
> 
> is also Linux Xen/x86 maintainer.

I have replaced Jeremy with Juergen.

-- 
Cheers,
Stephen Rothwell

linux-next: manual merge of the random tree with the kspp tree

2016-07-25 Thread Stephen Rothwell

Hi Theodore,

Today's linux-next merge of the random tree got a conflict in:

  drivers/char/random.c

between commit:

  8c6a68e9eaa5 ("latent_entropy: Mark functions with __latent_entropy")

from the kspp tree and commit:

  e192be9d9a30 ("random: replace non-blocking pool with a Chacha20-based CRNG")

from the random tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/char/random.c
index 6cca3ed45817,8d0af74f6569..
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@@ -442,10 -471,15 +471,15 @@@ struct entropy_store 
__u8 last_data[EXTRACT_SIZE];
  };
  
+ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
+  size_t nbytes, int min, int rsvd);
+ static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
+   size_t nbytes, int fips);
+ 
+ static void crng_reseed(struct crng_state *crng, struct entropy_store *r);
  static void push_to_pool(struct work_struct *work);
 -static __u32 input_pool_data[INPUT_POOL_WORDS];
 -static __u32 blocking_pool_data[OUTPUT_POOL_WORDS];
 +static __u32 input_pool_data[INPUT_POOL_WORDS] __latent_entropy;
 +static __u32 blocking_pool_data[OUTPUT_POOL_WORDS] __latent_entropy;
- static __u32 nonblocking_pool_data[OUTPUT_POOL_WORDS] __latent_entropy;
  
  static struct entropy_store input_pool = {
.poolinfo = &poolinfo_table[0],

Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-25 Thread alexmcwhirter

> Thanks for the detailed bug-report. I looked around the web to see if it
> was already reported or not. I found that this issue was reported before:
> [0], [1] and [2] by the same person (CC'ed). One difference is that the
> reporter had this issue with rsync on multiple SPARC systems. I ran a
> git grep on a 4.7.0-rc7+ (wt-2016-07-21-15-g97bd3b0). But it didn't find
> any patches directly referencing the commit. I'm not sure if this issue
> has been fixed by now or not. I would greatly appreciate any comment
> about this from the "people of netdev" (Al Viro? Alex Mcwhirter?).


I can confirm the issue I was having with this commit still exists on
sparc with the latest mainline kernel.

Re: [PATCH v3 02/11] mm: Hardened usercopy

2016-07-25 Thread Kees Cook

On Mon, Jul 25, 2016 at 7:03 PM, Michael Ellerman  wrote:
> Josh Poimboeuf  writes:
>
>> On Thu, Jul 21, 2016 at 11:34:25AM -0700, Kees Cook wrote:
>>> On Wed, Jul 20, 2016 at 11:52 PM, Michael Ellerman  
>>> wrote:
>>> > Kees Cook  writes:
>>> >
>>> >> diff --git a/mm/usercopy.c b/mm/usercopy.c
>>> >> new file mode 100644
>>> >> index ..e4bf4e7ccdf6
>>> >> --- /dev/null
>>> >> +++ b/mm/usercopy.c
>>> >> @@ -0,0 +1,234 @@
>>> > ...
>>> >> +
>>> >> +/*
>>> >> + * Checks if a given pointer and length is contained by the current
>>> >> + * stack frame (if possible).
>>> >> + *
>>> >> + *   0: not at all on the stack
>>> >> + *   1: fully within a valid stack frame
>>> >> + *   2: fully on the stack (when can't do frame-checking)
>>> >> + *   -1: error condition (invalid stack position or bad stack frame)
>>> >> + */
>>> >> +static noinline int check_stack_object(const void *obj, unsigned long 
>>> >> len)
>>> >> +{
>>> >> + const void * const stack = task_stack_page(current);
>>> >> + const void * const stackend = stack + THREAD_SIZE;
>>> >
>>> > That allows access to the entire stack, including the struct thread_info,
>>> > is that what we want - it seems dangerous? Or did I miss a check
>>> > somewhere else?
>>>
>>> That seems like a nice improvement to make, yeah.
>>>
>>> > We have end_of_stack() which computes the end of the stack taking
>>> > thread_info into account (end being the opposite of your end above).
>>>
>>> Amusingly, the object_is_on_stack() check in sched.h doesn't take
>>> thread_info into account either. :P Regardless, I think using
>>> end_of_stack() may not be best. To tighten the check, I think we could
>>> add this after checking that the object is on the stack:
>>>
>>> #ifdef CONFIG_STACK_GROWSUP
>>> stackend -= sizeof(struct thread_info);
>>> #else
>>> stack += sizeof(struct thread_info);
>>> #endif
>>>
>>> e.g. then if the pointer was in the thread_info, the second test would
>>> fail, triggering the protection.
>>
>> FWIW, this won't work right on x86 after Andy's
>> CONFIG_THREAD_INFO_IN_TASK patches get merged.
>
> Yeah. I wonder if it's better for the arch helper to just take the obj and
> len, and work out its own bounds for the stack using current and whatever
> makes sense on that arch.
>
> It would avoid too much ifdefery in the generic code, and also avoid any
> confusion about whether stackend is the high or low address.
>
> eg. on powerpc we could do:
>
> int noinline arch_within_stack_frames(const void *obj, unsigned long len)
> {
> void *stack_low  = end_of_stack(current);
> void *stack_high = task_stack_page(current) + THREAD_SIZE;
>
>
> Whereas arches with STACK_GROWSUP=y could do roughly the reverse, and x86
> can do whatever it needs to depending on whether the thread_info is on or
> off stack.
>
> cheers

Yeah, I agree: this should be in the arch code. If the arch can
actually do frame checking, the thread_info (if it exists on the
stack) would already be excluded. But it'd be a nice tightening of the
check.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security
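
The bounds test discussed in this thread can be modelled in userspace. This is a minimal sketch under stated assumptions, not the code from mm/usercopy.c: check_stack_range() and its enum are hypothetical, the return codes follow the comment quoted above, and the caller is assumed to pass a [low, high) region that the arch code has already narrowed to exclude thread_info.

```c
#include <assert.h>
#include <stdint.h>

/* Return codes mirror the comment quoted in the thread. */
enum { NOT_STACK = 0, GOOD_STACK = 2, BAD_STACK = -1 };

/*
 * Hypothetical helper: classify [obj, obj + len) against a stack region
 * [low, high) that the caller has already narrowed to exclude thread_info.
 */
static int check_stack_range(uintptr_t obj, unsigned long len,
			     uintptr_t low, uintptr_t high)
{
	assert(low <= high);

	/* A length that wraps the address space is always an error. */
	if (obj + len < obj)
		return BAD_STACK;

	/* Entirely below or entirely above the region: not on the stack. */
	if (obj + len <= low || obj >= high)
		return NOT_STACK;

	/* Overlaps the region but crosses one of its edges. */
	if (obj < low || obj + len > high)
		return BAD_STACK;

	return GOOD_STACK;
}
```

With the region narrowed this way, a pointer into thread_info falls outside [low, high) or straddles its edge, so it is reported as off the stack or as an error rather than as a valid stack object. Which of the two bounds moves depends on the architecture, which is the argument for computing them in arch code.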

Re: [PATCH 1/3] net: asix: Add in_pm parameter

2016-07-25 Thread David Miller


Please correct the problems Grant Grundler mentioned in all of these
patches, and resubmit this entire series freshly.

Also, please include a proper "[PATCH 0/3] ..." introduction posting
for the series which explains what this series is about, how it
implements what it is doing, and why it is doing things that way.

Thanks.

[PATCH] powerpc: sgy_cts1000: Fix gpio_halt_cb()'s signature

2016-07-25 Thread Andrey Smirnov

The halt callback in struct machdep_calls is declared with the __noreturn
attribute, so omitting that attribute in gpio_halt_cb()'s signature
results in a compilation error.

Change the signature to address the problem as well as change the code
of the function to avoid ever returning from the function.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/platforms/85xx/sgy_cts1000.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/sgy_cts1000.c 
b/arch/powerpc/platforms/85xx/sgy_cts1000.c
index 79fd0df..21d6aaa 100644
--- a/arch/powerpc/platforms/85xx/sgy_cts1000.c
+++ b/arch/powerpc/platforms/85xx/sgy_cts1000.c
@@ -38,18 +38,18 @@ static void gpio_halt_wfn(struct work_struct *work)
 }
 static DECLARE_WORK(gpio_halt_wq, gpio_halt_wfn);
 
-static void gpio_halt_cb(void)
+static void __noreturn gpio_halt_cb(void)
 {
enum of_gpio_flags flags;
int trigger, gpio;
 
if (!halt_node)
-   return;
+   panic("No reset GPIO information was provided in DT\n");
 
gpio = of_get_gpio_flags(halt_node, 0, &flags);
 
if (!gpio_is_valid(gpio))
-   return;
+   panic("Provided GPIO is invalid\n");
 
trigger = (flags == OF_GPIO_ACTIVE_LOW);
 
@@ -57,6 +57,8 @@ static void gpio_halt_cb(void)
 
/* Probably wont return */
gpio_set_value(gpio, trigger);
+
+   panic("Halt failed\n");
 }
 
 /* This IRQ means someone pressed the power button and it is waiting for us
-- 
2.5.5

Re: [RFC patch 1/6] random: Simplify API for random address requests

2016-07-25 Thread Kees Cook

On Mon, Jul 25, 2016 at 8:01 PM, Jason Cooper  wrote:
> To date, all callers of randomize_range() have set the length to 0, and
> check for a zero return value.  For the current callers, the only way
> to get zero returned is if end <= start.  Since they are all adding a
> constant to the start address, this is unnecessary.
>
> We can remove a bunch of needless checks by simplifying the API to do
> just what everyone wants, return an address between [start, start +
> range].
>
> While we're here, s/get_random_int/get_random_long/.  No current call
> site is adversely affected by get_random_int(), since all current range
> requests are < MAX_UINT.  However, we should match caller expectations
> to avoid coming up short (ha!) in the future.
>
> Signed-off-by: Jason Cooper 
> ---
>  drivers/char/random.c  | 17 -
>  include/linux/random.h |  2 +-
>  2 files changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 0158d3bff7e5..1251cb2cbab2 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1822,22 +1822,13 @@ unsigned long get_random_long(void)
>  EXPORT_SYMBOL(get_random_long);
>
>  /*
> - * randomize_range() returns a start address such that
> - *
> - *[..  .]
> - *  start  end
> - *
> - * a <range> with size "len" starting at the return value is inside in the
> - * area defined by [start, end], but is otherwise randomized.
> + * randomize_addr() returns a page aligned address within [start, start +
> + * range]
>   */
>  unsigned long
> -randomize_range(unsigned long start, unsigned long end, unsigned long len)
> +randomize_addr(unsigned long start, unsigned long range)

Also, this series isn't bisectable since randomize_range gets removed
here before the callers are updated. Perhaps add a macro that calls
randomize_addr with a BUG_ON for len != 0? (And then remove it in the
last patch?)

-Kees

>  {
> -   unsigned long range = end - len - start;
> -
> -   if (end <= start + len)
> -   return 0;
> -   return PAGE_ALIGN(get_random_int() % range + start);
> +   return PAGE_ALIGN(get_random_long() % range + start);
>  }
>
>  /* Interface for in-kernel drivers of true hardware RNGs.
> diff --git a/include/linux/random.h b/include/linux/random.h
> index e47e533742b5..1ad877a98186 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -34,7 +34,7 @@ extern const struct file_operations random_fops, 
> urandom_fops;
>
>  unsigned int get_random_int(void);
>  unsigned long get_random_long(void);
> -unsigned long randomize_range(unsigned long start, unsigned long end, 
> unsigned long len);
> +unsigned long randomize_addr(unsigned long start, unsigned long range);
>
>  u32 prandom_u32(void);
>  void prandom_bytes(void *buf, size_t nbytes);
> --
> 2.9.2
>



-- 
Kees Cook
Chrome OS & Brillo Security

Re: [kbuild-all] arch/xtensa/include/asm/initialize_mmu.h:55: Error: invalid register 'atomctl' for 'wsr' instruction

2016-07-25 Thread Fengguang Wu


Hi Max,

On Tue, Jul 26, 2016 at 02:20:25AM +0300, Max Filippov wrote:

Hi Fengguang,

On Fri, Jul 22, 2016 at 3:44 PM, Fengguang Wu  wrote:

On Fri, Jul 22, 2016 at 06:32:28PM +0800, kbuild test robot wrote:

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   47ef4ad2684d380dd6d596140fb79395115c3950
commit: 9da8320bb97768e35f2e64fa7642015271d672eb xtensa: add test_kc705_hifi variant
date:   4 months ago
config: xtensa-audio_kc705_defconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0



All errors (new ones prefixed by >>):

  arch/xtensa/include/asm/initialize_mmu.h: Assembler messages:
  arch/xtensa/include/asm/initialize_mmu.h:55: Error: invalid register 'atomctl' for 'wsr' instruction
--
  arch/xtensa/kernel/coprocessor.S: Assembler messages:
  arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format name 'rur.ae_ovf_sar'
  arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format name 'rur.ae_bithead'
  arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format name 'rur.ae_ts_fts_bu_bp'
  arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format name 'rur.ae_cw_sd_no'
  arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format name 'rur.ae_cbegin0'
  arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format name 'rur.ae_cend0'
  arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format name 'ae_s64.i'
  arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format name 'ae_s64.i'


Do they really matter? Or I can shut these errors up.


Looks like I haven't supplied you with the compiler for test_kc705_hifi, for
which these errors are reported. I've built it and put it here:

 
http://jcmvbkbc.spb.ru/~jcmvbkbc/tmp/201604261801/x86_64-gcc-5.3.0-nolibc-xtensa-test_kc705_hifi-elf.tar.xz

Please integrate it into your system along with other xtensa compilers.


OK, done. :)

Thanks,
Fengguang

Re: [PATCH] caif-hsi: Remove deprecated create_singlethread_workqueue

2016-07-25 Thread David Miller

From: Bhaktipriya Shridhar 
Date: Mon, 25 Jul 2016 18:40:57 +0530

> alloc_workqueue replaces deprecated create_singlethread_workqueue().
> 
> A dedicated workqueue has been used since the workitems are being used
> on a packet tx/rx path. Hence, WQ_MEM_RECLAIM has been set to guarantee
> forward progress under memory pressure.
> 
> An ordered workqueue has been used since workitems &cfhsi->wake_up_work
> and &cfhsi->wake_down_work cannot be run concurrently.
> 
> Calls to flush_workqueue() before destroy_workqueue() have been dropped
> since destroy_workqueue() itself calls drain_workqueue() which flushes
> repeatedly till the workqueue becomes empty.
> 
> Signed-off-by: Bhaktipriya Shridhar 

Applied.

Re: [RFC patch 1/6] random: Simplify API for random address requests

2016-07-25 Thread Kees Cook

On Mon, Jul 25, 2016 at 8:30 PM, Jason Cooper  wrote:
> All,
>
> On Tue, Jul 26, 2016 at 03:01:55AM +, Jason Cooper wrote:
>> To date, all callers of randomize_range() have set the length to 0, and
>> check for a zero return value.  For the current callers, the only way
>> to get zero returned is if end <= start.  Since they are all adding a
>> constant to the start address, this is unnecessary.
>>
>> We can remove a bunch of needless checks by simplifying the API to do
>> just what everyone wants, return an address between [start, start +
>> range].
>>
>> While we're here, s/get_random_int/get_random_long/.  No current call
>> site is adversely affected by get_random_int(), since all current range
>> requests are < MAX_UINT.  However, we should match caller expectations
>> to avoid coming up short (ha!) in the future.
>>
>> Signed-off-by: Jason Cooper 
>> ---
>>  drivers/char/random.c  | 17 -
>>  include/linux/random.h |  2 +-
>>  2 files changed, 5 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index 0158d3bff7e5..1251cb2cbab2 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -1822,22 +1822,13 @@ unsigned long get_random_long(void)
>>  EXPORT_SYMBOL(get_random_long);
>>
>>  /*
>> - * randomize_range() returns a start address such that
>> - *
>> - *[..  .]
>> - *  start  end
>> - *
>> - * a <range> with size "len" starting at the return value is inside in the
>> - * area defined by [start, end], but is otherwise randomized.
>> + * randomize_addr() returns a page aligned address within [start, start +
>> + * range]
>>   */
>>  unsigned long
>> -randomize_range(unsigned long start, unsigned long end, unsigned long len)
>> +randomize_addr(unsigned long start, unsigned long range)
>>  {
>> - unsigned long range = end - len - start;
>> -
>> - if (end <= start + len)
>> - return 0;
>> - return PAGE_ALIGN(get_random_int() % range + start);
>> + return PAGE_ALIGN(get_random_long() % range + start);
>>  }
>
> bah!  old patch file.  This should have been:
>
> if (range == 0)
> return start;
> else
> return PAGE_ALIGN(get_random_long() % range + start);

I think range should be limited to start + range < UINTMAX, and it
should be very clear whether the range is inclusive or exclusive. With
start = 0, range = 4096, does this mean 1 possible page, or 2?

-Kees

>
> sorry,
>
> Jason.
>
>>
>>  /* Interface for in-kernel drivers of true hardware RNGs.
>> diff --git a/include/linux/random.h b/include/linux/random.h
>> index e47e533742b5..1ad877a98186 100644
>> --- a/include/linux/random.h
>> +++ b/include/linux/random.h
>> @@ -34,7 +34,7 @@ extern const struct file_operations random_fops, 
>> urandom_fops;
>>
>>  unsigned int get_random_int(void);
>>  unsigned long get_random_long(void);
>> -unsigned long randomize_range(unsigned long start, unsigned long end, 
>> unsigned long len);
>> +unsigned long randomize_addr(unsigned long start, unsigned long range);
>>
>>  u32 prandom_u32(void);
>>  void prandom_bytes(void *buf, size_t nbytes);
>> --
>> 2.9.2
>>



-- 
Kees Cook
Chrome OS & Brillo Security
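
The inclusive-or-exclusive question can be checked by enumeration. Below is a minimal userspace sketch, assuming a 4KB page: addr_for() models the corrected function body quoted above with the random value passed in explicitly, and all names (addr_for, distinct_results, the SKETCH_ macros) are hypothetical.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/*
 * Model of the posted randomize_addr() body with the random source
 * replaced by an explicit value, so every outcome can be enumerated.
 */
static unsigned long addr_for(unsigned long rnd, unsigned long start,
			      unsigned long range)
{
	if (range == 0)
		return start;
	return SKETCH_PAGE_ALIGN(rnd % range + start);
}

/* Count the distinct page-aligned results reachable for (start, range). */
static unsigned long distinct_results(unsigned long start, unsigned long range)
{
	unsigned long rnd, count = 0, last = 0;

	assert(range > 0);
	for (rnd = 0; rnd < range; rnd++) {
		unsigned long addr = addr_for(rnd, start, range);

		/* Results are non-decreasing in rnd, so compare to the last. */
		if (count == 0 || addr != last) {
			count++;
			last = addr;
		}
	}
	return count;
}
```

Under this model, start = 0 and range = 4096 give two reachable addresses, 0 and 4096, because PAGE_ALIGN rounds up. The upper bound is therefore inclusive, and the lowest address is returned only when the remainder is exactly zero, so the distribution over pages is not uniform.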

Re: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-25 Thread David Miller

From: Dexuan Cui 
Date: Tue, 26 Jul 2016 03:09:16 +

> BTW, during the past month, at least 7 other people also reviewed
> the patch and gave me quite a few good comments, which have
> been addressed.

Correction: Several people gave coding style and simple corrections
to your patch.

Very few gave any review of the _SUBSTANCE_ of your changes.

And one of the few who did, and suggested you build your
facilities using the existing S390 hypervisor socket infrastructure,
you brushed off _IMMEDIATELY_.

That drives me crazy.  The one person who gave you real feedback
you basically didn't consider seriously at all.

I know why you don't want to consider alternative implementations,
and it's because you guys have so much invested in what you've
implemented already.

But that's tough and not our problem.

And until this changes, yes, this submission will be stuck in the
mud and continue slogging on like this.

Sorry.

Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-25 Thread Alan Curry

Christian Lamparter wrote:
> 
> As for carl9170: I'm not sure what the driver or firmware can do about
> this at this time. You can try to disable the hardware crypto by setting
> nohwcrypt via the module option. However, this might not do anything at all.

The nohwcrypt parameter didn't make any difference.

> > 
> > lsusb identifies my network device as:
> > 
> > Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link 
> > TL-WN821N v2 802.11n [Atheros AR9170]
> > 
> > I have version 1.9.9 of carl9170-1.fw in /lib/firmware
> Just one additional question: Is the TL-WN821N connected to a USB3 port?

It never has been before. I tried it today and it made no difference.

-- 
Alan Curry

[PATCH 1/2] powerpc: mpc85xx_mds: Select PHYLIB only if NETDEVICES is enabled

2016-07-25 Thread Andrey Smirnov

PHYLIB depends on NETDEVICES, so to avoid an unmet dependencies warning
from Kconfig it needs to be selected conditionally.

Also add checks for whether PHYLIB is built-in to avoid undefined
references to PHYLIB's symbols.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/platforms/85xx/Kconfig   | 2 +-
 arch/powerpc/platforms/85xx/mpc85xx_mds.c | 9 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index e626461..3da35bc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -72,7 +72,7 @@ config MPC85xx_CDS
 config MPC85xx_MDS
bool "Freescale MPC85xx MDS"
select DEFAULT_UIMAGE
-   select PHYLIB
+   select PHYLIB if NETDEVICES
select HAS_RAPIDIO
select SWIOTLB
help
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_mds.c 
b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
index dbcb467..71aff5e 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_mds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
@@ -63,6 +63,8 @@
 #define DBG(fmt...)
 #endif
 
+#if IS_BUILTIN(CONFIG_PHYLIB)
+
 #define MV88E_SCR  0x10
 #define MV88E_SCR_125CLK   0x0010
 static int mpc8568_fixup_125_clock(struct phy_device *phydev)
@@ -152,6 +154,8 @@ static int mpc8568_mds_phy_fixups(struct phy_device *phydev)
return err;
 }
 
+#endif
+
 /* 
  *
  * Setup the architecture
@@ -313,6 +317,7 @@ static void __init mpc85xx_mds_setup_arch(void)
swiotlb_detect_4g();
 }
 
+#if IS_BUILTIN(CONFIG_PHYLIB)
 
 static int __init board_fixups(void)
 {
@@ -342,9 +347,12 @@ static int __init board_fixups(void)
 
return 0;
 }
+
 machine_arch_initcall(mpc8568_mds, board_fixups);
 machine_arch_initcall(mpc8569_mds, board_fixups);
 
+#endif
+
 static int __init mpc85xx_publish_devices(void)
 {
if (machine_is(mpc8568_mds))
@@ -435,4 +443,3 @@ define_machine(p1021_mds) {
.pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
 #endif
 };
-
-- 
2.5.5
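
How a test such as IS_BUILTIN(CONFIG_PHYLIB) can work both in #if and in ordinary C expressions is easier to see in a small model. The sketch below is a simplified userspace imitation of the preprocessor trick in include/linux/kconfig.h, not a copy of it: the SK_ and sk_ names are hypothetical, and CONFIG_FOO and CONFIG_BAR are made-up options standing in for a built-in and a modular one.

```c
#include <assert.h>

/* Assumed configuration for illustration: FOO built in, BAR modular. */
#define CONFIG_FOO 1
#define CONFIG_BAR_MODULE 1

/*
 * If the option expands to 1, the pasted name becomes "0," and shifts a
 * literal 1 into the second argument slot; otherwise the pasted name is
 * junk that stays in the first slot and the second argument is 0.
 */
#define SK_ARG_PLACEHOLDER_1 0,
#define sk_take_second_arg(ignored, val, ...) val
#define sk_is_defined(x)              sk_is_defined_(x)
#define sk_is_defined_(val)           sk_is_defined__(SK_ARG_PLACEHOLDER_##val)
#define sk_is_defined__(arg1_or_junk) sk_take_second_arg(arg1_or_junk 1, 0, 0)

#define SK_IS_BUILTIN(option) sk_is_defined(option)
#define SK_IS_MODULE(option)  sk_is_defined(option##_MODULE)
#define SK_IS_ENABLED(option) (SK_IS_BUILTIN(option) || SK_IS_MODULE(option))

/* Compiled in only when the option is built in, as with the PHY fixups. */
static int fixups_registered(void)
{
#if SK_IS_BUILTIN(CONFIG_FOO)
	return 1;
#else
	return 0;
#endif
}

/* Sanity checks on the assumed configuration above. */
static void check_config_model(void)
{
	assert(SK_IS_BUILTIN(CONFIG_FOO));
	assert(!SK_IS_BUILTIN(CONFIG_BAR));
	assert(SK_IS_ENABLED(CONFIG_BAR));
}
```

The distinction matters for this patch because board code is always built into the kernel image: if PHYLIB were modular, its symbols would not be available at link time, so a built-in test is the safe guard for code that calls into it directly.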

[PATCH 2/2] powerpc: e8248e: Select PHYLIB only if NETDEVICES is enabled

2016-07-25 Thread Andrey Smirnov

Select PHYLIB only if NETDEVICES is enabled and MDIO_BITBANG only if
PHYLIB is present to avoid warnings from Kconfig.

To prevent undefined references during linking, register the MDIO driver
only if CONFIG_MDIO_BITBANG is enabled.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/platforms/82xx/Kconfig   | 4 ++--
 arch/powerpc/platforms/82xx/ep8248e.c | 4 +++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/82xx/Kconfig 
b/arch/powerpc/platforms/82xx/Kconfig
index 7c7df400..994d1a9 100644
--- a/arch/powerpc/platforms/82xx/Kconfig
+++ b/arch/powerpc/platforms/82xx/Kconfig
@@ -30,8 +30,8 @@ config EP8248E
select 8272
select 8260
select FSL_SOC
-   select PHYLIB
-   select MDIO_BITBANG
+   select PHYLIB if NETDEVICES
+   select MDIO_BITBANG if PHYLIB
help
  This enables support for the Embedded Planet EP8248E board.
 
diff --git a/arch/powerpc/platforms/82xx/ep8248e.c 
b/arch/powerpc/platforms/82xx/ep8248e.c
index cdab847..8fec050 100644
--- a/arch/powerpc/platforms/82xx/ep8248e.c
+++ b/arch/powerpc/platforms/82xx/ep8248e.c
@@ -298,7 +298,9 @@ static const struct of_device_id of_bus_ids[] __initconst = 
{
 static int __init declare_of_platform_devices(void)
 {
of_platform_bus_probe(NULL, of_bus_ids, NULL);
-   platform_driver_register(&ep8248e_mdio_driver);
+
+   if (IS_ENABLED(CONFIG_MDIO_BITBANG))
+   platform_driver_register(&ep8248e_mdio_driver);
 
return 0;
 }
-- 
2.5.5

[PATCH 2/3] powerpc: Call chained reset handlers during reset

2016-07-25 Thread Andrey Smirnov

Call out to all restart handlers that were added via
register_restart_handler() API when restarting the machine.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/kernel/setup-common.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 5cd3283..205d073 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -145,6 +145,10 @@ void machine_restart(char *cmd)
ppc_md.restart(cmd);
 
smp_send_stop();
+
+   do_kernel_restart(cmd);
+   mdelay(1000);
+
machine_hang();
 }
 
-- 
2.5.5
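
Note on the fallback added above: do_kernel_restart() walks the handlers
added via register_restart_handler(), highest priority first. The sketch
below is a userspace model of that idea only. All demo_* names are
hypothetical, and the handlers here simply return, whereas a successful
hardware reset would never return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a notifier block on the restart chain. */
struct demo_handler {
	int (*call)(struct demo_handler *self, const char *cmd);
	int priority;                 /* higher priority is tried first */
	struct demo_handler *next;
};

static struct demo_handler *demo_chain;  /* sorted by descending priority */
static int demo_order[8];                /* priorities in the order tried */
static int demo_ncalls;

/* Insert keeping the list sorted; equal priorities keep registration order. */
static void demo_register_restart_handler(struct demo_handler *h)
{
	struct demo_handler **pos = &demo_chain;

	assert(h && h->call);
	while (*pos && (*pos)->priority >= h->priority)
		pos = &(*pos)->next;
	h->next = *pos;
	*pos = h;
}

/* A handler that "fails": on real hardware a successful reset never returns. */
static int demo_try_reset(struct demo_handler *self, const char *cmd)
{
	(void)cmd;
	if (demo_ncalls < 8)
		demo_order[demo_ncalls++] = self->priority;
	return 0;
}

/* Walk the whole chain, as the machine_restart() fallback would. */
static int demo_do_kernel_restart(const char *cmd)
{
	int tried = 0;

	for (struct demo_handler *h = demo_chain; h; h = h->next) {
		h->call(h, cmd);
		tried++;
	}
	return tried;  /* the caller would now delay and hang */
}
```

Registering a priority-128 handler and then a priority-192 handler and
calling demo_do_kernel_restart() tries the 192 handler first, which mirrors
why the patch can leave ppc_md.restart in place and use the chain only as a
fallback.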

[PATCH 3/3] powerpc: Convert fsl_rstcr_restart to a reset handler

2016-07-25 Thread Andrey Smirnov

Convert fsl_rstcr_restart() into a handler that can be registered with the
register_restart_handler() API, and introduce fsl_rstcr_restart_register(),
which can be added as an initcall to perform that registration.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/platforms/85xx/bsc913x_qds.c |  2 +-
 arch/powerpc/platforms/85xx/bsc913x_rdb.c |  2 +-
 arch/powerpc/platforms/85xx/c293pcie.c|  2 +-
 arch/powerpc/platforms/85xx/corenet_generic.c |  2 +-
 arch/powerpc/platforms/85xx/ge_imp3a.c|  2 +-
 arch/powerpc/platforms/85xx/mpc8536_ds.c  |  2 +-
 arch/powerpc/platforms/85xx/mpc85xx_ads.c |  2 +-
 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 26 +++---
 arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  7 ---
 arch/powerpc/platforms/85xx/mpc85xx_mds.c |  7 ---
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 21 +++--
 arch/powerpc/platforms/85xx/mvme2500.c|  2 +-
 arch/powerpc/platforms/85xx/p1010rdb.c|  2 +-
 arch/powerpc/platforms/85xx/p1022_ds.c|  2 +-
 arch/powerpc/platforms/85xx/p1022_rdk.c   |  3 ++-
 arch/powerpc/platforms/85xx/p1023_rdb.c   |  2 +-
 arch/powerpc/platforms/85xx/ppa8548.c |  2 +-
 arch/powerpc/platforms/85xx/qemu_e500.c   |  2 +-
 arch/powerpc/platforms/85xx/sbc8548.c |  2 +-
 arch/powerpc/platforms/85xx/socrates.c|  2 +-
 arch/powerpc/platforms/85xx/stx_gp3.c |  2 +-
 arch/powerpc/platforms/85xx/tqm85xx.c |  2 +-
 arch/powerpc/platforms/85xx/twr_p102x.c   |  2 +-
 arch/powerpc/platforms/85xx/xes_mpc85xx.c |  7 ---
 arch/powerpc/platforms/86xx/gef_ppc9a.c   |  2 +-
 arch/powerpc/platforms/86xx/gef_sbc310.c  |  2 +-
 arch/powerpc/platforms/86xx/gef_sbc610.c  |  2 +-
 arch/powerpc/platforms/86xx/mpc8610_hpcd.c|  2 +-
 arch/powerpc/platforms/86xx/mpc86xx_hpcn.c|  2 +-
 arch/powerpc/platforms/86xx/sbc8641d.c|  2 +-
 arch/powerpc/sysdev/fsl_soc.c | 22 +-
 arch/powerpc/sysdev/fsl_soc.h |  2 +-
 32 files changed, 86 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/bsc913x_qds.c b/arch/powerpc/platforms/85xx/bsc913x_qds.c
index 07dd6ae..14ea7a0 100644
--- a/arch/powerpc/platforms/85xx/bsc913x_qds.c
+++ b/arch/powerpc/platforms/85xx/bsc913x_qds.c
@@ -53,6 +53,7 @@ static void __init bsc913x_qds_setup_arch(void)
 }
 
 machine_arch_initcall(bsc9132_qds, mpc85xx_common_publish_devices);
+machine_arch_initcall(bsc9132_qds, fsl_rstcr_restart_register);
 
 /*
  * Called very early, device-tree isn't unflattened
@@ -72,7 +73,6 @@ define_machine(bsc9132_qds) {
.pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
 #endif
.get_irq= mpic_get_irq,
-   .restart= fsl_rstcr_restart,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
 };
diff --git a/arch/powerpc/platforms/85xx/bsc913x_rdb.c b/arch/powerpc/platforms/85xx/bsc913x_rdb.c
index e48f671..cd4e717 100644
--- a/arch/powerpc/platforms/85xx/bsc913x_rdb.c
+++ b/arch/powerpc/platforms/85xx/bsc913x_rdb.c
@@ -43,6 +43,7 @@ static void __init bsc913x_rdb_setup_arch(void)
 }
 
 machine_device_initcall(bsc9131_rdb, mpc85xx_common_publish_devices);
+machine_arch_initcall(bsc9131_rdb, fsl_rstcr_restart_register);
 
 /*
  * Called very early, device-tree isn't unflattened
@@ -59,7 +60,6 @@ define_machine(bsc9131_rdb) {
.setup_arch = bsc913x_rdb_setup_arch,
.init_IRQ   = bsc913x_rdb_pic_init,
.get_irq= mpic_get_irq,
-   .restart= fsl_rstcr_restart,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
 };
diff --git a/arch/powerpc/platforms/85xx/c293pcie.c b/arch/powerpc/platforms/85xx/c293pcie.c
index 3b9e3f0..fbd63f9 100644
--- a/arch/powerpc/platforms/85xx/c293pcie.c
+++ b/arch/powerpc/platforms/85xx/c293pcie.c
@@ -48,6 +48,7 @@ static void __init c293_pcie_setup_arch(void)
 }
 
 machine_arch_initcall(c293_pcie, mpc85xx_common_publish_devices);
+machine_arch_initcall(c293_pcie, fsl_rstcr_restart_register);
 
 /*
  * Called very early, device-tree isn't unflattened
@@ -65,7 +66,6 @@ define_machine(c293_pcie) {
.setup_arch = c293_pcie_setup_arch,
.init_IRQ   = c293_pcie_pic_init,
.get_irq= mpic_get_irq,
-   .restart= fsl_rstcr_restart,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
 };
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
index 3a6a84f..297379b 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -225,7 +225,6 @@ define_machine(corenet_generic) {
 #else
.get_irq

[PATCH 1/3] powerpc: Factor out common code in setup-common.c

2016-07-25 Thread Andrey Smirnov

Factor out a small bit of common code in machine_restart(),
machine_power_off() and machine_halt().

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/kernel/setup-common.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 714b4ba..5cd3283 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -130,15 +130,22 @@ void machine_shutdown(void)
ppc_md.machine_shutdown();
 }
 
+static void machine_hang(void)
+{
+   pr_emerg("System Halted, OK to turn off power\n");
+   local_irq_disable();
+   while (1)
+   ;
+}
+
 void machine_restart(char *cmd)
 {
machine_shutdown();
if (ppc_md.restart)
ppc_md.restart(cmd);
+
smp_send_stop();
-   printk(KERN_EMERG "System Halted, OK to turn off power\n");
-   local_irq_disable();
-   while (1) ;
+   machine_hang();
 }
 
 void machine_power_off(void)
@@ -146,10 +153,9 @@ void machine_power_off(void)
machine_shutdown();
if (pm_power_off)
pm_power_off();
+
smp_send_stop();
-   printk(KERN_EMERG "System Halted, OK to turn off power\n");
-   local_irq_disable();
-   while (1) ;
+   machine_hang();
 }
 /* Used by the G5 thermal driver */
 EXPORT_SYMBOL_GPL(machine_power_off);
@@ -162,10 +168,9 @@ void machine_halt(void)
machine_shutdown();
if (ppc_md.halt)
ppc_md.halt();
+
smp_send_stop();
-   printk(KERN_EMERG "System Halted, OK to turn off power\n");
-   local_irq_disable();
-   while (1) ;
+   machine_hang();
 }
 
 
-- 
2.5.5

Re: [PATCH v2 02/10] userns: Add per user namespace sysctls.

2016-07-25 Thread Eric W. Biederman

David Miller  writes:

> From: ebied...@xmission.com (Eric W. Biederman)
> Date: Mon, 25 Jul 2016 19:44:50 -0500
>
>> User namespaces have enabled unprivileged users access to a lot more
>> data structures and so to catch programs that go crazy we need a lot
>> more limits.  I believe some of those limits make sense per namespace.
>> As it is easy in some cases to say any more than Y number of those
>> per namespace is excessive.   For example a limit of 1,000,000 ipv4
>> routes per network namespaces is a sanity check as there are
>> currently 621,649 ipv4 prefixes advertized in bgp.
>
> When we give a new namespace to unprivileged users, we honestly should
> make the sysctl settings we give to them become "limits".  They can
> further constrain the sysctl settings but may not raise them.

I won't disagree.  I was thinking in terms of global settings that
hold the limits for per-namespace counters, as we are talking about
sanity-check limits.

Perhaps we could get sophisticated and do something more but the simpler
we can make things and get the job done the better.

Eric
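
Note on the "limits" idea quoted above: one possible reading is that a child
namespace may lower a setting but never raise it above what its ancestors
allow. The sketch below is a userspace model under that assumption. The
struct and function names are hypothetical and this is not how the patch
series implements its counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical namespace node holding one sysctl-style limit. */
struct demo_ns {
	struct demo_ns *parent;   /* NULL for the initial namespace */
	long max_routes;          /* this namespace's own setting */
};

/* Effective limit: the smallest setting on the path up to the root. */
static long demo_effective_limit(const struct demo_ns *ns)
{
	long limit;

	assert(ns != NULL);
	limit = ns->max_routes;
	for (ns = ns->parent; ns; ns = ns->parent)
		if (ns->max_routes < limit)
			limit = ns->max_routes;
	return limit;
}

/* A write succeeds only if it stays within what the parent chain allows. */
static int demo_set_limit(struct demo_ns *ns, long value)
{
	if (value < 0)
		return -1;
	if (ns->parent && value > demo_effective_limit(ns->parent))
		return -1;   /* would raise the limit above the inherited one */
	ns->max_routes = value;
	return 0;
}
```

Computing the effective limit on lookup means that a later reduction in a
parent namespace takes effect in its children without rewriting their
stored values.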

[PATCH 3/3] mm/duet: framework code

2016-07-25 Thread George Amvrosiadis

The Duet framework code:

- bittree.c: red-black bitmap tree that keeps track of items of interest
- debug.c: functions used to print information used to debug Duet
- hash.c: implementation of the global hash table where page events are stored
  for all tasks
- hook.c: the function invoked by the page cache hooks when Duet is online
- init.c: routines used to bring Duet online or offline
- path.c: routines performing resolution of UUIDs to paths using d_path
- task.c: implementation of Duet task fd operations

Signed-off-by: George Amvrosiadis 
---
 init/Kconfig  |   2 +
 mm/Makefile   |   1 +
 mm/duet/Kconfig   |  31 +++
 mm/duet/Makefile  |   7 +
 mm/duet/bittree.c | 537 +
 mm/duet/common.h  | 211 
 mm/duet/debug.c   |  98 +
 mm/duet/hash.c| 315 +
 mm/duet/hook.c|  81 
 mm/duet/init.c| 172 
 mm/duet/path.c| 184 +
 mm/duet/syscall.h |  61 ++
 mm/duet/task.c| 584 ++
 13 files changed, 2284 insertions(+)
 create mode 100644 mm/duet/Kconfig
 create mode 100644 mm/duet/Makefile
 create mode 100644 mm/duet/bittree.c
 create mode 100644 mm/duet/common.h
 create mode 100644 mm/duet/debug.c
 create mode 100644 mm/duet/hash.c
 create mode 100644 mm/duet/hook.c
 create mode 100644 mm/duet/init.c
 create mode 100644 mm/duet/path.c
 create mode 100644 mm/duet/syscall.h
 create mode 100644 mm/duet/task.c

diff --git a/init/Kconfig b/init/Kconfig
index c02d897..6f94b5a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -294,6 +294,8 @@ config USELIB
  earlier, you may need to enable this syscall.  Current systems
  running glibc can safely disable this.
 
+source mm/duet/Kconfig
+
 config AUDIT
bool "Auditing support"
depends on NET
diff --git a/mm/Makefile b/mm/Makefile
index 78c6f7d..074c15f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -99,3 +99,4 @@ obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
+obj-$(CONFIG_DUET) += duet/
diff --git a/mm/duet/Kconfig b/mm/duet/Kconfig
new file mode 100644
index 000..2f3a0c5
--- /dev/null
+++ b/mm/duet/Kconfig
@@ -0,0 +1,31 @@
+config DUET
+   bool "Duet framework support"
+
+   help
+ Duet is a framework aiming to reduce the IO footprint of analytics
+ and maintenance work. By exposing page cache events to these tasks,
+ it allows them to adapt their data processing order, in order to
+ benefit from data available in the page cache. Duet's operation is
+ based on hooks into the page cache.
+
+ To compile support for Duet, say Y.
+
+config DUET_STATS
+   bool "Duet statistics collection"
+   depends on DUET
+   help
+ This option enables support for the collection of statistics on the
+ operation of Duet. It will print information about the data structures
+ used internally, and profiling information about the framework.
+
+ If unsure, say N.
+
+config DUET_DEBUG
+   bool "Duet debugging support"
+   depends on DUET
+   help
+ Enable runtime debugging support for the Duet framework. This may
+ enable additional and expensive checks with negative impact on
+ performance.
+
+ To compile debugging support for Duet, say Y. If unsure, say N.
diff --git a/mm/duet/Makefile b/mm/duet/Makefile
new file mode 100644
index 000..c0c9e11
--- /dev/null
+++ b/mm/duet/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the linux Duet framework.
+#
+
+obj-$(CONFIG_DUET) += duet.o
+
+duet-y := init.o hash.o hook.o task.o bittree.o path.o debug.o
diff --git a/mm/duet/bittree.c b/mm/duet/bittree.c
new file mode 100644
index 000..3b20c35
--- /dev/null
+++ b/mm/duet/bittree.c
@@ -0,0 +1,537 @@
+/*
+ * Copyright (C) 2016 George Amvrosiadis.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include "common.h"
+
+#define BMAP_READ  0x01/* Read bmaps (overrides other flags) */
+#define BMAP_CHECK 0x02/* Check given bmap value expression */
+   /* Sets bmaps to match expression if not set */
+
+/* Bmap expressions can be formed using the following flags: */
+#define BMAP_DONE_SET  0x04/* Set done bmap values */
+#define BMAP_DONE_RST  0x08/* Reset done bmap values */
+#define BMAP_RELV_SET  0x10/* Set re

[PATCH 1/3] mm: support for duet hooks

2016-07-25 Thread George Amvrosiadis

Adds the Duet hooks in the page cache. In filemap.c, two hooks are added at the
time of addition and removal of a page descriptor. In page-flags.h, two more
hooks are added to track page dirtying and flushing.

The hooks are inactive while Duet is offline.

Signed-off-by: George Amvrosiadis 
---
 include/linux/duet.h   | 43 +
 include/linux/page-flags.h | 53 ++
 mm/filemap.c   | 11 ++
 3 files changed, 107 insertions(+)
 create mode 100644 include/linux/duet.h

diff --git a/include/linux/duet.h b/include/linux/duet.h
new file mode 100644
index 000..80491e2
--- /dev/null
+++ b/include/linux/duet.h
@@ -0,0 +1,43 @@
+/*
+ * Defs necessary for Duet hooks
+ *
+ * Author: George Amvrosiadis 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#ifndef _DUET_H
+#define _DUET_H
+
+/*
+ * Duet hooks into the page cache to monitor four types of events:
+ *   ADDED:a page __descriptor__ was inserted into the page cache
+ *   REMOVED:  a page __descriptor__ was removed from the page cache
+ *   DIRTY:page's dirty bit was set
+ *   FLUSHED:  page's dirty bit was cleared
+ */
+#define DUET_PAGE_ADDED0x0001
+#define DUET_PAGE_REMOVED  0x0002
+#define DUET_PAGE_DIRTY0x0004
+#define DUET_PAGE_FLUSHED  0x0008
+
+#define DUET_HOOK(funp, evt, data) \
+   do { \
+   rcu_read_lock(); \
+   funp = rcu_dereference(duet_hook_fp); \
+   if (funp) \
+   funp(evt, (void *)data); \
+   rcu_read_unlock(); \
+   } while (0)
+
+/* Hook function pointer initialized by the Duet framework */
+typedef void (duet_hook_t) (__u16, void *);
+extern duet_hook_t *duet_hook_fp;
+
+#endif /* _DUET_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a3244..53be4a0 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -12,6 +12,9 @@
 #include 
 #include 
 #endif /* !__GENERATING_BOUNDS_H */
+#ifdef CONFIG_DUET
+#include <linux/duet.h>
+#endif /* CONFIG_DUET */
 
 /*
  * Various page->flags bits:
@@ -254,8 +257,58 @@ PAGEFLAG(Error, error, PF_NO_COMPOUND) 
TESTCLEARFLAG(Error, error, PF_NO_COMPOUN
 PAGEFLAG(Referenced, referenced, PF_HEAD)
TESTCLEARFLAG(Referenced, referenced, PF_HEAD)
__SETPAGEFLAG(Referenced, referenced, PF_HEAD)
+#ifdef CONFIG_DUET
+TESTPAGEFLAG(Dirty, dirty, PF_HEAD)
+
+static inline void SetPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (!test_and_set_bit(PG_dirty, &page->flags))
+   DUET_HOOK(dhfp, DUET_PAGE_DIRTY, page);
+}
+
+static inline void __ClearPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (__test_and_clear_bit(PG_dirty, &page->flags))
+   DUET_HOOK(dhfp, DUET_PAGE_FLUSHED, page);
+}
+
+static inline void ClearPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (test_and_clear_bit(PG_dirty, &page->flags))
+   DUET_HOOK(dhfp, DUET_PAGE_FLUSHED, page);
+}
+
+static inline int TestSetPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (!test_and_set_bit(PG_dirty, &page->flags)) {
+   DUET_HOOK(dhfp, DUET_PAGE_DIRTY, page);
+   return 0;
+   }
+   return 1;
+}
+
+static inline int TestClearPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (test_and_clear_bit(PG_dirty, &page->flags)) {
+   DUET_HOOK(dhfp, DUET_PAGE_FLUSHED, page);
+   return 1;
+   }
+   return 0;
+}
+#else
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
+#endif /* CONFIG_DUET */
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
TESTCLEARFLAG(Active, active, PF_HEAD)
diff --git a/mm/filemap.c b/mm/filemap.c
index 20f3b1f..f06ebc0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -166,6 +166,11 @@ static void page_cache_tree_delete(struct address_space *mapping,
 void __delete_from_page_cache(struct page *page, void *shadow)
 {
struct address_space *mapping = page->mapping;
+#ifdef CONFIG_DUET
+   duet_hook_t *dhfp = NULL;
+
+   DUET_HOOK(dhfp, DUET_PAGE_REMOVED, page);
+#endif /* CONFIG_DUET */
 
trace_mm_filemap_delete_from_page_cache(page);
/*
@@ -628,6 +633,9 @@ static int __add_to_page_cache_locked(struct page *page,
int huge = PageHuge(page);
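
Note on the hook pattern in the patch above: DUET_HOOK() loads a function
pointer that is NULL while Duet is offline and calls it only when set, and
the dirty-bit wrappers fire the hook only when the bit actually changes.
The sketch below models both ideas in userspace with C11 atomics. It is an
assumption-laden analogue: it does not model RCU grace periods, the flag
update is not atomic the way test_and_set_bit() is, and all demo_* names
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DUET_PAGE_DIRTY   0x0004  /* values from include/linux/duet.h above */
#define DUET_PAGE_FLUSHED 0x0008
#define DEMO_PG_DIRTY     0x1u    /* hypothetical dirty bit in a flags word */

typedef void (demo_hook_t)(unsigned short evt, void *data);

/* NULL while the framework is "offline"; set when it comes online. */
static _Atomic(demo_hook_t *) demo_hook_fp;
static int demo_events;  /* number of events delivered to the hook */

/* Hook body used while online: just count what arrives. */
static void demo_count(unsigned short evt, void *data)
{
	(void)evt;
	(void)data;
	demo_events++;
}

/* Publish or retract the hook pointer. */
static void demo_set_online(int on)
{
	atomic_store_explicit(&demo_hook_fp,
			      on ? demo_count : (demo_hook_t *)0,
			      memory_order_release);
}

/* Counterpart of DUET_HOOK(): call the hook only if one is installed. */
static void demo_hook(unsigned short evt, void *data)
{
	demo_hook_t *fn = atomic_load_explicit(&demo_hook_fp,
					       memory_order_acquire);

	if (fn)
		fn(evt, data);
}

/* Like SetPageDirty() in the patch: report only the 0 -> 1 transition. */
static void demo_set_dirty(unsigned *flags)
{
	unsigned old;

	assert(flags != NULL);
	old = *flags;
	*flags |= DEMO_PG_DIRTY;
	if (!(old & DEMO_PG_DIRTY))
		demo_hook(DUET_PAGE_DIRTY, flags);
}

/* Like ClearPageDirty() in the patch: report only the 1 -> 0 transition. */
static void demo_clear_dirty(unsigned *flags)
{
	unsigned old;

	assert(flags != NULL);
	old = *flags;
	*flags &= ~DEMO_PG_DIRTY;
	if (old & DEMO_PG_DIRTY)
		demo_hook(DUET_PAGE_FLUSHED, flags);
}
```

While offline the only cost on the hot path is one pointer load and a
branch, which matches the commit message's statement that the hooks are
inactive while Duet is offline.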

[PATCH 2/3] mm/duet: syscall wiring

2016-07-25 Thread George Amvrosiadis

Usual syscall wiring for the four Duet syscalls.

Signed-off-by: George Amvrosiadis 
---
 arch/x86/entry/syscalls/syscall_32.tbl |  4 
 arch/x86/entry/syscalls/syscall_64.tbl |  4 
 include/linux/syscalls.h   |  8 
 include/uapi/asm-generic/unistd.h  | 12 +++-
 kernel/sys_ni.c|  6 ++
 5 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 4cddd17..f34ff94 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -386,3 +386,7 @@
 377	i386	copy_file_range		sys_copy_file_range
 378	i386	preadv2			sys_preadv2			compat_sys_preadv2
 379	i386	pwritev2		sys_pwritev2			compat_sys_pwritev2
+380	i386	duet_status		sys_duet_status
+381	i386	duet_init		sys_duet_init
+382	i386	duet_bmap		sys_duet_bmap
+383	i386	duet_get_path		sys_duet_get_path
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 555263e..d04efaa 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -335,6 +335,10 @@
 326	common	copy_file_range		sys_copy_file_range
 327	64	preadv2			sys_preadv2
 328	64	pwritev2		sys_pwritev2
+329	common	duet_status		sys_duet_status
+330	common	duet_init		sys_duet_init
+331	common	duet_bmap		sys_duet_bmap
+332	common	duet_get_path		sys_duet_get_path
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index d022390..da1049e 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -65,6 +65,8 @@ struct old_linux_dirent;
 struct perf_event_attr;
 struct file_handle;
 struct sigaltstack;
+struct duet_status_args;
+struct duet_uuid_arg;
 union bpf_attr;
 
 #include 
@@ -898,4 +900,10 @@ asmlinkage long sys_copy_file_range(int fd_in, loff_t __user *off_in,
 
 asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
 
+asmlinkage long sys_duet_status(u16 flags, struct duet_status_args __user *arg);
+asmlinkage long sys_duet_init(const char __user *taskname, u32 regmask,
+ const char __user *pathname);
+asmlinkage long sys_duet_bmap(u16 flags, struct duet_uuid_arg __user *arg);
+asmlinkage long sys_duet_get_path(struct duet_uuid_arg __user *uarg,
+ char __user *pathbuf, int pathbufsize);
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a26415b..7c287c0 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -725,8 +725,18 @@ __SC_COMP(__NR_preadv2, sys_preadv2, compat_sys_preadv2)
 #define __NR_pwritev2 287
 __SC_COMP(__NR_pwritev2, sys_pwritev2, compat_sys_pwritev2)
 
+/* mm/duet/syscall.c */
+#define __NR_duet_status 288
+__SYSCALL(__NR_duet_status, sys_duet_status)
+#define __NR_duet_init 289
+__SYSCALL(__NR_duet_init, sys_duet_init)
+#define __NR_duet_bmap 290
+__SYSCALL(__NR_duet_bmap, sys_duet_bmap)
+#define __NR_duet_get_path 291
+__SYSCALL(__NR_duet_get_path, sys_duet_get_path)
+
 #undef __NR_syscalls
-#define __NR_syscalls 288
+#define __NR_syscalls 292
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 2c5e3a8..3d4c53a 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -176,6 +176,12 @@ cond_syscall(sys_capget);
 cond_syscall(sys_capset);
 cond_syscall(sys_copy_file_range);
 
+/* Duet syscall entries */
+cond_syscall(sys_duet_status);
+cond_syscall(sys_duet_init);
+cond_syscall(sys_duet_bmap);
+cond_syscall(sys_duet_get_path);
+
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);
 cond_syscall(sys_pciconfig_write);
-- 
2.7.4

[PATCH 0/3] new feature: monitoring page cache events

2016-07-25 Thread George Amvrosiadis

I'm attaching a patch set implementing a mechanism we call Duet, which allows
applications to monitor events at the page cache level: page additions,
removals, dirtying, and flushing. Using such events, applications can identify
and prioritize processing of cached data, thereby reducing their I/O footprint.

One class of users of these events is maintenance tasks that scan large amounts
of data (e.g., backup, defrag, scrubbing). Knowing what is currently cached
allows them to piggyback on each other and on other applications running in the
system. I've managed to run up to 3 such applications together (backup,
scrubbing, defrag) and have them finish their work with one third of the I/O by
using Duet. In this case, the task that traversed the data the fastest (the
scrubber) allowed the rest of the tasks to piggyback on the data it brought
into the cache. For example, a file that was read to be backed up was also
picked up by the scrubber and defrag process.

I've found adapting applications to be straightforward. Although I don't
include examples in this patch set, I've adapted btrfs scrubbing, btrfs send
(backup), btrfs defrag, rsync, and f2fs garbage collection in a few hundred
lines of code each (basically I just had to add an event handler and wire it up
to the task's processing loop). You can read more about this in our full paper:
http://dl.acm.org/citation.cfm?id=2815424. I'd be happy to generate subsequent
patch sets for individual tasks if there's interest in this one. We've also
used Duet to speed up Hadoop and Spark by up to 54%, depending on the overlap
in the data processed, by taking the cache residency of HDFS blocks across the
cluster into account when scheduling tasks:
https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/deslauriers


Syscall interface (and how it works): Duet uses hooks into the page cache (see
the "mm: support for duet hooks" patch). These hooks inform Duet of page events,
which are stored in a hash table. Only events that are of interest to running
tasks are stored, and only one copy of each event is stored for all interested
tasks. To register for events, the following syscalls are used (see the
"mm/duet: syscall wiring" patch for prototypes):

- sys_duet_init(char *taskname, u32 regmask, char *path): returns an fd that
  watches for events that occur under PATH (e.g. '/home') and that match the
  REGMASK (e.g. DUET_PAGE_ADDED | DUET_PAGE_REMOVED). TASKNAME is an optional,
  human-readable name for the task.

- sys_duet_bmap(u16 flags, struct duet_uuid_arg *uuid): Duet allows applications
  to track processed items on an internal bitmap (which improves performance by
  being used to filter unnecessary events). The specified UUID is what read()
  returns on the fd created with sys_duet_init(), and uniquely identifies a
  file. FLAGS allow the bitmap to be set, reset, or have its state checked.

- sys_duet_get_path(struct duet_uuid_arg *uuid, char *buf, int bufsize):
  Applications running with Duet work with pathnames, not UUIDs. This syscall
  traverses the dentry cache and returns the path for the given UUID in BUF.

- sys_duet_status(u16 flags, struct duet_status_args *arg): Currently, the Duet
  framework can be turned on/off manually. This allows the admin to specify the
  maximum number of applications that will be registered concurrently, which allows
  us to size the internal hash table nodes appropriately (and limit performance
  or memory overhead). The syscall is also used for debugging purposes. I think
  this functionality should probably be exposed through ioctl()s to a device,
  and I'm open to suggestions on how to improve the current implementation.
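
The filtering described in the list above (events are kept only if they
match a task's REGMASK, and the per-task bitmap suppresses events for items
already processed) can be sketched in userspace as follows. This is an
illustration under assumptions, not the framework code: items are small
integer ids here rather than Duet UUIDs, the bitmap is a flat array rather
than the red-black bitmap tree, and all demo_* names are hypothetical. The
DUET_PAGE_* values are the ones from include/linux/duet.h in patch 1/3.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Event bits as defined in include/linux/duet.h (patch 1/3). */
#define DUET_PAGE_ADDED   0x0001
#define DUET_PAGE_REMOVED 0x0002
#define DUET_PAGE_DIRTY   0x0004
#define DUET_PAGE_FLUSHED 0x0008

#define DEMO_NITEMS        256  /* hypothetical number of trackable items */
#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NWORDS \
	((DEMO_NITEMS + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

struct demo_task {
	uint32_t regmask;                 /* events the task registered for */
	unsigned long done[DEMO_NWORDS];  /* one bit per processed item */
};

/* Record that the task has finished with this item. */
static void demo_mark_done(struct demo_task *t, unsigned item)
{
	assert(item < DEMO_NITEMS);
	t->done[item / DEMO_BITS_PER_WORD] |=
		1UL << (item % DEMO_BITS_PER_WORD);
}

/* Return 1 if the item was marked done, 0 otherwise. */
static int demo_is_done(const struct demo_task *t, unsigned item)
{
	assert(item < DEMO_NITEMS);
	return (t->done[item / DEMO_BITS_PER_WORD] >>
		(item % DEMO_BITS_PER_WORD)) & 1UL;
}

/* Keep an event only if it is registered and the item is not done yet. */
static int demo_wants_event(const struct demo_task *t, uint16_t evt,
			    unsigned item)
{
	if (!(t->regmask & evt))
		return 0;
	return !demo_is_done(t, item);
}
```

A task registered for DUET_PAGE_ADDED | DUET_PAGE_REMOVED would therefore
ignore dirtying events entirely, and would stop receiving events for an
item once it has marked that item done.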

The framework itself (a bit less than 2300 LoC) is currently placed under
mm/duet and the code is included in the "mm/duet: framework code" patch.


Application interface: Applications interface with Duet through a user library,
which is available at https://github.com/gamvrosi/duet-tools. In the same repo,
I have included a dummy_task application which provides an example of how Duet
can be used.


Changelog: The patches are based on Linus' v4.7 tag, and touch on the following
parts of the kernel:

- mm/filemap.c and include/linux/page-flags.h: hooks in the page cache to track
  page events on page addition, removal, dirtying, and flushing.

- arch/x86/*, include/linux/syscalls.h, kernel/sys_ni.c: wiring the 4 syscalls

- mm/duet/*: framework code



George Amvrosiadis (3):
  mm: support for duet hooks
  mm/duet: syscall wiring
  mm/duet: framework code

 arch/x86/entry/syscalls/syscall_32.tbl |   4 +
 arch/x86/entry/syscalls/syscall_64.tbl |   4 +
 include/linux/duet.h   |  43 +++
 include/linux/page-flags.h |  53 +++
 include/linux/syscalls.h   |   8 +
 include/uapi/asm-generic/unistd.h  |  12 +-
 init/Kconfig   |   2 +
 kernel/sys_ni.c|   6 +
 mm/Makefile|   1 +
 mm/duet/Kconfig

Re: [PATCH v2 3/3] x86/apic: Improved the setting of interrupt mode for bsp

2016-07-25 Thread Eric W. Biederman

Wei Jiangang  writes:

> If we specify the 'notsc' parameter for the dump-capture kernel,
> and then trigger a crash(panic) by using "ALT-SysRq-c" or
> "echo c > /proc/sysrq-trigger", the dump-capture kernel will
> hang in calibrate_delay_converge() and wait for jiffies changes.
> serial log as follows:
>
> tsc: Fast TSC calibration using PIT
> tsc: Detected 2099.947 MHz processor
> Calibrating delay loop...
>
> The reason for jiffies not changes is there's no timer interrupt
> passed to dump-capture kernel.
>
> In fact, once kernel panic occurs, the local APIC is disabled
> by lapic_shutdown() in reboot path.
> generly speaking, local APIC state can be initialized by BIOS
> after Power-Up or Reset, which doesn't apply to kdump case.
> so the kernel has to be responsible for initialize the interrupt
> mode properly according the latest status of APIC in bootup path.
>
> An MP operating system is booted under either PIC mode or
> virtual wire mode. Later, the operating system switches to
> symmetric I/O mode as it enters multiprocessor mode.
> Two kinds of virtual wire mode are defined in Intel MP spec:
> virtual wire mode via local APIC or via I/O APIC.
>
> Now we determine the mode of APIC only through a SMP BIOS(MP table).
> That's not enough. It's better to do further check if APIC works
> with effective interrupt mode, and then, do some proper setting.

Reading through the code let me pause a moment and say:
"Yowzers the interrupt initialization code has gotten hard to follow.  It
is now full of indirection with ill defined semantics."  pre_vector_init
indeed.

I will argue this is the wrong fix.

We really should not have to worry about getting the system functional
in virtual wire mode on a modern system.  And looking at the code
someone has done half the work and made it conditional under
acpi_gbl_reduced_hardware.

Now reduced hardware implies a bit more than we are talking about but
if there is ACPI apic information we should not need to worry about
external interrupts and can just enable the apics.

In fact I think having MPtable information is enough for that.

So I think what needs to happen is for the apic initialization to get
an overhaul that makes apic initialization the happy path and the other
irq controllers the odd backwards compatibility path.  And when we
are done we never run in anything except full apic mode unless the
hardware doesn't support it.

I think that will leave things more robust as we don't need to setup
and then reset up the interrupts during boot.

Eric


> Signed-off-by: Cao jin 
> Signed-off-by: Wei Jiangang 
> ---
>  arch/x86/include/asm/io_apic.h |  5 
>  arch/x86/kernel/apic/apic.c| 60 
> +-
>  arch/x86/kernel/apic/io_apic.c | 28 
>  3 files changed, 92 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index 6cbf2cfb3f8a..a3257366bf7f 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -190,6 +190,7 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
>  }
>  
>  extern void setup_IO_APIC(void);
> +extern bool virt_wire_through_ioapic(void);
>  extern void enable_IO_APIC(void);
>  extern void disable_IO_APIC(void);
>  extern void setup_ioapic_dest(void);
> @@ -231,6 +232,10 @@ static inline void io_apic_init_mappings(void) { }
>  #define native_disable_io_apic   NULL
>  
>  static inline void setup_IO_APIC(void) { }
> +static inline bool virt_wire_through_ioapic(void)
> +{
> + return false;
> +}
>  static inline void enable_IO_APIC(void) { }
>  static inline void setup_ioapic_dest(void) { }
>  
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 8e25b9b2d351..a3939fb130cc 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1124,6 +1124,58 @@ void __init sync_Arb_IDs(void)
>  }
>  
>  /*
> + * Check APIC enable/disable flag
> + */
> +static bool check_apic_enabled(void)
> +{
> + unsigned int value;
> +
> + /*
> +  * If APIC is disabled globally (IA32_APIC_BASE[11] == 0)
> +  * the boot cpu hasn't X86_FEATURE_APIC,
> +  * and init_bsp_APIC() has already checked it before.
> +  * so no need to check global enable/disable flag here
> +  */
> +
> + /* Check the software enable/disable flag */
> + value = apic_read(APIC_SPIV);
> + if (!(value & APIC_SPIV_APIC_ENABLED))
> + return false;
> +
> + return true;
> +}
> +
> +/*
> + * Return false means the through-local-APIC virtual wire mode is inactive
> + */
> +static bool virt_wire_through_lapic(void)
> +{
> + unsigned int value;
> +
> + /*
> +  * The through-local-APIC virtual wire mode requests
> +  * local APIC to enable LINT0 for ExtINT delivery mode
> +  * and LINT1 for NMI delivery mode
> +  */
> + value = apic_read(APIC_LVT0);
> + if (GET_APIC_

linux-next: manual merge of the xen-tip tree with the tip tree

2016-07-25 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the xen-tip tree got a conflict in:

  arch/x86/xen/smp.c

between commit:

  4c9075835511 ("xen/x86: Move irq allocation from Xen smp_op.cpu_up()")

from the tip tree and commit:

  ad5475f9faf5 ("x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op")

from the xen-tip tree.

I fixed it up (I think - see below) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/xen/smp.c
index 09d5cc062dbe,0b4d04c8ab4d..
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@@ -486,7 -495,11 +493,7 @@@ static int xen_cpu_up(unsigned int cpu
  
xen_pmu_init(cpu);
  
-   rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
 -  rc = xen_smp_intr_init(cpu);
 -  if (rc)
 -  return rc;
 -
+   rc = HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL);
BUG_ON(rc);
  
while (cpu_report_state(cpu) != CPU_ONLINE)

linux-next: manual merge of the xen-tip tree with the tip tree

2016-07-25 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the xen-tip tree got a conflict in:

  arch/x86/xen/enlighten.c

between commit:

  4c9075835511 ("xen/x86: Move irq allocation from Xen smp_op.cpu_up()")

from the tip tree and commit:

  88e957d6e47f ("xen: introduce xen_vcpu_id mapping")

from the xen-tip tree.

I fixed it up (I think - see below) and can carry the fix as necessary.
This is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/xen/enlighten.c
index dc96f939af88,85ef4c0442e0..
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@@ -1803,49 -1823,21 +1824,53 @@@ static void __init init_hvm_pv_info(voi
xen_domain_type = XEN_HVM_DOMAIN;
  }
  
 -static int xen_hvm_cpu_notify(struct notifier_block *self, unsigned long action,
 -void *hcpu)
 +static int xen_cpu_notify(struct notifier_block *self, unsigned long action,
 +void *hcpu)
  {
int cpu = (long)hcpu;
 +  int rc;
 +
switch (action) {
case CPU_UP_PREPARE:
 -  if (cpu_acpi_id(cpu) != U32_MAX)
 -  per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);
 -  else
 -  per_cpu(xen_vcpu_id, cpu) = cpu;
 -  xen_vcpu_setup(cpu);
 -  if (xen_have_vector_callback) {
 -  if (xen_feature(XENFEAT_hvm_safe_pvclock))
 -  xen_setup_timer(cpu);
 +  if (xen_hvm_domain()) {
 +  /*
 +   * This can happen if CPU was offlined earlier and
 +   * offlining timed out in common_cpu_die().
 +   */
 +  if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
 +  xen_smp_intr_free(cpu);
 +  xen_uninit_lock_cpu(cpu);
 +  }
 +
++  if (cpu_acpi_id(cpu) != U32_MAX)
++  per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);
++  else
++  per_cpu(xen_vcpu_id, cpu) = cpu;
 +  xen_vcpu_setup(cpu);
}
 +
 +  if (xen_pv_domain() ||
 +  (xen_have_vector_callback &&
 +   xen_feature(XENFEAT_hvm_safe_pvclock)))
 +  xen_setup_timer(cpu);
 +
 +  rc = xen_smp_intr_init(cpu);
 +  if (rc) {
 +  WARN(1, "xen_smp_intr_init() for CPU %d failed: %d\n",
 +   cpu, rc);
 +  return NOTIFY_BAD;
 +  }
 +
 +  break;
 +  case CPU_ONLINE:
 +  xen_init_lock_cpu(cpu);
 +  break;
 +  case CPU_UP_CANCELED:
 +  xen_smp_intr_free(cpu);
 +  if (xen_pv_domain() ||
 +  (xen_have_vector_callback &&
 +   xen_feature(XENFEAT_hvm_safe_pvclock)))
 +  xen_teardown_timer(cpu);
break;
default:
break;

Re: [PATCH v9 0/7] Make cpuid <-> nodeid mapping persistent

2016-07-25 Thread Dou Liyang




On 2016-07-26 07:20, Andrew Morton wrote:

On Mon, 25 Jul 2016 16:35:42 +0800 Dou Liyang  wrote:


[Problem]

cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.
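
The stale-cache problem described above can be reproduced in a few lines of
user-space C. This is only a sketch under stated assumptions: the arrays and
the sketch_* helpers are hypothetical stand-ins, not kernel code, and
cached_node merely plays the role of the boot-time snapshot that
wq_numa_possible_cpumask holds.

```c
#include <string.h>

#define SKETCH_NR_CPUS 120
#define SKETCH_NO_NODE (-1)

/* Current cpuid -> nodeid mapping (hypothetical stand-in). */
static int live_node[SKETCH_NR_CPUS];
/* Snapshot taken once at boot and never refreshed afterwards. */
static int cached_node[SKETCH_NR_CPUS];

static void sketch_map_range(int first, int last, int node)
{
	for (int cpu = first; cpu <= last; cpu++)
		live_node[cpu] = node;
}

/* Boot-time layout from the first table, then take the snapshot. */
static void sketch_boot(void)
{
	sketch_map_range(0, 14, 0);
	sketch_map_range(60, 74, 0);
	sketch_map_range(15, 29, 1);
	sketch_map_range(75, 89, 1);
	sketch_map_range(30, 44, 2);
	sketch_map_range(90, 104, 2);
	sketch_map_range(45, 59, 3);
	sketch_map_range(105, 119, 3);
	memcpy(cached_node, live_node, sizeof(cached_node));
}

/* Hot-remove node2/node3, then hot-add node4/node5 (third table). */
static void sketch_hotplug(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (live_node[cpu] == 2 || live_node[cpu] == 3)
			live_node[cpu] = SKETCH_NO_NODE;

	sketch_map_range(30, 59, 4);
	sketch_map_range(90, 119, 5);
}
```

After sketch_boot() followed by sketch_hotplug(), the live table places cpu30
on node 4 while the snapshot still says node 2, which is the mismatch the
problem statement describes.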


Plan B is to hunt down and fix up all the workqueue structures at
hotplug-time.  Has that option been evaluated?



Yes, the option has been evaluated in this patch:
http://www.gossamer-threads.com/lists/linux/kernel/2116748



Your fix is x86-only and this bug presumably affects other
architectures, yes?  I think a "Plan B" would fix all architectures?



Yes, the bug presumably affects a few other architectures which support CPU
hotplug and NUMA.


We posted "Plan B" to the community and got a lot of advice and
ideas. Based on these suggestions, we carefully weighed the two plans
and chose the first.




Thirdly, what is the merge path for these patches?  Is an x86
or ACPI maintainer working with you on them?


Yes, we have received a lot of guidance and help from RJ, who is an ACPI maintainer.


Thanks,

Dou

[e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-25 Thread Fengguang Wu

Greetings,

This BUG message can be found in recent kernels as well as v4.4 and
linux-stable. It happens when running

modprobe netconsole netconsole=@/,$port@$server/ 

[   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
[   39.943285] netpoll: netconsole: local port 6665
[   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
[   39.943609] netpoll: netconsole: interface 'eth0'
[   39.943756] netpoll: netconsole: remote port 6672
[   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   39.944311] netpoll: netconsole: local IP 192.168.1.193
[   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
[   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
[   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
[   39.944518] Hardware name:  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[   39.944522]   c90001f2f9e8 813417d9 
88007faba5c0
[   39.944524]  006e c90001f2fa00 810aec03 
81a25948
[   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
8803e50fbd68
[   39.944526] Call Trace:
[   39.944533]  [] dump_stack+0x63/0x8a
[   39.944536]  [] ___might_sleep+0xd3/0x120
[   39.944537]  [] __might_sleep+0x4a/0x80
[   39.944541]  [] synchronize_irq+0x38/0xa0
[   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
[   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
[   39.944547]  [] disable_irq+0x1c/0x20
[   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
[   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
[   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
[   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
[   39.944572]  [] netpoll_send_udp+0x2ec/0x450
[   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
[   39.944578]  [] call_console_drivers+0x115/0x120
[   39.944580]  [] console_unlock+0x333/0x5c0
[   39.944583]  [] register_console+0x1c4/0x380
[   39.944586]  [] init_netconsole+0x1c5/0x1000 [netconsole]
[   39.944588]  [] ? 0xa004f000
[   39.944591]  [] do_one_initcall+0x3d/0x150
[   39.944592]  [] ? __might_sleep+0x4a/0x80
[   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
[   39.944598]  [] do_init_module+0x5f/0x1d8
[   39.944602]  [] load_module+0x1429/0x1b40
[   39.944604]  [] ? __symbol_put+0x40/0x40
[   39.944607]  [] ? kernel_read_file+0x178/0x1a0
[   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
[   39.944611]  [] SYSC_finit_module+0xc3/0xf0
[   39.944614]  [] SyS_finit_module+0xe/0x10
[   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
[   39.946384] console [netcon0] enabled
[   39.946514] netconsole: network logging started

Can this be possibly fixed?

Thanks,
Fengguang

Re: [PATCH v3 3/3] mac80211: mesh: fixed HT ies in beacon template

2016-07-25 Thread Masashi Honma


On 2016-07-22 14:26, Masashi Honma wrote:
> On 2016-07-14 05:07, Yaniv Machani wrote:
>> +
>> +/* if channel width is 20MHz - configure HT capab accordingly*/
>> +if (sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_20) {
>> +cap &= ~IEEE80211_HT_CAP_SUP_WIDTH_20_40;
>> +cap &= ~IEEE80211_HT_CAP_DSSSCCK40;
>> +}
>
> I have tested this part of your patch and this works for me.
>
> Previously, "Supported Channel Width Set bit" in HT Capabilities element
> was 1 even though disable_ht40=1 existed in wpa_supplicant.conf.
> After application of the patch, the bit was 0.
>
>

# I retransmit this because of mail delivery errors.

I forgot to mention I have used this patch to test.
http://lists.infradead.org/pipermail/hostap/2016-July/036029.html
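
For readers who want to see the effect of the quoted hunk in isolation, here
is a small user-space sketch. The SK_* macros and sk_ht_cap_for_width() are
hypothetical names; the two bit values are the ones I understand the HT
Capabilities Info field to use for "Supported Channel Width Set" and
"DSSS/CCK Mode in 40 MHz", so treat them as assumptions to check against
include/linux/ieee80211.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the HT Capabilities Info field. */
#define SK_HT_CAP_SUP_WIDTH_20_40	0x0002u
#define SK_HT_CAP_DSSSCCK40		0x1000u

/*
 * Mirror of the quoted logic: when the channel is 20 MHz wide, clear
 * the two capability bits that only make sense for 40 MHz operation.
 */
static uint16_t sk_ht_cap_for_width(uint16_t cap, bool width_is_20mhz)
{
	if (width_is_20mhz) {
		cap &= (uint16_t)~SK_HT_CAP_SUP_WIDTH_20_40;
		cap &= (uint16_t)~SK_HT_CAP_DSSSCCK40;
	}
	return cap;
}
```

With a 20 MHz channel the "Supported Channel Width Set" bit ends up 0, which
matches the observation in the test report above; for wider channels the
capability word is returned unchanged.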

Re: [RFC patch 1/6] random: Simplify API for random address requests

2016-07-25 Thread Jason Cooper

All,

On Tue, Jul 26, 2016 at 03:01:55AM +, Jason Cooper wrote:
> To date, all callers of randomize_range() have set the length to 0, and
> check for a zero return value.  For the current callers, the only way
> to get zero returned is if end <= start.  Since they are all adding a
> constant to the start address, this is unnecessary.
> 
> We can remove a bunch of needless checks by simplifying the API to do
> just what everyone wants, return an address between [start, start +
> range].
> 
> While we're here, s/get_random_int/get_random_long/.  No current call
> site is adversely affected by get_random_int(), since all current range
> requests are < MAX_UINT.  However, we should match caller expectations
> to avoid coming up short (ha!) in the future.
> 
> Signed-off-by: Jason Cooper 
> ---
>  drivers/char/random.c  | 17 -
>  include/linux/random.h |  2 +-
>  2 files changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 0158d3bff7e5..1251cb2cbab2 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1822,22 +1822,13 @@ unsigned long get_random_long(void)
>  EXPORT_SYMBOL(get_random_long);
>  
>  /*
> - * randomize_range() returns a start address such that
> - *
> - *    [...... <range> .....]
> - *  start                  end
> - *
> - * a <range> with size "len" starting at the return value is inside in the
> - * area defined by [start, end], but is otherwise randomized.
> + * randomize_addr() returns a page aligned address within [start, start +
> + * range]
>   */
>  unsigned long
> -randomize_range(unsigned long start, unsigned long end, unsigned long len)
> +randomize_addr(unsigned long start, unsigned long range)
>  {
> - unsigned long range = end - len - start;
> -
> - if (end <= start + len)
> - return 0;
> - return PAGE_ALIGN(get_random_int() % range + start);
> + return PAGE_ALIGN(get_random_long() % range + start);
>  }

bah!  old patch file.  This should have been:

	if (range == 0)
		return start;
	else
		return PAGE_ALIGN(get_random_long() % range + start);

sorry,

Jason.

>  
>  /* Interface for in-kernel drivers of true hardware RNGs.
> diff --git a/include/linux/random.h b/include/linux/random.h
> index e47e533742b5..1ad877a98186 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -34,7 +34,7 @@ extern const struct file_operations random_fops, urandom_fops;
>  
>  unsigned int get_random_int(void);
>  unsigned long get_random_long(void);
> -unsigned long randomize_range(unsigned long start, unsigned long end, unsigned long len);
> +unsigned long randomize_addr(unsigned long start, unsigned long range);
>  
>  u32 prandom_u32(void);
>  void prandom_bytes(void *buf, size_t nbytes);
> -- 
> 2.9.2
>
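
The corrected function can be tried out in user space with a sketch like the
one below. Everything prefixed sketch_/SKETCH_ is a hypothetical stand-in:
the page size is assumed to be 4 KiB, and rand() replaces get_random_long()
purely for illustration (it is not a suitable entropy source).

```c
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL
/* Round up to the next page boundary, like the kernel's PAGE_ALIGN(). */
#define SKETCH_PAGE_ALIGN(addr) \
	(((addr) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Stand-in for get_random_long(); NOT cryptographically secure. */
static unsigned long sketch_random_long(void)
{
	return ((unsigned long)rand() << 16) ^ (unsigned long)rand();
}

static unsigned long sketch_randomize_addr(unsigned long start,
					   unsigned long range)
{
	/* Without this guard, "% range" would divide by zero. */
	if (range == 0)
		return start;

	return SKETCH_PAGE_ALIGN(sketch_random_long() % range + start);
}
```

Note that the alignment rounds up, so the result stays within
[start, start + range] only when start is page aligned and range is a
multiple of the page size, which is the case exercised here.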

Re: [PATCH] iio: adc: rockchip_saradc: Explicitly disable ADC on probe

2016-07-25 Thread Guenter Roeck


On 07/25/2016 07:51 PM, Caesar Wang wrote:

Hi Guenter,

Thanks for fixing it.

On 2016-07-26 03:39, Guenter Roeck wrote:

If the ADC is read for the first time, the caller gets a timeout error,
and the kernel log shows

read channel() error: -110

The ADC may be enabled on boot, and needs to be explicitly disabled
for a read sequence to work (otherwise there is no completion interrupt).
Disable it explicitly in the probe function.

Fixes: 44d6f2ef94f9 ("iio: adc: add driver for Rockchip saradc")
Signed-off-by: Guenter Roeck 
---
  drivers/iio/adc/rockchip_saradc.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/iio/adc/rockchip_saradc.c b/drivers/iio/adc/rockchip_saradc.c
index f9ad6c2d6821..6aa3271d86b5 100644
--- a/drivers/iio/adc/rockchip_saradc.c
+++ b/drivers/iio/adc/rockchip_saradc.c
@@ -280,6 +280,9 @@ static int rockchip_saradc_probe(struct platform_device *pdev)
  goto err_pclk;
  }
+/* Make sure ADC is disabled */
+writel_relaxed(0, info->regs + SARADC_CTRL);


I think we should reset the saradc controller.
A reset makes sure the registers hold their reset values; state left over from
the loader-->kernel handoff may even cause harm, in my experience with the
tsadc. (drivers/thermal/rockchip_thermal.c)


e.g.:
/**
* Reset SARADC Controller, reset all saradc registers.
*/
static void rockchip_saradc_reset_controller(struct reset_control *reset)
{
reset_control_assert(reset);
usleep_range(10, 20);
reset_control_deassert(reset);
}

..probe()
{
...
rockchip_saradc_reset_controller();
...
}



Ok, I'll give it a try.

Guenter



-
Caesar


+
  platform_set_drvdata(pdev, indio_dev);
  indio_dev->name = dev_name(&pdev->dev);

Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management

2016-07-25 Thread Luck, Tony

You must specify a mask for each L3 cache. So you can achieve your 80/80 split 
either with one rdtgroup that has an 80% mask on each of the sockets and using 
affinity to make one VM run only on CPUs on one socket and the second VM on the 
other. 

Or separate rdtgroups for each VM that give them the 80% when they are on their 
own socket and the spare 20% if they wander off to the other socket.

Sent from my iPhone

> On Jul 25, 2016, at 19:13, Marcelo Tosatti  wrote:
> 
>> On Fri, Jul 22, 2016 at 02:43:23PM -0700, Luck, Tony wrote:
>>> On Fri, Jul 22, 2016 at 04:12:04AM -0300, Marcelo Tosatti wrote:
>>> How does this patchset handle the following condition:
>>> 
>>> 6) Create reservations in such a way that the sum is larger than
>>> total amount of cache, and CPU pinning (example from Karen Noel):
>>> 
>>> VM-1 on socket-1 with 80% of reservation.
>>> VM-2 on socket-2 with 80% of reservation.
>>> VM-1 pinned to socket-1.
>>> VM-2 pinned to socket-2.
>> 
>> That's legal, but perhaps we need a description of
>> overlapping cache reservations.
>> 
>> Hardware tells you how finely you can divide the cache (and this
>> information is shown in /sys/fs/resctrl/info/l3/max_cbm_len to save
>> you from digging in CPUID leaves).  E.g. on Broadwell the value is
>> 20, so you can control cache allocations in 5% slices.
>> 
>> A bitmask defines which slices you can use (and h/w has the restriction
>> that you must have contiguous '1' bits in any mask).  So you can pick
>> your 80% using 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8 or 0xffff0.
>> 
>> There is no requirement that masks be exclusive of each other. So
>> you might pick the two extremes: 0x0ffff and 0xffff0 for your two
>> VM's in this example. Each would be allowed to allocate up to 80%,
>> but with a big overlap in the middle. Each has 20% exclusive, but
>> there is a 60% range in the middle that they would compete for.
> 
> These are different sockets, so there is no competing/sharing of L3 cache
> here: the question is about whether the interface allows the
> user to specify that 80/80 reservation without complaining:
> because the VM's are pinned, they will never actually
> share the same L3 cache.
> 
> (haven't finished reading the patchset to be certain).
> 
>> Is this specific case useful? Possibly not.  I think the more common
>> overlap cases might be between processes that you know have shared
>> code/data. Also the case where some rdtgroup has access to allocate
>> in the entire cache (mask 0xfffff on Broadwell) and some other rdtgroups
>> have limited cache allocation with fewer bits in the mask.
>> 
>> -Tony
> 
> All you have to do is to build the bitmask for a given processor
> from the union of the tasks which have been scheduled on that
> processor.
> 
>
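
The mask arithmetic discussed in this thread (a 20-bit capacity bitmask,
contiguous '1' bits, 5% per bit) can be sketched in user-space C as follows.
The cbm_* helpers are hypothetical names for illustration, not part of the
patchset or of any resctrl interface.

```c
#include <stdint.h>

/*
 * Build a mask of `nbits` contiguous 1s starting at bit `shift`.
 * Returns 0 if the run does not fit into a `cbm_len`-bit mask.
 */
static uint32_t cbm_contiguous(unsigned cbm_len, unsigned nbits, unsigned shift)
{
	uint32_t ones;

	if (nbits == 0 || cbm_len > 32 || nbits + shift > cbm_len)
		return 0;

	ones = (nbits == 32) ? UINT32_MAX : ((UINT32_C(1) << nbits) - 1);
	return ones << shift;
}

/* Count set bits by clearing the lowest one until none remain. */
static unsigned cbm_popcount(uint32_t mask)
{
	unsigned n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Share of the cache covered by `mask`, in percent. */
static unsigned cbm_percent(uint32_t mask, unsigned cbm_len)
{
	return cbm_len ? cbm_popcount(mask) * 100 / cbm_len : 0;
}
```

With cbm_len = 20, an 80% reservation is 16 contiguous bits, so only shifts
0 through 4 fit. ANDing the shift-0 and shift-4 masks leaves 12 bits, i.e.
the 60% region the two groups would compete for on a shared cache.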

RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-25 Thread Dexuan Cui

> From: David Miller [mailto:da...@davemloft.net]
> 
> From: Dexuan Cui 
> Date: Sat, 23 Jul 2016 01:35:51 +
> 
> > +static struct sock *hvsock_create(struct net *net, struct socket *sock,
> > + gfp_t priority, unsigned short type)
> > +{
> > +   struct hvsock_sock *hvsk;
> > +   struct sock *sk;
> > +
> > +   sk = sk_alloc(net, AF_HYPERV, priority, &hvsock_proto, 0);
> > +   if (!sk)
> > +   return NULL;
>  ...
> > +   /* Looks stream-based socket doesn't need this. */
> > +   sk->sk_backlog_rcv = NULL;
> > +
> > +   sk->sk_state = 0;
> > +   sock_reset_flag(sk, SOCK_DONE);
> 
> All of these are unnecessary initializations, since sk_alloc() zeroes
> out the 'sk' object for you.

Hi David,
Thanks for the comment!  I'll remove the 3 lines.

May I know if you have more comments?

BTW, during the past month, at least 7 other people also reviewed
the patch and gave me quite a few good comments, which have
been addressed. Though only one of them gave the Reviewed-by
line for now, I guess I would get more if I ping them to have a look
at the latest version of the patch, i.e., v19 -- I'm going to post it
with the aforementioned 3 lines removed and if you've more 
comments, I'm ready to address them too. :-)

Thanks,
-- Dexuan

Re: [PATCH -next] drm/hisilicon: Fix error handling of ade_power_up()

2016-07-25 Thread Xinliang Liu

On 19 July 2016 at 19:30, Wei Yongjun  wrote:
> From: Wei Yongjun 
>
> Fix the reset_control_deassert() fail and clk_prepare_enable() fail
> error handling of ade_power_up().
>
> Signed-off-by: Wei Yongjun 

Applied, thanks.

-xinliang

> ---
>  drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c
> index c3707d4..e2bd1e6 100644
> --- a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c
> +++ b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c
> @@ -258,18 +258,24 @@ static int ade_power_up(struct ade_hw_ctx *ctx)
> ret = reset_control_deassert(ctx->reset);
> if (ret) {
> DRM_ERROR("failed to deassert reset\n");
> -   return ret;
> +   goto err_reset;
> }
>
> ret = clk_prepare_enable(ctx->ade_core_clk);
> if (ret) {
> DRM_ERROR("failed to enable ade_core_clk (%d)\n", ret);
> -   return ret;
> +   goto err_prepare_enable;
> }
>
> ade_init(ctx);
> ctx->power_on = true;
> return 0;
> +
> +err_prepare_enable:
> +   reset_control_assert(ctx->reset);
> +err_reset:
> +   clk_disable_unprepare(ctx->media_noc_clk);
> +   return ret;
>  }
>
>  static void ade_power_down(struct ade_hw_ctx *ctx)
>
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
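
The goto-based unwinding pattern this patch relies on can be shown with a
self-contained sketch. All names here (struct sketch_hw, the sketch_*
helpers, fail_step) are hypothetical and only simulate a bus clock, a reset
line and a core clock; this is not the kirin driver code.

```c
#include <stdbool.h>

/* Simulated hardware state plus a knob to inject a failure. */
struct sketch_hw {
	bool bus_clk_on;
	bool reset_deasserted;
	bool core_clk_on;
	int fail_step;	/* 0: none, 2: deassert fails, 3: core clock fails */
};

static int sketch_enable_bus_clk(struct sketch_hw *hw)
{
	hw->bus_clk_on = true;
	return 0;
}

static void sketch_disable_bus_clk(struct sketch_hw *hw)
{
	hw->bus_clk_on = false;
}

static int sketch_deassert_reset(struct sketch_hw *hw)
{
	if (hw->fail_step == 2)
		return -1;
	hw->reset_deasserted = true;
	return 0;
}

static void sketch_assert_reset(struct sketch_hw *hw)
{
	hw->reset_deasserted = false;
}

static int sketch_enable_core_clk(struct sketch_hw *hw)
{
	if (hw->fail_step == 3)
		return -1;
	hw->core_clk_on = true;
	return 0;
}

static int sketch_power_up(struct sketch_hw *hw)
{
	int ret;

	ret = sketch_enable_bus_clk(hw);
	if (ret)
		return ret;	/* nothing acquired yet, nothing to undo */

	ret = sketch_deassert_reset(hw);
	if (ret)
		goto err_bus_clk;

	ret = sketch_enable_core_clk(hw);
	if (ret)
		goto err_reset;

	return 0;

	/* Undo in reverse order of acquisition; labels fall through. */
err_reset:
	sketch_assert_reset(hw);
err_bus_clk:
	sketch_disable_bus_clk(hw);
	return ret;
}
```

The sketch names each label after the resource it releases, whereas the
patch names labels after the step that failed; both conventions give the
same fall-through order.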

[RFC patch 6/6] unicore32: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper

Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/unicore32/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 00299c927852..b856178cf167 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -295,8 +295,7 @@ unsigned long get_wchan(struct task_struct *p)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk + 0x02000000;
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x02000000);
 }
 
 /*
-- 
2.9.2

[RFC patch 4/6] arm64: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper

Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/arm64/kernel/process.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6cd2612236dc..11bf454baf86 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -374,12 +374,8 @@ unsigned long arch_align_stack(unsigned long sp)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk;
-
if (is_compat_task())
-   range_end += 0x02000000;
+   return randomize_addr(mm->brk, 0x02000000);
else
-   range_end += 0x40000000;
-
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x40000000);
 }
-- 
2.9.2

[RFC patch 2/6] x86: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper

Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/x86/kernel/process.c| 3 +--
 arch/x86/kernel/sys_x86_64.c | 5 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 96becbbb52e0..a083a2c0744e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -507,8 +507,7 @@ unsigned long arch_align_stack(unsigned long sp)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk + 0x02000000;
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x02000000);
 }
 
 /*
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 10e0272d789a..f9cad22808fc 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -101,7 +101,6 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
   unsigned long *end)
 {
if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
-   unsigned long new_begin;
/* This is usually used needed to map code in small
   model, so it needs to be in the first 31bit. Limit
   it to that.  This means we need to move the
@@ -112,9 +111,7 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
*begin = 0x40000000;
*end = 0x80000000;
if (current->flags & PF_RANDOMIZE) {
-   new_begin = randomize_range(*begin, *begin + 0x02000000, 0);
-   if (new_begin)
-   *begin = new_begin;
+   *begin = randomize_addr(*begin, 0x02000000);
}
} else {
*begin = current->mm->mmap_legacy_base;
-- 
2.9.2

[RFC patch 5/6] tile: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper

Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/tile/mm/mmap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/tile/mm/mmap.c b/arch/tile/mm/mmap.c
index 851a94e6ae58..50f6a693a2b6 100644
--- a/arch/tile/mm/mmap.c
+++ b/arch/tile/mm/mmap.c
@@ -88,6 +88,5 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk + 0x02000000;
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x02000000);
 }
-- 
2.9.2

[RFC patch 3/6] ARM: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper

Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/arm/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 4a803c5a1ff7..02dee671cded 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -314,8 +314,7 @@ unsigned long get_wchan(struct task_struct *p)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk + 0x02000000;
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x02000000);
 }
 
 #ifdef CONFIG_MMU
-- 
2.9.2

[RFC patch 1/6] random: Simplify API for random address requests

2016-07-25 Thread Jason Cooper

To date, all callers of randomize_range() have set the length to 0, and
check for a zero return value.  For the current callers, the only way
to get zero returned is if end <= start.  Since they are all adding a
constant to the start address, this is unnecessary.

We can remove a bunch of needless checks by simplifying the API to do
just what everyone wants, return an address between [start, start +
range].

While we're here, s/get_random_int/get_random_long/.  No current call
site is adversely affected by get_random_int(), since all current range
requests are < MAX_UINT.  However, we should match caller expectations
to avoid coming up short (ha!) in the future.

Signed-off-by: Jason Cooper 
---
 drivers/char/random.c  | 17 -
 include/linux/random.h |  2 +-
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0158d3bff7e5..1251cb2cbab2 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1822,22 +1822,13 @@ unsigned long get_random_long(void)
 EXPORT_SYMBOL(get_random_long);
 
 /*
- * randomize_range() returns a start address such that
- *
- *    [...... <range> .....]
- *  start                  end
- *
- * a <range> with size "len" starting at the return value is inside in the
- * area defined by [start, end], but is otherwise randomized.
+ * randomize_addr() returns a page aligned address within [start, start +
+ * range]
  */
 unsigned long
-randomize_range(unsigned long start, unsigned long end, unsigned long len)
+randomize_addr(unsigned long start, unsigned long range)
 {
-   unsigned long range = end - len - start;
-
-   if (end <= start + len)
-   return 0;
-   return PAGE_ALIGN(get_random_int() % range + start);
+   return PAGE_ALIGN(get_random_long() % range + start);
 }
 
 /* Interface for in-kernel drivers of true hardware RNGs.
diff --git a/include/linux/random.h b/include/linux/random.h
index e47e533742b5..1ad877a98186 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -34,7 +34,7 @@ extern const struct file_operations random_fops, urandom_fops;
 
 unsigned int get_random_int(void);
 unsigned long get_random_long(void);
-unsigned long randomize_range(unsigned long start, unsigned long end, unsigned long len);
+unsigned long randomize_addr(unsigned long start, unsigned long range);
 
 u32 prandom_u32(void);
 void prandom_bytes(void *buf, size_t nbytes);
-- 
2.9.2

[PATCH v2 1/3] x86/apic: Remove "focus disabled" for 64bit case

2016-07-25 Thread Wei Jiangang

Disabling processor focus on 64-bit causes a crash.
The call trace is as follows:

  [] dump_stack+0x63/0x84
  [] __warn+0xd1/0xf0
  [] warn_slowpath_fmt+0x5f/0x80
  [] ex_handler_wrmsr_unsafe+0x62/0x70
  [] fixup_exception+0x39/0x50
  [] do_general_protection+0x80/0x160
  [] general_protection+0x28/0x30
  [] ? native_write_msr+0x4/0x30
  [] ? native_apic_msr_write+0x32/0x40
  [] init_bsp_APIC+0x5f/0x118
  [] init_ISA_irqs+0x19/0x4c
  [] native_init_IRQ+0xd/0x377
  [] init_IRQ+0x42/0x49
  [] start_kernel+0x2ce/0x4c8
  [] ? set_init_arg+0x55/0x55
  [] ? early_idt_handler_array+0x120/0x120
  [] x86_64_start_reservations+0x2f/0x31
  [] x86_64_start_kernel+0x14c/0x16f

Keep the implementation consistent with setup_local_APIC() and
always use processor focus on 64-bit.
For more details, refer to commit 89c38c2867eb ("x86: apic - unify
setup_local_APIC")

Signed-off-by: Cao jin 
Signed-off-by: Wei Jiangang 
---
 arch/x86/kernel/apic/apic.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 60078a67d7e3..0273b652c689 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1154,9 +1154,7 @@ void __init init_bsp_APIC(void)
if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
(boot_cpu_data.x86 == 15))
value &= ~APIC_SPIV_FOCUS_DISABLED;
-   else
 #endif
-   value |= APIC_SPIV_FOCUS_DISABLED;
value |= SPURIOUS_APIC_VECTOR;
apic_write(APIC_SPIV, value);
 
-- 
1.9.3

[PATCH v2 2/3] x86/apic: Update comment about disabling processor focus

2016-07-25 Thread Wei Jiangang

Fix references to discarded end_level_ioapic_irq().

Signed-off-by: Cao jin 
Signed-off-by: Wei Jiangang 
---
 arch/x86/kernel/apic/apic.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 0273b652c689..8e25b9b2d351 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1346,7 +1346,6 @@ void setup_local_APIC(void)
 * Actually disabling the focus CPU check just makes the hang less
 * frequent as it makes the interrupt distributon model be more
 * like LRU than MRU (the short-term load is more even across CPUs).
-* See also the comment in end_level_ioapic_irq().  --macro
 */
 
/*
-- 
1.9.3

[PATCH v2 3/3] x86/apic: Improved the setting of interrupt mode for bsp

2016-07-25 Thread Wei Jiangang

If we specify the 'notsc' parameter for the dump-capture kernel,
and then trigger a crash (panic) by using "ALT-SysRq-c" or
"echo c > /proc/sysrq-trigger", the dump-capture kernel will
hang in calibrate_delay_converge(), waiting for jiffies to change.
The serial log is as follows:

tsc: Fast TSC calibration using PIT
tsc: Detected 2099.947 MHz processor
Calibrating delay loop...

Jiffies does not change because no timer interrupt is
delivered to the dump-capture kernel.

In fact, once a kernel panic occurs, the local APIC is disabled
by lapic_shutdown() in the reboot path.
Generally speaking, the local APIC state is initialized by the BIOS
after power-up or reset, which doesn't apply to the kdump case.
So the kernel has to be responsible for initializing the interrupt
mode properly, according to the latest status of the APIC, in the bootup path.

An MP operating system is booted under either PIC mode or
virtual wire mode. Later, the operating system switches to
symmetric I/O mode as it enters multiprocessor mode.
Two kinds of virtual wire mode are defined in Intel MP spec:
virtual wire mode via local APIC or via I/O APIC.

Currently we determine the APIC mode only from the SMP BIOS (MP table).
That's not enough. It's better to further check whether the APIC is working
in an effective interrupt mode and, if not, set it up properly.
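
As a reading aid, the decision this description leads to can be modelled in
user-space C with register values passed in as plain integers. The sk_*
names and struct sk_apic_state are hypothetical; the bit positions (software
enable in bit 8 of the spurious vector register, delivery mode in bits 8-10
of an LVT entry, NMI = 4, ExtINT = 7) are my understanding of the local APIC
layout and should be checked against the SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed local APIC bit layout (see lead-in). */
#define SK_SPIV_APIC_ENABLED	(1u << 8)
#define SK_DELIVERY_MODE(x)	(((x) >> 8) & 0x7u)
#define SK_MODE_NMI		0x4u
#define SK_MODE_EXTINT		0x7u

struct sk_apic_state {
	uint32_t spiv;		/* spurious interrupt vector register */
	uint32_t lvt0;		/* LINT0 entry */
	uint32_t lvt1;		/* LINT1 entry */
	bool ioapic_virt_wire;	/* outcome of the I/O APIC side check */
	bool pic_mode;
	bool has_apic;
	bool smp_found_config;
};

/* LINT0 must deliver ExtINT and LINT1 must deliver NMI. */
static bool sk_virt_wire_through_lapic(const struct sk_apic_state *s)
{
	return SK_DELIVERY_MODE(s->lvt0) == SK_MODE_EXTINT &&
	       SK_DELIVERY_MODE(s->lvt1) == SK_MODE_NMI;
}

/* Some virtual wire mode is active only if the APIC is software enabled. */
static bool sk_check_virt_wire_mode(const struct sk_apic_state *s)
{
	return (s->spiv & SK_SPIV_APIC_ENABLED) &&
	       (sk_virt_wire_through_lapic(s) || s->ioapic_virt_wire);
}

/* true: the BSP setup is skipped; false: virtual wire mode gets programmed. */
static bool sk_skip_bsp_setup(const struct sk_apic_state *s)
{
	return s->pic_mode || !s->has_apic ||
	       (s->smp_found_config && sk_check_virt_wire_mode(s));
}
```

In the kdump scenario the APIC was software disabled before the jump to the
dump-capture kernel, so the sketch reports that setup must not be skipped
even though an MP table was found.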

Signed-off-by: Cao jin 
Signed-off-by: Wei Jiangang 
---
 arch/x86/include/asm/io_apic.h |  5 
 arch/x86/kernel/apic/apic.c| 60 +-
 arch/x86/kernel/apic/io_apic.c | 28 
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 6cbf2cfb3f8a..a3257366bf7f 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -190,6 +190,7 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
 }
 
 extern void setup_IO_APIC(void);
+extern bool virt_wire_through_ioapic(void);
 extern void enable_IO_APIC(void);
 extern void disable_IO_APIC(void);
 extern void setup_ioapic_dest(void);
@@ -231,6 +232,10 @@ static inline void io_apic_init_mappings(void) { }
 #define native_disable_io_apic NULL
 
 static inline void setup_IO_APIC(void) { }
+static inline bool virt_wire_through_ioapic(void)
+{
+   return false;
+}
 static inline void enable_IO_APIC(void) { }
 static inline void setup_ioapic_dest(void) { }
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8e25b9b2d351..a3939fb130cc 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1124,6 +1124,58 @@ void __init sync_Arb_IDs(void)
 }
 
 /*
+ * Check APIC enable/disable flag
+ */
+static bool check_apic_enabled(void)
+{
+   unsigned int value;
+
+   /*
+* If APIC is disabled globally (IA32_APIC_BASE[11] == 0)
+* the boot cpu hasn't X86_FEATURE_APIC,
+* and init_bsp_APIC() has already checked it before.
+* so no need to check global enable/disable flag here
+*/
+
+   /* Check the software enable/disable flag */
+   value = apic_read(APIC_SPIV);
+   if (!(value & APIC_SPIV_APIC_ENABLED))
+   return false;
+
+   return true;
+}
+
+/*
+ * Return false means the through-local-APIC virtual wire mode is inactive
+ */
+static bool virt_wire_through_lapic(void)
+{
+   unsigned int value;
+
+   /*
+* The through-local-APIC virtual wire mode requests
+* local APIC to enable LINT0 for ExtINT delivery mode
+* and LINT1 for NMI delivery mode
+*/
+   value = apic_read(APIC_LVT0);
+   if (GET_APIC_DELIVERY_MODE(value) != APIC_MODE_EXTINT)
+   return false;
+
+   value = apic_read(APIC_LVT1);
+   if (GET_APIC_DELIVERY_MODE(value) != APIC_MODE_NMI)
+   return false;
+
+   return true;
+}
+
+static bool check_virt_wire_mode(void)
+{
+   /* If neither of virtual wire mode is active, return false */
+   return (check_apic_enabled() && (virt_wire_through_lapic() ||
+   virt_wire_through_ioapic()));
+}
+
+/*
  * An initial setup of the virtual wire mode.
  */
 void __init init_bsp_APIC(void)
@@ -1133,8 +1185,14 @@ void __init init_bsp_APIC(void)
/*
 * Don't do the setup now if we have a SMP BIOS as the
 * through-I/O-APIC virtual wire mode might be active.
+*
+* It's better to do further check if either through-I/O-APIC
+* or through-local-APIC is active.
+* the worst case is that both of them are inactive, If so,
+* we need to enable the through-local-APIC virtual wire mode
 */
-   if (smp_found_config || !boot_cpu_has(X86_FEATURE_APIC))
+   if (pic_mode || !boot_cpu_has(X86_FEATURE_APIC) ||
+   (smp_found_config && check_virt_wire_mode()))
return;
 
/*
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 446702ed99dc..f794d389ba85 100

[PATCH v2 0/3] Fix dump-capture kernel hangs with notsc

2016-07-25 Thread Wei Jiangang

v2:
Just about the commit ("x86/apic: Improved the setting of interrupt
 mode for bsp")

- Unify the name
  s/virtual_wire_via_*/virt_wire_through_*
- Add check for PIC mode
  suggested-by Baoquan He 
- Add check enable/disable flag for IO-APIC
  suggested-by Xunlei Pang 
- Update comments

v1:
The goal is to fix the dump-capture kernel hanging in
calibrate_delay_converge() when booted with the notsc option.

Wei Jiangang (3):
  x86/apic: Remove "focus disabled" for 64bit case
  x86/apic: Update comment about disabling processor focus
  x86/apic: Improved the setting of interrupt mode for bsp

 arch/x86/include/asm/io_apic.h |  5 
 arch/x86/kernel/apic/apic.c| 63 +++---
 arch/x86/kernel/apic/io_apic.c | 28 +++
 3 files changed, 92 insertions(+), 4 deletions(-)

-- 
1.9.3

Re: [RFC PATCH v7 1/7] Restartable sequences system call

2016-07-25 Thread Mathieu Desnoyers

- On Jul 25, 2016, at 7:02 PM, Andy Lutomirski l...@amacapital.net wrote:

> On Thu, Jul 21, 2016 at 2:14 PM, Mathieu Desnoyers
>  wrote:
>> Man page associated:
>>
>> RSEQ(2)Linux Programmer's Manual   RSEQ(2)
>>
>> NAME
>>rseq - Restartable sequences and cpu number cache
>>
>> SYNOPSIS
>>#include 
>>
>>int rseq(struct rseq * rseq, int flags);
>>
>> DESCRIPTION
>>The  rseq()  ABI  accelerates  user-space operations on per-cpu
>>data by defining a shared data structure ABI between each user-
>>space thread and the kernel.
>>
>>The  rseq argument is a pointer to the thread-local rseq struc‐
>>ture to be shared between kernel and user-space.  A  NULL  rseq
>>value  can  be used to check whether rseq is registered for the
>>current thread.
>>
>>The layout of struct rseq is as follows:
>>
>>Structure alignment
>>   This structure needs to be aligned on  multiples  of  64
>>   bytes.
>>
>>Structure size
>>   This structure has a fixed size of 128 bytes.
>>
>>Fields
>>
>>cpu_id
>>   Cache  of  the CPU number on which the calling thread is
>>   running.
>>
>>event_counter
>>   Restartable sequences event_counter field.
> 
> That's an unhelpful description.

Good point, how about:

event_counter
   Counter guaranteed to be incremented when the current thread is
   preempted or when a signal is delivered to the current thread.

Along the same lines, I would reword cpu_id as:

cpu_id
   Cache  of  the CPU number on which the current thread is
   running.

> 
>>
>>rseq_cs
>>   Restartable sequences rseq_cs field. Points to a  struct
>>   rseq_cs.
> 
> Why is it a pointer?

Rewording it like this should help:

rseq_cs
   The rseq_cs field is a pointer to a struct rseq_cs. It is NULL when
   no rseq assembly block critical section is active for the current
   thread. Setting it to point to a critical section descriptor (struct
   rseq_cs) marks the beginning of the critical section. It is cleared
   after the end of the critical section.
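
As a rough sketch of how such a descriptor could be interpreted (not the
kernel's actual code): the field names below follow the man page draft,
but struct cs_desc and resume_ip() are hypothetical, and the sketch
assumes that an abort is triggered whenever the interrupted instruction
pointer lies in [start_ip, post_commit_ip).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; illustrative layout only, not the real ABI. */
struct cs_desc {
	uintptr_t start_ip;        /* first instruction of the sequence */
	uintptr_t post_commit_ip;  /* address just past the last instruction */
	uintptr_t abort_ip;        /* where to resume if the sequence is aborted */
};

/*
 * Return the address at which the thread should resume after a preemption
 * or signal delivery that interrupted it at 'ip'. A NULL descriptor means
 * no critical section is active, so execution resumes where it stopped.
 */
static uintptr_t resume_ip(const struct cs_desc *cs, uintptr_t ip)
{
	if (cs == NULL)
		return ip;

	assert(cs->start_ip <= cs->post_commit_ip);

	if (ip >= cs->start_ip && ip < cs->post_commit_ip)
		return cs->abort_ip;  /* inside the sequence: restart via abort path */

	return ip;                    /* outside the sequence: nothing to do */
}
```

Note that post_commit_ip itself is outside the range: once the commit
instruction has executed, the sequence no longer needs to be aborted.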


> 
>>
>>The layout of struct rseq_cs is as follows:
>>
>>Structure alignment
>>   This  structure  needs  to be aligned on multiples of 64
>>   bytes.
>>
>>Structure size
>>   This structure has a fixed size of 192 bytes.
>>
>>Fields
>>
>>start_ip
>>   Instruction pointer address of the first instruction  of
>>   the sequence of consecutive assembly instructions.
>>
>>post_commit_ip
>>   Instruction  pointer  address after the last instruction
>>   of the sequence of consecutive assembly instructions.
>>
>>abort_ip
>>   Instruction pointer address where to move the  execution
>>   flow  in  case  of  abort of the sequence of consecutive
>>   assembly instructions.
>>
>>The flags argument is currently unused and must be specified as
>>0.
>>
>>Typically,  a  library or application will keep the rseq struc‐
>>ture in a thread-local storage variable, or other memory  areas
> 
> "variable or other memory area"

ok

> 
>>belonging to each thread. It is recommended to perform volatile
>>reads of the thread-local cache to prevent  the  compiler  from
>>doing  load  tearing.  An  alternative approach is to read each
>>field from inline assembly.
> 
> I don't think the man page needs to tell people how to implement
> correct atomic loads.

ok, I can remove the two previous sentences.

> 
>>
>>Each thread is responsible for registering its rseq  structure.
>>Only  one  rseq structure address can be registered per thread.
>>Once set, the rseq address is idempotent for a given thread.
> 
> "Idempotent" is a property that applies to an action, and the "rseq
> address" is not an action.  I don't know what you're trying to say.

I mean there is only one address registered per thread, and it stays
registered for the life-time of the thread. Perhaps I could say:

  "Once set, the rseq address never changes for a given thread."

> 
>>
>>In a typical usage scenario, the thread  registering  the  rseq
>>structure  will  be  performing  loads  and stores from/to that
>>structure. It is however also allowed to  read  that  structure
>>from  other  threads.   The rseq field updates performed by the
>>kernel provide single-copy atomicity semantics, which guarantee
>>that  other  threads performing single-copy atomic reads of the
>>cpu number cache will always observe a consistent value.
> 
> s/single-copy/relaxed atomic/ perhaps?

ok

> 
>>
>>Memory registered as rseq structure should ne

[PATCH] clocksource: sun4i: Clear interrupts after stopping timer in probe function

2016-07-25 Thread Chen-Yu Tsai

The bootloader (U-boot) sometimes uses this timer for various delays.
It uses it as an ongoing counter, and does comparisons on the current
counter value. The timer counter is never stopped.

In some cases when the user interacts with the bootloader, or lets
it idle for some time before loading Linux, the timer may expire,
and an interrupt will be pending. This results in an unexpected
interrupt when the timer interrupt is enabled by the kernel, at
which point the event_handler isn't set yet. This results in a NULL
pointer dereference exception, panic, and no way to reboot.

Clear any pending interrupts after we stop the timer in the probe
function to avoid this.
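
The ordering problem can be illustrated with a small user-space model.
This is a sketch under assumptions: struct timer_model is hypothetical,
and the status register is modelled as write-1-to-clear, which is what
the writel(0x1, ...) in this patch relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the timer's interrupt status/enable registers. */
struct timer_model {
	uint32_t irq_status;  /* bit 0: timer0 interrupt pending */
	uint32_t irq_enable;  /* bit 0: timer0 interrupt enabled */
};

/* Write-1-to-clear: every bit set in 'val' clears that pending bit. */
static void status_write(struct timer_model *t, uint32_t val)
{
	assert(t != NULL);
	t->irq_status &= ~val;
}

/* An interrupt is delivered when a pending bit is also enabled. */
static bool irq_would_fire(const struct timer_model *t)
{
	assert(t != NULL);
	return (t->irq_status & t->irq_enable) != 0;
}

/* Order used by the patch: acknowledge stale state, then enable. */
static void probe_sequence(struct timer_model *t)
{
	status_write(t, 0x1);   /* clear any interrupt left over from U-Boot */
	t->irq_enable |= 0x1;   /* only now is it safe to enable the interrupt */
}
```

Without the status_write() step, a pending bit left behind by the
bootloader fires as soon as the enable bit is set, before any handler
has been installed.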

Signed-off-by: Chen-Yu Tsai 
---

I've run into this many times while working on U-boot. Finally
made time to figure it out.

---
 drivers/clocksource/sun4i_timer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/clocksource/sun4i_timer.c b/drivers/clocksource/sun4i_timer.c
index 6f3719d73390..d5725c82401d 100644
--- a/drivers/clocksource/sun4i_timer.c
+++ b/drivers/clocksource/sun4i_timer.c
@@ -193,6 +193,9 @@ static void __init sun4i_timer_init(struct device_node *node)
/* Make sure timer is stopped before playing with interrupts */
sun4i_clkevt_time_stop(0);
 
+   /* clear timer0 interrupt */
+   writel(0x1, timer_base + TIMER_IRQ_ST_REG);
+
sun4i_clockevent.cpumask = cpu_possible_mask;
sun4i_clockevent.irq = irq;
 
-- 
2.8.1

Re: [PATCH v2 02/10] userns: Add per user namespace sysctls.

2016-07-25 Thread David Miller

From: ebied...@xmission.com (Eric W. Biederman)
Date: Mon, 25 Jul 2016 19:44:50 -0500

> User namespaces have enabled unprivileged users access to a lot more
> data structures and so to catch programs that go crazy we need a lot
> more limits.  I believe some of those limits make sense per namespace.
> As it is easy in some cases to say any more than Y number of those
> per namespace is excessive.   For example a limit of 1,000,000 ipv4
> routes per network namespaces is a sanity check as there are
> currently 621,649 ipv4 prefixes advertized in bgp.

When we give a new namespace to unprivileged users, we honestly should
make the sysctl settings we give to them become "limits".  They can
further constrain the sysctl settings but may not raise them.
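
A minimal sketch of that rule, purely for illustration: struct ns_limit
and both helpers are hypothetical names, not an existing kernel
interface. The value a namespace inherits becomes a ceiling that it can
lower but never exceed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-namespace setting with an inherited ceiling. */
struct ns_limit {
	long ceiling;  /* value handed down by the parent; never raised */
	long value;    /* current setting, always <= ceiling */
};

/* A child namespace starts at the parent's current value as its ceiling. */
static struct ns_limit ns_limit_inherit(const struct ns_limit *parent)
{
	assert(parent != NULL);
	struct ns_limit child = { parent->value, parent->value };
	return child;
}

/* Lowering is allowed; raising above the ceiling is refused. */
static int ns_limit_set(struct ns_limit *l, long requested)
{
	assert(l != NULL);
	if (requested < 0)
		return -EINVAL;
	if (requested > l->ceiling)
		return -EPERM;
	l->value = requested;
	return 0;
}
```

With a parent value of 1,000,000, the child may set 500,000 but a later
attempt to set 2,000,000 fails and leaves the value unchanged.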

Re: [PATCH] iio: adc: rockchip_saradc: Explicitly disable ADC on probe

2016-07-25 Thread Caesar Wang


Hi Guenter,

Thanks for fixing it.

On 2016年07月26日 03:39, Guenter Roeck wrote:

If the ADC is read for the first time, the caller gets a timeout error,
and the kernel log shows

read channel() error: -110

The ADC may be enabled on boot, and needs to be explicitly disabled
for a read sequence to work (otherwise there is no completion interrupt).
Disable it explicitly in the probe function.

Fixes: 44d6f2ef94f9 ("iio: adc: add driver for Rockchip saradc")
Signed-off-by: Guenter Roeck 
---
  drivers/iio/adc/rockchip_saradc.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/iio/adc/rockchip_saradc.c b/drivers/iio/adc/rockchip_saradc.c
index f9ad6c2d6821..6aa3271d86b5 100644
--- a/drivers/iio/adc/rockchip_saradc.c
+++ b/drivers/iio/adc/rockchip_saradc.c
@@ -280,6 +280,9 @@ static int rockchip_saradc_probe(struct platform_device *pdev)
goto err_pclk;
}
  
+	/* Make sure ADC is disabled */

+   writel_relaxed(0, info->regs + SARADC_CTRL);


I think we should reset the saradc controller, to make sure the registers
hold their reset values (0). State carried over from the loader into the
kernel may even cause harm, as I experienced with the tsadc
(drivers/thermal/rockchip_thermal.c).



e.g.:
/*
 * Reset the SARADC controller; this resets all saradc registers.
 */
static void rockchip_saradc_reset_controller(struct reset_control *reset)
{
	reset_control_assert(reset);
	usleep_range(10, 20);
	reset_control_deassert(reset);
}

..probe()
{
	...
	/* 'reset' stands for the controller's reset_control handle */
	rockchip_saradc_reset_controller(reset);
	...
}


-
Caesar


+
platform_set_drvdata(pdev, indio_dev);
  
  	indio_dev->name = dev_name(&pdev->dev);



--
caesar wang | software engineer | w...@rock-chip.com

Re: [PATCH 4.6 000/203] 4.6.5-stable review

2016-07-25 Thread Greg Kroah-Hartman

On Mon, Jul 25, 2016 at 07:49:58PM -0600, Shuah Khan wrote:
> On 07/25/2016 02:53 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.6.5 release.
> > There are 203 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Jul 27 20:33:38 UTC 2016.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.6.5-rc1.gz
> > or in the git tree and branch at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.6.y
> > and the diffstat can be found below.
> > 
> Compiled and booted on my test system. No dmesg regressions,

Great, thanks for testing all of these and letting me know.

greg k-h

Re: [PATCH 1/3] memory: mediatek: Add a new interface mtk_smi_larb_is_ready

2016-07-25 Thread Tomasz Figa

Hi,

On Mon, Jul 25, 2016 at 5:39 PM, Matthias Brugger
 wrote:
>
>
> On 20/07/16 05:01, Yong Wu wrote:
>>
>> Currently the iommu consumer always call iommu_present to get whether
>> the iommu is ready. But in MTK IOMMU, this function can't indicate
>> this. The IOMMU call bus_set_iommu->mtk_iommu_add_device->
>> mtk_iommu_attach_device to parse the iommu data, then it's able to
>> transfer "struct mtk_smi_iommu" to SMI-LARB, and the iommu uses the
>> larbs as components, so the iommu will not finish its probe until all
>> the larbs have probed.
>>
>> If the iommu consumer(like DRM) begin to probe after the time of
>> calling bus_set_iommu and before the time of SMI probe finish, it
>> will hang like this:
>>
>> [7.832359] Call trace:
>> [7.834778] [] mtk_smi_larb_get+0x24/0xa8
>> [7.840300] [] mtk_drm_crtc_enable+0x6c/0x450
>>
>> Because the larb->mmu is NULL at that time.
>>
>> In order to avoid this issue, we add a new interface
>> (mtk_smi_larb_is_ready) for checking whether the IOMMU and SMI have
>> finished their probe. If it return false, the iommu consumer should
>> probe-defer for the IOMMU and SMI.
>>
>
> Can't we just skip the functions in the probe and call bus_set_iommu only if
> we were able to bind all components?
> Something like this:

Note that we have to call bus_set_iommu() and actually have
.add_device() and .attach_device() called before any of the slave
devices probe. I found a similar problem with the rockchip IOMMU after
adding power domain and runtime PM handling there. I also found that
the current design of the IOMMU core and related DMA mapping code is
utterly broken regarding the device add/probe ordering (no support for
deferring things properly).

So my idea is to keep .add_device() as is, since typically it doesn't
seem to require anything from the IOMMU hardware and just initializes
some per-device data, but make .attach_device() able to defer the
probe of that device if the respective IOMMU has not probed yet. I'm
still in the process of figuring out the right way to achieve it, though...
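
To make the idea concrete, here is a user-space sketch of an attach
callback that defers. It is not the IOMMU core's API: struct
iommu_model, struct dev_model and attach_device() are hypothetical, and
EPROBE_DEFER is redefined locally with the value the kernel uses
internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value, redefined here for a userspace sketch. */
#define EPROBE_DEFER 517

/* Hypothetical stand-ins for the IOMMU and one of its slave devices. */
struct iommu_model {
	bool probed;
};

struct dev_model {
	const struct iommu_model *iommu;
	bool attached;
};

/*
 * Attach the device only once its IOMMU has finished probing; otherwise
 * ask the caller to retry the device's probe later.
 */
static int attach_device(struct dev_model *dev)
{
	assert(dev != NULL && dev->iommu != NULL);

	if (!dev->iommu->probed)
		return -EPROBE_DEFER;

	dev->attached = true;
	return 0;
}
```

The first attach attempt returns -EPROBE_DEFER while the IOMMU is still
probing, and a retry after the IOMMU is ready succeeds.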

Best regards,
Tomasz

Re: [Qemu-devel] [PATCH v2 0/2] vfio: add aer process

2016-07-25 Thread Zhou Jie


ping

On 2016/7/19 16:13, Zhou Jie wrote:

From: Chen Fan 

v1-v2:
   1. Add aer process to vfio driver.

Chen Fan (2):
  vfio : add aer process
  vfio : resume notifier

 drivers/vfio/pci/vfio_pci.c | 58 -
 drivers/vfio/pci/vfio_pci_intrs.c   | 18 
 drivers/vfio/pci/vfio_pci_private.h |  3 ++
 include/uapi/linux/vfio.h   |  3 ++
 4 files changed, 81 insertions(+), 1 deletion(-)

Re: [PATCH v2] ceph: Mark the file cache as unreclaimable

2016-07-25 Thread Yan, Zheng


> On Jul 26, 2016, at 01:12, Nikolay Borisov  wrote:
> 
> Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so
> that it can satisfy its internal needs. Inspecting the code shows that
> most of the caches are indeed reclaimable since they are directly
> related to the generic inode/dentry shrinkers. However, one of the
> caches, used to satisfy struct file, is not reclaimable since its
> entries are freed only when the last reference to the file is
> dropped. If a heavily loaded node opens a lot of files it can
> introduce non-trivial discrepancies between memory shown as reclaimable
> and what is actually reclaimed when drop_caches is used.
> 
> Fix this by removing the reclaimable flag for the file's cache.
> 
> Signed-off-by: Nikolay Borisov 
> ---
> 
> Fixed checkpatch warning + missing SOB line
> 
> fs/ceph/super.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 91e02481ce06..8697cac6add0 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -672,8 +672,8 @@ static int __init init_caches(void)
>   if (ceph_dentry_cachep == NULL)
>   goto bad_dentry;
> 
> - ceph_file_cachep = KMEM_CACHE(ceph_file_info,
> -   SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
> + ceph_file_cachep = KMEM_CACHE(ceph_file_info, SLAB_MEM_SPREAD);
> +
>   if (ceph_file_cachep == NULL)
>   goto bad_file;
> 

Applied, thanks

Yan, Zheng

> -- 
> 2.7.4
>

Re: [PATCH] randomize_range: use random long instead of int

2016-07-25 Thread Jason Cooper

Hi William, Kees,

On Mon, Jul 25, 2016 at 11:25:41AM -0700, william.c.robe...@intel.com wrote:
> From: William Roberts 
> 
> Use a long when generating the random range rather than
> an int. This will produce better random distributions as
> well as matching all the types at hand.
> 
> Signed-off-by: William Roberts 
> ---
>  drivers/char/random.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Upon further review, I think we should dig into this a little bit
deeper.  Stand by, I'll post an RFC series shortly.

thx,

Jason.

> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 0158d3b..bbf11b5 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1837,7 +1837,8 @@ randomize_range(unsigned long start, unsigned long end, unsigned long len)
>  
>   if (end <= start + len)
>   return 0;
> - return PAGE_ALIGN(get_random_int() % range + start);
> +
> + return PAGE_ALIGN(get_random_long() % range + start);
>  }
>  
>  /* Interface for in-kernel drivers of true hardware RNGs.
> -- 
> 1.9.1
>

Re: [PATCH] tools lib bpf: Use official ELF e_machine value

2016-07-25 Thread Wangnan (F)


Hi Arnaldo,

Please don't forget this patch.

Thank you.

On 2016/7/19 5:37, Alexei Starovoitov wrote:

On Mon, Jul 18, 2016 at 06:01:08AM +, Wang Nan wrote:

New LLVM will issue newly assigned EM_BPF machine code. The new code
will be propagated to glibc and libelf.

This patch introduces the new machine code to libbpf.

Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
  tools/lib/bpf/libbpf.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 32e6b6b..b699aea 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -37,6 +37,10 @@
  #include "libbpf.h"
  #include "bpf.h"
  
+#ifndef EM_BPF

+#define EM_BPF 247
+#endif
+
  #define __printf(a, b)__attribute__((format(printf, a, b)))
  
  __printf(1, 2)

@@ -439,7 +443,8 @@ static int bpf_object__elf_init(struct bpf_object *obj)
}
ep = &obj->efile.ehdr;
  
-	if ((ep->e_type != ET_REL) || (ep->e_machine != 0)) {

+   /* Old LLVM set e_machine to EM_NONE */
+   if ((ep->e_type != ET_REL) || (ep->e_machine && (ep->e_machine != EM_BPF))) {

Thanks for the fix. Didn't realize we already check for zero here.
btw EM_BPF will be in llvm 3.9 release.
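
For reference, the acceptance rule in the hunk above can be written as a
standalone predicate. This is a sketch only: the SK_* constants are
local copies of the ELF values (ET_REL, EM_NONE, and the EM_BPF value
defined by this patch), and struct elf_hdr_view is a hypothetical subset
of the ELF header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the ELF constants, so the sketch needs no <elf.h>. */
#define SK_ET_REL   1    /* relocatable object file */
#define SK_EM_NONE  0    /* what old LLVM emitted for BPF objects */
#define SK_EM_BPF   247  /* value this patch defines when libelf lacks it */

/* Hypothetical view of the two header fields the check looks at. */
struct elf_hdr_view {
	uint16_t e_type;
	uint16_t e_machine;
};

/* Accept relocatable objects whose machine is either EM_NONE or EM_BPF. */
static bool bpf_elf_header_ok(const struct elf_hdr_view *h)
{
	assert(h != NULL);
	if (h->e_type != SK_ET_REL)
		return false;
	return h->e_machine == SK_EM_NONE || h->e_machine == SK_EM_BPF;
}
```

This is the logical negation of the rejection condition in the patch, so
objects from both old and new LLVM pass, while any other machine type is
still refused.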

Acked-by: Alexei Starovoitov

Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management

2016-07-25 Thread Marcelo Tosatti

On Fri, Jul 22, 2016 at 02:43:23PM -0700, Luck, Tony wrote:
> On Fri, Jul 22, 2016 at 04:12:04AM -0300, Marcelo Tosatti wrote:
> > How does this patchset handle the following condition:
> > 
> > 6) Create reservations in such a way that the sum is larger than
> > total amount of cache, and CPU pinning (example from Karen Noel):
> > 
> > VM-1 on socket-1 with 80% of reservation.
> > VM-2 on socket-2 with 80% of reservation.
> > VM-1 pinned to socket-1.
> > VM-2 pinned to socket-2.
> 
> That's legal, but perhaps we need a description of
> overlapping cache reservations.
> 
> Hardware tells you how finely you can divide the cache (and this
> information is shown in /sys/fs/resctrl/info/l3/max_cbm_len to save
> you from digging in CPUID leaves).  E.g. on Broadwell the value is
> 20, so you can control cache allocations in 5% slices.
> 
> A bitmask defines which slices you can use (and h/w has the restriction
> that you must have contiguous '1' bits in any mask).  So you can pick
> your 80% using 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8 or 0xffff0.
> 
> There is no requirement that masks be exclusive of each other. So
> you might pick the two extremes: 0x0ffff and 0xffff0 for your two
> VM's in this example. Each would be allowed to allocate up to 80%,
> but with a big overlap in the middle. Each has 20% exclusive, but
> there is a 60% range in the middle that they would compete for.

These are different sockets, so there is no competing/sharing of L3 cache
here: the question is about whether the interface allows the
user to specify that 80/80 reservation without complaining:
because the VM's are pinned, they will never actually
share the same L3 cache.

(haven't finished reading the patchset to be certain).
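
The mask arithmetic Tony describes can be checked with a few lines of C.
This is only a sketch: the helper names are hypothetical, the mask
length of 20 is the Broadwell value quoted above, and the example masks
0x0ffff and 0xffff0 are assumed to be the two 16-bit extremes of a
20-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the '1' bits in a mask. */
static unsigned int bits_set(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;  /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* True if the mask is non-zero and its '1' bits form one contiguous run. */
static bool cbm_is_contiguous(uint32_t mask)
{
	if (mask == 0)
		return false;

	uint32_t low = mask & (~mask + 1);  /* isolate the lowest set bit */
	uint32_t sum = mask + low;          /* a single run carries out entirely */

	return (sum & mask) == 0;
}

/* Share of the cache covered by a mask, in percent of cbm_len slices. */
static unsigned int cbm_percent(uint32_t mask, unsigned int cbm_len)
{
	assert(cbm_len > 0 && cbm_len <= 32);
	return bits_set(mask) * 100u / cbm_len;
}
```

Under those assumptions each mask covers 80% of the cache, their
intersection covers 60%, and each keeps 20% to itself, which matches the
figures in the quoted text.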

> Is this specific case useful? Possibly not.  I think the more common
> overlap cases might be between processes that you know have shared
> code/data. Also the case where some rdtgroup has access to allocate
> in the entire cache (mask 0xfffff on Broadwell) and some other rdtgroups
> have limited cache allocation with fewer bits in the mask.
>
> -Tony

All you have to do is to build the bitmask for a given processor
from the union of the tasks which have been scheduled on that
processor.

RE: [PATCH v3 02/11] mm: Hardened usercopy

2016-07-25 Thread Michael Ellerman

David Laight  writes:

> From: Josh Poimboeuf
>> Sent: 22 July 2016 18:46
>> >
>> > e.g. then if the pointer was in the thread_info, the second test would
>> > fail, triggering the protection.
>> 
>> FWIW, this won't work right on x86 after Andy's
>> CONFIG_THREAD_INFO_IN_TASK patches get merged.
>
> What ends up in the 'thread_info' area?

It depends on the arch.

> If it contains the fp save area then programs like gdb may end up requesting
> copy_in/out directly from that area.

On the arches I've seen thread_info doesn't usually contain register save areas,
but if it did then it would be up to the arch helper to allow that copy to go
through.

However given thread_info generally contains lots of low level flags that would
be a good target for an attacker, the best way to cope with ptrace wanting to
copy to/from it would be to use a temporary, and prohibit copying directly
to/from thread_info - IMHO.

cheers
