Re: ethernet "bus" number in DTS ?

2018-10-26 Thread Florian Fainelli
On 10/23/18 11:22 PM, Michal Suchánek wrote:
> On Tue, 23 Oct 2018 11:20:36 -0700
> Florian Fainelli  wrote:
> 
>> On 10/23/18 11:02 AM, Joakim Tjernlund wrote:
>>> On Tue, 2018-10-23 at 10:03 -0700, Florian Fainelli wrote:  
> 
>>>
>>> I also noted that using status = "disabled" didn't work either to
>>> create a fixed naming scheme. Even worse, all the eth I/Fs after it get
>>> renumbered. It seems to me there is value in having stability in
>>> eth I/F naming at boot. Then userspace (udev) can rename if need be.
>>>
>>> Sure would like to know more about why this feature is not wanted?
>>>
>>> I found
>>>   https://patchwork.kernel.org/patch/4122441/
>>> You quote policy as reason but surely it must be better to
>>> have something stable, connected to the hardware name, than
>>> semirandom naming?  
>>
>> If the Device Tree nodes are ordered by ascending base register
>> address, my understanding is that you get the same order as far as
>> platform_device creation goes, this may not be true in the future if
>> Rob decides to randomize that, but AFAICT this is still true. This
>> may not work well with status = disabled properties being inserted
>> here and there, but we have used that here and it has worked for as
>> far as I can remember doing it.
> 
> So this is unstable in several respects. First is changing the
> enabled/disabled status in the devicetrees provided by the kernel.
> 
> Second is if you have hardware hotplug mechanism either by firmware or
> by loading device overlays.
> 
> Third is the case when the devicetree is not built as part of the
> kernel but is instead provided by firmware that initializes the
> low-level hardware details. Then the ordering by address is not
> guaranteed, nor is it guaranteed that the same address will be used to
> access the same interface every time. There might be multiple ways to
> configure the hardware depending on firmware configuration and/or
> version.
> 
>  
>> Second, you might want to name network devices ethX, but what if I
>> want to name them ethernetX or fooX or barX? Should we be accepting a
>> mechanism in the kernel that would allow someone to name the
>> interfaces the way they want straight from a name being provided in
>> Device Tree?
> 
> Clearly if there is text Ethernet1 printed above the Ethernet port we
> should provide a mechanism to name the port Ethernet1 by default.

Yes, but then have a specific property that is flexible enough to
retrieve things like "vital product information". For DSA switches, we
have an optional "label" property which names the network device
directly like it would be found on the product's case. Ideally this
should be much more flexible such that it can point to the appropriate
node/firmware service/whatever to get that name, which is what some
people have been working on lately, see [1].

[1]: https://lkml.org/lkml/2018/8/14/1039

The point is: don't re-purpose the aliases, which exist within Device
Tree to provide a shorter path to a specific set of nodes, so that the
boot program can mangle and muck with them without having to resolve a
full path to a node. At least that is how I conceive it.

Now what complicates the matter is that some OSes like Linux also tend to
use them to generate seemingly stable indices for peripherals that are
numbered by index, such as SPI, I2C, etc. buses, which is why we are
having this conversation here, see below for more.
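
To make the index part concrete, here is a minimal user-space sketch (not
the kernel's alias code; alias_split() is a name made up for this
illustration) of how an alias name such as "spi0" or "ethernet1"
decomposes into a stem and a numeric id:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split an alias name such as "ethernet1" into its stem ("ethernet") and
 * its numeric id (1). Returns the id, or -1 if the name has no trailing
 * digits, consists only of digits, or the stem does not fit into 'stem'.
 */
int alias_split(const char *alias, char *stem, size_t stem_len)
{
	size_t len, end;

	assert(alias != NULL && stem != NULL);

	len = strlen(alias);
	end = len;

	/* Walk back over the trailing decimal digits. */
	while (end > 0 && isdigit((unsigned char)alias[end - 1]))
		end--;

	if (end == len || end == 0 || end >= stem_len)
		return -1;

	memcpy(stem, alias, end);
	stem[end] = '\0';
	return (int)strtol(alias + end, NULL, 10);
}
```

With an alias named ethernet1, the sketch yields the stem "ethernet" and
the id 1, which is the kind of index being discussed here.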

> 
>>
>> Aliases are fine for providing relative stability within the Device
>> Tree itself and boot programs that might need to modify the Device
>> Tree (e.g: inserting MAC addresses) such that you don't have to
>> encode logic to search for nodes by compatible strings etc. but
>> outside of that use case, it seems to me that you can resolve every
>> naming decision in user-space.
> 
> However, this is pushing platform-specific knowledge to userspace. The
> way the Ethernet interface is attached and hence the device properties
> usable for identifying the device uniquely are platform-specific.

There is always going to be some amount of platform-specific knowledge
exposed to user-space; what matters is the level of abstraction that is
presented to you.

> 
> On the other hand, aliases are universal when provided. If they are
> good enough to assign a MAC address they are good enough to provide a
> stable default name.

We are not talking about the same aliases then. The special Device Tree
node named "aliases" is a way to create shorter Device Tree node paths,
it is not by any means an equivalent for what I would rather call a
"label", or "port name" or silk screen annotation on a PCB for instance.

> 
> I think this is indeed forcing the userspace to reinvent several wheels
> for no good reason.

udev or systemd will come up with some stable names for your network
device right off the bat. If you are deeply embedded maybe you don't
want those, but then use something like the full path in /sys to get
some stable names based on the SoC's topology.
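
As a rough illustration of the /sys approach, assuming a path of the form
.../<device>/net/<ifname> (the example path in the comment and the helper
name sysfs_net_parent() are hypothetical), user-space could derive a
topology-based name from the path component just before "/net/":

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Given a sysfs path such as
 *   /sys/devices/platform/soc/ff0e0000.ethernet/net/eth0   (made-up example)
 * copy the component that precedes "/net/" ("ff0e0000.ethernet") into 'out'.
 * Returns 0 on success, -1 if the path has no such component or it does
 * not fit into 'out'.
 */
int sysfs_net_parent(const char *path, char *out, size_t out_len)
{
	const char *net, *start;
	size_t n;

	assert(path != NULL && out != NULL);

	net = strstr(path, "/net/");
	if (!net || net == path)
		return -1;

	/* Scan back to the '/' that starts the parent component. */
	start = net;
	while (start > path && start[-1] != '/')
		start--;

	n = (size_t)(net - start);
	if (n == 0 || n >= out_len)
		return -1;

	memcpy(out, start, n);
	out[n] = '\0';
	return 0;
}
```

The resulting string is tied to the SoC's topology rather than to probe
order, which is the property wanted here.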

> 
> What is the 

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-4.20-1 tag

2018-10-26 Thread Linus Torvalds
On Fri, Oct 26, 2018 at 11:42 AM Michael Ellerman  wrote:
>
> Please pull powerpc updates for 4.20:

Pulled,

   Linus


Re: [PATCH 3/7] dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs

2018-10-26 Thread Li Yang
On Fri, Oct 26, 2018 at 4:57 AM Peng Ma  wrote:
>
> NXP Queue DMA controller(qDMA) on Layerscape SoCs supports channel
> virtualization by allowing DMA jobs to be enqueued into different
> command queues.
>
> Note that this module depends on NXP DPAA.

It is not clear whether you are saying that the driver only works on
SoCs with a DPAA hardware block, or that the driver actually depends
on the DPAA drivers as well.  If it is the latter, you should also
express that in the Kconfig entry you added below.

>
> Signed-off-by: Wen He 
> Signed-off-by: Jiaheng Fan 
> Signed-off-by: Peng Ma 
> ---
> change in v10:
> - no
>
>  drivers/dma/Kconfig    |   13 +
>  drivers/dma/Makefile   |    1 +
>  drivers/dma/fsl-qdma.c | 1257
>  3 files changed, 1271 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/dma/fsl-qdma.c
>
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index dacf3f4..50e19d7 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -218,6 +218,19 @@ config FSL_EDMA
>   multiplexing capability for DMA request sources(slot).
>   This module can be found on Freescale Vybrid and LS-1 SoCs.
>
> +config FSL_QDMA
> +   tristate "NXP Layerscape qDMA engine support"
> +   depends on ARM || ARM64
> +   select DMA_ENGINE
> +   select DMA_VIRTUAL_CHANNELS
> +   select DMA_ENGINE_RAID
> +   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
> +   help
> + Support the NXP Layerscape qDMA engine with command queue and legacy mode.
> + Channel virtualization is supported through enqueuing of DMA jobs to,
> + or dequeuing DMA jobs from, different work queues.
> + This module can be found on NXP Layerscape SoCs.
> +
>  config FSL_RAID
>  tristate "Freescale RAID engine Support"
>  depends on FSL_SOC && !ASYNC_TX_ENABLE_CHANNEL_SWITCH
> diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
> index c91702d..2d1b586 100644
> --- a/drivers/dma/Makefile
> +++ b/drivers/dma/Makefile
> @@ -32,6 +32,7 @@ obj-$(CONFIG_DW_DMAC_CORE) += dw/
>  obj-$(CONFIG_EP93XX_DMA) += ep93xx_dma.o
>  obj-$(CONFIG_FSL_DMA) += fsldma.o
>  obj-$(CONFIG_FSL_EDMA) += fsl-edma.o
> +obj-$(CONFIG_FSL_QDMA) += fsl-qdma.o
>  obj-$(CONFIG_FSL_RAID) += fsl_raid.o
>  obj-$(CONFIG_HSU_DMA) += hsu/
>  obj-$(CONFIG_IMG_MDC_DMA) += img-mdc-dma.o
> diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c
> new file mode 100644
> index 0000000..404869e
> --- /dev/null
> +++ b/drivers/dma/fsl-qdma.c
> @@ -0,0 +1,1257 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright 2018 NXP

I'm not sure this is really the case.  The driver was sent out as early
as 2015.  We should keep the existing copyright claims, including the
legacy Freescale ones.

> +
> +/*
> + * Driver for NXP Layerscape Queue Direct Memory Access Controller
> + *
> + * Author:
> + *  Wen He 
> + *  Jiaheng Fan 
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "virt-dma.h"
> +#include "fsldma.h"
> +
> +/* Register related definition */
> +#define FSL_QDMA_DMR   0x0
> +#define FSL_QDMA_DSR   0x4
> +#define FSL_QDMA_DEIER 0xe00
> +#define FSL_QDMA_DEDR  0xe04
> +#define FSL_QDMA_DECFDW0R  0xe10
> +#define FSL_QDMA_DECFDW1R  0xe14
> +#define FSL_QDMA_DECFDW2R  0xe18
> +#define FSL_QDMA_DECFDW3R  0xe1c
> +#define FSL_QDMA_DECFQIDR  0xe30
> +#define FSL_QDMA_DECBR 0xe34
> +
> +#define FSL_QDMA_BCQMR(x)  (0xc0 + 0x100 * (x))
> +#define FSL_QDMA_BCQSR(x)  (0xc4 + 0x100 * (x))
> +#define FSL_QDMA_BCQEDPA_SADDR(x)  (0xc8 + 0x100 * (x))
> +#define FSL_QDMA_BCQDPA_SADDR(x)   (0xcc + 0x100 * (x))
> +#define FSL_QDMA_BCQEEPA_SADDR(x)  (0xd0 + 0x100 * (x))
> +#define FSL_QDMA_BCQEPA_SADDR(x)   (0xd4 + 0x100 * (x))
> +#define FSL_QDMA_BCQIER(x) (0xe0 + 0x100 * (x))
> +#define FSL_QDMA_BCQIDR(x) (0xe4 + 0x100 * (x))
> +
> +#define FSL_QDMA_SQDPAR    0x80c
> +#define FSL_QDMA_SQEPAR    0x814
> +#define FSL_QDMA_BSQMR     0x800
> +#define FSL_QDMA_BSQSR     0x804
> +#define FSL_QDMA_BSQICR    0x828
> +#define FSL_QDMA_CQMR      0xa00
> +#define FSL_QDMA_CQDSCR1   0xa08
> +#define FSL_QDMA_CQDSCR2   0xa0c
> +#define FSL_QDMA_CQIER     0xa10
> +#define FSL_QDMA_CQEDR     0xa14
> +#define FSL_QDMA_SQCCMR    0xa20
> +
> +/* Registers for bit and genmask */
> +#define FSL_QDMA_CQIDR_SQT BIT(15)
> +#define QDMA_CCDF_FOTMAT   BIT(29)
> +#define QDMA_CCDF_SER      BIT(30)
> +#define QDMA_SG_FIN        BIT(30)
> +#define QDMA_SG_LEN_MASK   GENMASK(29, 0)
> 
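
For readers following along, a small stand-alone sketch of how the
per-queue offset macro and the bit definitions above evaluate. BIT() and
GENMASK() are re-defined here as 32-bit user-space stand-ins for the kernel
helpers, and qdma_sg_word() is a hypothetical helper; it assumes the length
lives in bits 29:0 with FIN marking the final entry, as the names suggest:

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit user-space stand-ins for the kernel's BIT() and GENMASK(). */
#define BIT(n)         (UINT32_C(1) << (n))
#define GENMASK(h, l)  ((UINT32_MAX >> (31 - (h))) & ~(BIT(l) - 1))

/* Copied from the patch: each queue index x adds a stride of 0x100. */
#define FSL_QDMA_BCQMR(x)  (0xc0 + 0x100 * (x))
#define QDMA_SG_FIN        BIT(30)
#define QDMA_SG_LEN_MASK   GENMASK(29, 0)

/*
 * Hypothetical helper: build one scatter-gather control word from a
 * length and a "last entry" flag (an assumption, not taken from the patch).
 */
uint32_t qdma_sg_word(uint32_t len, int last)
{
	assert(len <= QDMA_SG_LEN_MASK);
	return len | (last ? QDMA_SG_FIN : 0);
}
```

So FSL_QDMA_BCQMR(2) evaluates to 0x2c0, and the length mask covers the low
30 bits, leaving bit 30 free for the FIN flag.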

Re: [PATCH 3/6] PCI: layerscape: Add the EP mode support

2018-10-26 Thread Li Yang
On Fri, Oct 26, 2018 at 2:43 AM Xiaowei Bao  wrote:
>
>
>
> -----Original Message-----
> From: arndbergm...@gmail.com On Behalf Of Arnd Bergmann
> Sent: October 26, 2018 15:01
> To: Xiaowei Bao 
> Cc: Rob Herring ; bhelg...@google.com; mark.rutl...@arm.com; 
> shawn...@kernel.org; Leo Li ; kis...@ti.com; 
> lorenzo.pieral...@arm.com; gre...@linuxfoundation.org; M.h. Lian 
> ; Mingkai Hu ; Roy Zang 
> ; kstew...@linuxfoundation.org; 
> cyrille.pitc...@free-electrons.com; pombreda...@nexb.com; 
> shawn@rock-chips.com; niklas.cas...@axis.com; linux-...@vger.kernel.org; 
> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; 
> linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH 3/6] PCI: layerscape: Add the EP mode support
>
> On 10/26/18, Xiaowei Bao  wrote:
> > From: Rob Herring 
> >> On Thu, Oct 25, 2018 at 07:08:58PM +0800, Xiaowei Bao wrote:
> >>>  "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie"
> >>>  "fsl,ls2088a-pcie"
> >>>  "fsl,ls1088a-pcie"
> >>>  "fsl,ls1046a-pcie"
> >>>  "fsl,ls1012a-pcie"
> >>> +  EP mode:
> >>> +"fsl,ls-pcie-ep"
> >>
> > > You need SoC specific compatibles for the same reasons as the RC.
> >
> > [Xiaowei Bao] I want a single compatible to cover all Layerscape
> > platforms when the PCIe controller works in EP mode.
>
> Do you mean only one of the SoCs that support RC mode has EP mode?
> I think you still need a SoC specific compatible as Rob explained, in case 
> there will be a second one in the future.
>
> If you want to ensure that you don't have to update the device driver for 
> each new chip that comes in when the EP mode is compatible, the way this is 
> handled is to list multiple values in the compatible property, listing the 
> first SoC that introduced the specific version of that IP block as the most 
> generic type, e.g.
>
>   compatible = "fsl,ls2088a-pcie-ep", "fsl,ls1012a-pcie-ep", "snps,dw-pcie-ep";
>
> For consistency, it probably is best to match each RC mode value with the 
> corresponding EP mode string for each device that can support both (if there 
> is more than one).
>
>   Arnd
> [Xiaowei Bao] What I mean is that the ls-pcie-ep compatible will cover all
> NXP Layerscape SoCs, e.g. ls1046a-pcie-ep, fsl,ls2088a-pcie-ep,
> ls2088a-pcie-ep and so on. The other Layerscape SoCs have not been tested,
> only the ls1046a. I think it is compatible if the new chip or other SoCs
> use the DW core. OK, I will discuss this issue internally and reply to you later.

You can define a generic compatible string for the EP mode of all
these platforms.  But like Rob and Arnd mentioned, it is good to also
define the SoC specific compatible strings just in case that we need
special treatment for certain SoCs in the future.
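
As a sketch of why listing both helps (plain user-space C, not the OF
matching code; pick_compatible() is a hypothetical helper): the node lists
its strings from most to least specific, and a driver picks the first one
it knows about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * 'node' is the device's compatible list, most specific entry first.
 * 'drv' is the set of strings a driver knows how to handle.
 * Returns the most specific node entry the driver knows, or NULL.
 */
const char *pick_compatible(const char *const *node, size_t n_node,
			    const char *const *drv, size_t n_drv)
{
	size_t i, j;

	assert(node != NULL && drv != NULL);

	for (i = 0; i < n_node; i++)
		for (j = 0; j < n_drv; j++)
			if (strcmp(node[i], drv[j]) == 0)
				return node[i];
	return NULL;
}
```

A node carrying "fsl,ls2088a-pcie-ep", "fsl,ls-pcie-ep" binds to a driver
that only knows the generic string today, and the same device tree still
lets a later driver version single out the ls2088a if it needs special
treatment.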

Regards,
Leo


Re: [PATCH] Documentation: fix spelling mistake, EACCESS -> EACCES

2018-10-26 Thread Matthew Wilcox
On Fri, Oct 26, 2018 at 08:20:12PM +0200, Miguel Ojeda wrote:
> On Fri, Oct 26, 2018 at 7:27 PM Colin King  wrote:
> >
> > From: Colin Ian King 
> >
> > Trivial fix to a spelling mistake of the error access name EACCESS,
> > rename to EACCES
> 
> ? It is not a typo, it is the name of the error (POSIX). Same thing
> for the rest of the patches.

Are you sure?  From open(2):

   EACCES The requested access to the file is not allowed, or search
          permission is denied for one of the directories in the path
          prefix of pathname, or the file did not exist yet and write
          access to the parent directory is not allowed. (See also
          path_resolution(7).)

include/uapi/asm-generic/errno-base.h:#define   EACCES  13  /* Permission denied */
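
A quick stand-alone check along the same lines (errno_name() is a
hypothetical helper, only for illustration): <errno.h> provides EACCES,
with a single trailing S, so that is the spelling code and documentation
should use.

```c
#include <assert.h>
#include <errno.h>

/*
 * Map an errno value to its symbolic name. Only the two values discussed
 * in this thread are covered; anything else is reported as "unknown".
 */
const char *errno_name(int err)
{
	assert(err > 0);	/* errno values are positive */

	switch (err) {
	case EACCES:
		return "EACCES";	/* single trailing S */
	case EPERM:
		return "EPERM";
	default:
		return "unknown";
	}
}
```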



Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-26 Thread Florian Fainelli
On 10/26/18 4:07 AM, Mike Rapoport wrote:
> On Thu, Oct 25, 2018 at 04:07:13PM -0700, Florian Fainelli wrote:
>> On 10/25/18 2:13 PM, Rob Herring wrote:
>>> On Thu, Oct 25, 2018 at 12:30 PM Mike Rapoport  wrote:

 On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote:
> +Ard
>
> On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport  wrote:
>>
>> On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
>>> On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli wrote:

 Hi all,

 While investigating why ARM64 required a ton of objects to be rebuilt
 when toggling CONFIG_BLK_DEV_INITRD, it became clear that this was
 because we define __early_init_dt_declare_initrd() differently and we
 do that in arch/arm64/include/asm/memory.h which gets included by a fair
 amount of other header files, and translation units as well.
>>>
>> I think arm64 does not have to redefine __early_init_dt_declare_initrd().
>> Something like this might be just all we need (completely untested,
>> probably it won't even compile):
> 
> [ ... ]
>  
>> FWIW, I am extracting the ARM implementation that parses the initrd
>> early command line parameter and the "setup" code doing the page
>> boundary alignment and memblock checking into a helper into lib/ that
>> other architectures can re-use. So far, this removes the need for
>> unicore32, arc and arm to duplicate essentially the same logic.
> 
> Presuming you are going to need asm-generic/initrd.h for that as well,
> using override for __early_init_dt_declare_initrd in arm64 version of
> initrd.h might be the simplest option.

What I am contemplating is promoting
phys_initrd_start/phys_initrd_size to global variables (similar to
initrd_start, initrd_end), adding a generic helper function for parsing
the initrd= command line parameter, and finally removing
__early_init_dt_declare_initrd(), because we could have
early_init_dt_check_for_initrd() just populate
phys_initrd_start/phys_initrd_size directly as well as
initrd_start/initrd_end using __va() to preserve compatibility with
architectures that rely on this. Then I would convert ARM64 to check for
phys_initrd_start, which is really what it is trying to do in
arch/arm64/mm/init.c::arm64_memblock_init().
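
Roughly, and only as a user-space sketch of the idea rather than the
eventual patch: the helper name parse_initrd_arg() is made up, the format
is assumed to be initrd=<start>,<size>, and plain strtoull() is used, so
size suffixes such as K or M are not handled here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Globals every architecture could consult, as proposed above. */
uint64_t phys_initrd_start;
uint64_t phys_initrd_size;

/*
 * Parse the value of an "initrd=" parameter of the assumed form
 * "<start>,<size>". On success the globals are updated and 0 is
 * returned; on malformed input they are left untouched and -1 is returned.
 */
int parse_initrd_arg(const char *arg)
{
	char *end;
	unsigned long long start, size;

	assert(arg != NULL);

	start = strtoull(arg, &end, 0);
	if (end == arg || *end != ',')
		return -1;

	arg = end + 1;
	size = strtoull(arg, &end, 0);
	if (end == arg || *end != '\0' || size == 0)
		return -1;

	phys_initrd_start = start;
	phys_initrd_size = size;
	return 0;
}
```

Architecture code would then only need to test phys_initrd_size (or
phys_initrd_start) instead of overriding a macro in a widely included
header.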

Does that sound like a reasonable approach?
-- 
Florian


[GIT PULL] Please pull powerpc/linux.git powerpc-4.20-1 tag

2018-10-26 Thread Michael Ellerman
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Linus,

Please pull powerpc updates for 4.20:

The following changes since commit 11da3a7f84f19c26da6f86af878298694ede0804:

  Linux 4.19-rc3 (2018-09-09 17:26:43 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.20-1

for you to fetch changes up to 58cfbac25b1fd2b76f94566aed28a3662b0ff8c6:

  Revert "selftests/powerpc: Fix out-of-tree build errors" (2018-10-26 21:58:58 +1100)

- --
powerpc updates for 4.20

Notable changes:

 - A large series to rewrite our SLB miss handling, replacing a lot of fairly
   complicated asm with far fewer lines of C.

 - Following on from that, we now maintain a cache of SLB entries for each
   process and preload them on context switch, leading to a 27% speedup for our
   context switch benchmark on Power9.

 - Improvements to our handling of SLB multi-hit errors. We now print more debug
   information when they occur, and try to continue running by flushing the SLB
   and reloading, rather than treating them as fatal.

 - Enable THP migration on 64-bit Book3S machines (eg. Power7/8/9).

 - Add support for physical memory up to 2PB in the linear mapping on 64-bit
   Book3S. We only support up to 512TB as regular system memory, otherwise the
   percpu allocator runs out of vmalloc space.

 - Add stack protector support for 32 and 64-bit, with a per-task canary.

 - Add support for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP.

 - Support recognising "big cores" on Power9, where two SMT4 cores are presented
   to us as a single SMT8 core.

 - A large series to clean up some of our ioremap handling and PTE flags.

 - Add a driver for the PAPR SCM (storage class memory) interface, allowing
   guests to operate on SCM devices (acked by Dan).

 - Changes to our ftrace code to handle very large kernels, where we need to use
   a trampoline to get to ftrace_caller().

Many other smaller enhancements and cleanups.

Thanks to:
  Alan Modra, Alistair Popple, Aneesh Kumar K.V, Anton Blanchard, Aravinda
  Prasad, Bartlomiej Zolnierkiewicz, Benjamin Herrenschmidt, Breno Leitao,
  Cédric Le Goater, Christophe Leroy, Christophe Lombard, Dan Carpenter, Daniel
  Axtens, Finn Thain, Gautham R. Shenoy, Gustavo Romero, Haren Myneni, Hari
  Bathini, Jia Hongtao, Joel Stanley, John Allen, Laurent Dufour, Madhavan
  Srinivasan, Mahesh Salgaonkar, Mark Hairgrove, Masahiro Yamada, Michael
  Bringmann, Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Nathan
  Fontenot, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran,
  Paul Mackerras, Petr Vorel, Rashmica Gupta, Reza Arbab, Rob Herring, Sam
  Bobroff, Samuel Mendoza-Jonas, Scott Wood, Stan Johnson, Stephen Rothwell,
  Stewart Smith, Suraj Jitindar Singh, Tyrel Datwyler, Vaibhav Jain, Vasant
  Hegde, YueHaibing, zhong jiang,

- --
Alan Modra (1):
  powerpc/vdso: Correct call frame information

Aneesh Kumar K.V (10):
  powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit
  powerpc/mm/hugetlb/book3s: add _PAGE_PRESENT to hugepd pointer.
  powerpc/mm/book3s: Check for pmd_large instead of pmd_trans_huge
  arch/powerpc/mm/hash: validate the pte entries before handling the hash fault
  powerpc/mm/thp: update pmd_trans_huge to check for pmd_present
  powerpc/mm:book3s: Enable THP migration support
  powerpc/mm/hash: Rename get_ea_context to get_user_context
  powerpc/mm: Increase the max addressable memory to 2PB
  powerpc/mm: Make pte_pgprot return all pte bits
  powerpc/mm: Fix WARN_ON with THP NUMA migration

Anton Blanchard (4):
  powerpc/64: Remove static branch hints from memset()
  powerpc: Fix duplicate const clang warning in user access code
  powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer
  powerpc/time: Add set_state_oneshot_stopped decrementer callback

Aravinda Prasad (1):
  powerpc/pseries: Export raw per-CPU VPA data via debugfs

Bartlomiej Zolnierkiewicz (1):
  powerpc: remove redundant 'default n' from Kconfig-s

Benjamin Herrenschmidt (13):
  powerpc/prom_init: Make of_workarounds static
  powerpc/prom_init: Make "fake_elf" const
  powerpc/prom_init: Make "default_colors" const
  powerpc/prom_init: Replace __initdata with __prombss when applicable
  powerpc/prom_init: Remove support for OPAL v2
  powerpc/prom_init: Move prom_radix_disable to __prombss
  powerpc/prom_init: Move ibm_arch_vec to __prombss
  powerpc/prom_init: Move const structures to __initconst
  powerpc/prom_init: Move a few remaining statics to appropriate sections
  powerpc/prom_init: Move __prombss to it's own section and store it in .bss
  powerpc: Check prom_init for disallowed sections
  powerpc/prom_init: 

[PATCH] Documentation: fix spelling mistake, EACCESS -> EACCES

2018-10-26 Thread Colin King
From: Colin Ian King 

Trivial fix to a spelling mistake of the error access name EACCESS,
rename to EACCES

Signed-off-by: Colin Ian King 
---
 Documentation/filesystems/spufs.txt | 2 +-
 Documentation/gpu/drm-uapi.rst  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/spufs.txt b/Documentation/filesystems/spufs.txt
index 1343d118a9b2..eb9e3aa63026 100644
--- a/Documentation/filesystems/spufs.txt
+++ b/Documentation/filesystems/spufs.txt
@@ -452,7 +452,7 @@ RETURN VALUE
 
 
 ERRORS
-   EACCESS
+   EACCES
   The  current  user does not have write access on the spufs mount
   point.
 
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index a2214cc1f821..f2f079e91b4c 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -190,11 +190,11 @@ ENOSPC:
 
 Simply running out of kernel/system memory is signalled through ENOMEM.
 
-EPERM/EACCESS:
+EPERM/EACCES:
 Returned for an operation that is valid, but needs more privileges.
 E.g. root-only or much more common, DRM master-only operations return
 this when when called by unpriviledged clients. There's no clear
-difference between EACCESS and EPERM.
+difference between EACCES and EPERM.
 
 ENODEV:
 Feature (like PRIME, modesetting, GEM) is not supported by the driver.
-- 
2.19.1



Re: [PATCH 5/5] powerpc/64s: Document that PPC supports nosmap

2018-10-26 Thread LEROY Christophe

Why not call our new functionality SMAP instead of calling it GUAP?

Christophe

Russell Currey wrote:


Signed-off-by: Russell Currey 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a5ad67d5cb16..8f78e75965f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2764,7 +2764,7 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings

-   nosmap  [X86]
+   nosmap  [X86,PPC]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.

--
2.19.1





Re: [PATCH 3/5] powerpc/lib: checksum GUAP support

2018-10-26 Thread LEROY Christophe

Same comment as for futex

Christophe

Russell Currey wrote:


Wrap the checksumming code in GUAP locks and unlocks.

Signed-off-by: Russell Currey 
---
 arch/powerpc/lib/checksum_wrappers.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index a0cb63fb76a1..c67db0a6e18b 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -28,6 +28,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
 {
unsigned int csum;

+   unlock_user_access();
might_sleep();

*err_ptr = 0;
@@ -60,6 +61,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
}

 out:
+   lock_user_access();
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_from_user);
@@ -69,6 +71,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
 {
unsigned int csum;

+   unlock_user_access();
might_sleep();

*err_ptr = 0;
@@ -97,6 +100,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
}

 out:
+   lock_user_access();
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_to_user);
--
2.19.1





Re: [PATCH 2/5] powerpc/futex: GUAP support for futex ops

2018-10-26 Thread LEROY Christophe

Russell Currey wrote:


Wrap the futex operations in GUAP locks and unlocks.


Does it mean futex no longer works when only patch 1 is applied?
If so, you should split patch 1 in two parts and reorder the
patches so that GUAP can only be activated once all the necessary
changes are in place. Otherwise the series won't be bisectable.


Christophe



Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/futex.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index 94542776a62d..3aed640ee9ef 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -35,6 +35,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
 {
int oldval = 0, ret;

+   unlock_user_access();
pagefault_disable();

switch (op) {
@@ -62,6 +63,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
if (!ret)
*oval = oldval;

+   lock_user_access();
return ret;
 }

@@ -75,6 +77,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;

+   unlock_user_access();
 __asm__ __volatile__ (
 PPC_ATOMIC_ENTRY_BARRIER
 "1: lwarx   %1,0,%3 # futex_atomic_cmpxchg_inatomic\n\
@@ -95,6 +98,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 : "cc", "memory");

*uval = prev;
+   lock_user_access();
 return ret;
 }

--
2.19.1





Re: [PATCH 0/5] Guarded Userspace Access Prevention on Radix

2018-10-26 Thread LEROY Christophe

Russell Currey wrote:


Guarded Userspace Access Prevention is a security mechanism that prevents
the kernel from being able to read and write userspace addresses outside of
the allowed paths, most commonly copy_{to/from}_user().

At present, the only CPU that supports this is POWER9, and only while using
the Radix MMU.  Privileged reads and writes cannot access user data when
key 0 of the AMR is set.  This is described in the "Radix Tree Translation
Storage Protection" section of the POWER ISA as of version 3.0.


It is not correct that only POWER9 can support this.

The 8xx has mmu access protection registers which can serve the same  
purpose. Today on the 8xx kernel space is group 0 and user space is  
group 1. Group 0 is set to "page defined access permission" in MD_AP  
and MI_AP registers, and group 1 is set to "all accesses done with  
supervisor rights". By setting group 1 to "user and supervisor  
interpretation swapped" we can forbid kernel access to user space  
while still allowing user access to it. Then by simply changing group  
1 mode at dedicated places we can lock/unlock kernel access to user  
space.


Could you implement something as generic as possible, with that in
mind for a future patch?


Christophe



GUAP code sets key 0 of the AMR (thus disabling accesses of user data)
early during boot, and only ever "unlocks" access prior to certain
operations, like copy_{to/from}_user(), futex ops, etc.  Setting this does
not prevent unprivileged access, so userspace can operate fine while access
is locked.

There is a performance impact, although I don't consider it heavy.  Running
a worst-case benchmark of a 1GB copy 1 byte at a time (and thus constant
read(1) write(1) syscalls), I found enabling GUAP to be 3.5% slower than
when disabled.  In most cases, the difference is negligible.  The main
performance impact is the mtspr instruction, which is quite slow.

There are a few caveats with this series that could be improved upon in
future.  Right now there is no saving and restoring of the AMR value -
there is no userspace exploitation of the AMR on Radix in POWER9, but if
this were to change in future, saving and restoring the value would be
necessary.

No attempt is made to optimise cases of repeated calls - for example, if some
code was repeatedly calling copy_to_user() for small sizes very frequently,
it would be slower than the equivalent of wrapping that code in an unlock
and lock and only having to modify the AMR once.
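
A toy user-space model of that cost, counting simulated AMR writes instead
of executing mtspr (the counters and the copy_many_*() helpers are invented
for this illustration; only lock_user_access() and unlock_user_access() are
names from the series):

```c
#include <assert.h>
#include <stdbool.h>

static bool amr_locked = true;	/* simulated AMR state: locked by default */
static unsigned int amr_writes;	/* each write stands in for one mtspr */

void unlock_user_access(void) { amr_locked = false; amr_writes++; }
void lock_user_access(void)   { amr_locked = true;  amr_writes++; }

/* Stand-in for one small copy_to_user(); access must be unlocked. */
static void small_copy(void)
{
	assert(!amr_locked);
}

/* Unlock and lock around every copy: 2 * n simulated AMR writes. */
unsigned int copy_many_naive(unsigned int n)
{
	unsigned int i, before = amr_writes;

	for (i = 0; i < n; i++) {
		unlock_user_access();
		small_copy();
		lock_user_access();
	}
	return amr_writes - before;
}

/* One unlock/lock pair around the whole loop: 2 writes regardless of n. */
unsigned int copy_many_batched(unsigned int n)
{
	unsigned int i, before = amr_writes;

	unlock_user_access();
	for (i = 0; i < n; i++)
		small_copy();
	lock_user_access();
	return amr_writes - before;
}
```

In this model 100 small copies cost 200 AMR writes in the naive form and 2
in the batched form; the real saving depends on how slow mtspr is relative
to the copy itself.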

There are some interesting cases that I've attempted to handle, such as if
the AMR is unlocked (i.e. because a copy_{to/from}_user() is in progress)...

- and an exception is taken, the kernel would then be running with the
AMR unlocked and freely able to access userspace again.  I am working
around this by storing a flag in the PACA to indicate if the AMR is
unlocked (to save a costly SPR read), and if so, locking the AMR in
the exception entry path and unlocking it on the way out.

- and gets context switched out, goes into a path that locks the AMR,
then context switches back, access will be disabled and will fault.
As a result, I context switch the AMR between tasks as if it was used
by userspace like hash (which already implements this).

Another consideration is use of the isync instruction.  Without an isync
following the mtspr instruction, there is no guarantee that the change
takes effect.  The issue is that isync is very slow, and so I tried to
avoid it wherever possible.  In this series, the only place an isync
gets used is after *unlocking* the AMR, because if an access takes place
and access is still prevented, the kernel will fault.

On the flipside, a slight delay in locking caused by skipping an isync
potentially allows a small window of vulnerability.  It is my opinion
that this window is practically impossible to exploit, but if someone
thinks otherwise, please do share.

This series is my first attempt at POWER assembly so all feedback is very
welcome.

The official theme song of this series can be found here:
https://www.youtube.com/watch?v=QjTrnKAcYjE

Russell Currey (5):
  powerpc/64s: Guarded Userspace Access Prevention
  powerpc/futex: GUAP support for futex ops
  powerpc/lib: checksum GUAP support
  powerpc/64s: Disable GUAP with nosmap option
  powerpc/64s: Document that PPC supports nosmap

 .../admin-guide/kernel-parameters.txt    |  2 +-
 arch/powerpc/include/asm/exception-64e.h |  3 +
 arch/powerpc/include/asm/exception-64s.h | 19 ++-
 arch/powerpc/include/asm/futex.h         |  6 ++
 arch/powerpc/include/asm/mmu.h           |  7 +++
 arch/powerpc/include/asm/paca.h          |  3 +
 arch/powerpc/include/asm/reg.h           |  1 +
 arch/powerpc/include/asm/uaccess.h       | 57 ---
 arch/powerpc/kernel/asm-offsets.c        |  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c        |  4 ++
 arch/powerpc/kernel/entry_64.S           | 17

Re: ethernet "bus" number in DTS ?

2018-10-26 Thread Joakim Tjernlund
On Wed, 2018-10-24 at 08:22 +0200, Michal Suchánek wrote:
> On Tue, 23 Oct 2018 11:20:36 -0700
> Florian Fainelli  wrote:
> 
> > On 10/23/18 11:02 AM, Joakim Tjernlund wrote:
> > > On Tue, 2018-10-23 at 10:03 -0700, Florian Fainelli wrote:
> > > I also noted that using status = "disabled" didn't work either to
> > > create a fixed naming scheme. Even worse, all the eth I/Fs after it get
> > > renumbered. It seems to me there is value in having stability in
> > > eth I/F naming at boot. Then userspace (udev) can rename if need be.
> > > 
> > > Sure would like to know more about why this feature is not wanted?
> > > 
> > > I found
> > >   https://patchwork.kernel.org/patch/4122441/
> > > You quote policy as reason but surely it must be better to
> > > have something stable, connected to the hardware name, than
> > > semirandom naming?
> > 
> > If the Device Tree nodes are ordered by ascending base register
> > address, my understanding is that you get the same order as far as
> > platform_device creation goes, this may not be true in the future if
> > Rob decides to randomize that, but AFAICT this is still true. This
> > may not work well with status = disabled properties being inserted
> > here and there, but we have used that here and it has worked for as
> > far as I can remember doing it.
> 
> So this is unstable in several respects. First is changing the
> enabled/disabled status in the devicetrees provided by the kernel.
> 
> Second is if you have a hardware hotplug mechanism, either by firmware or
> by loading device overlays.
> 
> Third is the case when the devicetree is not built as part of the
> kernel but is instead provided by firmware that initializes the
> low-level hardware details. Then the ordering by address is not
> guaranteed, nor is it certain that the same address will be used to
> access the same interface every time. There might be multiple ways to
> configure the hardware depending on firmware configuration and/or version.
> 
> 
> > Second, you might want to name network devices ethX, but what if I
> > want to name them ethernetX or fooX or barX? Should we be accepting a
> > mechanism in the kernel that would allow someone to name the
> > interfaces the way they want straight from a name being provided in
> > Device Tree?

Just to be clear, I am saying that we don't need to control the full
name of the Ethernet device, just the numerical id so one can tie eth0
to a fixed physical device.
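
To illustrate how small the id lookup is, here is a userspace sketch. It is
not kernel code: alias_id() and default_ifname() are hypothetical names, and
the "eth" prefix is an assumption made for the example. In the kernel, the
existing of_alias_get_id() helper performs a comparable lookup for a device
node and an alias stem.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the numeric id from an alias name such as
 * "ethernet1", given the stem "ethernet". Returns the id, or -1 if the
 * alias does not consist of the stem followed only by decimal digits.
 */
static int alias_id(const char *alias, const char *stem)
{
	size_t n = strlen(stem);
	const char *p;
	char *end;
	long id;

	if (strncmp(alias, stem, n) != 0)
		return -1;
	p = alias + n;
	if (!isdigit((unsigned char)*p))
		return -1;
	id = strtol(p, &end, 10);
	if (*end != '\0' || id > 0x7fff)	/* reject trailing junk and huge ids */
		return -1;
	return (int)id;
}

/*
 * Hypothetical helper: build the default interface name "eth<id>" from an
 * "ethernet<id>" alias. Returns 0 on success, -1 on a bad alias or if the
 * name does not fit in buf.
 */
static int default_ifname(const char *alias, char *buf, size_t len)
{
	int id = alias_id(alias, "ethernet");

	if (id < 0)
		return -1;
	return snprintf(buf, len, "eth%d", id) < (int)len ? 0 : -1;
}
```

With this, an alias of "ethernet1" would always yield "eth1", regardless of
probe order or of which other nodes are disabled.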

> 
> Clearly if there is text Ethernet1 printed above the Ethernet port we
> should provide a mechanism to name the port Ethernet1 by default.
> 
> > Aliases are fine for providing relative stability within the Device
> > Tree itself and boot programs that might need to modify the Device
> > Tree (e.g: inserting MAC addresses) such that you don't have to
> > encode logic to search for nodes by compatible strings etc. but
> > outside of that use case, it seems to me that you can resolve every
> > naming decision in user-space.
> 
> However, this is pushing platform-specific knowledge to userspace. The
> way the Ethernet interface is attached and hence the device properties
> usable for identifying the device uniquely are platform-specific.
> 
> On the other hand, aliases are universal when provided. If they are
> good enough to assign a MAC address they are good enough to provide a
> stable default name.
> 
> I think this is indeed forcing the userspace to reinvent several wheels
> for no good reason.
> 
> What is the problem with adding the aliases?

Well put above, thanks.

   Jocke


Re: ethernet "bus" number in DTS ?

2018-10-26 Thread Michal Suchánek
On Tue, 23 Oct 2018 11:20:36 -0700
Florian Fainelli  wrote:

> On 10/23/18 11:02 AM, Joakim Tjernlund wrote:
> > On Tue, 2018-10-23 at 10:03 -0700, Florian Fainelli wrote:  

> > 
> > I also noted that using status = "disabled" didn't work either to
> > create a fixed naming scheme. Even worse, all the eth I/Fs after it get
> > renumbered. It seems to me there is value in having stability in
> > eth I/F naming at boot. Then userspace (udev) can rename if need be.
> > 
> > I sure would like to know more about why this feature is not wanted.
> > 
> > I found
> >   https://patchwork.kernel.org/patch/4122441/
> > You quote policy as reason but surely it must be better to
> > have something stable, connected to the hardware name, than
> > semirandom naming?  
> 
> If the Device Tree nodes are ordered by ascending base register
> address, my understanding is that you get the same order as far as
> platform_device creation goes, this may not be true in the future if
> Rob decides to randomize that, but AFAICT this is still true. This
> may not work well with status = disabled properties being inserted
> here and there, but we have used that here and it has worked for as
> far as I can remember doing it.

So this is unstable in several respects. First is changing the
enabled/disabled status in the devicetrees provided by the kernel.

Second is if you have a hardware hotplug mechanism, either by firmware or
by loading device overlays.

Third is the case when the devicetree is not built as part of the
kernel but is instead provided by firmware that initializes the
low-level hardware details. Then the ordering by address is not
guaranteed, nor is it certain that the same address will be used to
access the same interface every time. There might be multiple ways to
configure the hardware depending on firmware configuration and/or version.

 
> Second, you might want to name network devices ethX, but what if I
> want to name them ethernetX or fooX or barX? Should we be accepting a
> mechanism in the kernel that would allow someone to name the
> interfaces the way they want straight from a name being provided in
> Device Tree?

Clearly if there is text Ethernet1 printed above the Ethernet port we
should provide a mechanism to name the port Ethernet1 by default.

> 
> Aliases are fine for providing relative stability within the Device
> Tree itself and boot programs that might need to modify the Device
> Tree (e.g: inserting MAC addresses) such that you don't have to
> encode logic to search for nodes by compatible strings etc. but
> outside of that use case, it seems to me that you can resolve every
> naming decision in user-space.

However, this is pushing platform-specific knowledge to userspace. The
way the Ethernet interface is attached and hence the device properties
usable for identifying the device uniquely are platform-specific. 

On the other hand, aliases are universal when provided. If they are
good enough to assign a MAC address they are good enough to provide a
stable default name.

I think this is indeed forcing the userspace to reinvent several wheels
for no good reason.

What is the problem with adding the aliases?

Thanks

Michal


Re: [PATCH v2] idle/x86: remove the call to boot_init_stack_canary() from cpu_startup_entry()

2018-10-26 Thread kbuild test robot
Hi Christophe,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on xen-tip/linux-next]
[also build test ERROR on v4.19 next-20181019]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/idle-x86-remove-the-call-to-boot_init_stack_canary-from-cpu_startup_entry/20181020-061217
base:   https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git linux-next
config: x86_64-rhel (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/xen/smp_pv.c: In function 'cpu_bringup_and_idle':
>> arch/x86/xen/smp_pv.c:91:2: error: implicit declaration of function 
>> 'boot_init_stack_canary'; did you mean 'snprint_stack_trace'? 
>> [-Werror=implicit-function-declaration]
 boot_init_stack_canary();
 ^~
 snprint_stack_trace
   cc1: some warnings being treated as errors

vim +91 arch/x86/xen/smp_pv.c

87  
88  asmlinkage __visible void cpu_bringup_and_idle(void)
89  {
90  cpu_bringup();
  > 91  boot_init_stack_canary();
92  cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
93  }
94  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-26 Thread Mike Rapoport
On Thu, Oct 25, 2018 at 04:07:13PM -0700, Florian Fainelli wrote:
> On 10/25/18 2:13 PM, Rob Herring wrote:
> > On Thu, Oct 25, 2018 at 12:30 PM Mike Rapoport  wrote:
> >>
> >> On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote:
> >>> +Ard
> >>>
> >>> On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport  wrote:
> >>>>
> >>>> On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> >>>>> On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli wrote:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> While investigating why ARM64 required a ton of objects to be rebuilt
> >>>>>> when toggling CONFIG_BLK_DEV_INITRD, it became clear that this was
> >>>>>> because we define __early_init_dt_declare_initrd() differently and we do
> >>>>>> that in arch/arm64/include/asm/memory.h which gets included by a fair
> >>>>>> amount of other header files, and translation units as well.
> >>>>>
> >>>> I think arm64 does not have to redefine __early_init_dt_declare_initrd().
> >>>> Something like this might be just all we need (completely untested,
> >>>> probably it won't even compile):

[ ... ]
 
> FWIW, I am extracting the ARM implementation that parses the initrd
> early command line parameter and the "setup" code doing the page
> boundary alignment and memblock checking into a helper into lib/ that
> other architectures can re-use. So far, this removes the need for
> unicore32, arc and arm to duplicate essentially the same logic.

Presuming you are going to need asm-generic/initrd.h for that as well,
using override for __early_init_dt_declare_initrd in arm64 version of
initrd.h might be the simplest option.
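
A minimal sketch of the override pattern I have in mind, written as plain
userspace C so it can be compiled on its own. All demo_* names and
DEMO_VA_OFFSET are made up for illustration, and what each variant does with
the addresses is a placeholder, not what arm64 or the generic code actually
does.

```c
#include <stdint.h>

/* Stand-ins for the kernel's initrd_start/initrd_end globals. */
static uint64_t demo_initrd_start, demo_initrd_end;

/*
 * Part 1: what a hypothetical arch-specific initrd.h could provide.
 * This override stores the addresses untranslated; that behaviour is a
 * placeholder chosen for the demo.
 */
#define demo_declare_initrd(start, end)		\
	do {					\
		demo_initrd_start = (start);	\
		demo_initrd_end = (end);	\
	} while (0)

/*
 * Part 2: what the generic header would contain. The default is only
 * defined when no arch header has already supplied its own version.
 */
#define DEMO_VA_OFFSET 0x1000u	/* placeholder for a phys-to-virt step */
#ifndef demo_declare_initrd
#define demo_declare_initrd(start, end)				\
	do {							\
		demo_initrd_start = (start) + DEMO_VA_OFFSET;	\
		demo_initrd_end = (end) + DEMO_VA_OFFSET;	\
	} while (0)
#endif

/* Generic code calls the macro without knowing which version is active. */
static void demo_setup_initrd(uint64_t start, uint64_t end)
{
	demo_declare_initrd(start, end);
}
```

The point is that only the small arch initrd.h changes with the override, so
widely included headers such as memory.h no longer need to carry the
definition.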

> -- 
> Florian
> 

-- 
Sincerely yours,
Mike.



[PATCH 7/7] dt-bindings: fsl-qdma: Add NXP Layerscape qDMA controller bindings

2018-10-26 Thread Peng Ma
Document the devicetree bindings for NXP Layerscape qDMA controller
which could be found on NXP QorIQ Layerscape SoCs.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
Reviewed-by: Rob Herring 
---
change in v10:
- no

 Documentation/devicetree/bindings/dma/fsl-qdma.txt |   57 
 1 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/fsl-qdma.txt

diff --git a/Documentation/devicetree/bindings/dma/fsl-qdma.txt b/Documentation/devicetree/bindings/dma/fsl-qdma.txt
new file mode 100644
index 000..6a0ff90
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/fsl-qdma.txt
@@ -0,0 +1,57 @@
+NXP Layerscape SoC qDMA Controller
+==================================
+
+This device follows the generic DMA bindings defined in dma/dma.txt.
+
+Required properties:
+
+- compatible:  Must be one of
+"fsl,ls1021a-qdma": for LS1021A Board
+"fsl,ls1043a-qdma": for ls1043A Board
+"fsl,ls1046a-qdma": for ls1046A Board
+- reg: Should contain the register's base address and length.
+- interrupts:  Should contain a reference to the interrupt used by this
+   device.
+- interrupt-names: Should contain interrupt names:
+"qdma-queue0": the block0 interrupt
+"qdma-queue1": the block1 interrupt
+"qdma-queue2": the block2 interrupt
+"qdma-queue3": the block3 interrupt
+"qdma-error":  the error interrupt
+- fsl,dma-queues:  Should contain number of queues supported.
+- dma-channels:    Number of DMA channels supported
+- block-number:    the virtual block number
+- block-offset:    the offset of different virtual block
+- status-sizes:    status queue size of per virtual block
+- queue-sizes:     command queue size of per virtual block, the size number
+   based on queues
+
+Optional properties:
+
+- dma-channels:    Number of DMA channels supported by the controller.
+- big-endian:      If present registers and hardware scatter/gather descriptors
+   of the qDMA are implemented in big endian mode, otherwise in little
+   mode.
+
+Examples:
+
+   qdma: dma-controller@839 {
+   compatible = "fsl,ls1021a-qdma";
+   reg = <0x0 0x8388000 0x0 0x1000>, /* Controller regs */
+ <0x0 0x8389000 0x0 0x1000>, /* Status regs */
+ <0x0 0x838a000 0x0 0x2000>; /* Block regs */
+   interrupts = ,
+,
+;
+   interrupt-names = "qdma-error",
+   "qdma-queue0", "qdma-queue1";
+   dma-channels = <8>;
+   block-number = <2>;
+   block-offset = <0x1000>;
+   fsl,dma-queues = <2>;
+   status-sizes = <64>;
+   queue-sizes = <64 64>;
+   big-endian;
+   };
+
+DMA clients must use the format described in dma/dma.txt file.
-- 
1.7.1



[PATCH 6/7] arm64: dts: ls1046a: add qdma device tree nodes

2018-10-26 Thread Peng Ma
Add the qDMA device tree nodes for LS1046A devices.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v10:
- no

 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index ef83786..dc65318 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -704,6 +704,27 @@
< 0 0 4  GIC_SPI 154 
IRQ_TYPE_LEVEL_HIGH>;
};
 
+   qdma: dma-controller@838 {
+   compatible = "fsl,ls1046a-qdma", "fsl,ls1021a-qdma";
+   reg = <0x0 0x838 0x0 0x1000>, /* Controller regs */
+ <0x0 0x839 0x0 0x1>, /* Status regs */
+ <0x0 0x83a 0x0 0x4>; /* Block regs */
+   interrupts = <0 153 0x4>,
+<0 39 0x4>,
+<0 40 0x4>,
+<0 41 0x4>,
+<0 42 0x4>;
+   interrupt-names = "qdma-error", "qdma-queue0",
+   "qdma-queue1", "qdma-queue2", "qdma-queue3";
+   dma-channels = <8>;
+   block-number = <1>;
+   block-offset = <0x1>;
+   fsl,dma-queues = <2>;
+   status-sizes = <64>;
+   queue-sizes = <64 64>;
+   big-endian;
+   };
+
};
 
reserved-memory {
-- 
1.7.1



[PATCH 4/7] arm: dts: ls1021a: add qdma device tree nodes

2018-10-26 Thread Peng Ma
Add the qDMA device tree nodes for LS1021A devices.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v10:
- no

 arch/arm/boot/dts/ls1021a.dtsi |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index f184905..c0ed6be 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -806,5 +806,25 @@
#size-cells = <1>;
ranges = <0x0 0x0 0x1001 0x1>;
};
+
+   qdma: dma-controller@839 {
+   compatible = "fsl,ls1021a-qdma";
+   reg = <0x0 0x8388000 0x0 0x1000>, /* Controller regs */
+ <0x0 0x8389000 0x0 0x1000>, /* Status regs */
+ <0x0 0x838a000 0x0 0x2000>; /* Block regs */
+   interrupts = ,
+,
+;
+   interrupt-names = "qdma-error",
+   "qdma-queue0", "qdma-queue1";
+   dma-channels = <8>;
+   block-number = <1>;
+   block-offset = <0x1000>;
+   fsl,dma-queues = <2>;
+   status-sizes = <64>;
+   queue-sizes = <64 64>;
+   big-endian;
+   };
+
};
 };
-- 
1.7.1



[PATCH 3/7] dmaengine: fsl-qdma: Add qDMA controller driver for Layerscape SoCs

2018-10-26 Thread Peng Ma
The NXP Queue DMA controller (qDMA) on Layerscape SoCs supports channel
virtualization by allowing DMA jobs to be enqueued into different
command queues.

Note that this module depends on NXP DPAA.

Signed-off-by: Wen He 
Signed-off-by: Jiaheng Fan 
Signed-off-by: Peng Ma 
---
change in v10:
- no

 drivers/dma/Kconfig|   13 +
 drivers/dma/Makefile   |1 +
 drivers/dma/fsl-qdma.c | 1257 
 3 files changed, 1271 insertions(+), 0 deletions(-)
 create mode 100644 drivers/dma/fsl-qdma.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index dacf3f4..50e19d7 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -218,6 +218,19 @@ config FSL_EDMA
  multiplexing capability for DMA request sources(slot).
  This module can be found on Freescale Vybrid and LS-1 SoCs.
 
+config FSL_QDMA
+   tristate "NXP Layerscape qDMA engine support"
+   depends on ARM || ARM64
+   select DMA_ENGINE
+   select DMA_VIRTUAL_CHANNELS
+   select DMA_ENGINE_RAID
+   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+   help
+ Support the NXP Layerscape qDMA engine with command queue and legacy mode.
+ Channel virtualization is supported through enqueuing of DMA jobs to,
+ or dequeuing DMA jobs from, different work queues.
+ This module can be found on NXP Layerscape SoCs.
+
 config FSL_RAID
 tristate "Freescale RAID engine Support"
 depends on FSL_SOC && !ASYNC_TX_ENABLE_CHANNEL_SWITCH
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index c91702d..2d1b586 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -32,6 +32,7 @@ obj-$(CONFIG_DW_DMAC_CORE) += dw/
 obj-$(CONFIG_EP93XX_DMA) += ep93xx_dma.o
 obj-$(CONFIG_FSL_DMA) += fsldma.o
 obj-$(CONFIG_FSL_EDMA) += fsl-edma.o
+obj-$(CONFIG_FSL_QDMA) += fsl-qdma.o
 obj-$(CONFIG_FSL_RAID) += fsl_raid.o
 obj-$(CONFIG_HSU_DMA) += hsu/
 obj-$(CONFIG_IMG_MDC_DMA) += img-mdc-dma.o
diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c
new file mode 100644
index 000..404869e
--- /dev/null
+++ b/drivers/dma/fsl-qdma.c
@@ -0,0 +1,1257 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright 2018 NXP
+
+/*
+ * Driver for NXP Layerscape Queue Direct Memory Access Controller
+ *
+ * Author:
+ *  Wen He 
+ *  Jiaheng Fan 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "virt-dma.h"
+#include "fsldma.h"
+
+/* Register related definition */
+#define FSL_QDMA_DMR   0x0
+#define FSL_QDMA_DSR   0x4
+#define FSL_QDMA_DEIER 0xe00
+#define FSL_QDMA_DEDR  0xe04
+#define FSL_QDMA_DECFDW0R  0xe10
+#define FSL_QDMA_DECFDW1R  0xe14
+#define FSL_QDMA_DECFDW2R  0xe18
+#define FSL_QDMA_DECFDW3R  0xe1c
+#define FSL_QDMA_DECFQIDR  0xe30
+#define FSL_QDMA_DECBR 0xe34
+
+#define FSL_QDMA_BCQMR(x)  (0xc0 + 0x100 * (x))
+#define FSL_QDMA_BCQSR(x)  (0xc4 + 0x100 * (x))
+#define FSL_QDMA_BCQEDPA_SADDR(x)  (0xc8 + 0x100 * (x))
+#define FSL_QDMA_BCQDPA_SADDR(x)   (0xcc + 0x100 * (x))
+#define FSL_QDMA_BCQEEPA_SADDR(x)  (0xd0 + 0x100 * (x))
+#define FSL_QDMA_BCQEPA_SADDR(x)   (0xd4 + 0x100 * (x))
+#define FSL_QDMA_BCQIER(x) (0xe0 + 0x100 * (x))
+#define FSL_QDMA_BCQIDR(x) (0xe4 + 0x100 * (x))
+
+#define FSL_QDMA_SQDPAR            0x80c
+#define FSL_QDMA_SQEPAR            0x814
+#define FSL_QDMA_BSQMR             0x800
+#define FSL_QDMA_BSQSR             0x804
+#define FSL_QDMA_BSQICR            0x828
+#define FSL_QDMA_CQMR              0xa00
+#define FSL_QDMA_CQDSCR1           0xa08
+#define FSL_QDMA_CQDSCR2           0xa0c
+#define FSL_QDMA_CQIER             0xa10
+#define FSL_QDMA_CQEDR             0xa14
+#define FSL_QDMA_SQCCMR            0xa20
+
+/* Registers for bit and genmask */
+#define FSL_QDMA_CQIDR_SQT         BIT(15)
+#define QDMA_CCDF_FOTMAT           BIT(29)
+#define QDMA_CCDF_SER              BIT(30)
+#define QDMA_SG_FIN                BIT(30)
+#define QDMA_SG_LEN_MASK           GENMASK(29, 0)
+#define QDMA_CCDF_MASK             GENMASK(28, 20)
+
+#define FSL_QDMA_DEDR_CLEAR        GENMASK(31, 0)
+#define FSL_QDMA_BCQIDR_CLEAR      GENMASK(31, 0)
+#define FSL_QDMA_DEIER_CLEAR       GENMASK(31, 0)
+
+#define FSL_QDMA_BCQIER_CQTIE      BIT(15)
+#define FSL_QDMA_BCQIER_CQPEIE     BIT(23)
+#define FSL_QDMA_BSQICR_ICEN       BIT(31)
+
+#define FSL_QDMA_BSQICR_ICST(x)    ((x) << 16)
+#define FSL_QDMA_CQIER_MEIE        BIT(31)
+#define FSL_QDMA_CQIER_TEIE        BIT(0)
+#define FSL_QDMA_SQCCMR_ENTER_WM   BIT(21)
+
+#define FSL_QDMA_BCQMR_EN          BIT(31)
+#define FSL_QDMA_BCQMR_EI          BIT(30)
+#define FSL_QDMA_BCQMR_CD_THLD(x)  ((x) << 20)
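
For readers unfamiliar with the BIT()/GENMASK() helpers and the per-queue
register macros above, the following userspace sketch shows what they expand
to. The DEMO_* macros are local re-implementations for illustration only (the
kernel provides the real ones), and demo_sg_word() is a guess at how a length
mask and a "final" bit are typically combined, based purely on the names
QDMA_SG_LEN_MASK and QDMA_SG_FIN; it is not taken from the driver.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and 32-bit GENMASK(). */
#define DEMO_BIT(n)		(UINT32_C(1) << (n))
#define DEMO_GENMASK(h, l)	\
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Same arithmetic as FSL_QDMA_BCQMR(x): one 0x100-byte stride per queue. */
#define DEMO_BCQMR(x)		(0xc0 + 0x100 * (x))

#define DEMO_SG_LEN_MASK	DEMO_GENMASK(29, 0)
#define DEMO_SG_FIN		DEMO_BIT(30)

/*
 * Hypothetical helper: pack a scatter/gather length into bits 29..0 and
 * set bit 30 when this is the final entry.
 */
static uint32_t demo_sg_word(uint32_t len, int final)
{
	uint32_t w = len & DEMO_SG_LEN_MASK;

	if (final)
		w |= DEMO_SG_FIN;
	return w;
}
```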

[PATCH 5/7] arm64: dts: ls1043a: add qdma device tree nodes

2018-10-26 Thread Peng Ma
Add the qDMA device tree nodes for LS1043A devices.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v10:
- no

 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 7881e3d..d560141 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -734,6 +734,28 @@
< 0 0 3  0 156 0x4>,
< 0 0 4  0 157 0x4>;
};
+
+   qdma: dma-controller@838 {
+   compatible = "fsl,ls1021a-qdma", "fsl,ls1043a-qdma";
+   reg = <0x0 0x838 0x0 0x1000>, /* Controller regs */
+ <0x0 0x839 0x0 0x1>, /* Status regs */
+ <0x0 0x83a 0x0 0x4>; /* Block regs */
+   interrupts = <0 153 0x4>,
+<0 39 0x4>,
+<0 40 0x4>,
+<0 41 0x4>,
+<0 42 0x4>;
+   interrupt-names = "qdma-error", "qdma-queue0",
+   "qdma-queue1", "qdma-queue2", "qdma-queue3";
+   dma-channels = <8>;
+   block-number = <1>;
+   block-offset = <0x1>;
+   fsl,dma-queues = <2>;
+   status-sizes = <64>;
+   queue-sizes = <64 64>;
+   big-endian;
+   };
+
};
 
firmware {
-- 
1.7.1



[PATCH 2/7] dmaengine: fsldma: Adding macro FSL_DMA_IN/OUT implement for ARM platform

2018-10-26 Thread Peng Ma
This patch adds an implementation of the FSL_DMA_IN/OUT macros for the ARM platform.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v10:
- fixed compile warning on powerpc

 drivers/dma/fsldma.h |   61 ++---
 1 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 982845b..88db939 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -196,39 +196,62 @@ struct fsldma_chan {
 #define to_fsl_desc(lh) container_of(lh, struct fsl_desc_sw, node)
 #define tx_to_fsl_desc(tx) container_of(tx, struct fsl_desc_sw, async_tx)
 
+#ifdef CONFIG_PPC
+#define fsl_ioread32(p)            in_le32(p)
+#define fsl_ioread32be(p)          in_be32(p)
+#define fsl_iowrite32(v, p)        out_le32(p, v)
+#define fsl_iowrite32be(v, p)      out_be32(p, v)
+
 #ifndef __powerpc64__
-static u64 in_be64(const u64 __iomem *addr)
+static u64 fsl_ioread64(const u64 __iomem *addr)
 {
-   return ((u64)in_be32((u32 __iomem *)addr) << 32) |
-   (in_be32((u32 __iomem *)addr + 1));
+   u32 fsl_addr = lower_32_bits(addr);
+   u64 fsl_addr_hi = (u64)in_le32((u32 *)(fsl_addr + 1)) << 32;
+
+   return fsl_addr_hi | in_le32((u32 *)fsl_addr);
 }
 
-static void out_be64(u64 __iomem *addr, u64 val)
+static void fsl_iowrite64(u64 val, u64 __iomem *addr)
 {
-   out_be32((u32 __iomem *)addr, val >> 32);
-   out_be32((u32 __iomem *)addr + 1, (u32)val);
+   out_le32((u32 __iomem *)addr + 1, val >> 32);
+   out_le32((u32 __iomem *)addr, (u32)val);
 }
 
-/* There is no asm instructions for 64 bits reverse loads and stores */
-static u64 in_le64(const u64 __iomem *addr)
+static u64 fsl_ioread64be(const u64 __iomem *addr)
 {
-   return ((u64)in_le32((u32 __iomem *)addr + 1) << 32) |
-   (in_le32((u32 __iomem *)addr));
+   u32 fsl_addr = lower_32_bits(addr);
+   u64 fsl_addr_hi = (u64)in_be32((u32 *)fsl_addr) << 32;
+
+   return fsl_addr_hi | in_be32((u32 *)(fsl_addr + 1));
 }
 
-static void out_le64(u64 __iomem *addr, u64 val)
+static void fsl_iowrite64be(u64 val, u64 __iomem *addr)
 {
-   out_le32((u32 __iomem *)addr + 1, val >> 32);
-   out_le32((u32 __iomem *)addr, (u32)val);
+   out_be32((u32 __iomem *)addr, val >> 32);
+   out_be32((u32 __iomem *)addr + 1, (u32)val);
 }
 #endif
+#endif
 
-#define FSL_DMA_IN(fsl_chan, addr, width)  \
-   (((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
-   in_be##width(addr) : in_le##width(addr))
-#define FSL_DMA_OUT(fsl_chan, addr, val, width)\
-   (((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
-   out_be##width(addr, val) : out_le##width(addr, val))
+#if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
+#define fsl_ioread32(p)            ioread32(p)
+#define fsl_ioread32be(p)          ioread32be(p)
+#define fsl_iowrite32(v, p)        iowrite32(v, p)
+#define fsl_iowrite32be(v, p)      iowrite32be(v, p)
+#define fsl_ioread64(p)            ioread64(p)
+#define fsl_ioread64be(p)          ioread64be(p)
+#define fsl_iowrite64(v, p)        iowrite64(v, p)
+#define fsl_iowrite64be(v, p)      iowrite64be(v, p)
+#endif
+
+#define FSL_DMA_IN(fsl_dma, addr, width)   \
+   (((fsl_dma)->feature & FSL_DMA_BIG_ENDIAN) ?\
+   fsl_ioread##width##be(addr) : fsl_ioread##width(addr))
+
+#define FSL_DMA_OUT(fsl_dma, addr, val, width) \
+   (((fsl_dma)->feature & FSL_DMA_BIG_ENDIAN) ?\
+   fsl_iowrite##width##be(val, addr) : fsl_iowrite \
+   ##width(val, addr))
 
 #define DMA_TO_CPU(fsl_chan, d, width) \
(((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
-- 
1.7.1
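
The FSL_DMA_IN macro in the patch above picks a big- or little-endian
accessor at run time and builds the accessor name by token pasting. The
userspace sketch below shows the same technique on an ordinary byte buffer
instead of MMIO. Every demo_* and DEMO_* name is hypothetical; the accessors
are byte-wise so the result does not depend on the host's endianness.

```c
#include <stdint.h>

#define DEMO_BIG_ENDIAN 0x1u	/* stands in for FSL_DMA_BIG_ENDIAN */

struct demo_chan {
	uint32_t feature;
};

/* 32-bit little-endian read: least significant byte first. */
static uint32_t demo_read32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* 32-bit big-endian read: most significant byte first. */
static uint32_t demo_read32be(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* 64-bit reads built from two 32-bit halves, four bytes apart. */
static uint64_t demo_read64(const uint8_t *p)
{
	return (uint64_t)demo_read32(p) | (uint64_t)demo_read32(p + 4) << 32;
}

static uint64_t demo_read64be(const uint8_t *p)
{
	return (uint64_t)demo_read32be(p) << 32 | demo_read32be(p + 4);
}

/* width is pasted into the function name: demo_read32, demo_read64be, ... */
#define DEMO_IN(chan, addr, width)				\
	(((chan)->feature & DEMO_BIG_ENDIAN) ?			\
		demo_read##width##be(addr) : demo_read##width(addr))
```

Note that the 64-bit helpers reach the second word by advancing four bytes.
In C, adding 1 to a u32 pointer does that, whereas adding 1 to an integer
that holds the address moves by a single byte.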



[PATCH 1/7] dmaengine: fsldma: Replace DMA_IN/OUT by FSL_DMA_IN/OUT

2018-10-26 Thread Peng Ma
From: Wen He 

This patch renames DMA_IN/OUT to FSL_DMA_IN/OUT so that NXP DMA drivers
share a standard set of accessor macros.

Signed-off-by: Wen He 
Signed-off-by: Peng Ma 
---
change in v10:
- no

 drivers/dma/fsldma.c |   16 
 drivers/dma/fsldma.h |4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 1117b51..39871e0 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -53,42 +53,42 @@
 
 static void set_sr(struct fsldma_chan *chan, u32 val)
 {
-   DMA_OUT(chan, &chan->regs->sr, val, 32);
+   FSL_DMA_OUT(chan, &chan->regs->sr, val, 32);
 }
 
 static u32 get_sr(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->sr, 32);
+   return FSL_DMA_IN(chan, &chan->regs->sr, 32);
 }
 
 static void set_mr(struct fsldma_chan *chan, u32 val)
 {
-   DMA_OUT(chan, &chan->regs->mr, val, 32);
+   FSL_DMA_OUT(chan, &chan->regs->mr, val, 32);
 }
 
 static u32 get_mr(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->mr, 32);
+   return FSL_DMA_IN(chan, &chan->regs->mr, 32);
 }
 
 static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr)
 {
-   DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
+   FSL_DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
 }
 
 static dma_addr_t get_cdar(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
+   return FSL_DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
 }
 
 static void set_bcr(struct fsldma_chan *chan, u32 val)
 {
-   DMA_OUT(chan, &chan->regs->bcr, val, 32);
+   FSL_DMA_OUT(chan, &chan->regs->bcr, val, 32);
 }
 
 static u32 get_bcr(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->bcr, 32);
+   return FSL_DMA_IN(chan, &chan->regs->bcr, 32);
 }
 
 /*
diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 4787d48..982845b 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -223,10 +223,10 @@ static void out_le64(u64 __iomem *addr, u64 val)
 }
 #endif
 
-#define DMA_IN(fsl_chan, addr, width)  \
+#define FSL_DMA_IN(fsl_chan, addr, width)  \
(((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
in_be##width(addr) : in_le##width(addr))
-#define DMA_OUT(fsl_chan, addr, val, width)\
+#define FSL_DMA_OUT(fsl_chan, addr, val, width)\
(((fsl_chan)->feature & FSL_DMA_BIG_ENDIAN) ?   \
out_be##width(addr, val) : out_le##width(addr, val))
 
-- 
1.7.1



RE: [PATCH 5/6] pci: layerscape: Add the EP mode support.

2018-10-26 Thread Xiaowei Bao


-----Original Message-----
From: Kishon Vijay Abraham I
Sent: 26 October 2018 13:29
Subject: Re: [PATCH 5/6] pci: layerscape: Add the EP mode support.

Hi,

On Thursday 25 October 2018 04:39 PM, Xiaowei Bao wrote:
> Add the PCIe EP mode support for layerscape platform.
> 
> Signed-off-by: Xiaowei Bao 
> ---
>  drivers/pci/controller/dwc/Makefile|2 +-
>  drivers/pci/controller/dwc/pci-layerscape-ep.c |  161 
> 
>  2 files changed, 162 insertions(+), 1 deletions(-)  create mode 
> 100644 drivers/pci/controller/dwc/pci-layerscape-ep.c
> 
> diff --git a/drivers/pci/controller/dwc/Makefile 
> b/drivers/pci/controller/dwc/Makefile
> index 5d2ce72..b26d617 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -8,7 +8,7 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
>  obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
>  obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
>  obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone-dw.o pci-keystone.o
> -obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
> +obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
>  obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
>  obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
>  obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o diff --git 
> a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> new file mode 100644
> index 000..3b33bbc
> --- /dev/null
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -0,0 +1,161 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe controller EP driver for Freescale Layerscape SoCs
> + *
> + * Copyright (C) 2018 NXP Semiconductor.
> + *
> + * Author: Xiaowei Bao   */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "pcie-designware.h"
> +
> +#define PCIE_DBI2_OFFSET 0x1000  /* DBI2 base address*/

The base address should come from dt.
> +
> +struct ls_pcie_ep {
> + struct dw_pcie  *pci;
> +};
> +
> +#define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
> +
> +static bool ls_pcie_is_bridge(struct ls_pcie_ep *pcie) {
> + struct dw_pcie *pci = pcie->pci;
> + u32 header_type;
> +
> + header_type = ioread8(pci->dbi_base + PCI_HEADER_TYPE);
> + header_type &= 0x7f;
> +
> + return header_type == PCI_HEADER_TYPE_BRIDGE; }
> +
> +static int ls_pcie_establish_link(struct dw_pcie *pci) {
> + return 0;
> +}

There should be some way for the EP to tell the RC that it is not configured
yet. Are there no bits to control LTSSM state initialization or to enable
Configuration Retry Status?
[Xiaowei Bao] There are no bits that control the LTSSM state or tell the RC
that the EP is not yet configured. Link start-up completes automatically.
> +
> +static const struct dw_pcie_ops ls_pcie_ep_ops = {
> + .start_link = ls_pcie_establish_link, };
> +
> +static const struct of_device_id ls_pcie_ep_of_match[] = {
> + { .compatible = "fsl,ls-pcie-ep",},
> + { },
> +};
> +
> +static void ls_pcie_ep_init(struct dw_pcie_ep *ep) {
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct pci_epc *epc = ep->epc;
> + enum pci_barno bar;
> +
> + for (bar = BAR_0; bar <= BAR_5; bar++)
> + dw_pcie_ep_reset_bar(pci, bar);
> +
> + epc->features |= EPC_FEATURE_NO_LINKUP_NOTIFIER; }
> +
> +static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> +   enum pci_epc_irq_type type, u16 
> interrupt_num) {
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +
> + switch (type) {
> + case PCI_EPC_IRQ_LEGACY:
> + return dw_pcie_ep_raise_legacy_irq(ep, func_no);
> + case PCI_EPC_IRQ_MSI:
> + return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
> + case PCI_EPC_IRQ_MSIX:
> + return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
> + default:
> + dev_err(pci->dev, "UNKNOWN IRQ type\n");
> + }
> +
> + return 0;
> +}
> +
> +static struct dw_pcie_ep_ops pcie_ep_ops = {
> + .ep_init = ls_pcie_ep_init,
> + .raise_irq = ls_pcie_ep_raise_irq,
> +};
> +
> +static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
> + struct platform_device *pdev)
> +{
> + struct dw_pcie *pci = pcie->pci;
> + struct device *dev = pci->dev;
> + struct dw_pcie_ep *ep;
> + struct resource *res;
> + int ret;
> +
> 

Re: [PATCH 1/4] treewide: remove unused address argument from pte_alloc functions (v2)

2018-10-26 Thread Peter Zijlstra
On Thu, Oct 25, 2018 at 01:47:03PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 24, 2018 at 10:37:16AM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 12, 2018 at 06:31:57PM -0700, Joel Fernandes (Google) wrote:
> > > This series speeds up mremap(2) syscall by copying page tables at the
> > > PMD level even for non-THP systems. There is concern that the extra
> > > 'address' argument that mremap passes to pte_alloc may do something
> > > subtle architecture related in the future that may make the scheme not
> > > work.  Also we find that there is no point in passing the 'address' to
> > > pte_alloc since it's unused. So this patch therefore removes this
> > > argument tree-wide resulting in a nice negative diff as well. Also
> > > ensuring along the way that the enabled architectures do not do anything
> > > funky with 'address' argument that goes unnoticed by the optimization.
> > 
> > Did you happen to look at the history of where that address argument
> > came from? -- just being curious here. ISTR something vague about
> > architectures having different paging structure for different memory
> > ranges.
> 
> I see some architectures (e.g. sparc and, I believe, power) used the address
> for coloring. It's not needed anymore. Page allocator and SL?B are good
> enough now.
> 
> See 3c936465249f ("[SPARC64]: Kill pgtable quicklists and use SLAB.")

Ah, shiny. Thanks.


Re: [PATCH 1/4] treewide: remove unused address argument from pte_alloc functions (v2)

2018-10-26 Thread Peter Zijlstra
On Wed, Oct 24, 2018 at 07:21:19PM -0700, Joel Fernandes wrote:
> On Wed, Oct 24, 2018 at 10:37:16AM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 12, 2018 at 06:31:57PM -0700, Joel Fernandes (Google) wrote:
> > > This series speeds up mremap(2) syscall by copying page tables at the
> > > PMD level even for non-THP systems. There is concern that the extra
> > > 'address' argument that mremap passes to pte_alloc may do something
> > > subtle architecture related in the future that may make the scheme not
> > > work.  Also we find that there is no point in passing the 'address' to
> > > > pte_alloc since it's unused. So this patch therefore removes this
> > > argument tree-wide resulting in a nice negative diff as well. Also
> > > ensuring along the way that the enabled architectures do not do anything
> > > funky with 'address' argument that goes unnoticed by the optimization.
> > 
> > Did you happen to look at the history of where that address argument
> > came from? -- just being curious here. ISTR something vague about
> > architectures having different paging structure for different memory
> > ranges.
> 
> I didn't happen to do that analysis but from code analysis, no architecture
> is using it. Since it's unused in the kernel, maybe such architectures don't
> exist or were removed, so we don't need to bother? Could you share more about
> your concern with the removal of this argument?

No concerns at all with removing it; I was purely curious as to the
origin of the unused argument. Kirill provided that answer.



Re: [PATCH 1/5] powerpc/64s: Guarded Userspace Access Prevention

2018-10-26 Thread kbuild test robot
Hi Russell,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on next-20181019]
[cannot apply to v4.19]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Russell-Currey/Guarded-Userspace-Access-Prevention-on-Radix/20181026-145017
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All error/warnings (new ones prefixed by >>):

   arch/powerpc/kernel/entry_64.S: Assembler messages:
>> arch/powerpc/kernel/entry_64.S:304: Error: junk at end of line: `9452:.pushsection __ftr_alt_945,"a"'
>> arch/powerpc/kernel/entry_64.S:304: Warning: .popsection without corresponding .pushsection; ignored
>> arch/powerpc/kernel/entry_64.S:304: Error: backward ref to unknown label "9452:"
>> arch/powerpc/kernel/entry_64.S:304: Error: backward ref to unknown label "9452:"
>> arch/powerpc/kernel/entry_64.S:304: Error: non-constant expression in ".if" statement
   arch/powerpc/kernel/entry_64.S:997: Error: junk at end of line: `9452:.pushsection __ftr_alt_945,"a"'
   arch/powerpc/kernel/entry_64.S:997: Warning: .popsection without corresponding .pushsection; ignored
   arch/powerpc/kernel/entry_64.S:997: Error: backward ref to unknown label "9452:"
   arch/powerpc/kernel/entry_64.S:997: Error: backward ref to unknown label "9452:"
   arch/powerpc/kernel/entry_64.S:997: Error: non-constant expression in ".if" statement

vim +304 arch/powerpc/kernel/entry_64.S

   288  
   289  ld  r13,GPR13(r1)   /* only restore r13 if returning to usermode */
   290  ld  r2,GPR2(r1)
   291  ld  r1,GPR1(r1)
   292  mtlr    r4
   293  mtcr    r5
   294  mtspr   SPRN_SRR0,r7
   295  mtspr   SPRN_SRR1,r8
   296  RFI_TO_USER
   297  b   .   /* prevent speculative execution */
   298  
   299  /* exit to kernel */
   300  1:  /* if the AMR was unlocked before, unlock it again */
   301  lbz r2,PACA_USER_ACCESS_ALLOWED(r13)
   302  cmpwi   cr1,0
   303  bne 2f
 > 304  UNLOCK_USER_ACCESS(r2)
   305  2:  ld  r2,GPR2(r1)
   306  ld  r1,GPR1(r1)
   307  mtlr    r4
   308  mtcr    r5
   309  mtspr   SPRN_SRR0,r7
   310  mtspr   SPRN_SRR1,r8
   311  RFI_TO_KERNEL
   312  b   .   /* prevent speculative execution */
   313  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-26 Thread Mike Rapoport
On Thu, Oct 25, 2018 at 04:13:10PM -0500, Rob Herring wrote:
> On Thu, Oct 25, 2018 at 12:30 PM Mike Rapoport  wrote:
> >
> > On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote:
> > > +Ard
> > >
> > > On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport  wrote:
> > > >
> > > > On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> > > > > > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli  wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > > While investigating why ARM64 required a ton of objects to be
> > > > > > > rebuilt when toggling CONFIG_BLK_DEV_INITRD, it became clear that
> > > > > > > this was because we define __early_init_dt_declare_initrd()
> > > > > > > differently and we do that in arch/arm64/include/asm/memory.h,
> > > > > > > which gets included by a fair amount of other header files, and
> > > > > > > translation units as well.
> > > > >
> > > > > I scratch my head sometimes as to why some config options rebuild so
> > > > > much stuff. One down, ? to go. :)
> > > > >
> > > > > > > Changing the value of CONFIG_BLK_DEV_INITRD is a common thing with
> > > > > > > build systems that generate two kernels: one with the initramfs and
> > > > > > > one without. buildroot is one of these build systems, and OpenWrt
> > > > > > > is another one that does this.
> > > > > >
> > > > > > > This patch series proposes adding an empty initrd.h to satisfy the
> > > > > > > need for drivers/of/fdt.c to unconditionally include that file, and
> > > > > > > moves the custom __early_init_dt_declare_initrd() definition away
> > > > > > > from asm/memory.h
> > > > > >
> > > > > > This cuts the number of objects rebuilds from 1920 down to 26, so a
> > > > > > factor 73 approximately.
> > > > > >
> > > > > > Apologies for the long CC list, please let me know how you would go
> > > > > > about merging that and if another approach would be preferable, e.g:
> > > > > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
> > > > > > something like that.
> > > > >
> > > > > There may be a better way as of 4.20 because bootmem is now gone and
> > > > > only memblock is used. This should unify what each arch needs to do
> > > > > with initrd early. We need the physical address early for memblock
> > > > > reserving. Then later on we need the virtual address to access the
> > > > > initrd. Perhaps we should just change initrd_start and initrd_end to
> > > > > physical addresses (or add 2 new variables would be less invasive and
> > > > > allow for different translation than __va()). The sanity checks and
> > > > > memblock reserve could also perhaps be moved to a common location.
> > > > >
> > > > > Alternatively, given arm64 is the only oddball, I'd be fine with an
> > > > > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> > > > > __early_init_dt_declare_initrd as long as we have a path to removing
> > > > > it like the above option.
> > > >
> > > > I think arm64 does not have to redefine __early_init_dt_declare_initrd().
> > > > Something like this might be just all we need (completely untested,
> > > > probably it won't even compile):
> > > >
> > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > index 9d9582c..e9ca238 100644
> > > > --- a/arch/arm64/mm/init.c
> > > > +++ b/arch/arm64/mm/init.c
> > > > @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1;
> > > >  phys_addr_t arm64_dma_phys_limit __ro_after_init;
> > > >
> > > >  #ifdef CONFIG_BLK_DEV_INITRD
> > > > +
> > > > +static phys_addr_t initrd_start_phys, initrd_end_phys;
> > > > +
> > > >  static int __init early_initrd(char *p)
> > > >  {
> > > > unsigned long start, size;
> > > > @@ -71,8 +74,8 @@ static int __init early_initrd(char *p)
> > > > if (*endp == ',') {
> > > > size = memparse(endp + 1, NULL);
> > > >
> > > > -   initrd_start = start;
> > > > -   initrd_end = start + size;
> > > > +   initrd_start_phys = start;
> > > > +   initrd_end_phys = end;
> > > > }
> > > > return 0;
> > > >  }
> > > > @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void)
> > > > memblock_add(__pa_symbol(_text), (u64)(_end - _text));
> > > > }
> > > >
> > > > -   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) {
> > > > +   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) &&
> > > > +   (initrd_start || initrd_start_phys)) {
> > > > +   /*
> > > > > +* FIXME: ensure proper precedence between
> > > > +* early_initrd and DT when both are present
> > >
> > > Command line takes precedence, so just reverse the order.
> > >
> > > > +*/
> > > > +   if (initrd_start) {
> > > > > +   initrd_start_phys = __phys_to_virt(initrd_start);
> > > > +   initrd_end_phys = __phys_to_virt(initrd_end);
> 
> BTW, I think 

RE: [PATCH 3/6] PCI: layerscape: Add the EP mode support

2018-10-26 Thread Xiaowei Bao


-Original Message-
From: arndbergm...@gmail.com  On Behalf Of Arnd Bergmann
Sent: October 26, 2018 15:01
To: Xiaowei Bao 
Cc: Rob Herring ; bhelg...@google.com; mark.rutl...@arm.com; 
shawn...@kernel.org; Leo Li ; kis...@ti.com; 
lorenzo.pieral...@arm.com; gre...@linuxfoundation.org; M.h. Lian 
; Mingkai Hu ; Roy Zang 
; kstew...@linuxfoundation.org; 
cyrille.pitc...@free-electrons.com; pombreda...@nexb.com; 
shawn@rock-chips.com; niklas.cas...@axis.com; linux-...@vger.kernel.org; 
devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 3/6] PCI: layerscape: Add the EP mode support

On 10/26/18, Xiaowei Bao  wrote:
> From: Rob Herring 
>> On Thu, Oct 25, 2018 at 07:08:58PM +0800, Xiaowei Bao wrote:
>>>  "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie"
>>>  "fsl,ls2088a-pcie"
>>>  "fsl,ls1088a-pcie"
>>>  "fsl,ls1046a-pcie"
>>>  "fsl,ls1012a-pcie"
>>> +  EP mode:
>>> +"fsl,ls-pcie-ep"
>>
> > You need SoC specific compatibles for the same reasons as the RC.
>
> [Xiaowei Bao] I want all Layerscape platforms to use one compatible
> when the PCIe controller works in EP mode.

Do you mean only one of the SoCs that support RC mode has EP mode?
I think you still need a SoC specific compatible as Rob explained, in case 
there will be a second one in the future.

If you want to ensure that you don't have to update the device driver for each 
new chip that comes in when the EP mode is compatible, the way this is handled 
is to list multiple values in the compatible property, listing the first SoC 
that introduced the specific version of that IP block as the most generic type, 
e.g.

  compatible = "fsl,ls2088a-pcie-ep", "fsl,ls1012a-pcie-ep", "snps,dw-pcie-ep";

For consistency, it probably is best to match each RC mode value with the 
corresponding EP mode string for each device that can support both (if there is 
more than one).

  Arnd
[Xiaowei Bao] What I mean is that the ls-pcie-ep compatible would cover all 
NXP Layerscape SoCs, e.g. ls1046a-pcie-ep, ls2088a-pcie-ep and so on. None of 
the other Layerscape SoCs has been tested yet, only the ls1046a, but I think 
new chips and the other SoCs are compatible as long as they use the DW core. 
OK, I will discuss this issue internally and reply to you later.


Re: [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used

2018-10-26 Thread Naveen N. Rao

Paul Mackerras wrote:
> On Fri, Oct 26, 2018 at 01:55:44AM +0530, Naveen N. Rao wrote:
>> When the dtl debugfs interface is used, we usually set the
>> dtl_enable_mask to 0x7 (DTL_LOG_ALL). When this happens, we start seeing
>> DTL entries for all preempt reasons, including CEDE. In
>> scan_dispatch_log(), we add up the times from all entries and account
>> those towards stolen time. However, we should only be accounting stolen
>> time when the preemption was due to HDEC at the end of our time slice.
>
> It's always been the case that stolen time when idle has been
> accounted as idle time, not stolen time.  That's why we didn't check
> for this in the past.
>
> Do you have a test that shows different results (as in reported idle
> and stolen times) with this patch compared to without?


Ah ok, that makes sense now and explains why I couldn't observe much of 
a difference in practice. However, I also went by the fact that there 
are 7 other preemption reasons, which could impact our calculation.  
Looking at the list again, it looks like H_CONFER/H_PROD and some faults 
can also have an impact here, though they may be rare?
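
To make the difference being discussed concrete, here is a small user-space
sketch (not the scan_dispatch_log() code) that sums dispatch log wait times
either for every entry or only for HDEC preemptions. The reason codes, the
struct layout and the function names are all hypothetical; the real DTL
encoding is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical preempt reasons; not the real DTL encoding. */
enum preempt_reason { REASON_CEDE, REASON_HDEC, REASON_FAULT };

/* Hypothetical, simplified dispatch trace log entry. */
struct dtl_entry_model {
	enum preempt_reason reason;
	uint64_t wait_time;	/* time spent preempted, arbitrary units */
};

/* Behaviour described in the patch text: every entry counts as stolen. */
static uint64_t stolen_all(const struct dtl_entry_model *log, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += log[i].wait_time;
	return total;
}

/* Proposed behaviour: only preemption at the end of the time slice counts. */
static uint64_t stolen_hdec_only(const struct dtl_entry_model *log, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (log[i].reason == REASON_HDEC)
			total += log[i].wait_time;
	return total;
}
```

With a log dominated by CEDE entries the two sums diverge a lot, but as Paul
points out the CEDE time ends up reported as idle anyway, so only the
remaining reasons could change the visible numbers.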


Thanks,
Naveen




Re: [PATCH 3/6] PCI: layerscape: Add the EP mode support

2018-10-26 Thread Arnd Bergmann
On 10/26/18, Xiaowei Bao  wrote:
> From: Rob Herring 
>> On Thu, Oct 25, 2018 at 07:08:58PM +0800, Xiaowei Bao wrote:
>>>  "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie"
>>>  "fsl,ls2088a-pcie"
>>>  "fsl,ls1088a-pcie"
>>>  "fsl,ls1046a-pcie"
>>>  "fsl,ls1012a-pcie"
>>> +  EP mode:
>>> +"fsl,ls-pcie-ep"
>>
> > You need SoC specific compatibles for the same reasons as the RC.
>
> [Xiaowei Bao] I want all Layerscape platforms to use one compatible
> when the PCIe controller works in EP mode.

Do you mean only one of the SoCs that support RC mode has EP mode?
I think you still need a SoC specific compatible as Rob explained, in case
there will be a second one in the future.

If you want to ensure that you don't have to update the device driver
for each new chip that comes in when the EP mode is compatible,
the way this is handled is to list multiple values in the compatible
property, listing the first SoC that introduced the specific version of
that IP block as the most generic type, e.g.

  compatible = "fsl,ls2088a-pcie-ep", "fsl,ls1012a-pcie-ep", "snps,dw-pcie-ep";

For consistency, it probably is best to match each RC mode value with
the corresponding EP mode string for each device that can support both
(if there is more than one).
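
As a rough illustration of why the ordered list helps, the sketch below walks
a node's compatible strings from most specific to most generic and returns
the first one a driver knows about. It is a simplified model, not the
kernel's OF matching code; first_match() and the old_driver table are
made up for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the first entry of the node's compatible list (ordered from most
 * specific to most generic) that also appears in the driver's match table,
 * or NULL if the driver knows none of them.
 */
static const char *first_match(const char *const *node_compat, size_t n_node,
			       const char *const *drv_table, size_t n_drv)
{
	for (size_t i = 0; i < n_node; i++)
		for (size_t j = 0; j < n_drv; j++)
			if (strcmp(node_compat[i], drv_table[j]) == 0)
				return node_compat[i];
	return NULL;
}

/* Compatible list from the example above: newest SoC first, generic IP last. */
static const char *const ls2088a_ep[] = {
	"fsl,ls2088a-pcie-ep", "fsl,ls1012a-pcie-ep", "snps,dw-pcie-ep",
};

/* Hypothetical driver table written before the ls2088a existed. */
static const char *const old_driver[] = { "fsl,ls1012a-pcie-ep" };
```

An old driver still binds through the "fsl,ls1012a-pcie-ep" fallback, while a
later driver that needs an ls2088a quirk can add the more specific string and
will then match on it first.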

  Arnd


[PATCH 5/5] powerpc/64s: Document that PPC supports nosmap

2018-10-26 Thread Russell Currey
Signed-off-by: Russell Currey 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a5ad67d5cb16..8f78e75965f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2764,7 +2764,7 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings
 
-   nosmap  [X86]
+   nosmap  [X86,PPC]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
 
-- 
2.19.1



[PATCH 4/5] powerpc/64s: Disable GUAP with nosmap option

2018-10-26 Thread Russell Currey
GUAP is similar to SMAP on x86 platforms, so implement support for
the same kernel parameter.

Signed-off-by: Russell Currey 
---
 arch/powerpc/mm/init_64.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 7a9886f98b0c..b26641df36f2 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -312,6 +312,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 
 #ifdef CONFIG_PPC_BOOK3S_64
 static bool disable_radix = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
+static bool disable_guap = !IS_ENABLED(CONFIG_PPC_RADIX_GUAP);
 
 static int __init parse_disable_radix(char *p)
 {
@@ -328,6 +329,18 @@ static int __init parse_disable_radix(char *p)
 }
 early_param("disable_radix", parse_disable_radix);
 
+static int __init parse_nosmap(char *p)
+{
+   /*
+* nosmap is an existing option on x86 where it doesn't return -EINVAL
+* if the parameter is set to something, so even though it's different
+* to disable_radix, don't return an error for compatibility.
+*/
+   disable_guap = true;
+   return 0;
+}
+early_param("nosmap", parse_nosmap);
+
 /*
  * If we're running under a hypervisor, we need to check the contents of
  * /chosen/ibm,architecture-vec-5 to see if the hypervisor is willing to do
@@ -381,6 +394,8 @@ void __init mmu_early_init_devtree(void)
/* Disable radix mode based on kernel command line. */
if (disable_radix)
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+   if (disable_radix || disable_guap)
+   cur_cpu_spec->mmu_features &= ~MMU_FTR_RADIX_GUAP;
 
/*
 * Check /chosen/ibm,architecture-vec-5 if running as a guest.
-- 
2.19.1



[PATCH 3/5] powerpc/lib: checksum GUAP support

2018-10-26 Thread Russell Currey
Wrap the checksumming code in GUAP locks and unlocks.

Signed-off-by: Russell Currey 
---
 arch/powerpc/lib/checksum_wrappers.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index a0cb63fb76a1..c67db0a6e18b 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -28,6 +28,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
 {
unsigned int csum;
 
+   unlock_user_access();
might_sleep();
 
*err_ptr = 0;
@@ -60,6 +61,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
}
 
 out:
+   lock_user_access();
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_from_user);
@@ -69,6 +71,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
 {
unsigned int csum;
 
+   unlock_user_access();
might_sleep();
 
*err_ptr = 0;
@@ -97,6 +100,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
}
 
 out:
+   lock_user_access();
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_to_user);
-- 
2.19.1



[PATCH 0/5] Guarded Userspace Access Prevention on Radix

2018-10-26 Thread Russell Currey
Guarded Userspace Access Prevention is a security mechanism that prevents
the kernel from being able to read and write userspace addresses outside of
the allowed paths, most commonly copy_{to/from}_user().

At present, the only CPU that supports this is POWER9, and only while using
the Radix MMU.  Privileged reads and writes cannot access user data when
key 0 of the AMR is set.  This is described in the "Radix Tree Translation
Storage Protection" section of the POWER ISA as of version 3.0.

GUAP code sets key 0 of the AMR (thus disabling accesses of user data)
early during boot, and only ever "unlocks" access prior to certain
operations, like copy_{to/from}_user(), futex ops, etc.  Setting this does
not prevent unprivileged access, so userspace can operate fine while access
is locked.
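
The locking discipline described above can be modelled in plain user-space C.
This is only a sketch: a boolean stands in for key 0 of the AMR, and
raw_user_read() and model_copy_from_user() are hypothetical stand-ins, not
functions from the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for key 0 of the AMR: true means user data is inaccessible. */
static bool amr_locked = true;	/* locked early during boot */

static void unlock_user_access(void) { amr_locked = false; }
static void lock_user_access(void)   { amr_locked = true; }

/* Any privileged read of user memory; reports a "fault" (-1) while locked. */
static int raw_user_read(void *dst, const void *user_src, size_t len)
{
	if (amr_locked)
		return -1;
	memcpy(dst, user_src, len);
	return 0;
}

/* Trusted path: access is opened only for the duration of the copy. */
static int model_copy_from_user(void *dst, const void *user_src, size_t len)
{
	int ret;

	unlock_user_access();
	ret = raw_user_read(dst, user_src, len);
	lock_user_access();
	return ret;
}
```

A stray raw_user_read() outside the trusted path fails, while the same read
through model_copy_from_user() succeeds and leaves access locked again
afterwards.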

There is a performance impact, although I don't consider it heavy.  Running
a worst-case benchmark of a 1GB copy 1 byte at a time (and thus constant
read(1) write(1) syscalls), I found enabling GUAP to be 3.5% slower than
when disabled.  In most cases, the difference is negligible.  The main
performance impact is the mtspr instruction, which is quite slow.

There are a few caveats with this series that could be improved upon in
future.  Right now there is no saving and restoring of the AMR value -
there is no userspace exploitation of the AMR on Radix in POWER9, but if
this were to change in future, saving and restoring the value would be
necessary.

No attempt is made to optimise cases of repeated calls - for example, if some
code was repeatedly calling copy_to_user() for small sizes very frequently,
it would be slower than the equivalent of wrapping that code in an unlock
and lock and only having to modify the AMR once.
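
To put a number on that, the sketch below counts modelled AMR writes (each
standing in for one mtspr) for the per-call scheme versus a hypothetical
batched scheme. The counter and both loop helpers are illustrative only and
are not part of the series.

```c
#include <stddef.h>

/* Counts modelled writes to the AMR, i.e. mtspr SPRN_AMR operations. */
static unsigned long amr_writes;

static void unlock_user_access(void) { amr_writes++; }
static void lock_user_access(void)   { amr_writes++; }

/* Per-call scheme, as in this series: each small copy toggles the AMR twice. */
static void copy_each_guarded(size_t ncopies)
{
	for (size_t i = 0; i < ncopies; i++) {
		unlock_user_access();
		/* ... one small copy to user memory would happen here ... */
		lock_user_access();
	}
}

/* Hypothetical batched scheme: one unlock/lock pair around the whole loop. */
static void copy_batched(size_t ncopies)
{
	unlock_user_access();
	for (size_t i = 0; i < ncopies; i++) {
		/* ... one small copy to user memory would happen here ... */
	}
	lock_user_access();
}
```

For N small copies the per-call scheme performs 2N AMR writes, while the
batched variant performs two regardless of N, at the cost of a longer window
with access unlocked.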

There are some interesting cases that I've attempted to handle, such as if
the AMR is unlocked (i.e. because a copy_{to/from}_user() is in progress)...

- and an exception is taken, the kernel would then be running with the
AMR unlocked and freely able to access userspace again.  I am working
around this by storing a flag in the PACA to indicate if the AMR is
unlocked (to save a costly SPR read), and if so, locking the AMR in
the exception entry path and unlocking it on the way out.

- and gets context switched out, goes into a path that locks the AMR,
then context switches back, access will be disabled and will fault.
As a result, I context switch the AMR between tasks as if it was used
by userspace like hash (which already implements this).
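
The exception case can be sketched as a small state model. It assumes, as the
text describes, that a flag in the PACA records whether access is currently
unlocked; struct paca_model and the entry/exit helpers are simplified
stand-ins for the real assembly paths, not code from the series.

```c
#include <stdbool.h>

/* Simplified stand-in for the per-CPU PACA. */
struct paca_model {
	bool user_access_allowed;	/* cached so no SPR read is needed */
};

/* Stand-in for key 0 of the AMR: true means user data is inaccessible. */
static bool amr_locked = true;

static void unlock_user_access(struct paca_model *p)
{
	amr_locked = false;
	p->user_access_allowed = true;
}

static void lock_user_access(struct paca_model *p)
{
	amr_locked = true;
	p->user_access_allowed = false;
}

/* Exception entry: if the interrupted code had access, lock for the handler. */
static void exception_entry(const struct paca_model *p)
{
	if (p->user_access_allowed)
		amr_locked = true;
}

/* Exception exit: restore access only if the interrupted code had it. */
static void exception_exit(const struct paca_model *p)
{
	if (p->user_access_allowed)
		amr_locked = false;
}
```

The flag itself is left untouched across the exception, so the exit path can
tell whether it needs to unlock again without reading the AMR.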

Another consideration is use of the isync instruction.  Without an isync
following the mtspr instruction, there is no guarantee that the change
takes effect.  The issue is that isync is very slow, and so I tried to
avoid it wherever possible.  In this series, the only place an isync
gets used is after *unlocking* the AMR, because if an access takes place
and access is still prevented, the kernel will fault.

On the flipside, a slight delay in locking caused by skipping an isync
potentially allows a small window of vulnerability.  It is my opinion
that this window is practically impossible to exploit, but if someone
thinks otherwise, please do share.

This series is my first attempt at POWER assembly so all feedback is very
welcome.

The official theme song of this series can be found here:
https://www.youtube.com/watch?v=QjTrnKAcYjE

Russell Currey (5):
  powerpc/64s: Guarded Userspace Access Prevention
  powerpc/futex: GUAP support for futex ops
  powerpc/lib: checksum GUAP support
  powerpc/64s: Disable GUAP with nosmap option
  powerpc/64s: Document that PPC supports nosmap

 .../admin-guide/kernel-parameters.txt |  2 +-
 arch/powerpc/include/asm/exception-64e.h  |  3 +
 arch/powerpc/include/asm/exception-64s.h  | 19 ++-
 arch/powerpc/include/asm/futex.h  |  6 ++
 arch/powerpc/include/asm/mmu.h|  7 +++
 arch/powerpc/include/asm/paca.h   |  3 +
 arch/powerpc/include/asm/reg.h|  1 +
 arch/powerpc/include/asm/uaccess.h| 57 ---
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c |  4 ++
 arch/powerpc/kernel/entry_64.S| 17 +-
 arch/powerpc/lib/checksum_wrappers.c  |  6 +-
 arch/powerpc/mm/fault.c   |  9 +++
 arch/powerpc/mm/init_64.c | 15 +
 arch/powerpc/mm/pgtable-radix.c   |  2 +
 arch/powerpc/mm/pkeys.c   |  7 ++-
 arch/powerpc/platforms/Kconfig.cputype| 15 +
 17 files changed, 158 insertions(+), 16 deletions(-)

-- 
2.19.1



[PATCH 2/5] powerpc/futex: GUAP support for futex ops

2018-10-26 Thread Russell Currey
Wrap the futex operations in GUAP locks and unlocks.

Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/futex.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index 94542776a62d..3aed640ee9ef 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -35,6 +35,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
 {
int oldval = 0, ret;
 
+   unlock_user_access();
pagefault_disable();
 
switch (op) {
@@ -62,6 +63,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
if (!ret)
*oval = oldval;
 
+   lock_user_access();
return ret;
 }
 
@@ -75,6 +77,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
 
+   unlock_user_access();
 __asm__ __volatile__ (
 PPC_ATOMIC_ENTRY_BARRIER
 "1: lwarx   %1,0,%3 # futex_atomic_cmpxchg_inatomic\n\
@@ -95,6 +98,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 : "cc", "memory");
 
*uval = prev;
+   lock_user_access();
 return ret;
 }
 
-- 
2.19.1



[PATCH 1/5] powerpc/64s: Guarded Userspace Access Prevention

2018-10-26 Thread Russell Currey
Guarded Userspace Access Prevention (GUAP) utilises a feature of
the Radix MMU which disallows read and write access to userspace
addresses.  By utilising this, the kernel is prevented from accessing
user data from outside of trusted paths that perform proper safety checks,
such as copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when:

- exiting the kernel and entering userspace
- performing an operation like copy_{to/from}_user()
- context switching to a process that has access enabled

and similarly, access is disabled again when exiting userspace and entering
the kernel.

This feature has a slight performance impact which I roughly measured to be
3% slower in the worst case (performing 1GB of 1 byte read()/write()
syscalls), and is gated behind the CONFIG_PPC_RADIX_GUAP option for
performance-critical builds.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and
performing the following:

echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT

if enabled, this should send SIGSEGV to the thread.

Signed-off-by: Russell Currey 
---
Since the previous version of this patchset (named KHRAP) there have been
several changes, some of which include:

- macro naming, suggested by Nick
- builds should be fixed outside of 64s
- no longer unlock heading out to userspace
- removal of unnecessary isyncs
- more config option testing
- removal of save/restore
- use pr_crit() and reword message on fault

 arch/powerpc/include/asm/exception-64e.h |  3 ++
 arch/powerpc/include/asm/exception-64s.h | 19 +++-
 arch/powerpc/include/asm/mmu.h   |  7 +++
 arch/powerpc/include/asm/paca.h  |  3 ++
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/include/asm/uaccess.h   | 57 
 arch/powerpc/kernel/asm-offsets.c|  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c|  4 ++
 arch/powerpc/kernel/entry_64.S   | 17 ++-
 arch/powerpc/mm/fault.c  | 12 +
 arch/powerpc/mm/pgtable-radix.c  |  2 +
 arch/powerpc/mm/pkeys.c  |  7 ++-
 arch/powerpc/platforms/Kconfig.cputype   | 15 +++
 13 files changed, 135 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index 555e22d5e07f..bf25015834ee 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -215,5 +215,8 @@ exc_##label##_book3e:
 #define RFI_TO_USER\
rfi
 
+#define UNLOCK_USER_ACCESS(reg)
+#define LOCK_USER_ACCESS(reg)
+
 #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
 
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..0cac5bd380ca 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -264,6 +264,19 @@ BEGIN_FTR_SECTION_NESTED(943)  \
std ra,offset(r13); \
 END_FTR_SECTION_NESTED(ftr,ftr,943)
 
+#define LOCK_USER_ACCESS(reg)  \
+BEGIN_MMU_FTR_SECTION_NESTED(944)  \
+   LOAD_REG_IMMEDIATE(reg,AMR_LOCKED); \
+   mtspr   SPRN_AMR,reg;   \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,944)
+
+#define UNLOCK_USER_ACCESS(reg)  \
+BEGIN_MMU_FTR_SECTION_NESTED(945)  \
+   li  reg,0;  \
+   mtspr   SPRN_AMR,reg;   \
+   isync   \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,945)
+
 #define EXCEPTION_PROLOG_0(area)   \
GET_PACA(r13);  \
std r9,area+EX_R9(r13); /* save r9 */   \
@@ -500,7 +513,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
beq 4f; /* if from kernel mode  */ \
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10);  \
SAVE_PPR(area, r9);\
-4: EXCEPTION_PROLOG_COMMON_2(area)\
+4: lbz r9,PACA_USER_ACCESS_ALLOWED(r13);  \
+   cmpwi   cr1,r9,0;  \
+   beq 5f;\
+   LOCK_USER_ACCESS(r9);  \
+5: