Re: [PATCH v3 11/13] dt-bindings: tegra: Update csi data-lanes to maximum 8 lanes

2020-12-09 Thread Rob Herring
On Thu, 03 Dec 2020 11:00:00 -0800, Sowjanya Komatineni wrote:
> Tegra VI/CSI hardware don't have native 8 lane CSI RX port.
> 
> But x8 capture can be supported by using consecutive x4 ports
> simultaneously with HDMI-to-CSI bridges where source image is split
> on to two x4 ports.
> 
> This patch updates dt-bindings for csi endpoint data-lane property
> with maximum of 8 lanes.
> 
> Signed-off-by: Sowjanya Komatineni 
> ---
>  .../devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt   | 4 
> ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Acked-by: Rob Herring 


Re: [RESEND PATCH 2/2] dt-bindings: tegra: Add missing HDA properties

2020-12-09 Thread Rob Herring
On Thu, 03 Dec 2020 20:06:42 +0530, Sameer Pujar wrote:
> Document the missing properties which are currently required for
> Tegra186/Tegra194 DT files.
> 
> Signed-off-by: Sameer Pujar 
> ---
>  .../devicetree/bindings/sound/nvidia,tegra30-hda.yaml  | 14 
> ++
>  1 file changed, 14 insertions(+)
> 

Reviewed-by: Rob Herring 


Re: [PATCH] spi: Limit the spi device max speed to controller's max speed

2020-12-09 Thread Serge Semin
On Wed, Dec 09, 2020 at 07:54:20PM +, Mark Brown wrote:
> On Wed, Dec 09, 2020 at 10:46:36PM +0300, Serge Semin wrote:
> 
> > On Wed, Dec 09, 2020 at 07:35:14PM +0200, Tudor Ambarus wrote:
> 
> > > Make sure the max_speed_hz of spi_device does not override
> > > the max_speed_hz of controller.
> 
> > I have doubts that's right thing to do. It seems better to let
> > the controller driver to handle the speed clamping itself, while
> > to leave the SPI client device max_speed_hz field describing the
> > device speed capability. Moreover the SPI-transfers passed to the
> > controller will have a SPI-bus speed fixed in accordance with the
> > controller and client device capabilities anyway.
> > See the __spi_validate() method for details:
> > https://elixir.bootlin.com/linux/v5.10-rc7/source/drivers/spi/spi.c#L3570
> 

> Right, in general we aim to do this sort of fixup on the transfers
> and messages rather than the devices, I guess we might be missing
> validation in some of the flash acceleration paths or was this an issue
> seen through inspection?

In case of DW SPI driver we just make sure the SPI-client device
speed set in the max_speed_hz doesn't exceed the controller SPI-bus
clock frequency and clamp it if it does. So the driver is safe in that
matter.

-Sergey


Re: linux-next: Tree for Dec 9 (ethernet/mellanox/mlx5)

2020-12-09 Thread Randy Dunlap
On 12/9/20 2:44 AM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20201208:
> 

on i386:

I see this warning:/note: repeated 106 times on i386 build:

 from ../drivers/net/ethernet/mellanox/mlx5/core/alloc.c:34:
../include/vdso/bits.h:7:26: warning: left shift count >= width of type 
[-Wshift-count-overflow]
 #define BIT(nr)   (UL(1) << (nr))
  ^
../include/linux/mlx5/mlx5_ifc.h:10716:46: note: in expansion of macro ‘BIT’
  MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT(0x20),
  ^~~



-- 
~Randy
Reported-by: Randy Dunlap 


Re: [PATCH linux hwmon-next v4 1/3] hwmon: (sbtsi) Add basic support for SB-TSI sensors

2020-12-09 Thread Supreeth Venkatesh
On 12/2/20 10:55 AM, Kun Yi wrote:
> [CAUTION: External Email]
> 
> SB Temperature Sensor Interface (SB-TSI) is an SMBus compatible
> interface that reports AMD SoC's Ttcl (normalized temperature),
> and resembles a typical 8-pin remote temperature sensor's I2C interface
> to BMC.
> 
> This commit adds basic support using this interface to read CPU
> temperature, and read/write high/low CPU temp thresholds.
> 
> To instantiate this driver on an AMD CPU with SB-TSI
> support, the i2c bus number would be the bus connected from the board
> management controller (BMC) to the CPU. The i2c address is specified in
> Section 6.3.1 of the spec [1]: The SB-TSI address is normally 98h for
> socket 0 and 90h for socket 1, but it could vary based on hardware address
> select pins.
> 
> [1]: 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F56255_OSRR.pdf&data=04%7C01%7Csupreeth.venkatesh%40amd.com%7C412ca045665445483a0a08d896e33084%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C1%7C637425250268541698%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Xir5Jg%2FgHz8isnXNkmZMTRjg%2FOht%2FSDR4gajR5wVgSU%3D&reserved=0
> 
> Test status: tested reading temp1_input, and reading/writing
> temp1_max/min.
> 
> Signed-off-by: Kun Yi 
Series
 Acked-by: Supreeth Venkatesh 

> ---
>  drivers/hwmon/Kconfig  |  10 ++
>  drivers/hwmon/Makefile |   1 +
>  drivers/hwmon/sbtsi_temp.c | 265 +
>  3 files changed, 276 insertions(+)
>  create mode 100644 drivers/hwmon/sbtsi_temp.c
> 
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 3c059fc23cd6..3d6a809700ae 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1522,6 +1522,16 @@ config SENSORS_SL28CPLD
>   This driver can also be built as a module.  If so, the module
>   will be called sl28cpld-hwmon.
> 
> +config SENSORS_SBTSI
> +   tristate "Emulated SB-TSI temperature sensor"
> +   depends on I2C
> +   help
> + If you say yes here you get support for emulated temperature
> + sensors on AMD SoCs with SB-TSI interface connected to a BMC device.
> +
> + This driver can also be built as a module. If so, the module will
> + be called sbtsi_temp.
> +
>  config SENSORS_SHT15
> tristate "Sensiron humidity and temperature sensors. SHT15 and 
> compat."
> depends on GPIOLIB || COMPILE_TEST
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index 01ca5d3fbad4..ee8c037919da 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -160,6 +160,7 @@ obj-$(CONFIG_SENSORS_POWR1220)  += powr1220.o
>  obj-$(CONFIG_SENSORS_PWM_FAN)  += pwm-fan.o
>  obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON)+= raspberrypi-hwmon.o
>  obj-$(CONFIG_SENSORS_S3C)  += s3c-hwmon.o
> +obj-$(CONFIG_SENSORS_SBTSI)+= sbtsi_temp.o
>  obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
>  obj-$(CONFIG_SENSORS_SCH5627)  += sch5627.o
>  obj-$(CONFIG_SENSORS_SCH5636)  += sch5636.o
> diff --git a/drivers/hwmon/sbtsi_temp.c b/drivers/hwmon/sbtsi_temp.c
> new file mode 100644
> index ..6b0a7b9df1f3
> --- /dev/null
> +++ b/drivers/hwmon/sbtsi_temp.c
> @@ -0,0 +1,265 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * sbtsi_temp.c - hwmon driver for a SBI Temperature Sensor Interface 
> (SB-TSI)
> + *compliant AMD SoC temperature device.
> + *
> + * Copyright (c) 2020, Google Inc.
> + * Copyright (c) 2020, Kun Yi 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * SB-TSI registers only support SMBus byte data access. "_INT" registers are
> + * the integer part of a temperature value or limit, and "_DEC" registers are
> + * corresponding decimal parts.
> + */
> +#define SBTSI_REG_TEMP_INT 0x01 /* RO */
> +#define SBTSI_REG_STATUS   0x02 /* RO */
> +#define SBTSI_REG_CONFIG   0x03 /* RO */
> +#define SBTSI_REG_TEMP_HIGH_INT0x07 /* RW */
> +#define SBTSI_REG_TEMP_LOW_INT 0x08 /* RW */
> +#define SBTSI_REG_TEMP_DEC 0x10 /* RW */
> +#define SBTSI_REG_TEMP_HIGH_DEC0x13 /* RW */
> +#define SBTSI_REG_TEMP_LOW_DEC 0x14 /* RW */
> +
> +#define SBTSI_CONFIG_READ_ORDER_SHIFT  5
> +
> +#define SBTSI_TEMP_MIN 0
> +#define SBTSI_TEMP_MAX 255875
> +
> +/* Each client has this additional data */
> +struct sbtsi_data {
> +   struct i2c_client *client;
> +   struct mutex lock;
> +};
> +
> +/*
> + * From SB-TSI spec: CPU temperature readings and limit registers encode the
> + * temperature in increments of 0.125 from 0 to 255.875. The "high byte"
> + * register encodes the base-2 of the integer portion, and the upper 3 bits 
> of
> + * the "low byte" encode in base-2 the decimal portion.
> + *
> + * e.g. INT=0x19, DEC=0x20 represents 25.12

Re: [BUG] iwlwifi: card unusable after firmware crash

2020-12-09 Thread Rui Salvaterra
Hi, guys,

On Wed, 9 Dec 2020 at 17:13, Jakub Kicinski  wrote:
>
> Any luck figuring this out, Luca? If this is a 5.10 regression we need
> to let Linus know tomorrow, so the time is ticking :(

I don't have the possibility to test other kernels at the moment, but
I will do so in a few days (at least to find a working version to
bisect). Meanwhile, I don't know if this is relevant or not, but I'm
using WPA3 PSK.

Thanks,
Rui


Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

2020-12-09 Thread Matthew Wilcox
On Wed, Dec 09, 2020 at 11:47:56AM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox  wrote:
> > On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> > > Therefore, I tend to agree with Dan that if anything is to be done it 
> > > should be
> > > a WARN_ON() which is only going to throw an error that something has 
> > > probably
> > > been wrong all along and should be fixed but continue running as before.
> >
> > Silent data corruption is for ever.  Are you absolutely sure nobody has
> > done:
> >
> > page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
> > memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);
> >
> > because that will work fine if the pages come from ZONE_NORMAL and fail
> > miserably if they came from ZONE_HIGHMEM.
> 
> ...and violently regress with the BUG_ON.

... which is what we want, no?

> The question to me is: which is more likely that any bad usages have
> been covered up by being limited to ZONE_NORMAL / 64-bit only, or that
> silent data corruption has been occurring with no ill effects?

I wouldn't be at all surprised to learn that there is silent data
corruption on 32-bit systems with HIGHMEM.  Would you?  How much testing
do you do on 32-bit HIGHMEM systems?

Actually, I wouldn't be at all surprised if we can hit this problem today.
Look at this:

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
if (unlikely(iov_iter_is_pipe(i))) {
WARN_ON(1);
return 0;
}
if (iter_is_iovec(i))
might_fault();
iterate_and_advance(i, bytes, v,
copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 v.bv_offset, v.bv_len),
memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
)

return bytes;
}
EXPORT_SYMBOL(_copy_from_iter);

There's a lot of macrology in there, so for those following along who
aren't familiar with the iov_iter code, if the iter is operating on a
bvec, then iterate_and_advance() will call memcpy_from_page(), passing
it the bv_page, bv_offset and bv_len stored in the bvec.  Since 2019,
Linux has supported multipage bvecs (commit 07173c3ec276).  So bv_len
absolutely *can* be > PAGE_SIZE.

Does this ever happen in practice?  I have no idea; I don't know whether
any multipage BIOs are currently handed to copy_from_iter().  But I
have no confidence in your audit if you didn't catch this one.

> > > FWIW I think this is a 'bad BUG_ON' use because we are "checking 
> > > something that
> > > we know we might be getting wrong".[1]  And because, "BUG() is only good 
> > > for
> > > something that never happens and that we really have no other option 
> > > for".[2]
> >
> > BUG() is our only option here.  Both limiting how much we copy or
> > copying the requested amount result in data corruption or leaking
> > information to a process that isn't supposed to see it.
> 
> At a minimum I think this should be debated in a follow on patch to
> add assertion checking where there was none before. There is no
> evidence of a page being overrun in the audit Ira performed.

If we put in into a separate patch, someone will suggest backing out the
patch which tells us that there's a problem.  You know, like this guy ...
https://lore.kernel.org/linux-mm/CAPcyv4jNVroYmirzKw_=cseixoeacdl3m1wc4xjd_tfv3h+...@mail.gmail.com/


Re: [PATCH v4 3/3] spi: dt-bindings: document zero value for spi-{rx,tx}-bus-width properties

2020-12-09 Thread Rob Herring
On Thu, 03 Dec 2020 16:05:31 +0200, Alexandru Ardelean wrote:
> Following a change to the SPI framework, providing a value of zero for
> 'spi-rx-bus-width' and 'spi-tx-bus-width' is now possible and will
> essentially mean that no RX or TX is allowed.
> 
> Signed-off-by: Alexandru Ardelean 
> ---
> 
> Changelog v3 -> v4:
> * 
> https://lore.kernel.org/linux-spi/20201127130834.136348-3-alexandru.ardel...@analog.com/
> * fix typos
> 
>  Documentation/devicetree/bindings/spi/spi-controller.yaml | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 

Reviewed-by: Rob Herring 


Re: [PATCH v5 0/2] MTE support for KVM guest

2020-12-09 Thread Richard Henderson
On 12/9/20 12:39 PM, Catalin Marinas wrote:
>> I would have thought that the best way is to use TCO, so that we don't have 
>> to
>> have dual mappings (and however many MB of extra page tables that might 
>> imply).
> 
> The problem appears when the VMM wants to use MTE itself (e.g. linked
> against an MTE-aware glibc), toggling TCO is no longer generic enough,
> especially when it comes to device emulation.

But we do know exactly when we're manipulating guest memory -- we have special
routines for that.  So the special routines gain a toggle of TCO around the
exact guest memory manipulation, not a blanket disable of MTE across large
swaths of QEMU.


r~


Re: [PATCH v1] scripts: switch explicitly to Python 3

2020-12-09 Thread Pavel Machek
Hi!

> Some distributions are about to switch to Python 3 support only.
> This means that /usr/bin/python, which is Python 2, is not available
> anymore. Hence, switch scripts to use Python 3 explicitly.

Should python be mentioned in Documentation/process/changes.rst ?

Best regards,
Pavel


-- 
http://www.livejournal.com/~pavelmachek


signature.asc
Description: PGP signature


Re: [PATCH net-next v5 2/9] dt-bindings: net: dsa: microchip,ksz: add interrupt property

2020-12-09 Thread Rob Herring
On Thu, 03 Dec 2020 11:21:10 +0100, Christian Eggers wrote:
> The devices have an optional interrupt line.
> 
> Signed-off-by: Christian Eggers 
> ---
>  .../devicetree/bindings/net/dsa/microchip,ksz.yaml | 7 +++
>  1 file changed, 7 insertions(+)
> 

Reviewed-by: Rob Herring 


Re: [PATCH] spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors

2020-12-09 Thread H. Nikolaus Schaller
Hi Andreas,

> Am 09.12.2020 um 21:01 schrieb Andreas Kemnade :
> 
> On Wed, 9 Dec 2020 14:04:26 -0500
> Sven Van Asbroeck  wrote:
> 
>> On Wed, Dec 9, 2020 at 1:16 PM H. Nikolaus Schaller  
>> wrote:
>>> 
>>> This is also what made me wonder if that is really intended because then
>>> the whole discussion about the cs-gpio-flags and inversion and the fixes
>>> would not have been needed. The current code and fixes are all about
>>> not ignoring the flags...  
>> 
>> The inversion you witnessed was a bug caused by spi client drivers that
>> simply "plow over" the SPI_CS_HIGH mode flag. This includes the panel driver
>> you're using, see:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c?h=v5.10-rc6#n337
>> 
> ah, it would be set in spi->mode and is cleared by
> 
> spi->mode = SPI_MODE_3;
> 
> 
> Hmm, but we have
>  spi-cpol;
>spi-cpha;
> in devicetree. Why do we need that spi->mode line at all?

Because it is there in almost all or at least many drivers.

But I have tested with 

> spi->mode |= SPI_MODE_3;

which should keep the mode intact. Right? That did not work either.

So let's not derail the discussion by moving to the code of some
specific driver. Even if that is wrong it does not solve what
this patch wants to solve.

BR and thanks,
Nikolaus



Re: [PATCH linux hwmon-next v4 3/3] dt-bindings: (hwmon/sbtsi_tmep) Add SB-TSI hwmon driver bindings

2020-12-09 Thread Rob Herring
On Wed, 02 Dec 2020 08:56:01 -0800, Kun Yi wrote:
> Document device tree bindings for AMD SB-TSI emulated temperature
> sensor.
> 
> Signed-off-by: Kun Yi 
> ---
>  .../devicetree/bindings/hwmon/amd,sbtsi.yaml  | 54 +++
>  1 file changed, 54 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/hwmon/amd,sbtsi.yaml
> 

Reviewed-by: Rob Herring 


Re: [PATCH net-next] vrf: handle CONFIG_IPV6 not set for vrf_add_mac_header_if_unset()

2020-12-09 Thread Jakub Kicinski
On Tue, 8 Dec 2020 12:51:55 -0700 David Ahern wrote:
> On 12/8/20 10:52 AM, Andrea Mayer wrote:
> > The vrf_add_mac_header_if_unset() is defined within a conditional
> > compilation block which depends on the CONFIG_IPV6 macro.
> > However, the vrf_add_mac_header_if_unset() needs to be called also by IPv4
> > related code and when the CONFIG_IPV6 is not set, this function is missing.
> > As a consequence, the build process stops reporting the error:
> > 
> >  ERROR: implicit declaration of function 'vrf_add_mac_header_if_unset'
> > 
> > The problem is solved by *only* moving functions
> > vrf_add_mac_header_if_unset() and vrf_prepare_mac_header() out of the
> > conditional block.
> > 
> > Reported-by: kernel test robot 
> > Fixes: 0489390882202 ("vrf: add mac header for tunneled packets when 
> > sniffer is attached")
> > Signed-off-by: Andrea Mayer 
>
> I should have caught that in my review.
> 
> Reviewed-by: David Ahern 

Applied, thank you!


Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory

2020-12-09 Thread Uladzislau Rezki
On Wed, Dec 09, 2020 at 11:42:39AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 09, 2020 at 08:36:37PM +0100, Uladzislau Rezki wrote:
> > On Tue, Dec 08, 2020 at 05:13:01PM -0800, paul...@kernel.org wrote:
> > > From: "Paul E. McKenney" 
> > > 
> > > This commit adds vmalloc() support to mem_dump_obj().  Note that the
> > > vmalloc_dump_obj() function combines the checking and dumping, in
> > > contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> > > The reason for the difference is that the checking in the vmalloc()
> > > case involves acquiring a global lock, and redundant acquisitions of
> > > global locks should be avoided, even on not-so-fast paths.
> > > 
> > > Note that this change causes on-stack variables to be reported as
> > > vmalloc() storage from kernel_clone() or similar, depending on the degree
> > > of inlining that your compiler does.  This is likely more helpful than
> > > the earlier "non-paged (local) memory".
> > > 
> > > Cc: Andrew Morton 
> > > Cc: Joonsoo Kim 
> > > Cc: 
> > > Reported-by: Andrii Nakryiko 
> > > Signed-off-by: Paul E. McKenney 
> > > ---
> > >  include/linux/vmalloc.h |  6 ++
> > >  mm/util.c   | 12 +++-
> > >  mm/vmalloc.c| 12 
> > >  3 files changed, 25 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > > index 938eaf9..c89c2be 100644
> > > --- a/include/linux/vmalloc.h
> > > +++ b/include/linux/vmalloc.h
> > > @@ -248,4 +248,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int 
> > > nr_vms)
> > >  int register_vmap_purge_notifier(struct notifier_block *nb);
> > >  int unregister_vmap_purge_notifier(struct notifier_block *nb);
> > >  
> > > +#ifdef CONFIG_MMU
> > > +bool vmalloc_dump_obj(void *object);
> > > +#else
> > > +static inline bool vmalloc_dump_obj(void *object) { return false; }
> > > +#endif
> > > +
> > >  #endif /* _LINUX_VMALLOC_H */
> > > diff --git a/mm/util.c b/mm/util.c
> > > index 8c2449f..ee99a0a 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -984,6 +984,12 @@ int __weak memcmp_pages(struct page *page1, struct 
> > > page *page2)
> > >   */
> > >  void mem_dump_obj(void *object)
> > >  {
> > > + if (kmem_valid_obj(object)) {
> > > + kmem_dump_obj(object);
> > > + return;
> > > + }
> > > + if (vmalloc_dump_obj(object))
> > > + return;
> > >   if (!virt_addr_valid(object)) {
> > >   if (object == NULL)
> > >   pr_cont(" NULL pointer.\n");
> > > @@ -993,10 +999,6 @@ void mem_dump_obj(void *object)
> > >   pr_cont(" non-paged (local) memory.\n");
> > >   return;
> > >   }
> > > - if (kmem_valid_obj(object)) {
> > > - kmem_dump_obj(object);
> > > - return;
> > > - }
> > > - pr_cont(" non-slab memory.\n");
> > > + pr_cont(" non-slab/vmalloc memory.\n");
> > >  }
> > >  EXPORT_SYMBOL_GPL(mem_dump_obj);
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 6ae491a..7421719 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, 
> > > int nr_vms)
> > >  }
> > >  #endif   /* CONFIG_SMP */
> > >  
> > > +bool vmalloc_dump_obj(void *object)
> > > +{
> > > + struct vm_struct *vm;
> > > + void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > >
> > Paul, vmalloced addresses are already aligned to PAGE_SIZE, so that one
> > is odd.
> 
> They are, but this is to handle things like this:
> 
>   struct foo {
>   int a;
>   struct rcu_head rh;
>   };
> 
>   void silly(struct foo *fp)
>   {
>   call_rcu(&fp->rh, my_rcu_cb);
>   call_rcu(&fp->rh, my_other_rcu_cb);
>   }
> 
> In kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, this would
> result in a call to mem_dump_obj() and then to vmalloc_dump_obj()
> with a non-page-aligned pointer.
> 
OK, i got it. I thought the functions deals with original vmalloc
pointer. In fact it is not :)

--
Vlad Rezki


Re: [PATCH] dt-bindings: mfd: fix stm32 timers example

2020-12-09 Thread Rob Herring
On Wed, 02 Dec 2020 13:45:14 +0100, Fabrice Gasnier wrote:
> The stm32 timers example name should match the pattern timer@. Also,
> the example is based on stm32mp1 timer 2, so the identifier should be
> '1' instead of '0' (e.g. timer 1).
> 
> Fixes: bfbcbf88f9db ("dt-bindings: timer: Convert stm32 timer bindings to 
> json-schema")
> 
> Signed-off-by: Fabrice Gasnier 
> ---
>  Documentation/devicetree/bindings/mfd/st,stm32-timers.yaml | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 

Applied, thanks!


Re: [BUG] iwlwifi: card unusable after firmware crash

2020-12-09 Thread Coelho, Luciano
Hi Jakub et al,

On Wed, 2020-12-09 at 09:13 -0800, Jakub Kicinski wrote:
> On Tue, 8 Dec 2020 23:17:48 + Rui Salvaterra wrote:
> > Hi, Luca,
> > 
> > On Tue, 8 Dec 2020 at 16:27, Coelho, Luciano  
> > wrote:
> > > On Tue, 2020-12-08 at 11:27 +, Rui Salvaterra wrote:  
> > > > 
> > > > [ 3174.003910] iwlwifi :02:00.0: RF_KILL bit toggled to disable 
> > > > radio.
> > > > [ 3174.003913] iwlwifi :02:00.0: reporting RF_KILL (radio disabled) 
> > > >  
> > > 
> > > It looks like your machine is reporting RF-Kill to the WiFi device.  
> > 
> > Yes, that's an artifact of how I tested: I rebooted the router, the
> > Wi-Fi interface disassociated and the dmesg was clean. However, after
> > the router came up, the laptop didn't reconnect (and the connection
> > had completely disappeared from nmtui). Afterwards, I did the rfkill
> > cycle you see, and only then I got the register dump.
> > 
> > > There seems to be some sort of race there that is causing us to still
> > > try to communicate with the device (and thus you see the transaction
> > > failed dump), but that will obviously fail when RF-Kill is enabled.  
> > 
> > I'm not sure about that, the card was already dead before the rfkill cycle.
> 
> Any luck figuring this out, Luca? If this is a 5.10 regression we need
> to let Linus know tomorrow, so the time is ticking :(

I just checked all the commits in iwlwifi between v5.9 and v5.10 and I
don't see anything that could affect how RF-Kill works.

Emmanuel, do you remember anything that went in that could have
affected RF-Kill?

--
Cheers,
Luca.


Re: [PATCH] spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors

2020-12-09 Thread Andreas Kemnade
On Wed, 9 Dec 2020 14:04:26 -0500
Sven Van Asbroeck  wrote:

> On Wed, Dec 9, 2020 at 1:16 PM H. Nikolaus Schaller  
> wrote:
> >
> > This is also what made me wonder if that is really intended because then
> > the whole discussion about the cs-gpio-flags and inversion and the fixes
> > would not have been needed. The current code and fixes are all about
> > not ignoring the flags...  
> 
> The inversion you witnessed was a bug caused by spi client drivers that
> simply "plow over" the SPI_CS_HIGH mode flag. This includes the panel driver
> you're using, see:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c?h=v5.10-rc6#n337
> 
ah, it would be set in spi->mode and is cleared by

spi->mode = SPI_MODE_3;


Hmm, but we have
  spi-cpol;
spi-cpha;
in devicetree. Why do we need that spi->mode line at all?

Regards,
Andreas


Re: [PATCH v11 4/5] dt-bindings: leds: Add bindings for MT6360 LED

2020-12-09 Thread Rob Herring
On Wed, Dec 02, 2020 at 06:46:50PM +0800, Gene Chen wrote:
> From: Gene Chen 
> 
> Add bindings document for LED support on MT6360 PMIC
> 
> Signed-off-by: Gene Chen 
> ---
>  .../devicetree/bindings/leds/leds-mt6360.yaml  | 159 
> +
>  1 file changed, 159 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/leds/leds-mt6360.yaml
> 
> diff --git a/Documentation/devicetree/bindings/leds/leds-mt6360.yaml 
> b/Documentation/devicetree/bindings/leds/leds-mt6360.yaml
> new file mode 100644
> index 000..73c67b1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/leds/leds-mt6360.yaml
> @@ -0,0 +1,159 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/leds/leds-mt6360.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: LED driver for MT6360 PMIC from MediaTek Integrated.
> +
> +maintainers:
> +  - Gene Chen 
> +
> +description: |
> +  This module is part of the MT6360 MFD device.
> +  see Documentation/devicetree/bindings/mfd/mt6360.yaml
> +  Add MT6360 LED driver include 2-channel Flash LED with torch/strobe mode,
> +  and 4-channel RGB LED support Register/Flash/Breath Mode
> +
> +properties:
> +  compatible:
> +const: mediatek,mt6360-led
> +
> +  "#address-cells":
> +const: 1
> +
> +  "#size-cells":
> +const: 0
> +
> +patternProperties:
> +  "(^led@[0-5]$|led)":

"^(multi-)?led@[0-5]$":

> +type: object
> +$ref: common.yaml#
> +description:
> +  Properties for a single LED.
> +
> +properties:
> +  reg:
> +description: Index of the LED.
> +enum:
> +  - 0 # LED output ISINK1
> +  - 1 # LED output ISINK2
> +  - 2 # LED output ISINK3
> +  - 3 # LED output ISINKML
> +  - 4 # LED output FLASH1
> +  - 5 # LED output FLASH2
> +
> +unevaluatedProperties: false
> +
> +required:
> +  - compatible
> +  - "#address-cells"
> +  - "#size-cells"
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +   #include 
> +   led-controller {
> + compatible = "mediatek,mt6360-led";
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + multi-led@0 {
> +   reg = <0>;
> +   function = LED_FUNCTION_INDICATOR;
> +   color = ;
> +   led-max-microamp = <24000>;
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +   led@0 {
> + reg = <0>;
> + color = ;
> +   };
> +   led@1 {
> + reg = <1>;
> + color = ;
> +   };
> +   led@2 {
> + reg = <2>;
> + color = ;
> +   };
> + };
> + led@3 {
> +   reg = <3>;
> +   function = LED_FUNCTION_MOONLIGHT;
> +   color = ;
> +   led-max-microamp = <15>;
> + };
> + led@4 {
> +   reg = <4>;
> +   function = LED_FUNCTION_FLASH;
> +   color = ;
> +   function-enumerator = <1>;
> +   led-max-microamp = <20>;
> +   flash-max-microamp = <50>;
> +   flash-max-timeout-us = <1024000>;
> + };
> + led@5 {
> +   reg = <5>;
> +   function = LED_FUNCTION_FLASH;
> +   color = ;
> +   function-enumerator = <2>;
> +   led-max-microamp = <20>;
> +   flash-max-microamp = <50>;
> +   flash-max-timeout-us = <1024000>;
> + };
> +   };
> +
> +  - |
> +
> +   led-controller {
> + compatible = "mediatek,mt6360-led";
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + led@0 {
> +   reg = <0>;
> +   function = LED_FUNCTION_INDICATOR;
> +   color = ;
> +   led-max-microamp = <24000>;
> + };
> + led@1 {
> +   reg = <1>;
> +   function = LED_FUNCTION_INDICATOR;
> +   color = ;
> +   led-max-microamp = <24000>;
> + };
> + led@2 {
> +   reg = <2>;
> +   function = LED_FUNCTION_INDICATOR;
> +   color = ;
> +   led-max-microamp = <24000>;
> + };
> + led@3 {
> +   reg = <3>;
> +   function = LED_FUNCTION_MOONLIGHT;
> +   color = ;
> +   led-max-microamp = <15>;
> + };
> + led@4 {
> +   reg = <4>;
> +   function = LED_FUNCTION_FLASH;
> +   color = ;
> +   function-enumerator = <1>;
> +   led-max-microamp = <20>;
> +   flash-max-microamp = <50>;
> +   flash-max-timeout-us = <1024000>;
> + };
> + led@5 {
> +   reg = <5>;
> +   function = LED_FUNCTION_FLASH;
> +   color = ;
> +   function-enumerator = <2>;
> +   led-max-microamp = <20>;
> +   flash-max-microamp = <50>;
> +   flash-max-timeout-us = <1024000>;
> + };
> +   };
> +...
> -- 
> 2.7.4
> 


Re: [PATCH v11 3/5] dt-bindings: leds: Add LED_FUNCTION_MOONLIGHT definitions

2020-12-09 Thread Rob Herring
On Thu, Dec 03, 2020 at 09:06:42PM +0100, Jacek Anaszewski wrote:
> Hi Pavel,
> 
> On 12/3/20 12:40 PM, Pavel Machek wrote:
> > Hi!
> > 
> > > > > +++ b/include/dt-bindings/leds/common.h
> > > > > @@ -78,6 +78,7 @@
> > > > >   #define LED_FUNCTION_INDICATOR "indicator"
> > > > >   #define LED_FUNCTION_LAN "lan"
> > > > >   #define LED_FUNCTION_MAIL "mail"
> > > > > +#define LED_FUNCTION_MOONLIGHT "moonlight"
> > > > 
> > > > There's "torch" function that should be used for this. I guess comment
> > > > should be added with explanation what exactly that is and how should
> > > > the LED be named.
> > > > 
> > > 
> > > According to mail, 11/25 "Re: [PATCH v7 2/5] dt-bindings: leds: Add
> > > LED_COLOR_ID_MOONLIGHT definitions",
> > > The Moonlight LED is LED which maximum current more than torch, but
> > > less than flash. Such as front camera fill light.
> > > I think our channel is moonlight, not torch.
> > > I will add this description to comment.
> > > We can't exactly define moonlight current level, because every vendor
> > > has their own specification.
> > 
> > So... what is the timelimit on moonlight?
> > 
> > But if it is used for camera illumination, I believe it should be
> > simply called flash.
> 
> Let's keep FLASH reserved for LED flash class devices.
> This device has already two other flash iouts.
> 
> Also iouts amperage gives clue that they have three different
> functions.

Perhaps there should just be a table of currents and max times rather 
than trying to continually expand this while tied to function.

One could simply want an LED brighter when blinking than steady state 
would allow for.

Rob


Re: [PATCH] spi: Limit the spi device max speed to controller's max speed

2020-12-09 Thread Mark Brown
On Wed, Dec 09, 2020 at 10:46:36PM +0300, Serge Semin wrote:

> On Wed, Dec 09, 2020 at 07:35:14PM +0200, Tudor Ambarus wrote:

> > Make sure the max_speed_hz of spi_device does not override
> > the max_speed_hz of controller.

> I have doubts that's right thing to do. It seems better to let
> the controller driver to handle the speed clamping itself, while
> to leave the SPI client device max_speed_hz field describing the
> device speed capability. Moreover the SPI-transfers passed to the
> controller will have a SPI-bus speed fixed in accordance with the
> controller and client device capabilities anyway.
> See the __spi_validate() method for details:
> https://elixir.bootlin.com/linux/v5.10-rc7/source/drivers/spi/spi.c#L3570

Right, in general we aim to do this sort of fixup on the transfers
and messages rather than the devices, I guess we might be missing
validation in some of the flash acceleration paths or was this an issue
seen through inspection?


signature.asc
Description: PGP signature


Re: [PATCH v3 1/9] dt-binding: soc: xilinx: ai-engine: Add AI engine binding

2020-12-09 Thread Jiaying Liang



On 12/8/20 3:41 PM, Rob Herring wrote:

On Sun, Nov 29, 2020 at 11:48:17PM -0800, Wendy Liang wrote:

Xilinx AI engine array can be partitioned statically for different
applications. In the device tree, there will be device node for the AI
engine device, and device nodes for the statically configured AI engine
partitions. Each of the statically configured partition has a partition
ID in the system.

Signed-off-by: Wendy Liang 
---
  .../bindings/soc/xilinx/xlnx,ai-engine.yaml| 126 +
  1 file changed, 126 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml

diff --git a/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml 
b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
new file mode 100644
index 000..1de5623
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
@@ -0,0 +1,126 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/soc/xilinx/xlnx,ai-engine.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xilinx AI Engine
+
+maintainers:
+  - Wendy Liang 
+
+description: |+

You don't need '|' unless there's formatting to preserve.

Will change



+  The Xilinx AI Engine is a tile processor with many cores (up to 400) that
+  can run in parallel. The data routing between cores is configured through
+  internal switches, and shim tiles interface with external interconnect, such
+  as memory or PL.
+
+properties:
+  compatible:
+const: xlnx,ai-engine-v1.0

This is soft logic? If not, don't use version numbers.


It is not soft logic, if there is a future version of the device, can we 
use version number


in compatible to describe the device version?




+
+  reg:
+description: |
+  Physical base address and length of the device registers.

That's every 'reg' property. Drop.

[Wendy] will drop it.



+  The AI engine address space assigned to Linux is defined by Xilinx
+  platform design tool.
+
+  '#address-cells':
+enum: [2]

const: 2

Will change



+description: |
+  size of cell to describe AI engine range of tiles address.
+  It is the location of the starting tile of the range.
+  As the AI engine tiles are 2D array, the location of a tile
+  is presented as (column, row), the address cell is 2.
+
+  '#size-cells':
+enum: [2]
+description: |
+  size of cell to describe AI engine range of tiles size.
+  As the AI engine tiles are 2D array, the size cell is 2.
+
+  power-domains:
+maxItems: 1
+description: phandle to the associated power domain
+
+  interrupts:
+maxItems: 3
+
+  interrupt-names:
+description: |
+  Should be "interrupt1", "interrupt2" or "interrupt3".

Really, not useful names. If you do have names, they should be a schema,
not freeform text.


+
+required:
+  - compatible
+  - reg
+  - '#address-cells'
+  - '#size-cells'
+  - power-domains
+  - interrupt-parent

Generally, never required because it could be in the parent node.

will remove



+  - interrupts
+  - interrupt-names
+
+patternProperties:
+  "^aie_partition@[0-9]+$":

aie-partition@

The unit-address is just the 1st cell of reg (the row)? Or needs to be
row and column, in which case you'd want something like '@0,0'. Also,
unit-address values are typically hex, not decimal.
It will be col,row, will change to address format with starting column 
and row



+type: object
+description: |
+  AI engine partition which is a group of column based tiles of the AI
+  engine device. Each AI engine partition is isolated from the other
+  AI engine partitions. An AI engine partition is defined by Xilinx
+  platform design tools. Each partition has a SHIM row and core tiles rows.
+  A SHIM row contains SHIM tiles which are the interface to external
+  components. AXI master can access AI engine registers, push data to and
+  fetch data from AI engine through the SHIM tiles. Core tiles are the
+  compute tiles.
+
+properties:
+  reg:
+description: |
+  It describes the group of tiles of the AI engine partition. It needs
+  to include the SHIM row. The format is defined by the parent AI 
engine
+  device node's '#address-cells' and '#size-cells' properties. e.g. a 
v1
+  AI engine device has 2D tiles array, the first row is SHIM row. A
+  partition which has 50 columns and 8 rows of core tiles and 1 row of
+  SHIM tiles will be presented as <0 0 50 9>.

You should be able to write some constraints like max row and column
values?


The max row and columns depends on the underline hardware platform, the 
driver can


get the max allowed row and columns from the size field of the "reg" 
property in the parent


AI engine device node.




+
+  label:
+maxItems: 1

'label' is not an array. Why do you need label?


+
+  xlnx,partitio

Re: [PATCH] files: rcu free files_struct

2020-12-09 Thread Al Viro
On Wed, Dec 09, 2020 at 11:13:38AM -0800, Linus Torvalds wrote:
> On Wed, Dec 9, 2020 at 10:05 AM Eric W. Biederman  
> wrote:
> >
> > -   struct file * file = xchg(&fdt->fd[i], 
> > NULL);
> > +   struct file * file = fdt->fd[i];
> > if (file) {
> > +   rcu_assign_pointer(fdt->fd[i], 
> > NULL);
> 
> This makes me nervous. Why did we use to do that xchg() there? That
> has atomicity guarantees that now are gone.
> 
> Now, this whole thing should be called for just the last ref of the fd
> table, so presumably that atomicity was never needed in the first
> place. But the fact that we did that very expensive xchg() then makes
> me go "there's some reason for it".
> 
> Is this xchg() just bogus historical leftover? It kind of looks that
> way. But maybe that change should be done separately?

I'm still not convinced that exposing close_files() to parallel
3rd-party accesses is safe in all cases, so this patch still needs
more analysis.  And I'm none too happy about "we'll fix the things
up at the tail of the series" - the changes are subtle enough and
the area affected is rather fundamental.  So if we end up returning
to that several years from now while debugging something, I would
very much prefer to have the transformation series as clean and
understandable as possible.  It's not just about bisect hazard -
asking yourself "WTF had it been done that way, is there anything
subtle I'm missing here?" can cost many hours of head-scratching,
IME.

Eric, I understand that you want to avoid reordering/folding, but
in this case it _is_ needed.  It's not as if there had been any
serious objections to the overall direction of changes; it's
just that we need to get that as understandable as possible.


Re: [PATCH] files: rcu free files_struct

2020-12-09 Thread Matthew Wilcox
On Wed, Dec 09, 2020 at 12:04:38PM -0600, Eric W. Biederman wrote:
> @@ -397,8 +397,9 @@ static struct fdtable *close_files(struct files_struct * 
> files)
>   set = fdt->open_fds[j++];
>   while (set) {
>   if (set & 1) {
> - struct file * file = xchg(&fdt->fd[i], NULL);
> + struct file * file = fdt->fd[i];
>   if (file) {
> + rcu_assign_pointer(fdt->fd[i], NULL);

Assuming this is safe, you can use RCU_INIT_POINTER() here because you're
storing NULL, so you don't need the wmb() before storing the pointer.



Re: [PATCH] PCI: Save/restore L1 PM Substate extended capability registers

2020-12-09 Thread David E. Box
Great. Thanks Vidya.

On Wed, 2020-12-09 at 09:38 +0530, Vidya Sagar wrote:
> There is a change already available for it in linux-next
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=4257f7e008ea394fcecc050f1569c3503b8bcc15
> 
> Thanks,
> Vidya Sagar
> 
> On 12/9/2020 3:36 AM, David E. Box wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Intel systems that support ACPI Low Power Idle it has been
> > observed
> > that the L1 Substate capability can return disabled after a s2idle
> > cycle. This causes the loss of L1 Substate support during runtime
> > leading to higher power consumption. Add save/restore of the L1SS
> > control registers.
> > 
> > Signed-off-by: David E. Box 
> > ---
> >   drivers/pci/pci.c | 49
> > +++
> >   1 file changed, 49 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e578d34095e9..beee3d9952a6 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1539,6 +1539,48 @@ static void pci_restore_ltr_state(struct
> > pci_dev *dev)
> >  pci_write_config_word(dev, ltr + PCI_LTR_MAX_NOSNOOP_LAT,
> > *cap++);
> >   }
> > 
> > +static void pci_save_l1ss_state(struct pci_dev *dev)
> > +{
> > +   int l1ss;
> > +   struct pci_cap_saved_state *save_state;
> > +   u16 *cap;
> > +
> > +   if (!pci_is_pcie(dev))
> > +   return;
> > +
> > +   l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> > +   if (!l1ss)
> > +   return;
> > +
> > +   save_state = pci_find_saved_ext_cap(dev,
> > PCI_EXT_CAP_ID_L1SS);
> > +   if (!save_state) {
> > +   pci_err(dev, "no suspend buffer for L1
> > Substates\n");
> > +   return;
> > +   }
> > +
> > +   cap = (u16 *)&save_state->cap.data[0];
> > +   pci_read_config_word(dev, l1ss + PCI_L1SS_CTL1, cap++);
> > +   pci_read_config_word(dev, l1ss + PCI_L1SS_CTL1 + 2, cap++);
> > +   pci_read_config_word(dev, l1ss + PCI_L1SS_CTL2, cap++);
> > +}
> > +
> > +static void pci_restore_l1ss_state(struct pci_dev *dev)
> > +{
> > +   struct pci_cap_saved_state *save_state;
> > +   int l1ss;
> > +   u16 *cap;
> > +
> > +   save_state = pci_find_saved_ext_cap(dev,
> > PCI_EXT_CAP_ID_L1SS);
> > +   l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> > +   if (!save_state || !l1ss)
> > +   return;
> > +
> > +   cap = (u16 *)&save_state->cap.data[0];
> > +   pci_write_config_word(dev, l1ss + PCI_L1SS_CTL1, *cap++);
> > +   pci_write_config_word(dev, l1ss + PCI_L1SS_CTL1 + 2,
> > *cap++);
> > +   pci_write_config_word(dev, l1ss + PCI_L1SS_CTL2, *cap++);
> > +}
> > +
> >   /**
> >* pci_save_state - save the PCI configuration space of a device
> > before
> >* suspending
> > @@ -1563,6 +1605,7 @@ int pci_save_state(struct pci_dev *dev)
> >  if (i != 0)
> >  return i;
> > 
> > +   pci_save_l1ss_state(dev);
> >  pci_save_ltr_state(dev);
> >  pci_save_dpc_state(dev);
> >  pci_save_aer_state(dev);
> > @@ -1670,6 +1713,7 @@ void pci_restore_state(struct pci_dev *dev)
> >   */
> >  pci_restore_ltr_state(dev);
> > 
> > +   pci_restore_l1ss_state(dev);
> >  pci_restore_pcie_state(dev);
> >  pci_restore_pasid_state(dev);
> >  pci_restore_pri_state(dev);
> > @@ -3332,6 +3376,11 @@ void pci_allocate_cap_save_buffers(struct
> > pci_dev *dev)
> >  if (error)
> >  pci_err(dev, "unable to allocate suspend buffer
> > for LTR\n");
> > 
> > +   error = pci_add_ext_cap_save_buffer(dev,
> > PCI_EXT_CAP_ID_L1SS,
> > +   3 * sizeof(u16));
> > +   if (error)
> > +   pci_err(dev, "unable to allocate suspend buffer for
> > L1 Substates\n");
> > +
> >  pci_allocate_vc_save_buffers(dev);
> >   }
> > 
> > --
> > 2.20.1
> > 



Re: [PATCH] mhi: use irq_flags if client driver configures it

2020-12-09 Thread Jeffrey Hugo

On 12/9/2020 11:34 AM, Hemant Kumar wrote:



On 12/7/20 7:55 PM, Carl Huang wrote:

If client driver has specified the irq_flags, mhi uses this specified
irq_flags. Otherwise, mhi uses default irq_flags.

The purpose of this change is to support one MSI vector for QCA6390.
MHI will use one same MSI vector too in this scenario.

In case of one MSI vector, IRQ_NO_BALANCING is needed when irq handler
is requested. The reason is if irq migration happens, the msi_data may
change too. However, the msi_data is already programmed to QCA6390
hardware during initialization phase. This msi_data inconsistence will
result in crash in kernel.


I'm confused as to how this happens.



Another issue is in case of one MSI vector, IRQF_NO_SUSPEND will trigger
WARNINGS because QCA6390 wants to disable the IRQ during the suspend.

To avoid above two issues, QCA6390 driver specifies the irq_flags in case
of one MSI vector when mhi_register_controller is called.


Surely this change should be in a series where there is a following 
change which updates the QCA6390 driver?




Signed-off-by: Carl Huang 
---
  drivers/bus/mhi/core/init.c | 9 +++--
  include/linux/mhi.h | 1 +
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c
index 0ffdebd..5f74e1e 100644
--- a/drivers/bus/mhi/core/init.c
+++ b/drivers/bus/mhi/core/init.c
@@ -148,12 +148,17 @@ int mhi_init_irq_setup(struct mhi_controller 
*mhi_cntrl)

  {
  struct mhi_event *mhi_event = mhi_cntrl->mhi_event;
  struct device *dev = &mhi_cntrl->mhi_dev->dev;
+    unsigned long irq_flags = IRQF_SHARED | IRQF_NO_SUSPEND;
  int i, ret;
+    /* if client driver has set irq_flags, use it */
+    if (mhi_cntrl->irq_flags)
+    irq_flags = mhi_cntrl->irq_flags;
Jeff if i remember correctly your use case also have one dedicated irq 
line for all the MSIs, just want to confirm if you are fine with this 
change ? i was wondering if any input check is required for irq_flags 
passed by controller, or responsibility is on controller for any 
undesired behavior. Like passing IRQF_SHARED and IRQF_ONESHOT when one 
irq line is shared among multiple MSIs.


This feels a bit weird to me, but I don't think it'll cause a problem.

If we are allowing the controller to specify flags, should they be in a 
per irq manner?



+
  /* Setup BHI_INTVEC IRQ */
  ret = request_threaded_irq(mhi_cntrl->irq[0], mhi_intvec_handler,
 mhi_intvec_threaded_handler,
-   IRQF_SHARED | IRQF_NO_SUSPEND,
+   irq_flags,
 "bhi", mhi_cntrl);
  if (ret)
  return ret;
@@ -171,7 +176,7 @@ int mhi_init_irq_setup(struct mhi_controller 
*mhi_cntrl)

  ret = request_irq(mhi_cntrl->irq[mhi_event->irq],
    mhi_irq_handler,
-  IRQF_SHARED | IRQF_NO_SUSPEND,
+  irq_flags,
    "mhi", mhi_event);
  if (ret) {
  dev_err(dev, "Error requesting irq:%d for ev:%d\n",
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index d4841e5..f039e58 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -442,6 +442,7 @@ struct mhi_controller {
  bool fbc_download;
  bool pre_init;
  bool wake_set;
+    unsigned long irq_flags;


You don't document this.  That gets a NACK from me.


  };
  /**



Thanks,
Hemant




--
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [PATCHv3 net-next] octeontx2-pf: Add RSS multi group support

2020-12-09 Thread Jakub Kicinski
On Wed, 09 Dec 2020 11:08:24 -0800 Saeed Mahameed wrote:
> > -/* Configure RSS table and hash key */
> > -static int otx2_set_rxfh(struct net_device *dev, const u32 *indir,
> > -const u8 *hkey, const u8 hfunc)
> > +static int otx2_get_rxfh_context(struct net_device *dev, u32 *indir,
> > +u8 *hkey, u8 *hfunc, u32 rss_context)
> >  {
> > struct otx2_nic *pfvf = netdev_priv(dev);
> > +   struct otx2_rss_ctx *rss_ctx;
> > struct otx2_rss_info *rss;
> > int idx;
> >  
> > -   if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc !=
> > ETH_RSS_HASH_TOP)
> > -   return -EOPNOTSUPP;
> > -
> > rss = &pfvf->hw.rss_info;
> >  
> > if (!rss->enable) {
> > -   netdev_err(dev, "RSS is disabled, cannot change
> > settings\n");
> > +   netdev_err(dev, "RSS is disabled\n");
> > return -EIO;
> > }  
> 
> I see that you init/enable rss on open, is this is your way to block
> getting rss info if device is not open ? why do you need to report an
> error anyway, why not just report whatever default config you will be
> setting up on next open ? 
> 
> to me reporting errors to ethtool queries when device is down is a bad
> user experience.
> 
> > +   if (rss_context >= MAX_RSS_GROUPS)
> > +   return -EINVAL;
> > +  
> 
> -ENOENT
> > +   rss_ctx = rss->rss_ctx[rss_context];
> > +   if (!rss_ctx)
> > +   return -EINVAL;
> >   
> 
> -ENOENT

Plus looks like this version introduces a W=1 C=1 warning:

drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c:768:28: warning: 
Using plain integer as NULL pointer


Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

2020-12-09 Thread Dan Williams
On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox  wrote:
>
> On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> > Right now we have a mixed bag.  zero_user() [and it's variants, circa 2008]
> > does a BUG_ON.[0]  While the other ones do nothing; clear_highpage(),
> > clear_user_highpage(), copy_user_highpage(), and copy_highpage().
>
> Erm, those functions operate on the entire PAGE_SIZE.  There's nothing
> for them to check.
>
> > While continuing to audit the code I don't see any users who would violating
> > the API with a simple conversion of the code.  The calls which I have 
> > worked on
> > [which is many at this point] all have checks in place which are well aware 
> > of
> > page boundaries.
>
> Oh good, then this BUG_ON won't trigger.
>
> > Therefore, I tend to agree with Dan that if anything is to be done it 
> > should be
> > a WARN_ON() which is only going to throw an error that something has 
> > probably
> > been wrong all along and should be fixed but continue running as before.
>
> Silent data corruption is for ever.  Are you absolutely sure nobody has
> done:
>
> page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
> memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);
>
> because that will work fine if the pages come from ZONE_NORMAL and fail
> miserably if they came from ZONE_HIGHMEM.

...and violently regress with the BUG_ON.

The question to me is: which is more likely that any bad usages have
been covered up by being limited to ZONE_NORMAL / 64-bit only, or that
silent data corruption has been occurring with no ill effects?

> > FWIW I think this is a 'bad BUG_ON' use because we are "checking something 
> > that
> > we know we might be getting wrong".[1]  And because, "BUG() is only good for
> > something that never happens and that we really have no other option 
> > for".[2]
>
> BUG() is our only option here.  Both limiting how much we copy or
> copying the requested amount result in data corruption or leaking
> information to a process that isn't supposed to see it.

At a minimum I think this should be debated in a follow on patch to
add assertion checking where there was none before. There is no
evidence of a page being overrun in the audit Ira performed.


Re: [PATCH net-next] net: sja1105: simplify the return sja1105_cls_flower_stats()

2020-12-09 Thread Jakub Kicinski
On Wed, 9 Dec 2020 17:25:04 +0800 Zheng Yongjun wrote:
> Simplify the return expression.
> 
> Signed-off-by: Zheng Yongjun 

Looks like this one doesn't apply cleanly to net-next.


[PATCH v7 3/8] IMA: define a hook to measure kernel integrity critical data

2020-12-09 Thread Tushar Sugandhi
IMA provides capabilities to measure file data, and in-memory buffer
data. However, various data structures, policies, and states
stored in kernel memory also impact the integrity of the system.
Several kernel subsystems contain such integrity critical data. These
kernel subsystems help protect the integrity of a device. Currently,
IMA does not provide a generic function for kernel subsystems to measure
their integrity critical data.
 
Define a new IMA hook - ima_measure_critical_data to measure kernel
integrity critical data.

Signed-off-by: Tushar Sugandhi 
---
 Documentation/ABI/testing/ima_policy |  2 +-
 include/linux/ima.h  |  6 +
 security/integrity/ima/ima.h |  1 +
 security/integrity/ima/ima_api.c |  2 +-
 security/integrity/ima/ima_main.c| 36 
 security/integrity/ima/ima_policy.c  |  2 ++
 6 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index e35263f97fc1..6ec7daa87cba 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -32,7 +32,7 @@ Description:
func:= 
[BPRM_CHECK][MMAP_CHECK][CREDS_CHECK][FILE_CHECK]MODULE_CHECK]
[FIRMWARE_CHECK]
[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
-   [KEXEC_CMDLINE] [KEY_CHECK]
+   [KEXEC_CMDLINE] [KEY_CHECK] [CRITICAL_DATA]
mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
   [[^]MAY_EXEC]
fsmagic:= hex value
diff --git a/include/linux/ima.h b/include/linux/ima.h
index ac3d82f962f2..675f54db6264 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -30,6 +30,9 @@ extern int ima_post_read_file(struct file *file, void *buf, 
loff_t size,
 extern void ima_post_path_mknod(struct dentry *dentry);
 extern int ima_file_hash(struct file *file, char *buf, size_t buf_size);
 extern void ima_kexec_cmdline(int kernel_fd, const void *buf, int size);
+extern void ima_measure_critical_data(const char *event_name,
+ const void *buf, int buf_len,
+ bool measure_buf_hash);
 
 #ifdef CONFIG_IMA_APPRAISE_BOOTPARAM
 extern void ima_appraise_parse_cmdline(void);
@@ -122,6 +125,9 @@ static inline int ima_file_hash(struct file *file, char 
*buf, size_t buf_size)
 }
 
 static inline void ima_kexec_cmdline(int kernel_fd, const void *buf, int size) 
{}
+static inline void ima_measure_critical_data(const char *event_name,
+const void *buf, int buf_len,
+bool measure_buf_hash) {}
 #endif /* CONFIG_IMA */
 
 #ifndef CONFIG_IMA_KEXEC
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index fa3044a7539f..7d9deda6a8b3 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -201,6 +201,7 @@ static inline unsigned int ima_hash_key(u8 *digest)
hook(POLICY_CHECK, policy)  \
hook(KEXEC_CMDLINE, kexec_cmdline)  \
hook(KEY_CHECK, key)\
+   hook(CRITICAL_DATA, critical_data)  \
hook(MAX_CHECK, none)
 
 #define __ima_hook_enumify(ENUM, str)  ENUM,
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index af218babd198..9917e1730cb6 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -176,7 +176,7 @@ void ima_add_violation(struct file *file, const unsigned 
char *filename,
  * subj=, obj=, type=, func=, mask=, fsmagic=
  * subj,obj, and type: are LSM specific.
  * func: FILE_CHECK | BPRM_CHECK | CREDS_CHECK | MMAP_CHECK | MODULE_CHECK
- * | KEXEC_CMDLINE | KEY_CHECK
+ * | KEXEC_CMDLINE | KEY_CHECK | CRITICAL_DATA
  * mask: contains the permission mask
  * fsmagic: hex value
  *
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 03aad13e9e70..ae59f4a4dd70 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -922,6 +922,42 @@ void ima_kexec_cmdline(int kernel_fd, const void *buf, int 
size)
fdput(f);
 }
 
+/**
+ * ima_measure_critical_data - measure kernel integrity critical data
+ * @event_name: event name to be used for the buffer entry
+ * @buf: pointer to buffer containing data to measure
+ * @buf_len: length of buffer(in bytes)
+ * @measure_buf_hash: measure buffer hash
+ *
+ * Measure the kernel subsystem data, critical to the integrity of the kernel,
+ * into the IMA log and extend the @pcr.
+ *
+ * Use @event_name to describe the state/buffer data change.
+ * Examples of critical data (buf) could be kernel in-memory r/o structures,
+ * hash of the memory structures, or data that represents

[PATCH v7 0/8] IMA: support for measuring kernel integrity critical data

2020-12-09 Thread Tushar Sugandhi
IMA measures files and buffer data such as keys, command-line arguments
passed to the kernel on kexec system call, etc. While these measurements
are necessary for monitoring and validating the integrity of the system,
they are not sufficient. Various data structures, policies, and states
stored in kernel memory also impact the integrity of the system.
Several kernel subsystems contain such integrity critical data -
e.g. LSMs like SELinux, AppArmor etc. or device-mapper targets like
dm-crypt, dm-verity, dm-integrity etc. These kernel subsystems help
protect the integrity of a device. Their integrity critical data is not
expected to change frequently during run-time. Some of these structures
cannot be defined as __ro_after_init, because they are initialized later.

For a given device, various external services/infrastructure tools
(including the attestation service) interact with it - both during the
setup and during rest of the device run-time. They share sensitive data
and/or execute critical workload on that device. The external services
may want to verify the current run-time state of the relevant kernel
subsystems before fully trusting the device with business critical
data/workload. For instance, verifying that SELinux is in "enforce" mode
along with the expected policy, disks are encrypted with a certain
configuration, secure boot is enabled etc.

This series provides the necessary IMA functionality for kernel
subsystems to ensure their configuration can be measured:
  - by kernel subsystems themselves,
  - in a tamper resistant way,
  - and re-measured - triggered on state/configuration change.

This patch set:
  - defines a new IMA hook ima_measure_critical_data() to measure
integrity critical data,
  - limits the critical data being measured based on a label,
  - defines a builtin critical data measurement policy,
  - and includes an SELinux consumer of the new IMA critical data hook.

This series is based on the following repo/branch:

 repo: https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
 branch: next-integrity
 commit 207cdd565dfc ("ima: Don't modify file descriptor mode on the fly")

Change Log v7:
Incorporated feedback from Mimi on v6 of this series.
 - Updated cover letter and patch descriptions as per Mimi's feedback.
 - Changed references to variable names and policy documentation from
   plural "data_sources" to singular "data_source".
 - Updated SELinux patch to measure only policy, instead of policy and
   state. The state measurement will be upstreamed through a separate
   patch.
 - Updated admin-guide/kernel-parameters.txt to document support for
   critical_data in builtin policy.

Change Log v6:
Incorporated feedback from Mimi on v5 of this series.
 - Got rid of patch 5 from the v5 of the series.(the allow list for data
   sources)
 - Updated function descriptions, changed variable names etc.
 - Moved the input param event_data_source in ima_measure_critical_data()
   to a new patch. (patch 6/8 of this series)
 - Split patch 4 from v5 of the series into two patches (patch 4/8 and 
   patch 5/8)
 - Updated cover letter and patch descriptions as per feedback.

Change Log v5:
(1) Incorporated feedback from Stephen on the last SeLinux patch.
 SeLinux Patch: https://patchwork.kernel.org/patch/11801585/
 - Freed memory in the reverse order of allocation in 
   selinux_measure_state().
 - Used scnprintf() instead of snprintf() to create the string for
   selinux state.
 - Allocated event name passed to ima_measure_critical_data() before
   gathering selinux state and policy information for measuring.

(2) Incorporated feedback from Mimi on v4 of this series.
 V4 of this Series: 
https://patchwork.kernel.org/project/linux-integrity/list/?series=354437

 - Removed patch "[v4,2/6] IMA: conditionally allow empty rule data"
 - Reversed the order of following patches.
  [v4,4/6] IMA: add policy to measure critical data from kernel components
  [v4,5/6] IMA: add hook to measure critical data from kernel components
   and renamed them to remove "from kernel components"
 - Added a new patch to this series - 
   IMA: add critical_data to built-in policy rules

 - Added the next version of SeLinux patch (mentioned above) to this
   series 
   selinux: measure state and hash of the policy using IMA

 - Updated cover-letter description to give broader perspective of the
   feature, rearranging paragraphs, removing unnecessary info, clarifying
   terms etc.
 - Got rid of opt_list param from ima_match_rule_data().
 - Updated the documentation to remove sources that don't yet exist.
 - detailed IMA hook description added to ima_measure_critical_data(),
   as well as elaborating terms event_name, event_data_source. 
 - "data_sources:=" is not a mandatory policy option for 
   func=CRITICAL_DATA anymore. If not present, all the data sources
   specified in __ima_supported_kernel_data_sources will be measured.


Lakshmi Ramasubramanian (2):
  IMA: define a b

Re: [PATCH] spi: Limit the spi device max speed to controller's max speed

2020-12-09 Thread Serge Semin
Hello Tudor,

On Wed, Dec 09, 2020 at 07:35:14PM +0200, Tudor Ambarus wrote:
> Make sure the max_speed_hz of spi_device does not override
> the max_speed_hz of controller.

I have doubts that's right thing to do. It seems better to let
the controller driver to handle the speed clamping itself, while
to leave the SPI client device max_speed_hz field describing the
device speed capability. Moreover the SPI-transfers passed to the
controller will have a SPI-bus speed fixed in accordance with the
controller and client device capabilities anyway.
See the __spi_validate() method for details:
https://elixir.bootlin.com/linux/v5.10-rc7/source/drivers/spi/spi.c#L3570

-Sergey

> 
> Signed-off-by: Tudor Ambarus 
> ---
>  drivers/spi/spi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index cd3c395b4e90..51d7c004fbab 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -3378,7 +3378,8 @@ int spi_setup(struct spi_device *spi)
>   if (status)
>   return status;
>  
> - if (!spi->max_speed_hz)
> + if (!spi->max_speed_hz ||
> + spi->max_speed_hz > spi->controller->max_speed_hz)
>   spi->max_speed_hz = spi->controller->max_speed_hz;
>  
>   mutex_lock(&spi->controller->io_mutex);
> -- 
> 2.25.1
> 


[PATCH v7 4/8] IMA: add policy rule to measure critical data

2020-12-09 Thread Tushar Sugandhi
A new IMA policy rule is needed for the IMA hook
ima_measure_critical_data() and the corresponding func CRITICAL_DATA for
measuring the input buffer. The policy rule should ensure the buffer
would get measured only when the policy rule allows the action. The
policy rule should also support the necessary constraints (flags etc.)
for integrity critical buffer data measurements.

Add a policy rule to define the constraints for restricting integrity
critical data measurements.

Signed-off-by: Tushar Sugandhi 
---
 security/integrity/ima/ima_policy.c | 35 +
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/security/integrity/ima/ima_policy.c 
b/security/integrity/ima/ima_policy.c
index 2a0c0603626e..9a8ee80a3128 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -34,6 +34,7 @@
 #define IMA_PCR0x0100
 #define IMA_FSNAME 0x0200
 #define IMA_KEYRINGS   0x0400
+#define IMA_DATA_SOURCE0x0800
 
 #define UNKNOWN0
 #define MEASURE0x0001  /* same as IMA_MEASURE */
@@ -85,6 +86,7 @@ struct ima_rule_entry {
} lsm[MAX_LSM_RULES];
char *fsname;
struct ima_rule_opt_list *keyrings; /* Measure keys added to these 
keyrings */
+   struct ima_rule_opt_list *data_source; /* Measure data from this source 
*/
struct ima_template_desc *template;
 };
 
@@ -479,6 +481,12 @@ static bool ima_match_rule_data(struct ima_rule_entry 
*rule,
else
opt_list = rule->keyrings;
break;
+   case CRITICAL_DATA:
+   if (!rule->data_source)
+   return true;
+   else
+   opt_list = rule->data_source;
+   break;
default:
break;
}
@@ -518,13 +526,19 @@ static bool ima_match_rules(struct ima_rule_entry *rule, 
struct inode *inode,
 {
int i;
 
-   if (func == KEY_CHECK) {
-   return (rule->flags & IMA_FUNC) && (rule->func == func) &&
-   ima_match_rule_data(rule, func_data, cred);
-   }
if ((rule->flags & IMA_FUNC) &&
(rule->func != func && func != POST_SETATTR))
return false;
+
+   switch (func) {
+   case KEY_CHECK:
+   case CRITICAL_DATA:
+   return ((rule->func == func) &&
+   ima_match_rule_data(rule, func_data, cred));
+   default:
+   break;
+   }
+
if ((rule->flags & IMA_MASK) &&
(rule->mask != mask && func != POST_SETATTR))
return false;
@@ -1119,6 +1133,19 @@ static bool ima_validate_rule(struct ima_rule_entry 
*entry)
if (ima_rule_contains_lsm_cond(entry))
return false;
 
+   break;
+   case CRITICAL_DATA:
+   if (entry->action & ~(MEASURE | DONT_MEASURE))
+   return false;
+
+   if (!(entry->flags & IMA_DATA_SOURCE) ||
+   (entry->flags & ~(IMA_FUNC | IMA_UID | IMA_PCR |
+   IMA_DATA_SOURCE)))
+   return false;
+
+   if (ima_rule_contains_lsm_cond(entry))
+   return false;
+
break;
default:
return false;
-- 
2.17.1



[PATCH v7 1/8] IMA: generalize keyring specific measurement constructs

2020-12-09 Thread Tushar Sugandhi
IMA functions such as ima_match_keyring(), process_buffer_measurement(),
ima_match_policy() etc. handle data specific to keyrings. Currently,
these constructs are not generic to handle any func specific data.
This makes it harder to extend them without code duplication.

Refactor the keyring specific measurement constructs to be generic and
reusable in other measurement scenarios.

Signed-off-by: Tushar Sugandhi 
---
 security/integrity/ima/ima.h|  6 ++--
 security/integrity/ima/ima_api.c|  6 ++--
 security/integrity/ima/ima_main.c   |  6 ++--
 security/integrity/ima/ima_policy.c | 49 ++---
 4 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 8e8b1e3cb847..e5622ce8cbb1 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -256,7 +256,7 @@ static inline void ima_process_queued_keys(void) {}
 int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
   int mask, enum ima_hooks func, int *pcr,
   struct ima_template_desc **template_desc,
-  const char *keyring);
+  const char *func_data);
 int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func);
 int ima_collect_measurement(struct integrity_iint_cache *iint,
struct file *file, void *buf, loff_t size,
@@ -268,7 +268,7 @@ void ima_store_measurement(struct integrity_iint_cache 
*iint, struct file *file,
   struct ima_template_desc *template_desc);
 void process_buffer_measurement(struct inode *inode, const void *buf, int size,
const char *eventname, enum ima_hooks func,
-   int pcr, const char *keyring);
+   int pcr, const char *func_data);
 void ima_audit_measurement(struct integrity_iint_cache *iint,
   const unsigned char *filename);
 int ima_alloc_init_template(struct ima_event_data *event_data,
@@ -284,7 +284,7 @@ const char *ima_d_path(const struct path *path, char 
**pathbuf, char *filename);
 int ima_match_policy(struct inode *inode, const struct cred *cred, u32 secid,
 enum ima_hooks func, int mask, int flags, int *pcr,
 struct ima_template_desc **template_desc,
-const char *keyring);
+const char *func_data);
 void ima_init_policy(void);
 void ima_update_policy(void);
 void ima_update_policy_flag(void);
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 4f39fb93f278..af218babd198 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -170,7 +170,7 @@ void ima_add_violation(struct file *file, const unsigned 
char *filename,
  * @func: caller identifier
  * @pcr: pointer filled in if matched measure policy sets pcr=
  * @template_desc: pointer filled in if matched measure policy sets template=
- * @keyring: keyring name used to determine the action
+ * @func_data: private data specific to @func, can be NULL.
  *
  * The policy is defined in terms of keypairs:
  * subj=, obj=, type=, func=, mask=, fsmagic=
@@ -186,14 +186,14 @@ void ima_add_violation(struct file *file, const unsigned 
char *filename,
 int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
   int mask, enum ima_hooks func, int *pcr,
   struct ima_template_desc **template_desc,
-  const char *keyring)
+  const char *func_data)
 {
int flags = IMA_MEASURE | IMA_AUDIT | IMA_APPRAISE | IMA_HASH;
 
flags &= ima_policy_flag;
 
return ima_match_policy(inode, cred, secid, func, mask, flags, pcr,
-   template_desc, keyring);
+   template_desc, func_data);
 }
 
 /*
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 68956e884403..e76ef4bfd0f4 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -786,13 +786,13 @@ int ima_post_load_data(char *buf, loff_t size,
  * @eventname: event name to be used for the buffer entry.
  * @func: IMA hook
  * @pcr: pcr to extend the measurement
- * @keyring: keyring name to determine the action to be performed
+ * @func_data: private data specific to @func, can be NULL.
  *
  * Based on policy, the buffer is measured into the ima log.
  */
 void process_buffer_measurement(struct inode *inode, const void *buf, int size,
const char *eventname, enum ima_hooks func,
-   int pcr, const char *keyring)
+   int pcr, const char *func_data)
 {
int ret = 0;
const char *audit_cause = "ENOMEM";
@@ -831,7 +831,7 @@ void process_buffer_measurement(struct inode *inode, const 
void *buf, int si

Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory

2020-12-09 Thread Paul E. McKenney
On Wed, Dec 09, 2020 at 08:36:37PM +0100, Uladzislau Rezki wrote:
> On Tue, Dec 08, 2020 at 05:13:01PM -0800, paul...@kernel.org wrote:
> > From: "Paul E. McKenney" 
> > 
> > This commit adds vmalloc() support to mem_dump_obj().  Note that the
> > vmalloc_dump_obj() function combines the checking and dumping, in
> > contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> > The reason for the difference is that the checking in the vmalloc()
> > case involves acquiring a global lock, and redundant acquisitions of
> > global locks should be avoided, even on not-so-fast paths.
> > 
> > Note that this change causes on-stack variables to be reported as
> > vmalloc() storage from kernel_clone() or similar, depending on the degree
> > of inlining that your compiler does.  This is likely more helpful than
> > the earlier "non-paged (local) memory".
> > 
> > Cc: Andrew Morton 
> > Cc: Joonsoo Kim 
> > Cc: 
> > Reported-by: Andrii Nakryiko 
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  include/linux/vmalloc.h |  6 ++
> >  mm/util.c   | 12 +++-
> >  mm/vmalloc.c| 12 
> >  3 files changed, 25 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 938eaf9..c89c2be 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -248,4 +248,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  int register_vmap_purge_notifier(struct notifier_block *nb);
> >  int unregister_vmap_purge_notifier(struct notifier_block *nb);
> >  
> > +#ifdef CONFIG_MMU
> > +bool vmalloc_dump_obj(void *object);
> > +#else
> > +static inline bool vmalloc_dump_obj(void *object) { return false; }
> > +#endif
> > +
> >  #endif /* _LINUX_VMALLOC_H */
> > diff --git a/mm/util.c b/mm/util.c
> > index 8c2449f..ee99a0a 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -984,6 +984,12 @@ int __weak memcmp_pages(struct page *page1, struct 
> > page *page2)
> >   */
> >  void mem_dump_obj(void *object)
> >  {
> > +   if (kmem_valid_obj(object)) {
> > +   kmem_dump_obj(object);
> > +   return;
> > +   }
> > +   if (vmalloc_dump_obj(object))
> > +   return;
> > if (!virt_addr_valid(object)) {
> > if (object == NULL)
> > pr_cont(" NULL pointer.\n");
> > @@ -993,10 +999,6 @@ void mem_dump_obj(void *object)
> > pr_cont(" non-paged (local) memory.\n");
> > return;
> > }
> > -   if (kmem_valid_obj(object)) {
> > -   kmem_dump_obj(object);
> > -   return;
> > -   }
> > -   pr_cont(" non-slab memory.\n");
> > +   pr_cont(" non-slab/vmalloc memory.\n");
> >  }
> >  EXPORT_SYMBOL_GPL(mem_dump_obj);
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 6ae491a..7421719 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int 
> > nr_vms)
> >  }
> >  #endif /* CONFIG_SMP */
> >  
> > +bool vmalloc_dump_obj(void *object)
> > +{
> > +   struct vm_struct *vm;
> > +   void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> >
> Paul, vmalloced addresses are already aligned to PAGE_SIZE, so that one
> is odd.

They are, but this is to handle things like this:

struct foo {
int a;
struct rcu_head rh;
};

void silly(struct foo *fp)
{
call_rcu(&fp->rh, my_rcu_cb);
call_rcu(&fp->rh, my_other_rcu_cb);
}

In kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, this would
result in a call to mem_dump_obj() and then to vmalloc_dump_obj()
with a non-page-aligned pointer.

Thanx, Paul


[PATCH v7 6/8] IMA: extend critical data hook to limit the measurement based on a label

2020-12-09 Thread Tushar Sugandhi
The IMA hook ima_measure_critical_data() does not support a way to
specify the source of the critical data provider. Thus, the data
measurement cannot be constrained based on the data source label
in the IMA policy.

Extend the IMA hook ima_measure_critical_data() to support passing 
the data source label as an input parameter, so that the policy rule can
be used to limit the measurements based on the label.

Signed-off-by: Tushar Sugandhi 
---
 include/linux/ima.h   |  6 --
 security/integrity/ima/ima_main.c | 11 ---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index 675f54db6264..6434287a81cd 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -30,7 +30,8 @@ extern int ima_post_read_file(struct file *file, void *buf, 
loff_t size,
 extern void ima_post_path_mknod(struct dentry *dentry);
 extern int ima_file_hash(struct file *file, char *buf, size_t buf_size);
 extern void ima_kexec_cmdline(int kernel_fd, const void *buf, int size);
-extern void ima_measure_critical_data(const char *event_name,
+extern void ima_measure_critical_data(const char *event_data_source,
+ const char *event_name,
  const void *buf, int buf_len,
  bool measure_buf_hash);
 
@@ -125,7 +126,8 @@ static inline int ima_file_hash(struct file *file, char 
*buf, size_t buf_size)
 }
 
 static inline void ima_kexec_cmdline(int kernel_fd, const void *buf, int size) 
{}
-static inline void ima_measure_critical_data(const char *event_name,
+static inline void ima_measure_critical_data(const char *event_data_source,
+const char *event_name,
 const void *buf, int buf_len,
 bool measure_buf_hash) {}
 #endif /* CONFIG_IMA */
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index ae59f4a4dd70..7c633901f441 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -924,6 +924,7 @@ void ima_kexec_cmdline(int kernel_fd, const void *buf, int 
size)
 
 /**
  * ima_measure_critical_data - measure kernel integrity critical data
+ * @event_data_source: kernel data source being measured
  * @event_name: event name to be used for the buffer entry
  * @buf: pointer to buffer containing data to measure
  * @buf_len: length of buffer(in bytes)
@@ -932,6 +933,9 @@ void ima_kexec_cmdline(int kernel_fd, const void *buf, int 
size)
  * Measure the kernel subsystem data, critical to the integrity of the kernel,
  * into the IMA log and extend the @pcr.
  *
+ * Use @event_data_source to describe the kernel data source for the buffer
+ * being measured.
+ *
  * Use @event_name to describe the state/buffer data change.
  * Examples of critical data (buf) could be kernel in-memory r/o structures,
  * hash of the memory structures, or data that represents subsystem state
@@ -944,17 +948,18 @@ void ima_kexec_cmdline(int kernel_fd, const void *buf, 
int size)
  *
  * The data (buf) can only be measured, not appraised.
  */
-void ima_measure_critical_data(const char *event_name,
+void ima_measure_critical_data(const char *event_data_source,
+  const char *event_name,
   const void *buf, int buf_len,
   bool measure_buf_hash)
 {
-   if (!event_name || !buf || !buf_len) {
+   if (!event_name || !event_data_source || !buf || !buf_len) {
pr_err("Invalid arguments passed to %s().\n", __func__);
return;
}
 
process_buffer_measurement(NULL, buf, buf_len, event_name,
-  CRITICAL_DATA, 0, NULL,
+  CRITICAL_DATA, 0, event_data_source,
   measure_buf_hash);
 }
 
-- 
2.17.1



Re: [PATCH v8] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-12-09 Thread Borislav Petkov
On Wed, Dec 09, 2020 at 07:34:16PM +, Ashish Kalra wrote:
> This should work, but i am concerned about making IO_TLB_DEFAULT_SIZE
> (which is pretty much private to generic swiotlb code) to be visible
> externally, i don't know if there are any concerns with that ?

Meh, it's just a define and it is not a secret that swiotlb size by
default is 64M.

Btw, pls trim your reply by removing quoted text you're not responding
to.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH v7 5/8] IMA: limit critical data measurement based on a label

2020-12-09 Thread Tushar Sugandhi
System administrators should be able to limit which kernel subsystems
they want to measure the critical data for. To enable that, an IMA policy
condition to choose specific kernel subsystems is needed. This policy
condition would constrain the measurement of the critical data based on
a label for the given subsystems.

Add a new IMA policy condition - "data_source:=" to the IMA func
CRITICAL_DATA to allow measurement of various kernel subsystems. This
policy condition would enable the system administrators to restrict the
measurement to the labels listed in "data_source:=".

Limit the measurement to the labels that are specified in the IMA
policy - CRITICAL_DATA+"data_source:=". If "data_sources:=" is not
provided with the func CRITICAL_DATA, the data from all the
supported kernel subsystems is measured.

Signed-off-by: Tushar Sugandhi 
---
 Documentation/ABI/testing/ima_policy |  2 ++
 security/integrity/ima/ima_policy.c  | 26 +-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index 6ec7daa87cba..0f4ee9e0a455 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -52,6 +52,8 @@ Description:
template:= name of a defined IMA template type
(eg, ima-ng). Only valid when action is "measure".
pcr:= decimal value
+   data_source:= [label]
+   label:= a unique string used for grouping and limiting 
critical data.
 
  default policy:
# PROC_SUPER_MAGIC
diff --git a/security/integrity/ima/ima_policy.c 
b/security/integrity/ima/ima_policy.c
index 9a8ee80a3128..7486d09a3f60 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -934,7 +934,7 @@ enum {
Opt_uid_lt, Opt_euid_lt, Opt_fowner_lt,
Opt_appraise_type, Opt_appraise_flag,
Opt_permit_directio, Opt_pcr, Opt_template, Opt_keyrings,
-   Opt_err
+   Opt_data_source, Opt_err
 };
 
 static const match_table_t policy_tokens = {
@@ -971,6 +971,7 @@ static const match_table_t policy_tokens = {
{Opt_pcr, "pcr=%s"},
{Opt_template, "template=%s"},
{Opt_keyrings, "keyrings=%s"},
+   {Opt_data_source, "data_source=%s"},
{Opt_err, NULL}
 };
 
@@ -1350,6 +1351,23 @@ static int ima_parse_rule(char *rule, struct 
ima_rule_entry *entry)
 
entry->flags |= IMA_KEYRINGS;
break;
+   case Opt_data_source:
+   ima_log_string(ab, "data_source", args[0].from);
+
+   if (entry->data_source) {
+   result = -EINVAL;
+   break;
+   }
+
+   entry->data_source = ima_alloc_rule_opt_list(args);
+   if (IS_ERR(entry->data_source)) {
+   result = PTR_ERR(entry->data_source);
+   entry->data_source = NULL;
+   break;
+   }
+
+   entry->flags |= IMA_DATA_SOURCE;
+   break;
case Opt_fsuuid:
ima_log_string(ab, "fsuuid", args[0].from);
 
@@ -1730,6 +1748,12 @@ int ima_policy_show(struct seq_file *m, void *v)
seq_puts(m, " ");
}
 
+   if (entry->flags & IMA_DATA_SOURCE) {
+   seq_puts(m, "data_source=");
+   ima_show_rule_opt_list(m, entry->data_source);
+   seq_puts(m, " ");
+   }
+
if (entry->flags & IMA_PCR) {
snprintf(tbuf, sizeof(tbuf), "%d", entry->pcr);
seq_printf(m, pt(Opt_pcr), tbuf);
-- 
2.17.1



[PATCH] mmc: owl-mmc: Fix a resource leak in an error handling path and in the remove function

2020-12-09 Thread Christophe JAILLET
'dma_request_chan()' calls should be balanced by a corresponding
'dma_release_channel()' call.

Add the missing call both in the error handling path of the probe function
and in the remove function.

Fixes: ff65ffe46d28 ("mmc: Add Actions Semi Owl SoCs SD/MMC driver")
Signed-off-by: Christophe JAILLET 
---
 drivers/mmc/host/owl-mmc.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/host/owl-mmc.c b/drivers/mmc/host/owl-mmc.c
index 53b81582f1af..5490962dc8e5 100644
--- a/drivers/mmc/host/owl-mmc.c
+++ b/drivers/mmc/host/owl-mmc.c
@@ -640,7 +640,7 @@ static int owl_mmc_probe(struct platform_device *pdev)
owl_host->irq = platform_get_irq(pdev, 0);
if (owl_host->irq < 0) {
ret = -EINVAL;
-   goto err_free_host;
+   goto err_release_channel;
}
 
ret = devm_request_irq(&pdev->dev, owl_host->irq, owl_irq_handler,
@@ -648,19 +648,21 @@ static int owl_mmc_probe(struct platform_device *pdev)
if (ret) {
dev_err(&pdev->dev, "Failed to request irq %d\n",
owl_host->irq);
-   goto err_free_host;
+   goto err_release_channel;
}
 
ret = mmc_add_host(mmc);
if (ret) {
dev_err(&pdev->dev, "Failed to add host\n");
-   goto err_free_host;
+   goto err_release_channel;
}
 
dev_dbg(&pdev->dev, "Owl MMC Controller Initialized\n");
 
return 0;
 
+err_release_channel:
+   dma_release_channel(owl_host->dma);
 err_free_host:
mmc_free_host(mmc);
 
@@ -674,6 +676,7 @@ static int owl_mmc_remove(struct platform_device *pdev)
 
mmc_remove_host(mmc);
disable_irq(owl_host->irq);
+   dma_release_channel(owl_host->dma);
mmc_free_host(mmc);
 
return 0;
-- 
2.27.0



[PATCH v7 7/8] IMA: define a builtin critical data measurement policy

2020-12-09 Thread Tushar Sugandhi
From: Lakshmi Ramasubramanian 

Define a new critical data builtin policy to allow measuring
early kernel integrity critical data before a custom IMA policy
is loaded.

Add critical data to built-in IMA rules if the kernel command line
contains "ima_policy=critical_data".

Update the documentation on kernel parameters to document
the new critical data builtin policy.

Signed-off-by: Lakshmi Ramasubramanian 
---
 Documentation/admin-guide/kernel-parameters.txt |  5 -
 security/integrity/ima/ima_policy.c | 12 
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..6034d75c3ca0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1746,7 +1746,7 @@
ima_policy= [IMA]
The builtin policies to load during IMA setup.
Format: "tcb | appraise_tcb | secure_boot |
-fail_securely"
+fail_securely | critical_data"
 
The "tcb" policy measures all programs exec'd, files
mmap'd for exec, and all files opened with the read
@@ -1765,6 +1765,9 @@
filesystems with the SB_I_UNVERIFIABLE_SIGNATURE
flag.
 
+   The "critical_data" policy measures kernel integrity
+   critical data.
+
ima_tcb [IMA] Deprecated.  Use ima_policy= instead.
Load a policy which meets the needs of the Trusted
Computing Base.  This means IMA will measure all
diff --git a/security/integrity/ima/ima_policy.c 
b/security/integrity/ima/ima_policy.c
index 7486d09a3f60..37ca16a9e65f 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -206,6 +206,10 @@ static struct ima_rule_entry secure_boot_rules[] 
__ro_after_init = {
 .flags = IMA_FUNC | IMA_DIGSIG_REQUIRED},
 };
 
+static struct ima_rule_entry critical_data_rules[] __ro_after_init = {
+   {.action = MEASURE, .func = CRITICAL_DATA, .flags = IMA_FUNC},
+};
+
 /* An array of architecture specific rules */
 static struct ima_rule_entry *arch_policy_entry __ro_after_init;
 
@@ -228,6 +232,7 @@ __setup("ima_tcb", default_measure_policy_setup);
 
 static bool ima_use_appraise_tcb __initdata;
 static bool ima_use_secure_boot __initdata;
+static bool ima_use_critical_data __initdata;
 static bool ima_fail_unverifiable_sigs __ro_after_init;
 static int __init policy_setup(char *str)
 {
@@ -242,6 +247,8 @@ static int __init policy_setup(char *str)
ima_use_appraise_tcb = true;
else if (strcmp(p, "secure_boot") == 0)
ima_use_secure_boot = true;
+   else if (strcmp(p, "critical_data") == 0)
+   ima_use_critical_data = true;
else if (strcmp(p, "fail_securely") == 0)
ima_fail_unverifiable_sigs = true;
else
@@ -875,6 +882,11 @@ void __init ima_init_policy(void)
  ARRAY_SIZE(default_appraise_rules),
  IMA_DEFAULT_POLICY);
 
+   if (ima_use_critical_data)
+   add_rules(critical_data_rules,
+ ARRAY_SIZE(critical_data_rules),
+ IMA_DEFAULT_POLICY);
+
ima_update_policy_flag();
 }
 
-- 
2.17.1



[PATCH v7 8/8] selinux: include a consumer of the new IMA critical data hook

2020-12-09 Thread Tushar Sugandhi
From: Lakshmi Ramasubramanian 

IMA measures files and buffer data such as keys, command line arguments
passed to the kernel on kexec system call, etc. While these measurements
enable monitoring and validating the integrity of the system, it is not
sufficient. Various data structures, policies and states stored in kernel
memory also impact the integrity of the system. Updates to these data
structures would have an impact on the security functionalities.
For example, SELinux stores the active policy in memory. Changes to this
data at runtime would have an impact on the security guarantees provided
by SELinux. Measuring such in-memory data structures through IMA
subsystem provides a secure way for a remote attestation service to
know the state of the system and also the runtime changes in the state
of the system.

SELinux policy is a critical data for this security module that needs
to be measured. This measurement can be used by an attestation service,
for instance, to verify if the policy has been setup correctly and that
it hasn't been tampered at run-time.

Measure the hash of the loaded policy by calling the IMA hook
ima_measure_critical_data(). Since the size of the loaded policy can
be large (several MB), measure the hash of the policy instead of
the entire policy to avoid bloating the IMA log entry.

Add "selinux" to the list of supported data sources maintained by IMA
to enable measuring SELinux data.

To enable SELinux data measurement, the following steps are required:

1, Add "ima_policy=critical_data" to the kernel command line arguments
   to enable measuring SELinux data at boot time.
For example,
  BOOT_IMAGE=/boot/vmlinuz-5.10.0-rc1+ 
root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux 
ima_policy=critical_data

2, Add the following rule to /etc/ima/ima-policy
   measure func=CRITICAL_DATA data_source=selinux

Sample measurement of the hash of SELinux policy:

To verify the measured data with the current SELinux policy run
the following commands and verify the output hash values match.

  sha256sum /sys/fs/selinux/policy | cut -d' ' -f 1

  grep "selinux-policy-hash" 
/sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut 
-d' ' -f 6

Note that the actual verification of SELinux policy would require loading
the expected policy into an identical kernel on a pristine/known-safe
system and run the sha256sum /sys/kernel/selinux/policy there to get
the expected hash.

Signed-off-by: Lakshmi Ramasubramanian 
Suggested-by: Stephen Smalley 
---
 Documentation/ABI/testing/ima_policy |  3 +-
 security/selinux/Makefile|  2 +
 security/selinux/include/security.h  | 11 +++-
 security/selinux/measure.c   | 86 
 security/selinux/ss/services.c   | 71 ---
 5 files changed, 162 insertions(+), 11 deletions(-)
 create mode 100644 security/selinux/measure.c

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index 0f4ee9e0a455..7c7023f7986b 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -52,8 +52,9 @@ Description:
template:= name of a defined IMA template type
(eg, ima-ng). Only valid when action is "measure".
pcr:= decimal value
-   data_source:= [label]
+   data_source:= [selinux]|[label]
label:= a unique string used for grouping and limiting 
critical data.
+   For example, "selinux" to measure critical data for 
SELinux.
 
  default policy:
# PROC_SUPER_MAGIC
diff --git a/security/selinux/Makefile b/security/selinux/Makefile
index 4d8e0e8adf0b..83d512116341 100644
--- a/security/selinux/Makefile
+++ b/security/selinux/Makefile
@@ -16,6 +16,8 @@ selinux-$(CONFIG_NETLABEL) += netlabel.o
 
 selinux-$(CONFIG_SECURITY_INFINIBAND) += ibpkey.o
 
+selinux-$(CONFIG_IMA) += measure.o
+
 ccflags-y := -I$(srctree)/security/selinux 
-I$(srctree)/security/selinux/include
 
 $(addprefix $(obj)/,$(selinux-y)): $(obj)/flask.h
diff --git a/security/selinux/include/security.h 
b/security/selinux/include/security.h
index 3cc8bab31ea8..18ee65c98446 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -229,7 +229,8 @@ void selinux_policy_cancel(struct selinux_state *state,
struct selinux_policy *policy);
 int security_read_policy(struct selinux_state *state,
 void **data, size_t *len);
-
+int security_read_policy_kernel(struct selinux_state *state,
+   void **data, size_t *len);
 int security_policycap_supported(struct selinux_state *state,
 unsigned int req_cap);
 
@@ -446,4 +447,12 @@ extern void ebitmap_cache_init(void);
 extern void hashtab_cache_init(void);
 extern int security_sidt

Re: [PATCH] Staging: silabs si4455 serial driver

2020-12-09 Thread József Horváth
Hello Jérôme,

On Wed, Dec 09, 2020 at 06:38:08PM +0100, Jérôme Pouiller wrote:
> On Wednesday 9 December 2020 12:09:58 CET Info wrote:
> > 
> > This is a serial port driver for
> > Silicon Labs Si4455 Sub-GHz transciver.
> 
> Hello József,
> 
> Thank you for taking care of support of Silabs products :)

I think great products :) and great support :)

> 
> 
> > Signed-off-by: József Horváth 
> 
> I think you have to use your personal address to sign-off.

I'm a self-employed, currently this is my "personal" e-mail address.

> 
> > ---
> >  .../bindings/staging/serial/silabs,si4455.txt |   39 +
> >  drivers/staging/Kconfig   |2 +
> >  drivers/staging/Makefile  |1 +
> >  drivers/staging/si4455/Kconfig|8 +
> >  drivers/staging/si4455/Makefile   |2 +
> >  drivers/staging/si4455/TODO   |3 +
> >  drivers/staging/si4455/si4455.c   | 1465 +
> >  drivers/staging/si4455/si4455_api.h   |   56 +
> >  8 files changed, 1576 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/staging/serial/silabs,si4455.txt
> >  create mode 100644 drivers/staging/si4455/Kconfig
> >  create mode 100644 drivers/staging/si4455/Makefile
> >  create mode 100644 drivers/staging/si4455/TODO
> >  create mode 100644 drivers/staging/si4455/si4455.c
> >  create mode 100644 drivers/staging/si4455/si4455_api.h
> 
> Since you add a new directory, you should also update MAINTAINERS file
> (checkpatch didn't warn you about that?).
> 
> 
> > diff --git
> > a/Documentation/devicetree/bindings/staging/serial/silabs,si4455.txt
> > b/Documentation/devicetree/bindings/staging/serial/silabs,si4455.txt
> > new file mode 100644
> > index ..abd659b7b952
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/staging/serial/silabs,si4455.txt
> > @@ -0,0 +1,39 @@
> > +* Silicon Labs Si4455 EASY-TO-USE, LOW-CURRENT OOK/(G)FSK SUB-GHZ
> > TRANSCEIVER
> 
> AFAIK, Si4455 is a programmable product. So I think that this driver only
> work if the Si4455 use a specific firmware, isn't? In this case, you
> should mention it in the documentation. 

Si4455 is programmed by silabs.
In case of C2A, it is possible to load patch.

My solution is to loading EZConfig(generated by WDS) and firmware patch with 
ioctl calls.
You can see the definitions in si4455_api.h.
A short example for EZConfig loading(patch loading will be exacly the same if 
Si4455 is rev C2A):
...
#include "si4455_api.h"
...
#include "radio.h"  //< Generated by WDS3
#include "radio_config_Si4455.h"//< Generated by WDS3
...
struct si4455_iocbuff iocbuff = { 0 };
iocbuff.length = sizeof(Radio_Configuration_Data_Array);
memcpy(iocbuff.data, Radio_Configuration_Data_Array, iocbuff.length);
ret = ioctl(portFd, SI4455_IOC_SEZC, &iocbuff);
...

After SI4455_IOC_SEZC or SI4455_IOC_SEZP ioctl calls, the driver resets the 
Si4455, and apply the configuration/patch.

> 
> 
> > +
> > +Required properties:
> > +- compatible: Should be one of the following:
> > +  - "silabs,si4455" for Silicon Labs Si4455-B1A or Si4455-C2A (driver
> > automatically detects the part info),
> > +  - "silabs,si4455b1a" for Silicon Labs Si4455-B1A,
> > +  - "silabs,si4455c2a" for Silicon Labs Si4455-C2A,
> > +- reg: SPI chip select number.
> > +- interrupts: Specifies the interrupt source of the parent interrupt
> > +  controller. The format of the interrupt specifier depends on the
> > +  parent interrupt controller.
> > +- clocks: phandle to the IC source clock (only external clock source
> > supported).
> > +- spi-max-frequency: maximum clock frequency on SPI port
> > +- shdn-gpios: gpio pin for SDN
> > +
> > +Example:
> > +
> > +/ {
> > +   clocks {
> > +si4455_1_2_osc: si4455_1_2_osc {
> > +compatible = "fixed-clock";
> > +#clock-cells = <0>;
> > +clock-frequency  = <3000>;
> > +};
> > +   };
> > +};
> > +
> > +&spi0 {
> > +   si4455: si4455@0 {
> > +   compatible = "silabs,si4455";
> > +   reg = <0>;
> > +   clocks = <&si4455_1_2_osc>;
> 
> It seems that the driver does not use this clock. So, is the clock
> attribute mandatory? What is the purpose of declaring a fixed-clock
> for this device?
> 

Yes you are right, but the uart subsystem maybe. I'll check it again, and if 
does not, I'll remove these definitions.

> > +   interrupt-parent = <&gpio>;
> > +   interrupts = <7 IRQ_TYPE_LEVEL_LOW>;
> > +shdn-gpios = <&gpio 26 1>;
> > +status = "okay";
> > +spi-max-frequency = <300>;
> > +   };
> > +};
> 
> [...]
> 
> 
> > diff --git a/drivers/staging/si4455/Kconfig b/drivers/staging/si4455/Kconfig
> > new file mode 1006

[PATCH v7 2/8] IMA: add support to measure buffer data hash

2020-12-09 Thread Tushar Sugandhi
The original IMA buffer data measurement sizes were small (e.g. boot
command line), but the new buffer data measurement use cases have data 
sizes that are a lot larger.  Just as IMA measures the file data hash,
not the file data, IMA should similarly support the option for measuring 
the hash of the buffer data.

Measuring in-memory buffer-data/buffer-data-hash is different than
measuring file-data/file-data-hash. For the file, IMA stores the
measurements in both measurement log and the file's extended attribute -
which can later be used for appraisal as well. For buffer, the
measurements are only stored in the IMA log, since the buffer has no
extended attributes associated with it.

Introduce a boolean parameter measure_buf_hash to support measuring
hash of a buffer, which would be much smaller, instead of the buffer
itself.

Signed-off-by: Tushar Sugandhi 
---
 security/integrity/ima/ima.h |  3 +-
 security/integrity/ima/ima_appraise.c|  2 +-
 security/integrity/ima/ima_asymmetric_keys.c |  2 +-
 security/integrity/ima/ima_main.c| 36 +---
 security/integrity/ima/ima_queue_keys.c  |  3 +-
 5 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e5622ce8cbb1..fa3044a7539f 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -268,7 +268,8 @@ void ima_store_measurement(struct integrity_iint_cache 
*iint, struct file *file,
   struct ima_template_desc *template_desc);
 void process_buffer_measurement(struct inode *inode, const void *buf, int size,
const char *eventname, enum ima_hooks func,
-   int pcr, const char *func_data);
+   int pcr, const char *func_data,
+   bool measure_buf_hash);
 void ima_audit_measurement(struct integrity_iint_cache *iint,
   const unsigned char *filename);
 int ima_alloc_init_template(struct ima_event_data *event_data,
diff --git a/security/integrity/ima/ima_appraise.c 
b/security/integrity/ima/ima_appraise.c
index 8361941ee0a1..46ffa38bab12 100644
--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -352,7 +352,7 @@ int ima_check_blacklist(struct integrity_iint_cache *iint,
if ((rc == -EPERM) && (iint->flags & IMA_MEASURE))
process_buffer_measurement(NULL, digest, digestsize,
   "blacklisted-hash", NONE,
-  pcr, NULL);
+  pcr, NULL, false);
}
 
return rc;
diff --git a/security/integrity/ima/ima_asymmetric_keys.c 
b/security/integrity/ima/ima_asymmetric_keys.c
index 1c68c500c26f..a74095793936 100644
--- a/security/integrity/ima/ima_asymmetric_keys.c
+++ b/security/integrity/ima/ima_asymmetric_keys.c
@@ -60,5 +60,5 @@ void ima_post_key_create_or_update(struct key *keyring, 
struct key *key,
 */
process_buffer_measurement(NULL, payload, payload_len,
   keyring->description, KEY_CHECK, 0,
-  keyring->description);
+  keyring->description, false);
 }
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index e76ef4bfd0f4..03aad13e9e70 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -779,7 +779,7 @@ int ima_post_load_data(char *buf, loff_t size,
 }
 
 /*
- * process_buffer_measurement - Measure the buffer to ima log.
+ * process_buffer_measurement - Measure the buffer or the buffer data hash
  * @inode: inode associated with the object being measured (NULL for KEY_CHECK)
  * @buf: pointer to the buffer that needs to be added to the log.
  * @size: size of buffer(in bytes).
@@ -787,12 +787,23 @@ int ima_post_load_data(char *buf, loff_t size,
  * @func: IMA hook
  * @pcr: pcr to extend the measurement
  * @func_data: private data specific to @func, can be NULL.
+ * @measure_buf_hash: measure buffer hash
  *
- * Based on policy, the buffer is measured into the ima log.
+ * Measure the buffer into the IMA log, and extend the @pcr.
+ *
+ * Determine what buffers are allowed to be measured, based on the policy rules
+ * and the IMA hook passed using @func.
+ *
+ * Use @func_data, if provided, to match against the measurement policy rule
+ * data for @func.
+ *
+ * If @measure_buf_hash is set to true - measure hash of the buffer data,
+ * else measure the buffer data itself.
  */
 void process_buffer_measurement(struct inode *inode, const void *buf, int size,
const char *eventname, enum ima_hooks func,
-   int pcr, const char *func_data)
+   int pcr, const char *func_data,
+

[PATCH v26 01/12] landlock: Add object management

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

A Landlock object enables to identify a kernel object (e.g. an inode).
A Landlock rule is a set of access rights allowed on an object.  Rules
are grouped in rulesets that may be tied to a set of processes (i.e.
subjects) to enforce a scoped access-control (i.e. a domain).

Because Landlock's goal is to empower any process (especially
unprivileged ones) to sandbox themselves, we cannot rely on a
system-wide object identification such as file extended attributes.
Indeed, we need innocuous, composable and modular access-controls.

The main challenge with these constraints is to identify kernel objects
while this identification is useful (i.e. when a security policy makes
use of this object).  But this identification data should be freed once
no policy is using it.  This ephemeral tagging should not and may not be
written in the filesystem.  We then need to manage the lifetime of a
rule according to the lifetime of its objects.  To avoid a global lock,
this implementation make use of RCU and counters to safely reference
objects.

A following commit uses this generic object management for inodes.

Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
Reviewed-by: Jann Horn 
---

Changes since v24:
* Fix typo in comment (spotted by Jann Horn).
* Add Reviewed-by: Jann Horn 

Changes since v23:
* Update landlock_create_object() to return error codes instead of NULL.
  This help error handling in callers.
* When using make oldconfig with a previous configuration already
  including the CONFIG_LSM variable, no question is asked to update its
  content.  Update the Kconfig help to warn about LSM stacking
  configuration.
* Constify variable (spotted by Vincent Dagonneau).

Changes since v22:
* Fix spelling (spotted by Jann Horn).

Changes since v21:
* Update Kconfig help.
* Clean up comments.

Changes since v18:
* Account objects to kmemcg.

Changes since v14:
* Simplify the object, rule and ruleset management at the expense of a
  less aggressive memory freeing (contributed by Jann Horn, with
  additional modifications):
  - Remove object->list aggregating the rules tied to an object.
  - Remove landlock_get_object(), landlock_drop_object(),
{get,put}_object_cleaner() and landlock_rule_is_disabled().
  - Rewrite landlock_put_object() to use a more simple mechanism
(no tricky RCU).
  - Replace enum landlock_object_type and landlock_release_object() with
landlock_object_underops->release()
  - Adjust unions and Sparse annotations.
  Cf. 
https://lore.kernel.org/lkml/cag48ez21ben0wl1bbmtiiu8j9jp5iewthowz4turuj+ki0y...@mail.gmail.com/
* Merge struct landlock_rule into landlock_ruleset_elem to simplify the
  rule management.
* Constify variables.
* Improve kernel documentation.
* Cosmetic variable renames.
* Remove the "default" in the Kconfig (suggested by Jann Horn).
* Only use refcount_inc() through getter helpers.
* Update Kconfig description.

Changes since v13:
* New dedicated implementation, removing the need for eBPF.

Previous changes:
https://lore.kernel.org/lkml/20190721213116.23476-6-...@digikod.net/
---
 MAINTAINERS| 10 +
 security/Kconfig   |  1 +
 security/Makefile  |  2 +
 security/landlock/Kconfig  | 21 +
 security/landlock/Makefile |  3 ++
 security/landlock/object.c | 67 
 security/landlock/object.h | 91 ++
 7 files changed, 195 insertions(+)
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/object.c
 create mode 100644 security/landlock/object.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6f474153dbec..dc718573317e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9827,6 +9827,16 @@ F:   net/core/sock_map.c
 F: net/ipv4/tcp_bpf.c
 F: net/ipv4/udp_bpf.c
 
+LANDLOCK SECURITY MODULE
+M: Mickaël Salaün 
+L: linux-security-mod...@vger.kernel.org
+S: Supported
+W: https://landlock.io
+T: git https://github.com/landlock-lsm/linux.git
+F: security/landlock/
+K: landlock
+K: LANDLOCK
+
 LANTIQ / INTEL Ethernet drivers
 M: Hauke Mehrtens 
 L: net...@vger.kernel.org
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..15a4342b5d01 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -238,6 +238,7 @@ source "security/loadpin/Kconfig"
 source "security/yama/Kconfig"
 source "security/safesetid/Kconfig"
 source "security/lockdown/Kconfig"
+source "security/landlock/Kconfig"
 
 source "security/integrity/Kconfig"
 
diff --git a/security/Makefile b/security/Makefile
index 3baf435de541..c688f4907a1b 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -13,6 +13,7 @@ subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin
 subdir-$(CONFIG_SECURITY_SAFESETID)+= safesetid
 subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown
 subdir-$(CONFIG_BPF_LSM)   += bpf
+subdir-$(CONFIG_SECURITY_LANDL

[PATCH v26 02/12] landlock: Add ruleset and domain management

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

A Landlock ruleset is mainly a red-black tree with Landlock rules as
nodes.  This enables quick update and lookup to match a requested
access, e.g. to a file.  A ruleset is usable through a dedicated file
descriptor (cf. following commit implementing syscalls) which enables a
process to create and populate a ruleset with new rules.

A domain is a ruleset tied to a set of processes.  This group of rules
defines the security policy enforced on these processes and their future
children.  A domain can transition to a new domain which is the
intersection of all its constraints and those of a ruleset provided by
the current process.  This modification only impact the current process.
This means that a process can only gain more constraints (i.e. lose
accesses) over time.

Cc: James Morris 
Cc: Jann Horn 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
---

Changes since v25:
* Add build-time checks for the num_layers and num_rules variables
  according to LANDLOCK_MAX_NUM_LAYERS and LANDLOCK_MAX_NUM_RULES, and
  move these limits to a dedicated file.
* Cosmetic variable renames.

Changes since v24:
* Update struct landlock_rule with a layer stack.  This reverts "Always
  intersect access rights" from v24 and also adds the ability to tie
  access rights with their policy layer.  As noted by Jann Horn, always
  intersecting access rights made some use cases uselessly more
  difficult to handle in user space.  Thanks to this new stack, we still
  have a deterministic policy behavior whatever their level in the stack
  of policies, while using a "union" of accesses when building a
  ruleset.  The implementation use a FAM to keep the access checks quick
  and memory efficient (4 bytes per layer per inode).  Update
  insert_rule() accordingly.

Changes since v23:
* Always intersect access rights.  Following the filesystem change
  logic, make ruleset updates more consistent by always intersecting
  access rights (boolean AND) instead of combining them (boolean OR) for
  the same layer.  This defensive approach could also help avoid user
  space to inadvertently allow multiple access rights for the same
  object (e.g.  write and execute access on a path hierarchy) instead of
  dealing with such inconsistency.  This can happen when there is no
  deduplication of objects (e.g. paths and underlying inodes) whereas
  they get different access rights with landlock_add_rule(2).
* Add extra checks to make sure that:
  - there is always an (allocated) object in each used rules;
  - when updating a ruleset with a new rule (i.e. not merging two
rulesets), the ruleset doesn't contain multiple layers.
* Hide merge parameter from the public landlock_insert_rule() API.  This
  helps avoid misuse of this function.
* Replace a remaining hardcoded 1 with SINGLE_DEPTH_NESTING.

Changes since v22:
* Explicitely use RB_ROOT and SINGLE_DEPTH_NESTING (suggested by Jann
  Horn).
* Improve comments and fix spelling (suggested by Jann Horn).

Changes since v21:
* Add and clean up comments.

Changes since v18:
* Account rulesets to kmemcg.
* Remove struct holes.
* Cosmetic changes.

Changes since v17:
* Move include/uapi/linux/landlock.h and _LANDLOCK_ACCESS_FS_* to a
  following patch.

Changes since v16:
* Allow enforcement of empty ruleset, which enables deny-all policies.

Changes since v15:
* Replace layer_levels and layer_depth with a bitfield of layers, cf.
  filesystem commit.
* Rename the LANDLOCK_ACCESS_FS_{UNLINK,RMDIR} with
  LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} because it makes sense to use
  them for the action of renaming a file or a directory, which may lead
  to the removal of the source file or directory.  Removes the
  LANDLOCK_ACCESS_FS_{LINK_TO,RENAME_FROM,RENAME_TO} which are now
  replaced with LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} and
  LANDLOCK_ACCESS_FS_MAKE_* .
* Update the documentation accordingly and highlight how the access
  rights are taken into account.
* Change nb_rules from atomic_t to u32 because it is not use anymore by
  show_fdinfo().
* Add safeguard for level variables types.
* Check max number of rules.
* Replace struct landlock_access (self and beneath bitfields) with one
  bitfield.
* Remove useless variable.
* Add comments.

Changes since v14:
* Simplify the object, rule and ruleset management at the expense of a
  less aggressive memory freeing (contributed by Jann Horn, with
  additional modifications):
  - Make a domain immutable (remove the opportunistic cleaning).
  - Remove RCU pointers.
  - Merge struct landlock_ref and struct landlock_ruleset_elem into
landlock_rule: get ride of rule's RCU.
  - Adjust union.
  - Remove the landlock_insert_rule() check about a new object with the
same address as a previously disabled one, because it is not
possible to disable a rule anymore.
  Cf. 
https://lore.kernel.org/lkml/cag48ez21ben0wl1bbmtiiu8j9jp5iewthowz4turuj+ki0y...@mail.gmail.com/
* Fix nested domains by implementing a notion of layer level 

[PATCH v26 00/12] Landlock LSM

2020-12-09 Thread Mickaël Salaün
Hi,

This patch series adds new built-time checks, a new test, renames some
variables and functions to improve readability, and shift syscall
numbers to align with -next.

The SLOC count is 1289 for security/landlock/ and 1791 for
tools/testing/selftest/landlock/ .  Test coverage for security/landlock/
is 94.1% of lines.  The code not covered only deals with internal kernel
errors (e.g. memory allocation) and race conditions.

The compiled documentation is available here:
https://landlock.io/linux-doc/landlock-v26/userspace-api/landlock.html

This series can be applied on top of v5.10-rc7 .  This can be tested
with CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending
"landlock," to CONFIG_LSM.  This patch series can be found in a Git
repository here:
https://github.com/landlock-lsm/linux/commits/landlock-v26
I would really appreciate constructive comments on this patch series.


# Landlock LSM

The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes.  Because Landlock is a
stackable LSM [1], it makes possible to create safe security sandboxes
as new security layers in addition to the existing system-wide
access-controls. This kind of sandbox is expected to help mitigate the
security impact of bugs or unexpected/malicious behaviors in user-space
applications. Landlock empowers any process, including unprivileged
ones, to securely restrict themselves.

Landlock is inspired by seccomp-bpf but instead of filtering syscalls
and their raw arguments, a Landlock rule can restrict the use of kernel
objects like file hierarchies, according to the kernel semantic.
Landlock also takes inspiration from other OS sandbox mechanisms: XNU
Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil.

In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review.  This series
still addresses multiple use cases, especially with the combined use of
seccomp-bpf: applications with built-in sandboxing, init systems,
security sandbox tools and security-oriented APIs [2].

Previous version:
https://lore.kernel.org/lkml/20201201192322.213239-1-...@digikod.net

[1] 
https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b...@schaufler-ca.com/
[2] 
https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad046...@digikod.net/


Casey Schaufler (1):
  LSM: Infrastructure management of the superblock

Mickaël Salaün (11):
  landlock: Add object management
  landlock: Add ruleset and domain management
  landlock: Set up the security framework and manage credentials
  landlock: Add ptrace restrictions
  fs,security: Add sb_delete hook
  landlock: Support filesystem access-control
  landlock: Add syscall implementations
  arch: Wire up Landlock syscalls
  selftests/landlock: Add user space tests
  samples/landlock: Add a sandbox manager example
  landlock: Add user and kernel documentation

 Documentation/security/index.rst  |1 +
 Documentation/security/landlock.rst   |   79 +
 Documentation/userspace-api/index.rst |1 +
 Documentation/userspace-api/landlock.rst  |  280 +++
 MAINTAINERS   |   13 +
 arch/Kconfig  |7 +
 arch/alpha/kernel/syscalls/syscall.tbl|3 +
 arch/arm/tools/syscall.tbl|3 +
 arch/arm64/include/asm/unistd.h   |2 +-
 arch/arm64/include/asm/unistd32.h |6 +
 arch/ia64/kernel/syscalls/syscall.tbl |3 +
 arch/m68k/kernel/syscalls/syscall.tbl |3 +
 arch/microblaze/kernel/syscalls/syscall.tbl   |3 +
 arch/mips/kernel/syscalls/syscall_n32.tbl |3 +
 arch/mips/kernel/syscalls/syscall_n64.tbl |3 +
 arch/mips/kernel/syscalls/syscall_o32.tbl |3 +
 arch/parisc/kernel/syscalls/syscall.tbl   |3 +
 arch/powerpc/kernel/syscalls/syscall.tbl  |3 +
 arch/s390/kernel/syscalls/syscall.tbl |3 +
 arch/sh/kernel/syscalls/syscall.tbl   |3 +
 arch/sparc/kernel/syscalls/syscall.tbl|3 +
 arch/um/Kconfig   |1 +
 arch/x86/entry/syscalls/syscall_32.tbl|3 +
 arch/x86/entry/syscalls/syscall_64.tbl|3 +
 arch/xtensa/kernel/syscalls/syscall.tbl   |3 +
 fs/super.c|1 +
 include/linux/lsm_hook_defs.h |1 +
 include/linux/lsm_hooks.h |3 +
 include/linux/security.h  |4 +
 include/linux/syscalls.h  |7 +
 include/uapi/asm-generic/unistd.h |8 +-
 include/uapi/linux/landlock.h |  128 ++
 kernel/sys_ni.c   |5 +
 samples/Kconfig   |7 +
 samples/Makefile  |1 +
 samples/landlock/.gitignore   |1 +
 samples/landlock/Makefile

Re: [PATCH v5 4/5] Driver core: platform: Add devm_platform_get_irqs_affinity()

2020-12-09 Thread Marc Zyngier

On 2020-12-09 19:13, Greg KH wrote:

On Wed, Dec 09, 2020 at 07:04:02PM +, John Garry wrote:

On 09/12/2020 18:32, Greg KH wrote:
> On Wed, Dec 02, 2020 at 06:36:56PM +0800, John Garry wrote:
> > Drivers for multi-queue platform devices may also want managed interrupts
> > for handling HW queue completion interrupts, so add support.
>

Hi Greg,

> Why would a platform device want all of this?  Shouldn't such a device
> be on a "real" bus instead?

For this HW version, the device is on the system bus, directly 
addressable

by the CPU.


What do you mean by "system bus"?


Motivation is that I wanted to switch the HW completion queues to use
managed interrupts.


Fair enough, seems like overkill for a "platform" bus though :)


You should see the box, really... ;-)




> What in-kernel driver needs this complexity?  I can't take new apis
> without a real user in the tree, sorry.

It's in the final patch in the series 
https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.ga...@huawei.com/T/#m0df7e7cd6f0819b99aaeb6b7f8939ef1e17b8a83.


Ah, I missed that, I thought that was some high-speed scsi thing, not a
tiny platform driver...

I don't anticipate a huge number of users of this API in future, as 
most
multi-queue devices are PCI devices; so we could do the work of this 
API in
the driver itself, but the preference was not to export genirq 
functions
like irq_update_affinity_desc() or irq_create_affinity_masks(), and 
rather

have a common helper in the core platform code.


Ok, I'd like to have the irq maintainers/developers ack this before
taking it in the driver core, as someone is going to have to maintain
this crazy thing for forever if it gets merged.


I'm actually quite happy with this, and as it turns out, the crazy
system that has this SAS thing keeps my backside warm all year long.
As long as this machine keeps ticking, I'm happy to help with this.

So if that helps:

Acked-by: Marc Zyngier 

We need to work out the merge strategy for the whole lot though, given
that it crosses 3 subsystems over two series...

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory

2020-12-09 Thread Uladzislau Rezki
On Wed, Dec 09, 2020 at 06:51:20PM +0100, Vlastimil Babka wrote:
> On 12/9/20 2:13 AM, paul...@kernel.org wrote:
> > From: "Paul E. McKenney" 
> > 
> > This commit adds vmalloc() support to mem_dump_obj().  Note that the
> > vmalloc_dump_obj() function combines the checking and dumping, in
> > contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> > The reason for the difference is that the checking in the vmalloc()
> > case involves acquiring a global lock, and redundant acquisitions of
> > global locks should be avoided, even on not-so-fast paths.
> > 
> > Note that this change causes on-stack variables to be reported as
> > vmalloc() storage from kernel_clone() or similar, depending on the degree
> > of inlining that your compiler does.  This is likely more helpful than
> > the earlier "non-paged (local) memory".
> > 
> > Cc: Andrew Morton 
> > Cc: Joonsoo Kim 
> > Cc: 
> > Reported-by: Andrii Nakryiko 
> > Signed-off-by: Paul E. McKenney 
> 
> ...
> 
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int 
> > nr_vms)
> >  }
> >  #endif /* CONFIG_SMP */
> >  
> > +bool vmalloc_dump_obj(void *object)
> > +{
> > +   struct vm_struct *vm;
> > +   void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > +
> > +   vm = find_vm_area(objp);
> > +   if (!vm)
> > +   return false;
> > +   pr_cont(" vmalloc allocated at %pS\n", vm->caller);
> 
> Would it be useful to print the vm area boundaries too?
> 
Do you mean va_start/va_end information?

--
Vlad Rezki


[PATCH v26 06/12] fs,security: Add sb_delete hook

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

The sb_delete security hook is called when shutting down a superblock,
which may be useful to release kernel objects tied to the superblock's
lifetime (e.g. inodes).

This new hook is needed by Landlock to release (ephemerally) tagged
struct inodes.  This comes from the unprivileged nature of Landlock
described in the next commit.

Cc: Al Viro 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
Reviewed-by: Jann Horn 
---

Changes since v22:
* Add Reviewed-by: Jann Horn 

Changes since v17:
* Initial patch to replace the direct call to landlock_release_inodes()
  (requested by James Morris).
  https://lore.kernel.org/lkml/alpine.lrh.2.21.2005150536440.7...@namei.org/
---
 fs/super.c| 1 +
 include/linux/lsm_hook_defs.h | 1 +
 include/linux/lsm_hooks.h | 2 ++
 include/linux/security.h  | 4 
 security/security.c   | 5 +
 5 files changed, 13 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 98bb0629ee10..751cad8c081f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -454,6 +454,7 @@ void generic_shutdown_super(struct super_block *sb)
evict_inodes(sb);
/* only nonzero refcount inodes can have marks */
fsnotify_sb_delete(sb);
+   security_sb_delete(sb);
 
if (sb->s_dio_done_wq) {
destroy_workqueue(sb->s_dio_done_wq);
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 32a940117e7a..1ba9b4dfecb3 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -59,6 +59,7 @@ LSM_HOOK(int, 0, fs_context_dup, struct fs_context *fc,
 LSM_HOOK(int, -ENOPARAM, fs_context_parse_param, struct fs_context *fc,
 struct fs_parameter *param)
 LSM_HOOK(int, 0, sb_alloc_security, struct super_block *sb)
+LSM_HOOK(void, LSM_RET_VOID, sb_delete, struct super_block *sb)
 LSM_HOOK(void, LSM_RET_VOID, sb_free_security, struct super_block *sb)
 LSM_HOOK(void, LSM_RET_VOID, sb_free_mnt_opts, void *mnt_opts)
 LSM_HOOK(int, 0, sb_eat_lsm_opts, char *orig, void **mnt_opts)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index ff0f03a45c56..dbfcec05a176 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -108,6 +108,8 @@
  * allocated.
  * @sb contains the super_block structure to be modified.
  * Return 0 if operation was successful.
+ * @sb_delete:
+ * Release objects tied to a superblock (e.g. inodes).
  * @sb_free_security:
  * Deallocate and clear the sb->s_security field.
  * @sb contains the super_block structure to be modified.
diff --git a/include/linux/security.h b/include/linux/security.h
index bc2725491560..a4603b62d444 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -287,6 +287,7 @@ void security_bprm_committed_creds(struct linux_binprm 
*bprm);
 int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc);
 int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter 
*param);
 int security_sb_alloc(struct super_block *sb);
+void security_sb_delete(struct super_block *sb);
 void security_sb_free(struct super_block *sb);
 void security_free_mnt_opts(void **mnt_opts);
 int security_sb_eat_lsm_opts(char *options, void **mnt_opts);
@@ -619,6 +620,9 @@ static inline int security_sb_alloc(struct super_block *sb)
return 0;
 }
 
+static inline void security_sb_delete(struct super_block *sb)
+{ }
+
 static inline void security_sb_free(struct super_block *sb)
 { }
 
diff --git a/security/security.c b/security/security.c
index 4ffd6c3af9d7..4563e7a79216 100644
--- a/security/security.c
+++ b/security/security.c
@@ -899,6 +899,11 @@ int security_sb_alloc(struct super_block *sb)
return rc;
 }
 
+void security_sb_delete(struct super_block *sb)
+{
+   call_void_hook(sb_delete, sb);
+}
+
 void security_sb_free(struct super_block *sb)
 {
call_void_hook(sb_free_security, sb);
-- 
2.29.2



[PATCH v26 03/12] landlock: Set up the security framework and manage credentials

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

Process's credentials point to a Landlock domain, which is underneath
implemented with a ruleset.  In the following commits, this domain is
used to check and enforce the ptrace and filesystem security policies.
A domain is inherited from a parent to its child the same way a thread
inherits a seccomp policy.

Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
Reviewed-by: Jann Horn 
---

Changes since v25:
* Rename function to landlock_add_cred_hooks().

Changes since v23:
* Add an early check for the current domain in hook_cred_free() to avoid
  superfluous call.
* Cosmetic cleanup to make the code more readable.

Changes since v22:
* Add Reviewed-by: Jann Horn 

Changes since v21:
* Fix copyright dates.

Changes since v17:
* Constify returned domain pointers from landlock_get_current_domain()
  and landlock_get_task_domain() helpers.

Changes since v15:
* Optimize landlocked() for current thread.
* Display the greeting message when everything is initialized.

Changes since v14:
* Uses pr_fmt from common.h .
* Constify variables.
* Remove useless NULL initialization.

Changes since v13:
* totally get ride of the seccomp dependency
* only keep credential management and LSM setup.

Previous changes:
https://lore.kernel.org/lkml/20191104172146.30797-4-...@digikod.net/
---
 security/Kconfig   | 10 +++
 security/landlock/Makefile |  3 +-
 security/landlock/common.h | 20 +
 security/landlock/cred.c   | 46 ++
 security/landlock/cred.h   | 58 ++
 security/landlock/setup.c  | 31 
 security/landlock/setup.h  | 16 +++
 7 files changed, 178 insertions(+), 6 deletions(-)
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/cred.c
 create mode 100644 security/landlock/cred.h
 create mode 100644 security/landlock/setup.c
 create mode 100644 security/landlock/setup.h

diff --git a/security/Kconfig b/security/Kconfig
index 15a4342b5d01..0ced7fd33e4d 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -278,11 +278,11 @@ endchoice
 
 config LSM
string "Ordered list of enabled LSMs"
-   default 
"lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" 
if DEFAULT_SECURITY_SMACK
-   default 
"lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" 
if DEFAULT_SECURITY_APPARMOR
-   default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if 
DEFAULT_SECURITY_TOMOYO
-   default "lockdown,yama,loadpin,safesetid,integrity,bpf" if 
DEFAULT_SECURITY_DAC
-   default 
"lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
+   default 
"landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf"
 if DEFAULT_SECURITY_SMACK
+   default 
"landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf"
 if DEFAULT_SECURITY_APPARMOR
+   default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" 
if DEFAULT_SECURITY_TOMOYO
+   default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if 
DEFAULT_SECURITY_DAC
+   default 
"landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
help
  A comma-separated list of LSMs, in initialization order.
  Any LSMs left off this list will be ignored. This can be
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index d846eba445bb..041ea242e627 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := object.o ruleset.o
+landlock-y := setup.o object.o ruleset.o \
+   cred.o
diff --git a/security/landlock/common.h b/security/landlock/common.h
new file mode 100644
index ..5dc0fe15707d
--- /dev/null
+++ b/security/landlock/common.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Common constants and helpers
+ *
+ * Copyright © 2016-2020 Mickaël Salaün 
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_COMMON_H
+#define _SECURITY_LANDLOCK_COMMON_H
+
+#define LANDLOCK_NAME "landlock"
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) LANDLOCK_NAME ": " fmt
+
+#endif /* _SECURITY_LANDLOCK_COMMON_H */
diff --git a/security/landlock/cred.c b/security/landlock/cred.c
new file mode 100644
index ..6725af24c684
--- /dev/null
+++ b/security/landlock/cred.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Credential hooks
+ *
+ * Copyright © 2017-2020 Mickaël Salaün 
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include 
+#include 
+
+#include "common.h"
+#include "cred.h"
+#include "ruleset.h"
+#include "setup.h"
+
+static int hook_cred_prepare(struct cred *const new,
+   const struct cred *const old, const gfp_t gfp)
+{
+   struct

Re: [PATCH v11 0/4] Add support for mv88e6393x family of Marvell

2020-12-09 Thread Jakub Kicinski
On Wed,  9 Dec 2020 15:02:54 +1000 Pavana Sharma wrote:
> Updated patchset after incorporating feedback.

This set does not apply to net-next. Please rebase.


Re: [PATCH v5 4/5] Driver core: platform: Add devm_platform_get_irqs_affinity()

2020-12-09 Thread John Garry

On 09/12/2020 19:13, Greg KH wrote:

Hi Greg,


For this HW version, the device is on the system bus, directly addressable
by the CPU.

What do you mean by "system bus"?


Maybe my terminology is wrong, the point is that we have a platform 
device driver.





Motivation is that I wanted to switch the HW completion queues to use
managed interrupts.

Fair enough, seems like overkill for a "platform" bus though:)


What in-kernel driver needs this complexity?  I can't take new apis
without a real user in the tree, sorry.

It's in the final patch in the 
serieshttps://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.ga...@huawei.com/T/#m0df7e7cd6f0819b99aaeb6b7f8939ef1e17b8a83.

Ah, I missed that, I thought that was some high-speed scsi thing, not a
tiny platform driver...


It is actually is a high-speed SCSI thing also, SAS 3.0 :)




I don't anticipate a huge number of users of this API in future, as most
multi-queue devices are PCI devices; so we could do the work of this API in
the driver itself, but the preference was not to export genirq functions
like irq_update_affinity_desc() or irq_create_affinity_masks(), and rather
have a common helper in the core platform code.

Ok, I'd like to have the irq maintainers/developers ack this before
taking it in the driver core, as someone is going to have to maintain
this crazy thing for forever if it gets merged.



irq experts are cc'ed and have been very helpful here

So the API mushroomed a bit over time, as I realized that we need to 
support tearing down the irq mapping, make as devm method, use 
irq_calc_affinity_vectors(). Not sure how we could factor any of it out 
to become less of your problem.


Thanks,
John


[PATCH] mt76: Fixed kernel test robot warning

2020-12-09 Thread Souptick Joarder
Kernel test robot throws below warning ->

   drivers/net/wireless/mediatek/mt76/tx.c: In function
'mt76_txq_schedule':
>> drivers/net/wireless/mediatek/mt76/tx.c:499:21: warning: variable 'q'
>> set but not used [-Wunused-but-set-variable]
 499 |  struct mt76_queue *q;
 | ^

This patch will silence this warning.

Reported-by: kernel test robot 
Signed-off-by: Souptick Joarder 
---
 drivers/net/wireless/mediatek/mt76/tx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/tx.c 
b/drivers/net/wireless/mediatek/mt76/tx.c
index 1e20afb..25627e7 100644
--- a/drivers/net/wireless/mediatek/mt76/tx.c
+++ b/drivers/net/wireless/mediatek/mt76/tx.c
@@ -504,14 +504,11 @@ void mt76_tx_complete_skb(struct mt76_dev *dev, u16 
wcid_idx, struct sk_buff *sk
 
 void mt76_txq_schedule(struct mt76_phy *phy, enum mt76_txq_id qid)
 {
-   struct mt76_queue *q;
int len;
 
if (qid >= 4)
return;
 
-   q = phy->q_tx[qid];
-
rcu_read_lock();
 
do {
-- 
1.9.1



Re: [PATCH v2 sl-b 3/5] mm: Make mem_dump_obj() handle vmalloc() memory

2020-12-09 Thread Uladzislau Rezki
On Tue, Dec 08, 2020 at 05:13:01PM -0800, paul...@kernel.org wrote:
> From: "Paul E. McKenney" 
> 
> This commit adds vmalloc() support to mem_dump_obj().  Note that the
> vmalloc_dump_obj() function combines the checking and dumping, in
> contrast with the split between kmem_valid_obj() and kmem_dump_obj().
> The reason for the difference is that the checking in the vmalloc()
> case involves acquiring a global lock, and redundant acquisitions of
> global locks should be avoided, even on not-so-fast paths.
> 
> Note that this change causes on-stack variables to be reported as
> vmalloc() storage from kernel_clone() or similar, depending on the degree
> of inlining that your compiler does.  This is likely more helpful than
> the earlier "non-paged (local) memory".
> 
> Cc: Andrew Morton 
> Cc: Joonsoo Kim 
> Cc: 
> Reported-by: Andrii Nakryiko 
> Signed-off-by: Paul E. McKenney 
> ---
>  include/linux/vmalloc.h |  6 ++
>  mm/util.c   | 12 +++-
>  mm/vmalloc.c| 12 
>  3 files changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 938eaf9..c89c2be 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -248,4 +248,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  int register_vmap_purge_notifier(struct notifier_block *nb);
>  int unregister_vmap_purge_notifier(struct notifier_block *nb);
>  
> +#ifdef CONFIG_MMU
> +bool vmalloc_dump_obj(void *object);
> +#else
> +static inline bool vmalloc_dump_obj(void *object) { return false; }
> +#endif
> +
>  #endif /* _LINUX_VMALLOC_H */
> diff --git a/mm/util.c b/mm/util.c
> index 8c2449f..ee99a0a 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -984,6 +984,12 @@ int __weak memcmp_pages(struct page *page1, struct page 
> *page2)
>   */
>  void mem_dump_obj(void *object)
>  {
> + if (kmem_valid_obj(object)) {
> + kmem_dump_obj(object);
> + return;
> + }
> + if (vmalloc_dump_obj(object))
> + return;
>   if (!virt_addr_valid(object)) {
>   if (object == NULL)
>   pr_cont(" NULL pointer.\n");
> @@ -993,10 +999,6 @@ void mem_dump_obj(void *object)
>   pr_cont(" non-paged (local) memory.\n");
>   return;
>   }
> - if (kmem_valid_obj(object)) {
> - kmem_dump_obj(object);
> - return;
> - }
> - pr_cont(" non-slab memory.\n");
> + pr_cont(" non-slab/vmalloc memory.\n");
>  }
>  EXPORT_SYMBOL_GPL(mem_dump_obj);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6ae491a..7421719 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3431,6 +3431,18 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int 
> nr_vms)
>  }
>  #endif   /* CONFIG_SMP */
>  
> +bool vmalloc_dump_obj(void *object)
> +{
> + struct vm_struct *vm;
> + void *objp = (void *)PAGE_ALIGN((unsigned long)object);
>
Paul, vmalloced addresses are already aligned to PAGE_SIZE, so that one
is odd.

--
Vlad Rezki


[PATCH v26 09/12] arch: Wire up Landlock syscalls

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

Wire up the following system calls for all architectures:
* landlock_create_ruleset(2)
* landlock_add_rule(2)
* landlock_enforce_ruleset_current(2)

Cc: Arnd Bergmann 
Cc: James Morris 
Cc: Jann Horn 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
---

Changes since v25:
* Rebase and leave space for the new epoll_pwait2(2) and memfd_secret(2)
  from -next.

Changes since v21:
* Rebase and leave space for watch_mount(2) from -next.

Changes since v20:
* Remove landlock_get_features(2).
* Decrease syscall numbers to stick to process_madvise(2) in -next.
* Rename landlock_enforce_ruleset(2) to
  landlock_enforce_ruleset_current(2).

Changes since v19:
* Increase syscall numbers by 4 to leave space for new ones (in
  linux-next): watch_mount(2), watch_sb(2), fsinfo(2) and
  process_madvise(2) (requested by Arnd Bergmann).
* Replace the previous multiplexor landlock(2) with 4 syscalls:
  landlock_get_features(2), landlock_create_ruleset(2),
  landlock_add_rule(2) and landlock_enforce_ruleset(2).

Changes since v18:
* Increase the syscall number because of the new faccessat2(2).

Changes since v14:
* Add all architectures.

Changes since v13:
* New implementation.
---
 arch/alpha/kernel/syscalls/syscall.tbl  | 3 +++
 arch/arm/tools/syscall.tbl  | 3 +++
 arch/arm64/include/asm/unistd.h | 2 +-
 arch/arm64/include/asm/unistd32.h   | 6 ++
 arch/ia64/kernel/syscalls/syscall.tbl   | 3 +++
 arch/m68k/kernel/syscalls/syscall.tbl   | 3 +++
 arch/microblaze/kernel/syscalls/syscall.tbl | 3 +++
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 3 +++
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 3 +++
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 3 +++
 arch/parisc/kernel/syscalls/syscall.tbl | 3 +++
 arch/powerpc/kernel/syscalls/syscall.tbl| 3 +++
 arch/s390/kernel/syscalls/syscall.tbl   | 3 +++
 arch/sh/kernel/syscalls/syscall.tbl | 3 +++
 arch/sparc/kernel/syscalls/syscall.tbl  | 3 +++
 arch/x86/entry/syscalls/syscall_32.tbl  | 3 +++
 arch/x86/entry/syscalls/syscall_64.tbl  | 3 +++
 arch/xtensa/kernel/syscalls/syscall.tbl | 3 +++
 include/uapi/asm-generic/unistd.h   | 8 +++-
 19 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
b/arch/alpha/kernel/syscalls/syscall.tbl
index ee7b01bb7346..80cd3f9be707 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -480,3 +480,6 @@
 548common  pidfd_getfd sys_pidfd_getfd
 549common  faccessat2  sys_faccessat2
 550common  process_madvise sys_process_madvise
+554common  landlock_create_ruleset 
sys_landlock_create_ruleset
+555common  landlock_add_rule   
sys_landlock_add_rule
+556common  landlock_enforce_ruleset_current
sys_landlock_enforce_ruleset_current
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index d056a548358e..e0a342365810 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -454,3 +454,6 @@
 438common  pidfd_getfd sys_pidfd_getfd
 439common  faccessat2  sys_faccessat2
 440common  process_madvise sys_process_madvise
+444common  landlock_create_ruleset 
sys_landlock_create_ruleset
+445common  landlock_add_rule   
sys_landlock_add_rule
+446common  landlock_enforce_ruleset_current
sys_landlock_enforce_ruleset_current
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index b3b2019f8d16..727bfc3be99b 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls   441
+#define __NR_compat_syscalls   447
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 107f08e03b9f..948c94b81bf4 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -889,6 +889,12 @@ __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
 #define __NR_process_madvise 440
 __SYSCALL(__NR_process_madvise, sys_process_madvise)
+#define __NR_landlock_create_ruleset 444
+__SYSCALL(__NR_landlock_create_ruleset, sys_landlock_create_ruleset)
+#define __NR_landlock_add_rule 445
+__SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
+#define __NR_landlock_enforce_ruleset_current 446
+__SYSCALL(__NR_landlock_enforce_ruleset_current, 
sys_landlock_enforce_ruleset_current)
 
 /*
  * Please add new compat syscalls above this comment and update
diff -

[PATCH v26 04/12] landlock: Add ptrace restrictions

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

Using ptrace(2) and related debug features on a target process can lead
to a privilege escalation.  Indeed, ptrace(2) can be used by an attacker
to impersonate another task and to remain undetected while performing
malicious activities.  Thanks to  ptrace_may_access(), various part of
the kernel can check if a tracer is more privileged than a tracee.

A landlocked process has fewer privileges than a non-landlocked process
and must then be subject to additional restrictions when manipulating
processes. To be allowed to use ptrace(2) and related syscalls on a
target process, a landlocked process must have a subset of the target
process's rules (i.e. the tracee must be in a sub-domain of the tracer).

Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
Reviewed-by: Jann Horn 
---

Changes since v25:
* Rename function to landlock_add_ptrace_hooks().

Changes since v22:
* Add Reviewed-by: Jann Horn 

Changes since v21:
* Fix copyright dates.

Changes since v14:
* Constify variables.

Changes since v13:
* Make the ptrace restriction mandatory, like in the v10.
* Remove the eBPF dependency.

Previous changes:
https://lore.kernel.org/lkml/20191104172146.30797-5-...@digikod.net/
---
 security/landlock/Makefile |   2 +-
 security/landlock/ptrace.c | 120 +
 security/landlock/ptrace.h |  14 +
 security/landlock/setup.c  |   2 +
 4 files changed, 137 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/ptrace.c
 create mode 100644 security/landlock/ptrace.h

diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 041ea242e627..f1d1eb72fa76 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
 landlock-y := setup.o object.o ruleset.o \
-   cred.o
+   cred.o ptrace.o
diff --git a/security/landlock/ptrace.c b/security/landlock/ptrace.c
new file mode 100644
index ..f55b82446de2
--- /dev/null
+++ b/security/landlock/ptrace.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Ptrace hooks
+ *
+ * Copyright © 2017-2020 Mickaël Salaün 
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common.h"
+#include "cred.h"
+#include "ptrace.h"
+#include "ruleset.h"
+#include "setup.h"
+
+/**
+ * domain_scope_le - Checks domain ordering for scoped ptrace
+ *
+ * @parent: Parent domain.
+ * @child: Potential child of @parent.
+ *
+ * Checks if the @parent domain is less or equal to (i.e. an ancestor, which
+ * means a subset of) the @child domain.
+ */
+static bool domain_scope_le(const struct landlock_ruleset *const parent,
+   const struct landlock_ruleset *const child)
+{
+   const struct landlock_hierarchy *walker;
+
+   if (!parent)
+   return true;
+   if (!child)
+   return false;
+   for (walker = child->hierarchy; walker; walker = walker->parent) {
+   if (walker == parent->hierarchy)
+   /* @parent is in the scoped hierarchy of @child. */
+   return true;
+   }
+   /* There is no relationship between @parent and @child. */
+   return false;
+}
+
+static bool task_is_scoped(const struct task_struct *const parent,
+   const struct task_struct *const child)
+{
+   bool is_scoped;
+   const struct landlock_ruleset *dom_parent, *dom_child;
+
+   rcu_read_lock();
+   dom_parent = landlock_get_task_domain(parent);
+   dom_child = landlock_get_task_domain(child);
+   is_scoped = domain_scope_le(dom_parent, dom_child);
+   rcu_read_unlock();
+   return is_scoped;
+}
+
+static int task_ptrace(const struct task_struct *const parent,
+   const struct task_struct *const child)
+{
+   /* Quick return for non-landlocked tasks. */
+   if (!landlocked(parent))
+   return 0;
+   if (task_is_scoped(parent, child))
+   return 0;
+   return -EPERM;
+}
+
+/**
+ * hook_ptrace_access_check - Determines whether the current process may access
+ *   another
+ *
+ * @child: Process to be accessed.
+ * @mode: Mode of attachment.
+ *
+ * If the current task has Landlock rules, then the child must have at least
+ * the same rules.  Else denied.
+ *
+ * Determines whether a process may access another, returning 0 if permission
+ * granted, -errno if denied.
+ */
+static int hook_ptrace_access_check(struct task_struct *const child,
+   const unsigned int mode)
+{
+   return task_ptrace(current, child);
+}
+
+/**
+ * hook_ptrace_traceme - Determines whether another process may trace the
+ *  current one
+ *
+ * @parent: Task proposed to be the tracer.
+ *
+ * If the parent has Landlock rules, then the current task must have the same
+ * or more rules.  Else

Re: [PATCH v8] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-12-09 Thread Ashish Kalra
On Wed, Dec 09, 2020 at 06:51:05PM +0100, Borislav Petkov wrote:
> On Wed, Dec 09, 2020 at 01:19:46PM +, Ashish Kalra wrote:
> > reserve_crashkernel() calls swiotlb_size_or_default() to get SWIOTLB
> ...
> 
> Thanks for explaining.
> 
> > There is a need to introduce an architecture specific callback
> > for swiotlb_adjust() because of the following reason :
> 
> So what your version currently does is:
> 
> 1. from arch code, call generic code - swiotlb_adjust
> 
> 2. in generic code, call back into arch code - arch_swiotlb_adjust
> 
> But that's twice the work needed to get you where you wanna go.
> 
> What you wanna do is, from arch code, call into swiotlb generic code.
> That's it, no more.
> 
> Just like mem_encrypt.c calls swiotlb_update_mem_attributes(), for
> example.
> 
> And other architectures can simply do the same thing and you have it all
> solved and other architectures don't even need to refactor - they simply
> copy what x86 does.
> 
> IOW, something like this:
> 

This should work, but i am concerned about making IO_TLB_DEFAULT_SIZE
(which is pretty much private to generic swiotlb code) to be visible
externally, i don't know if there are any concerns with that ?

Thanks,
Ashish

> ---
> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index 2f62bbdd9d12..31c4df123aa0 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -37,6 +37,7 @@ void __init sme_map_bootdata(char *real_mode_data);
>  void __init sme_unmap_bootdata(char *real_mode_data);
>  
>  void __init sme_early_init(void);
> +void __init sev_setup_arch(void);
>  
>  void __init sme_encrypt_kernel(struct boot_params *bp);
>  void __init sme_enable(struct boot_params *bp);
> @@ -69,6 +70,7 @@ static inline void __init sme_map_bootdata(char 
> *real_mode_data) { }
>  static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
>  
>  static inline void __init sme_early_init(void) { }
> +static inline void __init sev_setup_arch(void) { }
>  
>  static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
>  static inline void __init sme_enable(struct boot_params *bp) { }
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index a23130c86bdd..740f3bdb3f61 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1049,6 +1049,12 @@ void __init setup_arch(char **cmdline_p)
>   memblock_set_current_limit(ISA_END_ADDRESS);
>   e820__memblock_setup();
>  
> + /*
> +  * Needs to run after memblock setup because it needs the physical
> +  * memory size.
> +  */
> + sev_setup_arch();
> +
>   reserve_bios_regions();
>  
>   efi_fake_memmap();
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index bc0833713be9..f3db85673eae 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -198,6 +198,37 @@ void __init sme_early_init(void)
>   swiotlb_force = SWIOTLB_FORCE;
>  }
>  
> +void __init sev_setup_arch(void)
> +{
> + phys_addr_t total_mem = memblock_phys_mem_size();
> + unsigned long size;
> +
> + if (!sev_active())
> + return;
> +
> + /*
> +  * For SEV, all DMA has to occur via shared/unencrypted pages.
> +  * SEV uses SWOTLB to make this happen without changing device
> +  * drivers. However, depending on the workload being run, the
> +  * default 64MB of SWIOTLB may not be enough and`SWIOTLB may
> +  * run out of buffers for DMA, resulting in I/O errors and/or
> +  * performance degradation especially with high I/O workloads.
> +  *
> +  * Adjust the default size of SWIOTLB for SEV guests using
> +  * a percentage of guest memory for SWIOTLB buffers.
> +  * Also as the SWIOTLB bounce buffer memory is allocated
> +  * from low memory, ensure that the adjusted size is within
> +  * the limits of low available memory.
> +  *
> +  * The percentage of guest memory used here for SWIOTLB buffers
> +  * is more of an approximation of the static adjustment which
> +  * 64MB for <1G, and ~128M to 256M for 1G-to-4G, i.e., the 6%
> +  */
> + size = total_mem * 6 / 100;
> + size = clamp_val(size, IO_TLB_DEFAULT_SIZE, SZ_1G);
> + swiotlb_adjust_size(size);
> +}
> +
>  static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>  {
>   pgprot_t old_prot, new_prot;
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index fbdc65782195..7aa94e2f99c6 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -30,6 +30,9 @@ enum swiotlb_force {
>   */
>  #define IO_TLB_SHIFT 11
>  
> +/* default to 64MB */
> +#define IO_TLB_DEFAULT_SIZE (64UL<<20)
> +
>  extern void swiotlb_init(int verbose);
>  int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
>  extern unsigned long swiotlb_nr_tbl(void);
> @@ -78,6 +81,7 @@ void __init swiotlb_exit(void);
>  unsigned

Re: [PATCH] dt-bindings: usb: Add new compatible string for AM64 SoC

2020-12-09 Thread Rob Herring
On Wed, Dec 09, 2020 at 10:27:32PM +0530, Aswath Govindraju wrote:
> Add compatible string in j721e-usb binding file as similar USB subsystem
> is present in AM64.
> 
> Signed-off-by: Aswath Govindraju 
> Acked-by: Roger Quadros 
> ---
>  Documentation/devicetree/bindings/usb/ti,j721e-usb.yaml | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/usb/ti,j721e-usb.yaml 
> b/Documentation/devicetree/bindings/usb/ti,j721e-usb.yaml
> index 388245b91a55..05d976bb06d0 100644
> --- a/Documentation/devicetree/bindings/usb/ti,j721e-usb.yaml
> +++ b/Documentation/devicetree/bindings/usb/ti,j721e-usb.yaml
> @@ -11,8 +11,11 @@ maintainers:
>  
>  properties:
>compatible:
> -items:
> -  - const: ti,j721e-usb
> +anyOf:
> +  - items:
> +   - const: ti,j721e-usb
> +  - items:
> +   - const: ti,am64-usb

Use 'enum'.

>  
>reg:
>  description: module registers
> -- 
> 2.17.1
> 


Re: [PATCH] spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors

2020-12-09 Thread H. Nikolaus Schaller


> Am 09.12.2020 um 20:04 schrieb Sven Van Asbroeck :
> 
> On Wed, Dec 9, 2020 at 1:16 PM H. Nikolaus Schaller  
> wrote:
>> 
>> This is also what made me wonder if that is really intended because then
>> the whole discussion about the cs-gpio-flags and inversion and the fixes
>> would not have been needed. The current code and fixes are all about
>> not ignoring the flags...
> 
> The inversion you witnessed was a bug caused by spi client drivers that

The inversion we witnessed came from:

commit 6953c57ab172 "gpio: of: Handle SPI chipselect legacy bindings"

There, I read a verbal description of the table I want to formalize
with this patch, because natural language is not as precise as the language
of logic.

This has nothing to do with driver code, which remained and remains unchanged
for long time.

> 
>> Secondly, please imagine some reader of a device tree who finds
>> 
>>   cs-gpios = <&gpio 7 ACTIVE_LOW>;
>>   spi-cs-high;
> 
> That reader looks at the rules, sees that:
> - the ACTIVE_LOW is ignored,
> - presence of spi-cs-high means active-high
> and concludes this chip-select is active-high.

This misses information what the reader should do to resolve the
obviously missing beauty of the DT.

a) remove spi-cs-high;
b) change to ACTIVE_HIGH

Both appear valid in first place. But one is preferred. This is
again nowhere documented if you simplify the table.




Re: [PATCH] dt-bindings: usb: Add new compatible string for AM64 SoC

2020-12-09 Thread Rob Herring
On Wed, 09 Dec 2020 22:27:32 +0530, Aswath Govindraju wrote:
> Add compatible string in j721e-usb binding file as similar USB subsystem
> is present in AM64.
> 
> Signed-off-by: Aswath Govindraju 
> Acked-by: Roger Quadros 
> ---
>  Documentation/devicetree/bindings/usb/ti,j721e-usb.yaml | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 


My bot found errors running 'make dt_binding_check' on your patch:

yamllint warnings/errors:
./Documentation/devicetree/bindings/usb/ti,j721e-usb.yaml:16:1: [error] syntax 
error: found character '\t' that cannot start any token (syntax)

dtschema/dtc warnings/errors:
Traceback (most recent call last):
  File "/usr/local/bin/dt-extract-example", line 45, in 
binding = yaml.load(open(args.yamlfile, encoding='utf-8').read())
  File "/usr/local/lib/python3.8/dist-packages/ruamel/yaml/main.py", line 343, 
in load
return constructor.get_single_data()
  File "/usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py", 
line 111, in get_single_data
node = self.composer.get_single_node()
  File "_ruamel_yaml.pyx", line 706, in _ruamel_yaml.CParser.get_single_node
  File "_ruamel_yaml.pyx", line 724, in _ruamel_yaml.CParser._compose_document
  File "_ruamel_yaml.pyx", line 775, in _ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 889, in 
_ruamel_yaml.CParser._compose_mapping_node
  File "_ruamel_yaml.pyx", line 775, in _ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 889, in 
_ruamel_yaml.CParser._compose_mapping_node
  File "_ruamel_yaml.pyx", line 775, in _ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 889, in 
_ruamel_yaml.CParser._compose_mapping_node
  File "_ruamel_yaml.pyx", line 773, in _ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 850, in 
_ruamel_yaml.CParser._compose_sequence_node
  File "_ruamel_yaml.pyx", line 775, in _ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 889, in 
_ruamel_yaml.CParser._compose_mapping_node
  File "_ruamel_yaml.pyx", line 731, in _ruamel_yaml.CParser._compose_node
  File "_ruamel_yaml.pyx", line 904, in _ruamel_yaml.CParser._parse_next_event
ruamel.yaml.scanner.ScannerError: while scanning for the next token
found character that cannot start any token
  in "", line 16, column 1
make[1]: *** [Documentation/devicetree/bindings/Makefile:20: 
Documentation/devicetree/bindings/usb/ti,j721e-usb.example.dts] Error 1
make[1]: *** Deleting file 
'Documentation/devicetree/bindings/usb/ti,j721e-usb.example.dts'
make[1]: *** Waiting for unfinished jobs
make[1]: *** [Documentation/devicetree/bindings/Makefile:59: 
Documentation/devicetree/bindings/processed-schema-examples.json] Error 123
make: *** [Makefile:1364: dt_binding_check] Error 2


See https://patchwork.ozlabs.org/patch/1413512

The base for the patch is generally the last rc1. Any dependencies
should be noted.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.



[PATCH v26 05/12] LSM: Infrastructure management of the superblock

2020-12-09 Thread Mickaël Salaün
From: Casey Schaufler 

Move management of the superblock->sb_security blob out of the
individual security modules and into the security infrastructure.
Instead of allocating the blobs from within the modules, the modules
tell the infrastructure how much space is required, and the space is
allocated there.

Cc: Kees Cook 
Cc: John Johansen 
Signed-off-by: Casey Schaufler 
Signed-off-by: Mickaël Salaün 
Reviewed-by: Stephen Smalley 
---

Changes since v20:
* Remove all Reviewed-by except Stephen Smalley:
  
https://lore.kernel.org/lkml/CAEjxPJ7ARJO57MBW66=xsBzMMRb=9ulgqock5eskhcaivmx...@mail.gmail.com/
* Cosmetic fix in the commit message.

Changes since v17:
* Rebase the original LSM stacking patch from v5.3 to v5.7: I fixed some
  diff conflicts caused by code moves and function renames in
  selinux/include/objsec.h and selinux/hooks.c .  I checked that it
  builds but I didn't test the changes for SELinux nor SMACK.
  https://lore.kernel.org/r/20190829232935.7099-2-ca...@schaufler-ca.com
---
 include/linux/lsm_hooks.h |  1 +
 security/security.c   | 46 
 security/selinux/hooks.c  | 58 ---
 security/selinux/include/objsec.h |  6 
 security/selinux/ss/services.c|  3 +-
 security/smack/smack.h|  6 
 security/smack/smack_lsm.c| 35 +--
 7 files changed, 85 insertions(+), 70 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c503f7ab8afb..ff0f03a45c56 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1563,6 +1563,7 @@ struct lsm_blob_sizes {
int lbs_cred;
int lbs_file;
int lbs_inode;
+   int lbs_superblock;
int lbs_ipc;
int lbs_msg_msg;
int lbs_task;
diff --git a/security/security.c b/security/security.c
index a28045dc9e7f..4ffd6c3af9d7 100644
--- a/security/security.c
+++ b/security/security.c
@@ -202,6 +202,7 @@ static void __init lsm_set_blob_sizes(struct lsm_blob_sizes 
*needed)
lsm_set_blob_size(&needed->lbs_inode, &blob_sizes.lbs_inode);
lsm_set_blob_size(&needed->lbs_ipc, &blob_sizes.lbs_ipc);
lsm_set_blob_size(&needed->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
+   lsm_set_blob_size(&needed->lbs_superblock, &blob_sizes.lbs_superblock);
lsm_set_blob_size(&needed->lbs_task, &blob_sizes.lbs_task);
 }
 
@@ -332,12 +333,13 @@ static void __init ordered_lsm_init(void)
for (lsm = ordered_lsms; *lsm; lsm++)
prepare_lsm(*lsm);
 
-   init_debug("cred blob size = %d\n", blob_sizes.lbs_cred);
-   init_debug("file blob size = %d\n", blob_sizes.lbs_file);
-   init_debug("inode blob size= %d\n", blob_sizes.lbs_inode);
-   init_debug("ipc blob size  = %d\n", blob_sizes.lbs_ipc);
-   init_debug("msg_msg blob size  = %d\n", blob_sizes.lbs_msg_msg);
-   init_debug("task blob size = %d\n", blob_sizes.lbs_task);
+   init_debug("cred blob size   = %d\n", blob_sizes.lbs_cred);
+   init_debug("file blob size   = %d\n", blob_sizes.lbs_file);
+   init_debug("inode blob size  = %d\n", blob_sizes.lbs_inode);
+   init_debug("ipc blob size= %d\n", blob_sizes.lbs_ipc);
+   init_debug("msg_msg blob size= %d\n", blob_sizes.lbs_msg_msg);
+   init_debug("superblock blob size = %d\n", blob_sizes.lbs_superblock);
+   init_debug("task blob size   = %d\n", blob_sizes.lbs_task);
 
/*
 * Create any kmem_caches needed for blobs
@@ -669,6 +671,27 @@ static void __init lsm_early_task(struct task_struct *task)
panic("%s: Early task alloc failed.\n", __func__);
 }
 
+/**
+ * lsm_superblock_alloc - allocate a composite superblock blob
+ * @sb: the superblock that needs a blob
+ *
+ * Allocate the superblock blob for all the modules
+ *
+ * Returns 0, or -ENOMEM if memory can't be allocated.
+ */
+static int lsm_superblock_alloc(struct super_block *sb)
+{
+   if (blob_sizes.lbs_superblock == 0) {
+   sb->s_security = NULL;
+   return 0;
+   }
+
+   sb->s_security = kzalloc(blob_sizes.lbs_superblock, GFP_KERNEL);
+   if (sb->s_security == NULL)
+   return -ENOMEM;
+   return 0;
+}
+
 /*
  * The default value of the LSM hook is defined in linux/lsm_hook_defs.h and
  * can be accessed with:
@@ -866,12 +889,21 @@ int security_fs_context_parse_param(struct fs_context 
*fc, struct fs_parameter *
 
 int security_sb_alloc(struct super_block *sb)
 {
-   return call_int_hook(sb_alloc_security, 0, sb);
+   int rc = lsm_superblock_alloc(sb);
+
+   if (unlikely(rc))
+   return rc;
+   rc = call_int_hook(sb_alloc_security, 0, sb);
+   if (unlikely(rc))
+   security_sb_free(sb);
+   return rc;
 }
 
 void security_sb_free(struct super_block *sb)
 {
call_void_hook(sb_free_security, sb);
+   kfree(sb->s_s

[PATCH v26 08/12] landlock: Add syscall implementations

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

These 3 system calls are designed to be used by unprivileged processes
to sandbox themselves:
* landlock_create_ruleset(2): Creates a ruleset and returns its file
  descriptor.
* landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
  ruleset, identified by the dedicated file descriptor.
* landlock_enforce_ruleset_current(2): Enforces a ruleset on the current
  thread and its future children (similar to seccomp).  This syscall has
  the same usage restrictions as seccomp(2): the caller must have the
  no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
  namespace.

All these syscalls have a "flags" argument (not currently used) to
enable extensibility.

Here are the motivations for these new syscalls:
* A sandboxed process may not have access to file systems, including
  /dev, /sys or /proc, but it should still be able to add more
  restrictions to itself.
* Neither prctl(2) nor seccomp(2) (which was used in a previous version)
  fit well with the current definition of a Landlock security policy.

All passed structs (attributes) are checked at build time to ensure that
they don't contain holes and that they are aligned the same way for each
architecture.

See the user and kernel documentation for more details (provided by a
following commit):
* Documentation/userspace-api/landlock.rst
* Documentation/security/landlock.rst

Cc: Arnd Bergmann 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
Reviewed-by: Jann Horn 
---

Changes since v25:
* Revert build_check_abi() as non-inline to trigger a warning if it is
  not called.
* Use the new limit names.

Changes since v24:
* Add Reviewed-by: Jann Horn 
* Set build_check_abi() as inline.

Changes since v23:
* Rewrite get_ruleset_from_fd() to please the 0-DAY CI Kernel Test
  Service that reported an uninitialized variable (false positive):
  
https://lore.kernel.org/linux-security-module/202011101854.zgbwwusk-...@intel.com/
  Anyway, it is cleaner like this.
* Add a comment about E2BIG which can be returned by
  landlock_enforce_ruleset_current(2) when there is no more room for
  another stacked ruleset (i.e. domain).

Changes since v22:
* Replace security_capable() with ns_capable_noaudit() (suggested by
  Jann Horn) and explicitly return EPERM.
* Fix landlock_enforce_ruleset_current(2)'s out_put_creds (spotted by
  Jann Horn).
* Add __always_inline to copy_min_struct_from_user() to make its
  BUILD_BUG_ON() checks reliable (suggested by Jann Horn).
* Simplify path assignation in get_path_from_fd() (suggested by Jann
  Horn).
* Fix spelling (spotted by Jann Horn).

Changes since v21:
* Fix and improve comments.

Changes since v20:
* Remove two arguments to landlock_enforce_ruleset(2) (requested by Arnd
  Bergmann) and rename it to landlock_enforce_ruleset_current(2): remove
  the enum landlock_target_type and the target file descriptor (not used
  for now).  A ruleset can only be enforced on the current thread.
* Remove the size argument in landlock_add_rule() (requested by Arnd
  Bergmann).
* Remove landlock_get_features(2) (suggested by Arnd Bergmann).
* Simplify and rename copy_struct_if_any_from_user() to
  copy_min_struct_from_user().
* Rename "options" to "flags" to allign with current syscalls.
* Rename some types and variables in a more consistent way.
* Fix missing type declarations in syscalls.h .

Changes since v19:
* Replace the landlock(2) syscall with 4 syscalls (one for each
  command): landlock_get_features(2), landlock_create_ruleset(2),
  landlock_add_rule(2) and landlock_enforce_ruleset(2) (suggested by
  Arnd Bergmann).
  https://lore.kernel.org/lkml/56d15841-e2c1-2d58-59b8-3a6a09b23...@digikod.net/
* Return EOPNOTSUPP (instead of ENOPKG) when Landlock is disabled.
* Add two new fields to landlock_attr_features to fit with the new
  syscalls: last_rule_type and last_target_type.  This enable to easily
  identify which types are supported.
* Pack landlock_attr_path_beneath struct because of the removed
  ruleset_fd.
* Update documentation and fix spelling.

Changes since v18:
* Remove useless include.
* Remove LLATTR_SIZE() which was only used to shorten lines. Cf. commit
  bdc48fa11e46 ("checkpatch/coding-style: deprecate 80-column warning").

Changes since v17:
* Synchronize syscall declaration.
* Fix comment.

Changes since v16:
* Add a size_attr_features field to struct landlock_attr_features for
  self-introspection, and move the access_fs field to be more
  consistent.
* Replace __aligned_u64 types of attribute fields with __u16, __s32,
  __u32 and __u64, and check at build time that these structures does
  not contain hole and that they are aligned the same way (8-bits) on
  all architectures.  This shrinks the size of the userspace ABI, which
  may be appreciated especially for struct landlock_attr_features which
  could grow a lot in the future.  For instance, struct
  landlock_attr_features shrinks from 72 bytes to 32 bytes.  This change
  also 

[PATCH v26 10/12] selftests/landlock: Add user space tests

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

Test all Landlock system calls, ptrace hooks semantic and filesystem
access-control.

Test coverage for security/landlock/ is 94.1% of lines.  The code not
covered only deals with internal kernel errors (e.g. memory allocation)
and race conditions.

Cc: James Morris 
Cc: Jann Horn 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Cc: Shuah Khan 
Signed-off-by: Mickaël Salaün 
Reviewed-by: Vincent Dagonneau 
---

Changes since v25:
* Add a new test to check that Landlock ruleset file descriptors
  received through UNIX sockets are usable.  Contributed by Vincent
  Dagonneau.
* Improve hierarchy.trace tests to not hang when testing on a kernel
  that don't support Landlock.
* Replace EXPECT_EQ(0, close(*)) with ASSERT_EQ(0, close(*)).
* Guard WEXITSTATUS() use with WIFEXITED() in ptrace tests.
* Use pipe2(2) with O_CLOEXEC.
* Remove useless errno set for syscall wrappers, and related useless checks.
* Rename test.
* Add Microsoft copyright for layout1.interleaved_masked_accesses .

Changes since v24:
* Revert the ruleset_overlap test from v24: check that access righs are
  ORed together when building a ruleset.  Keep the extra checks
  added with v24.
* Revert inherit_subset test from v24: use the automatic ORing of
  access rights for the same file.
* Update interleaved_masked_accesses test (added with v24) to stop when
  all layers allowed at least one time an inode in the path walk.
* Extend interleaved_masked_accesses test with new tricky interleaved
  layers which would not work as intended with (allow or deny) bitmask
  layer implementations.
* Simplify and rename test_path*() to test_open*() to make easier the
  diagnostic in case of unattended errors.
* Replace most call to open(2) with a call to test_open(), which
  reduces the number of lines and make tests more readable.
* Fix erroneous check in inherit_superset.

Changes since v23:
* Add an interleaved_masked_accesses test to check corner cases for
  interleaved layered ruleset combinations.
* Update ruleset_overlap and inherit_subset tests to follow the new
  intersect access rights behavior.
* Extend the inherit_superset test to check that layers are handled as
  expected in the superset use case, which complete the inherit_subset
  checks.
* Fix comment (spotted by Vincent Dagonneau).

Changes since v22:
* Extend and add a new test to better check rules applied to the root
  directory: rule_over_root_allow_then_deny, rule_over_root_deny.
* Change the signature of test_path*() to make the calls clearer.

Changes since v21:
* Remove layout1.chroot test and update layout1.unhandled_access to not
  rely on LANDLOCK_ACCESS_FS_CHROOT.
* Clean up comments.

Changes since v20:
* Update with new syscalls and type names.
* Use the full syscall interfaces: explicitely set the "flags" field to
  zero.
* Update the empty_path_beneath_attr test to check for EFAULT.
* Update and merge tests for the simplified copy_min_struct_from_user().
* Clean up makefile.
* Rename some types and variables in a more consistent way.

Changes since v19:
* Update with the new Landlock syscalls.
* Fix device creation.
* Check the new landlock_attr_features members: last_rule_type and
  last_target_type .
* Constify variables.

Changes since v18:
* Replace ruleset_rw.inval with layout1.inval to avoid inexistent test
  layout.
* Use the new FIXTURE_VARIANT for ptrace_test: makes the tests more
  readable and usable.
* Add ARRAY_SIZE() macro to please checkpatch.

Changes since v17:
* Add new test for mknod with a zero mode.
* Use memset(3) to initialize attr_features in base_test.

Changes since v16:
* Add new unpriv_enforce_without_no_new_privs test: check that ruleset
  enforcing is forbiden without no_new_privs and CAP_SYS_ADMIN.
* Drop capabilities when useful.
* Check the new size_attr_features field from struct
  landlock_attr_features.
* Update the empty_or_same_ruleset test to check complementary empty
  ruleset.
* Update base_test according to the new attribute structures and fix the
  inconsistent_attr test accordingly.
* Switch syscall attribute pointer and size arguments.
* Rename test files with a "_test" suffix.

Changes since v14:
* Add new tests:
  - superset: check new layer bitmask.
  - max_layers: check maximum number of layers.
  - release_inodes: check that umount work well.
  - empty_or_same_ruleset.
  - inconsistent_attr: checks copy_to_user limits.
  - in ruleset_rw.inval to check ruleset FD.
  - proc_unlinked_file: check file access through /proc/self/fd .
  - file_access_rights: check that a file can only get consistent access
rights.
  - unpriv: check that NO_NEW_PRIVS or CAP_SYS_ADMIN is required.
  - check pipe access through /proc/self/fd .
  - check move_mount(2).
  - check ruleset file descriptor properties.
  - proc_nsfs: extend to check that internal filesystems (e.g. nsfs) are
allowed.
* Double-check read and write effective actions.
* Fix potential desynchronization between the kernel sources and
  installed headers by overri

[PATCH v26 12/12] landlock: Add user and kernel documentation

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

This documentation can be built with the Sphinx framework.

Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
Reviewed-by: Vincent Dagonneau 
Reviewed-by: Jann Horn 
---

Changes since v24:
* Add Reviewed-by: Jann Horn 
* Add a paragraph to explain how the ruleset layers work.
* Bump date.

Changes since v23:
* Explain limitations for the maximum number of stacked ruleset, and the
  memory usage restrictions.

Changes since v22:
* Fix spelling and remove obsolete sentence (spotted by Jann Horn).
* Bump date.

Changes since v21:
* Move the user space documentation to userspace-api/landlock.rst and
  the kernel documentation to security/landlock.rst .
* Add license headers.
* Add last update dates.
* Update MAINTAINERS file.
* Add (back) links to git.kernel.org .
* Fix spelling.

Changes since v20:
* Update examples and documentation with the new syscalls.

Changes since v19:
* Update examples and documentation with the new syscalls.

Changes since v15:
* Add current limitations.

Changes since v14:
* Fix spelling (contributed by Randy Dunlap).
* Extend documentation about inheritance and explain layer levels.
* Remove the use of now-removed access rights.
* Use GitHub links.
* Improve kernel documentation.
* Add section for tests.
* Update example.

Changes since v13:
* Rewrote the documentation according to the major revamp.

Previous changes:
https://lore.kernel.org/lkml/20191104172146.30797-8-...@digikod.net/
---
 Documentation/security/index.rst |   1 +
 Documentation/security/landlock.rst  |  79 +++
 Documentation/userspace-api/index.rst|   1 +
 Documentation/userspace-api/landlock.rst | 280 +++
 MAINTAINERS  |   2 +
 5 files changed, 363 insertions(+)
 create mode 100644 Documentation/security/landlock.rst
 create mode 100644 Documentation/userspace-api/landlock.rst

diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 8129405eb2cc..16335de04e8c 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -16,3 +16,4 @@ Security Documentation
siphash
tpm/index
digsig
+   landlock
diff --git a/Documentation/security/landlock.rst 
b/Documentation/security/landlock.rst
new file mode 100644
index ..92317b2ad732
--- /dev/null
+++ b/Documentation/security/landlock.rst
@@ -0,0 +1,79 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright © 2017-2020 Mickaël Salaün 
+.. Copyright © 2019-2020 ANSSI
+
+==
+Landlock LSM: kernel documentation
+==
+
+:Author: Mickaël Salaün
+:Date: December 2020
+
+Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
+harden a whole system, this feature should be available to any process,
+including unprivileged ones.  Because such process may be compromised or
+backdoored (i.e. untrusted), Landlock's features must be safe to use from the
+kernel and other processes point of view.  Landlock's interface must therefore
+expose a minimal attack surface.
+
+Landlock is designed to be usable by unprivileged processes while following the
+system security policy enforced by other access control mechanisms (e.g. DAC,
+LSM).  Indeed, a Landlock rule shall not interfere with other access-controls
+enforced on the system, only add more restrictions.
+
+Any user can enforce Landlock rulesets on their processes.  They are merged and
+evaluated according to the inherited ones in a way that ensures that only more
+constraints can be added.
+
+User space documentation can be found here: :doc:`/userspace-api/landlock`.
+
+Guiding principles for safe access controls
+===
+
+* A Landlock rule shall be focused on access control on kernel objects instead
+  of syscall filtering (i.e. syscall arguments), which is the purpose of
+  seccomp-bpf.
+* To avoid multiple kinds of side-channel attacks (e.g. leak of security
+  policies, CPU-based attacks), Landlock rules shall not be able to
+  programmatically communicate with user space.
+* Kernel access check shall not slow down access request from unsandboxed
+  processes.
+* Computation related to Landlock operations (e.g. enforcing a ruleset) shall
+  only impact the processes requesting them.
+
+Tests
+=
+
+Userspace tests for backward compatibility, ptrace restrictions and filesystem
+support can be found here: `tools/testing/selftests/landlock/`_.
+
+Kernel structures
+=
+
+Object
+--
+
+.. kernel-doc:: security/landlock/object.h
+:identifiers:
+
+Ruleset and domain
+--
+
+A domain is a read-only ruleset tied to a set of subjects (i.e. tasks'
+credentials).  Each time a ruleset is enforced on a task, the current domain is
+duplicated and the ruleset is imported as a new layer of rules in the new
+domain.  Indeed, once in a domain, each rule is tied to a layer level.  To
+grant acc

Does SCM_RIGHTS limit on fd array apply to a particular cmsghdr or the whole ancillary data?

2020-12-09 Thread Wolfgang Draxinger
Hi,

in the documentation on SCM_RIGHTS the maximum number of file
descriptors that can be passed is described as (release 5.09 of the
Linux man-pages project):

| SCM_RIGHTS
|   Send or receive a set of open file descriptors from another
|   process. The data portion contains an integer array of the file
|   descriptors.
| (...)
|   The kernel constant SCM_MAX_FD defines a limit on the number of
|   file descriptors in the array.  Attempting to send an array
|   larger than this limit causes sendmsg(2) to fail with
the error  |   EINVAL.

In my reading this might be interpreted as SCM_MAX_FD per SCM_RIGHTS
cmsghdr, xor SCM_MAX_FD per sendmsg. It would be good to know, which one
it is. So which is the proper strategy if I have to pass more than
SCM_MAX_FD file descriptors between processes?


Cheers,
Wolfgang


[PATCH v26 11/12] samples/landlock: Add a sandbox manager example

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

Add a basic sandbox tool to launch a command which can only access a
whitelist of file hierarchies in a read-only or read-write way.

Cc: James Morris 
Cc: Jann Horn 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
---

Changes since v25:
* Remove useless errno set in the syscall wrappers.
* Cosmetic variable renames.

Changes since v23:
* Re-add hints to help users understand the required kernel
  configuration.  This was removed with the removal of
  landlock_get_features(2).

Changes since v21:
* Remove LANDLOCK_ACCESS_FS_CHROOT.
* Clean up help.

Changes since v20:
* Update with new syscalls and type names.
* Update errno check for EOPNOTSUPP.
* Use the full syscall interfaces: explicitely set the "flags" field to
  zero.

Changes since v19:
* Update with the new Landlock syscalls.
* Comply with commit 5f2fb52fac15 ("kbuild: rename hostprogs-y/always to
  hostprogs/always-y").

Changes since v16:
* Switch syscall attribute pointer and size arguments.

Changes since v15:
* Update access right names.
* Properly assign access right to files according to the new related
  syscall restriction.
* Replace "select" with "depends on" HEADERS_INSTALL (suggested by Randy
  Dunlap).

Changes since v14:
* Fix Kconfig dependency.
* Remove access rights that may be required for FD-only requests:
  mmap, truncate, getattr, lock, chmod, chown, chgrp, ioctl.
* Fix useless hardcoded syscall number.
* Use execvpe().
* Follow symlinks.
* Extend help with common file paths.
* Constify variables.
* Clean up comments.
* Improve error message.

Changes since v11:
* Add back the filesystem sandbox manager and update it to work with the
  new Landlock syscall.

Previous changes:
https://lore.kernel.org/lkml/20190721213116.23476-9-...@digikod.net/
---
 samples/Kconfig  |   7 ++
 samples/Makefile |   1 +
 samples/landlock/.gitignore  |   1 +
 samples/landlock/Makefile|  15 +++
 samples/landlock/sandboxer.c | 233 +++
 5 files changed, 257 insertions(+)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandboxer.c

diff --git a/samples/Kconfig b/samples/Kconfig
index 0ed6e4d71d87..e6129496ced5 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -124,6 +124,13 @@ config SAMPLE_HIDRAW
bool "hidraw sample"
depends on CC_CAN_LINK && HEADERS_INSTALL
 
+config SAMPLE_LANDLOCK
+   bool "Build Landlock sample code"
+   depends on HEADERS_INSTALL
+   help
+ Build a simple Landlock sandbox manager able to launch a process
+ restricted by a user-defined filesystem access control.
+
 config SAMPLE_PIDFD
bool "pidfd sample"
depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index c3392a595e4b..087e0988ccc5 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_SAMPLE_KDB)  += kdb/
 obj-$(CONFIG_SAMPLE_KFIFO) += kfifo/
 obj-$(CONFIG_SAMPLE_KOBJECT)   += kobject/
 obj-$(CONFIG_SAMPLE_KPROBES)   += kprobes/
+subdir-$(CONFIG_SAMPLE_LANDLOCK)   += landlock
 obj-$(CONFIG_SAMPLE_LIVEPATCH) += livepatch/
 subdir-$(CONFIG_SAMPLE_PIDFD)  += pidfd
 obj-$(CONFIG_SAMPLE_QMI_CLIENT)+= qmi/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index ..f43668b2d318
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandboxer
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index ..21eda5774948
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+hostprogs := sandboxer
+
+always-y := $(hostprogs)
+
+KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
+
+.PHONY: all clean
+
+all:
+   $(MAKE) -C ../.. samples/landlock/
+
+clean:
+   $(MAKE) -C ../.. M=samples/landlock/ clean
diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
new file mode 100644
index ..82b2738b216c
--- /dev/null
+++ b/samples/landlock/sandboxer.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Simple Landlock sandbox manager able to launch a process restricted by a
+ * user-defined filesystem access control.
+ *
+ * Copyright © 2017-2020 Mickaël Salaün 
+ * Copyright © 2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifndef landlock_create_ruleset
+static inline int landlock_create_ruleset(
+   const struct landlock_ruleset_attr *const attr,
+   const size_t size, const __u32 flags)
+{
+   return syscall(__NR_landlock_create_ruleset, attr, size, flags);
+}
+#endif
+
+#ifndef landlock_add_rule
+static inline int landlock_add_rule(const int rule

[PATCH v26 07/12] landlock: Support filesystem access-control

2020-12-09 Thread Mickaël Salaün
From: Mickaël Salaün 

Thanks to the Landlock objects and ruleset, it is possible to identify
inodes according to a process's domain.  To enable an unprivileged
process to express a file hierarchy, it first needs to open a directory
(or a file) and pass this file descriptor to the kernel through
landlock_add_rule(2).  When checking if a file access request is
allowed, we walk from the requested dentry to the real root, following
the different mount layers.  The access to each "tagged" inodes are
collected according to their rule layer level, and ANDed to create
access to the requested file hierarchy.  This makes possible to identify
a lot of files without tagging every inodes nor modifying the
filesystem, while still following the view and understanding the user
has from the filesystem.

Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
keep the same struct inodes for the same inodes whereas these inodes are
in use.

This commit adds a minimal set of supported filesystem access-control
which doesn't enable to restrict all file-related actions.  This is the
result of multiple discussions to minimize the code of Landlock to ease
review.  Thanks to the Landlock design, extending this access-control
without breaking user space will not be a problem.  Moreover, seccomp
filters can be used to restrict the use of syscall families which may
not be currently handled by Landlock.

Cc: Al Viro 
Cc: Anton Ivanov 
Cc: James Morris 
Cc: Jann Horn 
Cc: Jeff Dike 
Cc: Kees Cook 
Cc: Richard Weinberger 
Cc: Serge E. Hallyn 
Signed-off-by: Mickaël Salaün 
---

Changes since v25:
* Move build_check_layer() to ruleset.c, and add built-time checks for
  the fs_access_mask and access variables according to
  _LANDLOCK_ACCESS_FS_MASK.
* Move limits to a dedicated file and rename them:
  _LANDLOCK_ACCESS_FS_LAST and _LANDLOCK_ACCESS_FS_MASK.
* Set build_check_layer() as non-inline to trigger a warning if it is
  not called.
* Use BITS_PER_TYPE() macro.
* Rename function to landlock_add_fs_hooks().
* Cosmetic variable renames.

Changes since v24:
* Use the new struct landlock_rule and landlock_layer to not mix
  accesses from different layers.  Revert "Enforce deterministic
  interleaved path rules" from v24, and fix the layer check.  This
  enables to follow a sane semantic: an access is granted if, for each
  policy layer, at least one rule encountered on the pathwalk grants the
  access, regardless of their position in the layer stack (suggested by
  Jann Horn).  See layout1.interleaved_masked_accesses tests from
  tools/testing/selftests/landlock/fs_test.c for corner cases.
* Add build-time checks for layers.
* Use the new landlock_insert_rule() API.

Changes since v23:
* Enforce deterministic interleaved path rules.  To have consistent
  layered rules, granting access to a path implies that all accesses
  tied to inodes, from the requested file to the real root, must be
  checked.  Otherwise, stacked rules may result to overzealous
  restrictions.  By excluding the ability to add exceptions in the same
  layer (e.g. /a allowed, /a/b denied, and /a/b/c allowed), we get
  deterministic interleaved path rules.  This removes an optimization
  which could be replaced by a proper cache mechanism.  This also
  further simplifies and explain check_access_path_continue().
* Fix memory allocation error handling in landlock_create_object()
  calls.  This prevent to inadvertently hold an inode.
* In get_inode_object(), improve comments, make code more readable and
  move kfree() call out of the lock window.
* Use the simplified landlock_insert_rule() API.

Changes since v22:
* Simplify check_access_path_continue() (suggested by Jann Horn).
* Remove prefetch() call for now (suggested by Jann Horn).
* Fix spelling and remove superfluous comment (spotted by Jann Horn).
* Cosmetic variable renaming.

Changes since v21:
* Rename ARCH_EPHEMERAL_STATES to ARCH_EPHEMERAL_INODES (suggested by
  James Morris).
* Remove the LANDLOCK_ACCESS_FS_CHROOT right because chroot(2) (which
  requires CAP_SYS_CHROOT) doesn't enable to bypass Landlock (as tests
  demonstrate it), and because it is often used by sandboxes, it would
  be counterproductive to forbid it.  This also reduces the code size.
* Clean up documentation.

Changes since v19:
* Fix spelling (spotted by Randy Dunlap).

Changes since v18:
* Remove useless include.
* Fix spelling.

Changes since v17:
* Replace landlock_release_inodes() with security_sb_delete() (requested
  by James Morris).
* Replace struct super_block->s_landlock_inode_refs with the LSM
  infrastructure management of the superblock (requested by James
  Morris).
* Fix mknod restriction with a zero mode (spotted by Vincent Dagonneau).
* Minimize executed code in path_mknod and file_open hooks when the
  current tasks is not sandboxed.
* Remove useless checks on the file pointer and inode in
  hook_file_open() .
* Constify domain pointers.
* Rename inode_landlock() to landlock_inode().
* Import include/uapi

Re: [PATCH v2 5/7] ubsan: Enable for all*config builds

2020-12-09 Thread Arnd Bergmann
On Wed, Dec 9, 2020 at 7:46 PM Kees Cook  wrote:
>
> On Thu, Dec 03, 2020 at 09:51:40AM +0100, Arnd Bergmann wrote:
> > On Thu, Dec 3, 2020 at 1:44 AM Kees Cook  wrote:
> > >
> > > With UBSAN_OBJECT_SIZE disabled for GCC, only UBSAN_ALIGNMENT remained
> > > a noisy UBSAN option. Disable it for COMPILE_TEST so the rest of UBSAN
> > > can be used for full all*config builds or other large combinations.
> > >
> > > Link: 
> > > https://lore.kernel.org/lkml/CAHk-=wgXW=ylxgn0qvpp-1w5gdd2pf1w-fqy15pokzovfik...@mail.gmail.com/
> > > Signed-off-by: Kees Cook 
> >
> > Have you checked if this has a notable impact on allmodconfig compile speed
> > with gcc or clang? I think I've seen significant increases in build times 
> > before
> > with this, but I don't remember the actual magnitude.
> >
> > Making it 20% slower would probably be ok, but making it twice as slow might
> > be too much.
>
> And for Clang, it's about 7m40s before and 8m30s after, so roughly 12% slower.

Ok, that doesn't sound too bad then.

   Arnd


Re: [PATCH v8 00/16] Add support for Clang LTO

2020-12-09 Thread Arnd Bergmann
On Wed, Dec 9, 2020 at 5:09 PM 'Sami Tolvanen' via Clang Built Linux
 wrote:
> On Tue, Dec 8, 2020 at 1:02 PM Arnd Bergmann  wrote:
> > On Tue, Dec 8, 2020 at 9:59 PM Arnd Bergmann  wrote:
> > >
> > > Attaching the config for "ld.lld: error: Never resolved function from
> > >   blockaddress (Producer: 'LLVM12.0.0' Reader: 'LLVM 12.0.0')"
> >
> > And here is a new one: "ld.lld: error: assignment to symbol
> > init_pg_end does not converge"
>
> Thanks for these. I can reproduce the "Never resolved function from
> blockaddress" issue with full LTO, but I couldn't reproduce this one
> with ToT Clang, and the config doesn't have LTO enabled:
>
> $ grep LTO 0x2824F594_defconfig
> CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
>
> Is this the correct config file?

It is the right file, and so far this is the only defconfig on which I
see the "does not converge" error, so I don't have any other one.

I suspect this might be an issue in the version of lld that I have here
and unrelated to LTO, and I can confirm that I see the error
with LTO still disabled.

It seems to be completely random. I do see the bug on next-20201203
but not on a later one. I also tried bisecting through linux-next and
arrived at "lib: stackdepot: add support to configure STACK_HASH_SIZE",
which is almost certainly not related, other than just changing a few
symbols around.

  Arnd


Re: [GIT PULL] IOMMU fix for 5.10 (-final)

2020-12-09 Thread Jerry Snitselaar
On Wed, Dec 9, 2020 at 12:18 PM Linus Torvalds
 wrote:
>
> On Wed, Dec 9, 2020 at 11:12 AM Jerry Snitselaar  wrote:
> >
> > Since the field in the device table entry format expects it to be n
> > where there are 2^n entries in the table I guess it should be:
> >
> > #define DTE_IRQ_TABLE_LEN 9
> > #define MAX_IRQS_PER_TABLE (1 << DTE_IRQ_TABLE_LEN)
>
> No, that "DTE_IRQ_TABLE_LEN" is not the size shift - it's the size
> shift value in that DTE field, which is shifted up by 1.
>
> That's why the current code does that
>
>#define DTE_IRQ_TABLE_LEN   (9ULL << 1)
>
> there..
>
> Which was why I suggested that new #define that is the *actual* shift
> value, and then the DTE thing and the MAX_IRQS_PER_TABLE values would
> depend on that.
>
>Linus
>

Yes, when I read it my head was translating it as setting them both to
512 and then
I forgot that it gets shifted over 1. Which considering I was the once
who noticed the
original problem  of it still being 8 was a nice brain fart. This
should be fixed like you suggest.



Re: [GIT PULL] IOMMU fix for 5.10 (-final)

2020-12-09 Thread Linus Torvalds
On Wed, Dec 9, 2020 at 11:12 AM Jerry Snitselaar  wrote:
>
> Since the field in the device table entry format expects it to be n
> where there are 2^n entries in the table I guess it should be:
>
> #define DTE_IRQ_TABLE_LEN 9
> #define MAX_IRQS_PER_TABLE (1 << DTE_IRQ_TABLE_LEN)

No, that "DTE_IRQ_TABLE_LEN" is not the size shift - it's the size
shift value in that DTE field, which is shifted up by 1.

That's why the current code does that

   #define DTE_IRQ_TABLE_LEN   (9ULL << 1)

there..

Which was why I suggested that new #define that is the *actual* shift
value, and then the DTE thing and the MAX_IRQS_PER_TABLE values would
depend on that.

   Linus


Re: [PATCH v4 11/19] sched/core: Make migrate disable and CPU hotplug cooperative

2020-12-09 Thread Valentin Schneider


On 08/12/20 13:46, Qian Cai wrote:
> On Mon, 2020-12-07 at 19:27 +, Valentin Schneider wrote:
>> Ok, can reproduce this on a TX2 on next-20201207. I didn't use your config,
>> I oldconfig'd my distro config and only modified it to CONFIG_PREEMPT_NONE.
>> Interestingly the BUG happens on CPU127 here too...
>
> I think that number is totally random. For example, on this x86, it could 
> happen
> for CPU8 or CPU111.

Actually on the TX2 it seems to *always* happen on CPU127. Your hotplug
script sequentially offlines CPUs in increasing id values, so when CPU127
gets hotplugged it is the last online CPU of NUMA node 0.

I've been staring at traces collected via

  echo 2 > /proc/sys/kernel/ftrace_dump_on_oops
  trace-cmd start -e 'sched:*' -e 'cpuhp:*' -e 'workqueue:*'
  ./hotplug.sh

but it's still not entirely clear to me WTH is going on. I do see kworkers
getting their affinity reset in workqueue_offline_cpu(), but for some
reason there's a new one that wakes up on CPU127 sometime later. I haven't
been able to figure out where it comes from - it obviously isn't part of
the percpu worker pools, as it isn't handled during
workqueue_offline_cpu(), but it still ends up affined to a single CPU...

It looks something like this; traces are only from CPU127

  cpuhp:sched_cpu_wait_empty() # Resets the affinity of some kworker/127:x<2

  sched_switch(idle)

  sched_wakeup(kworker/127:2) # picks CPU127
  sched_switch(kworker/127:2)
  # maybe_create_worker() -> creates kworker/127:3
  sched_wakeup(kworker/127:3) # picks CPU127

  sched_switch(kworker/127:3)
  # maybe_create_worker() -> creates kworker/127:4
  sched_wakeup(kworker/127:4) # picks CPU127

  sched_switch(kworker/127:4)
  # maybe_create_worker() -> creates kworker/127:5
  sched_wakeup(kworker/127:5) # picks CPU127
  sched_wakeup(migration/127)

  sched_switch(migration/127)
  cpuhp:take_cpu_down()

  BUG


Re: [GIT PULL] IOMMU fix for 5.10 (-final)

2020-12-09 Thread Jerry Snitselaar
On Wed, Dec 9, 2020 at 12:12 PM Jerry Snitselaar  wrote:
>
>
> Will Deacon @ 2020-12-09 11:50 MST:
>
> > On Wed, Dec 09, 2020 at 10:07:46AM -0800, Linus Torvalds wrote:
> >> On Wed, Dec 9, 2020 at 6:12 AM Will Deacon  wrote:
> >> >
> >> > Please pull this one-liner AMD IOMMU fix for 5.10. It's actually a fix
> >> > for a fix, where the size of the interrupt remapping table was increased
> >> > but a related constant for the size of the interrupt table was forgotten.
> >>
> >> Pulled.
> >
> > Thanks.
> >
> >> However, why didn't this then add some sanity checking for the two
> >> different #defines to be in sync?
> >>
> >> IOW, something like
> >>
> >>#define AMD_IOMMU_IRQ_TABLE_SHIFT 9
> >>
> >>#define MAX_IRQS_PER_TABLE (1 << AMD_IOMMU_IRQ_TABLE_SHIFT)
> >>#define DTE_IRQ_TABLE_LEN ((u64)AMD_IOMMU_IRQ_TABLE_SHIFT << 1)
>
> Since the field in the device table entry format expects it to be n
> where there are 2^n entries in the table I guess it should be:
>
> #define DTE_IRQ_TABLE_LEN 9
> #define MAX_IRQS_PER_TABLE (1 << DTE_IRQ_TABLE_LEN)
>
No, ignore that. I'm being stupid.


> >>
> >> or whatever. Hmm?
> >
> > This looks like a worthwhile change to me, but I don't have any hardware
> > so I've been very reluctant to make even "obvious" driver changes here.
> >
> > Suravee -- please can you post a patch implementing the above?
> >
> >> That way this won't happen again, but perhaps equally importantly the
> >> linkage will be more clear, and there won't be those random constants.
> >>
> >> Naming above is probably garbage - I assume there's some actual
> >> architectural name for that irq table length field in the DTE?
> >
> > The one in the spec is even better: "IntTabLen".
> >
> > Will
> > ___
> > iommu mailing list
> > io...@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu
>



Re: [PATCH] files: rcu free files_struct

2020-12-09 Thread Linus Torvalds
On Wed, Dec 9, 2020 at 10:05 AM Eric W. Biederman  wrote:
>
> -   struct file * file = xchg(&fdt->fd[i], NULL);
> +   struct file * file = fdt->fd[i];
> if (file) {
> +   rcu_assign_pointer(fdt->fd[i], NULL);

This makes me nervous. Why did we use to do that xchg() there? That
has atomicity guarantees that now are gone.

Now, this whole thing should be called for just the last ref of the fd
table, so presumably that atomicity was never needed in the first
place. But the fact that we did that very expensive xchg() then makes
me go "there's some reason for it".

Is this xchg() just bogus historical leftover? It kind of looks that
way. But maybe that change should be done separately?

  Linus


Re: [GIT PULL] IOMMU fix for 5.10 (-final)

2020-12-09 Thread Jerry Snitselaar


Will Deacon @ 2020-12-09 11:50 MST:

> On Wed, Dec 09, 2020 at 10:07:46AM -0800, Linus Torvalds wrote:
>> On Wed, Dec 9, 2020 at 6:12 AM Will Deacon  wrote:
>> >
>> > Please pull this one-liner AMD IOMMU fix for 5.10. It's actually a fix
>> > for a fix, where the size of the interrupt remapping table was increased
>> > but a related constant for the size of the interrupt table was forgotten.
>> 
>> Pulled.
>
> Thanks.
>
>> However, why didn't this then add some sanity checking for the two
>> different #defines to be in sync?
>> 
>> IOW, something like
>> 
>>#define AMD_IOMMU_IRQ_TABLE_SHIFT 9
>> 
>>#define MAX_IRQS_PER_TABLE (1 << AMD_IOMMU_IRQ_TABLE_SHIFT)
>>#define DTE_IRQ_TABLE_LEN ((u64)AMD_IOMMU_IRQ_TABLE_SHIFT << 1)

Since the field in the device table entry format expects it to be n
where there are 2^n entries in the table I guess it should be:

#define DTE_IRQ_TABLE_LEN 9
#define MAX_IRQS_PER_TABLE (1 << DTE_IRQ_TABLE_LEN)

>> 
>> or whatever. Hmm?
>
> This looks like a worthwhile change to me, but I don't have any hardware
> so I've been very reluctant to make even "obvious" driver changes here.
>
> Suravee -- please can you post a patch implementing the above?
>
>> That way this won't happen again, but perhaps equally importantly the
>> linkage will be more clear, and there won't be those random constants.
>> 
>> Naming above is probably garbage - I assume there's some actual
>> architectural name for that irq table length field in the DTE?
>
> The one in the spec is even better: "IntTabLen".
>
> Will
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu



Re: [PATCH 00/18] keys: Miscellaneous fixes

2020-12-09 Thread Ben Boeckel
On Wed, Dec 09, 2020 at 12:14:24 +, David Howells wrote:
> I've extended my collection of minor keyrings fixes for the next merge
> window.  Anything else I should add (or anything I should drop)?
> 
> The patches can be found on the following branch:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-fixes

1-16 LGTM (modulo the typo in patch 7's commit message). 17 and 18 are
outside my knowledge right now.

Reviewed-by: Ben Boeckel 

--Ben


Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected

2020-12-09 Thread Martin Zaharinov



> On 9 Dec 2020, at 20:10, Guillaume Nault  wrote:
> 
> On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote:
>>> On 9 Dec 2020, at 18:40, Guillaume Nault  wrote:
>>> On Wed, Dec 09, 2020 at 04:47:52PM +0200, Martin Zaharinov wrote:
 Hi All
 
 I have problem with latest kernel release 
 And the problem is base on this late problem :
 
 
 https://www.mail-archive.com/search?l=net...@vger.kernel.org&q=subject:%22Re%5C%3A+ppp%5C%2Fpppoe%2C+still+panic+4.15.3+in+ppp_push%22&o=newest&f=1
 
 I have same problem in kernel 5.6 > now I use kernel 5.9.13 and have same 
 problem.
 
 
 In kernel 5.9.13 now don’t have any crashes in dimes but in one moment 
 accel service stop with defunct and in log have many of this line :
 
 
 error: vlan608: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
 error: vlan617: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
 error: vlan679: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
 
 In one moment connected user bump double or triple and after that service 
 defunct and need wait to drop all session to start .
 
 I talk with accel-ppp team and they said this is kernel related problem 
 and to back to kernel 4.14 there is not this problem.
 
 Problem is come after kernel 4.15 > and not have solution to this moment.
>>> 
>>> I'm sorry, I don't understand.
>>> Do you mean that v4.14 worked fine (no crash, no ioctl() error)?
>>> Did the problem start appearing in v4.15? Or did v4.15 work and the
>>> problem appeared in v4.16?
>> 
>> In Telegram group I talk with Sergey and Dimka and told my the problem is 
>> come after changes from 4.14 to 4.15 
>> Sergey write this : "as I know, there was a similar issue in kernel 4.15 so 
>> maybe it is still not fixed"
> 
> Ok, but what is your experience? Do you have a kernel version where
> accel-ppp reports no ioctl() error and doesn't crash the kernel?
Reported by Sergey and Dimka  version 4.14 < don’t have this problem,
Only version after 4.15.0 > have problem and with patch only fix to don’t put 
crash log in dimes and freeze system.


> 
> There wasn't a lot of changes between 4.14 and 4.15 for PPP.
> The only PPP patch I can see that might have been risky is commit
> 0171c4183559 ("ppp: unlock all_ppp_mutex before registering device").

For my changes is a atomic and skb kfree .
> 
>> I don’t have options to test with this old kernel 4.14.xxx i don’t have 
>> support for them.
>> 
>> 
>>> 
 Please help to find the problem.
 
 Last time in link I see is make changes in ppp_generic.c 
 
 ppp_lock(ppp);
   spin_lock_bh(&pch->downl);
   if (!pch->chan) {
   /* Don't connect unregistered channels */
   spin_unlock_bh(&pch->downl);
   ppp_unlock(ppp);
   ret = -ENOTCONN;
   goto outl;
   }
   spin_unlock_bh(&pch->downl);
 
 
 But this fix only to don’t display error and freeze system 
 The problem is stay and is to big.
>>> 
>>> Do you use accel-ppp's unit-cache option? Does the problem go away if
>>> you stop using it?
>>> 
>> 
>> No I don’t use unit-cache , if I set unit-cache accel-ppp defunct same but 
>> user Is connect and disconnet more fast.
>> 
>> The problem is same with unit and without . 
>> Only after this patch I don’t see error in dimes but this is not solution.
> 
> Soryy, what's "in dimes"?
> Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent
> unregistered channels from connecting to PPP units") fixes your problem?
Sorry text correct in dmesg*

I don’t make any changes of ppp part of kernel 5.9.13 I use clean build for ppp 
.


> 
>> In network have customer what have power cut problem, when drop 600 user and 
>> back Is normal but in this moment kernel is locking and start to make this : 
>> sessions:
>>  starting: 4235
>>  active: 3882
>>  finishing: 378
>> The problem is starting session is not real user normal user in this server 
>> is ~4k customers .
> 
> What type of session is it? L2TP, PPPoE, PPTP?
> 
Session is PPPoE only with radius auth

>> I use pppd_compat .
>> 
>> Any idea ?
>> 
 
 Please help to fix.
>> Martin



Re: [PATCH v5 4/5] Driver core: platform: Add devm_platform_get_irqs_affinity()

2020-12-09 Thread Greg KH
On Wed, Dec 09, 2020 at 07:04:02PM +, John Garry wrote:
> On 09/12/2020 18:32, Greg KH wrote:
> > On Wed, Dec 02, 2020 at 06:36:56PM +0800, John Garry wrote:
> > > Drivers for multi-queue platform devices may also want managed interrupts
> > > for handling HW queue completion interrupts, so add support.
> > 
> 
> Hi Greg,
> 
> > Why would a platform device want all of this?  Shouldn't such a device
> > be on a "real" bus instead?
> 
> For this HW version, the device is on the system bus, directly addressable
> by the CPU.

What do you mean by "system bus"?

> Motivation is that I wanted to switch the HW completion queues to use
> managed interrupts.

Fair enough, seems like overkill for a "platform" bus though :)

> > What in-kernel driver needs this complexity?  I can't take new apis
> > without a real user in the tree, sorry.
> 
> It's in the final patch in the series 
> https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.ga...@huawei.com/T/#m0df7e7cd6f0819b99aaeb6b7f8939ef1e17b8a83.

Ah, I missed that, I thought that was some high-speed scsi thing, not a
tiny platform driver...

> I don't anticipate a huge number of users of this API in future, as most
> multi-queue devices are PCI devices; so we could do the work of this API in
> the driver itself, but the preference was not to export genirq functions
> like irq_update_affinity_desc() or irq_create_affinity_masks(), and rather
> have a common helper in the core platform code.

Ok, I'd like to have the irq maintainers/developers ack this before
taking it in the driver core, as someone is going to have to maintain
this crazy thing for forever if it gets merged.

thanks,

greg k-h


Re: [PATCHv3 net-next] octeontx2-pf: Add RSS multi group support

2020-12-09 Thread Saeed Mahameed
On Wed, 2020-12-09 at 22:39 +0530, Geetha sowjanya wrote:
> Hardware supports 8 RSS groups per interface. Currently we are using
> only group '0'. This patch allows user to create new RSS
> groups/contexts
> and use the same as destination for flow steering rules.
> 
> usage:
> To steer the traffic to RQ 2,3
> 
> ethtool -X eth0 weight 0 0 1 1 context new
> (It will print the allocated context id number)
> New RSS context is 1
> 
> ethtool -N eth0 flow-type tcp4 dst-port 80 context 1 loc 1
> 
> To delete the context
> ethtool -X eth0 context 1 delete
> 
> When an RSS context is removed, the active classification
> rules using this context are also removed.
> 
> Change-log:
> v2
> - Removed unrelated whitespace
> - Coverted otx2_get_rxfh() to use new function.
> 
> v3
> - Coverted otx2_set_rxfh() to use new function.
> 
> Signed-off-by: Sunil Kovvuri Goutham 
> Signed-off-by: Geetha sowjanya 
> ---

...

> -/* Configure RSS table and hash key */
> -static int otx2_set_rxfh(struct net_device *dev, const u32 *indir,
> -  const u8 *hkey, const u8 hfunc)
> +static int otx2_get_rxfh_context(struct net_device *dev, u32 *indir,
> +  u8 *hkey, u8 *hfunc, u32 rss_context)
>  {
>   struct otx2_nic *pfvf = netdev_priv(dev);
> + struct otx2_rss_ctx *rss_ctx;
>   struct otx2_rss_info *rss;
>   int idx;
>  
> - if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc !=
> ETH_RSS_HASH_TOP)
> - return -EOPNOTSUPP;
> -
>   rss = &pfvf->hw.rss_info;
>  
>   if (!rss->enable) {
> - netdev_err(dev, "RSS is disabled, cannot change
> settings\n");
> + netdev_err(dev, "RSS is disabled\n");
>   return -EIO;
>   }

I see that you init/enable rss on open, is this is your way to block
getting rss info if device is not open ? why do you need to report an
error anyway, why not just report whatever default config you will be
setting up on next open ? 

to me reporting errors to ethtool queries when device is down is a bad
user experience.

> + if (rss_context >= MAX_RSS_GROUPS)
> + return -EINVAL;
> +

-ENOENT
> + rss_ctx = rss->rss_ctx[rss_context];
> + if (!rss_ctx)
> + return -EINVAL;
> 

-ENOENT




Re: [PATCH] mm/vmalloc: randomize vmalloc() allocations

2020-12-09 Thread Topi Miettinen

On 3.12.2020 8.58, Mike Rapoport wrote:

On Wed, Dec 02, 2020 at 08:49:06PM +0200, Topi Miettinen wrote:

On 1.12.2020 23.45, Topi Miettinen wrote:

Memory mappings inside kernel allocated with vmalloc() are in
predictable order and packed tightly toward the low addresses. With
new kernel boot parameter 'randomize_vmalloc=1', the entire area is
used randomly to make the allocations less predictable and harder to
guess for attackers.



This also seems to randomize module addresses. I was going to check that
next, so nice surprise!


Heh, that's because module_alloc() uses vmalloc() in that way or another :)


The modules are still allocated from their small (1.5GB) separate area 
instead of the much larger (32TB/12.5PB) vmalloc area, which would 
greatly improve ASLR for the modules. To fix that, I tried to to #define 
MODULES_VADDR to VMALLOC_START etc. like x86_32 does, but then kernel 
dies very early without even any output.


-Topi


Re: [PATCH 07/18] KEYS: remove redundant memset

2020-12-09 Thread Ben Boeckel
On Wed, Dec 09, 2020 at 12:15:19 +, David Howells wrote:
> From: Tom Rix 
> 
> Reviewing use of memset in keyctrl_pkey.c

Typo: `keyctl_pkey.c`

--Ben


[tip: x86/cache] x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled

2020-12-09 Thread tip-bot2 for Xiaochen Shen
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 2ba836dbe2467d31fffb439258c2f454c6f1a317
Gitweb:
https://git.kernel.org/tip/2ba836dbe2467d31fffb439258c2f454c6f1a317
Author:Xiaochen Shen 
AuthorDate:Fri, 04 Dec 2020 14:27:59 +08:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Dec 2020 20:00:46 +01:00

x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled

The MBA software controller (mba_sc) is a feedback loop which periodically
reads MBM counters and tries to restrict the bandwidth below a user
specified bandwidth. It tags along the MBM counter overflow handler to do
the updates with 1s interval in mbm_update() and update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
mon_event_read()
  mon_event_count()
__mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated using incorrect m->chunks is shown to
the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 1 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tony Luck 
Link: 
https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.s...@intel.com
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c 
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 622073f..93a33b7 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -320,7 +320,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
}
 
chunks = mbm_overflow_count(m->prev_msr, tval, rr->r->mbm_width);
-   m->chunks += chunks;
m->prev_msr = tval;
 
rr->val += get_corrected_mbm_count(rmid, m->chunks);
@@ -514,15 +513,14 @@ static void mbm_update(struct rdt_resource *r, struct 
rdt_domain *d, int rmid)
}
if (is_mbm_local_enabled()) {
rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
+   __mon_event_count(rmid, &rr);
 
/*
 * Call the MBA software controller only for the
 * control groups and when user has enabled
 * the software controller explicitly.
 */
-   if (!is_mba_sc(NULL))
-   __mon_event_count(rmid, &rr);
-   else
+   if (is_mba_sc(NULL))
mbm_bw_count(rmid, &rr);
}
 }


Re: [PATCH] lockdep: report broken irq restoration

2020-12-09 Thread Andy Lutomirski
On Wed, Dec 9, 2020 at 10:33 AM Mark Rutland  wrote:
>
> We generally expect local_irq_save() and local_irq_restore() to be
> paired and sanely nested, and so local_irq_restore() expects to be
> called with irqs disabled. Thus, within local_irq_restore() we only
> trace irq flag changes when unmasking irqs.
>
> This means that a seuence such as:
>
> | local_irq_disable();
> | local_irq_save(flags);
> | local_irq_enable();
> | local_irq_restore(flags);
>
> ... is liable to break things, as the local_irq_restore() would mask
> IRQs without tracing this change.
>
> We don't consider such sequences to be a good idea, so let's define
> those as forbidden, and add tooling to detect such broken cases.
>
> This patch adds debug code to WARN() when local_irq_restore() is called
> with irqs enabled. As local_irq_restore() is expected to pair with
> local_irq_save(), it should never be called with interrupts enabled.
>
> To avoid the possibility of circular header dependencies beteen
> irqflags.h and bug.h, the warning is handled in a separate C file.
>
> The new code is all conditional on a new CONFIG_DEBUG_IRQFLAGS symbol
> which is independent of CONFIG_TRACE_IRQFLAGS. As noted above such cases
> will confuse lockdep, so CONFIG_DEBUG_LOCKDEP now selects
> CONFIG_DEBUG_IRQFLAGS.
>
> Signed-off-by: Mark Rutland 
> Cc: Andy Lutomirski 
> Cc: Ingo Molnar 
> Cc: Juergen Gross 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> ---
>  include/linux/irqflags.h   | 18 +-
>  kernel/locking/Makefile|  1 +
>  kernel/locking/irqflag-debug.c | 12 
>  lib/Kconfig.debug  |  7 +++
>  4 files changed, 37 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/locking/irqflag-debug.c
>
> Note: as things stand this'll blow up at boot-time on x86 within the io-apic
> timer_irq_works() boot-time test. I've proposed a fix for that:
>
> https://lore.kernel.org/lkml/20201209181514.GA14235@C02TD0UTHF1T.local/
>
> ... which was sufficient for booting under QEMU without splats. I'm giving 
> this
> a soak under Syzkaller on arm64 as that booted cleanly to begin with.
>
> Mark.
>
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 3ed4e8771b64..bca3c6fa8270 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -220,10 +220,26 @@ do {  \
>
>  #else /* !CONFIG_TRACE_IRQFLAGS */
>
> +#ifdef CONFIG_DEBUG_IRQFLAGS
> +extern void warn_bogus_irq_restore(bool *warned);
> +#define check_bogus_irq_restore()  \
> +   do {\
> +   static bool __section(".data.once") __warned;   \
> +   if (unlikely(!raw_irqs_disabled())) \
> +   warn_bogus_irq_restore(&__warned);  \
> +   } while (0)

What's the benefit of having a per-caller __warned instead of just
having a single global one in warn_bogus_irq_restore()?


[PATCH v3] drm/vkms: Add setup and testing information

2020-12-09 Thread Sumera Priyadarsini
Update the vkms documentation to contain steps to:

 - setup the vkms driver
 - run tests using igt

Signed-off-by: Sumera Priyadarsini 
___
Changes in v2:
 - Change heading to title case (Daniel)
 - Add examples to run tests directly (Daniel)
 - Add examples to run subtests (Melissa)

Changes in v3:
 - Add example using run-tests.sh script(Daniel)
---
 Documentation/gpu/vkms.rst | 70 ++
 1 file changed, 70 insertions(+)

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index 13bab1d93bb3..9e030c74a82e 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -7,6 +7,76 @@
 .. kernel-doc:: drivers/gpu/drm/vkms/vkms_drv.c
:doc: vkms (Virtual Kernel Modesetting)
 
+Setup
+=
+
+The VKMS driver can be setup with the following steps:
+
+To check if VKMS is loaded, run::
+
+  lsmod | grep vkms
+
+This should list the VKMS driver. If no output is obtained, then
+you need to enable and/or load the VKMS driver.
+Ensure that the VKMS driver has been set as a loadable module in your
+kernel config file. Do::
+
+  make nconfig
+
+  Go to `Device Drivers> Graphics support`
+
+  Enable `Virtual KMS (EXPERIMENTAL)`
+
+Compile and build the kernel for the changes to get reflected.
+Now, to load the driver, use::
+
+  sudo modprobe vkms
+
+On running the lsmod command now, the VKMS driver will appear listed.
+You can also observe the driver being loaded in the dmesg logs.
+
+To disable the driver, use ::
+
+  sudo modprobe -r vkms
+
+Testing With IGT
+
+
+The IGT GPU Tools is a test suite used specifically for debugging and
+development of the DRM drivers.
+The IGT Tools can be installed from
+`here `_ .
+
+The tests need to be run without a compositor, so you need to switch to text
+only mode. You can do this by::
+
+  sudo systemctl isolate multi-user.target
+
+To return to graphical mode, do::
+
+  sudo systemctl isolate graphical.target
+
+Once you are in text only mode, you can run tests using the --device switch
+or IGT_DEVICE variable to specify the device filter for the driver we want
+to test. IGT_DEVICE can also be used with the run-test.sh script to run the
+tests for a specific driver::
+
+  sudo ./build/tests/ --device "sys:/sys/devices/platform/vkms"
+  sudo IGT_DEVICE="sys:/sys/devices/platform/vkms" ./build/tests/
+  sudo IGT_DEVICE="sys:/sys/devices/platform/vkms" ./scripts/run-tests.sh -t 

+
+For example, to test the functionality of the writeback library,
+we can run the kms_writeback test::
+
+  sudo ./build/tests/kms_writeback --device "sys:/sys/devices/platform/vkms"
+  sudo IGT_DEVICE="sys:/sys/devices/platform/vkms" ./build/tests/kms_writeback
+  sudo IGT_DEVICE="sys:/sys/devices/platform/vkms" ./scripts/run-tests.sh -t 
kms_writeback
+
+You can also run subtests if you do not want to run the entire test::
+
+  sudo ./build/tests/kms_flip --run-subtest basic-plain-flip --device 
"sys:/sys/devices/platform/vkms"
+  sudo IGT_DEVICE="sys:/sys/devices/platform/vkms" ./build/tests/kms_flip 
--run-subtest basic-plain-flip
+
 TODO
 
 
-- 
2.25.1



Re: [PATCH v5 4/5] Driver core: platform: Add devm_platform_get_irqs_affinity()

2020-12-09 Thread John Garry

On 09/12/2020 18:32, Greg KH wrote:

On Wed, Dec 02, 2020 at 06:36:56PM +0800, John Garry wrote:

Drivers for multi-queue platform devices may also want managed interrupts
for handling HW queue completion interrupts, so add support.




Hi Greg,


Why would a platform device want all of this?  Shouldn't such a device
be on a "real" bus instead?


For this HW version, the device is on the system bus, directly 
addressable by the CPU.


Motivation is that I wanted to switch the HW completion queues to use 
managed interrupts.




What in-kernel driver needs this complexity?  I can't take new apis
without a real user in the tree, sorry.


It's in the final patch in the series 
https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.ga...@huawei.com/T/#m0df7e7cd6f0819b99aaeb6b7f8939ef1e17b8a83.


I don't anticipate a huge number of users of this API in future, as most 
multi-queue devices are PCI devices; so we could do the work of this API 
in the driver itself, but the preference was not to export genirq 
functions like irq_update_affinity_desc() or 
irq_create_affinity_masks(), and rather have a common helper in the core 
platform code.


Thanks,
John



Re: [PATCH] spi: dt-bindings: clarify CS behavior for spi-cs-high and gpio descriptors

2020-12-09 Thread Sven Van Asbroeck
On Wed, Dec 9, 2020 at 1:16 PM H. Nikolaus Schaller  wrote:
>
> This is also what made me wonder if that is really intended because then
> the whole discussion about the cs-gpio-flags and inversion and the fixes
> would not have been needed. The current code and fixes are all about
> not ignoring the flags...

The inversion you witnessed was a bug caused by spi client drivers that
simply "plow over" the SPI_CS_HIGH mode flag. This includes the panel driver
you're using, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/panel/panel-tpo-td028ttec1.c?h=v5.10-rc6#n337

You responded to this inversion bug by inverting your chip select, which made
things work again. Now that the bug is gone, you can indicate the correct
polarity in your devicetree, i.e. add 'spi-cs-high' for an active-high
CS, and leave it out for an active-low CS.

Your panel's CS is active-low, so it should not contain spi-cs-high.

> Secondly, please imagine some reader of a device tree who finds
>
> cs-gpios = <&gpio 7 ACTIVE_LOW>;
> spi-cs-high;

That reader looks at the rules, sees that:
- the ACTIVE_LOW is ignored,
- presence of spi-cs-high means active-high
and concludes this chip-select is active-high.


Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting

2020-12-09 Thread Linus Torvalds
On Wed, Dec 9, 2020 at 10:40 AM Will Deacon  wrote:
>
> > And yes, that probably means that you need to change "alloc_set_pte()"
> > to actually pass in the real address, and leave "vmf->address" alone -
> > so that it can know which ones are prefaulted and which one is real,
> > but that sounds like a good idea anyway.
>
> Right, I deliberately avoided that based on the feedback from Jan on an
> older version [1], but I can certainly look at it again.

Side note: I absolutely detest alloc_set_pte() in general, and it
would be very good to try to just change the rules of that function
entirely.

The alloc_set_pte() function is written to be some generic "iterate
over ptes setting them'" function, but it has exactly two users, and
none of those users really want the semantics it gives them, I feel.
And the semantics that alloc_set_pte() does have actually *hurts*
them.

In particular, it made it a nightmare to read what do_fault_around()
does: it does that odd

if (pmd_none(*vmf->pmd)) {
vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);

and then it calls ->map_pages() (which is always filemap_map_pages(),
except for xfs, where it is also always filemap_map_pages but it takes
a lock first).

And it did that prealloc_pte, because that's what alloc_set_pte()
needs to be atomic - and it needs to be atomic because it's all done
under the RCU lock.

So essentially, the _major_ user of alloc_set_pte() requires atomicity
- but that's not at all obvious when you actually read alloc_set_pte()
itself, and in fact when you read it, you see that

if (!vmf->pte) {
ret = pte_alloc_one_map(vmf);

which can block. Except it won't block if we have vmf->prealloc_pte,
because then it will just take that instead.

In other words, ALL THAT IS COMPLETE GARBAGE. And it's written to be
as obtuse as humanly possible, and there is no way in hell anybody
understands it by just looking at the code - you have to follow all
these odd quirks back to see what's going on.

So it would be much better to get rid of alloc_set_pte() entirely, and
move the allocation and the pte locking into the caller, and just
clean it up (because it turns out the callers have their own models
for allocation anyway). And then each of the callers would be a lot
more understandable, instead of having this insanely subtle "I need to
be atomic, so I will pre-allocate in one place, and then take the RCU
lock in another place, in order to call a _third_ place that is only
atomic because of that first step".

The other user of alloc_set_pte() is "finish_fault()", and honestly,
that's what alloc_set_pte() was written for, and why alloc_set_pte()
does all those things that filemap_map_pages() doesn't even want.

But while that other user does want what alloc_set_pte() does, there's
no downside to just moving those things into the caller. It would once
again just clarify things to make the allocation and the setting be
separate operations - even in that place where it doesn't have the
same kind of very subtle behavior with pre-allocation and atomicity.

I think the problem with alloc_set_pte() is hat it has had many
different people working on it over the years, and Kirill massaged
that old use to fit the new pre-alloc use. Before the pre-faulting, I
think both cases were much closer to each other.

So I'm not really blaming anybody here, but the ugly and very
hard-to-follow rules seem to come from historical behavior that was
massaged over time to "just work" rather than have that function be
made more logical.

Kirill - I think you know this code best. Would you be willing to look
at splitting out that "allocate and map" from alloc_set_pte() and into
the callers?

At that point, I think the current very special and odd
do_fault_around() pre-allocation could be made into just a _regular_
"allocate the pmd if it doesn't exist". And then the pte locking could
be moved into filemap_map_pages(), and suddenly the semantics and
rules around all that would be a whole lot more obvious.

Pretty please?

  Linus


Re: [PATCH v24 5/5] remoteproc: Add initial zynqmp R5 remoteproc driver

2020-12-09 Thread Mathieu Poirier
On Mon, Nov 30, 2020 at 07:57:17AM -0800, Ben Levinsky wrote:
> R5 is included in Xilinx Zynq UltraScale MPSoC so by adding this
> remotproc driver, we can boot the R5 sub-system in two different
> configurations -
>   * Split
>   * Lockstep
> 
> The Xilinx R5 Remoteproc Driver boots the R5's via calls to the Xilinx
> Platform Management Unit that handles the R5 configuration, memory access
> and R5 lifecycle management. The interface to this manager is done in this
> driver via zynqmp_pm_* function calls.
> 
> Signed-off-by: Wendy Liang 
> Signed-off-by: Michal Simek 
> Signed-off-by: Ed Mooring 
> Signed-off-by: Jason Wu 
> Signed-off-by: Ben Levinsky 
> ---
>  drivers/remoteproc/Kconfig|   8 +
>  drivers/remoteproc/Makefile   |   1 +
>  drivers/remoteproc/zynqmp_r5_remoteproc.c | 872 ++
>  3 files changed, 881 insertions(+)
>  create mode 100644 drivers/remoteproc/zynqmp_r5_remoteproc.c
> 
> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> index c6659dfea7c7..c2fe54b1d94f 100644
> --- a/drivers/remoteproc/Kconfig
> +++ b/drivers/remoteproc/Kconfig
> @@ -275,6 +275,14 @@ config TI_K3_DSP_REMOTEPROC
> It's safe to say N here if you're not interested in utilizing
> the DSP slave processors.
>  
> +config ZYNQMP_R5_REMOTEPROC
> + tristate "ZynqMP R5 remoteproc support"
> + depends on PM && ARCH_ZYNQMP
> + select RPMSG_VIRTIO
> + select ZYNQMP_IPI_MBOX
> + help
> +   Say y or m here to support ZynqMP R5 remote processors via the remote
> +   processor framework.
>  endif # REMOTEPROC
>  
>  endmenu
> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> index 3dfa28e6c701..ef1abff654c2 100644
> --- a/drivers/remoteproc/Makefile
> +++ b/drivers/remoteproc/Makefile
> @@ -33,3 +33,4 @@ obj-$(CONFIG_ST_REMOTEPROC) += st_remoteproc.o
>  obj-$(CONFIG_ST_SLIM_REMOTEPROC) += st_slim_rproc.o
>  obj-$(CONFIG_STM32_RPROC)+= stm32_rproc.o
>  obj-$(CONFIG_TI_K3_DSP_REMOTEPROC)   += ti_k3_dsp_remoteproc.o
> +obj-$(CONFIG_ZYNQMP_R5_REMOTEPROC)   += zynqmp_r5_remoteproc.o
> diff --git a/drivers/remoteproc/zynqmp_r5_remoteproc.c 
> b/drivers/remoteproc/zynqmp_r5_remoteproc.c
> new file mode 100644
> index ..2593de618409
> --- /dev/null
> +++ b/drivers/remoteproc/zynqmp_r5_remoteproc.c
> @@ -0,0 +1,872 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Zynq R5 Remote Processor driver
> + *
> + * Based on origin OMAP and Zynq Remote Processor driver
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "remoteproc_internal.h"
> +
> +#define MAX_RPROCS   2 /* Support up to 2 RPU */
> +#define MAX_MEM_PNODES   4 /* Max power nodes for one RPU memory 
> instance */
> +
> +#define BANK_LIST_PROP   "sram"
> +#define DDR_LIST_PROP"memory-region"
> +
> +/* IPI buffer MAX length */
> +#define IPI_BUF_LEN_MAX  32U
> +/* RX mailbox client buffer max length */
> +#define RX_MBOX_CLIENT_BUF_MAX   (IPI_BUF_LEN_MAX + \
> +  sizeof(struct zynqmp_ipi_message))
> +
> +/*
> + * Map each Xilinx on-chip SRAM  Bank address to their own respective
> + * pm_node_id.
> + */
> +struct sram_addr_data {
> + phys_addr_t addr;
> + enum pm_node_id id;
> +};
> +
> +#define NUM_SRAMS 4U
> +static const struct sram_addr_data zynqmp_banks[NUM_SRAMS] = {
> + {0xffe0UL, NODE_TCM_0_A},
> + {0xffe2UL, NODE_TCM_0_B},
> + {0xffe9UL, NODE_TCM_1_A},
> + {0xffebUL, NODE_TCM_1_B},
> +};
> +
> +/**
> + * struct zynqmp_r5_rproc - ZynqMP R5 core structure
> + *
> + * @rx_mc_buf: rx mailbox client buffer to save the rx message
> + * @tx_mc: tx mailbox client
> + * @rx_mc: rx mailbox client
> + * @mbox_work: mbox_work for the RPU remoteproc
> + * @tx_mc_skbs: socket buffers for tx mailbox client
> + * @dev: device of RPU instance
> + * @rproc: rproc handle
> + * @tx_chan: tx mailbox channel
> + * @rx_chan: rx mailbox channel
> + * @pnode_id: RPU CPU power domain id
> + * @elem: linked list item
> + */
> +struct zynqmp_r5_rproc {
> + unsigned char rx_mc_buf[RX_MBOX_CLIENT_BUF_MAX];
> + struct mbox_client tx_mc;
> + struct mbox_client rx_mc;
> + struct work_struct mbox_work;
> + struct sk_buff_head tx_mc_skbs;
> + struct device *dev;
> + struct rproc *rproc;
> + struct mbox_chan *tx_chan;
> + struct mbox_chan *rx_chan;
> + u32 pnode_id;
> + struct list_head elem;
> +};
> +
> +/*
> + * r5_set_mode - set RPU operation mode
> + * @z_rproc: Remote processor private data
> + * @rpu_mode: mode specified by device tree to configure the RPU to
> + *
> + * set RPU operation mode
> + *
> + * Return: 0 for success, negative value for failure
> + */
> +static int r5_set_mode(struct zynqmp_r5_rproc *z_rproc,
> +  

Re: [PATCH] Staging: silabs si4455 serial driver

2020-12-09 Thread Dan Carpenter
On Wed, Dec 09, 2020 at 06:56:13PM +, József Horváth wrote:
> Thank you for your suggestions. I made the canges suggested by you.
> 
> I'll test the driver in my development environment, then I'll come
> back with a new patch soon.

Take your time.  No rush.

> 
> I set up a mail client on my linux dev environment, I hope my
> mails/patches/codes will be in proper quality in the future.

No problem at all.  Part of getting used to the process.

regards,
dan carpenter



Re: [PATCH mm 1/2] kasan: don't use read-only static keys

2020-12-09 Thread Marco Elver
On Wed, 9 Dec 2020 at 19:57, Kees Cook  wrote:
>
> On Wed, Dec 09, 2020 at 07:49:36PM +0100, Marco Elver wrote:
> > On Wed, 9 Dec 2020 at 19:24, Andrey Konovalov  wrote:
> > > __ro_after_init static keys are incompatible with usage in loadable kernel
> > > modules and cause crashes. Don't use those, use normal static keys.
> > >
> > > Signed-off-by: Andrey Konovalov 
> >
> > Reviewed-by: Marco Elver 
> >
> > > ---
> > >
> > > This fix can be squashed into
> > > "kasan: add and integrate kasan boot parameters".
> > >
> > > ---
> > >  mm/kasan/hw_tags.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c
> > > index c91f2c06ecb5..55bd6f09c70f 100644
> > > --- a/mm/kasan/hw_tags.c
> > > +++ b/mm/kasan/hw_tags.c
> > > @@ -43,11 +43,11 @@ static enum kasan_arg_stacktrace kasan_arg_stacktrace 
> > > __ro_after_init;
> > >  static enum kasan_arg_fault kasan_arg_fault __ro_after_init;
> > >
> > >  /* Whether KASAN is enabled at all. */
> > > -DEFINE_STATIC_KEY_FALSE_RO(kasan_flag_enabled);
> > > +DEFINE_STATIC_KEY_FALSE(kasan_flag_enabled);
> >
> > Side-node: This appears to be just a bad interface; I think the macro
> > DEFINE_STATIC_KEY_FALSE_RO() is error-prone, if it can't be guaranteed
> > that this is always safe, since the presence of the macro encourages
> > its use and we'll inevitably run into this problem again.
> >
> > >  EXPORT_SYMBOL(kasan_flag_enabled);
> >
> > DEFINE_STATIC_KEY_FALSE_RO() + EXPORT_SYMBOL() is an immediate bug.
> > Given its use has not increased substantially since its introduction,
> > it may be safer to consider its removal.
>
> Right -- it seems the export is the problem, not the RO-ness. What is
> actually trying to change the flag after __init?

It seems to want to add it to a list on module loads:
https://lore.kernel.org/lkml/20201208125129.gy2...@hirez.programming.kicks-ass.net/

-- Marco


Re: [PATCH mm 1/2] kasan: don't use read-only static keys

2020-12-09 Thread Kees Cook
On Wed, Dec 09, 2020 at 07:49:36PM +0100, Marco Elver wrote:
> On Wed, 9 Dec 2020 at 19:24, Andrey Konovalov  wrote:
> > __ro_after_init static keys are incompatible with usage in loadable kernel
> > modules and cause crashes. Don't use those, use normal static keys.
> >
> > Signed-off-by: Andrey Konovalov 
> 
> Reviewed-by: Marco Elver 
> 
> > ---
> >
> > This fix can be squashed into
> > "kasan: add and integrate kasan boot parameters".
> >
> > ---
> >  mm/kasan/hw_tags.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/kasan/hw_tags.c b/mm/kasan/hw_tags.c
> > index c91f2c06ecb5..55bd6f09c70f 100644
> > --- a/mm/kasan/hw_tags.c
> > +++ b/mm/kasan/hw_tags.c
> > @@ -43,11 +43,11 @@ static enum kasan_arg_stacktrace kasan_arg_stacktrace 
> > __ro_after_init;
> >  static enum kasan_arg_fault kasan_arg_fault __ro_after_init;
> >
> >  /* Whether KASAN is enabled at all. */
> > -DEFINE_STATIC_KEY_FALSE_RO(kasan_flag_enabled);
> > +DEFINE_STATIC_KEY_FALSE(kasan_flag_enabled);
> 
> Side-node: This appears to be just a bad interface; I think the macro
> DEFINE_STATIC_KEY_FALSE_RO() is error-prone, if it can't be guaranteed
> that this is always safe, since the presence of the macro encourages
> its use and we'll inevitably run into this problem again.
> 
> >  EXPORT_SYMBOL(kasan_flag_enabled);
> 
> DEFINE_STATIC_KEY_FALSE_RO() + EXPORT_SYMBOL() is an immediate bug.
> Given its use has not increased substantially since its introduction,
> it may be safer to consider its removal.

Right -- it seems the export is the problem, not the RO-ness. What is
actually trying to change the flag after __init?

-- 
Kees Cook


Re: [PATCH] Staging: silabs si4455 serial driver

2020-12-09 Thread József Horváth
On Wed, Dec 09, 2020 at 03:42:06PM +0300, Dan Carpenter wrote:
> Change the From email header to say your name.
> 
> The patch is corrupted and can't be applied.  Please read the first
> paragraphs of Documentation/process/email-clients.rst
> 
> This driver is pretty small and it might be easier to clean it up first
> before merging it into staging.  Run checkpatch.pl --strict on the
> driver.
> 
> > --- /dev/null
> > +++ b/drivers/staging/si4455/si4455.c
> > @@ -0,0 +1,1465 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2020 József Horváth 
> > + *
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include "si4455_api.h"
> > +
> > +#define PORT_SI44551096
> > +#define SI4455_NAME"si4455"
> > +#define SI4455_MAJOR   432
> > +#define SI4455_MINOR   567
> > +#define SI4455_UART_NRMAX  1
> > +#define SI4455_FIFO_SIZE   64
> > +
> > +#define SI4455_CMD_ID_EZCONFIG_CHECK   0x19
> > +#define SI4455_CMD_ID_PART_INFO0x01
> > +#define SI4455_CMD_REPLY_COUNT_PART_INFO   9
> > +#define SI4455_CMD_ID_GET_INT_STATUS   0x20
> > +#define SI4455_CMD_REPLY_COUNT_GET_INT_STATUS  8
> > +#define SI4455_CMD_ID_FIFO_INFO0x15
> > +#define SI4455_CMD_ARG_COUNT_FIFO_INFO 2
> > +#define SI4455_CMD_REPLY_COUNT_FIFO_INFO   2
> > +#define SI4455_CMD_FIFO_INFO_ARG_TX_BIT0x01
> > +#define SI4455_CMD_FIFO_INFO_ARG_RX_BIT0x02
> > +#define SI4455_CMD_ID_READ_CMD_BUFF0x44
> > +#define SI4455_CMD_ID_READ_RX_FIFO 0x77
> > +#define SI4455_CMD_ID_WRITE_TX_FIFO0x66
> > +#define SI4455_CMD_ID_START_RX 0x32
> > +#define SI4455_CMD_ARG_COUNT_START_RX  8
> > +#define SI4455_CMD_START_RX_ARG_RXTIMEOUT_STATE_ENUM_RX8
> > +#define SI4455_CMD_START_RX_ARG_RXVALID_STATE_ENUM_RX  8
> > +#define SI4455_CMD_START_RX_ARG_RXINVALID_STATE_ENUM_RX8
> > +#define SI4455_CMD_ID_START_TX 0x31
> > +#define SI4455_CMD_ARG_COUNT_START_TX  5
> > +#define SI4455_CMD_ID_CHANGE_STATE 0x34
> > +#define SI4455_CMD_ARG_COUNT_CHANGE_STATE  2
> > +#define SI4455_CMD_CHANGE_STATE_ARG_NEW_STATE_ENUM_READY   3
> > +#define SI4455_CMD_GET_CHIP_STATUS_REP_CMD_ERROR_PEND_MASK 0x08
> > +#define SI4455_CMD_GET_CHIP_STATUS_REP_CMD_ERROR_PEND_BIT  0x08
> > +#define SI4455_CMD_GET_INT_STATUS_REP_PACKET_SENT_PEND_BIT 0x20
> > +#define SI4455_CMD_GET_INT_STATUS_REP_PACKET_RX_PEND_BIT   0x10
> > +#define SI4455_CMD_GET_INT_STATUS_REP_CRC_ERROR_BIT0x08
> 
> These names are unreadably long and just make the code messy.
> 
> > +#define SI4455_CMD_ID_GET_MODEM_STATUS 0x22
> > +#define SI4455_CMD_ARG_COUNT_GET_MODEM_STATUS  2
> > +#define SI4455_CMD_REPLY_COUNT_GET_MODEM_STATUS8
> > +
> > +struct si4455_part_info {
> > +   u8  CHIPREV;
> > +   u16 PART;
> > +   u8  PBUILD;
> > +   u16 ID;
> > +   u8  CUSTOMER;
> > +   u8  ROMID;
> > +   u8  BOND;
> 
> Also the huge gap between the type and the struct member makes it
> hard to read.
> 
>   u8  chip_rev;
>   u16 part;
>   u8  pbuild;
> 
> etc.
> 
> > +};
> > +
> > +struct si4455_int_status {
> > +   u8  INT_PEND;
> > +   u8  INT_STATUS;
> > +   u8  PH_PEND;
> > +   u8  PH_STATUS;
> > +   u8  MODEM_PEND;
> > +   u8  MODEM_STATUS;
> > +   u8  CHIP_PEND;
> > +   u8  CHIP_STATUS;
> > +};
> > +
> > +struct si4455_modem_status {
> > +   u8  MODEM_PEND;
> > +   u8  MODEM_STATUS;
> > +   u8  CURR_RSSI;
> > +   u8  LATCH_RSSI;
> > +   u8  ANT1_RSSI;
> > +   u8  ANT2_RSSI

Re: linux-next: build warning after merge of the akpm tree

2020-12-09 Thread Kees Cook
On Tue, Dec 08, 2020 at 11:01:57PM +1100, Stephen Rothwell wrote:
> Hi Stephen,
> 
> On Fri, 4 Dec 2020 21:00:00 +1100 Stephen Rothwell  
> wrote:
> >
> > Hi all,
> > 
> > After merging the akpm tree, today's linux-next build (powerpc
> > allyesconfig) produced warnings like this:
> > 
> > ld: warning: orphan section `.data..Lubsan_data177' from 
> > `arch/powerpc/oprofile/op_model_pa6t.o' being placed in section 
> > `.data..Lubsan_data177'
> > 
> > (lots of these latter ones)
> 
> 781584 of them today!
> 
> > I don't know what produced these, but it is in the akpm-current or
> > akpm trees.
> 
> Presumably the result of commit
> 
>   186c3e18dba3 ("ubsan: enable for all*config builds")
> 
> from the akpm-current tree.
> 
> arch/powerpc/kernel/vmlinux.lds.S has:
> 
> #ifdef CONFIG_PPC32
> .data : AT(ADDR(.data) - LOAD_OFFSET) {
> DATA_DATA
> #ifdef CONFIG_UBSAN
> *(.data..Lubsan_data*)
> *(.data..Lubsan_type*)
> #endif
> *(.data.rel*)
> *(SDATA_MAIN)
> 
> added by commit
> 
>   beba24ac5913 ("powerpc/32: Add .data..Lubsan_data*/.data..Lubsan_type* 
> sections explicitly")
> 
> in 2018, but no equivalent for 64 bit.
> 
> I will try the following patch tomorrow:
> 
> From: Stephen Rothwell 
> Date: Tue, 8 Dec 2020 22:58:24 +1100
> Subject: [PATCH] powerpc: Add .data..Lubsan_data*/.data..Lubsan_type* 
> sections explicitly
> 
> Similarly to commit
> 
>   beba24ac5913 ("powerpc/32: Add .data..Lubsan_data*/.data..Lubsan_type* 
> sections explicitly")
> 
> since CONFIG_UBSAN bits can now be enabled for all*config.
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  arch/powerpc/kernel/vmlinux.lds.S | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
> b/arch/powerpc/kernel/vmlinux.lds.S
> index 3b4c26e94328..0318ba436f34 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -296,6 +296,10 @@ SECTIONS
>  #else
>   .data : AT(ADDR(.data) - LOAD_OFFSET) {
>   DATA_DATA
> +#ifdef CONFIG_UBSAN
> + *(.data..Lubsan_data*)
> + *(.data..Lubsan_type*)
> +#endif
>   *(.data.rel*)
>   *(.toc1)
>   *(.branch_lt)
> -- 
> 2.29.2
> 
> -- 
> Cheers,
> Stephen Rothwell

Reviewed-by: Kees Cook 

Thanks for figuring this one out. :) Andrew, can you add this to your
ubsan patch stack, or do you want me to resend it to you directly?


-- 
Kees Cook


RE: [PATCH net 1/4] net: freescale/fman: Split the main resource region reservation

2020-12-09 Thread Madalin Bucur
> -Original Message-
> From: Patrick Havelange 
> Sent: 09 December 2020 16:17
> To: Madalin Bucur ; David S. Miller
> ; Jakub Kicinski ;
> net...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net 1/4] net: freescale/fman: Split the main resource
> region reservation
> 
> >>> area. I'm assuming this is the problem you are trying to address here,
> >>> besides the stack corruption issue.
> >>
> >> Yes exactly.
> >> I did not add this behaviour (having a main region and subdrivers using
> >> subregions), I'm just trying to correct what is already there.
> >> For example: this is some content of /proc/iomem for one board I'm
> >> working with, with the current existing code:
> >> ffe40-ffe4fdfff : fman
> >> ffe4e-ffe4e0fff : mac
> >> ffe4e2000-ffe4e2fff : mac
> >> ffe4e4000-ffe4e4fff : mac
> >> ffe4e6000-ffe4e6fff : mac
> >> ffe4e8000-ffe4e8fff : mac
> >>
> >> and now with my patches:
> >> ffe40-ffe4fdfff : /soc@ffe00/fman@40
> >> ffe40-ffe480fff : fman
> >> ffe488000-ffe488fff : fman-port
> >> ffe489000-ffe489fff : fman-port
> >> ffe48a000-ffe48afff : fman-port
> >> ffe48b000-ffe48bfff : fman-port
> >> ffe48c000-ffe48cfff : fman-port
> >> ffe4a8000-ffe4a8fff : fman-port
> >> ffe4a9000-ffe4a9fff : fman-port
> >> ffe4aa000-ffe4aafff : fman-port
> >> ffe4ab000-ffe4abfff : fman-port
> >> ffe4ac000-ffe4acfff : fman-port
> >> ffe4c-ffe4d : fman
> >> ffe4e-ffe4e0fff : mac
> >> ffe4e2000-ffe4e2fff : mac
> >> ffe4e4000-ffe4e4fff : mac
> >> ffe4e6000-ffe4e6fff : mac
> >> ffe4e8000-ffe4e8fff : mac
> >>
> >>> While for the latter I think we can
> >>> put together a quick fix, for the former I'd like to take a bit of
> time
> >>> to select the best fix, if one is really needed. So, please, let's
> split
> >>> the two problems and first address the incorrect stack memory use.
> >>
> >> I have no idea how you can fix it without a (more correct this time)
> >> dummy region passed as parameter (and you don't want to use the first
> >> patch). But then it will be useless to do the call anyway, as it won't
> >> do any proper verification at all, so it could also be removed entirely,
> >> which begs the question, why do it at all in the first place (the
> >> devm_request_mem_region).
> >>
> >> I'm not an expert in that part of the code so feel free to correct me
> if
> >> I missed something.
> >>
> >> BR,
> >>
> >> Patrick H.
> >
> > Hi, Patrick,
> >
> > the DPAA entities are described in the device tree. Adding some
> hardcoding in
> > the driver is not really the solution for this problem. And I'm not sure
> we have
> 
> I'm not seeing any problem here, the offsets used by the fman driver
> were already there, I just reorganized them in 2 blocks.
> 
> > a clear problem statement to start with. Can you help me on that part?
> 
> - The current call to __devm_request_region in fman_port.c is not correct.
> 
> One way to fix this is to use devm_request_mem_region, however this
> requires that the main fman would not be reserving the whole region.
> This leads to the second problem:
> - Make sure the main fman driver is not reserving the whole region.
> 
> Is that clearer like this ?
> 
> Patrick H.

The overlapping IO areas result from the device tree description, that in turn
mimics the HW description in the manual. If we really want to remove the 
nesting,
we should change the device trees, not the drivers. If we want to hack it,
instead of splitting ioremaps, we can reserve 4 kB in the FMan driver,
and keep the ioremap as it is now, with the benefit of less code churn.
In the end, what the reservation is trying to achieve is to make sure there
is a single driver controlling a certain peripeheral, and this basic
requirement would be addressed by that change plus devm_of_iomap() for child
devices (ports, MACs).

Madalin


Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE

2020-12-09 Thread Marcelo Tosatti
On Tue, Dec 08, 2020 at 10:33:15PM +0100, Thomas Gleixner wrote:
> On Tue, Dec 08 2020 at 15:11, Marcelo Tosatti wrote:
> > On Tue, Dec 08, 2020 at 05:02:07PM +0100, Thomas Gleixner wrote:
> >> On Tue, Dec 08 2020 at 16:50, Maxim Levitsky wrote:
> >> > On Mon, 2020-12-07 at 20:29 -0300, Marcelo Tosatti wrote:
> >> >> > +This ioctl allows to reconstruct the guest's IA32_TSC and TSC_ADJUST 
> >> >> > value
> >> >> > +from the state obtained in the past by KVM_GET_TSC_STATE on the same 
> >> >> > vCPU.
> >> >> > +
> >> >> > +If 'KVM_TSC_STATE_TIMESTAMP_VALID' is set in flags,
> >> >> > +KVM will adjust the guest TSC value by the time that passed since 
> >> >> > the moment
> >> >> > +CLOCK_REALTIME timestamp was saved in the struct and current value of
> >> >> > +CLOCK_REALTIME, and set the guest's TSC to the new value.
> >> >> 
> >> >> This introduces the wraparound bug in Linux timekeeping, doesnt it?
> >> 
> >> Which bug?
> >
> > max_cycles overflow. Sent a message to Maxim describing it.
> 
> Truly helpful. Why the hell did you not talk to me when you ran into
> that the first time?

Because 

1) Users wanted CLOCK_BOOTTIME to stop counting while the VM 
is paused (so we wanted to stop guest clock when VM is paused anyway).

2) The solution to inject NMIs to the guest seemed overly
complicated.

> >> For one I have no idea which bug you are talking about and if the bug is
> >> caused by the VMM then why would you "fix" it in the guest kernel.
> >
> > 1) Stop guest, save TSC value of cpu-0 = V.
> > 2) Wait for some amount of time = W.
> > 3) Start guest, load TSC value with V+W.
> >
> > Can cause an overflow on Linux timekeeping.
> 
> Yes, because you violate the basic assumption which Linux timekeeping
> makes. See the other mail in this thread.
> 
> Thanks,
> 
> tglx



Re: [PATCH] kcov: don't instrument with UBSAN

2020-12-09 Thread Kees Cook
On Wed, Dec 09, 2020 at 11:01:52AM +0100, Dmitry Vyukov wrote:
> Both KCOV and UBSAN use compiler instrumentation. If UBSAN detects a bug
> in KCOV, it may cause infinite recursion via printk and other common
> functions. We already don't instrument KCOV with KASAN/KCSAN for this
> reason, don't instrument it with UBSAN as well.
> 
> As a side effect this also resolves the following gcc warning:
> 
> conflicting types for built-in function '__sanitizer_cov_trace_switch';
> expected 'void(long unsigned int,  void *)' [-Wbuiltin-declaration-mismatch]
> 
> It's only reported when kcov.c is compiled with any of the sanitizers
> enabled. Size of the arguments is correct, it's just that gcc uses 'long'
> on 64-bit arches and 'long long' on 32-bit arches, while kernel type is
> always 'long long'.
> 
> Reported-by: Stephen Rothwell 
> Suggested-by: Marco Elver 
> Signed-off-by: Dmitry Vyukov 

Reviewed-by: Kees Cook 

Thanks for chasing this down!

Andrew, can you add this to the stack of ubsan patches you're carrying,
please?

-- 
Kees Cook


Re: [PATCH -tip 23/32] sched: Add a per-thread core scheduling interface

2020-12-09 Thread Chris Hyser
On Sun, Dec 06, 2020 at 12:34:18PM -0500, Joel Fernandes wrote:
>
...
> Looks ok to me except the missing else { } clause you found. Also, maybe
> dest/src can be renamed to from/to to make meaning of variables more clear?
> 
> Also looking forward to the docs/test updates.
> 
> thanks!
> 
>  - Joel

I fixed the else clause, made the name changes, added the documentation and
fixed the selftests so they pass. Patch inline.

-chrish

 .../admin-guide/hw-vuln/core-scheduling.rst| 98 --
 include/linux/sched.h  |  4 +-
 include/uapi/linux/prctl.h |  3 +
 kernel/sched/coretag.c | 59 -
 kernel/sched/debug.c   |  1 +
 kernel/sched/sched.h   |  2 +-
 kernel/sys.c   |  6 +-
 tools/include/uapi/linux/prctl.h   |  3 +
 tools/testing/selftests/sched/test_coresched.c | 14 +++-
 9 files changed, 136 insertions(+), 54 deletions(-)

-- >8 --
>From beca9bae6750a66d8c30bbed1d6b8b26b2da05f4 Mon Sep 17 00:00:00 2001
From: chris hyser 
Date: Tue, 10 Nov 2020 15:35:59 -0500
Subject: [PATCH] sched: Provide a more extensive prctl interface for core
 scheduling.

The current prctl interface is a "from" only interface allowing a task to
join an existing core scheduling group by getting the "cookie" from a
specified task/pid.

Additionally, sharing from a task without an existing core scheduling group
(cookie == 0) creates a new cookie shared between the two.

"From" functionality works well for programs modified to use the prctl(),
but many applications will not be modified simply for core scheduling.
Using a wrapper program to share the core scheduling cookie from another
task followed by an "exec" can work, but there is no means to assign a
cookie for an unmodified running task.

Simply inverting the interface to a "to"-only interface, i.e. the calling
task shares it's cookie with the specified task/pid also has limitations,
there being no means to enable tasks to join an existing core scheduling
group for instance.

The solution is to allow both, or more specifically provide a flags
argument to allow various core scheduling commands, currently FROM, TO, and
CLEAR.

The combination of FROM and TO allow a helper program to share the core
scheduling cookie of one task/core scheduling group, with additional tasks.

if (prctl(PR_SCHED_CORE_SHARE, PR_SCHED_CORE_SHARE_FROM, src_pid, 0, 0) < 0)
handle_error("src_pid sched_core failed");
if (prctl(PR_SCHED_CORE_SHARE, PR_SCHED_CORE_SHARE_TO, dest_pid, 0, 0) < 0)
handle_error("dest_pid sched_core failed");

Signed-off-by: chris hyser 

diff --git a/Documentation/admin-guide/hw-vuln/core-scheduling.rst 
b/Documentation/admin-guide/hw-vuln/core-scheduling.rst
index c211a7b..0eefa9b 100644
--- a/Documentation/admin-guide/hw-vuln/core-scheduling.rst
+++ b/Documentation/admin-guide/hw-vuln/core-scheduling.rst
@@ -100,33 +100,83 @@ The color is an 8-bit value allowing for up to 256 unique 
colors.
 
 prctl interface
 ###
-A ``prtcl(2)`` command ``PR_SCHED_CORE_SHARE`` is available to a process to 
request
-sharing a core with another process.  For example, consider 2 processes ``P1``
-and ``P2`` with PIDs 100 and 200. If process ``P1`` calls
-``prctl(PR_SCHED_CORE_SHARE, 200)``, the kernel makes ``P1`` share a core with 
``P2``.
-The kernel performs ptrace access mode checks before granting the request.
+A ``prtcl(2)`` command ``PR_SCHED_CORE_SHARE`` provides an interface for the
+creation of and admission and removal of tasks from core scheduling groups.
 
-.. note:: This operation is not commutative. P1 calling
-  ``prctl(PR_SCHED_CORE_SHARE, pidof(P2)`` is not the same as P2 
calling the
-  same for P1. The former case is P1 joining P2's group of processes
-  (which P2 would have joined with ``prctl(2)`` prior to P1's 
``prctl(2)``).
+::
 
-.. note:: The core-sharing granted with prctl(2) will be subject to
-  core-sharing restrictions specified by the CGroup interface. For 
example
-  if P1 and P2 are a part of 2 different tagged CGroups, then they will
-  not share a core even if a prctl(2) call is made. This is analogous
-  to how affinities are set using the cpuset interface.
+#include 
 
-It is important to note that, on a ``CLONE_THREAD`` ``clone(2)`` syscall, the 
child
-will be assigned the same tag as its parent and thus be allowed to share a core
-with them. This design choice is because, for the security usecase, a
-``CLONE_THREAD`` child can access its parent's address space anyway, so there's
-no point in not allowing them to share a core. If a different behavior is
-desired, the child thread can call ``prctl(2)`` as needed.  This behavior is
-specific to the ``prctl(2)`` interface. For the CGroup interface, the child of 
a
-fork always shares a core with its parent

Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Thomas Gleixner
On Wed, Dec 09 2020 at 18:15, Mark Rutland wrote:
> In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:
>
>   local_irq_save(flags);
>   local_irq_enable();
>
>   [ trigger an IRQ here ]
>
>   local_irq_restore(flags);
>
> ... and in check_timer() we call that a number of times after either a
> local_irq_save() or local_irq_disable(), eventually trailing with a
> local_irq_disable() that will balance things up before calling
> local_irq_restore().
>
> I guess that timer_irq_works() should instead do:
>
>   local_irq_save(flags);
>   local_irq_enable();
>   ...
>   local_irq_disable();
>   local_irq_restore(flags);
>
> ... assuming we consider that legitimate?

Nah. That's old and insane gunk.

Thanks,

tglx
---
 arch/x86/kernel/apic/io_apic.c |   22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1618,21 +1618,16 @@ static void __init delay_without_tsc(voi
 static int __init timer_irq_works(void)
 {
unsigned long t1 = jiffies;
-   unsigned long flags;
 
if (no_timer_check)
return 1;
 
-   local_save_flags(flags);
local_irq_enable();
-
if (boot_cpu_has(X86_FEATURE_TSC))
delay_with_tsc();
else
delay_without_tsc();
 
-   local_irq_restore(flags);
-
/*
 * Expect a few ticks at least, to be sure some possible
 * glue logic does not lock up after one or two first
@@ -1641,10 +1636,10 @@ static int __init timer_irq_works(void)
 * least one tick may be lost due to delays.
 */
 
-   /* jiffies wrap? */
-   if (time_after(jiffies, t1 + 4))
-   return 1;
-   return 0;
+   local_irq_disable();
+
+   /* Did jiffies advance? */
+   return time_after(jiffies, t1 + 4);
 }
 
 /*
@@ -2117,13 +2112,12 @@ static inline void __init check_timer(vo
struct irq_cfg *cfg = irqd_cfg(irq_data);
int node = cpu_to_node(0);
int apic1, pin1, apic2, pin2;
-   unsigned long flags;
int no_pin1 = 0;
 
if (!global_clock_event)
return;
 
-   local_irq_save(flags);
+   local_irq_disable();
 
/*
 * get/set the timer IRQ vector:
@@ -2191,7 +2185,6 @@ static inline void __init check_timer(vo
goto out;
}
panic_if_irq_remap("timer doesn't work through 
Interrupt-remapped IO-APIC");
-   local_irq_disable();
clear_IO_APIC_pin(apic1, pin1);
if (!no_pin1)
apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2215,7 +2208,6 @@ static inline void __init check_timer(vo
/*
 * Cleanup, just in case ...
 */
-   local_irq_disable();
legacy_pic->mask(0);
clear_IO_APIC_pin(apic2, pin2);
apic_printk(APIC_QUIET, KERN_INFO "... failed.\n");
@@ -2232,7 +2224,6 @@ static inline void __init check_timer(vo
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
legacy_pic->mask(0);
apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
apic_printk(APIC_QUIET, KERN_INFO ". failed.\n");
@@ -2251,7 +2242,6 @@ static inline void __init check_timer(vo
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
apic_printk(APIC_QUIET, KERN_INFO ". failed :(.\n");
if (apic_is_x2apic_enabled())
apic_printk(APIC_QUIET, KERN_INFO
@@ -2260,7 +2250,7 @@ static inline void __init check_timer(vo
panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
"report.  Then try booting with the 'noapic' option.\n");
 out:
-   local_irq_restore(flags);
+   local_irq_enable();
 }
 
 /*


Re: [PATCH v3 0/2] siox: two cleanups

2020-12-09 Thread Greg Kroah-Hartman
On Wed, Nov 25, 2020 at 03:47:20PM +0100, Thorsten Scherer wrote:
> Hello,
> 
> On Wed, Nov 25, 2020 at 10:31:04AM +0100, Uwe Kleine-König wrote:
> > Hello,
> >
> > compared to v2 sent starting with Message-Id:
> > 20201124141834.3096325-1-u.kleine-koe...@pengutronix.de:
> >
> >  - fix typo in commit log of patch 1
> >  - add Ack by Thorsten for patch 1
> >
> > Uwe Kleine-König (2):
> >   siox: Use bus_type functions for probe, remove and shutdown
> >   siox: Make remove callback return void
> 
> Successfully ran our siox testcases on v3.
> 
> Tested-by: Thorsten Scherer 

Are you going to take these patches, or do you need/want me to take them
through one of my trees?

Either is fine for me.

thanks,

greg k-h


Re: [PATCH mm 2/2] Revert "kasan, arm64: don't allow SW_TAGS with ARM64_MTE"

2020-12-09 Thread Marco Elver
On Wed, 9 Dec 2020 at 19:24, Andrey Konovalov  wrote:
>
> This reverts "kasan, arm64: don't allow SW_TAGS with ARM64_MTE".
>
> In earlier versions on the hardware tag-based KASAN patchset in-kernel
> MTE used to be always enabled when CONFIG_ARM64_MTE is on. This caused
> conflicts with the software tag-based KASAN mode.
>
> This is no logner the case: in-kernel MTE is never enabled unless the
> CONFIG_KASAN_HW_TAGS is enabled, so there are no more conflicts with
> CONFIG_KASAN_SW_TAGS.
>
> Allow CONFIG_KASAN_SW_TAGS to be enabled even when CONFIG_ARM64_MTE is
> enabled.
>
> Signed-off-by: Andrey Konovalov 

Reviewed-by: Marco Elver 


> ---
>  arch/arm64/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6fefab9041d8..62a7668976a2 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -135,7 +135,7 @@ config ARM64
> select HAVE_ARCH_JUMP_LABEL
> select HAVE_ARCH_JUMP_LABEL_RELATIVE
> select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
> -   select HAVE_ARCH_KASAN_SW_TAGS if (HAVE_ARCH_KASAN && !ARM64_MTE)
> +   select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
> select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
> select HAVE_ARCH_KFENCE
> select HAVE_ARCH_KGDB
> --
> 2.29.2.576.ga3fc446d84-goog
>


<    2   3   4   5   6   7   8   9   10   11   >